Zhiyong WU, Associate Researcher: research on expressive speech processing, affective computing, and related topics

 

Name: Zhiyong WU

Title: Associate Researcher, Ph.D. Supervisor

 

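【Contact Information】

Address: Room 203B, Building F, Tsinghua Campus, University Town, Xili, Nanshan District, Shenzhen; Postal code: 518055

Office phone: 0755-2603 6870

Fax: 0755-2603 6443

Email: zywu at sz.tsinghua.edu.cn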

 

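【Biography】

July 1995 to July 1999: Department of Computer Science and Technology, Tsinghua University, bachelor's degree

July 1999 to June 2005: Computer Science and Technology, Tsinghua University, Ph.D. in Engineering

August 2005 to August 2007: Postdoctoral Fellow, The Chinese University of Hong Kong

August 2007 to December 2008: Lecturer, Graduate School at Shenzhen, Tsinghua University

May 2008 to present: Honorary Associate Researcher, The Chinese University of Hong Kong

December 2008 to present: Associate Researcher, Graduate School at Shenzhen, Tsinghua University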

 

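【Research Areas】

His research centers on the speech and visual processing technologies needed to build a harmonious human-computer interaction environment. Specific topics include speech processing, expressive visual speech synthesis, natural language understanding and generation, and joint audio-visual bimodal modeling.

In recent years he has published more than 40 academic papers, of which 5 are indexed by SCI, 27 by EI, and 10 by ISTP. He has contributed to the writing of one book and the translation of another. He received one Second Prize of the Ministry of Education Science and Technology Progress Award (third contributor). He has led or participated in research projects funded by the National Natural Science Foundation of China, the 863 Program, the 973 Program, the Hong Kong Research Grants Council, the Guangdong-Hong Kong Technology Cooperation Funding Scheme, the Shenzhen-Hong Kong Innovation Circle, and others.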

 

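【Projects Led or Participated In】

1. National Natural Science Foundation of China, General Program: "Research on Deep-Level Information Perception and Expression Methods for Natural Spoken Dialogue"

2. National Social Science Fund of China, Major Program: "A Cross-Linguistic and Cross-Cultural Study of the Speech Production and Cognition of Social Affect"

3. Guangdong Science and Technology Program, Guangdong-Hong Kong Key-Area Breakthrough Project: "Research and Industrialization of a Manageable Real-Time Audio-Visual Platform Based on Cloud Computing"

4. National Natural Science Foundation of China, Young Scientists Fund: "Research on the Personalization of Prosody Patterns with Audio-Visual Fusion"

5. Ministry of Education Doctoral Program Fund for New Teachers: "Hierarchical Modeling of Expressive Elements in Speech Generation"

6. National Natural Science Foundation of China, Joint Research Fund for Overseas and Hong Kong-Macao Scholars: "An Interactive Online Language Learning Platform with Multimodal Pronunciation Models and Corrective Cognitive Feedback"

7. Sub-project of a National 863 Program Key Project: "Convenient Interactive Interface Management Technology: Key Software and Hardware Technologies and Systems for Pervasive Computing"

8. Hong Kong Government Innovation and Technology Support Programme, Guangdong-Hong Kong Technology Cooperation Funding Scheme: "A Chinese Bilingual (Putonghua and Cantonese) Visual Speech Synthesis System for Fixed and Mobile Device Applications"

9. Hong Kong Research Grants Council project: "Modeling Audio-Visual Temporal Correlation for Speech Synthesis"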

 

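【Major Honors】

1. Second Prize, Ministry of Education Science and Technology Progress Award (2009): third contributor, for the project "Research and Application of Multimodal, Multilingual Speech and Language Interaction"

2. Outstanding Individual, Graduate School at Shenzhen, Tsinghua University (2013)

3. Outstanding Staff Member, Graduate School at Shenzhen, Tsinghua University (2010)

4. Outstanding Individual in Science and Technology Work, Graduate School at Shenzhen, Tsinghua University (2009)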

 

【Publications】

2014:

1. Xin ZHENG, Zhiyong WU, Helen MENG, Lianhong CAI, "Contrastive Auto-encoder for Phoneme Recognition," [in] Proc. ICASSP. Florence, Italy, 4-9 May 2014. (EI)

2. Xin ZHENG, Zhiyong WU, Helen MENG, Lianhong CAI, "Learning Dynamic Features with Neural Networks for Phoneme Recognition," [in] Proc. ICASSP. Florence, Italy, 4-9 May 2014. (EI)

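3. Fanbo MENG, Zhiyong WU, Jia JIA, Lianhong CAI, "Prominence Analysis and Synthesis of Chinese Stress," Acta Acustica (声学学报), 2014. [Accepted] (EI) (in Chinese)

4. Fanbo MENG, Zhiyong WU, Helen MENG, Jia JIA, Lianhong CAI, "Decision Tree based English Focus Speech Conversion," Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)), 2014. [Accepted] (EI) (in Chinese)

2013: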

5. Fanbo MENG, Zhiyong WU, Jia JIA, Helen MENG, Lianhong CAI, "Synthesizing English Emphatic Speech for Multimodal Corrective Feedback in Computer-Aided Pronunciation Training," Multimedia Tools and Applications, Springer, DOI: 10.1007/s11042-013-1601-y. (SCI)

6. Jia JIA, Zhiyong WU, Shen ZHANG, Helen MENG, Lianhong CAI, "Head and Facial Gestures Synthesis using PAD Model for an Expressive Talking Avatar," Multimedia Tools and Applications, Springer, DOI: 10.1007/S11042-013-1604-8. (SCI)

7. Xin ZHENG, Zhiyong WU, Binbin SHEN, Helen MENG, Lianhong CAI, "Investigation of Tandem Deep Belief Network Approach for Phoneme Recognition," [in] Proc. ICASSP. Vancouver, Canada, 26-31 May 2013. (EI)

8. Jianbo JIANG, Zhiyong WU, Mingxing XU, Jia JIA, Lianhong CAI, "Comparing Feature Dimension Reduction Algorithms for GMM-SVM based Speech Emotion Recognition," [in] Proc. APSIPA ASC. Taiwan, China, 29 October-1 November 2013. (EI)

9. Kai ZHAO, Zhiyong WU, Lianhong CAI, "A Real-time Speech Driven Talking Avatar based on Deep Neural Network," [in] Proc. APSIPA ASC. Taiwan, China, 29 October-1 November 2013. (EI)

2012:

10. Jianbo JIANG, Zhiyong WU, Mingxing XU, Jia JIA, Lianhong CAI, "Comparison of Adaptation Methods for GMM-SVM based Speech Emotion Recognition," [in] Proc. IEEE Workshop on SLT, pp. 269-273. Miami, Florida, USA, 2-5 December 2012. (EI: 20130916065166)

11. Tao JIANG, Zhiyong WU, Jia JIA, Lianhong CAI, "Perceptual Clustering based Unit Selection Optimization for Concatenative Text-to-Speech Synthesis," [in] Proc. ISCSLP, pp. 64-68. Hong Kong, 5-8 December 2012. (EI: 20131016084519)

12. Chunrong LI, Zhiyong WU, Fanbo MENG, Helen MENG, Lianhong CAI, "Detection and Emphatic Realization of Contrastive Word Pairs for Expressive Text-to-Speech Synthesis," [in] Proc. ISCSLP, pp. 93-97. Hong Kong, 5-8 December 2012. (EI: 20131016084523)

13. Xixin WU, Zhiyong WU, Jia JIA, Lianhong CAI, "Adaptive Named Entity Recognition based on Conditional Random Fields with Automatic Updated Dynamic Gazetteers," [in] Proc. ISCSLP, pp. 363-367. Hong Kong, 5-8 December 2012. (EI: 20131016084525)

14. Jia JIA, Xiaohui WANG, Zhiyong WU, Lianhong CAI, Helen MENG, "Modeling the Correlation between Modality Semantics and Facial Expressions," [in] Proc. APSIPA. Hollywood, USA, 3-6 December 2012. (EI: 20131016079234)

15. Zhang ZHANG, Zhiyong WU, Jia JIA, Lianhong CAI, "Modeling Prosody Pattern of Chinese Expressive Speech and Its Application in Personalized Speech Conversion," [in] Proc. TAL, Nanjing, China, 26-29 May 2012.

16. Fanbo MENG, Zhiyong WU, Helen MENG, Jia JIA, Lianhong CAI, "Generating Emphasis from Neutral Speech using Hierarchical Perturbation Model by Decision Tree and Support Vector Machine," [in] Proc. ICALIP, pp. 442-448. Shanghai, China, 16-18 July 2012. (EI: 20130315907216)

17. Kai ZHAO, Zhiyong WU, Jia JIA, Lianhong CAI, "An Online Speech Driven Talking Head System," [in] Proc. GHTCE, pp. 186-187. Shenzhen, China, 18-20 November 2012. (EI: 20131716244276)

18. Xin WANG, Zhiyong WU, "An HMM-based Cantonese Speech Synthesis System," [in] Proc. GHTCE, pp. 141-142. Shenzhen, China, 18-20 November 2012. (EI: 20131716244264)

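19. Tao JIANG, Zhiyong WU, Lianhong CAI, "An Experimental Study of Objective Measures of Speech Synthesis Naturalness," [in] Proc. 10th Phonetic Conference of China (PCC). Shanghai, 18-20 May 2012. (in Chinese)

2011: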

20. Binbin SHEN, Zhiyong WU, Yongxin WANG, Lianhong CAI, "Combining Active and Semi-supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis," [in] Proc. Interspeech 2011, pp. 2165-2168. Florence, Italy, 27-31 August, 2011. (EI: 20123715411045)

21. Hui PANG, Zhiyong WU, Lianhong CAI, "Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model," [in] Proc. NCMMSC2011. Xi'an, 16-18 October, 2011. Also published in Tsinghua Science and Technology (the English-language edition of the Journal of Tsinghua University), 2012, 17(2): 218-224. (EI: 20123215322698) (Best Paper Award)

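22. Long CHEN, Zhiyong WU, Chun YUAN, Helen MENG, Lianhong CAI, "A Voiceprint-Assisted Authentication System for Digital Rights Management," [in] Proc. NCMMSC2011. Xi'an, 16-18 October, 2011. (in Chinese)

2010: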

23. Zhiyong WU, Lianhong CAI, Helen MENG, "Modeling Prosody Patterns for Chinese Expressive Text-to-Speech Synthesis," [in] Proc. 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 148-152. Tainan, 29 November-3 December 2010. (EI: 20110713663203)

24. Fanbo MENG, Helen MENG, Zhiyong WU, Lianhong CAI, "Synthesizing Expressive Speech to Convey Focus using a Perturbation Model for Computer-Aided Pronunciation Training," [in] Proc. Second Language Studies: Acquisition, Learning, Education and Technology (L2WS). Tokyo, Japan, 22-27 September 2010.

25. Quansheng DUAN, Shiyin KANG, Zhiyong WU, Lianhong CAI, Zhiwei SHUANG, Yong QIN, "Comparison of Syllable/Phone HMM Based Mandarin TTS," [in] Proc. 20th International Conference on Pattern Recognition (ICPR), pp. 4496-4499. Istanbul, Turkey, 23-26 August 2010. (ISTP: 11580140, EI: 20104613390878)

26. Shen ZHANG, Zhiyong WU, Helen MENG, Lianhong CAI, "Facial expression synthesis based on emotion dimensions for affective talking avatar," T. Nishida et al. (Eds.): Modeling Machine Emotions for Realizing Intelligence, SIST (Smart Innovation, Systems and Technologies), vol. 2010, no. 1, pp. 109-132, 2010. (EI: 20123715421851)

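27. Zhang ZHANG, Jia JIA, Lianhong CAI, Zhiyong WU, "A Study of Chinese Pitch Patterns and Their Parametric Description," [in] Proc. 9th Phonetic Conference of China (PCC). Tianjin, 28-30 May 2010. (in Chinese)

2009: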

28. Zhiyong WU, Helen MENG, Hongwu YANG, Lianhong CAI, "Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 8, pp. 1567-1577, November, 2009. (SCI: 482QK, EI: 20093612281690)

29. Zhiyong WU, Quanqqi CAO, Helen M. MENG, Lianhong CAI, "A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface," [in] Proc. NCMMSC2009. Urumqi, Xinjiang, 14-16 August, 2009. Also published in Tsinghua Science and Technology (the English-language edition of the Journal of Tsinghua University), vol. 14, no. 5, pp. 623-630, October 2009. (EI: 20094012358727)

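30. Quansheng DUAN, Shiyin KANG, Zhiwei SHUANG, Zhiyong WU, Lianhong CAI, Yong QIN, "A Modeling Unit Selection Algorithm Suited to HMM-based Chinese Speech Synthesis," [in] Proc. 10th National Conference on Man-Machine Speech Communication (NCMMSC2009), pp. 434-439. Lanzhou, Gansu, 14-16 August 2009. (in Chinese)

2008: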

31. Zhiyong WU, Jiying WU, Helen M. MENG, "The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes," [in] Proc. 6th International Symposium on Chinese Spoken Language Processing (ISCSLP 2008), pp. 370-373. Kunming, China, 16-19 December 2008. (ISTP: BJA83, EI: 20091011939107)

32. Honglei CONG, Zhiyong WU, Lianhong CAI, Helen M. MENG, "A New Prosodic Strength Calculation Method for Prosody Reduction Modeling," [in] Proc. 6th International Symposium on Chinese Spoken Language Processing (ISCSLP 2008), pp. 53-56. Kunming, China, 16-19 December 2008. (ISTP: BJA83, EI: 20091011939031)

33. Xinxin ZHOU, Zhiyong WU, Chun YUAN, Yuzhuo ZHONG, "Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis," [in] Proc. 2nd International Symposium on Intelligent Information Technology Application (IITA 2008), vol. 1, pp. 477-481. Shanghai, China, 20-22 December 2008. (ISTP: BIY14, EI: 20091411996990)

34. Yu WANG, Zhiyong WU, Lianhong CAI, Helen M. MENG, "Modeling the Synchrony between Audio and Visual Modalities for Speaker Identification," [in] Proc. 8th Phonetic Conference of China and the International Symposium on Phonetic Frontiers (PCC2008). Beijing, China, 18-20 April 2008.

2007:

35. Shen ZHANG, Zhiyong WU, Helen M. MENG, Lianhong CAI, "Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar," [in] Proc. ACII2007, LNCS 4738, pp. 24-35. Lisbon, Portugal, 12-14 September 2007. (ISTP: BGW33, EI: 080311024879)

36. Shen ZHANG, Zhiyong WU, Helen M. MENG, Lianhong CAI, "Head Movement Synthesis based on Semantic and Prosodic Features for a Chinese Expressive Avatar," [in] Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP2007), vol. IV, pp. 837-840. Hawai'i Convention Center, Honolulu, Hawaii, USA, April 15-20 2007. (ISTP: BGO04, EI: 073210745929)

37. Hongwu YANG, Helen M. MENG, Zhiyong WU, Lianhong CAI, "Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis," [in] IEEE/ACL 2006 Workshop on Spoken Language Technology. Aruba, 10-13 December 2006. (ISTP: BGB23, EI: 083311451167)

2006:

38. Zhiyong WU, Lianhong CAI, Helen MENG, "Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification," [in] D. Huang, K. Li and G.W. Irwin (Eds.): Intelligent Computing in Signal Processing and Pattern Recognition (ICIC2006), LNCIS 345, pp. 1107-1112. Kunming, China, 16-19 August 2006. (SCI: BEZ63, ISTP: BEZ63)

39. Zhiyong WU, Lianhong CAI, Helen MENG, "Multi-level Fusion of Audio and Visual Features for Speaker Identification," [in] D. Zhang and A.K. Jain (Eds.): Advances in Biometrics (ICB2006), LNCS 3832, pp. 493-499, 2005. Hong Kong, 5-7 January 2006. (SCI: BDW04, ISTP: BDW04, EI: 06249940530)

40. Zhiyong WU, Helen M. MENG, Hui NING, Sam C. TSE, "A Corpus-based Approach for Cooperative Response Generation in a Dialog System," [in] Qiang Huo, Bin Ma, Eng-Siong Chng and Haizhou Li (Eds.): Chinese Spoken Language Processing (ISCSLP2006), LNAI 4274, pp. 614-626. Kent Ridge, Singapore, 13-16 December 2006. (ISTP: BFV54, EI: 20100912736133)

41. Zhiyong WU, Shen ZHANG, Lianhong CAI, Helen MENG, "Real-time Synthesis of Chinese Visual Speech and Facial Expressions using MPEG-4 FAP Features in a Three-dimensional Avatar," [in] The International Conference on Spoken Language Processing (Interspeech 2006, ICSLP), pp. 1802-1805. Pittsburgh, USA, 17-21 September 2006. (EI: 082511324456)

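42. Zhiyong WU, Lianhong CAI, Lei MA, Jia JIA, "Design and Implementation of a Multi-Biometric Recognition Platform," Journal of Chinese Computer Systems (小型微型计算机系统), 2006: 27(2), 375-379. (in Chinese)

43. Zhiyong WU, Lianhong CAI, "Audio-Visual Bimodal Speaker Identification Based on Dynamic Bayesian Networks," Journal of Computer Research and Development (计算机研究与发展), 2006: 43(3), 470-475. (in Chinese)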

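2005 and earlier:

44. Zhiyong WU, Lianhong CAI, Rui CAI, "A Weight Training Algorithm Guided by Perceptual Listening for Speech Synthesis," Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)), 2005: 45(1), 52-56. (in Chinese)

45. Zhiyong WU, Lianhong CAI, Helen MENG, "Viseme Parameter Optimization Based on an Audio-Visual Correlation Model for Visual Speech Synthesis," [in] Proc. 6th National Conference on Man-Machine Speech Communication (NCMMSC2005), pp. 334-337. Beijing, 22-24 October 2005. (in Chinese)

46. Zhiyong WU, Lianhong CAI, "A Prosody Correlation Model for Speech Synthesis," Journal of Chinese Information Processing (中文信息学报), 2004: 18(2), 44-50. (in Chinese)

47. Zhiming WANG, Lianhong CAI, Zhiyong WU, Jianhua TAO, "Research on Chinese Text-to-Visual-Speech Conversion," Journal of Chinese Computer Systems (小型微型计算机系统), 2002: 23(4), 474-477. (in Chinese)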

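Books written and translated:

48. Speech Recognition (book chapter). In: Lianhong CAI, Dezhi HUANG, Rui CAI, et al., Fundamentals and Applications of Modern Speech Technology (现代语音技术基础与应用). Beijing: Tsinghua University Press, November 2003. Chapter 6, pp. 232-281. (in Chinese)

49. Progress in Speech Synthesis (Chinese edition: 语音合成). J. van Santen, R. Sproat, J. Hirschberg, J. Olive. Translated by Lianhong CAI, Hongwu YANG, Zhiyong WU, et al. Beijing: China Machine Press, March 2005.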

 

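【Professional Service】

2006- International Speech Communication Association (ISCA), Member

2007- IEEE Computational Intelligence Society, Intelligent Systems Applications Technical Committee (CIS ISATC), Committee Member

2005- World Wide Web Consortium (W3C), Speech Synthesis Markup Language (SSML) Working Group, Member

2005- IEEE Trans. on Audio, Speech, and Language Processing, Reviewer

2011- ACM Trans. on Asian Language Information Processing, Reviewer

2013- Speech Communication, Reviewer

2013- Multimedia Tools and Applications, Reviewer

2006- InterSpeech; ICASSP; ISCSLP; NCMMSC, Conference Reviewer

2008 ISCSLP 2008, Session Chair

2009- Acoustical Society of China, Branch of Speech, Music and Hearing Acoustics, Committee Member

2009- National Conference on Man-Machine Speech Communication (NCMMSC), Standing Committee Member

2012 ISCSLP 2012, Publication Co-Chair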
