
Frontier Technologies and Social Mission of Speech Dialogue and Hearing | CNCC2021 Technical Forum Preview

2021-10-06

CNCC2021 will bring together top professional talent and expert resources from home and abroad to present a grand professional feast for more than ten thousand attendees. Don't miss it; we look forward to seeing you. Registration is now open!




CNCC 2021 Forum on Frontier Technologies and Social Mission of Speech Dialogue and Hearing


【Forum Background】

Speech, language, and hearing are the most natural and convenient means for humans to communicate and obtain information, and spoken-language interaction has fundamentally transformed human-computer interaction. Now that the IT industry has entered its fifth global wave, the mobile-internet era, the processing of speech, language, and auditory signals and information has become a core enabling technology for many industries, including intelligent mobile devices, intelligent education, and smart home appliances. It is a commanding height for the future development of the information technology industry in the era of big data and cognitive computing, offering not only broad market prospects but also major strategic significance for national information security and the dissemination of national culture.

This forum invites several well-known experts and scholars from home and abroad to give keynote talks on the technical development and social mission of speech, language, and hearing research, along with a panel session on the challenges and opportunities facing speech dialogue and auditory technology. The goal is to distill the frontier scientific questions of the field, promote exchange between academia and industry, and look ahead to the discipline's future.


Forum Schedule


CNCC 2021 Forum on Frontier Technologies and Social Mission of Speech Dialogue and Hearing

| Time | Activity | Title | Speaker | Affiliation |
|------|----------|-------|---------|-------------|
| 16:00-16:45 | Keynote 1 | Providing Speech Technology to Under-Resourced Communities | Mark Hasegawa-Johnson | University of Illinois |
| 16:45-17:30 | Keynote 2 | Calibration and Uncertainty: Knowing What You Don't Know | Mark Gales | University of Cambridge |
| 17:30-18:15 | Keynote 3 | How the temporal amplitude envelope of speech contributes to nonlinguistic information | Masashi Unoki | Japan Advanced Institute of Science and Technology (JAIST) |
| 18:15-19:00 | Panel | | | |


Speaker Introductions


Mark Hasegawa-Johnson



Mark Hasegawa-Johnson has been on the faculty at the University of Illinois since 1999, where he is currently a Professor of Electrical and Computer Engineering. He received his Ph.D. in 1996 at MIT, with a thesis titled "Formant and Burst Spectral Measures with Quantitative Error Models for Speech Sound Classification," after which he was a post-doc at UCLA from 1996 to 1999. Prof. Hasegawa-Johnson is a Fellow of the Acoustical Society of America and a Senior Member of IEEE and ACM. He is currently Treasurer of ISCA and Senior Area Editor of the IEEE Transactions on Audio, Speech and Language Processing. He has published 280 peer-reviewed journal articles and conference papers in the general area of automatic speech analysis, including machine learning models of articulatory and acoustic phonetics, prosody, dysarthria, non-speech acoustic events, audio source separation, and under-resourced languages.


Title: Providing Speech Technology to Under-Resourced Communities


Abstract: In this talk I'll present results from two manuscripts currently in preparation: one on automatic discovery of phoneme inventories, and one on counterfactually fair automatic speech recognition.


The first paper addresses the problem of developing an ASR (automatic speech recognizer) that can be used to produce meaningful, useful transcriptions of languages that the ASR has never previously encountered.  


End-to-end neural ASR can be trained to listen to audio in a large number of training languages, and to generate output transcriptions in the international phonetic alphabet.  When the ASR is presented with a previously unknown language, error rates skyrocket, but not without pattern: errors tend to replace each phoneme with a phoneme from some other language that has similar articulation.  Results suggest that usable transcriptions in a previously unknown language could be obtained in this way.  
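To make the phoneme-substitution pattern concrete, the sketch below maps a phone predicted by a multilingual IPA recognizer onto the inventory of an unseen target language by maximizing articulatory-feature overlap. This is an illustrative reconstruction of the idea only, not the speaker's actual system; the tiny feature table and the function `map_to_inventory` are hypothetical.

```python
# Minimal sketch (not the speaker's actual system): substitute a predicted IPA
# phone with the most articulatorily similar phone in an unseen language's
# inventory. The feature table is a tiny illustrative subset.

ARTICULATORY_FEATURES = {
    "p": {"bilabial", "stop", "voiceless"},
    "b": {"bilabial", "stop", "voiced"},
    "t": {"alveolar", "stop", "voiceless"},
    "d": {"alveolar", "stop", "voiced"},
    "s": {"alveolar", "fricative", "voiceless"},
    "z": {"alveolar", "fricative", "voiced"},
    "f": {"labiodental", "fricative", "voiceless"},
    "v": {"labiodental", "fricative", "voiced"},
}

def map_to_inventory(predicted_phone: str, inventory: list[str]) -> str:
    """Pick the inventory phone with the largest feature overlap."""
    pred = ARTICULATORY_FEATURES[predicted_phone]
    return max(inventory, key=lambda ph: len(pred & ARTICULATORY_FEATURES[ph]))

# Example: the target language has no /v/; the closest available phone wins.
inventory = ["p", "t", "s", "f", "b", "d"]
print(map_to_inventory("v", inventory))  # -> "f" (shares labiodental + fricative)
```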


The second paper addresses demographic disparities in the accuracy of ASR: women tend to have higher error rates than men, Black speakers than white speakers, high-school graduates than college graduates, and younger speakers than older speakers. One of the most stringent criteria for fairness in artificial intelligence is counterfactual fairness: counterfactually modifying the gender, race, education, or age of a person should not change the outcome of the classifier. It is possible to train a voice conversion algorithm to counterfactually modify the gender, race, education, or age of a speaker. Simply adding counterfactual data to the training set does not reduce these disparities, but by training the network to ignore counterfactual differences, it is possible to reduce gender, race, education, and age disparities in the accuracy of ASR.
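For reference, counterfactual fairness is usually stated as follows (this is the standard formulation due to Kusner et al., which may differ in detail from the manuscript in preparation). For protected attribute $A$ (gender, race, education, or age), observed features $X$, and latent background variables $U$,

$$P\big(\hat{Y}_{A \leftarrow a}(U) = y \,\big|\, X = x,\, A = a\big) \;=\; P\big(\hat{Y}_{A \leftarrow a'}(U) = y \,\big|\, X = x,\, A = a\big)$$

for every outcome $y$ and every counterfactual value $a'$. Applied to ASR, this says that converting a voice so that it sounds as if produced by a different demographic group should leave the transcription unchanged.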

Mark Gales 



Mark Gales is a professor of engineering at the University of Cambridge, an IEEE Fellow, and a Fellow of the International Speech Communication Association (ISCA). He graduated from the Engineering Department at Cambridge, was a researcher at the IBM T. J. Watson Research Center in the United States, and was later a lecturer and associate professor at Cambridge. He has long worked on acoustic modeling for speech recognition and has made innovative contributions to many areas of speech recognition and analysis, such as adaptation algorithms. He is a senior editorial board member of the IEEE/ACM Transactions on Audio, Speech, and Language Processing, a member of the IEEE Speech and Language Processing Technical Committee, and an ISCA Distinguished Lecturer.


Title: Calibration and Uncertainty: Knowing What You Don't Know


Abstract: Machine learning, and particularly deep learning, has dramatically improved the performance of automated systems on a range of tasks, including spoken language applications. One issue with these deep learning approaches is that they tend to be overconfident in the decisions they make, with possibly serious implications for safety-critical and sensitive applications. This talk is split into two parts.


The first discusses calibration, the probability that the predicted class or score is accurate. Initially this is examined from the perspective of "static" data, such as images, and how ensemble and distillation approaches impact calibration. Sequence data, which is encountered in spoken language applications, yields additional challenges for computing and measuring calibration.
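As background on how calibration is typically measured, the expected calibration error (ECE) bins predictions by confidence and compares each bin's mean confidence with its empirical accuracy. A minimal NumPy sketch of this standard metric (not code from the talk):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap between
    mean confidence and empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Toy example: five predictions with their confidences and correctness.
conf = [0.9, 0.9, 0.6, 0.6, 0.6]
hit  = [1,   1,   1,   0,   1]
print(expected_calibration_error(conf, hit))  # ~0.08
```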


The second part of the talk discusses the limitations of considering only whether a system is calibrated as the measure of prediction quality. Calibration examines only the total uncertainty of a prediction. This uncertainty can be partitioned into data (or aleatoric) uncertainty and knowledge (or epistemic) uncertainty. The former is an inherent attribute of the data, whereas the latter relates to mismatches between the training and test distributions. Again, applying uncertainty to sequence data tasks yields interesting challenges and opportunities.
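One standard way to make this partition precise, widely used in the ensemble-uncertainty literature (the talk may use a different formulation), treats the model parameters $\theta$ as uncertain and decomposes the entropy of the averaged prediction:

$$\underbrace{\mathcal{H}\!\Big[\mathbb{E}_{p(\theta)}\,P(y \mid x, \theta)\Big]}_{\text{total uncertainty}} \;=\; \underbrace{\mathbb{E}_{p(\theta)}\,\mathcal{H}\big[P(y \mid x, \theta)\big]}_{\text{data (aleatoric)}} \;+\; \underbrace{\mathcal{I}\big[y;\,\theta \mid x\big]}_{\text{knowledge (epistemic)}}$$

Here $\mathcal{H}$ is entropy and $\mathcal{I}$ is mutual information: ensemble members that each make confident but mutually inconsistent predictions yield high mutual information, the signature of a mismatch between training and test distributions.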

Masashi Unoki  



Masashi Unoki received his M.S. and Ph.D. in Information Science from the Japan Advanced Institute of Science and Technology (JAIST) in 1996 and 1999, respectively. His main research interests are auditorily motivated signal processing and the modeling of auditory systems. He was a Japan Society for the Promotion of Science (JSPS) research fellow from 1998 to 2001, a visiting researcher at the ATR Human Information Processing Laboratories from 1999 to 2000, and a visiting research associate at the Centre for the Neural Basis of Hearing (CNBH) in the Department of Physiology at the University of Cambridge from 2000 to 2001. He has been on the faculty of the School of Information Science at JAIST since 2001, where he is now a full professor and Dean of the School of Information Science. Dr. Unoki received the Sato Prize from the Acoustical Society of Japan (ASJ) in 1999, 2010, and 2013 for outstanding papers, and a Best Paper Award from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2017. He is currently an associate editor of Applied Acoustics.


Title: How the temporal amplitude envelope of speech contributes to nonlinguistic information


Abstract: Speech conveys nonlinguistic and paralinguistic information as well as linguistic information. Our studies on noise-vocoded speech (NVS) have shown that temporal modulation cues provided by the temporal amplitude envelope (TAE) affect how vocal emotion and speaker individuality are perceived, and that they also affect the perception of urgency. In this talk, we show that temporal modulation cues in the TAE play an important role in the perception of nonlinguistic and paralinguistic information such as emotion and urgency.
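As background, the TAE is commonly extracted by taking the magnitude of the analytic signal (the Hilbert envelope) and low-pass filtering it to keep only slow amplitude modulations; noise vocoding then re-imposes this envelope on noise carriers. A minimal sketch of the envelope step (an illustrative recipe with an assumed 64 Hz cutoff, not the authors' exact processing chain):

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def temporal_amplitude_envelope(x, fs, cutoff_hz=64.0):
    """Extract the TAE: magnitude of the analytic signal (Hilbert envelope),
    then low-pass filter to retain only slow amplitude modulations."""
    envelope = np.abs(hilbert(x))
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, envelope)

# Toy usage: a 1 kHz tone with 4 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
tae = temporal_amplitude_envelope(x, fs)  # recovers the 4 Hz modulation
```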


Panel Moderator


Jia Jia (贾珈)



Jia Jia is an associate professor and Ph.D. supervisor in the Department of Computer Science and Technology at Tsinghua University, and a Young Scholar of the Ministry of Education's Changjiang Scholars Program. She received her B.S. in 2003 and her Ph.D. in 2008, both from the Department of Computer Science and Technology at Tsinghua University. Her main research interests include affective computing and human-computer speech interaction. She is a senior member of CCF and a member of IEEE, ACM, and the International Speech Communication Association (ISCA). She serves as Secretary-General of the Speech Technical Committee of the Chinese Information Processing Society of China, a member of that society's Young Scientists Committee, a communication member of the CCF Young Computer Scientists and Engineers Committee, a member of the standing committee of the National Conference on Man-Machine Speech Communication (NCMMSC), a member of the Multimedia Technical Committee of the China Society of Image and Graphics, and deputy head of the Speech Interaction Working Group of the User Interface Subcommittee of the National Information Technology Standardization Committee. She received the 2012 ACM Multimedia Grand Challenge Prize and the Ministry of Education Science and Technology Progress Award in 2009 and again in 2016 (as first contributor). She has published more than 60 papers in leading journals and conferences in the field, including IEEE Transactions on Affective Computing, IEEE Transactions on Audio, Speech and Language Processing, IEEE Transactions on Multimedia, ACM Multimedia, AAAI, and IJCAI, and collaborates closely with companies in the field at home and abroad, including Tencent, SOGOU, Huawei, Siemens, MSRA, and BOSCH.


Panelists


Xiaodong He (何晓冬)



Xiaodong He is an IEEE Fellow, Vice President of Technology at JD.com, Deputy Managing Director of JD AI Research, and head of its Deep Learning, Speech and Language Lab. He also serves as an adjunct professor at the Chinese University of Hong Kong (Shenzhen), the University of Washington (Seattle), and Tongji University (Shanghai), and as an honorary professor at the Central Academy of Fine Arts (Beijing). Before joining JD.com, he was Principal Researcher and head of the Deep Learning Technology Center at Microsoft Research, Redmond. His research focuses on artificial intelligence, including deep learning, natural language processing, speech recognition, computer vision, information retrieval, and multimodal intelligence. He has published more than 100 papers with over 10,000 citations on Google Scholar. His work includes the Deep Structured Semantic Model (DSSM), the Hierarchical Attention Network (HAN), and AttnGAN, which are widely applied in language, vision, information retrieval, and knowledge representation tasks. He was elevated to IEEE Fellow in 2019. He received his B.S. from Tsinghua University (Beijing) in 1996, his M.S. from the Chinese Academy of Sciences (Beijing) in 1999, and his Ph.D. from the University of Missouri-Columbia in 2003.

Jianhua Tao (陶建华)



Jianhua Tao is Deputy Director of the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences, and a recipient of the National Science Fund for Distinguished Young Scholars. He received his B.S. in 1993 and M.S. in 1996 from the Department of Electronic Engineering at Nanjing University, and his Ph.D. in 2001 from the Department of Computer Science at Tsinghua University. He currently serves as a Steering Committee member of IEEE Transactions on Affective Computing, Vice Chair of ISCA SIG-CSLP, an executive board member of the HUMAINE Association, an executive council member of CCF, and a council member of the Chinese Association for Artificial Intelligence, the Chinese Information Processing Society of China, and the Acoustical Society of China. He has led or participated in more than 20 national projects (863 key program, National Natural Science Foundation of China, National Development and Reform Commission, and Ministry of Science and Technology international cooperation programs).

Jingdong Chen (陈景东)



Jingdong Chen is a professor at Northwestern Polytechnical University and formerly Chief Scientist of WeVoice Inc. (New Jersey, USA). His work focuses on speech analysis and synthesis. He has published more than ninety papers in leading international journals and conferences in signal processing. He received the IEEE Signal Processing Society Best Paper Award in 2009, the Bell Labs Role Model Team Award in 2007 and 2009, and the NASA Tech Brief Award in 2009 and 2010. He is currently an associate editor of IEEE Transactions on Audio, Speech, and Language Processing and a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society. He previously served on the editorial board of The Open Signal Processing Journal, has served as a review expert for science foundations in the United States, Singapore, and elsewhere, has helped organize several international conferences, and has long been a reviewer for dozens of well-known journals.

Fang Zheng (郑方)



Fang Zheng is Director of the Center for Speech and Language Technologies at Tsinghua University, a research professor and Ph.D. supervisor, and Deputy Director of the Research Institute of Information Technology at Tsinghua University. He currently serves as Chairman of the Chinese Corpus Consortium, head of the Voiceprint Recognition task group of the Chinese Speech Interaction Technology Standards Working Group, Chair of the standing committee of the National Conference on Man-Machine Speech Communication (NCMMSC), a member of the CCF Technical Committee on Artificial Intelligence and Pattern Recognition, an editorial board member of the Journal of Chinese Information Processing and of Speech Communication, an IEEE Senior Member, a CCF Senior Member, a core member of Oriental COCOSDA, a member of ISCA and APSIPA, and a council member of the Chinese Information Processing Society of China and the Acoustical Society of China.

Kai Yu (俞凯)



Kai Yu is a professor in the Department of Computer Science at Shanghai Jiao Tong University and Chief Scientist of AISpeech (思必驰). He received his B.S. and M.S. from Tsinghua University and his Ph.D. from the University of Cambridge. He has long been engaged in research and industrialization of conversational AI and speech and language processing. He has been selected for national talent programs, the NSFC Excellent Young Scholars program, and Shanghai's "Oriental Scholar" distinguished professorship, and served on the IEEE Speech and Language Processing Technical Committee (2017-2019). He is head of the Academic and Intellectual Property Group of the China AI Industry Development Alliance and deputy director of the CCF Technical Committee on Speech Dialogue and Hearing. He has published more than 170 papers and received best paper awards from international journals and conferences including Computer Speech and Language, Speech Communication, and InterSpeech, as well as first place in several international research evaluations. His honors include the Wu Wenjun AI Science and Technology Progress Award, Scientific Chinese Person of the Year, and the CCF Qingzhu Award.




CNCC2021 will be held in Shenzhen on October 28-30, under this year's theme "Computing Empowerment Accelerates Digital Transformation." CNCC is the annual gathering of academia, industry, and education in the computing field, taking a broad look at technology trends, and attendance this year is expected to reach ten thousand. Each year the invited speakers include academicians, Turing Award laureates, scholars from renowned universities at home and abroad, leaders of top companies, and highly influential experts across fields; this distinguished lineup underscores CNCC's top-tier professional standard and industry influence.


This year's invited guests include ACM Turing Award laureates Professor John Hopcroft and Professor Barbara Liskov; Professor Yolanda Gil of the University of Southern California's Computer Science Department and Information Sciences Institute; academicians Chen Weijiang, Feng Dengguo, Guo Guangcan, Sun Ninghui, and Wang Huaimin; and many other highly influential experts. With as many as 111 technical forums, this year's program sets a record in number, quality, and coverage, offering attendees a comprehensive experience spanning academia, technology, industry, education, and science popularization. For the first time, the conference will also host a large "Members' Night" themed event where attendees can mingle freely.


CNCC2021 will bring together top professional talent and expert resources from home and abroad to present a grand professional feast for more than ten thousand attendees. Don't miss it; we look forward to seeing you. Registration is now open!



Register for CNCC2021