Session 5: Multimodal Embodied Intelligence | CCF-TF International Symposium on Intelligent Media Computing
The CCF-TF International Symposium on Intelligent Media Computing is a series of academic events co-hosted by the China Computer Federation (CCF) and the Temasek Foundation (TF), and jointly organized by the CCF Multimedia Technical Committee and Nanyang Technological University, Singapore. The symposium comprises seven thematic sessions held online once a month, each featuring three invited talks in which well-known researchers from China, Singapore, the USA, and other countries present frontier advances in artificial intelligence. Topics include future intelligent media, robotics, multimedia analysis and retrieval, media coding and transmission, intelligent media and health, artificial intelligence in healthcare, and FinTech. The fifth session is devoted to Multimodal Embodied Intelligence.
Schedule
| Time | Speaker | Topic | Host |
| 14:00-15:00 | Dr. Li Yi, Tsinghua University, China | Empowering Generalizable Human-Robot Interaction via Human Simulation | Prof. Shuqiang Jiang, Institute of Computing Technology, CAS, China |
| 15:00-16:00 | Dr. Wenguan Wang, Zhejiang University, China | Social-Cognitive Embodied AI | |
| 16:00-17:00 | Prof. Si Liu, Beihang University, China | Research on Key Technologies of Multimodal Embodied Navigation | |
All times are China Standard Time (CST, UTC+8), Thursday, January 25, 2024.
Invited Speakers
Dr. Li Yi
Li Yi is a tenure-track assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He received his Ph.D. from Stanford University, advised by Professor Leonidas J. Guibas, and was previously a Research Scientist at Google. Before joining Stanford, he received his B.E. in Electronic Engineering from Tsinghua University. His recent research focuses on 3D computer vision, embodied perception, and human-robot interaction, and his mission is to equip robotic agents with the ability to understand and interact with the 3D world. He has published papers at top-tier computer vision, computer graphics, and machine learning conferences, with more than 18,000 citations, and has served as an Area Chair for CVPR, IJCAI, and NeurIPS. His representative works include ShapeNet-Part, SyncSpecCNN, and PointNet++.
Title: Empowering Generalizable Human-Robot Interaction via Human Simulation
Abstract: The embodied AI research community strives to enable robots to interact and collaborate with humans. However, while significant progress has been made in teaching robots human-free manipulation skills, scalable learning of human-robot interaction skills that generalize across tasks and behaviors has lagged behind. Real-world training of robots for human interaction is costly and risky, making it impractical at scale. There is therefore a need to simulate human behaviors and train robots in virtual environments before deploying them in the real world. In this talk, we will discuss our recent efforts in curating large-scale human interaction datasets, synthesizing realistic human behaviors that generalize to new environments and tasks, and utilizing scalable human simulation for generalizable human-robot interaction. By simulating human interactions in diverse scenes, we create human-centric robot simulators. By employing dynamic task and motion planning to generate high-quality demonstrations, we can train transferable human-robot interaction skills. We believe this approach presents a powerful paradigm for advancing real-world human-robot interaction.
Dr. Wenguan Wang
Dr. Wenguan Wang is a ZJU100 Young Professor at Zhejiang University, China, and a recipient of the National Outstanding Youth Program (Overseas). His research interests span Machine Perception, Human-Centered AI, and Embodied AI. He has published over 80 papers in prestigious journals and conferences such as TPAMI, IJCV, ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, AAAI, and SIGGRAPH Asia, including one CVPR Best Paper Finalist, one CVPR Workshop Best Paper, and 18 top-conference Spotlight/Oral papers. He has more than 15,000 citations on Google Scholar with an h-index of 66. He serves as an Associate Editor for Information Fusion, TCSVT, and Neurocomputing, and has won awards in 15 international academic competitions. He was named a Clarivate Highly Cited Researcher (2023), received the Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) (2021), was listed among the Top 2% Scientists Worldwide by Stanford University (2022, 2023) and Elsevier Highly Cited Chinese Researchers (2020, 2021, 2022), and received the World Artificial Intelligence Conference Youth Outstanding Paper Award (2020), the China Association of Artificial Intelligence Doctoral Dissertation Award (2019), and the ACM China Doctoral Dissertation Award (2018).
Title: Social-Cognitive Embodied AI
Abstract: Modern AI technologies, represented by large-scale models, have become the core driving force of the next industrial revolution. They will greatly promote the social application of intelligent machines and open up vast space for research on embodied intelligence, which emphasizes the interaction between AI agents and the physical world. However, current neural-network-based embodied AI systems, despite their strong learning capability, face challenges such as a lack of knowledge reasoning, difficulty in interacting with humans, and an inability to explain their decision-making. Against the backdrop of these changes in AI and the era of large models, this talk will introduce our recent research toward social-cognitive embodied AI, based on a data-guided and knowledge-driven framework. It will discuss how to integrate human symbolic knowledge and large AI models into embodied intelligent systems to enhance their perception, planning, reasoning, and interaction capabilities.
Prof. Si Liu
Si Liu is currently a professor at Beihang University. She received her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, and was a research assistant and postdoctoral researcher at the National University of Singapore. Her research interests include computer vision, multimedia analysis, and embodied AI. She has published over 80 cutting-edge papers with more than 12,000 citations. She received the Best Paper Award at ACM MM 2021 and 2013 and the Best Demo Award at ACM MM 2012. She organized the "Person in Context" Challenges at ECCV 2018, ICCV 2019, CVPR 2021, and ACM MM 2022, and has served multiple times as an Area Chair for top conferences such as ICCV, CVPR, ECCV, ACM MM, and NeurIPS. She is an Associate Editor of IEEE TMM and TCSVT.
Title: Research on Key Technologies of Multimodal Embodied Navigation
Abstract: Embodied AI studies how to enable agents to learn intelligence in the real world. By autonomously perceiving, reasoning, and making decisions, agents interact with their surrounding environment and thereby accomplish a variety of complex tasks. Multimodal embodied navigation is key to equipping embodied agents with basic movement and decision-making capabilities, and therefore holds important research value. The talk will begin with a brief introduction to embodied AI and the multimodal navigation task, and then present a series of works on multimodal navigation involving hierarchical planning, target prediction, knowledge-based reasoning, and sound-based navigation. She will also share the ideas and insights behind these works.
Organizers
Secretary
Participation
Tencent Meeting
https://meeting.tencent.com/dm/1q72ppy3vNJ8
Contact Us
Hosted by: China Computer Federation (CCF), Temasek Foundation (TF)
Organized by: CCF Multimedia Technical Committee, Nanyang Technological University