TF58: Vision Foundation Models, Research and Applications - China Computer Federation (CCF)

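Session Chairs

Zhongjun He (何中军)

Chair of the CCF TF Algorithms and AI SIG; Chair of the Baidu AI Technical Committee

Bio: He has long worked on machine translation research and development, and developed the world's first neural machine translation system deployed on the internet as well as a simultaneous machine interpretation system driven by semantic units. His awards include the Second Prize of the National Science and Technology Progress Award, the First Prize for Science and Technology Progress from the Chinese Institute of Electronics, the First Prize of the Beijing Science and Technology Progress Award, and the China Patent Silver Award.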

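Yitao Duan (段亦涛)

Chief Scientist, NetEase Youdao

Bio: He received his bachelor's and master's degrees from Beihang University and his Ph.D. in Computer Science from UC Berkeley in 2007. His research interests include large-scale distributed computing, data mining, machine learning, cryptography, and security and privacy. He joined Youdao during his Ph.D. studies and helped build its underlying architecture. He is currently Chief Scientist at NetEase Youdao, responsible for technical innovation and its practical application. He focuses on applying the latest AI techniques, deep learning in particular, across internet products, including machine translation and image recognition, and led the research and development of core technologies such as Youdao Neural Machine Translation (YNMT).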

Invited Speakers

Jingdong Wang (王井东)

Chief Scientist for Computer Vision, Baidu

Topic: "Context Autoencoder for Scalable Self-Supervised Representation Pretraining"

Abstract: Self-supervised representation pretraining aims to learn an encoder from unlabeled images, such that the encoded representations take on semantics and benefit downstream tasks. In this talk, I present a novel masked image modeling approach, the context autoencoder (CAE), for scalable self-supervised representation pretraining. The core ideas are that predictions are made in the latent representation space, from visible patches to masked patches, and that the encoder is dedicated to representation learning while representation learning is carried out only by the encoder. I also discuss why masked image modeling potentially outperforms contrastive pretraining (e.g., SimCLR, MoCo) and why contrastive learning performs on par with supervised pretraining on ImageNet. In addition, I show that linear probing and its extended version, attentive probing, are more suitable than fine-tuning on ImageNet for evaluating pretraining.
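As a rough illustration of the visible/masked split that masked image modeling relies on, the sketch below randomly partitions patch indices into the two sets. It is not the CAE implementation: the function name `split_patches`, the 14x14 patch grid, and the 0.5 mask ratio are all assumptions chosen for the example.

```python
import random


def split_patches(num_patches, mask_ratio, seed=0):
    """Randomly partition patch indices into visible and masked sets.

    Hypothetical helper for illustration only; not taken from the CAE code.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
num_patches = 14 * 14
visible, masked = split_patches(num_patches, mask_ratio=0.5)

# In a masked image modeling setup, only the visible patches would be fed to
# the encoder; representations for the masked positions are then predicted
# from the encoded visible ones.
print(len(visible), len(masked))
```

Under this reading of the abstract, the encoder never sees the masked patches, which is one way to keep the encoder's job limited to representing what is actually visible.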

Bio: Jingdong Wang is a Chief Scientist for computer vision at Baidu. His team focuses on product-driven, cutting-edge research in computer vision, deep learning, and AI, and on developing practical computer vision applications. Before joining Baidu, he was a Senior Principal Researcher at Microsoft Research Asia. His areas of interest are computer vision, deep learning, and multimedia search. His representative works include the deep high-resolution network (HRNet), discriminative regional feature integration (DRFI) for supervised saliency detection, and neighborhood graph search (NGS, SPTAG) for large-scale similarity search. He has served as an Associate Editor of IEEE TPAMI, IJCV, IEEE TMM, and IEEE TCSVT, and as an area chair of leading conferences in vision, multimedia, and AI, such as CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. He was elected an ACM Distinguished Member, a Fellow of IAPR, and a Fellow of IEEE for his contributions to visual content understanding and retrieval.

Bin Xiao (肖斌)

Senior Researcher, Computer Vision Research Group, Microsoft Cloud & AI

Topic: "Florence: A New Foundation Model for Computer Vision"

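Abstract: Computer vision foundation models, which are trained on large-scale multimodal datasets and can be adapted to a wide range of downstream tasks by fine-tuning on a small amount of data, are critical for real-world computer vision applications. At the end of 2021, Microsoft released the Florence foundation model. Trained on large-scale image-text data from the Web, it adapts easily to a variety of computer vision tasks, including classification, retrieval, object detection, visual question answering (VQA), image captioning, video retrieval, and action recognition. At release, it set new state-of-the-art results on most of 44 representative benchmarks, for example a top-1 accuracy of 85.7 on zero-shot ImageNet-1K classification, 90.45 top-1 accuracy on ImageNet-1K after fine-tuning, 62.4 mAP on COCO fine-tuning, and 80.36 accuracy on VQA.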

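Bio: He is a Senior Researcher in the Computer Vision Research Group at Microsoft Cloud & AI. His main research interests include computer vision, large-scale data/language multimodal model training, object detection and segmentation, and human pose estimation. He has published more than 20 papers at top conferences such as CVPR, ECCV, ICCV, ICLR, and AAAI. Many of his research results have been open-sourced and applied in products such as Microsoft Azure.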

Yin Cui (崔崟)

Senior Research Scientist at Google

Topic: "Label-Efficient Visual Perception via Multimodal Supervision and Distillation"

Abstract: In this talk, I will focus on two of our recent works (VATT and ViLD) towards building label-efficient computer vision models. In VATT, we learn multimodal representations from unlabeled raw video, audio, and text using a unified Transformer encoder. In ViLD, we distill from pre-trained vision-language models such as CLIP to enable strong open-vocabulary detection using an off-the-shelf Mask R-CNN.

Bio: Yin Cui is a Senior Research Scientist at Google. Yin's research focuses on multimodal and label-efficient visual perception. Before joining Google, he received a Ph.D. in Computer Science from Cornell University in 2019, advised by Professor Serge Belongie. Yin also co-organized COCO Visual Recognition Workshops and Fine-Grained Visual Categorization Workshops at major computer vision conferences.

Lei Zhang (张磊)

Chair Scientist of Computer Vision and Robotics at IDEA

Bio: Lei Zhang is currently a Chair Scientist of Computer Vision and Robotics at the International Digital Economy Academy (IDEA) and an Adjunct Professor at the Hong Kong University of Science and Technology (Guangzhou). Prior to this, he was a Principal Researcher and Research Manager at Microsoft, where he had worked since 2001 in Microsoft Research Asia (MSRA), Microsoft Research (MSR, Redmond), and other computer vision-related product teams. He has led research teams for years, conducting research on computer vision with applications in large-scale image analysis, object detection, and vision-language understanding. His research has had practical impact in Bing Multimedia Search and Microsoft Cognitive Services. He has published more than 150 papers in top conferences and journals and holds more than 60 US-granted patents. He was named an IEEE Fellow for his contributions to large-scale visual recognition and multimedia information retrieval.

Zhuowen Tu (屠卓文)

Professor of Computer Science and Engineering, University of California San Diego

Bio: Zhuowen Tu is a full professor of Cognitive Science, also affiliated with the Department of Computer Science and Engineering, at the University of California San Diego. Before joining UCSD in 2013 as an assistant professor, he was a faculty member at UCLA. Between 2011 and 2013, he took a leave to work at Microsoft Research Asia. He received his Ph.D. from the Ohio State University and his M.E. from Tsinghua University. He received the David Marr Prize in 2003 and a David Marr Prize Honorable Mention in 2015. He is a Fellow of the IEEE.