Project 1

Visual Intelligence

RP1-5

Universal Representations for Exchanging and Integrating Visual Knowledge

Recent developments in deep learning have had a great impact across various application domains. Model-agnostic approaches have some clear advantages: domain knowledge is no longer necessary for data processing, yet good performance can often be achieved thanks to a holistic view of the data. However, there is plenty of side information and domain knowledge that can hardly be utilized in current DNN solutions. The relatively low efficiency of data use makes the sample complexity and computational complexity quickly grow to, and beyond, the capacity of supercomputers.

We envision a next generation of data processing infrastructure that fundamentally solves these problems by focusing on processing knowledge instead of raw data. Knowledge represented in a compact form should typically have orders of magnitude lower dimensionality than the raw data, and thus can be stored, exchanged, and managed efficiently.

We believe that understanding, exchanging, and integrating such knowledge is the key step towards building scalable, multi-purpose, and secure data infrastructure of the next generation. We aim to develop theories and algorithms for a unified knowledge representation that 1) allows knowledge learned from one data source to be used for other inference tasks and facilitates knowledge exchange between multiple tasks, 2) allows interpretation of the knowledge learned from data, 3) supports multi-domain learning by jointly processing multiple data sources, 4) can incorporate domain knowledge, and 5) tolerates noise and quantization errors in storage and communication and thus allows scalable system implementations.

Towards learning this unified knowledge representation, we conduct research in four directions: understanding 3D point cloud data, estimating 3D object pose from 2D images, localizing actions in videos, and modeling noise-free signals from noisy raw signals.