Tightly Connecting Vision and Language

Remarkable progress has been made at the intersection of vision and language. While showing great promise, current vision and language models may only weakly “connect” the two modalities and often fail in the wild.

In this talk, Google’s Soravit Changpinyo will present recent efforts to bridge this gap along two dimensions: informativeness and controllability.

In particular, he will describe how large-scale datasets, including the recently released CC12M and Localized Narratives, can benefit existing vision-and-language tasks and enable new applications.

Frank

#DataScientist, #DataEngineer, Blogger, Vlogger, Podcaster at http://DataDriven.tv. Back @Microsoft to help customers leverage #AI. #武當派 fan. I blog to help you become a better data scientist/ML engineer. Opinions are mine. All mine.