This is a list of public datasets that contain multiple modalities.

Images+text

Audio+text

Images+depth

Audio+video+text

Big data