
MM-VID

MM-VID: Video Understanding with GPT-4V(ision). Leverages multimodal large language models for comprehensive video analysis.

GPT-4V, Video Understanding, Multimodal

An early exploration of using GPT-4V's vision capabilities for end-to-end video understanding tasks.