Multi-modal input

The capability of an AI system to interpret and process multiple data types, such as text, images, audio, and video, within a single unified framework or workflow.
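
As a rough illustration of this definition, the sketch below shows one way a single entry point might accept inputs of different modalities and route each to modality-specific handling. All class and function names here (TextInput, ImageInput, AudioInput, describe, process_request) are hypothetical and chosen for illustration only. Video is omitted for brevity, and the "processing" is a placeholder summary rather than real model inference.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextInput:
    content: str


@dataclass
class ImageInput:
    pixels: bytes
    width: int
    height: int


@dataclass
class AudioInput:
    samples: bytes
    sample_rate_hz: int


# Hypothetical union of the modalities this toy system accepts.
ModalInput = Union[TextInput, ImageInput, AudioInput]


def describe(item: ModalInput) -> str:
    """Route one input to modality-specific handling and return a summary."""
    if isinstance(item, TextInput):
        return f"text:{len(item.content)} chars"
    if isinstance(item, ImageInput):
        return f"image:{item.width}x{item.height}"
    if isinstance(item, AudioInput):
        return f"audio:{item.sample_rate_hz} Hz"
    raise TypeError(f"unsupported modality: {type(item).__name__}")


def process_request(items: List[ModalInput]) -> List[str]:
    """Handle a mixed-modality request through one unified entry point."""
    return [describe(item) for item in items]
```

A caller could pass a mixed list such as `[TextInput("hello"), ImageInput(b"", 2, 3)]` to `process_request` and receive one summary per item. In a real system, each branch would typically hand off to a modality-specific encoder instead of returning a string.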

Key Components