Multi-modal input
The capability of an AI system to interpret and process multiple data types (such as text, images, audio, and video) within a single framework or workflow.
Key Components
- Natural Language Processing (Textual data)
- Computer Vision (Visual data)
- Audio Processing (Aural data)
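One way to picture the "unified framework" idea is a single entry point that routes each input to the component for its modality. The sketch below is illustrative only: `ModalInput`, `HANDLERS`, `process`, and the three handler functions are hypothetical names, and the handlers are placeholders standing in for real NLP, computer vision, and audio models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModalInput:
    modality: str   # "text", "image", or "audio" (assumed labels)
    payload: bytes  # raw content; decoding is left to the handler


# Hypothetical placeholder handlers. A real system would call an NLP,
# computer-vision, or audio-processing model here instead.
def handle_text(payload: bytes) -> str:
    return f"text:{len(payload.decode('utf-8').split())} words"


def handle_image(payload: bytes) -> str:
    return f"image:{len(payload)} bytes"


def handle_audio(payload: bytes) -> str:
    return f"audio:{len(payload)} bytes"


# Registry mapping each modality to its component.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "text": handle_text,
    "image": handle_image,
    "audio": handle_audio,
}


def process(inputs: List[ModalInput]) -> List[str]:
    """Route each input to the handler registered for its modality."""
    results = []
    for item in inputs:
        handler = HANDLERS.get(item.modality)
        if handler is None:
            raise ValueError(f"unsupported modality: {item.modality}")
        results.append(handler(item.payload))
    return results
```

Keeping the handlers in a registry means a further modality (for example video) could be added by registering one more function, without changing `process`.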
Related Implementations
- 2026 04 14 Kombai for Design of Front ends