AI Dictionary, Part 0: AI Tasks

Multimodal

Models that process or generate multiple modalities at once

Multimodal tasks work with multiple types of data at the same time.

Examples: text + images, text + audio, text + video.

Common Multimodal Tasks

  • Image-Text-to-Text: Generate text from a combination of images and text prompts
  • Visual Question Answering: Answer questions about images
  • Document Question Answering: Answer questions from documents or PDFs
  • Audio-to-Text: Convert audio into coherent text outputs, such as transcripts
  • Video-to-Text: Generate text based on video content
  • Visual Document Retrieval: Retrieve documents or visuals based on multimodal queries
  • Any-to-Any: General multimodal conversion between arbitrary input and output types
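
One way to picture the list above is as a small lookup table from input modalities to tasks. The sketch below is illustrative only: the `Task` class, the `TASKS` table, the `tasks_for` helper, and the modality sets are assumptions inferred from the task names, not part of any library. Visual Document Retrieval and Any-to-Any are left out because their inputs and outputs are not fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    inputs: frozenset  # modalities the task consumes
    output: str        # modality the task produces


# Hypothetical table: the modality sets are assumptions read off the task names.
TASKS = [
    Task("Image-Text-to-Text", frozenset({"image", "text"}), "text"),
    Task("Visual Question Answering", frozenset({"image", "text"}), "text"),
    Task("Document Question Answering", frozenset({"document", "text"}), "text"),
    Task("Audio-to-Text", frozenset({"audio"}), "text"),
    Task("Video-to-Text", frozenset({"video"}), "text"),
]


def tasks_for(available):
    """Return the names of tasks whose required inputs are all available."""
    have = frozenset(available)
    return [task.name for task in TASKS if task.inputs <= have]


if __name__ == "__main__":
    print(tasks_for({"image", "text"}))
    print(tasks_for({"audio"}))
```

With an image and a text prompt available, the helper returns both Image-Text-to-Text and Visual Question Answering, which reflects how closely those two tasks overlap.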
