Multimodal inputs to an agent
Let’s create an agent that can understand images and make tool calls as neededImage Agent
image_agent.py
Audio Agent
audio_agent.py
Video Agent
Currently Agno only supports video as an input for Gemini models.
video_agent.py
Multimodal outputs from an agent
Similar to providing multimodal inputs, you can also get multimodal outputs from an agent.Image Generation
The following example demonstrates how to generate an image using DALL-E with an agent.image_agent.py
Audio Response
The following example demonstrates how to obtain both text and audio responses from an agent. The agent will respond with text and audio bytes that can be saved to a file.audio_agent.py
Multimodal inputs and outputs together
You can create Agents that can take multimodal inputs and return multimodal outputs. The following example demonstrates how to provide a combination of audio and text inputs to an agent and obtain both text and audio outputs.Audio input and Audio output
audio_agent.py