Demo · Track B

Multimodal Agents Demo (Text, Voice, Video)

Day 1
15:45-17:00
Track B

About This Session

A demonstration of multimodal AI agents that process and generate text, speech, images, and video, enabling interactions beyond text-only interfaces.

The future of AI agents goes beyond text. This demo showcases agents that work across multiple modalities: understanding spoken commands, analyzing images and video, generating visual content, and communicating through voice.

See practical applications including video analysis agents, voice-interactive assistants, visual content generation, and multimodal reasoning. Learn about the technical challenges of multimodal AI and emerging patterns for building agents that leverage multiple input and output channels.

Speakers

Speaker TBA

Role to be confirmed

Speaker details will be announced closer to the conference date.

Learning Objectives

  • Understand multimodal AI capabilities and limitations
  • See practical applications of multimodal agents
  • Learn integration patterns for vision and speech
  • Explore emerging multimodal agent architectures

Who Should Attend

  • AI Engineers
  • Product Designers
  • Innovation Leaders
  • Technical Architects

Prerequisites

  • Basic understanding of AI agents

Don't Miss This Session

Register now to secure your spot at Agentica 2026 and get access to this session and all conference content.