Voice agents powered by LLMs look amazing in demos… until you try to build one yourself and the problems start: slow responses, overlapping audio, and a bot that either hallucinates non-stop or falls into eternal silence. In this talk, I’ll share how I built an “English Speaking Buddy” voice agent: an English tutor you can talk to that adapts to your level and helps you lose the fear of speaking. It’s built with Python, AWS services (for speech and LLM), and Pipecat, an open-source Python framework for creating real-time conversational voice agents.

We’ll walk through the core pipeline (audio → text → LLM → audio) and how to build a clear, maintainable architecture for real-time voice agents. We’ll also look at the less glamorous but critical parts that make this actually work: handling errors, configuring and protecting API keys, logging what’s going on, and making good design decisions so the conversation with the bot feels natural.

Even though the examples use AWS and Pipecat, the patterns we’ll discuss apply to other providers and libraries. You’ll leave with a clear mental model of how these agents work, concrete Python code examples you can adapt to your own projects, and a small checklist for turning a demo into something people can actually talk to.
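To make the mental model concrete before the talk, here is a minimal stdlib-only sketch of one conversational turn through the audio → text → LLM → audio pipeline. The stage names and stub bodies are hypothetical placeholders, not the talk's actual code; in the real agent each stage would be a Pipecat processor backed by an AWS speech or LLM service.

```python
import asyncio

# Sketch of the audio -> text -> LLM -> audio loop. Each stage is a stub
# (names hypothetical); a real agent would call STT, LLM, and TTS services.

async def speech_to_text(audio_chunk: bytes) -> str:
    # Stand-in for a real speech-to-text service call.
    return audio_chunk.decode("utf-8")

async def generate_reply(user_text: str) -> str:
    # Stand-in for a real LLM call that produces the tutor's response.
    return f"You said: {user_text}. Let's practice that!"

async def text_to_speech(reply_text: str) -> bytes:
    # Stand-in for a real text-to-speech service call.
    return reply_text.encode("utf-8")

async def handle_turn(audio_chunk: bytes) -> bytes:
    """One conversational turn: audio in, audio out."""
    text = await speech_to_text(audio_chunk)
    reply = await generate_reply(text)
    return await text_to_speech(reply)

if __name__ == "__main__":
    audio_out = asyncio.run(handle_turn(b"hello"))
    print(audio_out.decode("utf-8"))
```

The async structure matters even in this toy version: real STT, LLM, and TTS calls are all network-bound, and keeping each stage awaitable is what lets a framework like Pipecat stream partial results between them instead of blocking the whole turn.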
Talks
AI
How to Build Your First Real-Time Voice Agent in Python (Without Losing Your Mind)
Friday, May 15th, 2026 5:15 p.m.–5:45 p.m. in Grand Ballroom A