    AI Strategy

    Build AI Voice Agents With ElevenLabs

    Hatty AI
    March 7, 2026
    18 min read

    Build production-ready AI voice agents using ElevenLabs — WebRTC setup, voice cloning, prompt engineering, and deployment patterns.

    Why Voice Agents Are the Next Interface

    Text-based chatbots had their moment. But in 2026, the businesses winning customer engagement are the ones letting users talk to their AI — literally. Voice agents combine the power of large language models with natural-sounding speech synthesis to create experiences that feel like speaking with a real person.

    ElevenLabs has emerged as a leading platform for building these agents. Its Conversational AI platform handles the hard parts (low-latency speech synthesis, WebRTC audio transport, voice activity detection, and turn-taking) so developers can focus on business logic.

    This guide walks you through building a production-grade AI voice agent from scratch: architecture decisions, implementation patterns, prompt engineering for voice, and deployment best practices.

    🎯 What You'll Build

    • A real-time voice agent with WebRTC audio streaming
    • Server-side token authentication (no exposed API keys)
    • Client-side tool execution (book appointments, look up data)
    • Custom voice selection and personality tuning

    Architecture Overview

    A voice agent system has three layers:

    1. Client (React) — Captures microphone audio, plays agent responses, handles UI state. Uses the @elevenlabs/react SDK.
    2. Auth Server (Edge Function) — Generates short-lived conversation tokens so your API key never touches the browser.
    3. ElevenLabs Platform — Handles speech-to-text, LLM reasoning, text-to-speech, and audio transport via WebRTC.

    Client (mic audio) → WebRTC → ElevenLabs STT → LLM → TTS → WebRTC → Client (speaker)

    The entire round-trip typically takes 500ms–1.2s, making conversations feel natural.
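
    To check the round-trip figure against your own deployment, you could time each hop of the pipeline and compare the sum to the upper bound. The sketch below is only an illustration: the stage labels mirror the diagram above and are not SDK names, and the millisecond values are placeholders, not measurements.

```typescript
// Stage labels mirror the pipeline diagram; they are illustrative, not SDK names.
const STAGES = ["uplink", "stt", "llm", "tts", "downlink"] as const;
type Stage = (typeof STAGES)[number];
type StageTimings = Record<Stage, number>; // milliseconds spent in each stage

// Upper bound taken from the 500ms-1.2s range quoted above.
const MAX_ROUND_TRIP_MS = 1200;

// Sum of all stage durations for one conversational turn.
function totalRoundTripMs(timings: StageTimings): number {
  return STAGES.reduce((sum, stage) => sum + timings[stage], 0);
}

// The stage contributing the most latency, i.e. where to look first.
function slowestStage(timings: StageTimings): Stage {
  return STAGES.reduce((worst, stage) =>
    timings[stage] > timings[worst] ? stage : worst,
  );
}

function withinBudget(timings: StageTimings): boolean {
  return totalRoundTripMs(timings) <= MAX_ROUND_TRIP_MS;
}

// Placeholder numbers for illustration only; measure your own deployment.
const exampleTurn: StageTimings = {
  uplink: 40,
  stt: 150,
  llm: 350,
  tts: 200,
  downlink: 40,
};
// prints: 780 llm true
console.log(totalRoundTripMs(exampleTurn), slowestStage(exampleTurn), withinBudget(exampleTurn));
```

    Tracking the slowest stage separately is useful because the total alone does not tell you whether to change the model, the voice, or the network path.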

    Step 1: Server-Side Token Generation

    Never expose your ElevenLabs API key in client-side code. Instead, create a server endpoint that generates short-lived conversation tokens:

    The edge function (elevenlabs-conversation-token in this example) performs these steps:

    1. Receive the agentId from the client request
    2. Call ElevenLabs' token endpoint with your server-side API key
    3. Return the short-lived token to the client
    4. Optionally fetch a WebSocket signed URL as fallback

    This pattern is critical for production deployments. The token expires quickly, limiting exposure even if intercepted.
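
    Steps 1 to 3 above might look roughly like the following. This is a sketch under assumptions, not a drop-in implementation: the endpoint path, the agent_id query parameter, the xi-api-key header, and the { token } response shape reflect my understanding of the ElevenLabs API and should be checked against the current API reference. The fetch function is injected so the logic can be exercised with a stub, and the helper names are hypothetical. The optional signed-URL fallback from step 4 is left out.

```typescript
// Minimal fetch-like signature so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type TokenRequest = {
  url: string;
  init: { method: string; headers: Record<string, string> };
};

// Assumed base path for the conversation endpoints; confirm in the API reference.
const CONVAI_BASE = "https://api.elevenlabs.io/v1/convai/conversation";

// Steps 1-2: validate the agentId from the client and build the upstream request.
function buildTokenRequest(agentId: string, apiKey: string): TokenRequest {
  if (!/^[A-Za-z0-9_-]+$/.test(agentId)) {
    throw new Error("Invalid agentId");
  }
  return {
    url: `${CONVAI_BASE}/token?agent_id=${encodeURIComponent(agentId)}`,
    // The API key stays on the server; only the token goes back to the browser.
    init: { method: "GET", headers: { "xi-api-key": apiKey } },
  };
}

// Step 3: call ElevenLabs and return only the short-lived token.
async function getConversationToken(
  agentId: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const { url, init } = buildTokenRequest(agentId, apiKey);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Token request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { token?: unknown };
  if (typeof body.token !== "string") {
    throw new Error("Token missing from response");
  }
  return body.token;
}
```

    In an edge function you would pass the platform's global fetch as fetchImpl and read the API key from an environment variable. Validating agentId before forwarding it keeps arbitrary client input out of the upstream URL.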

    Step 2: React Client Implementation

    The useConversation hook from @elevenlabs/react manages the entire WebRTC connection lifecycle:

    • Connection management — Handles WebRTC negotiation, ICE candidates, and reconnection
    • Audio capture — Requests microphone access and streams audio to ElevenLabs
    • Playback — Receives and plays synthesized speech through the browser
    • State tracking — Exposes status, isSpeaking, and volume levels

    The basic flow: request mic permission → fetch token from your server → call startSession() with the token → the user starts talking.
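
    The basic flow can be sketched as a small function with its side effects injected. In a React component, startSession would come from the useConversation hook. The conversationToken and connectionType option names reflect my reading of the SDK and are assumptions to verify. The helper and type names here are hypothetical.

```typescript
// Options for startSession. Field names are assumptions about the
// @elevenlabs/react SDK; check them against the SDK documentation.
type WebRtcSessionOptions = {
  conversationToken: string;
  connectionType: "webrtc";
};

// Everything with side effects is injected, so the flow itself is plain logic.
interface VoiceSessionDeps {
  requestMic: () => Promise<unknown>; // e.g. navigator.mediaDevices.getUserMedia({ audio: true })
  fetchToken: () => Promise<string>; // calls your own token endpoint
  startSession: (options: WebRtcSessionOptions) => Promise<unknown>; // from useConversation()
}

type StartResult =
  | { ok: true }
  | { ok: false; reason: "mic-denied" | "token-failed" | "connect-failed" };

async function startVoiceSession(deps: VoiceSessionDeps): Promise<StartResult> {
  // 1. Ask for the microphone first so the user sees the prompt before anything connects.
  const micGranted = await deps.requestMic().then(() => true, () => false);
  if (!micGranted) {
    return { ok: false, reason: "mic-denied" };
  }

  // 2. Fetch a short-lived token from your server (never the API key itself).
  const token = await deps.fetchToken().catch(() => null);
  if (token === null) {
    return { ok: false, reason: "token-failed" };
  }

  // 3. Open the WebRTC session with the token.
  const connected = await deps
    .startSession({ conversationToken: token, connectionType: "webrtc" })
    .then(() => true, () => false);
  return connected ? { ok: true } : { ok: false, reason: "connect-failed" };
}
```

    Returning a tagged result instead of throwing lets the UI show a specific message for each failure, for example a prompt to re-enable the microphone versus a retry button for a failed connection.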

    Step 3: Client Tools — Making Your Agent Do Things

    Voice agents become powerful when they can take actions, not just talk. ElevenLabs supports "client tools" — functions the agent can invoke during conversation:

    • Book an appointment — Agent collects date/time preferences, calls your scheduling API
    • Look up order status — Agent asks for order number, queries your database
    • Navigate the user — Agent directs to a specific page based on conversation context
    • Submit a lead form — Agent gathers name, email, needs — submits to your CRM

    ⚠️ Important

    Client tools must be configured in the ElevenLabs web UI before they'll work in your code. Define the tool name, description, and parameter schema in the agent settings, then register a handler under the same name in your client code.
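
    As a rough illustration, client tool handlers can be written as plain async functions keyed by tool name. The tool names (bookAppointment, lookUpOrderStatus), parameter shapes, and backend functions below are hypothetical and would need to match whatever you define in the agent settings. As far as I understand the SDK, an object like this is passed as the clientTools option of useConversation, and the returned string is what the agent receives as the tool result.

```typescript
// Hypothetical backend calls, injected so the handlers stay easy to test.
interface ToolBackends {
  createBooking: (date: string, time: string) => Promise<{ confirmationId: string }>;
  findOrderStatus: (orderNumber: string) => Promise<string | null>;
}

// Each handler receives the parameters declared in the agent settings and
// returns a short string the agent can turn into a spoken reply.
function makeClientTools(backends: ToolBackends) {
  return {
    bookAppointment: async (params: { date?: string; time?: string }): Promise<string> => {
      // The LLM may omit parameters, so validate before calling the backend.
      if (!params.date || !params.time) {
        return "Missing date or time. Ask the user for both.";
      }
      const { confirmationId } = await backends.createBooking(params.date, params.time);
      return `Booked for ${params.date} at ${params.time}. Confirmation ${confirmationId}.`;
    },

    lookUpOrderStatus: async (params: { orderNumber?: string }): Promise<string> => {
      if (!params.orderNumber) {
        return "Missing order number. Ask the user for it.";
      }
      const orderStatus = await backends.findOrderStatus(params.orderNumber);
      return orderStatus === null
        ? `No order found for ${params.orderNumber}.`
        : `Order ${params.orderNumber} is ${orderStatus}.`;
    },
  };
}
```

    Returning an instruction such as "Ask the user for both" on bad input, rather than throwing, gives the agent something it can act on in the next turn.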

    Step 4: Choosing and Customizing Voices

    Voice selection is a brand decision, not just a technical one. ElevenLabs offers 30+ pre-built voices and the ability to clone custom voices.

    Pre-Built Voice Selection Guide

    Use Case           | Recommended Voice | Why
    Sales Agent        | Chris / Sarah     | Warm, conversational tone that builds trust
    Tech Support       | Daniel / Alice    | Clear, authoritative, patient delivery
    Customer Service   | Laura / Liam      | Friendly, empathetic, natural cadence
    Executive Briefing | George / Matilda  | Professional, polished, confident

    Step 5: Prompt Engineering for Voice

    Writing prompts for voice agents is fundamentally different from writing them for text chatbots:

    • Keep responses short — Aim for 1-3 sentences. Users can't "scan" voice like they scan text.
    • Use conversational language — "Got it!" beats "I understand your request."
    • Handle interruptions — Instruct the agent to gracefully yield when interrupted.
    • Confirm actions verbally — "I've booked that for 3pm Tuesday. Sound good?"
    • Avoid lists — Don't read off 5 options. Offer 2-3 and ask which direction to go.

    Example System Prompt Structure:

    You are [Name], a [role] for [Company]. Your personality is [traits]. Keep responses under 3 sentences unless the user asks for detail. When you need information, ask one question at a time. Always confirm before taking actions.
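
    If you manage several agents, the template above can be filled in from a small config object so every agent follows the same rules. This is a minimal sketch; the PersonaConfig shape and the function name are hypothetical, and the resulting string is simply pasted into (or sent to) the agent's system prompt field.

```typescript
// Hypothetical config shape for one agent persona.
interface PersonaConfig {
  agentName: string;
  role: string;
  company: string;
  traits: string[];
  maxSentences?: number; // defaults to 3, matching the guidance above
}

// Fills the bracketed slots of the example prompt structure.
function buildSystemPrompt(cfg: PersonaConfig): string {
  if (cfg.traits.length === 0) {
    throw new Error("At least one personality trait is required");
  }
  const maxSentences = cfg.maxSentences ?? 3;
  return [
    `You are ${cfg.agentName}, a ${cfg.role} for ${cfg.company}.`,
    `Your personality is ${cfg.traits.join(", ")}.`,
    `Keep responses under ${maxSentences} sentences unless the user asks for detail.`,
    "When you need information, ask one question at a time.",
    "Always confirm before taking actions.",
  ].join(" ");
}

// Example persona; every value here is made up.
console.log(
  buildSystemPrompt({
    agentName: "Ava",
    role: "receptionist",
    company: "Acme Dental",
    traits: ["warm", "concise"],
  }),
);
```

    Keeping the sentence limit as a parameter makes it easy to loosen for agents that need to give longer explanations, without editing the template by hand.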

    Step 6: Expressive Mode (New in 2026)

    ElevenLabs launched Expressive Mode for ElevenAgents in February 2026. This isn't just better TTS — it's a fundamentally different approach to agent voice:

    • Emotional awareness — The agent adapts tone based on conversation context (empathetic when a customer is frustrated, enthusiastic when closing a deal)
    • Natural disfluencies — Subtle "um"s and breath patterns that make the voice feel human
    • Dynamic pacing — Speeds up for excitement, slows down for important information

    To enable Expressive Mode, toggle it in the ElevenLabs agent configuration panel. No code changes required — it enhances the existing voice pipeline.

    Production Deployment Checklist

    • ☐ API key stored as server-side environment variable (never in client code)
    • ☐ Token generation endpoint rate-limited
    • ☐ Microphone permission requested with clear UX explanation
    • ☐ Graceful fallback for browsers without WebRTC support
    • ☐ Error handling for network drops and reconnection
    • ☐ Analytics tracking for conversation starts, duration, and completion
    • ☐ Volume controls accessible to users
    • ☐ Mobile-responsive agent UI tested on iOS and Android
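
    For the rate-limiting item, one simple option is a fixed-window counter keyed by client (for example by IP address or session ID). The sketch below is an in-memory illustration with a hypothetical class name. Edge functions are often stateless across invocations, so a real deployment would likely need a shared store instead of a local Map.

```typescript
// Fixed-window rate limiter: at most `limit` requests per `windowMs` per key.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // The clock is injectable so the limiter can be tested without waiting.
  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed and counts it; false if over the limit.
  allow(key: string): boolean {
    const currentTime = this.now();
    const entry = this.hits.get(key);

    // First request for this key, or the previous window has expired: start a new window.
    if (entry === undefined || currentTime - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: currentTime, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}

// Example: allow 5 token requests per minute per client.
const tokenLimiter = new FixedWindowLimiter(5, 60_000);
console.log(tokenLimiter.allow("client-a")); // prints: true
```

    In the token endpoint you would call allow() before contacting ElevenLabs and respond with HTTP 429 when it returns false.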

    Common Pitfalls and How to Avoid Them

    Pitfall                    | Solution
    Agent talks too much       | Add "keep responses under 3 sentences" to system prompt
    Echo/feedback loops        | Enable echo cancellation in microphone config
    High latency on mobile     | Use WebRTC (not WebSocket) and turbo model
    Agent ignores interruptions | Tune VAD sensitivity; add interruption handling to prompt
    Exposing API keys          | Always use server-side token generation
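
    For the echo row, if you request the microphone yourself (for example to trigger the permission prompt before starting a session), the standard browser constraints look roughly like this. echoCancellation, noiseSuppression, and autoGainControl are standard MediaTrackConstraints fields that browsers treat as hints. The helper names and the injected MediaDevicesLike shape are hypothetical, and whether the ElevenLabs SDK exposes its own microphone settings is something to check in its documentation.

```typescript
type MicConstraints = {
  audio: {
    echoCancellation: boolean;
    noiseSuppression: boolean;
    autoGainControl: boolean;
  };
  video: false;
};

// Builds audio-only constraints with echo cancellation on by default.
function buildMicConstraints(overrides: { echoCancellation?: boolean } = {}): MicConstraints {
  return {
    audio: {
      echoCancellation: overrides.echoCancellation ?? true,
      noiseSuppression: true,
      autoGainControl: true,
    },
    video: false,
  };
}

// Minimal shape of navigator.mediaDevices, injected so this runs outside a browser too.
interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: MicConstraints): Promise<TStream>;
}

// In a browser: await requestMicWithEchoCancellation(navigator.mediaDevices)
async function requestMicWithEchoCancellation<TStream>(
  devices: MediaDevicesLike<TStream>,
): Promise<TStream> {
  return devices.getUserMedia(buildMicConstraints());
}
```

    Echo cancellation mostly matters on speakerphone and laptop setups where the agent's own voice reaches the microphone; headset users are rarely affected.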

    Real-World Use Cases We're Seeing

    • After-hours receptionist — Voice agent handles calls when the office is closed, books callbacks for morning
    • Website sales concierge — Embedded voice widget qualifies leads through natural conversation
    • IT help desk tier-1 — Agent troubleshoots common issues (password resets, connectivity) before escalating
    • Appointment scheduling — Patients or clients book time slots through voice instead of clicking through calendars
    • Multilingual support — Single agent handles conversations in 29 languages using ElevenLabs' multilingual models

    Ready to Build Your Voice Agent?

    Whether you need a sales agent, support bot, or custom voice interface — our team builds production-ready voice agents that integrate with your existing systems.

    From Concept to Deployed Voice Agent

    We handle the architecture, voice selection, prompt engineering, and deployment — you get an AI agent that sounds like your brand.

    Start Your Voice Agent Project
