AI voice agents that take a real call.

Phone agents for after-hours coverage, intake, scheduling, and FAQ. Real-time speech-to-text, LLM-grounded responses, and text-to-speech. Built for businesses where the phone rings off the hook and the staff is busy.

01 What it does

What an AI voice agent does.

An AI voice agent answers the phone, captures the lead, schedules the callback, deflects routine questions, and escalates real issues to the owner. The customer gets a real interaction, not a voicemail.

Most small businesses lose calls during work hours and after hours. The crew is on the job, the front desk is busy, the owner is at home. An AI voice agent answers in real time, identifies what the caller needs, and either resolves the question or schedules the callback with full context for the owner.

The agent does four things well: it takes a lead (name, phone, scope, and address), answers FAQ questions (address, hours, accepted insurance, and services), schedules callbacks or appointments, and escalates emergencies to the owner's phone immediately.

02 How it works

The architecture, real time.

A voice agent is a streaming pipeline: phone audio enters via Twilio and streams to a real-time speech-to-text model, which feeds text to an LLM. The LLM generates the response, which streams to a text-to-speech model, which in turn streams audio back to the caller. The whole loop runs in under a second per turn.
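The shape of that pipeline can be sketched with chained async generators. This is a minimal illustration, not the production code: every stage here is a stub (the "STT" just decodes bytes, the "LLM" echoes the caller), and the function names are assumptions made for this example. In a real build, each stub would wrap the telephony, STT, LLM, and TTS services named above.

```python
import asyncio
from typing import AsyncIterator, List


async def phone_audio(frames: List[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for caller audio arriving from the telephony layer."""
    for frame in frames:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield frame


async def speech_to_text(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stub STT: decodes each frame as UTF-8 instead of transcribing audio."""
    async for frame in audio:
        yield frame.decode("utf-8")


async def llm_respond(words: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stub LLM: collects the caller's turn, then streams a canned reply."""
    heard = [word async for word in words]
    reply = "You said: " + " ".join(heard)
    for token in reply.split():
        yield token


async def text_to_speech(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stub TTS: encodes each token as bytes instead of synthesizing audio."""
    async for token in tokens:
        yield token.encode("utf-8")


async def run_turn(frames: List[bytes]) -> List[bytes]:
    """Wire the four stages together and collect the audio sent to the caller."""
    pipeline = text_to_speech(llm_respond(speech_to_text(phone_audio(frames))))
    return [chunk async for chunk in pipeline]


if __name__ == "__main__":
    print(asyncio.run(run_turn([b"what", b"are", b"your", b"hours"])))
```

Because each stage consumes and yields incrementally, the TTS stage can start on the first reply token rather than waiting for the whole response, which is what keeps per-turn latency low.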

The system prompt and tool definitions are where the work lives. The agent is given a defined set of tools (look up business hours, check schedule, capture lead intake, escalate to owner) and a defined set of allowable topics. It calls tools as needed and stays inside the topic boundary.
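As a rough sketch of what "a defined set of tools and a defined set of allowable topics" can look like in code, the example below registers three of the tools mentioned above and refuses any request outside the topic boundary. The topic names, function signatures, and placeholder data are all hypothetical; a real deployment would express the same tools in whatever tool-calling schema the chosen LLM expects.

```python
from typing import Callable, Dict

# Hypothetical topic boundary and tool registry, for illustration only.
ALLOWED_TOPICS = {"hours", "schedule", "lead_intake", "emergency"}
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function as a tool the agent is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def look_up_business_hours(day: str) -> dict:
    hours = {"saturday": "9am-1pm", "sunday": "closed"}  # placeholder data
    return {"day": day, "hours": hours.get(day.lower(), "8am-5pm")}


@tool
def capture_lead_intake(name: str, phone: str, scope: str, address: str) -> dict:
    lead = {"name": name, "phone": phone, "scope": scope, "address": address}
    return {"status": "captured", "lead": lead}


@tool
def escalate_to_owner(reason: str) -> dict:
    return {"status": "escalated", "reason": reason}


def dispatch(topic: str, tool_name: str, **args) -> dict:
    """Run a tool call only if the topic is in bounds and the tool exists."""
    if topic not in ALLOWED_TOPICS:
        return {"status": "refused", "reason": f"topic '{topic}' is out of scope"}
    if tool_name not in TOOLS:
        return {"status": "error", "reason": f"unknown tool '{tool_name}'"}
    return TOOLS[tool_name](**args)
```

For example, `dispatch("hours", "look_up_business_hours", day="Sunday")` returns the placeholder hours, while a call tagged with an out-of-scope topic is refused before any tool runs.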

Sources: Twilio Voice, OpenAI Realtime API, ElevenLabs TTS.

03 Stack

What we build with.

Default voice agent stack.

Telephony (phone layer): Twilio, Vonage

Speech to text (STT): Deepgram, OpenAI Whisper

LLM (response): OpenAI Realtime, Claude

Text to speech (TTS): ElevenLabs, OpenAI TTS, Cartesia

05 Pricing and timeline

What a voice agent costs.

A production voice agent ships in 4 to 6 weeks at the $2,497 Business OS tier. The build cost is one-time; the ongoing cost is per minute of call traffic plus a fixed monthly minimum.

Per-minute cost runs $0.05 to $0.20 depending on the model and TTS choice. We instrument cost per call and cost per resolved interaction so the unit economics are visible.
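To make those two metrics concrete, here is a small sketch of the arithmetic. The `Call` record and `cost_report` function are hypothetical names for this example, the sample calls are invented, and the calculation covers per-minute usage only; it leaves out the fixed monthly minimum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    minutes: float   # billed call duration
    resolved: bool   # True if the agent resolved the caller's need


def cost_report(calls: List[Call], per_minute_rate: float) -> dict:
    """Compute usage cost per call and per resolved interaction."""
    total = sum(call.minutes for call in calls) * per_minute_rate
    resolved = sum(1 for call in calls if call.resolved)
    per_call = round(total / len(calls), 2) if calls else 0.0
    per_resolved: Optional[float] = round(total / resolved, 2) if resolved else None
    return {
        "total_cost": round(total, 2),
        "cost_per_call": per_call,
        "cost_per_resolved": per_resolved,
    }


if __name__ == "__main__":
    # Invented sample: three calls at an assumed $0.10 per minute.
    sample = [Call(3.0, True), Call(2.0, False), Call(5.0, True)]
    print(cost_report(sample, per_minute_rate=0.10))
```

With the sample above (10 minutes total at $0.10 per minute), the usage cost is $1.00, or about $0.33 per call and $0.50 per resolved interaction. Cost per resolved interaction is always at least cost per call, which is why it is the more honest number to track.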

06 FAQ

Voice agent FAQ.

Will the caller know it is an AI?

By default, yes; the agent identifies itself as an automated assistant. We can deploy with implied disclosure if local rules allow, but transparency is the safer default.

What about recording in two-party consent states?

Calls in two-party consent states require explicit consent before recording. The agent asks. If consent is denied, the call proceeds without recording.
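The consent flow reduces to a small decision function. The sketch below is illustrative only: the function name and return values are hypothetical, whether a given call falls under a two-party consent rule is passed in as a flag rather than looked up, and recording by default outside those states is an assumption of this example, not something the flow above specifies.

```python
from typing import Optional


def recording_decision(two_party_state: bool, caller_consented: Optional[bool]) -> str:
    """Decide the next recording step for a call.

    caller_consented is None until the caller has answered the consent question.
    """
    if not two_party_state:
        return "record"  # assumption: no explicit consent prompt needed here
    if caller_consented is None:
        return "ask_for_consent"
    if caller_consented:
        return "record"
    return "proceed_without_recording"
```

The key property is that a denial never ends the call: the agent keeps helping the caller and simply does not record.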

Can the agent speak Spanish?

Yes. We deploy with primary language set per business and a language detection switch. Spanish, English, and several other languages are supported.

What about emergency calls?

The agent is built to escalate emergencies to the owner's phone immediately and to provide local 911 guidance for genuine medical or safety emergencies. The agent does not attempt to handle emergency calls itself.

How fast does it answer?

Within one ring. The streaming pipeline runs in real time, so the customer does not hear a long delay before the greeting.

Ready to scope a Voice and audio project?