Calls

Voice calls are real-time conversations through your agent’s phone numbers. Calls can be inbound (received) or outbound (initiated via API). Each call includes metadata like duration, status, and transcript.

How calls are handled depends on your agent’s voice mode.

Voice modes

Custom Webhook

voiceMode: "webhook" (default) — Caller speech is transcribed and sent to your webhook as agent.message events. Your server controls every response using any LLM, RAG, or custom logic.

Built-in AI

voiceMode: "hosted" — Calls are handled end-to-end by a built-in LLM using your systemPrompt. No webhook or server needed.

Switch modes at any time via PATCH /v1/agents/:id. The backend automatically re-provisions voice infrastructure and rebinds phone numbers with no downtime.
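For example, switching an agent to built-in AI mode could look like this (the voiceMode values are documented above; the request otherwise follows the curl conventions used throughout this reference):

```shell
curl -X PATCH "https://api.agentphone.to/v1/agents/agt_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"voiceMode": "hosted"}'
```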

SMS is always webhook-based regardless of voice mode.

Call flow (webhook mode)

When voiceMode is "webhook":

1. Caller dials your number. The voice engine answers and begins streaming audio.

2. Caller speaks. Streaming STT transcribes in real time and detects the end of speech.

3. Transcript is sent to your webhook. We POST the transcript to your webhook with event: "agent.message" and channel: "voice", including recentHistory for context.

4. Your server responds. You process the transcript (e.g., send it to your LLM) and return a response. We strongly recommend streaming NDJSON, so TTS starts speaking on the first chunk.

5. TTS speaks the response. Each NDJSON chunk is spoken with sub-second latency; there is no waiting for the full response.

6. Conversation continues. The caller can interrupt at any time (barge-in), and the cycle repeats naturally.
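The full payload shape is not specified on this page; based on the fields it does document (event, channel, and recentHistory entries with direction and content), a webhook request plausibly looks like the sketch below. Any field name beyond those documented ones is an illustrative guess, not a contract:

```json
{
  "event": "agent.message",
  "channel": "voice",
  "transcript": "Hi, I'd like to check the status of my order.",
  "recentHistory": [
    { "direction": "inbound", "content": "Hi, I'd like to check the status of my order." },
    { "direction": "outbound", "content": "Of course! Let me look that up for you." }
  ]
}
```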

Call flow (built-in AI mode)

When voiceMode is "hosted":

1. Caller dials your number. The AI answers with your beginMessage (e.g., “Hello! How can I help?”).

2. Caller speaks. Streaming STT transcribes in real time.

3. Built-in LLM generates a response. The LLM uses your systemPrompt to generate a contextual response.

4. TTS speaks the response. Streaming TTS speaks it with sub-second latency.

5. Conversation continues. No server or webhook is involved; the platform handles everything.

Voice capabilities

Both modes share the same low-latency engine:

| Capability | Description |
| --- | --- |
| Streaming STT | Real-time speech-to-text transcription |
| Streaming TTS | Sub-second text-to-speech synthesis |
| Barge-in | Caller can interrupt the agent mid-sentence |
| Backchanneling | Natural conversational cues (“uh-huh”, “right”) |
| Turn detection | Smart end-of-speech detection |
| Streaming responses | Return NDJSON to start TTS on the first chunk |

Webhook response format

For voice webhooks, your server must return a response telling the agent what to say.

Streaming response (recommended)

Return Content-Type: application/x-ndjson with newline-delimited JSON chunks. TTS starts speaking as soon as the first chunk arrives, while your server continues processing.

{"text": "Let me check that for you.", "interim": true}
{"text": "Your order #4521 shipped yesterday via FedEx."}

Mark interim chunks with "interim": true — the final chunk (without interim) closes the turn. Use this for tool calls, LLM token forwarding, or any time your response takes more than ~1 second.
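When forwarding LLM tokens, one pattern is to flush a chunk at each sentence boundary so TTS receives natural phrases. A minimal sketch of such a chunker (the ndjson_chunks helper and its sentence-splitting regex are illustrative, not part of the API):

```python
import json
import re

def ndjson_chunks(tokens):
    """Turn an LLM token stream into NDJSON lines: one interim chunk per
    completed sentence, then a final chunk (no "interim") to close the turn."""
    buffer = ""
    ready = []  # completed sentences not yet emitted
    for token in tokens:
        buffer += token
        # Split off any sentences completed so far; keep the remainder buffered.
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        ready.extend(parts[:-1])
        buffer = parts[-1]
        # Hold one sentence back so the last one can be the closing chunk.
        while len(ready) > 1:
            yield json.dumps({"text": ready.pop(0), "interim": True}) + "\n"
    tail = ready + ([buffer] if buffer.strip() else [])
    yield json.dumps({"text": " ".join(tail)}) + "\n"
```

Each yielded line can be written straight to the NDJSON response body as it is produced.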

Simple response

Return a single JSON object for instant replies where no processing delay is expected.

{
  "text": "How can I help you?",
  "voice": "Polly.Amy",
  "hangup": false
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| text | string | Text to speak (also accepts say or message) |
| voice | string | TTS voice identifier (optional, e.g. "Polly.Amy") |
| hangup | boolean | Set to true to end the call after speaking |
| interim | boolean | NDJSON only — marks a chunk as interim (TTS speaks it but the turn stays open) |
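For example, to speak a closing line and then end the call:

```json
{ "text": "Thanks for calling, goodbye!", "hangup": true }
```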

Example: streaming handler (Python / FastAPI)

from fastapi import FastAPI, Response
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import json

app = FastAPI()
client = AsyncOpenAI()

@app.post("/webhook")
async def handle_voice(payload: dict):
    if payload.get("channel") != "voice":
        return Response(status_code=200)

    history = payload.get("recentHistory", [])
    context = "\n".join(
        f"{'Customer' if h['direction'] == 'inbound' else 'Agent'}: {h['content']}"
        for h in history
    )

    async def generate():
        # Speak a filler line immediately while the LLM works.
        yield json.dumps({"text": "One moment, let me check.", "interim": True}) + "\n"

        stream = await client.chat.completions.create(
            model="gpt-4",
            stream=True,
            messages=[
                {"role": "system", "content": "You are a helpful phone agent."},
                {"role": "user", "content": f"Conversation:\n{context}\n\nRespond."},
            ],
        )
        full = ""
        async for chunk in stream:
            full += chunk.choices[0].delta.content or ""
        # Final chunk (no "interim") closes the turn.
        yield json.dumps({"text": full}) + "\n"

    return StreamingResponse(generate(), media_type="application/x-ndjson")

Example: streaming handler (Node.js / Express)

const express = require('express');
const OpenAI = require('openai');

const app = express();
const openai = new OpenAI();

app.post('/webhook', express.json(), async (req, res) => {
  if (req.body.channel !== 'voice') return res.status(200).send('OK');

  const history = req.body.recentHistory || [];
  const context = history
    .map(h => `${h.direction === 'inbound' ? 'Customer' : 'Agent'}: ${h.content}`)
    .join('\n');

  // Speak a filler line immediately while the LLM works.
  res.setHeader('Content-Type', 'application/x-ndjson');
  res.write(JSON.stringify({ text: 'One moment, let me check.', interim: true }) + '\n');

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    stream: true,
    messages: [
      { role: 'system', content: 'You are a helpful phone agent.' },
      { role: 'user', content: `Conversation:\n${context}\n\nRespond.` }
    ]
  });

  let full = '';
  for await (const chunk of stream) {
    full += chunk.choices[0]?.delta?.content || '';
  }
  // Final chunk (no "interim") closes the turn.
  res.write(JSON.stringify({ text: full }) + '\n');
  res.end();
});

List calls

List all calls for this project.

GET /v1/calls

Query parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| limit | integer | No | 20 | Number of results to return (max 100) |
| offset | integer | No | 0 | Number of results to skip (min 0) |

Example

curl -X GET "https://api.agentphone.to/v1/calls?limit=10&offset=0" \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "data": [
    {
      "id": "call_ghi012",
      "agentId": "agt_abc123",
      "phoneNumberId": "num_xyz789",
      "phoneNumber": "+15551234567",
      "fromNumber": "+15559876543",
      "toNumber": "+15551234567",
      "direction": "inbound",
      "status": "completed",
      "twilioCallSid": "CA1234567890abcdef",
      "retellCallId": null,
      "startedAt": "2025-01-15T14:00:00Z",
      "endedAt": "2025-01-15T14:05:30Z",
      "durationSeconds": 330,
      "lastTranscriptSnippet": "Thank you for calling, goodbye!"
    }
  ],
  "hasMore": false,
  "total": 1
}
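Putting the limit/offset parameters and the hasMore flag together, a paging loop can be sketched like this (the fetch_page and iter_calls helpers are illustrative; only the endpoint, parameters, and response fields come from this reference):

```python
import json
import urllib.request

API = "https://api.agentphone.to"

def fetch_page(api_key, limit, offset):
    """Fetch one page of GET /v1/calls."""
    req = urllib.request.Request(
        f"{API}/v1/calls?limit={limit}&offset={offset}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def iter_calls(page, page_size=100):
    """Yield every call, advancing offset until hasMore is false.
    `page` is any callable (limit, offset) -> parsed response body."""
    offset = 0
    while True:
        body = page(page_size, offset)
        yield from body["data"]
        if not body.get("hasMore"):
            break
        offset += len(body["data"])
```

Usage: `iter_calls(lambda limit, offset: fetch_page(API_KEY, limit, offset))` yields calls across all pages.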

Get call

Get details of a specific call, including its full transcript.

GET /v1/calls/{call_id}

Example

curl -X GET "https://api.agentphone.to/v1/calls/call_ghi012" \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "id": "call_ghi012",
  "agentId": "agt_abc123",
  "phoneNumberId": "num_xyz789",
  "phoneNumber": "+15551234567",
  "fromNumber": "+15559876543",
  "toNumber": "+15551234567",
  "direction": "inbound",
  "status": "completed",
  "twilioCallSid": "CA1234567890abcdef",
  "retellCallId": null,
  "startedAt": "2025-01-15T14:00:00Z",
  "endedAt": "2025-01-15T14:05:30Z",
  "durationSeconds": 330,
  "transcripts": [
    {
      "id": "tr_001",
      "transcript": "Hello! Thanks for calling Acme Corp. How can I help you today?",
      "confidence": 0.95,
      "response": "Sure! Could you please provide your order number?",
      "createdAt": "2025-01-15T14:00:05Z"
    },
    {
      "id": "tr_002",
      "transcript": "Hi, I'd like to check the status of my order.",
      "confidence": 0.92,
      "response": "Of course! Let me look that up for you.",
      "createdAt": "2025-01-15T14:00:15Z"
    }
  ]
}
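Assuming transcript holds the transcribed speech and response the agent's reply (as the field names suggest), a call can be rendered as a readable log. A small illustrative helper:

```python
def format_transcript(call):
    """Render a call's transcript turns as alternating Caller/Agent lines."""
    lines = []
    for turn in call.get("transcripts", []):
        lines.append(f"Caller: {turn['transcript']}")
        if turn.get("response"):
            lines.append(f"Agent:  {turn['response']}")
    return "\n".join(lines)
```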

Create outbound call

Initiate an outbound voice call from one of your agent’s phone numbers. The agent’s first assigned phone number is used as the caller ID.

POST /v1/calls

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| agentId | string | Yes | The agent that will handle the call; its first assigned phone number is used as caller ID |
| toNumber | string | Yes | The phone number to call (E.164 format, e.g., "+15559876543") |
| initialGreeting | string or null | No | Optional greeting to speak when the recipient answers |
| voice | string | No | Voice to use for speaking (default: "Polly.Amy") |
| systemPrompt | string or null | No | When provided, a built-in LLM handles the conversation instead of forwarding to your webhook |

Example

curl -X POST "https://api.agentphone.to/v1/calls" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "agt_abc123",
    "toNumber": "+15559876543",
    "initialGreeting": "Hi, this is Acme Corp calling about your recent order.",
    "systemPrompt": "You are a friendly support agent from Acme Corp."
  }'
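A thin client for this endpoint can separate payload assembly from the HTTP call, which keeps the required/optional field handling easy to check. A sketch using only the standard library (the helper names are illustrative):

```python
import json
import urllib.request

def build_call_payload(agent_id, to_number, initial_greeting=None,
                       voice=None, system_prompt=None):
    """Assemble the POST /v1/calls body; optional fields are omitted when None."""
    payload = {"agentId": agent_id, "toNumber": to_number}
    optional = {
        "initialGreeting": initial_greeting,
        "voice": voice,
        "systemPrompt": system_prompt,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload

def create_call(api_key, payload):
    """POST the payload to /v1/calls and return the parsed response."""
    req = urllib.request.Request(
        "https://api.agentphone.to/v1/calls",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```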

List calls for number

List all calls associated with a specific phone number.

GET /v1/numbers/{number_id}/calls

Example

curl -X GET "https://api.agentphone.to/v1/numbers/num_xyz789/calls?limit=10" \
  -H "Authorization: Bearer YOUR_API_KEY"