Getting Started with the Trio Stream API: A Developer's Guide
Connect a camera feed, ask questions, and get AI-powered answers in minutes
The hardest part of building with live video AI used to be everything that happened before you wrote a single line of business logic: standing up frame-extraction pipelines, stitching together vision models, managing GPU infrastructure, and somehow getting answers fast enough to act on them. The Trio Stream API collapses all of that into a few HTTP calls. This guide walks you through the entire journey — from creating an API key to parsing your first AI-generated answer from a real camera feed.
What Is the Trio Stream API?
The Trio Stream API is a multimodal inference interface that accepts live video, audio, and sensor streams as input and returns structured, natural-language AI responses. It handles frame sampling, vision-language model routing, and output formatting as managed infrastructure, so developers can query a camera feed the same way they would query a REST endpoint — without building or maintaining any ML pipeline.
Prerequisites
Before you write any code, make sure you have the following in place:
- A Trio account with an active API key. Sign up at machinefi.com — the Starter tier is free and includes 10,000 API calls per month.
- Python 3.9 or later installed on your machine.
- A camera feed URL in RTSP, WebRTC, HLS, or plain HTTP MJPEG format. If you don't have a physical camera handy, you can use a public test stream or a local webcam exposed via FFmpeg.
- Basic familiarity with Python virtual environments and pip.
That's it. You don't need a GPU, a Kubernetes cluster, or any computer-vision background. The API handles all of that on Trio's infrastructure.
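Before installing anything, it can help to confirm your environment meets the prerequisites above. The sketch below is illustrative: it checks the Python version and the TRIO_API_KEY variable used throughout this guide, and the `sk_` prefix check simply mirrors the example key shown later (confirm the real key format in your Trio dashboard).

```python
import os
import sys

def check_environment(key):
    """Return a human-readable readiness message for the Trio setup.

    The 'sk_' prefix check mirrors the example key format shown in
    this guide; confirm the actual format in your Trio dashboard.
    """
    if sys.version_info < (3, 9):
        return "Trio SDK requires Python 3.9 or later"
    if not key:
        return "TRIO_API_KEY is not set -- add it to your .env file"
    if not key.startswith("sk_"):
        return "Key does not look like a Trio key (expected 'sk_' prefix)"
    return "Environment looks good"

print(check_environment(os.environ.get("TRIO_API_KEY")))
```

If the key is missing here, the `load_dotenv()` call used in the examples below won't find it either, so this catches the most common setup mistake early.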
Installation
Create a fresh virtual environment and install the Trio SDK:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install trio-sdk python-dotenv
Next, create a .env file in your project root and add your API key:
TRIO_API_KEY=sk_live_your_key_here
Connecting Your First Stream
With the SDK installed and your key loaded, connecting a camera feed takes just a few lines of Python:
import os
import trio_sdk
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = trio_sdk.Client(api_key=os.environ["TRIO_API_KEY"])
# Connect a camera stream — RTSP, HLS, WebRTC, or HTTP MJPEG
stream = client.streams.connect(
url="rtsp://camera.local/live",
label="front-entrance", # optional human-readable name
region="us-east-1", # optional: route to nearest inference node
)
print(f"Stream connected: {stream.id}")
print(f"Status: {stream.status}")
The connect() call registers your stream with Trio's ingestion layer. The SDK validates the URL, negotiates the transport protocol, and returns a Stream object with a stable stream.id you can reference in all subsequent calls. The stream stays active until you explicitly disconnect it or it times out due to inactivity.
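Because a stream stays open until you release it, long-running work is best wrapped so teardown always happens. The sketch below uses a tiny stand-in object rather than the real SDK, and the `disconnect()` method is illustrative; check the SDK reference for the actual teardown call.

```python
class FakeStream:
    """Tiny stand-in for the SDK's Stream object (hypothetical),
    used only to illustrate the connect/disconnect lifecycle."""
    def __init__(self):
        self.status = "active"

    def disconnect(self):
        self.status = "disconnected"

stream = FakeStream()
try:
    # ... ask() calls and other work against the stream go here ...
    pass
finally:
    # Always release the stream, even if an exception was raised,
    # so it doesn't linger until the inactivity timeout
    stream.disconnect()

print(stream.status)  # disconnected
```

The try/finally pattern matters most in worker processes that restart often: orphaned streams count against your connection quota until the inactivity timeout fires.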
50 ms median end-to-end latency from frame capture to API response on Trio's managed inference infrastructure.
Asking Questions
Once you have a connected stream, you can ask it anything with the ask() method:
import os
import trio_sdk
from dotenv import load_dotenv
load_dotenv()
client = trio_sdk.Client(api_key=os.environ["TRIO_API_KEY"])
stream = client.streams.connect(url="rtsp://camera.local/live")
# Ask a natural-language question about the current frame
response = stream.ask("How many people are currently in frame?")
print(response.answer) # "3 people are visible in the frame."
print(response.confidence) # 0.94
print(response.latency_ms) # 48
The ask() method captures a frame (or a short clip if temporal context is needed), runs it through Trio's vision-language routing layer, and returns a structured Response object synchronously. For most simple questions, you'll get an answer in well under 100 milliseconds.
You can ask follow-up questions in the same session to maintain context:
# Follow-up questions maintain context within the session
response2 = stream.ask("Are any of them wearing high-visibility vests?")
print(response2.answer) # "Yes, 2 of the 3 people are wearing hi-vis vests."
# Structured JSON output for downstream processing
response3 = stream.ask(
"List each person's approximate location in the frame.",
output_format="json",
)
print(response3.data)
# [{"person": 1, "location": "left foreground"},
# {"person": 2, "location": "center background"},
# {"person": 3, "location": "right midground"}]
Handling Responses
Every ask() call returns a Response object with a consistent schema. Here is the full set of fields you can access:
response = stream.ask("Describe the scene.")
# Core fields
print(response.answer) # Natural-language answer string
print(response.confidence) # Float 0.0–1.0 model confidence score
print(response.latency_ms) # Round-trip time in milliseconds
print(response.frame_ts) # UTC timestamp of the captured frame
print(response.stream_id) # ID of the source stream
print(response.request_id) # Unique ID for this inference request
# Optional fields (present when output_format="json")
print(response.data) # Parsed dict or list from JSON output
print(response.raw_json) # Raw JSON string before parsing
# Metadata
print(response.model) # Which vision-language model was used
print(response.tokens_used) # Token count for billing/monitoring
For high-throughput or event-driven architectures, use the streaming or webhook modes rather than polling ask() in a loop. The streaming response mode emits tokens progressively — useful when you're rendering answers to a dashboard in real time. The webhook mode pushes results to your endpoint as they arrive, with no open connection required on your side.
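The progressive-rendering pattern that streaming mode enables can be sketched with a stand-in generator. Note that `ask_stream` below is purely illustrative — the real SDK's streaming method name and chunk format are not shown in this guide — but the consumption loop is the shape your dashboard code would take.

```python
def ask_stream(question):
    """Stand-in for a streaming ask() call (hypothetical).

    Yields answer tokens one at a time, simulating how a streaming
    response mode delivers partial output as inference proceeds.
    """
    for token in ["3 ", "people ", "are ", "visible ", "in ", "frame."]:
        yield token

# Render the answer as tokens arrive, instead of waiting
# for the complete response
partial = ""
for token in ask_stream("How many people are currently in frame?"):
    partial += token
    print(partial, end="\r")  # overwrite the line with the growing answer
print()  # final line reads: 3 people are visible in frame.
```

The same loop works for server-sent events or a WebSocket feed: accumulate chunks, re-render on each one, and treat the stream's end as the final answer.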
Advanced Features
Multi-Stream Sessions
Trio supports querying multiple camera feeds within a single session. This is useful when you need to correlate observations across cameras — for example, tracking a person moving between zones in a warehouse:
import os
import trio_sdk
from dotenv import load_dotenv
load_dotenv()
client = trio_sdk.Client(api_key=os.environ["TRIO_API_KEY"])
# Connect multiple streams
entrance = client.streams.connect(url="rtsp://cam1.local/live", label="entrance")
floor_a = client.streams.connect(url="rtsp://cam2.local/live", label="floor-a")
floor_b = client.streams.connect(url="rtsp://cam3.local/live", label="floor-b")
# Create a session to query them together
session = client.sessions.create(streams=[entrance, floor_a, floor_b])
# Ask a cross-camera question
result = session.ask(
"Is anyone present in all three zones at the same time?"
)
print(result.answer)
# "No. 2 people are in Floor A, 1 person is at the Entrance. Floor B is empty."
Webhook Subscriptions
For production pipelines that need to react to events without polling, use stream.subscribe() to push answers to your endpoint:
# Subscribe to continuous inference on a trigger condition
subscription = stream.subscribe(
question="Alert me if any person enters the restricted zone.",
webhook_url="https://your-app.com/api/trio-events",
confidence_threshold=0.85, # Only fire if model is >85% confident
cooldown_seconds=30, # Don't re-fire within 30s of last alert
)
print(f"Subscription active: {subscription.id}")
When the condition is met, Trio posts a signed JSON payload to your webhook URL. Verify the signature using the X-Trio-Signature header and your webhook secret from the dashboard.
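A minimal verification sketch follows. It assumes the signature is an HMAC-SHA256 hex digest of the raw request body keyed with your webhook secret — a common scheme, but confirm the exact algorithm and encoding in the Trio dashboard docs before relying on it.

```python
import hashlib
import hmac

def verify_trio_signature(payload, signature, secret):
    """Check the X-Trio-Signature header on an incoming webhook.

    Assumes HMAC-SHA256 over the raw request body, hex-encoded;
    confirm the actual scheme in the Trio dashboard docs.
    """
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # Constant-time comparison guards against timing attacks
    return hmac.compare_digest(expected, signature)

# Simulated incoming webhook (illustrative payload and secret)
body = b'{"event": "person_detected", "stream_id": "st_123"}'
secret = "whsec_example"
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_trio_signature(body, sig, secret))   # True
print(verify_trio_signature(b"tampered", sig, secret))  # False
```

Two details matter in production: verify against the raw bytes of the request body (re-serializing parsed JSON can change byte order and break the digest), and always use a constant-time comparison like `hmac.compare_digest`.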
Next Steps
You now have everything you need to build a working Trio integration. Here is where to go next depending on what you're building:
- If you're evaluating Trio against a self-hosted pipeline, read our Build vs. Buy: Video Analytics Pipeline breakdown. It compares total cost of ownership, time to first inference, and maintenance burden side by side.
- If you want to understand the infrastructure behind the API, How to Analyze a Live Video Stream with AI walks through the full architecture from camera to answer.
- If you're hitting the limits of frame-by-frame queries, The Video-to-LLM Gap explains how Trio handles temporal reasoning across clips — and why that matters for complex detection tasks.
The Trio SDK reference docs, rate limit tables, and error code glossary are available at docs.machinefi.com. For questions, the #developers channel in the Trio Discord community is the fastest path to an answer from the team.
Keep Reading
- How to Analyze a Live Video Stream with AI — A deep dive into the full pipeline architecture behind real-time video AI, from frame extraction to model inference to structured output.
- Build vs. Buy: Video Analytics Pipeline — An honest cost and complexity comparison between building your own video AI stack and using a managed API like Trio.
- The Video-to-LLM Gap — Why standard LLMs can't process video directly, and how Trio bridges the gap between live camera feeds and language model intelligence.