How to Analyze a Live Video Stream with AI in Under 10 Minutes
A step-by-step tutorial using Trio's stream API — from zero to insights
You can go from zero to AI-powered video analysis in under 10 minutes. No ML engineering background required. No GPU setup, no model training, no RTSP pipeline management. Just a camera feed, a few API calls, and you'll have an AI watching your stream and answering questions about what it sees.
This tutorial walks you through three progressively more useful setups: one-shot frame analysis, continuous monitoring with alerts, and periodic summary reports. By the end, you'll have a working system you can point at any camera feed.
What You'll Need
- A video stream URL (RTSP from an IP camera, a YouTube Live URL, or an HLS stream)
- A Trio API key (sign up at machinefi.com/try — free tier available)
- Python 3.8+ installed
- 10 minutes
Step 1: Install the SDK
pip install trio-stream
That's the only dependency. The SDK handles RTSP connection management, frame extraction, API communication, and webhook delivery internally.
Step 2: One-Shot Frame Analysis (Ask Mode)
The simplest way to start. You point Trio at a video stream and ask a question. It analyzes the current frame and responds in natural language.
# Import the Trio SDK
from trio_stream import Client
# Initialize the client
client = Client(api_key="your-api-key")
# Connect to a stream (RTSP, YouTube Live, or HLS)
stream = client.connect("rtsp://192.168.1.100:554/stream1")
# Ask a question about the current frame
result = stream.ask("How many people are in the frame? What are they doing?")
print(result.answer)
# Output: "There are 3 people visible. Two are standing at a
# workstation assembling components. One is walking toward
# the exit carrying a clipboard."
print(result.confidence) # 0.92
print(result.timestamp)  # 2026-03-03T14:23:45Z
That's it. A few lines of meaningful code. The SDK handled RTSP negotiation, frame capture, image encoding, and the Vision LLM API call behind the scenes — all the infrastructure that makes the Video-to-LLM gap so painful to solve on your own.
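If one snapshot isn't enough, you can poll ask() on an interval and keep only confident answers. The sketch below is illustrative, not part of the SDK: a stub stands in for the object returned by client.connect(), and only the .ask() call and the .answer/.confidence fields come from the API shown above — the poll_stream helper, interval, and threshold are assumptions.

```python
import time

def poll_stream(stream, question, interval_s=10, max_checks=3, min_confidence=0.8):
    """Ask the same question repeatedly, keeping only confident answers."""
    answers = []
    for _ in range(max_checks):
        result = stream.ask(question)            # same call as in the tutorial
        if result.confidence >= min_confidence:  # drop low-confidence frames
            answers.append(result.answer)
        time.sleep(interval_s)
    return answers

# Stub stream so the sketch runs without a live camera or API key.
class FakeResult:
    def __init__(self, answer, confidence):
        self.answer, self.confidence = answer, confidence

class FakeStream:
    def ask(self, question):
        return FakeResult("2 people at a workstation", 0.91)

print(poll_stream(FakeStream(), "How many people are in frame?", interval_s=0))
```

In production you would pass the real stream object from client.connect() instead of the stub.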
Try it with a YouTube Live stream if you don't have an IP camera handy:
stream = client.connect("https://youtube.com/watch?v=your-live-stream-id")
result = stream.ask("Describe what's happening in this scene.")
print(result.answer)
Step 3: Continuous Monitoring (Watch Mode)
One-shot analysis is useful for testing, but the real power is continuous monitoring. You define a condition, and Trio watches the stream 24/7. When the condition is met, it fires a webhook to your endpoint.
from trio_stream import Client
client = Client(api_key="your-api-key")
# Define what to watch for
monitor = client.monitor(
    stream_url="rtsp://192.168.1.100:554/stream1",
    condition="A person enters the restricted area marked by yellow floor tape",
    webhook_url="https://your-server.com/api/alerts",
    check_interval=5,  # Check every 5 seconds
)
print(f"Monitoring started: {monitor.id}")
print(f"Status: {monitor.status}")  # "active"
When the condition triggers, your webhook receives:
{
  "monitor_id": "mon_abc123",
  "triggered_at": "2026-03-03T14:30:22Z",
  "condition": "A person enters the restricted area marked by yellow floor tape",
  "description": "A worker wearing a blue jacket entered the yellow-taped restricted area near Station 7. They appear to be retrieving a tool from the workbench inside the zone.",
  "confidence": 0.94,
  "frame_url": "https://api.machinefi.com/frames/frm_xyz789.jpg"
}
Handling Alerts in Your Application
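Whatever framework you use, the handler logic boils down to reading a few fields from that JSON. Here's a framework-free sketch; the confidence threshold and the PAGE/LOG split are arbitrary choices for illustration, not part of the API:

```python
import json

# Trimmed version of the webhook payload shown above.
ALERT_JSON = """{
  "monitor_id": "mon_abc123",
  "triggered_at": "2026-03-03T14:30:22Z",
  "confidence": 0.94,
  "description": "A worker entered the yellow-taped restricted area near Station 7."
}"""

def triage(payload, threshold=0.85):
    """Route high-confidence alerts to paging, the rest to a log channel."""
    if payload["confidence"] >= threshold:
        return f"PAGE: {payload['description']}"
    return f"LOG: {payload['description']}"

alert = json.loads(ALERT_JSON)
print(triage(alert))
# PAGE: A worker entered the yellow-taped restricted area near Station 7.
```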
Here's a minimal Flask endpoint to receive and process alerts:
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)
SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

@app.route("/api/alerts", methods=["POST"])
def handle_alert():
    alert = request.json
    # Send to Slack — pipe alerts to your team channel
    requests.post(SLACK_WEBHOOK, json={
        "text": f"Safety Alert: {alert['description']}"
    })
    # Log to your database, trigger other workflows, etc.
    return jsonify({"received": True})

if __name__ == "__main__":
    app.run(port=8080)
Step 4: Periodic Summaries (Summary Mode)
For use cases where you don't need instant alerts but want regular reports — shift summaries, daily activity logs, or compliance audits — use summary mode.
from trio_stream import Client
client = Client(api_key="your-api-key")
# Generate summaries every hour
summary_job = client.summarize(
    stream_url="rtsp://192.168.1.100:554/stream1",
    interval="1h",
    focus="Worker activity levels, equipment usage, and any safety observations",
    webhook_url="https://your-server.com/api/summaries",
)
print(f"Summary job started: {summary_job.id}")
The webhook receives hourly natural-language summaries:
{
  "summary_id": "sum_def456",
  "period_start": "2026-03-03T14:00:00Z",
  "period_end": "2026-03-03T15:00:00Z",
  "summary": "Moderate activity observed. Assembly Station 2 active for 45 min. Forklift made 3 transport runs. One near-miss at 14:47.",
  "key_events": [
    {"time": "14:32", "event": "Station 2 idle"},
    {"time": "14:47", "event": "Near-miss in forklift lane"}
  ]
}
Step 5: Putting It Together
In production, most deployments use all three modes across different cameras and use cases: ask mode for ad-hoc spot checks, watch mode for real-time alerts, and summary mode for recurring reports.
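One way to wire that up is from a single per-camera config. In this sketch the camera URLs, conditions, and webhook are invented placeholders; the client methods are the ones shown in Steps 2-4, and a fake client is injected so the example runs without credentials:

```python
# Per-camera plan: which Trio modes to enable, using the calls from Steps 2-4.
CAMERAS = [
    {"url": "rtsp://192.168.1.100:554/stream1",
     "watch": "A person enters the restricted area", "summarize": "1h"},
    {"url": "rtsp://192.168.1.101:554/stream1",
     "watch": None, "summarize": "1h"},
]

def deploy(client, cameras, webhook="https://your-server.com/api/hooks"):
    """Register watch and summary jobs for each camera; returns job handles."""
    jobs = []
    for cam in cameras:
        if cam["watch"]:
            jobs.append(client.monitor(stream_url=cam["url"],
                                       condition=cam["watch"],
                                       webhook_url=webhook,
                                       check_interval=5))
        if cam["summarize"]:
            jobs.append(client.summarize(stream_url=cam["url"],
                                         interval=cam["summarize"],
                                         webhook_url=webhook))
    return jobs

# Fake client so the sketch is runnable; swap in trio_stream.Client for real use.
class FakeClient:
    def monitor(self, **kw): return ("monitor", kw["stream_url"])
    def summarize(self, **kw): return ("summary", kw["stream_url"])

print(deploy(FakeClient(), CAMERAS))
```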
What to Try Next
Now that you have the basics working:
- Add multiple cameras. Each client.monitor() call creates an independent watcher. You can monitor dozens of streams with different conditions. If you're scaling beyond a handful of cameras, read our build vs. buy analysis to understand when an API makes sense vs. custom infrastructure.
- Combine with your existing systems. Pipe alerts to Slack, PagerDuty, or your incident management system. Send summaries to your operations dashboard.
- Experiment with conditions. Natural language conditions are flexible. Try: "The parking lot is more than 80% full," "A delivery truck is parked at the loading dock," or "The queue at register 3 has more than 5 people." For inspiration, see our roundup of 5 real-world video AI applications running in production today.
- Explore edge deployment. For latency-sensitive use cases, Trio supports on-device inference that processes video locally without sending frames to the cloud.
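One practical wrinkle as you wire alerts into Slack or PagerDuty: with check_interval=5, a condition that stays true may plausibly trigger more than once (worth confirming against Trio's docs). A small cooldown filter in your webhook handler, sketched here in plain Python with an injectable clock, keeps downstream channels quiet; the 300-second window is an arbitrary choice:

```python
import time

class AlertDeduper:
    """Suppress repeat alerts from the same monitor within a cooldown window."""
    def __init__(self, cooldown_s=300, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock      # injectable for deterministic testing
        self.last_seen = {}     # monitor_id -> time of last forwarded alert

    def should_forward(self, monitor_id):
        now = self.clock()
        last = self.last_seen.get(monitor_id)
        if last is not None and now - last < self.cooldown_s:
            return False        # still inside the cooldown window
        self.last_seen[monitor_id] = now
        return True

# Fake clock so the example is deterministic.
t = iter([0, 10, 400])
dedup = AlertDeduper(cooldown_s=300, clock=lambda: next(t))
print(dedup.should_forward("mon_abc123"))  # True  (first alert)
print(dedup.should_forward("mon_abc123"))  # False (10s later, inside cooldown)
print(dedup.should_forward("mon_abc123"))  # True  (400s later, window elapsed)
```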
Keep Reading
- The Video-to-LLM Gap — Understand the infrastructure challenge this tutorial skips: RTSP handling, frame selection, and 24/7 reliability.
- Build vs. Buy: Should You Build Your Own Video Analytics Pipeline? — When to keep using an API vs. building custom infrastructure as you scale.
- 5 Real-World Applications of Real-Time Video AI — See how warehouses, factories, and farms are using the same techniques from this tutorial in production.