How to Analyze a Live Video Stream with AI in Under 10 Minutes
A step-by-step tutorial using Trio's stream API — from zero to insights
You can go from zero to AI-powered video analysis in under 10 minutes. No ML engineering background required. No GPU setup, no model training, no RTSP pipeline management. Just a camera feed, a few API calls, and you'll have an AI watching your stream and answering questions about what it sees.
This tutorial walks you through three progressively more useful setups: one-shot frame analysis, continuous monitoring with alerts, and periodic summary reports. By the end, you'll have a working system you can point at any camera feed.
What You'll Need
- A video stream URL (RTSP from an IP camera, a YouTube Live URL, or an HLS stream)
- A Trio API key (sign up at machinefi.com/try — free tier available)
- Python 3.8+ installed
- 10 minutes
Step 1: Install the SDK
pip install trio-stream
That's the only dependency. The SDK handles RTSP connection management, frame extraction, API communication, and webhook delivery internally.
Step 2: One-Shot Frame Analysis (Ask Mode)
The simplest way to start. You point Trio at a video stream and ask a question. It analyzes the current frame and responds in natural language.
# Import the Trio SDK
from trio_stream import Client
# Initialize the client
client = Client(api_key="your-api-key")
# Connect to a stream (RTSP, YouTube Live, or HLS)
stream = client.connect("rtsp://192.168.1.100:554/stream1")
# Ask a question about the current frame
result = stream.ask("How many people are in the frame? What are they doing?")
print(result.answer)
# Output: "There are 3 people visible. Two are standing at a
# workstation assembling components. One is walking toward
# the exit carrying a clipboard."
print(result.confidence) # 0.92
print(result.timestamp)  # 2026-03-03T14:23:45Z
That's it. A few lines of meaningful code. The SDK handled RTSP negotiation, frame capture, image encoding, and the Vision LLM API call behind the scenes — all the infrastructure that makes the Video-to-LLM gap so painful to solve on your own.
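If one snapshot isn't enough, you can poll ask() on an interval and keep only confident answers. The sketch below is illustrative, not part of the SDK: a stub stands in for the object returned by client.connect(), and only the .ask() call and the .answer/.confidence fields come from the API shown above — the poll_stream helper, interval, and threshold are assumptions.

```python
import time

def poll_stream(stream, question, interval_s=10, max_checks=3, min_confidence=0.8):
    """Ask the same question repeatedly, keeping only confident answers."""
    answers = []
    for _ in range(max_checks):
        result = stream.ask(question)            # same call as in the tutorial
        if result.confidence >= min_confidence:  # drop low-confidence frames
            answers.append(result.answer)
        time.sleep(interval_s)
    return answers

# Stub stream so the sketch runs without a live camera or API key.
class FakeResult:
    def __init__(self, answer, confidence):
        self.answer, self.confidence = answer, confidence

class FakeStream:
    def ask(self, question):
        return FakeResult("2 people at a workstation", 0.91)

print(poll_stream(FakeStream(), "How many people are in frame?", interval_s=0))
```

In production you would pass the real stream object from client.connect() instead of the stub.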
Try it with a YouTube Live stream if you don't have an IP camera handy:
stream = client.connect("https://youtube.com/watch?v=your-live-stream-id")
result = stream.ask("Describe what's happening in this scene.")
print(result.answer)
Step 3: Continuous Monitoring (Watch Mode)
One-shot analysis is useful for testing, but the real power is continuous monitoring. You define a condition, and Trio watches the stream 24/7. When the condition is met, it fires a webhook to your endpoint.
from trio_stream import Client
client = Client(api_key="your-api-key")
# Define what to watch for
monitor = client.monitor(
    stream_url="rtsp://192.168.1.100:554/stream1",
    condition="A person enters the restricted area marked by yellow floor tape",
    webhook_url="https://your-server.com/api/alerts",
    check_interval=5,  # Check every 5 seconds
)
print(f"Monitoring started: {monitor.id}")
print(f"Status: {monitor.status}")  # "active"
When the condition triggers, your webhook receives:
{
  "monitor_id": "mon_abc123",
  "triggered_at": "2026-03-03T14:30:22Z",
  "condition": "A person enters the restricted area marked by yellow floor tape",
  "description": "A worker wearing a blue jacket entered the yellow-taped restricted area near Station 7. They appear to be retrieving a tool from the workbench inside the zone.",
  "confidence": 0.94,
  "frame_url": "https://api.machinefi.com/frames/frm_xyz789.jpg"
}
Handling Alerts in Your Application
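Whatever framework you use, the handler logic boils down to reading a few fields from that JSON. Here's a framework-free sketch; the confidence threshold and the PAGE/LOG split are arbitrary choices for illustration, not part of the API:

```python
import json

# Trimmed version of the webhook payload shown above.
ALERT_JSON = """{
  "monitor_id": "mon_abc123",
  "triggered_at": "2026-03-03T14:30:22Z",
  "confidence": 0.94,
  "description": "A worker entered the yellow-taped restricted area near Station 7."
}"""

def triage(payload, threshold=0.85):
    """Route high-confidence alerts to paging, the rest to a log channel."""
    if payload["confidence"] >= threshold:
        return f"PAGE: {payload['description']}"
    return f"LOG: {payload['description']}"

alert = json.loads(ALERT_JSON)
print(triage(alert))
# PAGE: A worker entered the yellow-taped restricted area near Station 7.
```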
Here's a minimal Flask endpoint to receive and process alerts:
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)
SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

@app.route("/api/alerts", methods=["POST"])
def handle_alert():
    alert = request.json
    # Send to Slack — pipe alerts to your team channel
    requests.post(SLACK_WEBHOOK, json={
        "text": f"Safety Alert: {alert['description']}"
    })
    # Log to your database, trigger other workflows, etc.
    return jsonify({"received": True})

if __name__ == "__main__":
    app.run(port=8080)
Step 4: Periodic Summaries (Summary Mode)
For use cases where you don't need instant alerts but want regular reports — shift summaries, daily activity logs, or compliance audits — use summary mode.
from trio_stream import Client
client = Client(api_key="your-api-key")
# Generate summaries every hour
summary_job = client.summarize(
    stream_url="rtsp://192.168.1.100:554/stream1",
    interval="1h",
    focus="Worker activity levels, equipment usage, and any safety observations",
    webhook_url="https://your-server.com/api/summaries",
)
print(f"Summary job started: {summary_job.id}")
The webhook receives hourly natural-language summaries:
{
  "summary_id": "sum_def456",
  "period_start": "2026-03-03T14:00:00Z",
  "period_end": "2026-03-03T15:00:00Z",
  "summary": "Moderate activity observed. Assembly Station 2 active for 45 min. Forklift made 3 transport runs. One near-miss at 14:47.",
  "key_events": [
    {"time": "14:32", "event": "Station 2 idle"},
    {"time": "14:47", "event": "Near-miss in forklift lane"}
  ]
}
Step 5: Putting It Together
In production, most deployments use all three modes across different cameras and use cases: ask mode for ad-hoc spot checks, watch mode for real-time alerts, and summary mode for recurring reports.
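One way to wire that up is from a single per-camera config. In this sketch the camera URLs, conditions, and webhook are invented placeholders; the client methods are the ones shown in Steps 2-4, and a fake client is injected so the example runs without credentials:

```python
# Per-camera plan: which Trio modes to enable, using the calls from Steps 2-4.
CAMERAS = [
    {"url": "rtsp://192.168.1.100:554/stream1",
     "watch": "A person enters the restricted area", "summarize": "1h"},
    {"url": "rtsp://192.168.1.101:554/stream1",
     "watch": None, "summarize": "1h"},
]

def deploy(client, cameras, webhook="https://your-server.com/api/hooks"):
    """Register watch and summary jobs for each camera; returns job handles."""
    jobs = []
    for cam in cameras:
        if cam["watch"]:
            jobs.append(client.monitor(stream_url=cam["url"],
                                       condition=cam["watch"],
                                       webhook_url=webhook,
                                       check_interval=5))
        if cam["summarize"]:
            jobs.append(client.summarize(stream_url=cam["url"],
                                         interval=cam["summarize"],
                                         webhook_url=webhook))
    return jobs

# Fake client so the sketch is runnable; swap in trio_stream.Client for real use.
class FakeClient:
    def monitor(self, **kw): return ("monitor", kw["stream_url"])
    def summarize(self, **kw): return ("summary", kw["stream_url"])

print(deploy(FakeClient(), CAMERAS))
```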
What to Try Next
Now that you have the basics working:
- Add multiple cameras. Each client.monitor() call creates an independent watcher. You can monitor dozens of streams with different conditions. If you're scaling beyond a handful of cameras, read our build vs. buy analysis to understand when an API makes sense vs. custom infrastructure.
- Combine with your existing systems. Pipe alerts to Slack, PagerDuty, or your incident management system. Send summaries to your operations dashboard.
- Experiment with conditions. Natural language conditions are flexible. Try: "The parking lot is more than 80% full," "A delivery truck is parked at the loading dock," or "The queue at register 3 has more than 5 people." For inspiration, see our roundup of 5 real-world video AI applications running in production today.
- Explore edge deployment. For latency-sensitive use cases, Trio supports on-device inference that processes video locally without sending frames to the cloud.
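One practical wrinkle as you wire alerts into Slack or PagerDuty: with check_interval=5, a condition that stays true may plausibly trigger more than once (worth confirming against Trio's docs). A small cooldown filter in your webhook handler, sketched here in plain Python with an injectable clock, keeps downstream channels quiet; the 300-second window is an arbitrary choice:

```python
import time

class AlertDeduper:
    """Suppress repeat alerts from the same monitor within a cooldown window."""
    def __init__(self, cooldown_s=300, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock      # injectable for deterministic testing
        self.last_seen = {}     # monitor_id -> time of last forwarded alert

    def should_forward(self, monitor_id):
        now = self.clock()
        last = self.last_seen.get(monitor_id)
        if last is not None and now - last < self.cooldown_s:
            return False        # still inside the cooldown window
        self.last_seen[monitor_id] = now
        return True

# Fake clock so the example is deterministic.
t = iter([0, 10, 400])
dedup = AlertDeduper(cooldown_s=300, clock=lambda: next(t))
print(dedup.should_forward("mon_abc123"))  # True  (first alert)
print(dedup.should_forward("mon_abc123"))  # False (10s later, inside cooldown)
print(dedup.should_forward("mon_abc123"))  # True  (400s later, window elapsed)
```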
Keep Reading
- The Video-to-LLM Gap — Understand the infrastructure challenge this tutorial skips: RTSP handling, frame selection, and 24/7 reliability.
- Build vs. Buy: Should You Build Your Own Video Analytics Pipeline? — When to keep using an API vs. building custom infrastructure as you scale.
- 5 Real-World Applications of Real-Time Video AI — See how warehouses, factories, and farms are using the same techniques from this tutorial in production.