YYantrix
Applied AI

Deploying YOLOv11 to Jetson Orin Nano at 30 FPS, explained simply.

Most "deploy YOLO to edge" tutorials stop at a notebook on a laptop. This post is about what actually happens when you ship the model to a Jetson Orin Nano that's sitting on a robot cell — the steps, the surprises, and the numbers we hit.

YOLOv11 detection overlay running on a Jetson Orin Nano

Core idea

What this blog covers

The gap between a model that works in PyTorch on a dev machine and a model that runs reliably at sub-50 ms latency on an edge device is where most AI projects quietly die. The model is the easy part; quantization, runtime, and memory budget are where production pressure shows up.

Main discussion

Why YOLOv11 on Jetson Orin Nano

For vision-guided robotics and industrial inspection, YOLOv11 hits a sweet spot — it's accurate enough for production, small enough to quantize well, and has first-class support for segmentation, which matters when you're picking a part from a cluttered bin. Jetson Orin Nano (8 GB) gives you real CUDA + TensorRT on a module that costs about as much as a mid-range GPU and fits in the cell — no PC, no cloud.

Training with deployment in mind

The first decision that actually matters isn't about the model — it's about input resolution. Going from 640 x 640 to 320 x 320 roughly quarters inference time and usually costs only a few points of mAP on short-range inspection. Pick the smallest input size your worst-case object still survives at, then train there. This is the most common step teams skip and regret later.
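As a back-of-envelope aid, here is a hypothetical helper for that choice (the function name and the 12 px detectability floor are our assumptions, not a YOLO rule — tune the floor against your own validation set):

```python
def smallest_imgsz(native_res, min_object_px, floor_px=12, stride=32, max_size=640):
    """Pick the smallest stride-aligned input size that keeps the
    worst-case object above floor_px pixels after downscaling.

    native_res:    longer side of the camera frame, in pixels
    min_object_px: smallest object you must detect, at native resolution
    floor_px:      rough lower bound for a detectable object (assumption)
    """
    for size in range(stride, max_size + 1, stride):
        # object size after resizing the frame down to `size`
        scaled = min_object_px * size / native_res
        if scaled >= floor_px:
            return size
    return max_size
```

A 1080p camera with a 120 px worst-case object lands at 192 x 192 under these assumptions — then you pass that as `imgsz` when training, so train-time and deploy-time resolution match.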

ONNX export and TensorRT conversion

Export the trained model to ONNX with opset 17+, then convert via trtexec with --fp16. On Jetson Orin Nano we consistently see a 3-4x speed-up versus PyTorch runtime and a 1.8-2.2x speed-up versus plain ONNX Runtime on the same hardware. Keep the TensorRT engine file versioned alongside the model — engines aren't portable across Jetson variants or JetPack versions.
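A minimal sketch of that two-step flow, driven from Python. The flag names match recent trtexec releases but have shifted across versions, so verify against `trtexec --help` on your JetPack; the file names are placeholders:

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, workspace_mb=2048):
    """Assemble a trtexec invocation for an FP16 engine build.

    Flag names are taken from recent TensorRT releases -- older JetPack
    images used e.g. --workspace instead of --memPoolSize, so check
    `trtexec --help` on the target device.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--memPoolSize=workspace:{workspace_mb}M",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # Step 1: export to ONNX on the dev machine (ultralytics API, opset 17):
    # from ultralytics import YOLO
    # YOLO("best.pt").export(format="onnx", opset=17, imgsz=320)
    #
    # Step 2: build the engine ON THE JETSON ITSELF -- engines are tied to
    # the exact GPU and TensorRT version that built them.
    subprocess.run(build_trtexec_cmd("best.onnx", "best_fp16.engine"), check=True)
```

Building on-device is the part people miss: an engine built on a dev box with a desktop GPU will not load on the Orin Nano.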

INT8 quantization — when it's worth it

INT8 gets you another ~1.6x over FP16 on Orin, but requires a calibration dataset that covers the edge cases your model will see at runtime. Skip calibration and your accuracy falls off a cliff on corner classes. We typically ship FP16 unless we genuinely need the extra headroom — it's the better risk-adjusted choice.
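One way to keep corner classes from being starved in that calibration set is to stratify by class. A sketch, with names and the per-class count as our assumptions (this prepares the image list; the images are then fed to a TensorRT calibrator such as `IInt8EntropyCalibrator2`):

```python
import random
from collections import defaultdict


def sample_calibration_set(labeled_images, per_class=50, seed=0):
    """Draw a class-stratified calibration set so rare 'corner' classes
    are represented, not just the majority class.

    labeled_images: iterable of (image_path, class_name) pairs.
    Illustrative helper -- not a TensorRT API.
    """
    by_class = defaultdict(list)
    for path, cls in labeled_images:
        by_class[cls].append(path)

    rng = random.Random(seed)  # fixed seed: calibration should be reproducible
    calib = []
    for cls, paths in by_class.items():
        calib.extend(rng.sample(paths, min(per_class, len(paths))))
    return calib
```

If a class has fewer images than `per_class`, all of them go in — which is exactly the behavior you want for the rare defect classes that INT8 tends to hurt first.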

The hidden cost: preprocessing and NMS

A YOLOv11 forward pass on Orin might be 18 ms in FP16 — but the camera capture, resize, color-space conversion, NMS, and mask decoding can easily add another 30-50 ms if you implement them in Python. Move preprocessing to CUDA via the Jetson VPI or at minimum to OpenCV built with CUDA support. Do NMS on GPU. This is where most shipped pipelines lose their latency budget.
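To make the NMS cost concrete, here is a greedy NMS reference in NumPy. On the Jetson you would run the equivalent on the GPU — for example TensorRT's EfficientNMS plugin, or `torchvision.ops.nms` on CUDA tensors — this CPU version only shows the algorithm:

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2].
    Returns indices of kept boxes, highest score first."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # drop everything that overlaps the kept box too much
        order = rest[iou <= iou_thresh]
    return keep
```

The inner loop is O(N) vectorized work per kept box — cheap in NumPy for a handful of detections, but at a few thousand candidate boxes per frame it is exactly the kind of Python-side cost that eats your latency budget.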

Monitoring and model updates in production

Once the model is on the device, you need a ring buffer of borderline detections (confidence near the decision threshold) that gets uploaded for offline review. Without it, you don't learn what's changing in the real world. We also version every deployed engine with a git SHA + dataset hash and sign OTA artifacts so a rollback is one command.
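A minimal sketch of that buffer, assuming a stdlib deque; the field names and the ±0.1 confidence band are our assumptions, not a fixed recipe:

```python
from collections import deque


class BorderlineBuffer:
    """Keep the last `maxlen` detections whose confidence sits near the
    decision threshold; a background task drains and uploads them for
    offline review. Sketch only -- tune threshold/band per deployment."""

    def __init__(self, threshold=0.5, band=0.1, maxlen=512):
        self.threshold = threshold
        self.band = band
        self.buf = deque(maxlen=maxlen)  # oldest entries drop off automatically

    def offer(self, frame_id, cls, conf):
        # only detections the model was unsure about are worth reviewing
        if abs(conf - self.threshold) <= self.band:
            self.buf.append({"frame": frame_id, "class": cls, "conf": conf})

    def drain(self):
        # hand the batch to the uploader and start fresh
        items, self.buf = list(self.buf), deque(maxlen=self.buf.maxlen)
        return items
```

The bounded `deque` is the point: review data can never grow past `maxlen` entries, so a flaky uplink can't fill the Jetson's eMMC.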

Key takeaways

What readers should remember

  • Train the model you can ship — pick an architecture and input size that respects the target hardware from day one.
  • Quantize thoughtfully — FP16 is usually enough on Jetson; INT8 requires calibration and a real validation pass.
  • Benchmark the full camera -> decision loop, not just inference. Preprocessing and post-processing usually dominate.
  • Plan the model update path (OTA, versioning, rollback) before you ship the first one.

Let's build

Have a machine to build? Let's scope it together.

Tell us about your project. We'll respond within 1-2 business days with a preliminary scope and timeline — no boilerplate, no up-sell.