YYantrix
Applied AI

Deploying YOLOv11 to Jetson Orin Nano at 30 FPS, explained simply.

Most "deploy YOLO to edge" tutorials stop at a notebook on a laptop. This post is about what actually happens when you ship the model to a Jetson Orin Nano that's sitting on a robot cell — the steps, the surprises, and the numbers we hit.

YOLOv11 detection overlay running on a Jetson Orin Nano

Core idea

What this blog covers

The gap between a model that works in PyTorch on a dev machine and a model that runs reliably at sub-50 ms latency on an edge device is where most AI projects quietly die. The model is the easy part; quantization, runtime, and memory budget are where production pressure shows up.

Main discussion

Why YOLOv11 on Jetson Orin Nano

For vision-guided robotics and industrial inspection, YOLOv11 hits a sweet spot — it's accurate enough for production, small enough to quantize well, and has first-class support for segmentation, which matters when you're picking a part from a cluttered bin. Jetson Orin Nano (8 GB) gives you real CUDA + TensorRT on a module that costs about as much as a mid-range GPU and fits in the cell — no PC, no cloud.

Training with deployment in mind

The first decision that actually matters isn't about the model — it's about input resolution. Going from 640 x 640 to 320 x 320 roughly quarters inference time and usually costs only a few points of mAP on short-range inspection. Pick the smallest input size your worst-case object still survives at, then train there. This is the most common step teams skip and regret later.
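As a back-of-envelope aid, here is a hypothetical helper for that choice (the function name and the 12 px detectability floor are our assumptions, not a YOLO rule — tune the floor against your own validation set):

```python
def smallest_imgsz(native_res, min_object_px, floor_px=12, stride=32, max_size=640):
    """Pick the smallest stride-aligned input size that keeps the
    worst-case object above floor_px pixels after downscaling.

    native_res:    longer side of the camera frame, in pixels
    min_object_px: smallest object you must detect, at native resolution
    floor_px:      rough lower bound for a detectable object (assumption)
    """
    for size in range(stride, max_size + 1, stride):
        # object size after resizing the frame down to `size`
        scaled = min_object_px * size / native_res
        if scaled >= floor_px:
            return size
    return max_size
```

A 1080p camera with a 120 px worst-case object lands at 192 x 192 under these assumptions — then you pass that as `imgsz` when training, so train-time and deploy-time resolution match.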

ONNX export and TensorRT conversion

Export the trained model to ONNX with opset 17+, then convert via trtexec with --fp16. On Jetson Orin Nano we consistently see a 3-4x speed-up versus PyTorch runtime and a 1.8-2.2x speed-up versus plain ONNX Runtime on the same hardware. Keep the TensorRT engine file versioned alongside the model — engines aren't portable across Jetson variants or JetPack versions.
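A minimal sketch of that two-step flow, driven from Python. The flag names match recent trtexec releases but have shifted across versions, so verify against `trtexec --help` on your JetPack; the file names are placeholders:

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, workspace_mb=2048):
    """Assemble a trtexec invocation for an FP16 engine build.

    Flag names are taken from recent TensorRT releases -- older JetPack
    images used e.g. --workspace instead of --memPoolSize, so check
    `trtexec --help` on the target device.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--memPoolSize=workspace:{workspace_mb}M",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # Step 1: export to ONNX on the dev machine (ultralytics API, opset 17):
    # from ultralytics import YOLO
    # YOLO("best.pt").export(format="onnx", opset=17, imgsz=320)
    #
    # Step 2: build the engine ON THE JETSON ITSELF -- engines are tied to
    # the exact GPU and TensorRT version that built them.
    subprocess.run(build_trtexec_cmd("best.onnx", "best_fp16.engine"), check=True)
```

Building on-device is the part people miss: an engine built on a dev box with a desktop GPU will not load on the Orin Nano.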

INT8 quantization — when it's worth it

INT8 gets you another ~1.6x over FP16 on Orin, but requires a calibration dataset that covers the edge cases your model will see at runtime. Skip calibration and your accuracy falls off a cliff on corner classes. We typically ship FP16 unless we genuinely need the extra headroom — it's the better risk-adjusted choice.
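One way to keep corner classes from being starved in that calibration set is to stratify by class. A sketch, with names and the per-class count as our assumptions (this prepares the image list; the images are then fed to a TensorRT calibrator such as `IInt8EntropyCalibrator2`):

```python
import random
from collections import defaultdict


def sample_calibration_set(labeled_images, per_class=50, seed=0):
    """Draw a class-stratified calibration set so rare 'corner' classes
    are represented, not just the majority class.

    labeled_images: iterable of (image_path, class_name) pairs.
    Illustrative helper -- not a TensorRT API.
    """
    by_class = defaultdict(list)
    for path, cls in labeled_images:
        by_class[cls].append(path)

    rng = random.Random(seed)  # fixed seed: calibration should be reproducible
    calib = []
    for cls, paths in by_class.items():
        calib.extend(rng.sample(paths, min(per_class, len(paths))))
    return calib
```

If a class has fewer images than `per_class`, all of them go in — which is exactly the behavior you want for the rare defect classes that INT8 tends to hurt first.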

The hidden cost: preprocessing and NMS

A YOLOv11 forward pass on Orin might be 18 ms in FP16 — but the camera capture, resize, color-space conversion, NMS, and mask decoding can easily add another 30-50 ms if you implement them in Python. Move preprocessing to CUDA via the Jetson VPI or at minimum to OpenCV built with CUDA support. Do NMS on GPU. This is where most shipped pipelines lose their latency budget.
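To make the NMS cost concrete, here is a greedy NMS reference in NumPy. On the Jetson you would run the equivalent on the GPU — for example TensorRT's EfficientNMS plugin, or `torchvision.ops.nms` on CUDA tensors — this CPU version only shows the algorithm:

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2].
    Returns indices of kept boxes, highest score first."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # drop everything that overlaps the kept box too much
        order = rest[iou <= iou_thresh]
    return keep
```

The inner loop is O(N) vectorized work per kept box — cheap in NumPy for a handful of detections, but at a few thousand candidate boxes per frame it is exactly the kind of Python-side cost that eats your latency budget.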

Monitoring and model updates in production

Once the model is on the device, you need a ring buffer of borderline detections (confidence near the decision threshold) that gets uploaded for offline review. Without it, you don't learn what's changing in the real world. We also version every deployed engine with a git SHA + dataset hash and sign OTA artifacts so a rollback is one command.
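A minimal sketch of that buffer, assuming a stdlib deque; the field names and the ±0.1 confidence band are our assumptions, not a fixed recipe:

```python
from collections import deque


class BorderlineBuffer:
    """Keep the last `maxlen` detections whose confidence sits near the
    decision threshold; a background task drains and uploads them for
    offline review. Sketch only -- tune threshold/band per deployment."""

    def __init__(self, threshold=0.5, band=0.1, maxlen=512):
        self.threshold = threshold
        self.band = band
        self.buf = deque(maxlen=maxlen)  # oldest entries drop off automatically

    def offer(self, frame_id, cls, conf):
        # only detections the model was unsure about are worth reviewing
        if abs(conf - self.threshold) <= self.band:
            self.buf.append({"frame": frame_id, "class": cls, "conf": conf})

    def drain(self):
        # hand the batch to the uploader and start fresh
        items, self.buf = list(self.buf), deque(maxlen=self.buf.maxlen)
        return items
```

The bounded `deque` is the point: review data can never grow past `maxlen` entries, so a flaky uplink can't fill the Jetson's eMMC.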

Key takeaways

What readers should remember

  • Train the model you can ship — pick an architecture and input size that respects the target hardware from day one.
  • Quantize thoughtfully — FP16 is usually enough on Jetson; INT8 requires calibration and a real validation pass.
  • Benchmark the full camera -> decision loop, not just inference. Preprocessing and post-processing usually dominate.
  • Plan the model update path (OTA, versioning, rollback) before you ship the first one.

Let's build

Have a machine to build? Let's scope it together.

Tell us about your project. We'll respond within 1-2 business days with a preliminary scope and timeline — no boilerplate, no up-sell.