Yantrix
Applied AI · Vision-guided robotics

Case study: vision-guided bin picking at sub-80 ms end-to-end latency.

A fulfilment client needed reliable bin picking without pose fixtures. We delivered a YOLOv11-Seg + 3D pose stack running on a Jetson Orin Nano, integrated with their existing MoveIt motion planner and hitting sub-80 ms decision latency.

Vision-guided bin picking robotic cell with detection overlay

Overview

Why this study matters

Yantrix built a production vision stack that lets a 6-DOF arm pick randomly oriented SKUs out of a cluttered bin — running entirely on an edge device.

Project Type: Applied AI + Robotic Manipulation

Industry: Warehouse automation

Service Used: Computer Vision + ROS 2 Integration

Objective

What the project needed to achieve

  • Detect and segment random-pose parts inside cluttered bins
  • Estimate 6-DOF pick pose for a parallel-jaw gripper
  • Run the full perception stack on embedded hardware at the cell
  • Integrate with the existing MoveIt motion planning pipeline, with zero PLC changes

Challenge

Engineering constraint

The client was operating a robotic cell that required fixed-pose presentation jigs for every SKU. Throughput was capped by manual part-staging and changeover. They needed vision-based picking that could generalize across SKUs without retooling and run on the edge — no cloud round-trips allowed on the production floor.

Deliverables

What the client received

  • Trained and quantized vision model with reproducible training pipeline
  • ROS 2 perception package and MoveIt integration
  • Camera, lens, and lighting specification for the cell
  • Benchmark report: accuracy per class, latency distribution, failure modes
  • Retraining playbook so the client can extend to new SKUs themselves

Visual results

Key views from the deployed cell

YOLOv11-Seg detection and segmentation overlay on bin contents

Detection + mask overlay

ROS 2 action-server integration diagram

ROS 2 integration

Jetson Orin Nano running the perception stack at the cell

Jetson Orin deployment

Approach

How Yantrix approached the work

  • Collected and labelled a dataset of the client's top 40 SKUs inside representative bin clutter, then fine-tuned a YOLOv11-Seg detector with rotation and occlusion augmentation (see the training sketch after this list).
  • Layered a depth-based pose-estimation stage on top of the 2D masks using the ZED 2i stereo camera, filtering picks by graspability: approach angle, jaw clearance, and surface normal (see the pose sketch after this list).
  • Quantized the detector to FP16 and exported it through TensorRT targeting the Jetson Orin Nano; benchmarked camera-to-command latency under realistic lighting.
  • Exposed the perception stack as a ROS 2 action server so the existing MoveIt planner could request picks without any downstream refactor (see the action-server sketch after this list).
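
A minimal training-and-export sketch for the first and third steps, using the Ultralytics API. The dataset config name, the model variant, and the augmentation and epoch values are illustrative assumptions, not the project's actual settings:

```python
# Sketch: fine-tune a YOLO11 segmentation model on the bin-picking dataset,
# then export an FP16 TensorRT engine for the Jetson Orin Nano.
# "bins_top40.yaml" and all hyperparameter values are illustrative.
from ultralytics import YOLO

model = YOLO("yolo11s-seg.pt")        # pretrained segmentation checkpoint
model.train(
    data="bins_top40.yaml",           # hypothetical dataset config (40 SKU classes)
    epochs=100,
    imgsz=640,
    degrees=180.0,                    # random rotation: parts land in any orientation
    copy_paste=0.3,                   # paste instances across images to mimic occlusion
    mosaic=1.0,
    flipud=0.5,
    fliplr=0.5,
)

# Run the export on the Jetson itself so the engine is built for that GPU.
model.export(format="engine", half=True, device=0)
```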
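
A sketch of the mask-to-pose stage and graspability filter from the second step, assuming an aligned depth map and pinhole intrinsics from the ZED 2i. The thresholds and the jaw-clearance heuristic are illustrative, not the production logic:

```python
# Sketch: derive a camera-frame pick pose from a 2D instance mask plus the
# depth map, then reject ungraspable candidates. Values are illustrative.
import numpy as np

def pick_pose_from_mask(mask, depth, fx, fy, cx, cy,
                        max_tilt_deg=30.0, jaw_width_m=0.06):
    """Return (position, surface normal) for a graspable pick, or None."""
    v, u = np.nonzero(mask)                      # pixel coordinates inside the mask
    z = depth[v, u]
    valid = np.isfinite(z) & (z > 0)
    if valid.sum() < 50:                         # not enough depth support
        return None
    u, v, z = u[valid], v[valid], z[valid]

    # Back-project mask pixels to 3D points (pinhole model).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    centroid = pts.mean(axis=0)

    # Surface normal = smallest principal axis of the masked point patch.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] > 0:                            # make the normal face the camera
        normal = -normal

    # Check 1: approach angle relative to the camera's optical axis.
    tilt = np.degrees(np.arccos(abs(normal[2])))
    if tilt > max_tilt_deg:
        return None

    # Check 2 (crude jaw clearance): footprint along the minor axis must fit
    # inside the open parallel jaws.
    spread = pts[:, :2] - centroid[:2]
    minor_extent = 2.0 * np.sqrt(np.linalg.eigvalsh(np.cov(spread.T))[0])
    if minor_extent > jaw_width_m:
        return None

    return centroid, normal
```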
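
A skeleton of the ROS 2 action-server wrapper from the last step. The GetPickPose action interface and the yantrix_perception package name are hypothetical stand-ins; the real pipeline fills in the perception calls noted in the comments:

```python
# Sketch: expose the perception stack as a ROS 2 action server that the
# MoveIt-side client can call for a pick pose. Interface names are hypothetical.
import rclpy
from rclpy.node import Node
from rclpy.action import ActionServer

from yantrix_perception.action import GetPickPose   # hypothetical custom action


class PickPoseServer(Node):
    def __init__(self):
        super().__init__("pick_pose_server")
        self._server = ActionServer(
            self, GetPickPose, "get_pick_pose", self.execute_cb
        )

    def execute_cb(self, goal_handle):
        # 1. Grab the latest RGB-D frame from the camera driver.
        # 2. Run the TensorRT engine to get instance masks.
        # 3. Lift masks to 6-DOF poses and apply the graspability filter.
        result = GetPickPose.Result()
        # result.pose = best_pick_pose   # populated by the real pipeline
        goal_handle.succeed()
        return result


def main():
    rclpy.init()
    rclpy.spin(PickPoseServer())


if __name__ == "__main__":
    main()
```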

Outcome

What improved by the end

  • Sub-80 ms end-to-end decision latency (capture -> model -> grasp command)
  • 99.2% detection accuracy across the labelled SKU set
  • False-pick rate reduced to 1.4 per 1,000 attempts under production lighting
  • Eliminated the need for per-SKU staging jigs — changeover now data-only
  • Fully edge-deployed — zero production cloud dependencies

Tools used

  • Ultralytics YOLOv11-Seg
  • PyTorch + TensorRT (FP16)
  • ROS 2 Humble + MoveIt 2
  • ZED 2i stereo camera
  • NVIDIA Jetson Orin Nano 8GB
  • Roboflow for dataset ops

Impact

  • Cell throughput up by ~40% vs. fixed-pose baseline
  • Operator labor reallocated away from part-staging
  • Extensibility to new SKUs without mechanical changes

Conclusion

The stack shows what becomes possible when vision, control, and hardware are designed as one system rather than handed across vendors. It's a playbook we reuse for any vision-guided manipulation project.

Next step

Have a robotic cell bottlenecked by manual staging, fixed jigs, or cloud-dependent vision? Let's talk about bringing the perception on-device.

Let's build

Have a machine to build? Let's scope it together.

Tell us about your project. We'll respond within 1-2 business days with a preliminary scope and timeline — no boilerplate, no up-sell.