Applied AI · Vision-guided robotics
Vision-guided bin picking at 80 ms end-to-end
How a YOLOv11-Seg + 3D-pose stack on a Jetson Orin Nano replaced fixed-pose jigs in a 6-DOF robotic cell — sub-80 ms latency, 99.2% accuracy, 40% throughput gain.
Yantrix builds AI that ships with a product, not AI that lives in a slide deck. We design, train, fine-tune, optimize, and deploy machine-learning systems for Indian engineering teams, hardware startups, B2B SaaS, and product companies, covering the full modern stack: GenAI and RAG, LLM fine-tuning, feature engineering, classical ML, computer vision, and edge AI. We bring strong opinions on what to put in production, and the operational discipline to keep it there.

What we do
We deliver end-to-end AI and machine-learning services in India across seven capability areas:
(1) GenAI applications: multimodal copilots, content generation, structured-output agents, and tool-using assistants built on GPT-4, Claude, Gemini, Llama 3, Qwen2, and Mistral.
(2) Retrieval-Augmented Generation (RAG): production-grade systems with hybrid search (BM25 + dense embeddings), neural rerankers, agentic query decomposition, graph-RAG over knowledge graphs, evaluation harnesses, and grounded-citation pipelines.
(3) LLM fine-tuning: LoRA, QLoRA, DoRA, and full PEFT pipelines on Hugging Face TRL, Axolotl, and Unsloth. Instruction tuning, DPO, and RLAIF for domain adaptation on a single GPU.
(4) Feature engineering and classical ML: sensor signal processing, tabular pipelines, time-series feature extraction, materialized feature stores (Feast, Tecton), and XGBoost / LightGBM / CatBoost models that beat deep nets when the data shape says so.
(5) Computer vision and edge AI: object detection, segmentation, OCR, pose estimation, deployed on Jetson, Coral, ESP32-S3, and Hailo with TensorRT / ONNX / TFLite optimization.
(6) MLOps: model registries, signed-OTA rollouts, canary deployments, drift monitoring, retraining pipelines, evaluation dashboards.
(7) ML-accelerated engineering: surrogate FEA / CFD, physics-informed neural networks (PINNs), and generative design.
We ship binaries, firmware, and documentation, not just notebooks.
We adapt the same engineering service to different product contexts depending on the load case, packaging problem, validation target, or deployment environment.
These links help visitors move from service intent to real examples of engineering work.
Edge AI · On-device inspection
A production conveyor inspection camera running a quantized INT8 CNN entirely on an ESP32-S3 — 18 FPS at 0.4 W, no cloud, 6× lower capex per station.
ML-accelerated engineering
A physics-informed neural network trained on 12,000 ANSYS runs replaces the full solver for early-stage topology — predicts stress fields in 40 ms vs. 22-minute solves.
Technical articles give Google more paths into the service pages and help visitors explore adjacent engineering questions before they get in touch.
3D Printing
Learn how 3D printing services help startups and manufacturers in India validate CAD designs, reduce prototyping cost, and build functional parts faster.
Applied AI
Walkthrough of shipping a segmentation-class YOLOv11 model to a Jetson Orin Nano at production latency — quantization, TensorRT conversion, and the pitfalls.
Simulation
How CFD-based thermal analysis catches hotspots, airflow dead zones, and IP67-versus-cooling trade-offs in electronics enclosures before the first prototype ships.
Service-specific questions are useful for both users and search visibility around intent-driven queries.
Seven capability areas: GenAI applications, Retrieval-Augmented Generation (RAG), LLM fine-tuning (LoRA / QLoRA / DoRA / PEFT), feature engineering and classical ML, computer vision and edge AI, MLOps, and ML-accelerated engineering simulation. We deliver across the full lifecycle — data strategy through production deployment and monitoring.
Yes. We build production-grade RAG systems with hybrid search (BM25 + dense embeddings), neural rerankers, agentic query decomposition, graph-RAG over knowledge graphs, and grounded-citation pipelines. Every engagement includes a retrieval evaluation harness (RAGAS metrics) and a faithfulness check — because 40–60% of RAG projects fail to reach production without these. Common engagements: customer-support copilots, engineering-documentation search, sales enablement, and compliance Q&A.
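As a rough illustration of the hybrid search mentioned above, one common way to merge a BM25 ranking with a dense-embedding ranking is reciprocal rank fusion (RRF). The sketch below assumes both rankings already exist. The document IDs, the function name `reciprocal_rank_fusion`, and the constant `k=60` are illustrative assumptions, not a description of any specific production pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a BM25 retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a fused list is usually passed to a reranker rather than either ranking alone.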
Yes. We run LoRA, QLoRA, and DoRA fine-tuning pipelines on Llama 3, Qwen2 (text and VL), Mistral, Mixtral, and Phi-3 — typically on a single A100 or RTX 4090 with Axolotl, Unsloth, or the Hugging Face TRL stack. Outputs include adapter artifacts, evaluation reports against your domain benchmark, and serving guidance (multi-adapter on one GPU when the use case fits).
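To show why LoRA-style fine-tuning can fit on a single GPU, the sketch below counts trainable parameters for one weight matrix. LoRA freezes the original matrix and trains two low-rank factors, so the trainable count is rank × (d_out + d_in). The 4096 × 4096 layer size and rank 16 are illustrative assumptions, not figures from a specific engagement.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Assumed example: one square projection matrix and a small adapter rank.
d_out = d_in = 4096
rank = 16

full = d_out * d_in                      # parameters in the frozen weight matrix
lora = lora_trainable_params(d_out, d_in, rank)
fraction = lora / full

print(f"full: {full:,}  LoRA: {lora:,}  fraction: {fraction:.2%}")
```

Under these assumptions the adapter trains well under 1% of the parameters in that layer, which is also what keeps adapter artifacts small enough to serve several on one GPU.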
Common project shapes and price bands in 2026:
GenAI / RAG pilot (one corpus, one use case): ₹4–9 lakh, 6–10 weeks.
Production RAG with evaluation harness and MLOps: ₹15–35 lakh, 4–6 months.
LLM fine-tuning program (data prep + LoRA training + serving): ₹8–22 lakh, 8–14 weeks.
Computer vision pilot: ₹3–8 lakh.
Production CV deployment with MLOps: ₹15–30 lakh.
ML surrogate FEA / CFD: ₹15–40 lakh.
Edge AI MLOps platform (multi-device fleet): ₹25–80 lakh.
Team-augmentation retainers are scoped monthly.
Use RAG when the answer lives in a body of documents that change over time — customer support, knowledge bases, compliance, engineering documentation. Use fine-tuning when you need the model to adopt a specific style, format, or domain vocabulary it can't pick up from in-context examples — structured-output generation, code in a proprietary API, brand voice, low-resource languages. The two are complementary: many production systems use both — a fine-tuned generator on top of a RAG retriever.
Yes. MLOps is a stand-alone engagement category: model registries (MLflow, SageMaker), feature stores (Feast, Tecton), CI/CD for ML, signed-OTA deployments on edge fleets, canary rollouts with automatic rollback, drift detection, and evaluation dashboards. Typical engagement: 8–16 weeks to take an existing ML system from ad-hoc operations to a documented, observable, retrainable pipeline.
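Drift detection can take many forms. One widely used check for a single numeric feature is the population stability index (PSI), which compares the binned distribution of live data against a reference sample. The sketch below is a minimal version under stated assumptions: the function name, the bin count, the 1e-6 floor, the sample data, and the 0.2 alert threshold (a common rule of thumb) are all illustrative.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: a reference window and a live window shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff
print(psi_same, psi_shifted, psi_shifted > ALERT_THRESHOLD)
```

In a monitored pipeline, a check like this would typically run per feature on a schedule, with a threshold breach feeding the retraining or rollback decision.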
Yes. Time-series, signal-processing, and tabular feature pipelines are a recurring engagement — predictive maintenance, demand forecasting, churn, anomaly detection, energy management. Outputs are documented feature definitions materialized to a feature store (Feast or Tecton), plus the downstream model that uses them. We default to gradient boosting (XGBoost, LightGBM, CatBoost) for tabular and switch to deep models only when the data shape justifies it.
Yes — vision-based quality inspection, defect detection, OCR for batch tracking, pose estimation for robotic pick-and-place, predictive maintenance, energy forecasting, and inspection-report GenAI are recurring projects. We deploy on-premise where data sovereignty or latency matters, which is the norm for Indian factory floors.
Yes. Model optimization is a core service — quantization (INT8 / FP16), pruning, TensorRT / ONNX conversion, hardware-specific operator fusion, and benchmarking against latency / throughput / power targets. We also optimize LLM serving with vLLM, TGI, and llama.cpp for self-hosted inference.
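For readers unfamiliar with INT8 quantization, the sketch below shows the basic idea in plain Python: map floating-point values to integers in [-127, 127] with a single scale factor, then map them back and measure the error. This is a simplified symmetric per-tensor scheme with made-up weights. Real toolchains such as TensorRT add calibration and per-channel scales that are not shown here.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = round(x / scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.034, 1.00]  # made-up example weights

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale, max_error)
```

The round-trip error is bounded by half the scale, so tensors with a few large outliers lose the most precision. That is one reason benchmarking accuracy alongside latency and power matters after quantization.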
Yes. NDAs are routine on all our AI / ML work, including for confidential product datasets, vision footage, training corpora, and proprietary model architectures. We can sign your template or use ours.
Our core team is in Surat, Gujarat. We work remotely with clients across India (Mumbai, Bangalore, Delhi, Pune, Hyderabad, Chennai, Ahmedabad) and internationally (US, EU, UK, UAE, Singapore). Client communication is over Slack / email / Google Meet — we don't require on-site presence for most engagements.
Yes. We offer team augmentation engagements where Yantrix engineers join your team for a quarter or longer to drive ML strategy, build pipelines, and ramp up your in-house team. Scoped on a monthly retainer.
Send the problem, your current design stage, and any existing files. We can scope the work from there.