Real-Time Object Detection for Embedded Vision Systems: Architectural Comparison of YOLO, SSD, and MobileNet-SSD on NVIDIA Jetson and Raspberry Pi Platforms
Real-time object detection on embedded vision platforms, which is required for applications such as autonomous mobile robots, industrial quality inspection, and smart camera systems, demands neural network architectures that balance detection accuracy, inference latency, and power consumption within the constraints of embedded hardware. This paper presents a comprehensive empirical evaluation of three real-time detection architecture families, namely YOLOv3, the Single Shot Detector (SSD), and MobileNet-SSD, on two representative embedded platforms: the NVIDIA Jetson Nano and a Raspberry Pi 4B with a Coral USB Accelerator. Each architecture is evaluated under five optimization conditions: FP32 baseline, FP16 mixed precision, INT8 post-training quantization, INT8 quantization-aware training, and TensorRT engine optimization. On the Jetson Nano, YOLOv3-Tiny with TensorRT INT8 optimization achieves 47.3 FPS at 58.4 mAP on COCO, versus 28.1 FPS at 71.2 mAP for full YOLOv3. MobileNet-SSD with Coral USB acceleration achieves 89 FPS on the Raspberry Pi 4B at 53.7 mAP, making it the preferred choice for power-constrained mobile deployments. We introduce the Embedded Vision Deployment Score (EVDS), which weights accuracy, throughput, power draw, and memory footprint according to four deployment profile templates, and we provide a model selection decision tree for common embedded vision scenarios. Quantitative energy profiling data for all configurations are released to support green computing analysis in edge vision system design.
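The abstract does not give the EVDS formula. To make the idea of a profile-weighted deployment score concrete, the following is a minimal sketch that assumes a simple weighted sum. In this sketch, benefit metrics (mAP, FPS) are divided by the best value among the candidate configurations, and cost metrics (power, memory) use the best value divided by the observed value, so each term falls in (0, 1]. The profile names, the weights, and the helper names `Measurement`, `Profile`, `PROFILES`, and `evds` are all hypothetical illustrations. They are not the paper's four templates or its actual definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    map_pct: float    # detection accuracy (mAP, percent); higher is better
    fps: float        # throughput (frames per second); higher is better
    power_w: float    # average power draw (watts); lower is better
    memory_mb: float  # peak memory footprint (MiB); lower is better


@dataclass(frozen=True)
class Profile:
    # Hypothetical per-metric weights; they are expected to sum to 1.
    w_acc: float
    w_fps: float
    w_power: float
    w_mem: float


# Hypothetical profile templates: names and weights are illustrative only.
PROFILES = {
    "accuracy_first": Profile(0.70, 0.10, 0.10, 0.10),
    "throughput_first": Profile(0.10, 0.70, 0.10, 0.10),
    "battery_powered": Profile(0.20, 0.20, 0.50, 0.10),
    "memory_limited": Profile(0.20, 0.20, 0.10, 0.50),
}


def evds(m: Measurement, candidates: list[Measurement], p: Profile) -> float:
    """Hypothetical weighted-sum score of `m` relative to `candidates`.

    Each metric is normalized against the best value in the candidate set,
    so a configuration that is best on every metric scores 1.0.
    """
    total = p.w_acc + p.w_fps + p.w_power + p.w_mem
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"profile weights must sum to 1, got {total}")
    if any(c.power_w <= 0 or c.memory_mb <= 0 for c in [m, *candidates]):
        raise ValueError("power and memory must be positive")

    best_acc = max(c.map_pct for c in candidates)
    best_fps = max(c.fps for c in candidates)
    best_power = min(c.power_w for c in candidates)
    best_mem = min(c.memory_mb for c in candidates)

    return (p.w_acc * m.map_pct / best_acc          # benefit: value / best
            + p.w_fps * m.fps / best_fps
            + p.w_power * best_power / m.power_w    # cost: best / value
            + p.w_mem * best_mem / m.memory_mb)
```

In the usage below, the two configurations are invented placeholders and not measurements from this paper. They show how the same pair can rank differently under different profiles:

```python
a = Measurement(map_pct=70.0, fps=30.0, power_w=10.0, memory_mb=800.0)
b = Measurement(map_pct=35.0, fps=90.0, power_w=5.0, memory_mb=200.0)
for name, profile in PROFILES.items():
    print(name, round(evds(a, [a, b], profile), 3), round(evds(b, [a, b], profile), 3))
```

Normalizing against the best candidate keeps the score unitless and avoids choosing arbitrary reference values. The trade-off is that scores shift whenever the candidate set changes.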