AMOD: A Large-scale Benchmark for RGB-T Multi-view Aerial Military Object Detection
Abstract
Existing benchmarks for aerial object detection offer limited support for studying RGB-T perception under controlled conditions. In particular, they rarely provide synchronized multi-view observations of the same scenario, making it difficult to analyze viewpoint variation and cross-modal learning in a unified setting, especially under large viewpoint changes and modality discrepancies. In this paper, we introduce AMOD, a new benchmark dataset for RGB-T aerial military object detection, constructed using the video game Arma 3. AMOD provides paired visible (RGB) and thermal (T) images with aligned annotations, together with multi-view observations of the same area of interest, enabling consistent analysis across viewpoints and modalities. The dataset comprises 73,920 images and 383,212 instances spanning 12 military categories, generated across diverse background maps with controlled viewpoint configurations. We further observe that, although AMOD is built around military object categories, models pretrained on AMOD transfer effectively to real-world aerial benchmarks containing civilian objects, suggesting its utility beyond military-target detection.