AMOD: A Large-scale Benchmark for RGB-T Multi-view Aerial Military Object Detection
Abstract
Existing benchmarks for aerial object detection offer limited support for studying RGB-T perception under controlled conditions. In particular, they rarely provide synchronized multi-view observations of the same scenario, making it difficult to analyze viewpoint variation and cross-modal learning in a unified setting, especially under large viewpoint changes and modality discrepancies. In this paper, we introduce AMOD, a new benchmark dataset for RGB-T aerial military object detection, constructed using the video game Arma 3. AMOD provides paired visible (RGB) and thermal (T) images with aligned annotations, together with multi-view observations of the same area of interest, enabling consistent analysis across viewpoints and modalities. The dataset comprises 73,920 images and 383,212 instances spanning 12 military categories, generated across diverse background maps with controlled viewpoint configurations. We further observe that, although AMOD is built around military object categories, models pretrained on AMOD transfer effectively to real-world aerial benchmarks containing civilian objects, suggesting its utility beyond military-target detection.