On Creating Benchmark Dataset for Aerial Image
Interpretation: Reviews, Guidances and Million-AID

Yang Long¹, Gui-Song Xia¹,²,*, Shengyang Li³, Wen Yang¹,⁴,
Michael Ying Yang⁵, Xiao Xiang Zhu⁶, Liangpei Zhang¹, Deren Li¹

1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China
2. School of Computer Science, Wuhan University, Wuhan 430079, China
3. Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
4. School of Electronic Information, Wuhan University, Wuhan 430072, China
5. Faculty of Geo-Information Science and Earth Observation, University of Twente, Hengelosestraat 99, Enschede, Netherlands
6. German Aerospace Center (DLR) and also Technical University of Munich, Germany

1. Abstract

The past years have witnessed great progress in remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images. In this context, benchmark datasets serve as essential prerequisites for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the research community of RS image interpretation, this article discusses how to efficiently prepare a suitable benchmark dataset for RS image interpretation. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations. We then present general guidance on creating benchmark datasets efficiently. Following this guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS image scene classification. Finally, several challenges and perspectives in RS image annotation are discussed to facilitate research in benchmark dataset construction. We hope this paper will provide the RS community with an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven research.

2. Annotated Datasets for RS Image Interpretation

The interpretation of RS images plays an increasingly important role in a wide range of applications and has therefore attracted considerable research attention. Consequently, various datasets have been built to advance the development of interpretation algorithms for RS images. Covering literature published over the past decade, we systematically review the existing RS image datasets for the current mainstream RS image interpretation tasks, including scene classification, object detection, semantic segmentation, and change detection.

- Scene Classification

Comparison among different RS image scene classification datasets
Dataset	#Cat.	#Images per cat.	#Images	Resolution (m)	Image size	GL/IT/SP	Year
UC-Merced	21	100	2,100	0.3	256×256	✗✗✗	2010
WHU-RS19	19	50 to 61	1,013	up to 0.5	600×600	✗✗✗	2012
RSSCN7	7	400	2,800	--	400×400	✗✗✗	2015
SAT-4	4	89,963 to 178,034	500,000	1 to 6	28×28	✗✗✗	2015
SAT-6	6	10,262 to 150,400	405,000	1 to 6	28×28	✗✗✗	2015
BCS	2	1,438	2,876	--	600×600	✗✗✗	2015
RSC11	11	~100	1,232	~0.2	512×512	✗✗✗	2016
SIRI-WHU	12	200	2,400	2	200×200	✗✗✗	2016
NWPU-RESISC45	45	700	31,500	0.2 to 30	256×256	✗✗✗	2016
AID	30	220 to 420	10,000	0.5 to 8	600×600	✗✗✗	2017
RSI-CB128	45	173 to 1,550	36,000	0.3 to 3	128×128	✗✗✗	2017
RSI-CB256	35	198 to 1,331	24,000	0.3 to 3	256×256	✗✗✗	2017
Planet-UAS	17	--	40,408	3 to 5	256×256	✓✓✓	2017
RSD46-WHU	46	500 to 3,000	117,000	0.5 to 2	256×256	✗✗✗	2017
MASATI	7	304 to 1,789	7,389	--	512×512	✗✗✗	2018
EuroSAT	10	2,000 to 3,000	27,000	10	64×64	✓✓✓	2018
PatternNet	38	800	30,400	0.06 to 4.7	256×256	✗✗✗	2018
fMoW	62	--	132,716	0.5	74×58 to 16,184×16,288	✓✓✓	2018
WiDS Datathon 2019	2	--	20,000	3	256×256	✗✗✗	2019
Optimal-31	31	60	1,860	--	256×256	✗✗✗	2019
BigEarthNet	43	328 to 217,119	590,326	10, 20, 60	20×20, 60×60, 120×120	✓✓✓	2019
CLRS	25	600	15,000	0.26 to 8.85	256×256	✗✗✗	2020
MLRSN	46	1,500 to 3,000	109,161	0.1 to 10	256×256	✗✗✗	2020
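
As a quick consistency check on the scene classification comparison, the average number of images per category can be recomputed from the #Cat. and #Images columns and compared with the #Images per cat. column. The sketch below is only illustrative: the four rows are copied from the table above, while the function name `mean_images_per_category` is an assumption introduced here.

```python
# (dataset, number of categories, total images), copied from the comparison table.
ROWS = [
    ("UC-Merced", 21, 2100),
    ("NWPU-RESISC45", 45, 31500),
    ("AID", 30, 10000),
    ("PatternNet", 38, 30400),
]


def mean_images_per_category(categories: int, images: int) -> float:
    """Average number of images per category (hypothetical helper)."""
    return images / categories


if __name__ == "__main__":
    for name, categories, images in ROWS:
        mean = mean_images_per_category(categories, images)
        print(f"{name}: {mean:.1f} images per category on average")
```

For the balanced datasets (UC-Merced, NWPU-RESISC45, PatternNet) the average equals the listed per-category count, while for AID it falls inside the listed range of 220 to 420.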

- Object Detection

Comparison among different RS image object detection datasets
Dataset	#Annot.	#Cat.	#Instances	#Images	Resolution (m)	Image width	GL/IT/SP	Year
TAS	HBB	1	1,319	30	--	792	✗✗✗	2008
ORIDS	OBB	5	1,800	900	up to 0.08	256 to 640	✓✓✓	2009
SZTAKI-INRIA	OBB	1	665	9	--	~800	✗✗✗	2012
NWPU-VHR10	HBB	10	3,651	800	0.08 to 2	~1,000	✗✗✗	2014
DLR-MVDA	OBB	2	14,235	20	0.13	5,616	✗✗✓	2015
UCAS-AOD	OBB	2	14,596	1,510	--	~1,000	✗✗✗	2015
VEDAI	OBB	9	3,640	1,210	0.125	512/1,024	✓✗✗	2016
COWC	CP	1	32,716	53	0.15	2,000 to 19,000	✓✗✗	2016
HRSC2016	OBB	26	2,976	1,061	--	~1,100	✗✗✗	2016
RSOD	HBB	4	6,950	976	0.3 to 3	~1,000	✗✗✗	2017
CARPK	HBB	1	89,777	1,448	--	1,280	✗✗✓	2017
SSDD/SSDD+	HBB/OBB	1	2,456	1,160	1 to 15	~500	✗✗✓	2017
SpaceNet1-6	Polygon	1	859,982	--	up to 0.3	--	✓✓✓	2018
LEVIR	HBB	3	11,000	22,000	0.2 to 1	800	✗✗✗	2018
VisDrone	HBB	10	54,200	10,209	--	2,000	✗✗✗	2018
xView	HBB	60	1,000,000	1,413	0.3	~3,000	✓✗✓	2018
DOTA-v1.0	OBB	15	188,282	2,806	up to 0.3	800 to 4,000	✗✗✗	2018
ITCVD	HBB	1	29,088	173	0.1	3,744×5,616	✗✗✗	2018
WHU building dataset	Polygon	1	221,107	25,420	0.075 to 2.7	512	✗✗✗	2018
DeepGlobe Building	Polygon	2	302,701	24,586	0.3	650	✗✗✓	2018
OpenSARShip	Chip	1	11,346	41	~10	--	✓✓✓	2018
CrowdAI Mapping Challenge	Polygon	1	2,910,917	341,058	--	300	✗✗✗	2018
Airbus Ship Detection Challenge	Polygon	1	~131,000	208,162	--	768	✗✗✗	2018
iSAID	Polygon	15	655,451	2,806	up to 0.3	800 to 4,000	✗✗✗	2019
HRRSD	HBB	13	55,740	21,761	0.15 to 1.2	152 to 10,569	✗✗✗	2019
DIOR	HBB	20	192,472	23,463	0.5 to 30	800	✗✗✗	2019
DOTA-v1.5	OBB	16	402,089	2,806	up to 0.3	800 to 13,000	✗✗✗	2019
SAR-Ship-Dataset	HBB	1	59,535	43,819	up to 0.3	256	✗✗✓	2019
AIR-SARShip	HBB	1	2,040	300	1; 3	1,000	✓✓✓	2020
HRSID	HBB	1	16,951	5,604	0.5; 1; 3	800	✗✗✗	2020
RarePlanes	Polygon	1	644,258	50,253	0.3	--	✓✓✓	2020
DOTA-v2.0	OBB	18	1,793,658	11,268	up to 0.3	800 to 20,000	✗✗✗	2020

- Semantic Segmentation

Comparison among different RS image semantic segmentation datasets
Dataset	#Cat.	#Images	Resolution (m)	#Bands	Image size	GL/IT/SP	Year
Kennedy Space Center	13	1	18	224 bands	512×614	✗✓✓	2005
Botswana	14	1	30	242 bands	1,476×256	✗✓✓	2005
Salinas	16	1	3.7	224 bands	512×217	✗✗✓	--
University of Pavia	9	1	1.3	115 bands	610×340	✗✗✓	--
Pavia Centre	9	1	1.3	115 bands	1,096×492	✗✗✓	--
ISPRS Vaihingen	6	33	0.09	IR, R, G, DSM, nDSM	~2,500×2,500	✗✗✓	2012
ISPRS Potsdam	6	38	0.05	IR, RGB, DSM, nDSM	6,000×6,000	✓✗✓	2012
Massachusetts Buildings	2	151	1	RGB	1,500×1,500	✓✓✗	2013
Massachusetts Roads	2	1,171	1	RGB	1,500×1,500	✓✓✗	2013
Indian Pines	16	1	20	224 bands	145×145	✓✓✓	2015
Zurich Summer	8	20	0.62	NIR, RGB	1,000×1,150	✓✓✓	2015
SPARCS Validation	7	80	30	11	1,000×1,000	✓✓✓	2016
Biome	4	96	30	11	~9,000×9,000	✓✓✓	2017
Inria Dataset	2	360	0.3	RGB	5,000×5,000	✗✗✗	2017
EvLab-SS	10	60	0.1 to 2	RGB	4,500×4,500	✗✗✓	2017
RIT-18	18	3	0.047	6 bands	9,000×6,000	✓✓✓	2017
CITY-OSM	3	1,671	0.1	RGB	2500×2500 to 3300×33300	✗✗✗	2017
Dstl-SIFD	10	57	up to 0.3	up to 16	~3,350×3,400	✓✗✓	2017
IEEE GRSS Data Fusion Contest 2017	17	30	1.4	9	643×666, 374×515	✓✓✓	2017
IEEE GRSS Data Fusion Contest 2018	20	1	1	48	4,172×1,202	✓✓✓	2018
Aeroscapes	11	3,269	--	RGB	720×1,280	✗✗✗	2018
DLRSD	17	2,100	0.3	RGB	256×256	✗✗✗	2018
DeepGlobe Land Cover	7	1,146	0.5	RGB	2,448×2,448	✗✗✓	2018
So2Sat LCZ42	17	400,673	10	10 bands	32×32	✓✗✓	2019
SEN12MS	33	180,662 triplets	10 to 50	up to 13 bands	256×256	✓✗✓	2019
95-Cloud	1	43,902	30	NIR, RGB	384×384	✓✗✓	2019
Shakeel et al.	1	2,682	0.3	RGB	300×300	✗✗✗	2019
ALCD Cloud Masks	8	38	10	RGB	1,830×1,830	✓✓✓	2019
SkyScapes	31	16	0.13	RGB	5,616×3,744	✗✗✗	2019
DroneDeploy	7	55	0.1	RGB	up to 12,039×13,854	✗✗✗	2019
Slovenia LULC	10	940	10	6	5,00×5,00	✓✓✓	2019
LandCoverNet	7	1,980	10	NIR, RGB	256×256	✓✓✓	2020
UAVid	8	420	--	RGB	~4,000×2,160	✗✗✓	2020
GID	15	150	0.8 to 10	4 bands	6,800×7,200	✓✓✓	2020
LandCover.ai	3	41	0.25, 0.5	RGB	9,000×9,500; 4,200×4,700	✓✗✗	2020
Agriculture-Vision	9	94,986	0.1, 0.15, 0.2	NIR, RGB	512×512	✗✗✓	2020
Sentinel-2 Cloud Mask Catalogue	18	513	20	13	1,024×1,024	✓✓✓	2020

- Change Detection

Comparison among different RS image change detection datasets
Dataset	#Cat.	#Image pairs	Resolution (m)	#Bands	Image size	GL/IT/SP	Year
SZTAKI AirChange	2	13	1.5	RGB	952×640	✗✓✗	2009
AICD	2	1,000	0.5	115 bands	800×600	✗✗✗	2011
Taizhou Data	4	1	30	6 bands	400×400	✓✓✓	2014
Kunshan Data	3	1	30	6 bands	800×800	✓✓✓	2014
Cross-sensor Bastrop	2	4	30, 120	7, 9	444×300; 1,534×808	✓✓✓	2015
MtS-WH	9	1	1	NIR, RGB	7,200×6,000	✓✓✓	2017
Yancheng	4	2	30	242 bands	400×145	✓✓✓	2018
GETNET dataset	2	1	30	198	463×241	✗✓✓	2018
Urban-rural boundary of Wuhan	20	1	4/30	4/9 bands	960×960	✓✓✓	2018
Hermiston City area, Oregon	5	1	30	242 bands	390×200	✓✓✓	2018
OSCD	2	24	10	13 bands	600×600	✓✓✓	2018
WHU building dataset	2	1	0.2	RGB	32,207×15,354	✓✓✓	2018
Season-varying Dataset	2	16,000	0.03 to 0.1	RGB	256×256	✗✗✗	2018
ABCD	2	16,950	0.4	RGB	128×128; 160×160	✗✓✗	2018
California flood dataset	2	1	5, 30	RGB, 11	1,534×808	✓✓✓	2019
Lopez-Fandino et al.	5	2	20	224	984×740; 600×500	✗✓✓	2019
HRSCD	6	291	up to 0.8	RGB	10,000×10,000	✓✓✓	2019
xBD	6	11,034	0.5	RGB	1,024×1,024	✓✓✓	2019
LEVIR-CD	2	637	0.5	RGB	1,024×1,024	✗✗✗	2020
SECOND	30	4,214	0.5 to 3	RGB	512×512	✗✗✗	2020
Google Dataset	2	1,067	0.55	RGB	256×256	✓✓✗	2020
Zhang et al.	2	4	2, 2.4, 5.8	NIR, RGB	1,431×1,431; 458×559; 1,154×740	✓✓✓	2020
Hi-UCD	9	1,293	0.1	RGB	1,024×1,024	--/--/✓	2020
SpaceNet7	--	24	4	RGB	--	✓✓✓	2020
S2MTCP	2	1,520	up to 10	13	600×600	✓✓✓	2021

3. DiRS: Principles to Build RS Image Benchmarks

The primary point in constructing a meaningful RS image dataset is that the dataset should be created on the basis of the requirements of practical applications rather than the characteristics of the algorithms to be employed. Moreover, the annotation of an RS image dataset is better conducted by the application side than by the algorithm developers. The annotated dataset is then naturally application-oriented, which is more conducive to enhancing the practicability of interpretation algorithms. With these points in mind, diversity, richness, and scalability (called DiRS) could be considered the basic principles when creating benchmark datasets for RS image interpretation. We believe that these principles are complementary to each other. That is, improving a dataset with respect to one principle can simultaneously promote the dataset quality reflected in the other principles.

4. An Example: Million-AID

Following the DiRS principles, we provide an example of building a dataset for RS image classification, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS scene classification. Million-AID contains a wide range of semantic categories, i.e., 51 scene categories organized by a hierarchical category network in the form of a three-level tree: the 51 leaf nodes fall into 28 parent nodes at the second level, which are grouped into 8 nodes at the first level, representing the 8 underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unutilized land, and water area. The scene category network gives the dataset a well-organized set of relationships among scene categories and makes it scalable. The number of images in each scene category ranges from about 2,000 to 45,000, giving the dataset a long-tail distribution. In addition, Million-AID has advantages over existing scene classification datasets owing to its high spatial resolution, large scale, and global distribution.
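
To make the three-level category network more concrete, the following minimal sketch represents such a tree as nested dictionaries and shows how a new leaf category can be attached without touching the rest of the structure. It is an illustration under stated assumptions, not the released Million-AID tooling: the first-level names come from the text above, whereas the second-level and leaf names, as well as the helper functions `build_tree`, `level_sizes`, and `ancestors`, are hypothetical.

```python
# Each tuple is (first level, second level, leaf).
# First-level names are taken from the text; the second-level and
# leaf names below are hypothetical placeholders.
CATEGORY_PATHS = [
    ("agriculture land", "arable land", "dry field"),
    ("agriculture land", "arable land", "paddy field"),
    ("transportation land", "airport area", "runway"),
    ("transportation land", "highway area", "bridge"),
    ("water area", "inland water", "lake"),
]


def build_tree(paths):
    """Group leaf categories under their second- and first-level parents."""
    tree = {}
    for level1, level2, leaf in paths:
        tree.setdefault(level1, {}).setdefault(level2, []).append(leaf)
    return tree


def level_sizes(tree):
    """Return the number of nodes at the first, second, and leaf levels."""
    n_level1 = len(tree)
    n_level2 = sum(len(children) for children in tree.values())
    n_leaves = sum(
        len(leaves) for children in tree.values() for leaves in children.values()
    )
    return n_level1, n_level2, n_leaves


def ancestors(tree, leaf):
    """Return the (first-level, second-level) parents of a leaf category."""
    for level1, children in tree.items():
        for level2, leaves in children.items():
            if leaf in leaves:
                return level1, level2
    raise KeyError(leaf)


tree = build_tree(CATEGORY_PATHS)

if __name__ == "__main__":
    print(level_sizes(tree))
    print(ancestors(tree, "runway"))
```

Applied to the full Million-AID category list, `level_sizes` would be expected to report 8, 28, and 51 nodes. Adding a scene category only requires appending one tuple to the path list, which is one way to read the scalability property described above.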

- Category Network

- Semantic Coordinates Collection

- Scene Image Acquisition

Dataset & Evaluation

Million-AID has been released for public accessibility.

Citation

If you make use of Million-AID, please cite the following papers:

@article{Long2021DiRS,
title={On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID},
author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
year={2021},
volume={14},
pages={4205-4230}
}

@misc{Long2022ASP,
title={Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling}, 
author={Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren Li},
year={2022},
eprint={2201.01953},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Contact

If you have any questions, please contact:

Yang Long at longyang@whu.edu.cn