On Creating Benchmark Dataset for Aerial Image
Interpretation: Reviews, Guidances and Million-AID

Yang Long1, Gui-Song Xia1,2,*, Shengyang Li3, Wen Yang1,4,
Michael Ying Yang5, Xiao Xiang Zhu6, Liangpei Zhang1, Deren Li1.

1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China
2. School of Computer Science, Wuhan University, Wuhan 430079, China
3. Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
4. School of Electronic Information, Wuhan University, Wuhan 430072, China
5. Faculty of Geo-Information Science and Earth Observation, University of Twente, Hengelosestraat 99, Enschede, Netherlands
6. German Aerospace Center (DLR) and also Technical University of Munich, Germany

        


1. Abstract

The past years have witnessed great progress in remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images. In this context, benchmark datasets serve as essential prerequisites for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the RS image interpretation research community, this article discusses how to efficiently prepare a suitable benchmark dataset for RS image interpretation. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation through bibliometric investigations. We then present general guidance on creating benchmark datasets efficiently. Following this guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS image scene classification. Finally, several challenges and perspectives in RS image annotation are discussed to facilitate research in benchmark dataset construction. We hope this paper will provide the RS community with an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven research.


2. Annotated Datasets for RS Image Interpretation

The interpretation of RS images plays an increasingly important role in a wide range of applications and has therefore attracted considerable research attention. Consequently, various datasets have been built to advance the development of interpretation algorithms for RS images. Covering literature published over the past decade, we perform a systematic review of existing RS image datasets for the current mainstream RS image interpretation tasks, including scene classification, object detection, semantic segmentation, and change detection.

- Scene Classification

Comparison among different RS image scene classification datasets

| Dataset | #Cat. | #Images per cat. | #Images | Resolution (m) | Image size | GL/IT/SP | Year |
|---|---|---|---|---|---|---|---|
| UC-Merced | 21 | 100 | 2,100 | 0.3 | 256×256 | ✗✗✗ | 2010 |
| WHU-RS19 | 19 | 50 to 61 | 1,013 | up to 0.5 | 600×600 | ✗✗✗ | 2012 |
| RSSCN7 | 7 | 400 | 2,800 | -- | 400×400 | ✗✗✗ | 2015 |
| SAT-4 | 4 | 89,963 to 178,034 | 500,000 | 1 to 6 | 28×28 | ✗✗✗ | 2015 |
| SAT-6 | 6 | 10,262 to 150,400 | 405,000 | 1 to 6 | 28×28 | ✗✗✗ | 2015 |
| BCS | 2 | 1,438 | 2,876 | -- | 600×600 | ✗✗✗ | 2015 |
| RSC11 | 11 | ~100 | 1,232 | ~0.2 | 512×512 | ✗✗✗ | 2016 |
| SIRI-WHU | 12 | 200 | 2,400 | 2 | 200×200 | ✗✗✗ | 2016 |
| NWPU-RESISC45 | 45 | 700 | 31,500 | 0.2 to 30 | 256×256 | ✗✗✗ | 2016 |
| AID | 30 | 220 to 420 | 10,000 | 0.5 to 8 | 600×600 | ✗✗✗ | 2017 |
| RSI-CB128 | 45 | 173 to 1,550 | 36,000 | 0.3 to 3 | 128×128 | ✗✗✗ | 2017 |
| RSI-CB256 | 35 | 198 to 1,331 | 24,000 | 0.3 to 3 | 256×256 | ✗✗✗ | 2017 |
| Planet-UAS | 17 | -- | 40,408 | 3 to 5 | 256×256 | ✓✓✓ | 2017 |
| RSD46-WHU | 46 | 500 to 3,000 | 117,000 | 0.5 to 2 | 256×256 | ✗✗✗ | 2017 |
| MASATI | 7 | 304 to 1,789 | 7,389 | -- | 512×512 | ✗✗✗ | 2018 |
| EuroSAT | 10 | 2,000 to 3,000 | 27,000 | 10 | 64×64 | ✓✓✓ | 2018 |
| PatternNet | 38 | 800 | 30,400 | 0.06 to 4.7 | 256×256 | ✗✗✗ | 2018 |
| fMoW | 62 | -- | 132,716 | 0.5 | 74×58 to 16,184×16,288 | ✓✓✓ | 2018 |
| WiDS Datathon 2019 | 2 | -- | 20,000 | 3 | 256×256 | ✗✗✗ | 2019 |
| Optimal-31 | 31 | 60 | 1,860 | -- | 256×256 | ✗✗✗ | 2019 |
| BigEarthNet | 43 | 328 to 217,119 | 590,326 | 10, 20, 60 | 20×20, 60×60, 120×120 | ✓✓✓ | 2019 |
| CLRS | 25 | 600 | 15,000 | 0.26 to 8.85 | 256×256 | ✗✗✗ | 2020 |
| MLRSN | 46 | 1,500 to 3,000 | 109,161 | 0.1 to 10 | 256×256 | ✗✗✗ | 2020 |

- Object Detection

Comparison among different RS image object detection datasets

| Dataset | Annot. | #Cat. | #Instances | #Images | Resolution (m) | Image width | GL/IT/SP | Year |
|---|---|---|---|---|---|---|---|---|
| TAS | HBB | 1 | 1,319 | 30 | -- | 792 | ✗✗✗ | 2008 |
| OIRDS | OBB | 5 | 1,800 | 900 | up to 0.08 | 256 to 640 | ✓✓✓ | 2009 |
| SZTAKI-INRIA | OBB | 1 | 665 | 9 | -- | ~800 | ✗✗✗ | 2012 |
| NWPU-VHR10 | HBB | 10 | 3,651 | 800 | 0.08 to 2 | ~1,000 | ✗✗✗ | 2014 |
| DLR-MVDA | OBB | 2 | 14,235 | 20 | 0.13 | 5,616 | ✗✗✓ | 2015 |
| UCAS-AOD | OBB | 2 | 14,596 | 1,510 | -- | ~1,000 | ✗✗✗ | 2015 |
| VEDAI | OBB | 9 | 3,640 | 1,210 | 0.125 | 512/1,024 | ✓✗✗ | 2016 |
| COWC | CP | 1 | 32,716 | 53 | 0.15 | 2,000 to 19,000 | ✓✗✗ | 2016 |
| HRSC2016 | OBB | 26 | 2,976 | 1,061 | -- | ~1,100 | ✗✗✗ | 2016 |
| RSOD | HBB | 4 | 6,950 | 976 | 0.3 to 3 | ~1,000 | ✗✗✗ | 2017 |
| CARPK | HBB | 1 | 89,777 | 1,448 | -- | 1,280 | ✗✗✓ | 2017 |
| SSDD/SSDD+ | HBB/OBB | 1 | 2,456 | 1,160 | 1 to 15 | ~500 | ✗✗✓ | 2017 |
| SpaceNet1-6 | Polygon | 1 | 859,982 | -- | up to 0.3 | -- | ✓✓✓ | 2018 |
| LEVIR | HBB | 3 | 11,000 | 22,000 | 0.2 to 1 | 800 | ✗✗✗ | 2018 |
| VisDrone | HBB | 10 | 54,200 | 10,209 | -- | 2,000 | ✗✗✗ | 2018 |
| xView | HBB | 60 | 1,000,000 | 1,413 | 0.3 | ~3,000 | ✓✗✓ | 2018 |
| DOTA-v1.0 | OBB | 15 | 188,282 | 2,806 | up to 0.3 | 800 to 4,000 | ✗✗✗ | 2018 |
| ITCVD | HBB | 1 | 29,088 | 173 | 0.1 | 3,744×5,616 | ✗✗✗ | 2018 |
| WHU building dataset | Polygon | 1 | 221,107 | 25,420 | 0.075 to 2.7 | 512 | ✗✗✗ | 2018 |
| DeepGlobe Building | Polygon | 2 | 302,701 | 24,586 | 0.3 | 650 | ✗✗✓ | 2018 |
| OpenSARShip | Chip | 1 | 11,346 | 41 | ~10 | -- | ✓✓✓ | 2018 |
| CrowdAI Mapping Challenge | Polygon | 1 | 2,910,917 | 341,058 | -- | 300 | ✗✗✗ | 2018 |
| Airbus Ship Detection Challenge | Polygon | 1 | ~131,000 | 208,162 | -- | 768 | ✗✗✗ | 2018 |
| iSAID | Polygon | 15 | 655,451 | 2,806 | up to 0.3 | 800 to 4,000 | ✗✗✗ | 2019 |
| HRRSD | HBB | 13 | 55,740 | 21,761 | 0.15 to 1.2 | 152 to 10,569 | ✗✗✗ | 2019 |
| DIOR | HBB | 20 | 192,472 | 23,463 | 0.5 to 30 | 800 | ✗✗✗ | 2019 |
| DOTA-v1.5 | OBB | 16 | 402,089 | 2,806 | up to 0.3 | 800 to 13,000 | ✗✗✗ | 2019 |
| SAR-Ship-Dataset | HBB | 1 | 59,535 | 43,819 | up to 0.3 | 256 | ✗✗✓ | 2019 |
| AIR-SARShip | HBB | 1 | 2,040 | 300 | 1; 3 | 1,000 | ✓✓✓ | 2020 |
| HRSID | HBB | 1 | 16,951 | 5,604 | 0.5; 1; 3 | 800 | ✗✗✗ | 2020 |
| RarePlanes | Polygon | 1 | 644,258 | 50,253 | 0.3 | -- | ✓✓✓ | 2020 |
| DOTA-v2.0 | OBB | 18 | 1,793,658 | 11,268 | up to 0.3 | 800 to 20,000 | ✗✗✗ | 2020 |

- Semantic Segmentation

Comparison among different RS image semantic segmentation datasets

| Dataset | #Cat. | #Images | Resolution (m) | #Bands | Image size | GL/IT/SP | Year |
|---|---|---|---|---|---|---|---|
| Kennedy Space Center | 13 | 1 | 18 | 224 bands | 512×614 | ✗✓✓ | 2005 |
| Botswana | 14 | 1 | 30 | 242 bands | 1,476×256 | ✗✓✓ | 2005 |
| Salinas | 16 | 1 | 3.7 | 224 bands | 512×217 | ✗✗✓ | -- |
| University of Pavia | 9 | 1 | 1.3 | 115 bands | 610×340 | ✗✗✓ | -- |
| Pavia Centre | 9 | 1 | 1.3 | 115 bands | 1,096×492 | ✗✗✓ | -- |
| ISPRS Vaihingen | 6 | 33 | 0.09 | IR, R, G, DSM, nDSM | ~2,500×2,500 | ✗✗✓ | 2012 |
| ISPRS Potsdam | 6 | 38 | 0.05 | IR, RGB, DSM, nDSM | 6,000×6,000 | ✓✗✓ | 2012 |
| Massachusetts Buildings | 2 | 151 | 1 | RGB | 1,500×1,500 | ✓✓✗ | 2013 |
| Massachusetts Roads | 2 | 1,171 | 1 | RGB | 1,500×1,500 | ✓✓✗ | 2013 |
| Indian Pines | 16 | 1 | 20 | 224 bands | 145×145 | ✓✓✓ | 2015 |
| Zurich Summer | 8 | 20 | 0.62 | NIR, RGB | 1,000×1,150 | ✓✓✓ | 2015 |
| SPARCS Validation | 7 | 80 | 30 | 11 | 1,000×1,000 | ✓✓✓ | 2016 |
| Biome | 4 | 96 | 30 | 11 | ~9,000×9,000 | ✓✓✓ | 2017 |
| Inria Dataset | 2 | 360 | 0.3 | RGB | 5,000×5,000 | ✗✗✗ | 2017 |
| EvLab-SS | 10 | 60 | 0.1 to 2 | RGB | 4,500×4,500 | ✗✗✓ | 2017 |
| RIT-18 | 18 | 3 | 0.047 | 6 bands | 9,000×6,000 | ✓✓✓ | 2017 |
| CITY-OSM | 3 | 1,671 | 0.1 | RGB | 2,500×2,500 to 3,300×33,300 | ✗✗✗ | 2017 |
| Dstl-SIFD | 10 | 57 | up to 0.3 | up to 16 | ~3,350×3,400 | ✓✗✓ | 2017 |
| IEEE GRSS Data Fusion Contest 2017 | 17 | 30 | 1.4 | 9 | 643×666; 374×515 | ✓✓✓ | 2017 |
| IEEE GRSS Data Fusion Contest 2018 | 20 | 1 | 1 | 48 | 4,172×1,202 | ✓✓✓ | 2018 |
| Aeroscapes | 11 | 3,269 | -- | RGB | 720×1,280 | ✗✗✗ | 2018 |
| DLRSD | 17 | 2,100 | 0.3 | RGB | 256×256 | ✗✗✗ | 2018 |
| DeepGlobe Land Cover | 7 | 1,146 | 0.5 | RGB | 2,448×2,448 | ✗✗✓ | 2018 |
| So2Sat LCZ42 | 17 | 400,673 | 10 | 10 bands | 32×32 | ✓✗✓ | 2019 |
| SEN12MS | 33 | 180,662 triplets | 10 to 50 | up to 13 bands | 256×256 | ✓✗✓ | 2019 |
| 95-Cloud | 1 | 43,902 | 30 | NIR, RGB | 384×384 | ✓✗✓ | 2019 |
| Shakeel et al. | 1 | 2,682 | 0.3 | RGB | 300×300 | ✗✗✗ | 2019 |
| ALCD Cloud Masks | 8 | 38 | 10 | RGB | 1,830×1,830 | ✓✓✓ | 2019 |
| SkyScapes | 31 | 16 | 0.13 | RGB | 5,616×3,744 | ✗✗✗ | 2019 |
| DroneDeploy | 7 | 55 | 0.1 | RGB | up to 12,039×13,854 | ✗✗✗ | 2019 |
| Slovenia LULC | 10 | 940 | 10 | 6 | 5,00×5,00 | ✓✓✓ | 2019 |
| LandCoverNet | 7 | 1,980 | 10 | NIR, RGB | 256×256 | ✓✓✓ | 2020 |
| UAVid | 8 | 420 | -- | RGB | ~4,000×2,160 | ✗✗✓ | 2020 |
| GID | 15 | 150 | 0.8 to 10 | 4 bands | 6,800×7,200 | ✓✓✓ | 2020 |
| LandCover.ai | 3 | 41 | 0.25, 0.5 | RGB | 9,000×9,500; 4,200×4,700 | ✓✗✗ | 2020 |
| Agriculture-Vision | 9 | 94,986 | 0.1,15,0.2 | NIR, RGB | 512×512 | ✗✗✓ | 2020 |
| Sentinel-2 Cloud Mask Catalogue | 18 | 513 | 20 | 13 | 1,024×1,024 | ✓✓✓ | 2020 |

- Change Detection

Comparison among different RS image change detection datasets

| Dataset | #Cat. | #Image pairs | Resolution (m) | #Bands | Image size | GL/IT/SP | Year |
|---|---|---|---|---|---|---|---|
| SZTAKI AirChange | 2 | 13 | 1.5 | RGB | 952×640 | ✗✓✗ | 2009 |
| AICD | 2 | 1,000 | 0.5 | 115 bands | 800×600 | ✗✗✗ | 2011 |
| Taizhou Data | 4 | 1 | 30 | 6 bands | 400×400 | ✓✓✓ | 2014 |
| Kunshan Data | 3 | 1 | 30 | 6 bands | 800×800 | ✓✓✓ | 2014 |
| Cross-sensor Bastrop | 2 | 4 | 30, 120 | 7, 9 | 444×300; 1,534×808 | ✓✓✓ | 2015 |
| MtS-WH | 9 | 1 | 1 | NIR, RGB | 7,200×6,000 | ✓✓✓ | 2017 |
| Yancheng | 4 | 2 | 30 | 242 bands | 400×145 | ✓✓✓ | 2018 |
| GETNET dataset | 2 | 1 | 30 | 198 | 463×241 | ✗✓✓ | 2018 |
| Urban-rural boundary of Wuhan | 20 | 1 | 4/30 | 4/9 bands | 960×960 | ✓✓✓ | 2018 |
| Hermiston City area, Oregon | 5 | 1 | 30 | 242 bands | 390×200 | ✓✓✓ | 2018 |
| OSCD | 2 | 24 | 10 | 13 bands | 600×600 | ✓✓✓ | 2018 |
| WHU building dataset | 2 | 1 | 0.2 | RGB | 32,207×15,354 | ✓✓✓ | 2018 |
| Season-varying Dataset | 2 | 16,000 | 0.03 to 0.1 | RGB | 256×256 | ✗✗✗ | 2018 |
| ABCD | 2 | 16,950 | 0.4 | RGB | 128×128; 160×160 | ✗✓✗ | 2018 |
| California flood dataset | 2 | 1 | 5, 30 | RGB, 11 | 1,534×808 | ✓✓✓ | 2019 |
| Lopez-Fandino et al. | 5 | 2 | 20 | 224 | 984×740; 600×500 | ✗✓✓ | 2019 |
| HRSCD | 6 | 291 | up to 0.8 | RGB | 10,000×10,000 | ✓✓✓ | 2019 |
| xBD | 6 | 11,034 | 0.5 | RGB | 1,024×1,024 | ✓✓✓ | 2019 |
| LEVIR-CD | 2 | 637 | 0.5 | RGB | 1,024×1,024 | ✗✗✗ | 2020 |
| SECOND | 30 | 4,214 | 0.5 to 3 | RGB | 512×512 | ✗✗✗ | 2020 |
| Google Dataset | 2 | 1,067 | 0.55 | RGB | 256×256 | ✓✓✗ | 2020 |
| Zhang et al. | 2 | 4 | 2, 2.4, 5.8 | NIR, RGB | 1,431×1,431; 458×559; 1,154×740 | ✓✓✓ | 2020 |
| Hi-UCD | 9 | 1,293 | 0.1 | RGB | 1,024×1,024 | --/--/✓ | 2020 |
| SpaceNet7 | -- | 24 | 4 | RGB | -- | ✓✓✓ | 2020 |
| S2MTCP | 2 | 1,520 | up to 10 | 13 | 600×600 | ✓✓✓ | 2021 |

3. DiRS: Principles to Build RS Image Benchmarks

The primary consideration in constructing a meaningful RS image dataset is that it should be created on the basis of the requirements of practical applications rather than the characteristics of the algorithms to be employed. Moreover, the annotation of an RS image dataset is better conducted by the application side than by the algorithm developers. The annotated dataset is then naturally application-oriented, which helps enhance the practicability of the interpretation algorithms. With these points in mind, diversity, richness, and scalability (collectively called DiRS) can be considered the basic principles for creating benchmark datasets for RS image interpretation. We believe that these principles are complementary to each other. That is, improving a dataset with respect to one principle can simultaneously promote the dataset quality reflected in the other principles.


4. An Example: Million-AID

Following the DiRS principles, we provide an example of building a dataset for RS image classification, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS scene classification. Million-AID covers a wide range of semantic categories: 51 scene categories organized in a hierarchical category network that forms a three-level tree. The 51 leaf nodes fall into 28 parent nodes at the second level, which are grouped into 8 nodes at the first level. These 8 nodes represent the underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unutilized land, and water area. The scene category network gives the dataset a well-organized set of relationships among scene categories and also makes it scalable. The number of images in each scene category ranges from about 2,000 to 45,000, giving the dataset a long-tail distribution. In addition, Million-AID has advantages over existing scene classification datasets owing to its high spatial resolution, large scale, and global distribution.
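To make the idea of a hierarchical category network more concrete, the following is a minimal Python sketch of a three-level tree. It is only an illustration under stated assumptions, not the authors' implementation: the `CategoryNode` class, the `"scene"` root, and the `hypothetical_*` second- and third-level names are invented for this example. Only the eight first-level category names are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CategoryNode:
    """One node of a hierarchical scene-category tree (illustrative only)."""
    name: str
    children: List["CategoryNode"] = field(default_factory=list)

    def add(self, name: str) -> "CategoryNode":
        """Attach a new child category and return it."""
        child = CategoryNode(name)
        self.children.append(child)
        return child

    def leaves(self) -> List["CategoryNode"]:
        """Return every node at or below this one that has no children."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def path_to(self, target: str) -> Optional[List[str]]:
        """Return the chain of names from this node down to `target`, or None."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub_path = child.path_to(target)
            if sub_path is not None:
                return [self.name] + sub_path
        return None


# The eight first-level categories named in the text.
LEVEL1_CATEGORIES = [
    "agriculture land", "commercial land", "industrial land",
    "public service land", "residential land", "transportation land",
    "unutilized land", "water area",
]

# "scene" is an assumed root name used only to hold the first level together.
root = CategoryNode("scene")
for category in LEVEL1_CATEGORIES:
    root.add(category)

# Hypothetical second- and third-level nodes, added under "water area".
# These names are placeholders, not actual Million-AID categories.
water = root.children[-1]
parent = water.add("hypothetical_parent")
parent.add("hypothetical_leaf_a")
parent.add("hypothetical_leaf_b")

print(root.path_to("hypothetical_leaf_a"))
# ['scene', 'water area', 'hypothetical_parent', 'hypothetical_leaf_a']
```

In such a structure, extending the dataset with a new scene category amounts to one `add` call on the relevant parent node. Existing nodes and their labels are left unchanged, which is one way to read the scalability property described above.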

- Category Network

- Semantic Coordinates Collection

- Scene Image Acquisition

Dataset & Evaluation

Million-AID has been released for public accessibility.


Citation

If you make use of Million-AID, please cite the following papers:

@article{Long2021DiRS,
title={On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID},
author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
year={2021},
volume={14},
pages={4205-4230}
}

@misc{Long2022ASP,
title={Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling}, 
author={Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren Li},
year={2022},
eprint={2201.01953},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Contact

If you have any questions, please contact:

  • Yang Long at longyang@whu.edu.cn