DiRS: On Creating Benchmark Datasets for Remote Sensing Image Interpretation

Yang Long1, Gui-Song Xia1,2,*, Shengyang Li3, Wen Yang1,4,
Michael Ying Yang5, Xiao Xiang Zhu6, Liangpei Zhang1, Deren Li1.

1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China
2. School of Computer Science, Wuhan University, Wuhan 430079, China
3. Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
4. School of Electronic Information, Wuhan University, Wuhan 430072, China
5. Faculty of Geo-Information Science and Earth Observation, University of Twente, Hengelosestraat 99, Enschede, Netherlands
6. German Aerospace Center (DLR) and also Technical University of Munich, Germany


1. Abstract

The past decade has witnessed great progress in remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images, for which benchmark datasets are essential prerequisites for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the RS image interpretation research community, this article discusses how to efficiently prepare a suitable benchmark dataset for RS image analysis. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation through bibliometric investigations. We then present some principles, i.e., diversity, richness, and scalability (called DiRS), for constructing benchmark datasets efficiently. Following the DiRS principles, we also provide an example of building datasets for RS image classification, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS scene classification. Finally, several challenges and perspectives in RS image annotation are discussed to facilitate research in benchmark dataset construction. We hope this paper will provide the RS community with an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven research.


2. Annotated Datasets for RS Image Interpretation

The interpretation of RS images plays an increasingly important role in a wide range of applications and has thus attracted remarkable research attention. Consequently, various datasets have been built to advance the development of interpretation algorithms for RS images. Covering literature published over the past decade, we systematically review the existing RS image datasets for the current mainstream RS image interpretation tasks, including scene classification, object detection, semantic segmentation, and change detection.

- Scene Classification

Comparison among different RS image scene classification datasets
Dataset | #Cat. | #Images per cat. | #Images | Resolution (m) | Image size | Year
UC-Merced | 21 | 100 | 2,100 | 0.3 | 256×256 | 2010
WHU-RS19 | 19 | 50 to 61 | 1,013 | up to 0.5 | 600×600 | 2012
RSSCN7 | 7 | 400 | 2,800 | -- | 400×400 | 2015
SAT-4 | 4 | 89,963 to 178,034 | 500,000 | 1 to 6 | 28×28 | 2015
SAT-6 | 6 | 10,262 to 150,400 | 405,000 | 1 to 6 | 28×28 | 2015
BCS | 2 | 1,438 | 2,876 | -- | 600×600 | 2015
RSC11 | 11 | ~100 | 1,232 | ~0.2 | 512×512 | 2016
SIRI-WHU | 12 | 200 | 2,400 | 2 | 200×200 | 2016
NWPU-RESISC45 | 45 | 700 | 31,500 | 0.2 to 30 | 256×256 | 2016
AID | 30 | 220 to 420 | 10,000 | 0.5 to 8 | 600×600 | 2017
RSI-CB128 | 45 | 173 to 1,550 | 36,000 | 0.3 to 3 | 128×128 | 2017
RSI-CB256 | 35 | 198 to 1,331 | 24,000 | 0.3 to 3 | 256×256 | 2017
RSD46-WHU | 46 | 500 to 3,000 | 117,000 | 0.5 to 2 | 256×256 | 2017
EuroSAT | 10 | 2,000 to 3,000 | 27,000 | 10 | 64×64 | 2018
PatternNet | 38 | 800 | 30,400 | 0.06 to 4.7 | 256×256 | 2018
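
As a reading aid for the scene classification comparison, the three count columns can be cross-checked against each other: the total number of images should lie between #Cat. times the smallest and #Cat. times the largest per-category count. The sketch below is only an illustration of that arithmetic. The figures are copied from the comparison above, RSC11 is left out because its per-category count is approximate, and the names `ROWS` and `is_consistent` are made up for this example.

```python
# (dataset, #categories, min images per category, max images per category, total images)
# Values are copied from the scene classification comparison; rows that list a
# single per-category count use the same number for min and max.
ROWS = [
    ("UC-Merced", 21, 100, 100, 2_100),
    ("WHU-RS19", 19, 50, 61, 1_013),
    ("RSSCN7", 7, 400, 400, 2_800),
    ("SAT-4", 4, 89_963, 178_034, 500_000),
    ("SAT-6", 6, 10_262, 150_400, 405_000),
    ("BCS", 2, 1_438, 1_438, 2_876),
    ("SIRI-WHU", 12, 200, 200, 2_400),
    ("NWPU-RESISC45", 45, 700, 700, 31_500),
    ("AID", 30, 220, 420, 10_000),
    ("RSI-CB128", 45, 173, 1_550, 36_000),
    ("RSI-CB256", 35, 198, 1_331, 24_000),
    ("RSD46-WHU", 46, 500, 3_000, 117_000),
    ("EuroSAT", 10, 2_000, 3_000, 27_000),
    ("PatternNet", 38, 800, 800, 30_400),
]


def is_consistent(n_categories: int, per_cat_min: int, per_cat_max: int, total: int) -> bool:
    """True if the total lies within the bounds implied by the per-category range."""
    return n_categories * per_cat_min <= total <= n_categories * per_cat_max


# Collect the names of any rows whose columns contradict each other.
mismatches = [
    name
    for name, n_cat, lo, hi, total in ROWS
    if not is_consistent(n_cat, lo, hi, total)
]
print(mismatches)  # []
```

For the balanced datasets the bound is an equality, e.g. 21 × 100 = 2,100 for UC-Merced and 38 × 800 = 30,400 for PatternNet.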

- Object Detection

Comparison among different RS image object detection datasets
Dataset | Annot. type | #Cat. | #Instances | #Images | Image width | Year
TAS | HBB | 1 | 1,319 | 30 | 792 | 2008
SZTAKI-INRIA | OBB | 1 | 665 | 9 | ~800 | 2012
NWPU-VHR10 | HBB | 10 | 3,651 | 800 | ~1,000 | 2014
DLR3k | OBB | 2 | 14,235 | 20 | 5,616 | 2015
UCAS-AOD | OBB | 2 | 14,596 | 1,510 | ~1,000 | 2015
VEDAI | OBB | 9 | 3,640 | 1,210 | 512/1,024 | 2016
COWC | CP | 1 | 32,716 | 53 | 2,000--19,000 | 2016
HRSC2016 | OBB | 26 | 2,976 | 1,061 | ~1,100 | 2016
RSOD | HBB | 4 | 6,950 | 976 | ~1,000 | 2017
CARPK | HBB | 1 | 89,777 | 1,448 | 1,280 | 2017
LEVIR | HBB | 3 | 11,000 | 22,000 | 800 | 2018
VisDrone | HBB | 10 | 54,200 | 10,209 | 2,000 | 2018
xView | HBB | 60 | 1,000,000 | 1,413 | ~3,000 | 2018
DOTA-v1.0 | OBB | 15 | 188,282 | 2,806 | 800--4,000 | 2018
HRRSD | HBB | 13 | 55,740 | 21,761 | 152--10,569 | 2019
DIOR | HBB | 20 | 192,472 | 23,463 | 800 | 2019
DOTA-v1.5 | OBB | 16 | 402,089 | 2,806 | 800--13,000 | 2019
DOTA-v2.0 | OBB | 18 | 1,488,666 | 11,067 | 800--20,000 | 2020
Annot. type: HBB = horizontal bounding box, OBB = oriented bounding box, CP = center point.

- Semantic Segmentation

Comparison among different RS image semantic segmentation datasets
Dataset | #Cat. | #Images | Resolution (m) | #Bands | Image size | Year
Kennedy Space Center | 13 | 1 | 18 | 224 bands | 512×614 | 2005
Botswana | 14 | 1 | 30 | 242 bands | 1,476×256 | 2005
Salinas | 16 | 1 | 3.7 | 224 bands | 512×217 | --
University of Pavia | 9 | 1 | 1.3 | 115 bands | 610×340 | --
Pavia Centre | 9 | 1 | 1.3 | 115 bands | 1,096×492 | --
ISPRS Vaihingen | 6 | 33 | 0.09 | IR, R, G, DSM, nDSM | ~2,500×2,500 | 2012
ISPRS Potsdam | 6 | 38 | 0.05 | IR, RGB, DSM, nDSM | 6,000×6,000 | 2012
Massachusetts Buildings | 2 | 151 | 1 | RGB | 1,500×1,500 | 2013
Massachusetts Roads | 2 | 1,171 | 1 | RGB | 1,500×1,500 | 2013
Indian Pines | 16 | 1 | 20 | 224 bands | 145×145 | 2015
Zurich Summer | 8 | 20 | 0.62 | NIR, RGB | 1,000×1,150 | 2015
Inria Dataset | 2 | 360 | 0.3 | RGB | 1,500×1,500 | 2017
EVlab-SS | 10 | 60 | 0.1 to 2 | RGB | 4,500×4,500 | 2017
RIT-18 | 18 | 3 | 0.047 | 6 bands | 9,000×6,000 | 2017
WHU Building-Aerial Imagery | 2 | 8,189 | 0.3 | RGB | 512×512 | 2019
WHU Building-Satellite Imagery I | 2 | 204 | 0.3 to 2.5 | RGB | 512×512 | 2019
WHU Building-Satellite Imagery II | 2 | 17,388 | 2.7 | RGB | 512×512 | 2019
So2Sat LCZ42 | 17 | 400,673 | 10 | 10 bands | 32×32 | 2019
SEN12MS | 33 | 180,662 triplets | 10 to 50 | up to 13 bands | 256×256 | 2019
UAVid | 8 | 420 | -- | RGB | ~4,000×2,160 | 2020
GID | 15 | 150 | 0.8 to 10 | 4 bands | 6,800×7,200 | 2020

- Change Detection

Comparison among different RS image change detection datasets
Dataset | #Cat. | #Image pairs | Resolution (m) | #Bands | Image size | Year
SZTAKI AirChange | 2 | 13 | 1.5 | RGB | 952×640 | 2009
AICD | 2 | 1,000 | 0.5 | 115 bands | 800×600 | 2011
Taizhou Data | 4 | 1 | 30 | 6 bands | 400×400 | 2014
Kunshan Data | 3 | 1 | 30 | 6 bands | 800×800 | 2014
Yangcheng | 4 | 2 | 30 | 242 bands | 400×145 | 2018
Urban-rural boundary of Wuhan | 20 | 1 | 4/30 | 4/9 bands | 960×960 | 2018
Hermiston City area, Oregon | 5 | 1 | 30 | 242 bands | 390×200 | 2018
OSCD | 2 | 24 | 10 | 13 bands | 600×600 | 2018
Quasi-urban areas | 3 | 1 | 0.5 | 8 bands | 1,200×1,200 | 2018
WHU Building-Change Detection | 2 | 1 | 0.2 | RGB | 32,207×15,354 | 2018
Season-varying Dataset | 2 | 16,000 | 0.03 to 0.1 | RGB | 256×256 | 2018
ABCD | 2 | 4,253 | 0.4 | RGB | 160×160 | 2018
HRSCD | 6 | 291 | 0.5 | RGB | 10,000×10,000 | 2019
MtS-WH | 9 | 1 | 1 | NIR, RGB | 7,200×6,000 | 2019
LEVIR-CD | 2 | 637 | 0.5 | RGB | 1,024×1,024 | 2020
SCDN | 30 | 4,214 | 0.5 to 3 | RGB | 512×512 | 2020

3. DiRS: Principles to Build RS Image Benchmarks

The primary point in constructing a meaningful RS image dataset is that the dataset should be created on the basis of the requirements of practical applications rather than the characteristics of the algorithms to be employed. Moreover, the annotation of an RS image dataset is better conducted by the application side than by the algorithm developers. The annotated dataset is then naturally application-oriented, which helps make the interpretation algorithms more practical. With these points in mind, diversity, richness, and scalability (called DiRS) can be considered the basic principles for creating benchmark datasets for RS image interpretation. We believe that these principles are complementary to each other. That is, improving a dataset with respect to one principle can simultaneously improve the dataset quality reflected in the other principles.


4. An Example: Million-AID

Following the DiRS principles, we provide an example of building datasets for RS image classification, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS scene classification. Million-AID contains a wide range of semantic categories, i.e., 51 scene categories organized by a hierarchical category network in the form of a three-level tree: the 51 leaf nodes fall into 28 parent nodes at the second level, which are grouped into 8 nodes at the first level, representing the 8 underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unutilized land, and water area. The scene category network gives the dataset a well-organized set of relationships among scene categories and also makes it scalable. The number of images in each scene category ranges from about 2,000 to 45,000, giving the dataset a long-tailed distribution. In addition, Million-AID has advantages over existing scene classification datasets owing to its high spatial resolution, large scale, and global distribution.
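
The three-level category network described above can be pictured as a small tree data structure. The following is a minimal sketch, not the authors' implementation: the eight first-level names come from the text, while the second- and third-level names ("arable land", "paddy field", "dry field", "airport area", "runway") and the class `CategoryNode` are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class CategoryNode:
    """One node of a scene category tree."""
    name: str
    # The back-reference is excluded from repr/comparison to avoid cycles.
    parent: Optional["CategoryNode"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "CategoryNode"] = field(default_factory=dict)

    def add(self, name: str) -> "CategoryNode":
        """Return the child called `name`, creating it if it does not exist."""
        if name not in self.children:
            self.children[name] = CategoryNode(name, parent=self)
        return self.children[name]

    def path(self) -> List[str]:
        """Category names from the first level down to this node (root excluded)."""
        names = []
        node = self
        while node.parent is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def iter_nodes(self) -> Iterator["CategoryNode"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children.values():
            yield from child.iter_nodes()


# The eight first-level categories named in the text.
FIRST_LEVEL = [
    "agriculture land", "commercial land", "industrial land",
    "public service land", "residential land", "transportation land",
    "unutilized land", "water area",
]

root = CategoryNode("root")
for first_level_name in FIRST_LEVEL:
    root.add(first_level_name)

# Hypothetical second- and third-level names, for illustration only.
paddy = root.add("agriculture land").add("arable land").add("paddy field")
root.add("agriculture land").add("arable land").add("dry field")
root.add("transportation land").add("airport area").add("runway")

# A third-level node carries a label at every level of the hierarchy.
print(paddy.path())  # ['agriculture land', 'arable land', 'paddy field']

# Scene categories are the nodes at depth three.
scene_labels = [n.path() for n in root.iter_nodes() if len(n.path()) == 3]
print(len(scene_labels))  # 3
```

Adding a new scene category only requires one more `add` call along an existing or new branch, and existing labels stay unchanged, which is one way to read the scalability property mentioned above.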

- Category Network

- Semantic Coordinates Collection

- Scene Image Acquisition

- Data Download

Million-AID will be released for public access.


Citation

If you make use of Million-AID, please cite the following paper:

@article{Long2020DiRS,
title={DiRS: On Creating Benchmark Datasets for Remote Sensing Image Interpretation},
author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
journal={arXiv preprint arXiv:2006.12485},
year={2020}
}
	                

Contact

If you have any problems or questions, please contact:

  • Yang Long at longyang@whu.edu.cn
  • Gui-Song Xia at guisong.xia@whu.edu.cn