DOTA is a large-scale dataset for object detection in aerial images, designed for developing and evaluating object detectors in this domain. The images are collected from different sensors and platforms. Image sizes range from 800 × 800 to 20,000 × 20,000 pixels, and the objects they contain exhibit a wide variety of scales, orientations, and shapes. The instances in DOTA images are annotated by experts in aerial image interpretation with arbitrary (8 d.o.f.) quadrilaterals. We will continue to update DOTA so that it grows in size and scope to reflect evolving real-world conditions. It currently has three versions: DOTA-v1.0, DOTA-v1.5, and DOTA-v2.0.
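Concretely, each image ships with a plain-text annotation file in which every object occupies one line: eight corner coordinates, a category name, and a difficulty flag. Below is a minimal Python parsing sketch assuming that released format; the file name is a placeholder.
# A minimal sketch of reading one DOTA annotation file, assuming the released
# plain-text format: optional header lines ("imagesource:", "gsd:") followed
# by one object per line as "x1 y1 x2 y2 x3 y3 x4 y4 category difficult".
from pathlib import Path

def load_dota_annotations(txt_path):
    """Return a list of (8-point quadrilateral, category, difficulty) tuples."""
    objects = []
    for line in Path(txt_path).read_text().splitlines():
        parts = line.strip().split()
        if len(parts) < 10:                      # skip header/metadata lines
            continue
        quad = [float(v) for v in parts[:8]]     # x1 y1 ... x4 y4, 8 d.o.f.
        category, difficult = parts[8], int(parts[9])
        objects.append((quad, category, difficult))
    return objects

# anns = load_dota_annotations("P0001.txt")      # hypothetical file name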
If you make use of DOTA, please cite our following works:
@misc{ding2021object,
title={Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges},
author={Jian Ding and Nan Xue and Gui-Song Xia and Xiang Bai and Wen Yang and Micheal Ying Yang and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
year={2021},
eprint={2102.12219},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@InProceedings{Xia_2018_CVPR,
title={DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
author={Gui-Song Xia and Xiang Bai and Jian Ding and Zhen Zhu and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month={June},
year={2018}
}
@InProceedings{Ding_2019_CVPR,
title={Learning RoI Transformer for Detecting Oriented Objects in Aerial Images},
author={Jian Ding and Nan Xue and Yang Long and Gui-Song Xia and Qikai Lu},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month={June},
year={2019}
}
If you have any problems or feedback when using DOTA, please contact
Million-AID is a new large-scale benchmark dataset containing a million instances for remote sensing (RS) scene classification. It covers 51 semantic scene categories, customized to match land-use classification standards, which greatly enhances the practicability of the dataset. Unlike existing scene classification datasets, whose categories are organized with parallel or uncertain relationships, the scene categories in Million-AID are organized in a systematic hierarchy, giving the dataset an advantage in management and scalability. Specifically, the categories form a three-level tree: 51 leaf nodes fall under 28 parent nodes at the second level, which are in turn grouped into 8 nodes at the first level, representing the 8 underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unutilized land, and water area. This category network gives the dataset a well-organized structure of relationships among scene categories as well as scalability. The number of images per scene category ranges from 2,000 to 45,000, endowing the dataset with a long-tail distribution. Moreover, Million-AID surpasses existing scene classification datasets in its high spatial resolution, large scale, and global distribution.
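As an illustrative sketch, the three-level tree can be represented as nested mappings in Python. The first-level nodes below are those listed above, while the second-level and leaf entries are hypothetical placeholders, not the official taxonomy.
# An illustrative sketch of the three-level category tree. The first-level
# nodes shown are those named above; the second-level and leaf entries are
# hypothetical placeholders rather than the official taxonomy.
MILLION_AID_TREE = {
    "agriculture land": {
        "arable land": ["dry field", "paddy field"],   # hypothetical leaves
    },
    "transportation land": {
        "airport area": ["apron", "runway"],           # hypothetical leaves
    },
    # ... remaining first-level nodes: commercial land, industrial land,
    # public service land, residential land, unutilized land, water area.
}

def leaf_to_root(tree, leaf):
    """Return the (root, parent, leaf) path for a leaf label."""
    for root, parents in tree.items():
        for parent, leaves in parents.items():
            if leaf in leaves:
                return root, parent, leaf
    raise KeyError(leaf)

# leaf_to_root(MILLION_AID_TREE, "runway")
# -> ("transportation land", "airport area", "runway")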
If you make use of Million-AID, please cite our following work:
@Article{Long2021DiRS,
title={On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID},
author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
year={2021},
volume={14},
pages={4205-4230}
}
@misc{Long2022ASP,
title={Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling},
author={Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren Li},
year={2022},
eprint={2201.01953},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
If you have any problems or feedback when using Million-AID, please contact
AID is a large-scale aerial image dataset built by collecting sample images from Google Earth imagery. Note that although Google Earth images are post-processed RGB renderings of the original optical aerial images, it has been shown that there is no significant difference between Google Earth images and real optical aerial images, even in pixel-level land-use/cover mapping. Thus, Google Earth images can also be used as aerial images for evaluating scene classification algorithms.
The dataset is made up of the following 30 aerial scene types: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. All images are labelled by specialists in the field of remote sensing image interpretation. In total, the AID dataset contains 10,000 images across 30 classes.
The images in AID are multi-source, since Google Earth images come from different remote imaging sensors. This poses more challenges for scene classification than single-source datasets such as UC-Merced. Moreover, the sample images in each class are carefully chosen from different countries and regions around the world, mainly China, the United States, England, France, Italy, Japan, and Germany, and they are extracted at different times and seasons under different imaging conditions, which increases the intra-class diversity of the data.
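Assuming the release uses one sub-directory per scene class (a common layout for scene classification datasets; verify against the actual download), a minimal PyTorch loading sketch looks like this:
# A minimal loading sketch, assuming a layout of one sub-directory per scene
# class (e.g. AID/airport/*.jpg); directory name and layout are assumptions.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),   # resize to a typical CNN input size
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("AID", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, len(dataset.classes))  # torch.Size([32, 3, 224, 224]) 30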
If you make use of AID, please cite our following work:
@Article{Xia2017AID,
title={AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification},
author={Gui-Song Xia and Jingwen Hu and Fan Hu and Baoguang Shi and Xiang Bai and Yanfei Zhong and Liangpei Zhang and Xiaoqiang Lu},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2017},
volume={55},
number={7},
pages={3965-3981}
}
If you have any problems or feedback when using AID, please contact
WHU-RS19 is a set of satellite images exported from Google Earth, which provides high-resolution satellite imagery with up to 0.5 m resolution. It contains 19 classes of meaningful scenes in high-resolution satellite imagery: airport, beach, bridge, commercial, desert, farmland, footballfield, forest, industrial, meadow, mountain, park, parking, pond, port, railwaystation, residential, river, and viaduct. Each class has about 50 samples. It is worth noticing that image samples of the same class are collected from different regions, in satellite images of different resolutions, and thus may differ in scale, orientation, and illumination.
If you make use of WHU-RS19, please cite our following work:
@InProceedings{Xia2010WHURS19,
title={Structural high-resolution satellite image indexing},
author={Gui-Song Xia and Wen Yang and Julie Delon and Yann Gousseau and Hong Sun and Henri Maître},
booktitle={Symposium: 100 Years ISPRS - Advancing Remote Sensing Science},
year={2010},
address={Vienna, Austria},
}
@Article{Dai2011WHURS19,
title={Satellite Image Classification via Two-Layer Sparse Coding With Biased Image Representation},
author={Dengxin Dai and Wen Yang},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2011},
volume={8},
number={1},
pages={173-176}
}
NaSC-TG2 (Natural Scene Classification with Tiangong-2 Remotely Sensed Imagery) is a novel benchmark dataset for remote sensing natural scene classification built from Tiangong-2 remotely sensed imagery. The goal of the dataset is to expand and enrich annotation data for advancing remote sensing classification algorithms, especially for natural scene classification. It contains 20,000 images equally divided into 10 scene classes: beach, circle farmland, cloud, desert, forest, mountain, rectangle farmland, residential, river, and snowberg. Each scene class includes 2,000 images of 128 × 128 pixels with a spatial resolution of 100 m. Compared with other datasets collected from Google Earth, NaSC-TG2 offers abundant natural scenes with a distinctive spatial scale and imaging characteristics. In addition to true-color RGB images, NaSC-TG2 also provides the corresponding 14-band multispectral scene images, offering valuable experimental data for research on high-dimensional scene image classification algorithms.
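For the multispectral part, here is a minimal handling sketch, assuming the 14-band patches are distributed as GeoTIFF files (the storage format and file name are assumptions; check the official release).
# A minimal sketch for handling a 14-band NaSC-TG2 patch, assuming GeoTIFF
# storage; the file name is a hypothetical placeholder.
import numpy as np
import rasterio

with rasterio.open("nasc_tg2_patch.tif") as src:
    bands = src.read()                            # shape: (14, 128, 128)

# Per-band min-max normalization before feeding a classifier.
bands = bands.astype(np.float32)
mins = bands.reshape(14, -1).min(axis=1)[:, None, None]
maxs = bands.reshape(14, -1).max(axis=1)[:, None, None]
normalized = (bands - mins) / (maxs - mins + 1e-8)
print(normalized.shape)                           # (14, 128, 128)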
If you make use of NaSC-TG2, please cite our following work:
@Article{Zhou2021NaSCTG2,
title={NaSC-TG2: Natural Scene Classification With Tiangong-2 Remotely Sensed Imagery},
author={Zhuang Zhou and Shengyang Li and Wei Wu and Weilong Guo and Xuan Li and Gui-Song Xia and Zifei Zhao},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
year={2021},
volume={14},
pages={3228-3242}
}
If you have any problems or feedback when using NaSC-TG2, please contact
GID is a large-scale land-cover dataset built with Gaofen-2 (GF-2) satellite images. This dataset, named the Gaofen Image Dataset (GID), has advantages over existing land-cover datasets because of its large coverage, wide distribution, and high spatial resolution. GID consists of two parts: a large-scale classification set and a fine land-cover classification set. The large-scale classification set contains 150 pixel-level annotated GF-2 images, and the fine classification set is composed of 30,000 multi-scale image patches coupled with 10 pixel-level annotated GF-2 images. The 15-category training and validation data are collected and re-labeled based on the 5-category training and validation images, respectively.
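Because the 15 fine categories refine the 5 coarse ones, a fine mask can be collapsed back to coarse labels with a simple lookup table; the grouping below is a hypothetical placeholder, as the real correspondence is defined by GID itself.
# A sketch of collapsing a 15-class fine mask to 5 coarse land-cover classes
# with a lookup table. The index mapping here is a hypothetical placeholder.
import numpy as np

FINE_TO_COARSE = np.array(
    [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],  # hypothetical grouping
    dtype=np.uint8,
)

def coarsen(fine_mask):
    """Map an HxW mask of fine class ids (0-14) to coarse ids (0-4)."""
    return FINE_TO_COARSE[fine_mask]

fine = np.random.randint(0, 15, size=(256, 256), dtype=np.uint8)
print(np.unique(coarsen(fine)))                      # subset of [0 1 2 3 4]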
If you make use of GID, please cite our following work:
@Article{Tong2020GID,
title={Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models},
author={Xin-Yi Tong and Gui-Song Xia and Qikai Lu and Huanfeng Shen and Shengyang Li and Shucheng You and Liangpei Zhang},
journal={Remote Sensing of Environment},
year={2020},
volume={237},
pages={111322}
}
If you have any problems or feedback when using GID, please contact
SECOND is a large-scale, well-annotated aerial image dataset for semantic change detection (SCD), built to establish a new SCD benchmark with adequate quantity, sufficient categories, and proper annotation. To ensure data diversity, we collected 4,662 pairs of aerial images from several platforms and sensors, distributed over cities such as Hangzhou, Chengdu, and Shanghai. Each image is 512 × 512 pixels and is annotated at the pixel level. The annotation of SECOND was carried out by an expert group in earth vision applications, which guarantees high label accuracy. For the change categories in SECOND, we focus on 6 main land-cover classes, i.e., non-vegetated ground surface, tree, low vegetation, water, building, and playground, which are frequently involved in natural and man-made geographical changes. It is worth noticing that, in the new dataset, non-vegetated ground surface (n.v.g. surface for short) mainly corresponds to impervious surfaces and bare land. In summary, these 6 selected land-cover categories result in 30 common change categories (including non-change). Through random selection of image pairs, SECOND reflects the real distribution of land-cover categories when changes occur.
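As a quick sanity check on that count, ordered (before, after) pairs of distinct classes already number 6 × 5 = 30; exactly how the non-change case is folded into the category set follows the dataset's own convention. A minimal enumeration sketch:
# A worked enumeration of the change categories implied by 6 land-cover
# classes: ordered (before, after) pairs of distinct classes give 6 x 5 = 30.
from itertools import permutations

CLASSES = ["n.v.g. surface", "tree", "low vegetation",
           "water", "building", "playground"]

change_pairs = list(permutations(CLASSES, 2))
print(len(change_pairs))    # 30
print(change_pairs[0])      # ('n.v.g. surface', 'tree')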
If you make use of SECOND, please cite our following work:
@Misc{Yang2020SECOND,
title={Semantic Change Detection with Asymmetric Siamese Networks},
author={Kunping Yang and Gui-Song Xia and Zicheng Liu and Bo Du and Wen Yang and Marcello Pelillo and Liangpei Zhang},
year={2020},
eprint={2010.05687},
archivePrefix={arXiv}
}
If you have any problems or feedback when using SECOND, please contact
UAVid is a UAV video dataset for the semantic segmentation task, focusing on urban scenes. As a new high-resolution UAV semantic segmentation dataset, UAVid brings new challenges, including large scale variation, moving-object recognition, and temporal-consistency preservation. The dataset consists of 42 video sequences capturing 4K high-resolution images in slanted views. In total, 300 images have been densely labeled for the semantic labeling task, covering 8 semantic categories: building, road, static car, tree, low vegetation, human, moving car, and background clutter.
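Assuming the labels are distributed as color-coded masks, a minimal sketch for converting a label image to class indices follows; the RGB palette here is a hypothetical placeholder, so use the colors defined by the official UAVid toolkit.
# A sketch for converting a color-coded UAVid label image to class indices.
# The RGB palette below is a hypothetical placeholder.
import numpy as np
from PIL import Image

PALETTE = {  # hypothetical class colors
    (128, 0, 0): 0,      # building
    (128, 64, 128): 1,   # road
    (192, 0, 192): 2,    # static car
    (0, 128, 0): 3,      # tree
    (128, 128, 0): 4,    # low vegetation
    (64, 64, 0): 5,      # human
    (64, 0, 128): 6,     # moving car
    (0, 0, 0): 7,        # background clutter
}

def rgb_to_index(mask_path):
    rgb = np.array(Image.open(mask_path).convert("RGB"))
    ids = np.full(rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unmatched
    for color, idx in PALETTE.items():
        ids[np.all(rgb == color, axis=-1)] = idx
    return ids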
If you make use of UAVid, please cite our following works:
@Article{LYU2020108,
title={UAVid: A semantic segmentation dataset for UAV imagery},
author={Ye Lyu and George Vosselman and Gui-Song Xia and Alper Yilmaz and Michael Ying Yang},
journal={ISPRS Journal of Photogrammetry and Remote Sensing},
year={2020},
volume={165},
pages={108-119}
}
@misc{1810.10438,
title={The UAVid Dataset for Video Semantic Segmentation},
author={Ye Lyu and George Vosselman and Gui-Song Xia and Alper Yilmaz and Michael Ying Yang},
year={2018},
eprint={1810.10438},
archivePrefix={arXiv}
}
If you have any problems or feedback when using UAVid, please contact
iSAID is the first benchmark dataset for instance segmentation in aerial images. This large-scale, densely annotated dataset is built upon DOTA-v1.0 and contains 655,451 object instances of 15 categories across 2,806 high-resolution images. The distinctive characteristics of iSAID are the following: (a) a large number of images with high spatial resolution; (b) fifteen important and commonly occurring categories; (c) a large number of instances per category; (d) a large count of labelled instances per image, which can help in learning contextual information; (e) huge object-scale variation, with small, medium, and large objects often within the same image; (f) imbalanced and uneven distribution of objects with varying orientations within images, depicting real-life aerial conditions; (g) many small objects with ambiguous appearance that can only be resolved with contextual reasoning; (h) precise instance-level annotations carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines.
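Assuming the instance annotations are available in COCO-style JSON (e.g., as produced by the dataset's preprocessing tools; the file name below is a placeholder), a minimal browsing sketch with pycocotools:
# A minimal sketch for browsing iSAID instance annotations, assuming a
# COCO-style JSON file; the file name is a hypothetical placeholder.
from pycocotools.coco import COCO

coco = COCO("iSAID_train.json")
print(len(coco.getCatIds()))                   # 15 categories
img_ids = coco.getImgIds()
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
for ann in anns[:3]:
    cat = coco.loadCats(ann["category_id"])[0]["name"]
    print(cat, ann["bbox"])                    # polygons in ann["segmentation"]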
If you make use of iSAID, please cite our following works:
@InProceedings{waqas2019isaid,
title={iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images},
author={Waqas Zamir, Syed and Arora, Aditya and Gupta, Akshita and Khan, Salman and Sun, Guolei and Shahbaz Khan, Fahad and Zhu, Fan and Shao, Ling and Xia, Gui-Song and Bai, Xiang},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
pages={28--37},
year={2019}
}
@InProceedings{Xia_2018_CVPR,
title={DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
author={Gui-Song Xia and Xiang Bai and Jian Ding and Zhen Zhu and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month={June},
year={2018}
}
If you have any problems or feedback when using iSAID, please contact
School of Computer Science & State Key Lab. LIESMARS, Wuhan University
Luojia Hill, Bayi Road, Wuhan, 430079, China.
guisong.xia@whu.edu.cn
027-68772503