
  • Latest News

    • 2021-05-12 A workshop and challenge on Learning to Understand Aerial Images (LUAI) will be held in conjunction with IEEE ICCV 2021!
    • 2021-04-30 A challenge on Intelligent Interpretation of Aerial Images (IIAI) will be held in conjunction with PRCV 2021!
    • 2021-02-05 A new benchmark, DOTA-v2.0, including the dataset, a code library, and 70 baselines, is released.
    • 2020-07-20 UAVid'2020, a dataset for UAV Video Semantic Segmentation, is now available online.

    Datasets

    • DOTA
      Aerial Oriented Object Detection
      1.8 million instances, 11,268 images, 18 classes.
    • Million-AID
      Aerial Scene Classification
      1 million images (100 × 100 ~ 30,000 × 30,000), 51 classes.
    • AID
      Aerial Scene Classification
      10,000 images (600 × 600), 30 classes.
    • WHU-RS19
      Aerial Scene Classification
      1,005 images (600 × 600), 19 classes.
    • NaSC-TG2
      Aerial Scene Classification
      20,000 images (128 × 128), 10 classes.
    • GID
      Land Use Classification
      150 images (6,800 × 7,200), 15 classes.
    • SECOND
      Aerial Change Detection
      4,662 image pairs (512 × 512), 30 classes.
    • UAVid
      UAV Video Semantic Segmentation
      42 video sequences (~4,000 × 2,160), 8 classes.
    • iSAID
      Aerial Image Instance Segmentation
      655,451 instances, 2,806 images, 15 classes.
  • DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors for aerial imagery. The images are collected from different sensors and platforms. Image sizes range from 800 × 800 to 20,000 × 20,000 pixels, and the objects exhibit a wide variety of scales, orientations, and shapes. The instances in DOTA images are annotated by experts in aerial image interpretation with arbitrary (8 d.o.f.) quadrilaterals. We will continue to update DOTA so that it grows in size and scope to reflect evolving real-world conditions. It currently has three versions:

    • DOTA-v1.0 contains 15 common categories, 2,806 images, and 188,282 instances. The proportions of the training, validation, and testing sets in DOTA-v1.0 are 1/2, 1/6, and 1/3, respectively.
    • DOTA-v1.5 uses the same images as DOTA-v1.0, but extremely small instances (less than 10 pixels) are also annotated. Moreover, a new category, “container crane”, is added. It contains 403,318 instances in total. The number of images and the dataset splits are the same as in DOTA-v1.0. This version was released for the DOAI Challenge 2019 on Object Detection in Aerial Images, held in conjunction with IEEE CVPR 2019.
    • DOTA-v2.0 collects more Google Earth, GF-2 satellite, and aerial images. There are 18 common categories, 11,268 images, and 1,793,658 instances in DOTA-v2.0. Compared to DOTA-v1.5, it adds the new categories “airport” and “helipad”. The 11,268 images of DOTA are split into training, validation, test-dev, and test-challenge sets. To avoid overfitting, the training and validation sets are smaller than the test sets. Furthermore, there are two test sets, namely test-dev and test-challenge. The training set contains 1,830 images and 268,627 instances. The validation set contains 593 images and 81,048 instances. We have released the images and ground truths for the training and validation sets. Test-dev contains 2,792 images and 353,346 instances; we have released the images but not the ground truths. Test-challenge contains 6,053 images and 1,090,637 instances; its images and ground truths will be available only during the challenge.
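    For reference, a DOTA annotation file typically lists one object per line: eight quadrilateral coordinates followed by the category name and a difficulty flag (the exact layout follows the official devkit). A minimal parsing sketch, assuming that per-line layout:

    ```python
    # Minimal sketch of reading one DOTA-style annotation line, assuming the
    # common layout: x1 y1 x2 y2 x3 y3 x4 y4 category difficult.

    def parse_dota_line(line):
        """Parse one annotation line into (quad, category, difficult)."""
        parts = line.split()
        coords = [float(v) for v in parts[:8]]        # 8 d.o.f. quadrilateral
        quad = list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), ..., (x4, y4)]
        category = parts[8]
        difficult = int(parts[9])
        return quad, category, difficult

    # Illustrative example line (made up for demonstration):
    sample = "939.0 851.0 1010.0 851.0 1010.0 913.0 939.0 913.0 storage-tank 0"
    quad, category, difficult = parse_dota_line(sample)
    ```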
    Data Download
    • Download DOTA-v1.0, DOTA-v1.5, DOTA-v2.0

    Citation

    If you make use of DOTA, please cite our following works:

    @misc{ding2021object,
    title={Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges},
    author={Jian Ding and Nan Xue and Gui-Song Xia and Xiang Bai and Wen Yang and Micheal Ying Yang and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
    year={2021},
    eprint={2102.12219},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
    }

    @InProceedings{Xia_2018_CVPR,
    title={DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
    author={Gui-Song Xia and Xiang Bai and Jian Ding and Zhen Zhu and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
    booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2018} }

    @InProceedings{Ding_2019_CVPR,
    title={Learning RoI Transformer for Detecting Oriented Objects in Aerial Images},
    author={Jian Ding and Nan Xue and Yang Long and Gui-Song Xia and Qikai Lu},
    booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2019}
    }

    Contact

    If you have any problems or feedback when using DOTA, please contact

    • Jian Ding at: jian.ding@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • Million-AID is a new large-scale benchmark dataset containing a million instances for RS scene classification. It covers 51 semantic scene categories, which are customized to match land-use classification standards, greatly enhancing the dataset's practicability. Unlike existing scene classification datasets, whose categories are organized with parallel or uncertain relationships, the scene categories in Million-AID are organized in a systematic relationship architecture, giving it superiority in management and scalability. Specifically, the categories are organized as a three-level hierarchical tree: 51 leaf nodes fall into 28 parent nodes at the second level, which are grouped into 8 nodes at the first level, representing the 8 underlying scene categories of agriculture land, commercial land, industrial land, public service land, residential land, transportation land, unutilized land, and water area. This category network gives the dataset an excellent organization of the relationships among scene categories, as well as scalability. The number of images in each scene category ranges from 2,000 to 45,000, endowing the dataset with a long-tail distribution. In addition, Million-AID surpasses existing scene classification datasets in spatial resolution, scale, and global distribution.
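    The three-level category tree described above can be sketched as a nested mapping. The 8 first-level categories below are taken from the text; the second-level and leaf names are illustrative placeholders only, not the dataset's actual taxonomy:

    ```python
    # Sketch of the three-level Million-AID category tree.
    # First-level keys come from the text; second-level keys and leaf lists
    # are hypothetical placeholders for illustration.
    category_tree = {
        "agriculture land":    {"arable land": ["dry field", "paddy field"]},
        "commercial land":     {"commercial area": ["commercial area"]},
        "industrial land":     {"factory area": ["works", "mine"]},
        "public service land": {"leisure land": ["baseball field"]},
        "residential land":    {"residential area": ["apartment"]},
        "transportation land": {"road": ["viaduct", "roundabout"]},
        "unutilized land":     {"bare land": ["desert", "ice land"]},
        "water area":          {"natural water": ["river", "lake"]},
    }

    # 8 first-level scene categories, as stated in the text.
    assert len(category_tree) == 8
    ```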

    Data Download
    • Download Million-AID

    Citation

    If you make use of Million-AID, please cite our following works:

    @Article{Long2021DiRS,
    title={On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID},
    author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
    journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
    year={2021},
    volume={14},
    pages={4205-4230}
    }

    @misc{Long2022ASP,
    title={Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling},
    author={Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren Li},
    year={2022},
    eprint={2201.01953},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
    }

    Contact

    If you have any problems or feedback when using Million-AID, please contact

    • Yang Long at: longyang@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • AID is a large-scale aerial image dataset built by collecting sample images from Google Earth imagery. Note that although Google Earth images are post-processed using RGB renderings of the original optical aerial images, it has been shown that there is no significant difference between Google Earth images and real optical aerial images, even for pixel-level land use/cover mapping. Thus, Google Earth images can also be used as aerial images for evaluating scene classification algorithms.

    The dataset is made up of the following 30 aerial scene types: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. All the images are labelled by specialists in the field of remote sensing image interpretation, and some samples of each class are shown in Fig. 1. In all, the AID dataset contains 10,000 images in 30 classes.

    The images in AID are multi-source, as Google Earth images come from different remote imaging sensors. This poses more challenges for scene classification than single-source images such as the UC-Merced dataset. Moreover, the sample images for each class in AID are carefully chosen from different countries and regions around the world (mainly China, the United States, England, France, Italy, Japan, and Germany), and are extracted at different times and seasons under different imaging conditions, which increases the intra-class diversity of the data.

    Data Download
    • Download AID

    Citation

    If you make use of AID, please cite our following work:

    @Article{Xia2017AID,
    title={AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification},
    author={Gui-Song Xia and Jingwen Hu and Fan Hu and Baoguang Shi and Xiang Bai and Yanfei Zhong and Liangpei Zhang and Xiaoqiang Lu},
    journal={IEEE Transactions on Geoscience and Remote Sensing},
    year={2017},
    volume={55},
    number={7},
    pages={3965-3981}
    }

    Contact

    If you have any problems or feedback when using AID, please contact

    • Jingwen Hu at: hujingwen@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • WHU-RS19 is a set of satellite images exported from Google Earth, which provides high-resolution satellite imagery with up to 0.5 m resolution. Some samples of the database are displayed in the following picture. It contains 19 classes of meaningful scenes in high-resolution satellite imagery: airport, beach, bridge, commercial, desert, farmland, footballfield, forest, industrial, meadow, mountain, park, parking, pond, port, railwaystation, residential, river, and viaduct. There are about 50 samples for each class. It is worth noting that image samples of the same class are collected from different regions in satellite images of different resolutions, and thus may have different scales, orientations, and illuminations.

    Data Download
    • Download WHU-RS19

    Citation

    If you make use of WHU-RS19, please cite our following works:

    @InProceedings{Xia2010WHURS19,
    title={Structural high-resolution satellite image indexing},
    author={Gui-Song Xia and Wen Yang and Julie Delon and Yann Gousseau and Hong Sun and Henri Maître},
    booktitle={Symposium: 100 Years ISPRS - Advancing Remote Sensing Science},
    year={2010},
    address={Vienna, Austria}
    }

    @Article{Dai2011WHURS19,
    title={Satellite Image Classification via Two-Layer Sparse Coding With Biased Image Representation},
    author={Dengxin Dai and Wen Yang},
    journal={IEEE Transactions on Geoscience and Remote Sensing},
    year={2011},
    volume={8},
    number={1},
    pages={173-176}
    }

  • NaSC-TG2 (Natural Scene Classification with Tiangong-2 Remotely Sensed Imagery) is a novel benchmark dataset for remote sensing natural scene classification, built from Tiangong-2 remotely sensed imagery. The goal of this dataset is to expand and enrich the annotation data available for advancing remote sensing classification algorithms, especially for natural scene classification. The dataset contains 20,000 images, equally divided into 10 scene classes: beach, circle farmland, cloud, desert, forest, mountain, rectangle farmland, residential, river, and snowberg. Each class includes 2,000 images with a size of 128 × 128 pixels and a spatial resolution of 100 m. Compared with other datasets collected from Google Earth, NaSC-TG2 offers abundant natural scenes with a novel spatial scale and imaging performance. In addition to true-color RGB images, the NaSC-TG2 dataset also provides the corresponding 14-band multi-spectral scene images, offering valuable experimental data for research on high-dimensional scene image classification algorithms.

    Data Download
    • Download NaSC-TG2

    Citation

    If you make use of NaSC-TG2, please cite our following work:

    @Article{Zhou2021NaSCTG2,
    title={NaSC-TG2: Natural Scene Classification With Tiangong-2 Remotely Sensed Imagery},
    author={Zhuang Zhou and Shengyang Li and Wei Wu and Weilong Guo and Xuan Li and Gui-Song Xia and Zifei Zhao},
    journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
    year={2021},
    volume={14},
    pages={3228-3242}
    }

    Contact

    If you have any problems or feedback when using NaSC-TG2, please contact

    • Zhuang Zhou at: zhouzhuang@csu.ac.cn
    • Shengyang Li at: shyli@csu.ac.cn
  • GID is a large-scale land-cover dataset built with Gaofen-2 (GF-2) satellite images. The Gaofen Image Dataset (GID) has superiorities over existing land-cover datasets because of its large coverage, wide distribution, and high spatial resolution. GID consists of two parts: a large-scale classification set and a fine land-cover classification set. The large-scale classification set contains 150 pixel-level annotated GF-2 images, and the fine classification set is composed of 30,000 multi-scale image patches coupled with 10 pixel-level annotated GF-2 images. The training and validation data with 15 categories are collected and re-labeled based on the training and validation images with 5 categories, respectively.

    Data Download
    • Download GID

    Citation

    If you make use of GID, please cite our following work:

    @Article{Tong2020GID,
    title={Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models},
    author={Xin-Yi Tong and Gui-Song Xia and Qikai Lu and Huangfeng Shen and Shengyang Li and Shucheng You and Liangpei Zhang},
    journal={Remote Sensing of Environment},
    year={2020},
    volume={237},
    pages={111322}
    }

    Contact

    If you have any problems or feedback when using GID, please contact

    • Xinyi Tong at: xinyi.tong@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • SECOND is a large-scale aerial image dataset for semantic change detection (SCD). To set up a new benchmark for SCD problems with adequate quantity, sufficient categories, and proper annotation methods, we present SECOND, a well-annotated semantic change detection dataset. To ensure data diversity, we first collected 4,662 pairs of aerial images from several platforms and sensors. These image pairs are distributed over cities such as Hangzhou, Chengdu, and Shanghai. Each image has a size of 512 × 512 pixels and is annotated at the pixel level. The annotation of SECOND was carried out by an expert group in earth vision applications, which guarantees high label accuracy. For the change categories in SECOND, we focus on 6 main land-cover classes, i.e., non-vegetated ground surface, tree, low vegetation, water, building, and playground, which are frequently involved in natural and man-made geographical changes. It is worth noting that, in the new dataset, non-vegetated ground surface (n.v.g. surface for short) mainly corresponds to impervious surfaces and bare land. In summary, these 6 selected land-cover categories result in 30 common change categories (including non-change). Through random selection of image pairs, SECOND reflects the real distributions of land-cover categories when changes occur.
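    As a back-of-the-envelope check on the category count: if each semantic change is treated as an ordered (from, to) pair of two distinct land-cover classes, the 6 classes yield 6 × 5 = 30 combinations (how the single non-change category is folded into the official taxonomy follows the paper):

    ```python
    # Counting directed change types among the 6 land-cover classes.
    # Treating a change as an ordered (from, to) pair of distinct classes
    # gives 6 * 5 = 30 combinations.
    from itertools import permutations

    classes = ["n.v.g. surface", "tree", "low vegetation",
               "water", "building", "playground"]
    change_types = list(permutations(classes, 2))  # ordered pairs, no repeats
    print(len(change_types))  # 30
    ```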

    Data Download
    • Download SECOND

    Citation

    If you make use of SECOND, please cite our following work:

    @Misc{Yang2020SECOND,
    title={Semantic Change Detection with Asymmetric Siamese Networks},
    author={Kunping Yang and Gui-Song Xia and Zicheng Liu and Bo Du and Wen Yang and Marcello Pelillo and Liangpei Zhang},
    year={2020},
    eprint={arXiv:2010.05687}
    }

    Contact

    If you have any problems or feedback when using SECOND, please contact

    • Kunping Yang at: kunpingyang@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • UAVid is a UAV video dataset for the semantic segmentation task, focusing on urban scenes. As a new high-resolution UAV semantic segmentation dataset, UAVid brings new challenges, including large scale variation, moving-object recognition, and temporal consistency preservation. The dataset consists of 42 video sequences capturing 4K high-resolution images in slanted views. In total, 300 images have been densely labeled for the semantic labeling task. There are 8 semantic categories in UAVid: building, road, static car, tree, low vegetation, human, moving car, and background clutter.

    Data Download
    • Download UAVid

    Citation

    If you make use of UAVid, please cite our following works:

    @Article{LYU2020108,
    title={UAVid: A semantic segmentation dataset for UAV imagery},
    author={Ye Lyu and George Vosselman and Gui-Song Xia and Alper Yilmaz and Michael Ying Yang},
    journal={ISPRS Journal of Photogrammetry and Remote Sensing},
    year={2020},
    volume={165},
    pages={108-119}
    }

    @misc{1810.10438,
    Title={The UAVid Dataset for Video Semantic Segmentation},
    Author={Ye Lyu and George Vosselman and Guisong Xia and Alper Yilmaz and Michael Ying Yang},
    Year={2018},
    Eprint={arXiv:1810.10438},
    }

    Contact

    If you have any problems or feedback when using UAVid, please contact

    • Ye Lyu at: y.lyu@utwente.nl
    • Michael Ying Yang at: michael.yang@utwente.nl
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • iSAID is the first benchmark dataset for instance segmentation in aerial images. This large-scale, densely annotated dataset is built on the DOTA-v1.0 dataset and contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The distinctive characteristics of iSAID are: (a) a large number of images with high spatial resolution; (b) fifteen important and commonly occurring categories; (c) a large number of instances per category; (d) a large count of labelled instances per image, which might help in learning contextual information; (e) huge object-scale variation, with small, medium, and large objects often appearing within the same image; (f) an imbalanced and uneven distribution of objects with varying orientations within images, depicting real-life aerial conditions; (g) several small objects with ambiguous appearance that can only be resolved with contextual reasoning; (h) precise instance-level annotations carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines.

    Data Download
    • Download iSAID

    Citation

    If you make use of iSAID, please cite our following works:

    @InProceedings{waqas2019isaid,
    title={iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images},
    author={Waqas Zamir, Syed and Arora, Aditya and Gupta, Akshita and Khan, Salman and Sun, Guolei and Shahbaz Khan, Fahad and Zhu, Fan and Shao, Ling and Xia, Gui-Song and Bai, Xiang},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
    pages={28--37},
    year={2019}
    }

    @InProceedings{Xia_2018_CVPR,
    title={DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
    author={Gui-Song Xia and Xiang Bai and Jian Ding and Zhen Zhu and Serge Belongie and Jiebo Luo and Mihai Datcu and Marcello Pelillo and Liangpei Zhang},
    booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2018} }

    Contact

    If you have any problems or feedback when using iSAID, please contact

    • Syed Waqas Zamir at: waqas.zamir@inceptioniai.org
    • Aditya Arora at: aditya.arora@inceptioniai.org
    • Akshita Gupta at: akshita.gupta@inceptioniai.org
    • Jian Ding at: jian.ding@whu.edu.cn
    • Gui-Song Xia at: guisong.xia@whu.edu.cn
  • CAPTAIN

    School of Computer Science & State Key Lab. LIESMARS, Wuhan University

    Luojia Hill, Bayi Road, Wuhan, 430079, China.

    guisong.xia@whu.edu.cn

    027-68772503



© CAPTAIN, School of Computer Science, Wuhan University