Foundational models and satellite imagery
Aug 17, 2023 10 min

Introduction
Recent developments in large-scale text and vision models have revolutionized how downstream tasks are solved. These large-scale models are termed foundational models, and their versatility can be attributed to vast training data and high model capacity (a large number of tunable parameters). A prominent example is GPT, a family of large language models from OpenAI that took the world by storm with its chat-based application ChatGPT. In our recently published paper at IGARSS 2023 [1], we examine how such foundational models perform on one downstream task, image captioning with satellite imagery. The findings point towards subpar, near-random performance.
Experiments and Datasets
The authors test the zero-shot performance of the CLIP [2] and BLIP [3] vision-language models, along with their image-encoder-based variants, on the remote-sensing datasets EuroSAT and BigEarthNet-S2 [4, 5]. EuroSAT is a Land Use / Land Cover dataset with 27,000 images, and BigEarthNet-S2 is a large-scale multi-label dataset with 590,326 Sentinel-2 patches. More details of the experiments can be found in [1].
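To make the zero-shot protocol concrete, here is a minimal sketch of CLIP-style zero-shot classification: an image embedding is scored against one text embedding per class prompt via scaled cosine similarity, and the highest-scoring class wins. The random vectors below merely stand in for real CLIP features (which would require downloading a model); the class names are from EuroSAT.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: softmax over scaled cosine similarities
    between one image embedding and one text embedding per class prompt."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (text_embs @ image_emb)  # scaled cosine similarities
    probs = np.exp(logits - logits.max())           # numerically stable softmax
    return probs / probs.sum()

# Toy example: random vectors stand in for real CLIP embeddings.
rng = np.random.default_rng(0)
classes = ["AnnualCrop", "Forest", "River", "SeaLake"]  # EuroSAT class names
text_embs = rng.normal(size=(len(classes), 512))
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)   # close to "Forest"
probs = zero_shot_classify(image_emb, text_embs)
print(classes[int(np.argmax(probs))])
```

In the real evaluation, the text embeddings come from prompts such as "a photo of a forest" passed through CLIP's text encoder; the near-random results in [1] suggest those embeddings align poorly with satellite-image features.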
Results and Conclusion

[Table: Results]
The above table shows the zero-shot performance of CLIP and BLIP with different backbone networks. ‘Standard’ refers to the out-of-the-box model, and ‘Context’ refers to the model fine-tuned on geospatial datasets. It is worth noting that, while the standard models perform poorly, adding geospatial context does not necessarily improve performance; the effect depends on the model and the dataset.
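One common, lightweight way to add such geospatial context is to keep the foundation model's encoder frozen and train a small classifier head on its embeddings (a linear probe). The sketch below illustrates that idea with synthetic embeddings standing in for frozen CLIP image features; it is an illustration of domain adaptation in general, not the paper's exact fine-tuning procedure.

```python
import numpy as np

def train_linear_probe(embs, labels, n_classes, lr=0.5, steps=200):
    """Train a softmax classifier on frozen embeddings via gradient descent
    on the cross-entropy loss (a 'linear probe')."""
    W = np.zeros((embs.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = embs @ W + b
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(embs)             # softmax cross-entropy gradient
        W -= lr * (embs.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Synthetic, well-separated clusters stand in for frozen image features.
rng = np.random.default_rng(1)
centers = rng.normal(size=(3, 64))
labels = rng.integers(0, 3, size=300)
embs = centers[labels] + 0.1 * rng.normal(size=(300, 64))
W, b = train_linear_probe(embs, labels, n_classes=3)
acc = float(np.mean(np.argmax(embs @ W + b, axis=1) == labels))
print(f"train accuracy: {acc:.2f}")
```

If the frozen embeddings already separate the geospatial classes, a probe like this succeeds cheaply; the mixed ‘Context’ results in the table suggest that for satellite imagery this separation cannot be taken for granted.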
References
- A. Panigrahi, S. Verma, M. Terris, and M. Vakalopoulou, “Have foundational models seen satellite images?,” in IGARSS, Pasadena, United States, Jul 2023. hal-04112634
- A. Radford, J. W. Kim, C. Hallacy, et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021.
- J. Li, D. Li, C. Xiong, and S. C. H. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in ICML, 2022.
- P. Helber, B. Bischke, A. R. Dengel, and D. Borth, “EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification,” IEEE JSTARS, vol. 12, pp. 2217–2226, 2019.
- G. Sumbul, M. Charfuelan, B. Demir, and V. Markl, “BigEarthNet: A large-scale benchmark archive for remote sensing image understanding,” IGARSS, pp. 5901–5904, 2019.
Tags: Foundational models, GIS, CLIP, BLIP, EuroSAT, BigEarthNet-S2, Fine Tuning