1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results
The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available (https://seadronessee.cs.uni-tuebingen.de/macvi).
Investigating Model Robustness Against Sensor Variation
Large datasets of geospatial satellite images are available online, exhibiting significant variations in both image quality and content. These variations in image quality stem from the image processing pipeline and image acquisition settings, resulting in subtle differences even within datasets of images acquired by the same satellites. Recent progress in the field of image processing has considerably enhanced capabilities in noise and artifact removal, as well as image super-resolution. Consequently, it has become possible to homogenize geospatial image datasets by reducing intra-dataset variations in image quality. In this work, we show that conventional object detection and segmentation neural networks trained on geospatial data are robust neither to noise and artifact removal preprocessing nor to mild resolution variations.
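The abstract does not describe the evaluation protocol. As a minimal sketch, assuming robustness is measured as pixel-level agreement between predictions on an original image and on a mildly resampled copy, such a probe might look like this. `toy_segmenter`, `degrade_resolution` and `prediction_agreement` are hypothetical helpers, and the thresholding "model" only stands in for a trained network.

```python
def toy_segmenter(image, threshold=0.5):
    # Hypothetical stand-in for a trained segmentation network:
    # a pixel is foreground when its value exceeds a fixed threshold.
    return [[1 if px > threshold else 0 for px in row] for row in image]


def degrade_resolution(image, factor=2):
    """Simulate a mild resolution change: average-pool by `factor`,
    then upsample back to the original size by nearest-neighbour replication."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, factor):
        for j in range(0, w, factor):
            rows = range(i, min(i + factor, h))
            cols = range(j, min(j + factor, w))
            block = [image[a][b] for a in rows for b in cols]
            mean = sum(block) / len(block)
            for a in rows:
                for b in cols:
                    out[a][b] = mean
    return out


def prediction_agreement(model, image, perturb):
    """Fraction of pixels whose predicted label is unchanged by `perturb`."""
    clean = model(image)
    perturbed = model(perturb(image))
    total = sum(len(row) for row in clean)
    same = sum(c == p for rc, rp in zip(clean, perturbed) for c, p in zip(rc, rp))
    return same / total


# A single bright pixel (a small object) disappears after 2x resampling.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(prediction_agreement(toy_segmenter, image, degrade_resolution))
```

Pixel agreement is only one possible measure; for detection, comparing box-level metrics before and after the perturbation would be the analogous check.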
Detecting Urban Changes with Recurrent Neural Networks from Multitemporal Sentinel-2 Data
The advent of multitemporal high-resolution data, such as Copernicus Sentinel-2 imagery, has significantly enhanced the potential for monitoring the Earth's surface and environmental dynamics. In this paper, we present a novel deep learning framework for urban change detection that combines state-of-the-art fully convolutional networks (similar to U-Net) for feature representation with powerful recurrent networks (such as LSTMs) for temporal modeling. We report our results on the recently released, publicly available bi-temporal Onera Satellite Change Detection (OSCD) Sentinel-2 dataset, enhancing its temporal information with additional images of the same regions acquired on different dates. Moreover, we evaluate the performance of the recurrent networks, as well as the use of the additional dates, on the unseen test set using an ensemble cross-validation strategy. All models developed during the validation phase achieved an overall accuracy above 95%, while the use of LSTMs and additional temporal information boosts the F1 score of the change class by a further 1.5%.
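The abstract reports both overall accuracy and the F1 score of the change class. Because changed pixels are usually rare, the two can diverge sharply. The sketch below, which is an illustration and not the authors' evaluation code, computes both metrics on flattened binary change maps; `change_f1` and `overall_accuracy` are hypothetical helper names.

```python
def change_f1(pred, truth, change_label=1):
    """F1 score of the change class for flattened binary change maps."""
    pairs = list(zip(pred, truth))
    tp = sum(p == change_label and t == change_label for p, t in pairs)
    fp = sum(p == change_label and t != change_label for p, t in pairs)
    fn = sum(p != change_label and t == change_label for p, t in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def overall_accuracy(pred, truth):
    """Fraction of pixels (change and no-change) labelled correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# 100 pixels, only 4 of which changed; the prediction finds 2 of them
# and adds 1 false alarm.
truth = [1] * 4 + [0] * 96
pred = [1, 1, 0, 0] + [1] + [0] * 95
print(overall_accuracy(pred, truth))  # high, despite missing half the changes
print(change_f1(pred, truth))         # much lower
```

With this imbalance, three wrong pixels cost only 3 points of overall accuracy but pull the change-class F1 down to about 0.57, which is why the change-class F1 is the more informative number for gains such as the reported 1.5%.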
Sparsifying Networks via Subdifferential Inclusion
Sparsifying deep neural networks is of paramount interest in many areas, especially when those networks have to be implemented on low-memory devices. In this article, we propose a new formulation of the problem of generating sparse weights for a pre-trained neural network. By leveraging the properties of standard nonlinear activation functions, we show that the problem is equivalent to an approximate subdifferential inclusion problem. The accuracy of the approximation controls the sparsity. We show that the proposed approach is valid for a broad class of activation functions (ReLU, sigmoid, softmax). We propose an iterative optimization algorithm with guaranteed convergence to induce sparsity. Thanks to the algorithm's flexibility, sparsity can be ensured from partial training data in a minibatch manner. To demonstrate the effectiveness of our method, we perform experiments on various networks in different application contexts: image classification, speech recognition, natural language processing, and time-series forecasting.
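For ReLU, the link to subdifferential inclusions rests on a standard fact: ReLU is the proximity operator of the indicator function φ of the nonnegative orthant, and y = prox_φ(z) holds exactly when z − y ∈ ∂φ(y), where ∂φ(y) is the normal cone of the orthant at y. The sketch below is an illustration, not the authors' algorithm: it keeps the pre-trained layer output y fixed and measures how far a candidate sparse weight matrix is from satisfying that inclusion for one input. `affine` and `inclusion_residual` are hypothetical helpers, and the Euclidean distance to the cone is one assumed way of quantifying "approximate".

```python
import math


def relu(z):
    return [max(v, 0.0) for v in z]


def affine(W, x, b):
    """Compute W x + b for a dense matrix given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def inclusion_residual(z, y):
    """Euclidean distance from z - y to the normal cone of the nonnegative
    orthant at y (y assumed >= 0). It is 0 exactly when y = ReLU(z)."""
    sq = 0.0
    for zi, yi in zip(z, y):
        g = zi - yi
        if yi > 0:
            sq += g * g              # cone component at an active unit is {0}
        else:
            sq += max(g, 0.0) ** 2   # cone component at an inactive unit is (-inf, 0]
    return math.sqrt(sq)


# Pre-trained (dense) layer and the output we want to preserve.
W_dense = [[1.0, 0.05], [-0.5, 2.0]]
b = [0.0, 0.0]
x = [1.0, 1.0]
y = relu(affine(W_dense, x, b))

# Candidate sparse weights: the small entry 0.05 is pruned to zero.
W_sparse = [[1.0, 0.0], [-0.5, 2.0]]
print(inclusion_residual(affine(W_dense, x, b), y))   # 0.0: exact inclusion
print(inclusion_residual(affine(W_sparse, x, b), y))  # about 0.05: approximate
```

Allowing a larger residual admits sparser weight matrices, which matches the abstract's statement that the accuracy of the approximation controls the sparsity.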
Post Wildfire Burnt-up Detection using Siamese UNet
In this article, we present an approach for detecting areas burnt by wildfires in Sentinel-2 images by leveraging the power of Siamese neural networks. By employing a Siamese network, we are able to efficiently encode the feature extraction process for pairs of images. This is achieved with two branches that capture and combine information at different resolutions to make predictions; as in any Siamese network, the weights are shared between the two branches. This design makes it possible to effectively analyze the changes between two remote sensing images, enabling precise identification of areas impacted by forest wildfires in the state of California as part of the ChaBuD challenge, thereby assisting local authorities in monitoring the impacted regions and facilitating the restoration process. We experimented with various model architectures trained on the ChaBuD dataset and carefully evaluated their performance. Through rigorous testing and analysis, we achieved promising results, ultimately obtaining a final private score (IoU) of 0.7495 on the hidden test dataset. The code is available at https://github.com/kavyagupta/chabud. We have also deployed the final model as a point solution for anyone to use at https://firemap.io.
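The final score above is an Intersection over Union (IoU). As a minimal sketch, assuming binary burnt/unburnt masks flattened to lists, the metric is computed below. How the ChaBuD leaderboard handles images with no burnt pixels is not stated in the abstract; returning 1.0 when both masks are empty is an assumption of this sketch, and `iou` is a hypothetical helper name.

```python
def iou(pred, truth):
    """Intersection over Union of the positive (burnt) class for flattened
    binary masks. Assumption: returns 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else inter / union


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(iou(pred, truth))  # 2 overlapping pixels out of 4 in the union
```

Unlike pixel accuracy, IoU ignores the (usually dominant) unburnt background, so a score of 0.7495 reflects overlap on the burnt regions themselves.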
FloodNet-to-FloodGAN: Generating Flood Scenes in Aerial Images
A global rise in the occurrence of natural disasters and human-caused conflicts has put a spotlight on the need for Earth Observation (EO) data in designing practical Humanitarian Assistance and Disaster Relief (HADR) interventions. Novel techniques that leverage remotely sensed data are leading to a paradigm shift in our understanding of such situations and improving the efficacy of our response. Aerial flood maps can provide localized insight into the extent of flood-related damage and the degree to which communities’ access to shelter, clean water, and communication channels has been compromised. Unfortunately, such insights typically emerge only hours or days after a flooding event has occurred. Moreover, a dearth of available historical data restricts the development of practical machine learning based methods. This work examines the use of Generative Adversarial Networks (GANs) to simulate flooding in aerial images. We first introduce the Houston UAV dataset, an extension of the FloodNet dataset. Our dataset provides more well-defined semantic classes and significantly reduces the label noise in semantic masks. We propose a GAN-based pipeline that generates flood conditions in non-flooded regions, producing synthetic flooding scenes for predictive mapping. Code and dataset are available at https://github.com/granularai/flood-synthesis.
Aligning Geospatial AI for Disaster Relief with The Sphere Handbook
The Sphere Handbook and its core premise of the right to life with dignity have been broadly adopted, establishing a standard operating procedure for global humanitarian intervention. Many machine learning methods aim to aid disaster relief. Although they perform exceptionally well on their machine learning tasks, these methods fail to deliver targeted assistance to the victims of natural disasters. We argue that this is due to the misalignment of such methods with real-world relief practices. This paper aligns Geospatial AI solutions with the Sphere guidelines. We show several limitations of machine learning methods proposed for disaster relief in recent years. Taking the case of WASH (water, sanitation and hygiene) requirements during flood disasters, we extend these models to align with the Sphere guidelines and build a solution with much greater potential to serve people trapped in disasters.
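The abstract does not detail how the models are extended. As a minimal sketch of what "aligning" a flood map with a Sphere WASH standard could mean, the code below converts a binary flood mask and a co-registered population raster into a minimum daily water requirement, using the Sphere minimum average of 15 litres per person per day for drinking and domestic hygiene. The inputs, the rule that everyone in a flooded cell is affected, and the helper names `affected_population` and `minimum_daily_water_litres` are all hypothetical assumptions for illustration.

```python
# Sphere minimum average water quantity for drinking and domestic hygiene.
SPHERE_MIN_WATER_L_PER_PERSON_DAY = 15


def affected_population(flood_mask, population):
    """Sum population over flooded cells. Hypothetical inputs: a flattened
    binary flood map (e.g. from a segmentation model) and a co-registered,
    flattened population-count raster."""
    return sum(pop for flooded, pop in zip(flood_mask, population) if flooded)


def minimum_daily_water_litres(flood_mask, population,
                               litres_per_person=SPHERE_MIN_WATER_L_PER_PERSON_DAY):
    """Minimum water volume per day needed by the affected population."""
    return affected_population(flood_mask, population) * litres_per_person


flood_mask = [1, 0, 1, 1]
population = [120, 300, 80, 0]
print(minimum_daily_water_litres(flood_mask, population))  # 200 people x 15 L
```

The point of such a step is that the model's output is expressed in the units relief planners act on (people and litres) rather than only in pixel-level segmentation scores.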
GeoEngine: A Platform for Production-Ready Geospatial Research
Geospatial machine learning has seen tremendous academic advancement, but its practical application has been constrained by the difficulty of operationalizing performant and reliable solutions. Sourcing satellite imagery in real-world settings, handling terabytes of training data, and managing machine learning artifacts are a few of the challenges that have severely limited downstream innovation. In this paper, we introduce the GeoEngine platform for reproducible and production-ready geospatial machine learning research. GeoEngine removes key technical hurdles to adopting computer vision and deep learning-based geospatial solutions at scale. It is the first end-to-end geospatial machine learning platform, simplifying access to insights locked behind petabytes of imagery. Backed by a rigorous research methodology, this geospatial framework provides researchers with powerful abstractions for image sourcing, dataset development, model development, large-scale training, and model deployment. We describe the GeoEngine architecture and explain our design rationale in detail. We also present several real-world use cases of image sourcing, dataset development, and model building that have helped different organisations build and deploy geospatial solutions.
Have Foundational Models Seen Satellite Images?
This paper investigates the zero-shot performance of pre-trained foundation models on remote sensing tasks. Recent advances in self-supervised learning suggest that these models, when trained on vast amounts of unlabeled data, could improve generalization across a number of downstream tasks. Our study offers an empirical evaluation of these models on standard remote sensing benchmarks such as EuroSAT and BigEarthNet-S2, with the aim of determining whether these models encountered satellite imagery during their training phase. Moreover, we examine the impact of adding geospatial domain-specific textual descriptions of classes, contrasting them with standard class-based prompts. Our findings indicate that the fine-tuned BLIP models exhibit superior zero-shot performance on these benchmarks compared to their standard counterparts, indicating that fine-tuning on standard benchmarks enhances performance. Furthermore, the effect of adding geospatial context varies with the specific model and dataset. This work provides insights into the applicability of foundation models to remote sensing tasks and lays the groundwork for further research.
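Zero-shot classification with vision-language models of this kind typically embeds one text prompt per class, embeds the image, and picks the class whose prompt embedding is most similar. The sketch below shows that decision rule; the fixed toy vectors stand in for real image and text encoders, and the prompt template is an assumed example, not one of the prompts used in the paper. `build_prompts`, `cosine` and `zero_shot_predict` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_prompts(classes, template="a satellite image of {}."):
    """Class-based prompts; a geospatial description would swap the template."""
    return {c: template.format(c) for c in classes}


def zero_shot_predict(image_emb, class_text_embs):
    """Return the class whose prompt embedding best matches the image embedding."""
    return max(class_text_embs, key=lambda c: cosine(image_emb, class_text_embs[c]))


prompts = build_prompts(["forest", "river"])
# Toy embeddings standing in for text_encoder(prompts[c]) and image_encoder(img).
text_embs = {"forest": [1.0, 0.1], "river": [0.1, 1.0]}
image_emb = [0.9, 0.2]
print(prompts["forest"])
print(zero_shot_predict(image_emb, text_embs))
```

Comparing class-based prompts with geospatial descriptions amounts to changing only the text side of this rule, which is why its effect can vary from one model and dataset to another.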
CertViT: Certified Robustness of Pre-Trained Vision Transformers
Lipschitz-bounded neural networks are certifiably robust and offer a good trade-off between clean and certified accuracy. Existing Lipschitz bounding methods train from scratch and are limited to moderately sized networks (< 6M parameters). They require a fair amount of hyper-parameter tuning and are computationally prohibitive for large networks such as Vision Transformers (5M to 660M parameters). Because current methods are neither scalable nor flexible, obtaining certified robustness for transformers with them is infeasible. This work presents CertViT, a two-step proximal-projection method that achieves certified robustness starting from pre-trained weights. The proximal step tries to lower the Lipschitz bound, and the projection step tries to maintain the clean accuracy of the pre-trained weights. We show that CertViT networks have better certified accuracy than state-of-the-art Lipschitz-trained networks. We apply CertViT to several variants of pre-trained vision transformers and demonstrate their adversarial robustness under standard attacks.
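A common way to obtain a Lipschitz bound for a stack of linear layers with 1-Lipschitz activations (such as ReLU) is to multiply the layers' spectral norms, i.e. their largest singular values. The sketch below estimates each spectral norm by power iteration and forms that product. This is the generic bound that a proximal step would try to lower, not CertViT itself; in particular, how CertViT bounds self-attention layers is not described in this abstract. `spectral_norm` and `lipschitz_upper_bound` are hypothetical helpers.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        n = math.sqrt(sum(a * a for a in u))
        if n == 0.0:
            return 0.0
        v = [a / n for a in u]
    # ||W v|| for the (approximately) top right singular vector v.
    return math.sqrt(sum(a * a for a in matvec(W, v)))


def lipschitz_upper_bound(layers):
    """Product of per-layer spectral norms: an upper bound on the l2 Lipschitz
    constant of a feed-forward stack with 1-Lipschitz activations."""
    bound = 1.0
    for W in layers:
        bound *= spectral_norm(W)
    return bound


layers = [[[3.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
print(lipschitz_upper_bound(layers))  # approximately 3 * 2
```

Power iteration avoids a full SVD, which matters for large pre-trained models; the product bound is typically loose, which is one reason certified accuracy depends on how the weights are adjusted rather than on the bound alone.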
Investigating Large Vision Model Training Challenges on Satellite Datasets
Contrastive learning methods that bridge textual descriptions and images, such as Contrastive Language-Image Pre-training (CLIP), have demonstrated remarkable advancements. These foundation models have shown exceptional performance on zero-shot image classification, raising zero-shot ImageNet accuracy from the prior state of the art of 12% to 76%. However, these models have had limited exposure to satellite images during training, resulting in suboptimal performance on geospatial data. This limitation raises a pivotal question: can these foundation models, which have demonstrated potential across multiple domains, be trained on geospatial imagery out of the box? To answer this question, we study training CLIP on diverse geospatial datasets. We examine the unique challenges that arise in this context and discuss the strategies we employ to address them. We demonstrate that handling resolution is crucial when training CLIP-like models on a large multi-resolution dataset.
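For context, CLIP-style models are trained with a symmetric contrastive objective: in a batch of matched image-text pairs, each image should score highest against its own caption and vice versa. The minimal sketch below computes that loss for L2-normalised embeddings with a fixed temperature of 0.07 (CLIP learns the temperature, initialised at that value). It illustrates the objective only; the resolution-handling strategies studied in the paper are not specified in this abstract, and `clip_loss` is a hypothetical helper.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair (image_i, text_i) is the positive, all other
    pairs in the batch are negatives. Embeddings assumed L2-normalised."""
    n = len(image_embs)
    # logits[i][j] = similarity of image i and text j, scaled by 1 / temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
              for img in image_embs]
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(clip_loss(images, matched_texts))  # near 0: pairs are aligned
print(clip_loss(images, swapped_texts))  # large: pairs are mismatched
```

Because every other item in the batch acts as a negative, mixing images of very different ground resolutions in one batch changes which negatives are easy or hard, which is one plausible reason resolution handling matters for this objective.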
Europa: Increasing Accessibility of Geospatial Datasets
In this paper we introduce a novel platform for teams to develop rich, analysis-ready datasets for geospatial machine learning. Europa (https://europa.granular.ai) addresses longstanding challenges that remote sensing and machine vision researchers face when developing datasets, including data sourcing, dataset development and sharing. By simplifying and accelerating the dataset creation process, Europa serves to expedite the pace of geospatial machine learning innovation. The platform enables users to develop feature-rich, spatio-temporal datasets using multiple sources of satellite imagery. Europa supports the development of datasets for segmentation, classification, object detection, and change detection problems. Europa also enables collaborative dataset development, with a management protocol for crowdsourcing labels and annotations. The web interface and API are built upon a resilient dataset management protocol that supports versioning, forking and access control, enabling greater research collaboration.
QFabric: Multi-Task Change Detection Dataset
Detecting change through multi-image, multi-date remote sensing is essential to developing an understanding of global conditions. Despite recent advancements in remote sensing realized through deep learning, novel methods for accurate multi-image change detection remain unrealized. Recently, several promising methods have been proposed to address this topic, but a paucity of publicly available data limits the methods that can be assessed. In particular, there is limited work on categorizing the nature and status of change across an observation period. This paper introduces the first labeled dataset available for such a task. We present an open-source change detection dataset, termed QFabric, with 450,000 change polygons annotated across 504 locations in 100 different cities, covering a wide range of geographies and urban fabrics. QFabric is a temporal multi-task dataset with 6 change types and 9 change status classes. The geography and environment metadata around each polygon provide context that can be leveraged to build robust deep neural networks. We report multiple benchmarks on our dataset for the change detection, change type classification, and change status classification tasks. Project page: https://engine.granular.ai/organizations/granular/projects/631e0974b59aa3b615b0d29a/overview
Shrink & Cert: Bi-level Optimization for Certified Robustness
In this paper, we advance the concept of shrinking weights to train certifiably robust models from the fresh perspective of gradient-based bi-level optimization. Lack of robustness against adversarial attacks remains a challenge in safety-critical applications. Many attempts in the literature provide only empirical verification of defenses against specific attacks and can be easily broken. Methods in other lines of work can provide certified guarantees of model robustness only in limited scenarios and are computationally expensive. We present a weight shrinkage formulation that is computationally inexpensive and can be solved as a simple first-order optimization problem. We show that a model trained with our method has lower Lipschitz bounds in each layer, which directly provides formal guarantees on its certified robustness. We demonstrate that our approach, Shrink & Cert (SaC), achieves provably robust networks that simultaneously deliver excellent standard and robust accuracy. We demonstrate the success of our approach on the CIFAR-10 and ImageNet datasets and compare it with existing robust training techniques. Code: https://github.com/sagarverma/BiC
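Lower Lipschitz bounds translate into formal guarantees through a standard margin argument: if the logit map is L-Lipschitz in the l2 norm, the gap between the top logit and any other logit can change by at most √2 · L · ‖δ‖ under a perturbation δ, so the top-1 prediction is certified within a radius of margin / (√2 · L). The sketch below computes that radius; it illustrates the generic guarantee rather than the SaC training procedure, and `certified_l2_radius` is a hypothetical helper. In practice L would be the product of the per-layer bounds.

```python
import math


def certified_l2_radius(logits, lipschitz_const):
    """l2 radius within which the top-1 prediction provably cannot change,
    for a logit map that is `lipschitz_const`-Lipschitz in the l2 norm."""
    ranked = sorted(logits, reverse=True)
    margin = ranked[0] - ranked[1]
    return margin / (math.sqrt(2) * lipschitz_const)


logits = [2.0, 0.5, 0.1]
r_before = certified_l2_radius(logits, lipschitz_const=1.5)
r_after = certified_l2_radius(logits, lipschitz_const=0.75)  # bound halved
print(r_before, r_after)
```

The formula also shows why the training objective matters: lowering L enlarges the certified radius only if the logit margin does not shrink proportionally, so shrinking weights must be balanced against keeping the classifier's margins.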