Multidimensional Time-Series Analysis
I study and develop machine learning algorithms to understand large, multidimensional scientific datasets collected over time. I model these observations with architectures suited to characterizing temporal dependencies, such as recurrent neural networks and long short-term memory (LSTM) variants, transformers, convolutional neural networks, and state space models (Kalman and particle filters). My approaches typically use unsupervised pre-training on large unlabeled datasets, and the resulting models are then useful for downstream tasks such as forecasting, gap-filling, and anomaly and change detection. These approaches have been used on satellite image time series and applied to detect natural disasters and recovery; land cover change; urbanization; electricity use, access, and stability; power outages; and socioeconomic disruptions such as COVID-19-induced changes in human activities.
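As a small illustration of how a state space model can handle gaps, the sketch below runs a scalar Kalman filter with a local-level (random walk) model over a series in which missing observations are marked as NaN. It is a minimal example under stated assumptions, not the method used in the work described above: the function name `kalman_gap_fill` and the variance parameters `process_var` and `obs_var` are hypothetical, and only a forward filter is shown.

```python
import numpy as np

def kalman_gap_fill(y, process_var=1e-2, obs_var=1e-1):
    """Forward Kalman filter for a local-level model; NaNs in y mark gaps.

    Hypothetical helper for illustration. Returns the filtered state
    estimate at every time step, including the gaps.
    """
    y = np.asarray(y, dtype=float)
    valid = ~np.isnan(y)
    if not valid.any():
        raise ValueError("series has no observations")
    est = np.empty(len(y))
    # Start from the first observed value with a vague prior variance.
    x, p = y[valid][0], 1.0
    for t, obs in enumerate(y):
        # Predict: the level persists, its uncertainty grows.
        p = p + process_var
        if not np.isnan(obs):
            # Update: blend prediction and observation via the Kalman gain.
            k = p / (p + obs_var)
            x = x + k * (obs - x)
            p = (1.0 - k) * p
        # Inside a gap there is no update, so the prediction is carried forward.
        est[t] = x
    return est

series = [1.0, np.nan, np.nan, 1.0, 1.2, np.nan, 0.9]
filled = kalman_gap_fill(series)
```

With a local-level model the filter simply carries the last estimate across a gap. A smoother pass or a richer state (trend, seasonality) would be needed to interpolate between the observations on either side.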
Anomaly Detection: ‘Needle in the Haystack’ of Large Unlabeled Data
For this research thrust, I adapt anomaly detectors to search for ‘interesting’ or novel signals in large, multidimensional science datasets. These signals are often rare yet of high scientific importance, and searching for them is challenging or nearly infeasible, like looking for a needle in a haystack. For these studies, I use anomaly detectors spanning a wide range of neural network architectures, traditional one-class outlier detectors, and dimensionality reduction methods to identify anomalous samples in the datasets. These approaches are being applied to extract rare-class signals of scientific interest such as wildfires, volcanoes, gas flares, solar emission, and drifts in time series from power outages, urbanization, and natural hazards.
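To make the dimensionality-reduction route concrete, the following sketch scores samples by their reconstruction error under a low-rank PCA model, one common way to flag outliers in unlabeled data. It is an assumption-laden illustration rather than the detectors used in these studies: the data are synthetic, and the name `pca_anomaly_scores` and the choice of two components are hypothetical.

```python
import numpy as np

def pca_anomaly_scores(X, n_components=2):
    """Score each row of X by its PCA reconstruction error (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Principal directions are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_components]                 # shape: (n_components, n_features)
    # Project onto the principal subspace and map back to feature space.
    recon = Xc @ basis.T @ basis
    # Samples far from the subspace reconstruct poorly and score high.
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 samples lying near a 2-D subspace of a 10-D space.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))
# Inject five off-subspace samples to play the role of rare signals.
X[:5] += rng.normal(scale=3.0, size=(5, 10))

scores = pca_anomaly_scores(X, n_components=2)
top5 = np.argsort(scores)[-5:]                # indices of the highest scores
```

In practice the highest-scoring samples would go to a domain expert for review, since a large reconstruction error marks a sample as unusual, not necessarily as scientifically interesting.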
Geospatial Foundation Models
My research also focuses on supporting multi-institute collaborations for foundation model development using satellite imagery, contributing in varying capacities to model training, fine-tuning for downstream tasks, and red-teaming in support of NASA’s operationalization efforts. This research thrust has applications in studying the Earth’s land and atmosphere (weather) using higher-resolution imagery from Landsat/Sentinel and data from geostationary satellites such as GOES.
ML-Based Analysis-Ready Datasets and Assisting Stakeholders
The analyses derived from AI/ML have high scientific value and result in labeled, analysis-ready datasets for different applications and end users. Specifically, these datasets are derived from large, unlabeled datasets using unsupervised and self-supervised learning methods, and they yield targeted, application-specific products that can inform further studies. Examples of ML-based products include thermal anomalies at night (related to fires and gas flares), global human activity changes due to COVID-19, and global electricity access status; ongoing studies focus on offshore gas flares, volcanoes, power outages, and global electric grid resilience data catalogs. As a NASA Science Team and Implementation Team member, I produce these openly available datasets to support stakeholder needs.