Flash floods are among the deadliest weather events, killing more than 5,000 people each year. They are also notoriously difficult to predict. Google thinks news articles could help.
Although vast amounts of weather data are collected, flash floods have never been measured comprehensively over time the way temperature and river flows have. Without that historical record, even advanced deep learning models struggle to forecast them.
To address this, Google researchers used Gemini, the company's large language model, to sift through five million news articles from around the world and identify reports of 2.6 million floods. From those reports they built a geo-tagged time series called "Groundsource." According to Gila Loike, a product manager at Google Research, this is the first time language models have been used for this kind of work. The research and the dataset have been publicly released.
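To make the idea of turning scattered news reports into a geo-tagged time series more concrete, here is a minimal sketch. It is not Groundsource's actual pipeline. It assumes an LLM has already pulled a date and coordinates out of each article, and the `FloodReport` schema, the `build_time_series` helper, and the 0.05-degree grid are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FloodReport:
    """One flood mention as an LLM might extract it from an article (hypothetical schema)."""
    day: date
    lat: float
    lon: float


def grid_cell(lat: float, lon: float, cell_deg: float) -> tuple[int, int]:
    # Snap coordinates to a regular lat/lon grid; floor keeps negative values consistent.
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_time_series(reports, cell_deg: float = 0.05) -> dict[tuple[int, int], list[date]]:
    # Many articles can describe the same flood, so keep one event per grid cell per day.
    events = defaultdict(set)
    for r in reports:
        events[grid_cell(r.lat, r.lon, cell_deg)].add(r.day)
    # Each cell becomes a sorted list of flood dates: a tiny geo-tagged time series.
    return {cell: sorted(days) for cell, days in events.items()}


# Usage: two articles about the same flood collapse into a single event.
reports = [
    FloodReport(date(2021, 7, 14), 10.01, 20.01),
    FloodReport(date(2021, 7, 14), 10.02, 20.03),  # second article, same flood
    FloodReport(date(2021, 7, 20), 10.02, 20.03),
    FloodReport(date(2022, 1, 3), -1.01, 30.02),
]
series = build_time_series(reports)
```

Deduplicating by cell and day is one simple way to keep heavily covered floods from being counted many times; a real system would need more careful event matching.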
Using Groundsource as a baseline, the researchers trained a Long Short-Term Memory (LSTM) neural network to analyze global weather forecasts and estimate the probability of flash floods.
Google's model now flags flash flood risk in urban areas across 150 countries on its Flood Hub platform, and the company shares the data with emergency response agencies worldwide. António José Beleza of the Southern African Development Community, who trialed the model, said it improved flood response times.
The model has limitations. Its resolution is coarse, identifying risk across areas of 20 square kilometers, and it is less precise than the US National Weather Service's flood alert system, partly because it lacks local radar data.
The project is aimed at areas that lack the resources for costly weather infrastructure or comprehensive meteorological records.
"By aggregating millions of reports, the Groundsource dataset helps rebalance the map," said Juliet Rothenberg, a program manager on Google's Resilience team. "It allows extrapolation to less documented regions."
Rothenberg said the same approach, using LLMs to build datasets from qualitative sources, could be applied to other phenomena such as heat waves and mudslides.
Marshall Moutenot, CEO of Upstream Tech, which uses deep learning models to forecast river flows, sees Google's contribution as part of a broader effort to build the datasets such forecasting models need. Moutenot co-founded dynamical.org, which curates machine learning-ready weather data for researchers.
"Data scarcity is a significant challenge in geophysics," Moutenot said. "There's both an abundance and a lack of Earth data. This approach creatively addresses that gap."
