With as many as 800,000 forgotten oil and gas wells scattered across the US, researchers from Lawrence Berkeley National Laboratory (LBNL) have developed an AI model capable of accurately locating, at scale, wells that may be leaking toxic chemicals and greenhouse gases, like methane, into the environment.
The model is designed to identify many of the roughly 3.7M oil and gas wells dug in the US since the mid-1800s.
But its primary purpose is to help find a particular subset of wells: undocumented orphaned wells (UOWs).
These wells don’t appear on official records, and have no known owner, leaving no legal entity responsible for sealing these “orphans.” Moreover, these wells’ locations, especially ones drilled more than a century ago—when wellheads were often six inches in diameter—rarely appear in official databases that identify oil and gas wells.
Making matters worse, these leaky wells aren’t anomalous.
Across the roughly three million square miles of US land, there are an estimated 300,000 to 800,000 UOWs.
The only way to prevent potentially leaking wells from harming the environment is by sealing them—which is usually done with concrete.
But before a wellhead can be sealed, it must be found.
To accurately identify UOWs at scale, the team from LBNL trained U-Net, a convolutional neural network designed for image segmentation, on digitized maps of the US created between 1947 and 1992.
A key feature of these so-called “quadrangle” maps, which the US Geological Survey aggregated and digitized, is their uniformity and georeferencing. Their symbols and coloring for things like wellheads, oil rigs, and forests are largely the same, and each symbol accurately corresponds to a specific longitude and latitude.
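To make the idea of georeferencing concrete, here is a minimal sketch of how a symbol's pixel position on a scanned map could be turned into a longitude and latitude. It is an illustration only, not the researchers' code: the `MapBounds` type, the `pixel_to_lonlat` function, and the example corner coordinates are all hypothetical, and the sketch assumes a north-up map whose edges line up with known longitude and latitude values. Real quadrangle maps use map projections that need a proper coordinate transform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapBounds:
    """Hypothetical description of a north-up, georeferenced map scan."""
    west: float      # longitude of the left edge, in degrees
    east: float      # longitude of the right edge, in degrees
    south: float     # latitude of the bottom edge, in degrees
    north: float     # latitude of the top edge, in degrees
    width_px: int    # image width in pixels
    height_px: int   # image height in pixels


def pixel_to_lonlat(bounds: MapBounds, col: float, row: float) -> tuple[float, float]:
    """Linearly interpolate a pixel position to (longitude, latitude).

    Assumes pixel (0, 0) is the top-left corner and that the map is not
    rotated or skewed, which is a simplification of real georeferencing.
    """
    lon = bounds.west + (col / bounds.width_px) * (bounds.east - bounds.west)
    # Row numbers grow downward, while latitude grows upward.
    lat = bounds.north - (row / bounds.height_px) * (bounds.north - bounds.south)
    return lon, lat


# Made-up example: a 7.5-minute map scanned at 1000 x 1000 pixels.
example_map = MapBounds(west=-119.0, east=-118.875, south=35.25, north=35.375,
                        width_px=1000, height_px=1000)
center_lon, center_lat = pixel_to_lonlat(example_map, 500, 500)
```

Because every map in the collection carries this kind of location information, a symbol detected anywhere on a sheet can be compared directly with coordinates from other sources.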
“One great feature of these maps is that they’re extremely consistent throughout the entire surface of the United States,” said Fabio Ciulla, one of the Lawrence Berkeley researchers, and the lead author of a paper outlining their work with AI and UOWs. “We chose to use this particular set of historical topographic maps because we could investigate UOWs at a continental scale, using an approach that nobody has effectively done before.”
Using the National Energy Research Scientific Computing Center (NERSC) supercomputer at Lawrence Berkeley National Laboratory, which is powered by more than 6,000 NVIDIA A100 Tensor Core GPUs, the researchers trained their wellhead-finding model on maps of two California counties, Los Angeles and Kern, which in the early 1900s were top oil- and gas-producing counties.

Before their model fine-tuning began, the researchers manually annotated 79 digitized, georeferenced maps of LA and Kern counties to ensure the maps accurately identified every wellhead symbol.
With those updated maps, the team fine-tuned their model on all the georeferenced maps of the two California counties.
To identify UOWs, the researchers cross-referenced wellheads their model identified on the historical quadrangle maps with locations in a database the state of California keeps of known wellheads in LA and Kern counties.
When the model identified a new wellhead that was more than 100 meters from a known wellhead, the researchers treated it as a potential UOW. Across the four counties studied, two in California and two in Oklahoma, the researchers found 1,301 potential UOWs.
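The 100-meter rule can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the team's actual pipeline: the function names `haversine_m` and `potential_uows` are hypothetical, the coordinates are made up, distances are computed with the haversine formula on a spherical Earth, and every detected point is checked against every known well, which would be too slow for a real statewide database without a spatial index.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def potential_uows(detected, known, threshold_m=100.0):
    """Return detected (lat, lon) points farther than threshold_m from every known well."""
    return [
        d for d in detected
        if all(haversine_m(d[0], d[1], k[0], k[1]) > threshold_m for k in known)
    ]


# Made-up coordinates, for illustration only.
known_wells = [(35.0, -119.0)]
detected_wells = [
    (35.0005, -119.0),  # roughly 56 m from the known well: treated as documented
    (35.002, -119.0),   # roughly 222 m away: flagged as a potential UOW
]
candidates = potential_uows(detected_wells, known_wells)
```

In this toy example only the second detection is flagged, because the first sits within 100 meters of a well already in the database.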
Using satellite imagery from Google Earth and in-person visits to some of the potential UOW sites, the research team worked to verify the accuracy of their wellhead-finding process.
They discovered that the model’s accuracy in identifying UOWs varied, ranging from 31% to 98%.
In more rural areas, the model was highly accurate in identifying UOWs. However, it was less accurate in urban areas, where potential UOWs might now be paved over, making them harder to verify, or where the model mistook symbols for roundabouts or cul-de-sacs for wellheads.
Importantly, the model proved transferable.
After running their cross-reference tests on LA and Kern counties, the team used the same fine-tuned model to look for UOWs in Oklahoma’s Osage and Oklahoma counties. Similar to Kern and LA counties, at the end of the last century Osage and Oklahoma counties were two of the country’s largest oil and gas producing counties.
Having never “seen” the Oklahoma maps, the model nevertheless identified potential UOWs with accuracy similar to what it achieved in Kern and LA counties.
“When we started thinking about our study, we wanted to find an algorithm that would scale to many regions across the US without having to retrain the model for many different locations,” said Charuleka Varadharajan, a staff scientist at LBNL, and senior author of the UOW study. “We saw that with the model trained just on the California maps, we still reached the same, if not higher precision in identifying potential UOWs in Oklahoma.”
The study is part of a Department of Energy program designed to help states identify UOWs.
Going forward, Ciulla and Varadharajan plan to continue to refine their model to expand to other locations, and work with states interested in using their work to identify UOWs.
Read the researchers’ paper on UOWs.
Check out additional reporting on the Berkeley researchers’ study as well as the Lawrence Berkeley Lab’s own report.