Developer Blog


Deep Learning Classifies Largest-Ever Catalog of Distant Galaxies

University of Pennsylvania researchers have used convolutional neural networks to catalog the morphology of 27 million galaxies, giving astronomers a massive dataset for studying the evolution of the universe. 

“Galaxy morphology is one of the key aspects of galaxy evolution,” said study author Helena Domínguez Sánchez, a former postdoc at Penn. “The shape and structure of galaxies has a lot of information about the way they were formed, and knowing their morphologies gives us clues as to the likely pathways for the formation of the galaxies.”

While past research projects focused on classifying images of bright, nearby galaxies, the team turned its neural network to fainter, more distant galaxies captured by the Dark Energy Survey, an international project to image an eighth of the sky.

The farther away a galaxy is from the Milky Way, the longer its light takes to reach us, so we see the galaxy as it was when that light set out. Images from the Dark Energy Survey, which captures more distant galaxies than the surveys used in previous studies, therefore “show us what galaxies looked like more than 6 billion years ago,” said Mariangela Bernardi, professor in the Department of Physics and Astronomy at Penn.
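To make the link between distance and time concrete: in a spatially flat ΛCDM cosmology, the lookback time to a galaxy at redshift z is t_L = (1/H0) ∫ from 0 to z of dz′ / [(1 + z′) E(z′)], where E(z) = √(Ωm(1 + z)³ + ΩΛ). The Python sketch below evaluates that integral numerically. The cosmological parameters (H0 = 70 km/s/Mpc, Ωm = 0.3, ΩΛ = 0.7) are assumed round numbers, not values taken from the study, and the function name is hypothetical.

```python
import math


def lookback_time_gyr(z, h0=70.0, omega_m=0.3, omega_lambda=0.7, steps=10_000):
    """Lookback time in Gyr to redshift z in a flat Lambda-CDM cosmology (assumed parameters).

    Evaluates t_L = (1/H0) * integral_0^z dz' / ((1 + z') * E(z')) with the
    midpoint rule, where E(z) = sqrt(omega_m * (1 + z)**3 + omega_lambda).
    """
    # Hubble time 1/H0 in Gyr for H0 given in km/s/Mpc.
    hubble_time_gyr = 977.8 / h0
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zp = (i + 0.5) * dz  # midpoint of the i-th redshift slice
        e_z = math.sqrt(omega_m * (1 + zp) ** 3 + omega_lambda)
        total += dz / ((1 + zp) * e_z)
    return hubble_time_gyr * total
```

With these assumed parameters, `lookback_time_gyr(1.0)` comes out at roughly 7.7 billion years, and the 6-billion-year mark falls between redshifts of about 0.6 and 0.7.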

The researchers already had a convolutional neural network (CNN) that could classify galaxies as spiral or elliptical, but it had been trained on nearby galaxies captured in the Sloan Digital Sky Survey. To teach the network to handle the fainter, more pixelated images of distant galaxies in the Dark Energy Survey, the team assembled a labeled dataset of 20,000 galaxies observed in both surveys, for which morphological classifications were already known.

They then created a synthetic dataset that simulates how those images would look if the galaxies were farther away.
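A common, generic way to make a galaxy image look more distant ("artificial redshifting") is to dim it, blur it with a point-spread function, shrink it onto fewer pixels, and add noise. The source does not describe the team's exact simulation pipeline, so the NumPy sketch below is only an illustration of that generic idea: the Gaussian PSF, the function names, and every parameter are hypothetical, and the step ordering is deliberately simplified.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 2D Gaussian, a simple stand-in for a telescope point-spread function."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 2D filtering; equals convolution because the kernel is symmetric."""
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def simulate_more_distant(image, dim_factor, psf_sigma, bin_factor, noise_sigma, rng):
    """Hypothetical, simplified degradation of a 2D galaxy image to mimic a more distant source."""
    # 1. Dim: less flux reaches us from a more distant galaxy.
    dimmed = image * dim_factor
    # 2. Blur with the (assumed Gaussian) point-spread function.
    blurred = blur(dimmed, gaussian_kernel(psf_sigma))
    # 3. Bin pixels so the galaxy spans fewer pixels; summing conserves flux.
    h = blurred.shape[0] - blurred.shape[0] % bin_factor
    w = blurred.shape[1] - blurred.shape[1] % bin_factor
    binned = blurred[:h, :w].reshape(h // bin_factor, bin_factor, w // bin_factor, bin_factor).sum(axis=(1, 3))
    # 4. Add Gaussian background noise.
    return binned + rng.normal(0.0, noise_sigma, size=binned.shape)
```

Pairing each degraded image with the original galaxy's known label is what lets such synthetic images serve as extra training data.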

Simulated spiral and elliptical galaxy images illustrate how fainter and more distant galaxies would look in the Dark Energy Survey dataset.

Once trained on a combination of simulated and real galaxy images, the CNN was applied to the full Dark Energy Survey dataset, classifying 27 million galaxies as early-type (broadly, ellipticals) or late-type (broadly, spirals), and as seen face-on or edge-on.
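Running a trained classifier over a catalog of this size is usually done in batches rather than one image at a time. The sketch below assumes, hypothetically, that the model returns two probabilities per galaxy, P(late-type) and P(edge-on), and turns them into the catalog's two labels with a 0.5 threshold. `predict_batch`, the batch size, and the threshold are placeholders, not details from the study.

```python
from typing import Any, Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical model output per galaxy: (P(late-type), P(edge-on)).
Probs = Tuple[float, float]


def _to_labels(probs: Sequence[Probs], threshold: float) -> Iterator[Tuple[str, str]]:
    """Threshold each probability pair into (morphology, orientation) labels."""
    for p_late, p_edge in probs:
        morphology = "late-type" if p_late >= threshold else "early-type"
        orientation = "edge-on" if p_edge >= threshold else "face-on"
        yield morphology, orientation


def classify_catalog(
    images: Iterable[Any],
    predict_batch: Callable[[List[Any]], Sequence[Probs]],
    batch_size: int = 1024,
    threshold: float = 0.5,
) -> Iterator[Tuple[str, str]]:
    """Stream images through the model in fixed-size batches, yielding labels in input order."""
    batch: List[Any] = []
    for image in images:
        batch.append(image)
        if len(batch) == batch_size:
            yield from _to_labels(predict_batch(batch), threshold)
            batch = []
    if batch:  # final partial batch
        yield from _to_labels(predict_batch(batch), threshold)
```

Because `classify_catalog` is a generator, labels can be written out as they are produced, so memory use depends on the batch size rather than on the number of galaxies in the catalog.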

The team used NVIDIA GPUs on Amazon Web Services for training and inference of their neural network. They found the model was 97 percent accurate at classifying the morphology of even faint galaxies too difficult to categorize by eye. 
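An accuracy figure like this is measured on galaxies whose classification is already known. One way to check that accuracy holds up for faint galaxies is to break the score down by apparent magnitude, where a larger magnitude means a fainter object. The function, bin edges, and labels below are hypothetical; the source does not describe the authors' evaluation procedure.

```python
from bisect import bisect_right
from collections import defaultdict


def accuracy_by_magnitude(magnitudes, predicted, true, bin_edges):
    """Fraction of correct labels in each apparent-magnitude bin.

    bin_edges must be sorted ascending; a galaxy with
    bin_edges[i] <= mag < bin_edges[i + 1] falls in bin i.
    Galaxies outside the edges are ignored.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for mag, p, t in zip(magnitudes, predicted, true):
        i = bisect_right(bin_edges, mag) - 1
        if 0 <= i < len(bin_edges) - 1:
            total[i] += 1
            correct[i] += int(p == t)
    return {(bin_edges[i], bin_edges[i + 1]): correct[i] / total[i] for i in sorted(total)}
```

Reporting one number per bin makes it visible whether errors concentrate among the faintest galaxies, which a single overall percentage can hide.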

The resulting collection is the largest multi-band catalog of automated galaxy morphologies to date.

“We pushed the limits by three orders of magnitude, to objects that are 1,000 times fainter than the original ones,” said lead author Jesús Vega-Ferrero. “That is why we were able to include so many more galaxies in the catalog.”

Next, the researchers are combining the morphological classifications with other properties of the galaxies, including age, mass, distance, star-formation rate, and chemical composition, to better understand the relationship between galaxy morphology and star formation.

Find the full study in Monthly Notices of the Royal Astronomical Society. A preprint of the paper is available on arXiv.