A new research paper published in the journal Nature Genetics describes how a Princeton University-led team developed a deep learning-based method for searching the human genome for mutations and their associations with diseases such as autism.
“Many mutations in DNA that contribute to disease are not in actual genes but instead lie in the 99% of the genome once considered ‘junk.’ Even though scientists have recently come to understand that these vast stretches of DNA do in fact play critical roles, deciphering these effects on a wide scale has been impossible until now,” the university wrote in a blog post.
In the new paper, the researchers analyzed the genomes of over 1,700 families in which one child has autism but other members do not. The system sorted through 120,000 mutations to find the ones that affect the behavior of genes in people with autism.
Using NVIDIA Tesla P100 GPUs on local clusters at Princeton and the Flatiron Institute, with the cuDNN-accelerated PyTorch deep learning framework, the researchers trained their convolutional neural network on data from the Human Gene Mutation Database.
“Applying this framework to 1,790 Autism Spectrum Disorder (ASD) simplex families reveals autism disease causality of noncoding mutations by demonstrating that ASD probands harbor transcriptional (TRDs) and post-transcriptional (RRDs) regulation-disrupting mutations of significantly higher functional impact than unaffected siblings,” the researchers stated in their paper. “Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development, and reveals a convergent genetic landscape of coding and noncoding (TRD and RRD) de novo mutations in ASD.”
The CNN can detect biologically relevant sections of DNA and can predict whether those sections play a role in the more than 2,000 protein interactions known to affect the regulation of genes. The algorithm can also detect whether disrupting a single pair of DNA units would affect those protein interactions.
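To make the single-base idea concrete, here is a minimal Python sketch of scoring a mutation by comparing a model's prediction on the original sequence with its prediction on the mutated one. It is an illustration under stated assumptions, not the researchers' code: the hand-written filter `MOTIF_FILTER` and the functions `one_hot`, `predict_binding`, and `mutation_effect` are hypothetical stand-ins for a trained network and its outputs.

```python
# Toy illustration only: a single hand-written filter stands in for a
# trained convolutional network. All names and values are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four numbers (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


# Hypothetical 4-position filter that responds to the motif "TATA".
MOTIF_FILTER = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
]


def predict_binding(seq):
    """Slide the filter along the sequence and return the best window score."""
    encoded = one_hot(seq)
    width = len(MOTIF_FILTER)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            weight * value
            for filter_row, base_row in zip(MOTIF_FILTER, window)
            for weight, value in zip(filter_row, base_row)
        )
        scores.append(score)
    return max(scores)


def mutation_effect(seq, position, alt_base):
    """Return the score change (mutated minus original) for one substitution."""
    mutated = seq[:position] + alt_base + seq[position + 1:]
    return predict_binding(mutated) - predict_binding(seq)
```

With the sequence `"GGTATAGG"`, calling `mutation_effect("GGTATAGG", 3, "C")` gives a negative value because the substitution breaks the motif, while changing the first base leaves the score unchanged. A real model would replace the toy filter with many learned filters and one output per protein interaction.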
“This method provides a framework for doing this analysis with any disease,” said Olga Troyanskaya, professor of computer science and genomics and a senior author of the study.
The work has the potential to help with similar analyses for neurological disorders, heart disease, cancer, and many other conditions that don’t have a clear genetic cause.
“This transforms the way we need to think about the possible causes of those diseases,” said Troyanskaya.