AI-Assisted Genome Studies Are Riddled with Errors

Credits: The Scientist

The genome serves as the blueprint for the body, influencing every trait from the shape of the face to the arches of the feet, and even the development of certain diseases. While some disorders, like cystic fibrosis, are linked to single genes and can be reliably predicted based on a person’s genetic data, many others, such as autism spectrum disorder, Alzheimer’s disease, depression, and obesity, are not.

For the past 15 years, scientists have used genome-wide association studies (GWAS) to compare the genomes of large groups of people and identify hundreds of thousands of genetic variants that are associated with a trait or disease. This method has helped scientists unravel the underlying biology and risk factors of complex diseases and has also led to the discovery of novel drug targets. Despite these advancements, GWAS have their limitations, which scientists have tried to address with the help of artificial intelligence (AI). However, in two studies published in Nature Genetics, researchers at the University of Wisconsin-Madison identified pervasive biases these new approaches can introduce when working with large but incomplete datasets.

GWAS rely on large biobanks with extensive patient data. However, these repositories can lack anything from blood reports, scans, and patient history to family data. Even with a thorough survey, challenges such as the lack of data on late-onset diseases in a cohort of young participants can throw a wrench into researchers’ plans.

To address gaps in the data, scientists developed two approaches: machine learning and GWAS-by-proxy (GWAX), which relies on family history data as predictors of late-onset diseases. Many researchers combine GWAS and GWAX to improve the statistical power of their predictions. However, the University of Wisconsin-Madison research team has found that these “solutions” can erroneously link gene variants with diseases.

“It has become very popular in recent years to leverage advances in machine learning, so we now have these advanced machine-learning AI models that researchers use to predict complex traits and disease risks with even limited data,” said Qiongshi Lu, a biostatistician at the University of Wisconsin-Madison and coauthor of the studies, in a press release.

By Sahana Sitaraman, PhD

Article can be accessed on: The Scientist