Overview of SAVANA. Credit: Nature Methods (2025). DOI: 10.1038/s41592-025-02708-0
Long-read sequencing technologies analyse long, continuous stretches of DNA. These methods have the potential to improve researchers' ability to detect complex genetic alterations in cancer genomes. However, cancer genomes are structurally complex. As a result, standard analysis tools often fall short, including existing methods developed specifically for long-read data. This leads to false-positive results and unreliable interpretations of the data.
To address this challenge, researchers developed SAVANA, a new algorithm described in the journal Nature Methods. SAVANA uses machine learning and long-read sequencing data to identify two kinds of change in cancer genomes. The first is structural variants, which are large genomic alterations such as insertions, deletions, duplications or rearrangements. The second is the copy number aberrations that result from them. The algorithm was developed and tested on 99 human tumour samples by researchers at EMBL's European Bioinformatics Institute (EMBL-EBI) and the R&D laboratory of Genomics England. They worked in collaboration with clinical partners at University College London (UCL), the Royal National Orthopaedic Hospital (RNOH), Instituto de Medicina Molecular João Lobo Antunes and Boston Children's Hospital.
The team also compared SAVANA's long-read results with Illumina short-read sequencing of the same samples. The Illumina data were analysed with the whole-genome sequencing pipeline used to deliver clinical reports. The findings were highly consistent across the two technologies. This shows that SAVANA performs on par with current clinical standards while also revealing additional cancer-relevant alterations.
______________________________________________________________________________________________________________
By European Molecular Biology Laboratory, edited by Sadie Harley, reviewed by Robert Egan
The article can be accessed on MedicalXpress.