Paralogous genes arise from gene duplications and, as a result, often have nearly identical sequences. Variants in these genes are thought to be responsible for diverse conditions – from spinal muscular atrophy to red-green color blindness – but they have so far evaded detection with standard sequencing techniques. Until recently, researchers relied on indirect methods to analyze these regions, often producing incomplete or inaccurate genetic profiles.
Now, a study published in Nature Communications introduces a computational tool that significantly enhances analysis of paralogous genes. The researchers demonstrate how this new method, powered by HiFi long-read sequencing, can resolve genetic variations that were previously difficult to detect, offering new insights into disease-related genes and population genetics.
The method – catchily known as “Paraphase” – works by aligning all haplotypes (gene copies inherited from parents) of a given set of paralogous genes, creating a phased, high-resolution genetic map. The researchers applied this approach to 316 genes across 160 duplicated regions of the genome, revealing unexpected variations in gene copy number and sequence diversity across different ancestral populations.
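To make the idea of phasing concrete, here is a deliberately simplified sketch in Python. It is not Paraphase's algorithm: the function name `phase_reads`, the toy reads, and the chosen sites are all invented for illustration. It assumes every long read spans the whole region, so reads can simply be grouped by the bases they carry at a few positions where gene copies are known to differ.

```python
from collections import Counter


def phase_reads(reads, sites, min_support=2):
    """Group reads by their bases at `sites`.

    Returns a dict mapping each haplotype signature to its read count,
    keeping only signatures seen in at least `min_support` reads.
    (Hypothetical helper for illustration only.)
    """
    signatures = Counter("".join(read[pos] for pos in sites) for read in reads)
    return {hap: n for hap, n in signatures.items() if n >= min_support}


# Toy full-length reads over an 8-base region (made-up data).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGA",
    "ACCTACGA",
    "ACCTACGA",
    "ACGTACGA",
    "ACGTACGA",
    "ACTTACGT",  # seen only once, so it is dropped as likely noise
]
sites = [2, 7]  # assumed positions where the gene copies differ

haplotypes = phase_reads(reads, sites)
print(haplotypes)
print("distinct haplotypes:", len(haplotypes))
```

In this toy case, the number of well-supported signatures stands in for the number of distinct gene copies in the sample. Real data would also require handling partial reads, sequencing errors, and identical copies, which this sketch ignores.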
By analyzing whole-genome sequencing data from 259 individuals representing five ancestral groups, the study found that certain genes have exceptionally low genetic diversity, suggesting they may be under strong evolutionary pressure. Some genes also showed high mutation rates, including de novo gene conversions, in which one gene copy acquires sequence from another, potentially influencing disease risk.
The study highlights nine medically significant genes, including those linked to spinal muscular atrophy (SMN1/SMN2), congenital adrenal hyperplasia (CYP21A2), and Lynch syndrome (PMS2). The ability to accurately distinguish genetic variants in these genes could improve genetic testing and disease diagnosis, reducing false negatives in clinical settings.
Corresponding author Xiao Chen said, “This study shows that with advanced sequencing technology and bioinformatics tools like Paraphase, it’s possible to untangle complex regions such as segmental duplications. By finally being able to accurately solve and study such hard-to-detect variants, the findings create the possibility of improving diagnostic yield for conditions underpinned by paralogous genes. The breakthrough also provides new opportunities for population genetics and understanding patient risk across generations.”