Delving into the Dark Code

Genome exploration tool Orion digs into non-coding regions for disease-causing mutations. Developer Ayal Gussow explains…

September 2017

What inspired you to develop Orion?

A large part of our research focuses on detecting disease-causing mutations in patients. As the non-coding regions of the genome are so poorly understood, mutations within those regions are generally ignored, even though they may be causing the disease. There was a clear need for a method that can assess whether a non-coding mutation is pathogenic.

Prior to Orion, we developed similar methods for localizing disease-causing mutations within protein subunits (1) and untranslated regions (2). Tackling the whole genome was a natural next step (3).

The process of developing the tool was quite enjoyable. Given the challenge posed by the vast size of the whole genome – three billion bases! – and the dearth of knowledge of the non-coding genome, we needed to properly harness computational resources and population genetics theory to develop Orion. Our team consisted of talented programmers and computational biologists, and working together to resolve these challenges was both interesting and exciting.

How does Orion work?

Orion is based on a cohort of 1,662 control genomes. We scanned this cohort’s genomes for regions that carry fewer mutations than we would expect. The underlying assumption of this approach is that regions with fewer observed mutations than expected are biologically important, so a mutation within them is more likely to cause disease.
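To make the idea of "fewer mutations than expected" concrete, here is a minimal sketch of a sliding-window depletion scan. It is an illustration only and not the published Orion method: the function name, the window and step sizes, the threshold, and the flat average-rate expectation are all assumptions chosen for simplicity, whereas Orion itself draws on population genetics theory to set its expectation.

```python
from bisect import bisect_left
from typing import List, Tuple


def depleted_windows(
    variant_positions: List[int],
    region_length: int,
    window: int = 500,        # hypothetical window size, in bases
    step: int = 100,          # hypothetical step between windows
    min_deficit: float = 0.5  # hypothetical cutoff on the depletion score
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for windows with fewer variants than expected.

    Assumption: the expected count is simply the region-wide average variant
    rate multiplied by the window size. The score is the fractional deficit,
    (expected - observed) / expected, so 1.0 means no variants at all.
    """
    positions = sorted(variant_positions)
    rate = len(positions) / region_length  # variants per base
    expected = rate * window
    hits = []
    for start in range(0, region_length - window + 1, step):
        end = start + window
        # Count variants falling in the half-open interval [start, end).
        observed = bisect_left(positions, end) - bisect_left(positions, start)
        score = (expected - observed) / expected if expected else 0.0
        if score >= min_deficit:
            hits.append((start, end, score))
    return hits
```

Given variant positions pooled across a control cohort, windows returned by this function would be candidates for "intolerant" regions under the simplified assumption above.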

Thus far, we have demonstrated that the Orion regions are significantly enriched for previously discovered pathogenic mutations. We anticipate that researchers can immediately implement Orion to detect new pathogenic mutations in patients.

In my view, our most striking finding was the comparison between conservation and intolerance. Conservation (as measured by GERP++) evaluates purifying selection across the mammalian lineage, whereas intolerance (as measured by Orion) evaluates purifying selection within humans. By comparing the two scores, we found that a subset of regulatory regions are very intolerant, but not very conserved – that is, these regions appear to be undergoing purifying selection in the human, but not the mammalian, lineage. What does that mean? That those regions are uniquely important to humans. It would be interesting to follow up by exploring their biological functions. And of course, from a clinical perspective, these results illustrate Orion’s value in detecting important, disease-relevant regions that cannot be spotted via conservation-based methods. 
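The comparison described here can be pictured as a simple filter over regions that have both scores. The sketch below is purely illustrative: the `RegionScores` type, the use of percentiles, and both thresholds are assumptions, not values taken from the Orion or GERP++ publications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionScores:
    name: str
    intolerance: float   # hypothetical percentile (0-100); higher = more intolerant
    conservation: float  # hypothetical percentile (0-100); higher = more conserved


def intolerant_not_conserved(
    regions: List[RegionScores],
    intolerance_min: float = 90.0,   # assumed cutoff for "very intolerant"
    conservation_max: float = 50.0,  # assumed cutoff for "not very conserved"
) -> List[str]:
    """Names of regions under apparent selection in humans but not across mammals."""
    return [
        r.name
        for r in regions
        if r.intolerance >= intolerance_min and r.conservation <= conservation_max
    ]
```

Regions passing such a filter would be the ones a conservation-only approach is likely to miss.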

How can pathologists use Orion?

We anticipate that Orion will point researchers to interesting portions of the genome that have unclear function, but are clearly under purifying selection. It would be interesting to explore these regions to better understand the biology of the non-coding genome.

We have created a web tool that allows researchers to view and extract scores, along with links to download the full set of scores and regions. Orion can easily be integrated into a mutation analysis pipeline, either by analyzing the scores of potential disease-causing mutations or by flagging mutations that fall in Orion intolerant regions. We have a great team and are always happy to help researchers who want to work with Orion, so feel free to email us with questions.
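As an illustration of the second route, flagging mutations that fall in intolerant regions, a pipeline step might look like the sketch below. The data layout is an assumption: regions are taken to be non-overlapping (chromosome, start, end) tuples with half-open coordinates, as in a BED-style file, and the function names are hypothetical. The actual format of the downloadable Orion regions should be checked before adapting this.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Variant = Tuple[str, int]      # (chromosome, position)


def build_index(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome and sort them by start coordinate."""
    index = defaultdict(list)
    for chrom, start, end in regions:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return dict(index)


def in_intolerant_region(
    index: Dict[str, List[Tuple[int, int]]], variant: Variant
) -> bool:
    """True if the variant lies inside any indexed region.

    Assumes regions on a chromosome do not overlap, so only the last
    interval starting at or before the position needs to be checked.
    """
    chrom, pos = variant
    intervals = index.get(chrom, [])
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def flag_variants(
    regions: Iterable[Region], variants: Iterable[Variant]
) -> List[Variant]:
    """Return the variants that fall within an intolerant region."""
    index = build_index(regions)
    return [v for v in variants if in_intolerant_region(index, v)]
```

Sorting once and using binary search keeps each lookup fast, which matters when a patient genome yields millions of candidate variants.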

In the clinic, Orion can already help assess the likelihood that a mutation is pathogenic. That said, we are currently working on applying Orion to a much larger cohort, which should vastly improve our resolution. And as more researchers and laboratory medicine professionals use Orion with their data, we will learn more about how best to deploy it and what type of improvements we may want to implement.

What are the next steps for Orion?

The other intolerance methodologies we previously developed for the exonic portion of the genome are based on several thousand to tens of thousands of samples – but because there are far fewer whole-genome samples publicly available, Orion is based on fewer than 2,000 samples. Adding more will greatly improve our resolution and the tool’s overall utility.

We are also analyzing the effects of mutations in Orion regions on gene expression, and using Orion to explore other features of the non-coding genome, such as lncRNAs.

  1. AB Gussow et al., “The intolerance to functional genetic variation of protein domains predicts the localization of pathogenic mutations within genes”, Genome Biol, 17, 9 (2016). PMID: 26781712.
  2. S Petrovski et al., “The intolerance of regulatory sequence to genetic variation predicts gene dosage sensitivity”, PLoS Genet, 11, e1005492 (2015). PMID: 26332131.
  3. AB Gussow et al., “Orion: Detecting regions of the human noncoding genome that are intolerant to variation using population genetics”, PLoS One, 12, e0181604 (2017). PMID: 28797091.