Patents, Pandemics, and Sequencing
The rise of genetic sequencing amid the pandemic
Joe Henderson, Michael Roberts
The COVID-19 pandemic has been a foundry for scientific achievements – and perhaps none more impressive than the use of mass genetic sequencing to track the spread of SARS-CoV-2 and identify mutations in it as they arise.
In spring 2020, a consortium known as COVID-19 Genomics UK (COG-UK) – consisting of National Health Service organizations, the four UK public health agencies, the Wellcome Sanger Institute, and more than 12 academic institutions – came into being. One of COG-UK’s aims was to perform genetic sequencing on as many positive COVID-19 test samples as possible. This has been a huge success, with over 440,000 genomes generated in the consortium’s first twelve months (1). Sequencing on this scale has contributed significantly to the UK’s efforts to identify and track the spread of viral variants.
Genetic sequencing has been used in earlier virus outbreaks – for instance, Ebola. However, mass sequencing on the scale seen in response to COVID-19 would, until recently, have been impossible. The cost of sequencing a human genome (admittedly much more complicated than that of SARS-CoV-2) has fallen from US$10 million per genome to less than $1,000 in the past 13 years (2) – and viral sequencing has followed suit. But now that low-cost, high-throughput sequencing is widely available, what lies ahead?
Sequencing the virus
By sequencing the RNA of SARS-CoV-2, we can gain insight into its features – including transmissibility, vaccine escape, and more. The first step of COG-UK’s sequencing process is to convert the viral RNA to a complementary DNA molecule via reverse transcription. Then, to sequence that DNA, the consortium uses both “second-generation” and “third-generation” techniques.
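The base-pairing logic behind reverse transcription can be sketched in a few lines of Python. This is a toy model only, not a description of COG-UK's laboratory protocol: the function name `reverse_transcribe` and the example fragment are assumptions made for illustration, and the enzyme chemistry is reduced to a lookup table.

```python
# Toy model of reverse transcription: RNA template -> complementary DNA (cDNA).
# Pairing rules: A pairs with T, U with A, G with C, C with G.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA strand (written 5'->3') for an RNA strand (written 5'->3')."""
    rna = rna.upper()
    unknown = set(rna) - set(RNA_TO_DNA)
    if unknown:
        raise ValueError(f"unexpected RNA bases: {sorted(unknown)}")
    # The two strands run antiparallel, so the template is read in reverse
    # while each base is swapped for its pairing partner.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


rna_fragment = "AUUAAAGGUU"  # illustrative 10-base RNA string (assumption)
print(reverse_transcribe(rna_fragment))  # AACCTTTAAT
```

The output is the reverse complement of the input, with thymine (T) in place of uracil (U), which is the form of the molecule that the sequencing steps then work on.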
In second-generation or “short-read” sequencing, the DNA strand is split into many fragments and each fragment is sequenced separately. This reduces error rates and sequencing time. Sequencing all of the fragments in parallel also gives high throughput, which lowers cost and raises speed further. For its short-read work, COG-UK uses a method known as sequencing by synthesis (SBS).
In SBS, the single-stranded DNA fragments to be sequenced are first amplified (important for quality control purposes). Each fragment then undergoes a synthesis reaction in which DNA polymerase builds a complementary strand. The bases used to build the complementary strands are each fluorescently tagged (see Figure 1), and the tag colors identify each base as it is added, revealing the sequence of the fragment. After synthesis, the full DNA sequence is built by piecing together fragments with overlapping sequences.
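The color-reading step can be pictured as a simple lookup, sketched below in Python. The color-to-base assignment is hypothetical (real instruments use their own dye chemistry and image processing), and the function names are assumptions introduced only for this illustration.

```python
# Hypothetical color-to-base assignment, chosen only for illustration.
COLOR_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_synthesized_strand(colors):
    """Translate one observed color per synthesis cycle into the strand being built."""
    return "".join(COLOR_TO_BASE[color] for color in colors)


def infer_template(synthesized):
    """The template fragment is the reverse complement of the synthesized strand."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


cycles = ["green", "blue", "blue", "yellow", "red"]  # one color per cycle
built = call_synthesized_strand(cycles)
print(built)                  # ACCGT
print(infer_template(built))  # ACGGT
```

Each cycle adds one tagged base, so the list of colors observed over successive cycles spells out the complementary strand, from which the original fragment follows directly.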
The SARS-CoV-2 genome is about 30,000 bases long (4). Because the fragments in short-read sequencing are typically tens or hundreds of bases long, many of them must be pieced together to yield a complete genome. This is computationally complex, especially if the sequence is not already well known. Clearly, avoiding or reducing this computational burden is desirable – and that’s where third-generation sequencing comes in. Third-generation sequencing can produce much longer reads, which removes or substantially reduces the assembly work involved.
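To see why piecing fragments together is costly, consider a minimal greedy overlap assembler in Python. It is a toy sketch under stated assumptions, not the method used by COG-UK or by production software, which relies on far more sophisticated approaches: the reads are error-free, all come from the same strand, and the names `overlap` and `greedy_assemble` and the `min_len` threshold are invented for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments that shares the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        # Every ordered pair is compared, so each round costs O(n^2) comparisons.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


reads = ["TTTATACCTTCC", "ATTAAAGGTT", "AAGGTTTATA"]  # illustrative short reads
print(greedy_assemble(reads))  # ['ATTAAAGGTTTATACCTTCC']
```

Even this simplified version compares every fragment with every other fragment on each round. With reads of tens or hundreds of bases covering a 30,000-base genome many times over, the number of comparisons grows quickly, which is the burden that longer reads help to relieve.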
Upping the ante
Although long-read sequencing is still in its infancy, there are several commercially available technologies that can cheaply sequence long strands of DNA with acceptable accuracy. COG-UK uses nanopore sequencing, which involves a nanometer-scale protein pore in a membrane surrounded by an electrolyte solution. An electric field applied across the membrane drives a constant flow of charged particles from the electrolyte solution through the pore; this flow of charged particles can be detected as an electrical current.
During sequencing, a DNA strand is drawn through the nanopore. Once in the pore, the DNA restricts the flow of charged particles, causing a drop in the detected current. Because the bases have different geometries, sizes, and chemical compositions, the current drop changes as each base passes through the pore.
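The idea of reading bases from changes in current can be illustrated with a deliberately simplified Python sketch. Everything numeric here is an assumption: the per-base current levels are invented, each base is assumed to occupy the pore for a fixed number of samples, and real base-calling software has to cope with noise, variable speed, and signals shaped by several neighboring bases at once.

```python
from statistics import mean

# Hypothetical blocked-current levels in picoamps, one per base (assumption).
LEVELS_PA = {"A": 52.0, "C": 47.0, "G": 41.0, "T": 36.0}


def call_bases(samples, samples_per_base: int = 3) -> str:
    """Average each fixed-size window of current samples and pick the closest level."""
    calls = []
    for start in range(0, len(samples) - samples_per_base + 1, samples_per_base):
        window_mean = mean(samples[start:start + samples_per_base])
        # Choose the base whose reference level is nearest to the observed mean.
        base = min(LEVELS_PA, key=lambda b: abs(LEVELS_PA[b] - window_mean))
        calls.append(base)
    return "".join(calls)


trace = [51.6, 52.3, 52.1,   # three samples near the "A" level
         36.4, 35.8, 36.1,   # near "T"
         40.7, 41.5, 41.0,   # near "G"
         47.2, 46.6, 47.1]   # near "C"
print(call_bases(trace))  # ATGC
```

Because the signal is interpreted as the strand moves through the pore, the read emerges base by base, without first cutting the strand into short fragments.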
Most nanopore sequencing systems comprise an array of nanopores and so can sequence multiple strands in parallel. This leads to three benefits: i) much lower computational requirements, ii) effective, real-time sequencing without the need for amplification, and iii) sequencing devices that can be made small and portable.
But the work isn’t over yet. We can expect further developments in long-read systems – for example, to reduce error rates and increase speed. Work is already being done to produce nanopore sequencing systems that don’t rely on protein nanopores. Solid-state nanopores, perhaps made of graphene, would allow for lower-cost nanopore sequencing at scale.
We can look at patent application numbers as a proxy for innovation and as an indication of how sequencing will evolve. Patent applications must be filed before an invention can be made available to the public – so high numbers of patent applications indicate that exciting developments are in the pipeline. The Wellcome Sanger Institute has identified five companies whose sequencing equipment it uses; Figure 3 shows the number of PCT patent applications those companies published each year in the categories of “measuring or testing processes involving enzymes, nucleic acids, or microorganisms” and “investigative methods involving various entities including entities relevant to sequencing.”
The patent filings from these companies indicate that we can look forward to significant innovation in this field in the coming years.
The benefits of genetic sequencing for scientific research are self-evident. And, if COVID-19 is here to stay, the general public may become familiar with genetic sequencing systems, too – because sequencing can be used as an accurate and reliable test for COVID-19 in its own right. As sequencing systems become cheaper, quicker, and more portable, it’s quite possible that we will see them appear in venues like airports for COVID-19 testing purposes in the near future.
1. COVID-19 Genomics UK Consortium, “What’s next for COG-UK?” (2021). Available at: https://bit.ly/3fhyV8j.
2. National Human Genome Research Institute, “The cost of sequencing a human genome.” Available at: https://bit.ly/3ydcqd1.
3. DMLapato, “Sequence by synthesis” (2015). Available at: https://bit.ly/2Su2ufa.
4. RA Khailany et al., “Genomic characterization of a novel SARS-CoV-2,” Gene Rep, 19, 100682 (2020). PMID: 32300673.