Biobank Boost

Lili Milani. Credit: University of Tartu, Estonia 
 

Founded in 2000, the Estonian Biobank has expanded in both size and importance over the last two decades. Approximately 20 percent of the adult population of Estonia – around 212,000 individuals – have now joined the biobank. The volunteers’ DNA has been genotyped using either microarrays or, for a small subset (around 3,000 samples), short-read sequencing technology. In addition to its role in genetic research, the biobank is also used as a testing site for the development of personalized health care.

Several recall studies have been carried out based on the biobank’s monogenic findings, polygenic risk score results, and findings in pharmacogenetics. Biobank participants have then been invited into disease-specific studies and investigated by clinicians. The project has demonstrated a genetics-first approach to health care that leaders now want to implement on a national scale. 

Recently, the Biobank received a funding boost from the European Commission, with an equal investment from the Estonian government. These additional funds have enabled the set-up of a new center for personalized medicine in Estonia, as well as investment in long-read sequencing technology that will be used to expand the genomic database.

To find out more about this project – and what the cash injection could mean – we spoke with Lili Milani, Head of the Estonian Biobank and Professor of Pharmacogenomics at the University of Tartu, Estonia, and genomics expert Neil Ward, Vice President and General Manager EMEA at PacBio.

Neil Ward. Credit: PacBio

Who is eligible to join the Estonian Biobank?
 

Lili Milani (LM): It’s volunteer based – any adult can choose to join the biobank. Eligibility is not dependent on any pre-existing conditions or enrichment of specific diseases. 

When participants join the Biobank, they sign a broad informed consent that allows us to update their health records annually. That includes data from electronic health records, national health registries, and the health insurance fund’s database. In this way, we track all the participants’ diagnoses, medical procedures, laboratory measurements, medications, and so on – all in one big eHealth database, thanks to single-payer health insurance in Estonia.

Now that the Biobank has received more funding, what will the next steps be?
 

LM: This investment will enable us to set up a center for personalized medicine in Estonia, in collaboration with partners from the Netherlands and Finland. It will also allow genomic sequencing of a further 10,000 biobank participants. The purpose is both to maximize clinical findings in the genomes themselves and to build an improved genotype imputation reference panel. 

The funding has also enabled us to upgrade to long-read sequencing technology. We will use the long-read sequencing data to create a population-specific reference for rare variants, structural variants, and any other relevant genetic variation that we can then impute into the genotype data.

How has the biobank impacted healthcare in Estonia?
 

LM: So far we have used the findings mostly for research. But what’s unique about the Estonian Biobank is that we share individuals’ results directly with them. For comparison, the UK Biobank is number one for research, but they do not run genetics-based recall studies and they don’t return results to the participants. 

Our first two studies focused on breast cancer mutation carriers and people with familial hypercholesterolemia – common monogenic disorders, where carrying a mutation in one gene results in a considerably higher risk for breast cancer or cardiovascular disease. We identified those individuals in the biobank with mutations in these genes and invited them to participate in a study. The study findings were validated and returned to the individuals. Overall, we found that, although most of them had a family history of early-onset breast cancer or cardiovascular death, very few of them had actually been referred to a medical geneticist for further review. So most of the participants were completely unaware of the risk. These findings really brought oncologists and cardiologists on board – they demonstrated that the biobank is a great resource for identifying people at high risk for disease.

The next step we took was to find polygenic risk scores – both for breast cancer and cardiovascular disease. A thousand individuals with high risk for breast cancer or cardiovascular disease participated in the studies. They were referred to an oncologist for mammography screening or to a cardiologist for a cardiovascular health review, prescribed preventive medications, and offered new treatment plans. 

The success of this trial has led to the rollout of a national polygenic screening program for breast cancer. It has also led to a reduction in the age threshold for regular calls for mammographic screening for people with high polygenic risk for breast cancer.

If we can catch cancer early, there is a much better prognosis for the treatment. I’m also interested in pharmacogenetics – and that’s another reason why we invested in long-read sequencing; genes involved in drug metabolism are very complex and highly similar to each other. We have had difficulties in mapping them correctly based on genotype data or short-read sequencing. 

Credit: PacBio

What are the advantages of long-read genomic sequencing technology?
 

Neil Ward (NW): The Estonian Biobank, and many biobanks around the world, have typically carried out genetic analysis with microarray data. That technique uses small glass slides with spots of DNA on them that allow researchers to look at around a million locations in the genome prone to variation. That can be very informative across many diseases, but there are a lot of other parts of the human genome that are difficult to assess accurately with microarray or short-read sequencing technology.

Long-read sequencing technology allows us to look at very long stretches of DNA – typically fragments of 15,000 to 20,000 base pairs – very accurately at one time. That is often sufficient to understand different genetic variations inherited from parents, for example. We can understand those differences much more easily with our technology than with short-read sequencing.

A great many genes in the human genome have copies or gene families that look very similar to one another. Having long reads, with accurate sequencing, allows us to understand the subtle genetic variations between genes that are very closely related to one another. Sometimes you can have genes that are 99 percent similar but have different functions in the human genome. Long-read sequencing allows us to disambiguate what would otherwise be a sort of mixed signal on those similar genes. The pharmacogenomics genes typically fall into that category, and there are a number of other conditions and genes where the information generated from long reads is particularly useful.

How do long-read and short-read sequencing technologies compare in terms of time and cost?
 

NW: The modern sequencing technologies – or next-generation sequencing platforms – fall into two main categories. Short-read sequencers can often sequence hundreds of millions, or billions, of fragments, each of them a few hundred base pairs long. Long-read sequencers allow us to sequence millions of fragments of 10,000–20,000 base pairs or more in length. However, the amount of time needed to produce those two types of datasets is similar. 

Typically, the short-read methodologies “parallelize” better. They yield billions of measurements from a single glass slide on a single surface that can be imaged by the specialist equipment. Currently, the long-read platforms produce at least an order of magnitude fewer reads, but with longer read lengths. So, overall, similar amounts of data could be produced, but with a slightly different form factor.

Historically, there has been quite a differential in price between generating short-read and long-read genomes. And it is really only in the last year or so, with the latest technologies, that those price points have started to converge. This price trend means that initiatives like the Estonian Biobank can start projects that would previously have been unaffordable with long-read technology. In short, we can now achieve superior accuracy and read length – and at a price that’s becoming closer to that of the older technologies.
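
As a rough illustration of the throughput comparison above, the following minimal Python sketch multiplies read count by read length for each platform type. The specific read counts and lengths are illustrative assumptions picked from within the ranges quoted in the interview ("hundreds of millions" of reads of "a few hundred base pairs" versus "millions" of reads of 10,000–20,000 base pairs); they are not specifications of any particular instrument, and the function name is hypothetical.

```python
def total_bases(num_reads: int, read_length_bp: int) -> int:
    """Total sequenced bases, assuming every read has the same length."""
    return num_reads * read_length_bp


# Illustrative (assumed) values chosen from the ranges quoted in the interview.
short_read_bases = total_bases(num_reads=500_000_000, read_length_bp=300)
long_read_bases = total_bases(num_reads=10_000_000, read_length_bp=15_000)

print(f"Short-read run: {short_read_bases:,} bases")
print(f"Long-read run:  {long_read_bases:,} bases")
```

With these assumed inputs, far fewer long reads add up to the same number of bases as many more short reads, which is the sense in which the two approaches can yield similar amounts of data in a different form.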

Credit: PacBio

How can the environmental impacts of data storage be mitigated?
 

NW: As technology providers, we have been working hard to minimize the data footprint from our DNA sequencing machines, while retaining the useful information. Actually, the amount of data for one individual’s genome record these days is relatively modest – around 50 gigabytes of sequencing data. That is relatively small in comparison to the image files from a PET scan or some other imaging modalities in the healthcare space. To achieve the biological insights required by the Estonian biobank, software is required to analyze big data, but we will continue to work on keeping the data footprints as small as possible. We also want to make the computer algorithms as efficient as possible to minimize the computing cost.

LM: We took environmental impact into account when deciding which sequencing technology to use. We also had to consider that the costs of data storage and analysis can be close to the cost of the sequencing itself. On the system we selected, the amount of data generated is approximately ten times smaller than on the others we evaluated, while retaining excellent data quality.

NW: I would agree that success lies in the quality of the data. When using AI algorithms or other tools to find insights in the datasets, the cleaner the data in the first place, the lower the computing costs downstream. More accurate data requires fewer corrections for false positives and errors in the dataset.
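
To give a sense of scale for the storage figures discussed above, here is a minimal Python sketch of the arithmetic. It combines two numbers from the interview, roughly 50 gigabytes of sequencing data per genome and the planned 10,000 additional genomes; applying the per-genome figure to that cohort is an assumption made here for illustration, and the function name is hypothetical.

```python
def cohort_storage_tb(num_genomes: int, gb_per_genome: float) -> float:
    """Total storage in terabytes (decimal units: 1 TB = 1,000 GB)."""
    return num_genomes * gb_per_genome / 1000


# Assumption: the ~50 GB per-genome figure applies to each of the
# 10,000 newly sequenced participants.
planned_tb = cohort_storage_tb(num_genomes=10_000, gb_per_genome=50)

print(f"Estimated storage for 10,000 genomes: {planned_tb:,.0f} TB")
```

Under these assumptions the cohort comes to about 500 terabytes, so even a modest per-genome footprint becomes substantial across a whole cohort, which is why the data size of each platform matters for both cost and environmental impact.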

In what ways does the UK Biobank differ from that in Estonia?
 

NW: The UK Biobank is certainly an internationally renowned research project that’s been hugely beneficial for the genomics community. If we go back 10 years, the UK Biobank was used in a similar way to the Estonian one – they used genotyping microarrays to assay their 500,000 participants, and learnt a lot about how genetic variants can predispose us to certain conditions – or protect us from certain diseases. It has been a fantastic resource. And more recently, as sequencing costs have come down, UK Biobank has transitioned from exome to whole-genome sequencing, using short-read technology, on all 500,000 participants. That data set is now being analyzed by pharmaceutical as well as academic researchers around the world, resulting in some great discoveries. 

The big difference with the UK Biobank is that there is no feedback mechanism for the participants to directly benefit from the learnings on an individual level. People are essentially making purely philanthropic donations for research. It’s exciting that the Estonian Biobank is not only doing the research within that cohort, but also giving meaningful information back to the participants about their own individual risks. What’s more, they are studying how people can interpret the findings and possibly modify their lifestyle or therapy choices to minimize their risk of disease. That’s quite an interesting difference between the two biobanks.

What are the implications of the Estonian Biobank findings for health policy and strategy across Europe and beyond?
 

LM: Firstly, I hope the findings will lead to more centers – like the one that is now funded by this European Commission and Estonian government grant – that can see personalized medicine through all the necessary steps. 

Secondly, we have a lot happening in the research area – we have found thousands of associations between different genetic variants and disease, or treatment outcomes, or adverse drug reactions, and so on – but there is too little funding for clinical implementation studies. We need to be able to use the results to create robust health care products based on research. So one of the other goals of our new funding is to run clinical studies – randomized clinical trials implementing new polygenic risk scores or a randomized study of pharmacogenetics implementing genetic testing prior to prescribing drugs, for example.

Ultimately, we are trying to take all the necessary steps for building and implementing personalized medicine in Europe and internationally. This will require discovery, validation, cost-effectiveness analysis, and, finally, guidelines for clinical implementation of these new findings. Of course, we also need health care providers to support this with the funding and implementation of new health care services.

About the Author
Helen Bristow

Combining my dual backgrounds in science and communications to bring you compelling content in your speciality.