The Google Genome
The tech giant’s newest “moonshot” aims to create a complete genomic picture of the healthy human being
Michael Schubert
At a Glance
- A new pilot study from Google[x] Life Sciences is collecting information to build a database of healthy human genomes
- The Baseline Study will record in-depth genomic and clinical laboratory data to identify new biomarkers for disease predilection or early onset
- The project raises concerns about ethics, privacy and commercial use of the data, which Google is working hard to forestall
- Though controversial, if successful, this study could help us make the big leap from an era of treatment into one of prevention
With its newest project, the Baseline Study, Google is positioning itself as a key player in the world of healthcare Big Data. The study is part of Google[x]’s life sciences division and aims to build a complete database of the human genome. If the Baseline Study is successful, it may yield not only the world’s largest and most detailed genome database, but also something no one else has yet attempted – a gene-by-gene picture of a healthy human being.
The study intends to take a proactive stance against disease by creating a genomic definition of human health. This could prove particularly valuable for pathologists, because reliable databases of disease-related genetic variations are needed to effectively design, carry out and interpret laboratory tests – and, more than ever, we need to be able to extrapolate from the results of those tests to inform patient treatment decisions going forward.
You might be wondering why this is big news, given that genomic databases already exist. Well, Google has several reasons to feel smug about this latest undertaking. First, current databases are built on limited populations – many genetic variants and biomarkers are detected only in patients with established disease – so their usefulness for disease prediction and early-stage detection has so far been mixed. Second, existing databases are assembled largely to support research, rather than clinical applications, so they’re not standardized and can be difficult to use in translational settings. Third, it’s very important that these large, complicated databases are well-indexed, searchable and standardized – and no one is in a better position than Google to deliver organized, accessible data. Google claims that its database will establish a set of baseline genetic markers for good health, and will help to identify and catalog new biomarkers that can be used to improve laboratory testing and treatment design.
How do they plan to do it?
The team of around 100 biomedical scientists is being headed by Andrew Conrad, former head of the National Genetics Institute and developer of a high-volume, low-cost HIV test for blood-plasma donations (1), and Vik Bajaj, an expert and innovator in using nuclear magnetic resonance (NMR) for early disease detection (2). They’ll be teaming up with Stanford and Duke Universities and members of the research elite – people who are actively involved in study design and data analysis and whose schools’ Institutional Review Boards (IRBs) are responsible for approving the experimental process.
Genetic and molecular information is already being collected anonymously from 175 people in a pilot project started this summer (3). Testing includes the collection of bodily fluids to create a tissue sample repository as well as to sequence participants’ DNA; information is also gathered on family genetic histories and on physiological traits like heart rate and metabolism. Clinical researchers will first collate and anonymize the data, then compare it with study participants’ age, lifestyle, habits and other physical factors (4). Expected to scale up to about 400 participants by the end of the year, the study is being assisted by new technology, like Illumina’s HiSeq X Ten, to facilitate low-cost genomic sequencing (5). It’s also likely that Baseline will incorporate genomic data collected by 23andMe, Inc., a personal genomics company providing direct-to-consumer testing, as well as by Calico, a Google startup aimed at extending longevity by, among other things, analyzing the genomes of healthy centenarians (6).
The good, the bad
But despite its thoroughness, the Baseline Study still has limitations – for instance, its ability to examine the complex interactions of physical, environmental and behavioral traits is limited, and the study isn’t geared toward short-term gain; progress will be incremental at first, and the ultimate payoff is many years away.
That isn’t to say that we shouldn’t be excited about it, though. Even in its early stages, the study will make more human genome data available for research, expanding on the assemblies in existing databases. It will also focus explicitly on understanding the genomic characteristics of healthy humans – an area that isn’t currently well-studied. As the database grows, it will allow scientists to compare and contrast genomes in healthy and disease states, which should hopefully lead to the discovery of new biomarkers. Because biomarkers are currently identified using patient populations with established disease, known markers are typically indicative of ongoing or late-stage conditions. The Google[x] project hopes to locate new biomarkers that signal either a predilection for, or an early stage of, disease, which would help pathologists and clinicians predict the onset of diseases far earlier than is currently possible.
Ultimately, the goal of the Baseline Study is to move medical science from a focus on treatment to a focus on prevention.
Should we be cynical?
A study with the scope and ambition of Baseline doesn’t come without reservations, though. Questions have been raised about the ethics of this research – how can the participants be guaranteed anonymity given the data to be collected for the study? Will the information ever be used for commercial purposes? Who will have access to the genome database?
Google is doing its best to forestall doubts about the study design by working with medical clinics and academic institutions. Health Insurance Portability and Accountability Act (HIPAA) regulations and the conditions of IRB approval impose ethical restrictions on use of the information – including a stipulation that it may never be mined for commercial purposes or connected to Google’s consumer products, despite company cofounder Larry Page’s expression of regret at the loss of these “really great possibilities” (5,6). It’s unclear exactly what uses these restrictions will and will not permit, but the hope is that the data will be used to further research and development for patient, rather than commercial, benefit. Though the information will not be hosted publicly, it may be shared with academic researchers if their studies are IRB-approved. If Baseline’s data is merged with Calico’s, though, it could then also be shared with scientists in industry, for instance to accelerate the development of gene-targeted therapies in pharmaceutical research. Google claims that its goal for the genome study is to improve scientists’ and doctors’ understanding of human health and disease, but only time will tell what actual uses are found for the new information.
The involvement of data from outside sources complicates matters, too; the ability to cross-reference participants’ genomes with databases maintained by Calico, 23andMe and others could lead to a loss of anonymity for those whose genetic information is stored in more than one place (7). Additionally, both affiliated companies have close ties to Google: Calico is an independent subsidiary of the company, whereas 23andMe’s CEO, Anne Wojcicki, is married to Google[x] head Sergey Brin. Perhaps because of this list of concerns, Google has been somewhat closemouthed about the Baseline Study to date, meaning that much of the available information comes from only a few sources.
Despite these reservations, the study is proceeding as expected – the pilot project has begun, and if it’s successful, Google expects to add thousands of genetic profiles over time. Research will be conducted on a long-term basis, which may mean that participants are followed for 10 years or more. As genomes are cataloged and information added to the database, Google hopes that the study will yield the most comprehensive picture of health and disease to date.
One thing is certain – with the Baseline Study, for the first time, pathologists involved in both research and treatment will be able to examine the complete, healthy human genome, which could help you to spot the early signs of disease or even to predict and prevent issues before they take hold. And, most importantly, this could allow you, as a pathologist, to play an ever-larger role in disease monitoring and treatment. If Google delivers on its lofty ambitions, pathology and medicine could soon be moving from an era of “catch-up” treatment into an era of prevention and health optimization. Though much of what Google does is shrouded in secrecy, I think they’ll be shouting about it if they pull this off. Watch this space.
1. L.B. Peddada et al., “Method of PCR Testing of Pooled Blood Samples”, US Patent 5780222 A, (1998).
2. A. Barr, “Meet the Google X Life Sciences Team”, Digits, Wall Street Journal Blogs, (2014).
3. A. Barr, “Google’s New Moonshot Project: the Human Body”, Wall Street Journal, (2014).
4. S. Gibbs, “Google Calls for Guinea Pigs for Ambitious ‘Baseline’ Health Study”, The Guardian, (2014).
5. L. Sun, “Why Google Wants to Map Your Body With Its Baseline Study”, The Motley Fool, (2014).
6. P. Kirk, “Google Takes First Steps to Create World’s Largest Human Genome Database as Part of Wider Strategy to Become a Major Player in Healthcare ‘Big Data’”, Dark Daily, (2014).
7. J. Temple, “Google Wants Guinea Pigs for a New Medical Study. Here’s Why I’d Volunteer”, Re/code, (2014).