The Spice of Life
Precision medicine holds great promise – but if we don’t improve diversity in genomic research, minority groups could be left behind
Luke Turner | Longer Read
Precision medicine is an exciting new approach to disease treatment and prevention that tailors therapy to an individual’s lifestyle, environment, and – most importantly – genetic background. As the push toward precision medicine picks up pace, one area of research is integral to its future success: genomics. Since the first human genome was sequenced in 2003, there has been an explosion in the number of genome-wide association studies (GWAS), with around 3,700 carried out to date (1). This ever-growing area of research has led to important discoveries about human health and behavior, including the identification of thousands of genetic risk variants and their biological function.
But precision medicine isn’t a one-size-fits-all approach – in fact, it’s often called by another name: “personalized medicine.” Its wider success fundamentally relies upon the collection of genomic data from diverse and representative study populations whose genetic backgrounds form the basis of new targeted therapies. For example, a number of complex diseases, such as diabetes and coronary heart disease, are associated with multiple genetic variants and their interaction with social and environmental factors. The complex interaction of polygenic traits means that thousands of variants, each with a small influence on an individual’s phenotype, combine to have a greater total effect – which may differ from one person to the next. Therefore, it’s vital to characterize genetic variants from as many individuals as possible to ensure that potential therapies are applicable to everyone.
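The additive picture described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from any study cited here: the variant names and effect sizes are invented, and a real score would sum over thousands of variants with effect sizes estimated from GWAS.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Add up each variant's effect size, weighted by how many copies
    of the risk allele (0, 1, or 2) the person carries."""
    return sum(effect_sizes[variant] * count
               for variant, count in allele_counts.items())


# Hypothetical effect sizes, for illustration only.
EFFECT_SIZES = {
    "variant_a": 0.02,
    "variant_b": 0.01,
    "variant_c": -0.03,
    "variant_d": 0.015,
}

# Two hypothetical people carrying different combinations of the same variants.
person_1 = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 1}
person_2 = {"variant_a": 0, "variant_b": 0, "variant_c": 2, "variant_d": 0}

score_1 = polygenic_score(person_1, EFFECT_SIZES)
score_2 = polygenic_score(person_2, EFFECT_SIZES)
print(score_1, score_2)
```

No single variant moves the score much, but the totals differ from one person to the next. A score like this is only as good as the effect sizes behind it, which is why the choice of study population matters.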
Although the National Institutes of Health (NIH) introduced a policy in 1993 that requires minority groups to be equally included as clinical research subjects (2), current evidence suggests that little has changed in the decades since. As a result, the field of genomics could be facing a large-scale diversity problem. Even though non-European ancestry groups account for 40 percent of the total US population, fewer than two percent of the more than 10,000 cancer clinical trials since 1993 have included enough minority participants to satisfy the NIH’s own diversity policy (3).
The consequences of this underrepresentation of non-European ancestry groups could be severe. A lack of diversity in genetic databases is predicted to lead to health disparities when precision medicine research is translated into clinical practice. Ultimately, this could render certain treatments unsuitable for and even dangerous to underrepresented groups.
But why is there such a considerable lack of diversity in genomic research? Is it a reluctance among minority groups to participate? Is there an element of scientific racism? Or is there a lack of effort from researchers to expand their study populations beyond the easily accessible European ancestry samples? We hear from two experts who open up about the diversity crisis and discuss how we can combat it to ensure that the future benefits of precision medicine are felt by all, regardless of ancestry.
Genome-Wide to Worldwide
A lack of geographic, ancestral, and demographic diversity in genome-wide association studies is threatening the representation of certain populations
By Melinda Mills
Years ago, when I was conducting a genome-wide association study (GWAS) on reproductive choice in relation to genetic factors, I noticed something startling. With a background in demography, I have always been interested in the structure of human populations, so as I was collecting the data for this particular research, I looked around the room at the people we were studying. And that’s when alarm bells began to ring in my head about the diversity of our work – not only among the researchers, but also within our study populations. It was at this point that I decided we needed to investigate the level of diversity within genomic research more broadly.
To achieve highly targeted pharmaceuticals and personalized medicine, we first need to fully understand the specific mechanisms and functions of different loci. That’s what GWAS aim to achieve by isolating single nucleotide polymorphisms and conducting downstream analyses to find the most appropriate clinical interventions; for example, screening to identify and target individuals who are most likely to develop a certain disease, so that they can minimize their risk. These genetic discoveries offer exciting medical possibilities – but their potential efficacy is dictated by the diversity of participants on whom discoveries are made. Increasing evidence for the omnigenic model suggests that a single complex trait is the product of all genes interacting with each other (4). Therefore, to gain the greatest returns from GWAS, it is vital to maximize the ancestral and geographic diversity of individuals studied.
Although the scientific contributions of GWAS have previously been assessed qualitatively, a systematic scientometric study into their impact has thus far been missing. And so, we investigated all 3,700 GWAS completed between 2005 and 2018, analyzing the demographics of study participants, sample sizes, genetic ancestry, and the geographical distribution of participant recruitment (1). Despite the absence of any empirical evidence to confirm it before our study, personal experience meant that we were aware of some striking aspects of the diversity of participants in this kind of research and the way that it was being conducted. For example, there seemed to be a concentration of subjects across only a few countries, and the data itself appeared highly selective.
But even though we had an idea about the lack of diversity that exists, the extremity of our results still came as a shock to us. We found that ancestry in genetic discovery is highly unequal and has been dominated by individuals of European ancestry, who account for 83.19 percent of participants across all GWAS. In contrast, 12.37 percent of participants are Asian, 1.96 percent are African American or Afro-Caribbean, 1.30 percent are Hispanic or Latin American, and just 0.30 percent are of African ancestry (see Figure 1). This heavy imbalance in ancestry groups used for GWAS is really concerning because, despite a surge in sample sizes, traits, and diseases studied, there is a massive under-representation of non-European ancestry groups. Because this type of research is often used as a basis for pharmaceutical development, the fact that it only covers a very narrow and specific population will result in drugs that are less relevant – and in some cases even detrimental – to the health of certain groups.
Typically, there are two stages to any GWAS – discovery and replication. The discovery phase involves the identification of hundreds of different loci related to a particular trait, such as type 2 diabetes or breast cancer. During the replication phase, the original results are applied to various different populations to test whether the loci match up. Although some people in the field argue that non-European groups are increasingly included in GWAS, we found that this was largely in the replication phase – to confirm the findings that had mostly been drawn from European ancestry groups – and not in the initial genetic discovery phase. To combat this, there needs to be a shift in the types of ancestry groups that are used in the discovery phase, something we can only achieve through larger sample sizes. There has been a considerable increase in the use of samples from certain populations, such as Japanese, South Korean, and Chinese, but we are yet to see huge increases in African samples.
Until now, much of the discussion surrounding genomic research has been centered on ancestral diversity, without considering the geographic locations from which participants are recruited. But geography is crucial; a large amount of epidemiological evidence shows that disease prevalence and life expectancy are strongly linked to the place where a person lives (5). We found that a staggering 72 percent of all discoveries have come from people who live in just three countries – 40 percent from the UK, 19 percent from the US, and 12 percent from Iceland. This is rather shocking when you consider that approximately 76 percent of the world’s population resides in Asia or Africa.
The fact that so many participants come from Iceland is due to the presence of the headquarters for a large biopharmaceutical company called deCODE. Although Iceland has a population of just 334,000, Icelandic participants account for 12 percent of all subjects included in genetic discoveries to date. Because every country is unique in regard to its social, cultural, and economic background, as well as the specific disease profiles that are prevalent, the use of so many participants from one geographical area is concerning. Differences in alleles between regions caused by population stratification mean that it can be dangerous to draw so many conclusions from a single, distinctive group of people.
Telling the whole story
When conducting a GWAS, researchers generally aim to obtain as much data as possible; however, when response rates are low, the resulting database may not be very representative of the population. For example, the UK Biobank is a long-term study investigating the contributions of genetic predisposition to the development of disease. But, in reality, it includes very few smokers, few people with high BMIs, and an overabundance of people of high socioeconomic status. Therefore, if we were to study smoking or other behaviors that are detrimental to health using the UK Biobank, we might wrongly conclude that certain genetic loci are protective or beneficial. In our study, we found that the most frequently studied people in GWAS are often older and more likely to be female (1). In regard to the over-representation of women, this could be a function of the traits that are being examined – such as breast cancer – but it could also be because men have a lower response rate to the studies themselves. Ideally, to tell the full story, we would need complete representation of individuals across all environments.
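A small worked example shows how an unrepresentative sample can skew an estimate. This sketch rests on several assumptions: the figures are invented and are not UK Biobank data, and the correction shown (reweighting each group by its share of the target population) is a standard survey technique that the studies discussed here do not necessarily use.

```python
def naive_mean(groups):
    """Average outcome over the sample as collected, ignoring who is missing."""
    total_n = sum(g["sample_n"] for g in groups)
    return sum(g["sample_n"] * g["sample_mean"] for g in groups) / total_n


def reweighted_mean(groups):
    """Weight each group's sample mean by its share of the target population."""
    total_share = sum(g["population_share"] for g in groups)
    return sum(g["population_share"] * g["sample_mean"] for g in groups) / total_share


# Hypothetical figures, for illustration only: smokers make up 5 percent of
# the sample but 20 percent of the population the study is meant to describe.
GROUPS = [
    {"name": "smokers", "sample_n": 50, "sample_mean": 0.30, "population_share": 0.20},
    {"name": "non-smokers", "sample_n": 950, "sample_mean": 0.10, "population_share": 0.80},
]

naive = naive_mean(GROUPS)
reweighted = reweighted_mean(GROUPS)
print(naive, reweighted)
```

With these made-up numbers, the sample as collected understates how common the outcome is in the wider population. Reweighting can only correct for characteristics that were measured; it cannot stand in for people who were never recruited.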
Not only is there an issue with the level of diversity among the participants of genomic research, but we also found a tightly knit group of researchers who dominate the ownership of very valuable genetic datasets. By holding richly phenotyped data that cover multiple diseases and traits, these people possess enormous power – and our social network analysis showed that the same core people consistently appear as authors of papers. What’s more, of the more senior last author positions, 70 percent were occupied by men, which shows that men are more likely to be principal investigators and are over-represented in these powerful positions. On the other hand, women are relatively better represented as (more junior) first authors, despite still being in the minority at 44 percent.
One way to tackle the dense web of interconnected authors is to rethink how we incentivize this type of research. The holders of large datasets understandably want to see a return on their significant investments – but, as things stand, the only way to achieve this is to appear as an author on any papers that make use of their cohort. I think we need to start to value the use of these datasets in a different way and provide new incentives for their owners. Unfortunately, the waters are further muddied by the introduction of “direct-to-consumer” companies, many of which analyze their data in-house and don’t release it externally to researchers. It’s striking that three of the top 10 most prominent GWAS authors come from deCODE genetics, which is one of these commercial institutions.
Interestingly, when we asked whether researchers of different genders tend to study different traits in GWAS, we discovered statistically significant results for two things. Women are more likely to study breast cancer, whereas men are more likely to study cardiovascular disease. This is interesting when you consider that, for many years, women’s heart disease and heart attacks were misdiagnosed because symptoms are so different between men and women. A number of people have picked up on this, and some believe that it occurred because male doctors and researchers were focusing on the male system.
A question of funding?
The lack of diversity within the field comes down mainly to a lack of funding. US-based agencies – primarily the NIH – fund 85 percent of the research in question. Organizations in the UK, such as the Medical Research Council and the Wellcome Trust, also make up a large proportion of funding, whereas in Iceland, deCODE is the main source. These organizations specifically provide funds for data to be collected in those countries, explaining why 72 percent of findings emanate from them.
I think that researchers in the field are aware that they are looking largely at populations of European descent, but they might be surprised if they knew the exact figures. Some people might say, “We’re looking at these basic biological and causal aspects, so it shouldn’t really matter,” but that doesn’t work because we know that there are different allele frequencies between different populations.
It’s one thing to identify this huge diversity problem within genomic research, but it’s a whole lot harder to know how to tackle it effectively – not least because it’s an issue that spans researchers, data providers, and national governments. One of the most important measures that we can take is to prioritize multiple types of diversity, including ancestral, environmental, geographic, and demographic. Now that we appreciate the importance of gene-environment interactions, we have to include populations that aren’t healthy, and those that come from low socioeconomic backgrounds. To achieve this level of diversity on multiple fronts, I think people have to begin to recognize the impact that a lack of diversity has on research findings.
Commercial companies have started to become more agile, and our paper has already been used to actively market and promote more diverse samples. These companies have a responsibility in the US and the UK to drive greater diversity. On the other hand, within Africa, national science foundations are leading the way with initiatives to collect more data in the region. But one thing we must avoid is helicopter science, because concerns have been raised by some African scientists that data is merely collected from the region, then taken elsewhere. The African Genome Variation Project, funded by the NIH and the Wellcome Trust, is one initiative aiming to collect vital genetic information; however, it’s unclear how much the community, researchers, and infrastructure in Africa are actually being developed. To achieve real expertise in these areas, we have to think on the ground about building networks so that we can really trust the data. We must help our colleagues in under-represented and under-resourced regions to conduct and publish their own research, rather than simply benefiting from the information they gather.
A different, more radical solution could be to impose funding sanctions and consequences when there are gaps in diversity. When carrying out our study, I contacted all of the science foundations involved and asked, “What is your diversity policy for researchers and study populations?” It was interesting to hear that some didn’t have anything in place – something that has to change before we can make any kind of progress. We can talk about greater diversity all we want, but until measures are put in place to enforce it and sanctions are imposed when it’s lacking, it will be difficult to achieve.
As things stand, not only are personalized medicine approaches going to be less suitable for minority groups in the future, but they could even be harmful. If we can’t increase the amount of diversity in genomic research, we could end up exacerbating existing divisions. Basing personalized medicine on findings from a group made up of 83 percent European ancestry – and generally quite healthy – will only serve to reinforce inequalities. Steps have been made in the right direction but, to succeed, we need to adopt a unified approach that prioritizes multiple types of diversity, devise strategies to monitor diversity, call for local participant involvement, and reform incentive structures that include the role of authorship, data ownership, and access to results. Only then can we truly fulfil the immense potential of the genomic revolution for people across the globe.
Melinda Mills is Nuffield Professor of Sociology at Nuffield College, University of Oxford, UK.
The Precision Medicine Crisis
The lack of diversity in cell lines is stifling the true potential of precision medicine
By Rick Kittles
As someone who conducts health disparities research, I know from personal experience how difficult it is to obtain diverse biospecimens. In my studies on prostate cancer, we commonly use cell lines as models to explore different pathways and test the effects of various treatments. But the problem with these cell lines is that there isn’t adequate representation from different ancestry groups. In the US, African Americans have almost twice the incidence of prostate cancer as the white population – so why isn’t the number of African American samples two times higher? Instead, they’re overwhelmingly of European descent – in fact, there’s almost a complete lack of cell lines of African American descent. It’s essential to have representative cell lines from disparate populations if we want to study them effectively.
Not only that, but my colleagues and I also started to wonder whether the few cell lines that were labeled as African American actually contained that ancestry. We never felt completely comfortable or confident that the classification provided for these cell lines was adequate; that’s why I wanted to investigate their origin in more detail.
Unfortunately, our results confirmed these fears. We characterized 15 commercial cell lines used to study prostate, breast, and cervical cancer based on their amount of West African, Native American, and European ancestry. After extracting and genotyping the DNA to identify ancestry-informative markers, we were able to determine the accuracy of their original ethnicity classification. Worryingly, although all of the cell lines previously classified as white/Caucasian were correct in their description (with a mean European ancestry of 97 percent), those previously classified as African American were not always accurate. Most notably, the cell line known as E006AA-hT – used for prostate cancer research – is classified as African American, but we found it to carry 92 percent European ancestry. In addition, there was high variance in other cell lines labeled as African American in terms of their true ancestry (6).
Although these results were shocking, they didn’t come as a complete surprise to me because I had spent several years trying to get to the bottom of these misclassifications. For example, I genotyped the E006AA cell line almost eight years ago and found that it was made up predominantly of white ancestry. At that point, some of my colleagues rightly stopped using that particular sample. However, for those who remained unconvinced, I decided to publish a study to demonstrate just how misleading the African American labeling really is. The concerning part is that there were federal National Institutes of Health (NIH) grants written that proposed to use the E006AA-hT cell line as a representative African American population, so those researchers will have wasted a lot of time and money. What’s more, the numerous papers that have been published using E006AA-hT can no longer claim to represent African Americans.
How did such a basic misclassification occur? That’s a question for the individuals who developed the cell lines, but it essentially boils down to sloppy science – whether caused by a labeling error in the lab or misreporting of the original patient’s ancestry.
To prevent this kind of mistake from happening in the future, I believe that we need to implement stricter and more robust rules. Currently, there are no guidelines in place regarding how cell lines are classified or how they are sold to the public. For example, the company that sells E006AA-hT now has a disclaimer on their website to say that it might not be African American – but they are still labeling and marketing it that way!
Historically, researchers have encountered issues when it comes to including African American and Hispanic populations in their biomedical studies. African Americans form a macro-ethnic group. The bulk of our gene pool comes from West Africa, but there has been a high level of gene flow over time; now, some African Americans have a higher proportion of European ancestry, whereas others have more West African ancestry. This heterogeneity previously meant that it was too difficult for researchers to manage and account for the variance in genetic background. But now, with the help of ancestry-informative markers, we can control for high levels of diversity in a sophisticated way. Although there are thousands of these markers in the genome that allow us to distinguish between major geographic regions such as European, West African, Asian, and Native American, we found that only about 100 markers need to be genotyped to estimate ancestry with fairly high confidence levels. After that point, we don’t learn too much more about ancestry by adding further markers, which is why we opted to use 105 such markers in our study.
For example, the Duffy null allele is almost 100 percent West African; it’s present in the genome as protection from malaria and not seen in European and Asian individuals. Therefore, when Duffy null is present, you can say with high confidence levels that the person has some degree of African ancestry. By combining 105 of these ancestry-informative markers, we can take genetic heterogeneity into account and accurately characterize populations around the world. As a result, the issue of heterogeneity is no longer an excuse for researchers. If you’re studying African American populations in terms of genetic risk and drug response, then it’s crucial to take differences in genetic background into account.
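To make the idea concrete, here is a minimal sketch of how ancestry might be estimated from such markers. It is not the procedure used in the study described above. It assumes a simple two-population model (West African and European), uses invented reference allele frequencies, and finds the ancestry fraction that best explains a person's genotypes by trying each value on a grid.

```python
import math


def estimate_african_ancestry(genotypes, freq_african, freq_european, steps=100):
    """Grid-search maximum-likelihood estimate of the West African ancestry
    fraction under a two-population admixture model.

    genotypes: copies of the reference allele carried at each marker (0, 1, or 2).
    freq_african, freq_european: frequency of that allele in each reference
    population; values must lie strictly between 0 and 1.
    """
    best_fraction, best_loglik = 0.0, -math.inf
    for k in range(steps + 1):
        fraction = k / steps
        loglik = 0.0
        for g, p_afr, p_eur in zip(genotypes, freq_african, freq_european):
            # Expected allele frequency for a person with this ancestry mix.
            q = fraction * p_afr + (1.0 - fraction) * p_eur
            # Binomial log-likelihood of g copies out of 2 (constant term dropped).
            loglik += g * math.log(q) + (2 - g) * math.log(1.0 - q)
        if loglik > best_loglik:
            best_fraction, best_loglik = fraction, loglik
    return best_fraction


# Hypothetical reference frequencies for 105 highly informative markers.
N_MARKERS = 105
FREQ_AFRICAN = [0.99] * N_MARKERS
FREQ_EUROPEAN = [0.01] * N_MARKERS

# A hypothetical sample that carries exactly one copy at every marker.
sample = [1] * N_MARKERS
print(estimate_african_ancestry(sample, FREQ_AFRICAN, FREQ_EUROPEAN))
```

Real markers are rarely this clear-cut, and a full analysis would include further reference populations such as Native American and Asian. The logic is the same: each additional informative marker narrows the estimate, with diminishing returns once enough have been genotyped.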
The diversity problem
In a general sense, there is nowhere near enough diversity in this area of research, and we need to be assertive and proactive in increasing the representation in our biospecimen samples. To rely on just two or three African American prostate cancer cell lines – and then find out that one of them isn’t even African American – is a disservice to science. The obvious way to confront this issue is to increase representation in genomic research, but we must do so in a way that is robust and scientifically rigorous. Unfortunately, if we don’t address the diversity problem promptly, the impacts could be broad and the implications long-standing.
Cell lines are frequently used to test products and screen drugs. If all of the samples used for testing are of European ancestry, do we know that the drugs that we develop will benefit people of all ancestries? The simple answer is no – there’s not enough genetic diversity in the cell lines to represent the broader population. Hundreds of millions of dollars have been funneled into drug discovery, but a huge proportion of the human population is simply missing from the screening phase.
Ultimately, if these drugs have a higher level of toxicity or lower efficacy in people who weren’t represented in the screen, they could have deadly consequences. In some cases, we have already started to notice such effects. For example, the dosage algorithm for the anticoagulant warfarin was initially developed using genes common in European populations. Once that algorithm was applied to African American and Hispanic populations, these patients were treated inadequately. The dosage requirement for therapeutic anticoagulation is influenced by genetic variation, so the calculations were only relevant for individuals from the screening population.
This is not a problem we can tackle overnight. There’s a history of racism in science that understandably dampens the enthusiasm of African Americans for this type of research. Stories such as that of Henrietta Lacks, whose cancer cells were cultured and immortalized without her consent or any compensation, and the Tuskegee syphilis experiment, where researchers knowingly failed to treat African American patients, do not engender confidence. Although those fears aren’t as intense as they used to be, the lack of interest among the African American community is one of the reasons why we’re at this point today.
Making a difference
To make strides toward greater equality in cell lines, I think there needs to be a strong push at multiple levels – from the federal government, to institutions, to the scientists, to colleges and universities, right down to schools where these issues are discussed. We need to educate the younger generation about the history of race issues in science – and encourage them to do something about it.
There has been disappointingly little effort from scientists to overturn the disparity in sample diversity. Despite conferences, conversations, and published papers on the topic, there is no clear, strategic plan yet to do anything about it. To be completely honest, I think a lot of scientists simply don’t care enough. The majority of scientists and institutions are of European ancestry and would rather study the populations that are easiest to access.
If the correct steps are taken and people at multiple levels get on board, I believe we could turn things around within a single generation. For instance, there is a huge project in the US called “All of Us,” which aims to recruit one million individuals for a population-based study by collecting their medical records and blood samples. It was initially called the Precision Medicine Initiative, but the name was changed to reflect the need for greater diversity within samples. Projects like this demonstrate that there is funding out there to encourage the engagement of African American institutions and populations. But that’s just one project; the NIH and the federal government need to devise systematic and strategic plans to increase diversity going forward and invest in the required initiatives.
Hopes and fears
I know that precision medicine will have enormous benefits and a huge impact in terms of improving health in the future. But my fear is that it won’t help everyone. I don’t care how great the science and technology is; if we continue to go down the route of only looking at homogeneous populations in the research phase, then precision medicine isn’t going to deliver on its promise.
With that in mind, I regularly make an effort to go out into the community to talk to African Americans about these issues. I am passionate about precision medicine, and I try to convey that excitement to the African American community. I tell people that it’s not just a matter of throwing their arms up in the air; instead, they can actively participate in the process and be a part of the decision-making that affects them and their community.
From my own experience, I see a great response from African American people – they want to get involved and be part of the solution. My greatest hope is that there are enough institutions prepared to engage with people in a way that will increase diversity in samples for the future.
Rick Kittles is Professor and Founding Director of the Division of Health Equities within the Department of Population Sciences at City of Hope and Associate Director of Health Equities in the Comprehensive Cancer Center, USA.
1. MC Mills & C Rahal, “A scientometric review of genome-wide association studies”, Commun Biol, 2, 9 (2019). PMID: 30623105.
2. AC Mastroianni et al., “Requirement of inclusion in research” (1993). Available at: bit.ly/2JXt6zx. Accessed June 5, 2019.
3. SS Oh et al., “Diversity in clinical and biomedical research: a promise yet to be fulfilled”, PLoS Med, 12, e1001918 (2015). PMID: 26671224.
4. EA Boyle et al., “An expanded view of complex traits: from polygenic to omnigenic”, Cell, 169, 1177–1186 (2017). PMID: 28622505.
5. R Lozano et al., “Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010”, Lancet, 380, 2095–2128 (2012). PMID: 23245604.
6. SE Hooker et al., “Genetic ancestry analysis reveals misclassification of commonly used cancer cell lines”, Cancer Epidemiol Biomarkers Prev, [Epub ahead of print] (2019). PMID: 30787054.