Genomics could soon outpace other scientific disciplines as the king of big data, but problems may lie ahead…
The genomics revolution is uncovering more insights into human biology than ever before, but some researchers foresee a problem on the horizon: just what will we do with all that information? A recent study by a team of US biologists and data scientists has concluded that the speed of genomic data generation has now outstripped that of YouTube (1), and at the current rate, the amount of genomic data produced every day is doubling every seven months.
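To put the seven-month doubling time in perspective, here is a minimal sketch of the arithmetic. It assumes the doubling rate stays constant, which is an extrapolation of "the current rate" rather than something the study guarantees; the function name `growth_factor` is purely illustrative.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, assuming steady doubling.

    Assumption: the doubling time stays fixed at `doubling_months`
    (seven months, the figure quoted in the article).
    """
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Growth over one year and over ten years under constant doubling.
    print(f"1 year:   x{growth_factor(12):.2f}")
    print(f"10 years: x{growth_factor(120):,.0f}")
```

Under that assumption, output grows a little more than threefold each year and by more than five orders of magnitude over a decade, which is why storage and analysis that are manageable today may not stay that way.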

Right now, the storage and analysis of genomic information is manageable, but as sequencing becomes cheaper and more common, issues are likely to arise. It is predicted that by 2025 up to a billion people may have had their genomes sequenced, creating the need for a huge amount of storage and producing data volumes on par with those of social media platforms and of disciplines such as astronomy (see Figure). Genomics is a “four-headed beast”, explain the researchers, with four key areas (acquisition, storage, distribution and analysis), each posing its own particular challenges. This means that no single solution will solve the impending problem: improved sequencing technologies, data storage and sharing solutions, and optimized computing infrastructures and data libraries will all need to play a part as genomics grows at lightning speed.

“For a very long time, people have used the adjective ‘astronomical’ to talk about things that are really, truly huge,” says Michael Schatz, co-author of the associated paper, “but in pointing out the incredible pace of growth of data-generation in the biological sciences, my colleagues and I are suggesting we may need to start calling truly immense things ‘genomical’ in the years just ahead.”
References
1. ZD Stephens, et al., “Big data: astronomical or genomical?”, PLoS Biol, 13, e1002195 (2015). PMID: 26151137.