Digital Pathology: Behind the Scenes
You may well be familiar with the digital interfaces used in pathology, but how well do you know the supporting infrastructure?
Digital pathology is a buzzword that conjures up high-end technology, striking computerized images, massive sequencing projects – the kinds of things one might see in a seemingly well-equipped laboratory in a TV show. But most viewers (and many pathologists) won’t be thinking: where are all those detailed images being stored? How are those pathologists accessing all that sequencing data? In reality, every successful digital pathologist needs the support of an intelligent, high-performance infrastructure designed specifically for large, data-intensive workflows.
The search for storage
Sequencing is currently placing a particularly high demand on data storage; not only has the cost of sequencing dropped dramatically (especially for high-volume approaches), but it has also become faster than ever before, resulting in a huge increase in sequencing operations and an ever-growing mountain of data.
The sequencing activities of the Swiss Institute of Bioinformatics (SIB) have massively increased over the last 20 years. Today, the organization handles about five separate projects each week, supporting approximately 300 active research teams across six different sequencing centers. With up to 43 terabytes of data generated each week, SIB has had to place storage at the heart of its infrastructure. With the current system, SIB researchers get high-speed access to sequencing and analysis data through multiple separate storage systems – nearly 1.5 PB of primary storage and 5 PB of economical tape archives, along with high-performance processing for genomics data. SIB’s tiered approach keeps active data on primary storage for complex analysis and automatically moves it into the long-term archive as it ages. Over 600 users access the sequenced genomic data, either locally by tapping into the network in one of the SIB-affiliated data centers, or via a remote interface.
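To see why tiering matters at this scale, the figures above can be put into a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes decimal units (1 PB = 1,000 TB), that data always arrives at the quoted peak rate, and that nothing is ever moved off primary storage. The function name is hypothetical.

```python
# Figures quoted in the article; decimal units (1 PB = 1000 TB) are an assumption.
PEAK_TB_PER_WEEK = 43
PRIMARY_CAPACITY_TB = 1.5 * 1000


def weeks_until_full(capacity_tb: float, inflow_tb_per_week: float) -> float:
    """Weeks before capacity is exhausted if nothing is ever moved off it."""
    if inflow_tb_per_week <= 0:
        raise ValueError("inflow must be positive")
    return capacity_tb / inflow_tb_per_week


weeks = weeks_until_full(PRIMARY_CAPACITY_TB, PEAK_TB_PER_WEEK)
print(f"Primary storage would fill in about {weeks:.0f} weeks at the peak rate")
```

Under those assumptions, primary storage alone would last well under a year, which is why aging data has to move to the archive tier.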
More recently, pathologists have begun seeking solutions to handle the data demands of high-resolution microscopy – a need that is likely to increase. For instance, earlier this year, researchers from the Massachusetts Institute of Technology developed technology to make extremely high-resolution images at a fraction of their former cost (1). As imaging technology becomes more capable and easier to use, pathologists will need higher storage capacity to handle their microscopy needs.
Remote Resources
By Yasmine Lahoubi
Digital pathology has grown tremendously in the last few years, and has proven that it can be a viable alternative to working with conventional slides. Thus far, it has mostly been used in education, meetings and consultations – but now, with the first FDA-approved solution, we will see digital pathology’s potential for primary diagnosis begin to unfold. Working digitally – sharing images and discussing them online – can be exceedingly helpful, especially in remote regions where young pathologists are often forced to work alone, with no experts within direct reach. It’s a particularly significant issue in cytopathology, where pathologists frequently have access to limited tissue but must still conduct technically demanding examinations. Remote consultation allows young pathologists to discuss cases with experts, enabling them to confidently proceed with diagnosis, prognosis and treatment recommendations.
In many cases, though, smaller laboratories lack the sophisticated hardware (like virtual slide scanners) and software they need to take full advantage of telepathology. Pathologists working without those resources welcome any way of sharing and discussing images easily – especially if they can do so using just a standard web browser, or even via mobile phone (so that they can share snapshots taken directly with the phone’s camera). For these pathologists – just as for those working with extensive resources – digital pathology is a huge advance that can only bring benefits.
Yasmine Lahoubi is a fourth-year pathology resident at Mustapha Bacha University Hospital, Algiers, Algeria, and a USCAP ambassador.
Things to think about
Instead of considering storage as individual silos, we need to take a broader view and accept it as a key part of the infrastructure that supports our operations. What is “infrastructure,” from a data point of view? The term refers to a system that includes networking topology, computing resources, and storage. When we discuss storage, we have to consider attributes such as capacity, performance, cost, and connectivity, and the demands that any given laboratory places on those attributes. We must carefully think about current data needs, of course, but also future demand and how to manage data as simply and efficiently as possible.
One of the most common mistakes laboratories make when transitioning to a digital workflow is investing in a “closed” infrastructure that doesn’t interface seamlessly with the lab’s existing technologies (or those they may need to add in the future). To build storage infrastructure capable of handling a growing volume of scientific data, research institutions must find ways to blend different storage technologies: high-speed primary disks, object storage, tape archives, and the cloud. Many institutions begin by purchasing high-performance storage that meets the requirements of their initial, small-capacity environment, and are then forced to keep adding expensive storage as their needs increase. Eventually, those labs reach a point where costs are too high and backup isn’t working well. And then what? Sometimes, their data is exposed because it lacks sufficient protection. Most of the time, they simply end up unable to expand their services because they can’t afford the necessary storage space. The bottom line? Digital pathology is here to stay – and laboratory setups must be able to keep pace.
Data storage is not the whole story. Once the initial storage space has been established, you still need to organize, manage and maintain your data. There are a number of tools available to help users manage files logically and efficiently – not according to assumptions made by non-medical professionals, but in ways that make sense for them and their workflows. Consider that – on average – 70 to 80 percent of stored data files are not in active use. Empowering users to decide which data can be archived to lower-cost media creates extra space on more expensive primary storage for information that pathologists need to keep at their fingertips. Such user-friendliness is key; instead of relying on the IT department to take actions like archiving, accessible software puts data organization and management into pathologists’ own hands so that they can make decisions based on expert knowledge that they alone have.
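As a rough illustration of how inactive data might be identified, the sketch below walks a directory tree and lists files that have not been modified for a chosen number of days. It is a minimal example, not a description of any particular product: the function names and the 180-day threshold are hypothetical, and modification time is used as a stand-in for "active use" because access times are not reliably recorded on every file system.

```python
import os
import time
from pathlib import Path


def archive_candidates(root, min_idle_days=180, now=None):
    """Return (path, size_bytes) for files under root not modified in min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            if info.st_mtime < cutoff:
                candidates.append((path, info.st_size))
    return candidates


def idle_fraction(root, min_idle_days=180, now=None):
    """Fraction of stored bytes under root that belong to idle files."""
    total = sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
    idle = sum(size for _path, size in archive_candidates(root, min_idle_days, now))
    return idle / total if total else 0.0
```

A report like this only produces candidates; the decision about what is actually safe to archive still rests with the people who know the cases.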
To cloud or not to cloud?
Which is better – physical or virtual data storage? Ultimately, the answer comes down to the institution’s overall storage strategy and the desired balance of capacity, performance, accessibility and cost. The elasticity and remote aspect of cloud storage are tremendous advantages for some applications, like short-term temporary workflows, but they aren’t the best fit for every application. Cloud-based solutions are useful for “flexible demand,” when storage needs increase suddenly or at an unpredictable rate; they’re also good for situations where users need an off-site backup for their data to protect against potential disaster. The cloud also provides cost flexibility; most vendors offer a low price-per-gigabyte rate – but, typically, there are separate charges for activities such as data movement, file retrieval, deletion, and support, so contracts can be complicated, and costs can add up quickly. Cloud-based options can also create difficulties if you want to change vendors; data migration tools are usually provider-specific and can be tricky to use.
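The way separate charges accumulate can be shown with a small calculation. Every price in the sketch below is a made-up placeholder, not a quote from any vendor, and the class and function names are hypothetical; the point is only that a low per-gigabyte storage rate can be outweighed by retrieval and data-movement fees once data is accessed regularly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudRates:
    """Hypothetical per-gigabyte prices; replace with figures from a real contract."""
    storage_per_gb_month: float
    retrieval_per_gb: float
    egress_per_gb: float


def monthly_cost(rates, stored_gb, retrieved_gb):
    """Storage charge plus retrieval and transfer charges for one month."""
    storage = stored_gb * rates.storage_per_gb_month
    access = retrieved_gb * (rates.retrieval_per_gb + rates.egress_per_gb)
    return storage + access


# Placeholder rates chosen purely for illustration.
rates = CloudRates(storage_per_gb_month=0.004, retrieval_per_gb=0.02, egress_per_gb=0.09)
quiet = monthly_cost(rates, stored_gb=100_000, retrieved_gb=0)
busy = monthly_cost(rates, stored_gb=100_000, retrieved_gb=10_000)
print(f"No retrievals: {quiet:.0f} per month; 10 TB retrieved: {busy:.0f} per month")
```

With these placeholder rates, pulling back a tenth of the stored data in a month costs more than storing all of it, which is the kind of effect worth modeling before signing a contract.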
In comparison, on-site storage can grow with laboratories while keeping their data safe, secure and accessible. For ongoing, large-scale data storage, it’s far more cost-effective than virtual storage, because there are no recurring fees – only a single, up-front investment. There’s also the matter of moving your data; constantly porting it between your on-site system and the cloud can be time-consuming and can incur high bandwidth and retrieval charges. But not all physical media are equal. Are you sure that flash drive is the best place to keep all of your most valuable images? Is that stack of 3.5” floppy disks in your desk drawer really what you want to use for your sequencing data? What exactly is object storage? With so many options, it can be difficult to choose the best – and most secure – storage solution for your needs.
My best advice for laboratories considering a digital transition is to work out what resources they have available – and then carefully consider the needs of the pathologists and laboratory professionals who will be using the system. They don’t want to waste valuable time searching for data, or worry about whether or not they’ll be able to store and protect the information they use. The key to successful digital pathology is to make data management as simple, secure and user-friendly as possible.
1. A Trafton, “High-resolution imaging with conventional microscopes” (2017). Available at: bit.ly/2zGphsA. Accessed November 15, 2017.
Mark Pastor is director of data intelligence solutions at Quantum. He is responsible for driving Quantum’s data intelligence and storage solutions for high performance computing, AI, research and other large unstructured data environments, and also represents Quantum within the Active Archive Alliance and in the LTO Consortium.