Which Assay for Atezolizumab?
A response to “Welcome To Our Kitchen,” by David Rimm
Emina Torlakovic, Allen M. Gown | Opinion
We read, with great interest, David Rimm’s comments regarding the FDA-approved Roche Ventana PD-L1 (SP142-based) assay, which is intended to determine triple-negative breast cancer patients’ eligibility for treatment with the immune checkpoint inhibitor drug atezolizumab. Rimm makes very serious allegations regarding this immunohistochemistry (IHC) assay and regarding IHC assays in general. We would like to share our views on relevant general concepts regarding IHC methodology and quality assurance, as well as our response to some specific comments made regarding the FDA-approved Roche Ventana PD-L1 (SP142) assay.
Rimm states, “It’s the (IHC) protocol – not the recipe – that leads to a high level of reproducibility.” Clearly, some IHC protocols may be more robust and easier to reproduce; however, it is not the protocol alone that leads to reproducibility. The IHC protocol is not the entirety of the analytical phase; the other important component is the readout, which is mostly done by pathologists with or without the assistance of image analysis (1). Such readouts may or may not be highly reproducible – an issue that applies not only to IHC scoring, but also to all areas of pathology practice. Reproducibility depends on both the complexity of the task and the nature of the readout’s subject. Some readouts require more training and experience, whereas others are intuitive and simple, even for the novice. Nonetheless, it is a truism that more training and experience will lead to more reproducible results.
The reproducibility of the IHC assay also depends upon how tightly controlled the other components of the assay are. For many IHC assays, preanalytical conditions must be monitored and controlled to achieve reproducibility. Quality assurance measures for instruments, reagents, and operator training could also have a major impact on reproducibility. Therefore, although Rimm’s statement is partly true, the multiparametric nature of IHC assays and each parameter’s impact on total assay reproducibility cannot be forgotten. The “total test approach” Clive Taylor introduced to IHC methodology definitely applies to its reproducibility and is even more relevant today, in the era of precision medicine, than before (2).
Another statement with which we take issue is that the developers of the FDA-approved companion diagnostic assays have “relegated pathologists to the role of short-order cook.” Most pathologists are not directly involved in IHC assay development and do not participate in the validation of IHC laboratory-derived tests (LDTs). Even pathologists who serve as medical directors of IHC laboratories may not be applying sound principles of validation for any given use of the assay. Those who do apply these principles understand that validation for predictive assays may be complex and costly, and that there is a growing need to ensure that predictive assays are properly clinically validated, directly or indirectly (3–5). When using an FDA-approved kit, the IHC laboratory need only verify the assay, which is much easier. One way or the other, the pathologists receive IHC-stained slides to perform a readout and interpretation – their usual starting point. Pathologists are rarely involved with the “cooking” phase of IHC assays. Indeed, less cooking is generally better for both laboratories and pathologists involved in clinical IHC assays.
We also disagree with Rimm’s assertion that IHC laboratories “create our own tests (known as lab-derived tests, or LDTs), rather than use a kit, so that we know exactly what is in each component of the assay.” LDTs are employed for numerous reasons, usually because there is no other choice or because the FDA-approved assay kits are perceived as too expensive in some settings. From the standpoint of laboratory accuracy and reproducibility, the published data (for instance, from the NordiQC QA program) clearly demonstrate the superior performance of FDA-approved assay kits versus LDTs (6). Furthermore, LDT protocols can vary widely between laboratories, yielding poor inter-laboratory reproducibility. This is in direct contrast to FDA-approved assays, which are far more reproducible from laboratory to laboratory. Importantly, pathologists do not necessarily know what is in an LDT assay – even one that is run in their laboratory – given the proprietary nature of certain reagents and the closed nature of many auto-stainers used in this context.
The tissue tools that laboratories use to determine the performance characteristics of an LDT IHC assay are far from standardized and can vary widely in their ability to provide information on analytical sensitivity or reproducibility around important cutoffs. FDA requirements for industry often far exceed those for LDTs when it comes to providing evidence about performance. For most assays, pathology laboratories can perform validation studies in which “gold standard” positive and negative controls are employed, usually looking for the assay’s ability to detect expression of a cell-specific marker. We have not yet achieved standardization of such controls – a requirement that was recently emphasized by an ad hoc international expert group (7). In biomarkers such as PD-L1, which predict response to a specific therapy in a specific subset of patients treated with a specific drug, laboratories cannot validate the assays – only verify them. It is rather exceptional for any clinical IHC laboratory to have the means to do either direct or indirect clinical validation.
And why is Rimm so exercised about this “limitation” of the Roche Ventana PD-L1 (SP142) assay? The situation is identical with the FDA-approved 28-8, 22C3, and SP263-based PD-L1 assays from various vendors, all using specific instruments. Only clinical trial studies can serve as true gold standards for the clinical validation of predictive biomarkers. We have no better benchmark than clinical outcomes, even when the assay is not 100 percent predictive of that outcome. The next best thing to verification of a clinically qualified biomarker is having pathologists use this exact clinically validated biomarker as a “designated gold standard” (or reference standard) for a specific purpose when setting up an LDT IHC assay with the same purpose – a practice referred to as “diagnostic validation” or “indirect clinical validation” (5). It is therefore critical to understand that it is not possible to design a “better LDT” – that is, an LDT that would be better for its specific purpose than the one approved by the FDA (based on the clinical trial evidence) – without the clinical trial. Furthermore, it is difficult (if not impossible) to harmonize LDT performance to the level that some FDA kits have achieved.
We would also like to comment on the specific allegations regarding the Roche Ventana PD-L1 SP142 Assay. The first is that its readout has been shown to be non-reproducible “… between the 13 or 25 pathologists participating in statistically powered, prospective studies done in the real world.” As discussed above, readout agreement between pathologists is poor for many scoring systems that are still clinically applied (such as Gleason grading), but it improves with education and training (8). We agree that PD-L1 testing has introduced another level of complexity in IHC readout for pathologists. Poor readout results are a real obstacle only if they continue to be poor after proper education and training. There is no evidence of this in the published literature. The question Rimm asks – “What will happen when thousands of pathologists around the world are expected to read this assay?” – is a good one. We do need more education and training – and we believe pathologists accept that continuing medical education is important. As we are well aware, specialty certification does not ensure proficiency in any skill or task, including reading the IHC-stained slides of predictive assays.
The second allegation is that the assay is less sensitive than another that also detects PD-L1. It may be true that tumor cells are stained to a lesser degree with the SP142-based IHC assay, but the opposite is true of immune cells, which appear to be “overstained” compared with other IHC PD-L1 assays. It seems that the intent of the assay was lower analytical sensitivity for tumor cells and higher analytical sensitivity for inflammatory cells – a goal that was achieved. Most importantly, this is not a universal PD-L1 IHC assay. Clearly, the purpose of the assay should always be considered when its performance is judged. The Roche Ventana assay was designed with a specific purpose and its performance for that purpose was assessed by the FDA before approval. Any use of the assay beyond the purpose for which it was qualified should be considered “off-label” and should be validated before use in clinical practice.
Thus, the most significant mistake in Rimm’s analysis is his confusion of analytic sensitivity and specificity with diagnostic sensitivity and specificity. It is not possible to infer one directly from the other. In the case of PD-L1 detection using the Roche Ventana PD-L1 (SP142) assay, Rimm is correct that this antibody has decreased sensitivity when assessing PD-L1 on tumor cells in tissue sections, but not on inflammatory cells, as noted above. However, the assay’s analytic sensitivity for PD-L1 detection on tumor cells is not relevant to the assay’s diagnostic sensitivity for most purposes for which the kit was approved. And, as gleaned from the clinical trials, the assay’s diagnostic power appears high, and PD-L1 expression in the immune cell population of triple-negative breast cancers remains the best predictor of clinical response to atezolizumab combined with nab-paclitaxel (9).
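To make the distinction concrete, here is a minimal sketch of how diagnostic sensitivity and specificity could be computed when clinical response is taken as the reference standard. The class name, function names, and patient counts are hypothetical, chosen only for illustration; they are not data from any trial. Note that the amount of antigen the stain can detect (analytic sensitivity) does not appear anywhere in the calculation.

```python
from dataclasses import dataclass


@dataclass
class ResponseTable:
    """2x2 table of assay result versus clinical response (hypothetical)."""
    pos_responders: int      # assay positive, patient responded
    pos_nonresponders: int   # assay positive, patient did not respond
    neg_responders: int      # assay negative, patient responded
    neg_nonresponders: int   # assay negative, patient did not respond


def diagnostic_sensitivity(t: ResponseTable) -> float:
    """Fraction of responders that the assay called positive."""
    return t.pos_responders / (t.pos_responders + t.neg_responders)


def diagnostic_specificity(t: ResponseTable) -> float:
    """Fraction of non-responders that the assay called negative."""
    return t.neg_nonresponders / (t.neg_nonresponders + t.pos_nonresponders)


# Illustrative counts only; these are assumptions, not trial results.
table = ResponseTable(
    pos_responders=40,
    pos_nonresponders=20,
    neg_responders=10,
    neg_nonresponders=30,
)

print(f"diagnostic sensitivity: {diagnostic_sensitivity(table):.2f}")
print(f"diagnostic specificity: {diagnostic_specificity(table):.2f}")
```

Both quantities depend only on how assay calls line up with patient outcomes, which is why they can be established only against clinical data and not from staining intensity alone.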
The Roche Ventana PD-L1 (SP142) assay has been designed to highlight and favor signaling on immune cells and, to these two observers, it does a great job. We have been trained to do readout for this SP142 IHC assay in triple-negative breast cancer (TNBC), and we find it both simple and highly reproducible, paralleling the assessment of this assay in a study (not cited by Rimm) involving six pathologists and three sites (10). Indeed, in our opinion, the readout with the Roche Ventana PD-L1 (SP142) assay is far easier and more reproducible than any of the other PD-L1 assays. Rimm’s motivation to harmonize this assay (designed to highlight PD-L1-positive immune cells in TNBC) with one that is designed to identify PD-L1-positive lung cancer tumor cells flies in the face of logic and good laboratory practices. These other PD-L1 assays have not been shown in any clinical trials to be predictive of response to immunotherapy in TNBC, and it would therefore be irresponsible to replace the Roche Ventana PD-L1 (SP142) assay with one of the other FDA-approved PD-L1 kits or with an ad hoc LDT. Although an LDT that has not been clinically validated might show excellent signal-to-noise ratio and produce aesthetically excellent results, it may nevertheless fail at its main purpose – that is, to accurately predict a patient’s response to atezolizumab.
This article has a response.
1. EE Torlakovic et al., “Evolution of quality assurance for clinical immunohistochemistry in the era of precision medicine. Part 3: Technical validation of immunohistochemistry (IHC) assays in clinical IHC laboratories”, Appl Immunohistochem Mol Morphol, 25, 151 (2017). PMID: 28187030.
2. CR Taylor, “The total test approach to standardization of immunohistochemistry”, Arch Pathol Lab Med, 124, 945 (2000). PMID: 10888767.
3. PL Fitzgibbons et al., “Principles of analytic validation of immunohistochemical assays: Guideline from the College of American Pathologists Pathology and Laboratory Quality Center”, Arch Pathol Lab Med, 138, 1432 (2014). PMID: 24646069.
4. JW Lee et al., “Fit-for-purpose method development and validation for successful biomarker measurement”, Pharm Res, 23, 312 (2006). PMID: 16397743.
5. EE Torlakovic, “How to validate predictive immunohistochemistry testing in pathology?”, Arch Pathol Lab Med, 143, 907 (2019). PMID: 31339752.
6. NordiQC, “Assessments” (2019). Available at: https://bit.ly/2Q5UkWY. Accessed November 11, 2019.
7. EE Torlakovic et al., “Standardization of positive controls in diagnostic immunohistochemistry: recommendations from the International Ad Hoc Expert Committee”, Appl Immunohistochem Mol Morphol, 23, 1 (2015). PMID: 25474126.
8. TL Lotan, JI Epstein, “Clinical implications of changing definitions within the Gleason grading system”, Nat Rev Urol, 7, 136 (2010). PMID: 20157302.
9. K Kalinsky, D Hershman, “Impassion130 Trial: Changing the treatment landscape in metastatic triple-negative breast cancer” (2019). Available at: https://bit.ly/2NzvQ6J. Accessed November 11, 2019.
10. B Vennapusa et al., “Development of a PD-L1 complementary diagnostic immunohistochemistry assay (SP142) for atezolizumab”, Appl Immunohistochem Mol Morphol, 27, 92 (2019). PMID: 29346180.