What does astronomy have to do with biomarker discovery? At Johns Hopkins, pathologists and astronomers collaborated to develop AstroPath, an innovative approach that analyzes large multiplex immunofluorescence (IF) datasets with celestial object–mapping algorithms to identify optimized predictive phenotypic signatures. In a recent webinar, Janis Taube, MD, MSc, Director of the Dermatopathology Division and Fellowship, and Alexander S. Szalay, PhD, Bloomberg Distinguished Professor of Physics, Astronomy, and Computer Science at Johns Hopkins, introduced the platform and shared how AstroPath is laying the foundation for rapid, efficient biomarker discovery.
Read on for an overview of the highlights from their presentation. The full webinar is available on demand.
Janis Taube, MD, MSc
Director of the Dermatopathology Division and Fellowship
Johns Hopkins University School of Medicine
Alexander S. Szalay, PhD
Bloomberg Distinguished Professor of Physics, Astronomy, and Computer Science
Johns Hopkins University
Assessing biomarker modalities
There are a number of different biomarkers and technologies currently available for predicting response to immunotherapy. Standard PD-L1 immunohistochemistry (IHC) represents the traditional surgical pathology workflow. Gene expression profiling and tumor mutational burden (TMB) have also received a lot of attention. Now that multiplex immunofluorescence/immunohistochemistry (IF/IHC) is being brought in, said Dr. Taube, the question is: how does this technology stack up?
A number of investigators have used multimodality approaches, which combine some of these techniques (e.g., PD-L1 IHC + gene expression profiling, or PD-L1 IHC + TMB). But is there any advantage to using one technique or combination over another?
To answer this question, Dr. Taube’s team conducted a meta-analysis of biomarkers for anti-PD-1/PD-L1 response, which was published in JAMA Oncology. They analyzed 44 unique publications that together reported 55 individual analyses, each assessing one of the biomarker modalities. For each study, the team extracted sensitivity and specificity values for predicting immunotherapy response. These values were then mapped onto receiver operating characteristic (ROC) curves.
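As a minimal sketch of this bookkeeping, assuming a study reports a 2x2 table of biomarker calls versus response, the snippet below computes sensitivity and specificity and places the result as one point in ROC space. The `StudyCounts` class and all counts are hypothetical, and this is not the meta-analysis's own statistical model; it only shows how a single sensitivity/specificity pair becomes an ROC operating point.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StudyCounts:
    """Hypothetical 2x2 table for one analysis; "positive" means biomarker-positive."""
    tp: int  # biomarker-positive responders
    fn: int  # biomarker-negative responders
    tn: int  # biomarker-negative non-responders
    fp: int  # biomarker-positive non-responders


def sensitivity(c: StudyCounts) -> float:
    # Fraction of responders that the biomarker calls positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: StudyCounts) -> float:
    # Fraction of non-responders that the biomarker calls negative.
    return c.tn / (c.tn + c.fp)


def roc_point(c: StudyCounts) -> Tuple[float, float]:
    # One operating point in ROC space: (false-positive rate, true-positive rate).
    return 1.0 - specificity(c), sensitivity(c)


def single_point_auc(c: StudyCounts) -> float:
    # Area under the polyline (0,0) -> (FPR, TPR) -> (1,1); this equals (sens + spec) / 2.
    return (sensitivity(c) + specificity(c)) / 2.0


if __name__ == "__main__":
    study = StudyCounts(tp=20, fn=10, tn=35, fp=15)  # made-up counts
    print(roc_point(study), single_point_auc(study))
```

With these made-up counts, sensitivity is 20/30 and specificity is 35/50, so the study sits at a false-positive rate of 0.3 in ROC space.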
Both weighted (by number of patients) and unweighted analyses showed that mIHC/IF had a significantly greater area under the curve (AUC) than the other modalities, and even than multimodality approaches.
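To illustrate what weighting by number of patients changes, here is a simplified sketch that pools per-analysis AUCs for each modality with and without patient-count weights. The `Analysis` records and their values are hypothetical, and a plain weighted mean is a stand-in chosen for clarity; the published meta-analysis used its own statistical methods, which this does not reproduce.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Analysis(NamedTuple):
    modality: str
    auc: float       # per-analysis AUC (hypothetical values below)
    n_patients: int  # used as the weight in the weighted average


def pooled_auc(analyses: List[Analysis], weighted: bool) -> float:
    """Mean AUC across analyses; weighted=True weights each one by its patient count."""
    weights = [a.n_patients if weighted else 1 for a in analyses]
    return sum(w * a.auc for w, a in zip(weights, analyses)) / sum(weights)


def pooled_by_modality(analyses: List[Analysis], weighted: bool) -> Dict[str, float]:
    # Group analyses by modality, then pool each group separately.
    groups: Dict[str, List[Analysis]] = defaultdict(list)
    for a in analyses:
        groups[a.modality].append(a)
    return {m: pooled_auc(g, weighted) for m, g in groups.items()}


if __name__ == "__main__":
    data = [
        Analysis("mIHC/IF", 0.80, 100),
        Analysis("mIHC/IF", 0.70, 50),
        Analysis("PD-L1 IHC", 0.60, 200),
        Analysis("PD-L1 IHC", 0.70, 20),
    ]
    print(pooled_by_modality(data, weighted=False))
    print(pooled_by_modality(data, weighted=True))
```

Weighting lets large studies dominate the pooled value, which is why checking that a result holds under both schemes, as the team did, is a useful robustness test.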
Validating a multiplex immunofluorescence assay across multiple institutions
Is it possible to standardize these assays and use them across institutions in clinical trials and clinical care? Dr. Taube’s team has been working across multiple sites to test an optimized 6-plex Opal multiplex IF assay. The panel includes cytokeratin, PD-1, PD-L1, CD8, CD68, and FoxP3. The sites have tested reproducibility on tonsil and tumor tissue, as well as on tumor tissue microarrays (TMAs). Both inter- and intra-site reproducibility of cell density for each marker showed an r-squared value of approximately 0.8, indicating good reproducibility across all sites.
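An r-squared value like the ones reported here can be computed as the squared Pearson correlation between paired measurements of the same samples at two sites. The sketch below assumes per-sample cell densities (for example, CD8+ cells per mm²) are already available for each site; the function name and example values are hypothetical.

```python
import math
from typing import Sequence


def r_squared(site_a: Sequence[float], site_b: Sequence[float]) -> float:
    """Squared Pearson correlation between paired per-sample densities from two sites."""
    if len(site_a) != len(site_b) or len(site_a) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(site_a)
    mean_a = sum(site_a) / n
    mean_b = sum(site_b) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(site_a, site_b))
    var_a = sum((a - mean_a) ** 2 for a in site_a)
    var_b = sum((b - mean_b) ** 2 for b in site_b)
    r = cov / math.sqrt(var_a * var_b)
    return r * r


if __name__ == "__main__":
    # Hypothetical CD8+ densities (cells/mm^2) for the same five samples at two sites.
    site_1 = [120.0, 340.0, 85.0, 560.0, 210.0]
    site_2 = [135.0, 310.0, 100.0, 590.0, 190.0]
    print(round(r_squared(site_1, site_2), 3))
```

Because r-squared ignores a constant offset or scale factor between sites, it measures whether sites rank and spread samples consistently rather than whether they report identical absolute densities.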
They saw similar results for more complex parameters, such as co-expression. The sites had concordance above 0.8 for the percentage of cytokeratin+ cells and CD68+ cells co-expressing PD-L1. The reproducibility of proximity assessments, specifically PD-1 to PD-L1, also showed a robust concordance of ~0.8 between sites.
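The talk does not define the exact proximity metric, so the following is only a minimal sketch of one common formulation, assuming cell centroids in micrometres: the fraction of PD-1+ cells that have at least one PD-L1+ cell within a chosen radius. The function name and the 20 µm radius in the example are hypothetical choices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # cell centroid (x, y) in micrometres


def fraction_within(sources: List[Point], targets: List[Point], radius_um: float) -> float:
    """Fraction of source cells (e.g. PD-1+) with at least one target cell
    (e.g. PD-L1+) whose centroid lies within radius_um. Brute force, O(n*m)."""
    if not sources:
        return 0.0
    hits = sum(
        1 for s in sources if any(math.dist(s, t) <= radius_um for t in targets)
    )
    return hits / len(sources)


if __name__ == "__main__":
    pd1_cells = [(0.0, 0.0), (100.0, 0.0), (15.0, 5.0)]
    pdl1_cells = [(10.0, 0.0), (300.0, 300.0)]
    print(fraction_within(pd1_cells, pdl1_cells, radius_um=20.0))
```

Computing the same metric on each site's images and comparing the per-sample values (for example with a correlation) is one way a cross-site concordance figure could be obtained.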
From stars to cells
One of the challenges in using multiplex IF is the amount of data generated. Early papers tended to assess 5 to 10 high-power fields per tumor; however, one tumor typically has over 1000 fields. What is holding researchers back from assessing more fields? “It’s really a time and data usage issue. To map the entire tumor takes over 1000 fields and over 300 GB of disk space,” said Dr. Taube.
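A rough back-of-envelope calculation from the quoted figures shows the gap between sampling and whole-tumor imaging. This sketch assumes storage scales linearly with the number of fields, which real image files may not do exactly; the function names are hypothetical.

```python
def per_field_gb(total_gb: float, total_fields: int) -> float:
    # Average storage per high-power field, assuming storage scales linearly with field count.
    return total_gb / total_fields


def storage_for_fields(total_gb: float, total_fields: int, n_fields: int) -> float:
    # Approximate storage needed to image only n_fields of the tumor.
    return per_field_gb(total_gb, total_fields) * n_fields


if __name__ == "__main__":
    # Figures quoted in the talk: roughly 1000 fields and 300 GB per whole tumor.
    print(per_field_gb(300, 1000))
    print(storage_for_fields(300, 1000, 10))
```

On these numbers, a 10-field sample needs only about 1% of the storage of a whole-tumor map, which helps explain why early studies sampled so few fields.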
When Dr. Taube’s team ran into difficulties handling such large amounts of data, they turned to Dr. Szalay, who has a wealth of experience with large datasets from his work in astronomy.
Dr. Szalay began working with big data when Johns Hopkins joined the Sloan Digital Sky Survey (SDSS), an effort to map the distribution of galaxies across the northern sky. It resulted in a publicly available database called “SkyServer”, which enabled visual navigation of all the spatial data.
There are strong parallels between medicine today and astronomy 25 years ago, according to Dr. Szalay. Stars and galaxies correspond to cells in pathology. In astronomy, data are acquired and processed through techniques like multicolor photometry and image segmentation, with a strong emphasis on locality and spatial relationships. The same is true in pathology, where spatial relationships are central to understanding the tumor microenvironment.
Astronomers learned from the challenges they faced when tackling these big data projects, said Dr. Szalay. They discovered that statistical analysis and collaboration were facilitated by using a single database rather than scattered files. They had to find a common processing level that was considered “good enough” to process all images while providing a homogeneous reduction of the data. Automation was necessary to achieve statistical reproducibility at scale, because human involvement leads to subtle differences in how data is processed.
When Dr. Taube and Dr. Szalay began working together, they first considered scalability. To scale up, they realized they would need to take a thousand times more images and dramatically reduce human involvement by automating tissue segmentation and implementing deep learning for cell recognition. They built a spatial database scalable to many billions of cells, and they eventually plan to integrate it with genomics and medical records.
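The AstroPath schema and database engine are not described in the talk, so the following is only a toy illustration of the "one queryable database instead of scattered files" idea, using Python's built-in `sqlite3` as an assumed stand-in. The `cell` table, its columns, and both helper functions are hypothetical. An index on slide and coordinates lets a spatial range query (here, cells within a given distance of a point) narrow the search in SQL before an exact distance check.

```python
import math
import sqlite3
from typing import Iterable, List, Tuple

CellRow = Tuple[int, int, float, float, str]  # (slide_id, cell_id, x_um, y_um, phenotype)


def build_cell_db(cells: Iterable[CellRow]) -> sqlite3.Connection:
    """Load cells into an in-memory SQLite table (illustrative schema only)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE cell (slide_id INTEGER, cell_id INTEGER, x REAL, y REAL, "
        "phenotype TEXT, PRIMARY KEY (slide_id, cell_id))"
    )
    # Composite index so a range on x within one slide avoids a full table scan.
    con.execute("CREATE INDEX cell_xy ON cell (slide_id, x, y)")
    con.executemany("INSERT INTO cell VALUES (?, ?, ?, ?, ?)", cells)
    return con


def cells_within(con: sqlite3.Connection, slide_id: int, x0: float, y0: float,
                 radius: float) -> List[Tuple[int, float, float, str]]:
    """Cells within `radius` of (x0, y0): bounding-box filter in SQL,
    then an exact Euclidean check in Python."""
    rows = con.execute(
        "SELECT cell_id, x, y, phenotype FROM cell "
        "WHERE slide_id = ? AND x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
        (slide_id, x0 - radius, x0 + radius, y0 - radius, y0 + radius),
    ).fetchall()
    return [r for r in rows if math.hypot(r[1] - x0, r[2] - y0) <= radius]


if __name__ == "__main__":
    con = build_cell_db([
        (1, 1, 0.0, 0.0, "CD8+"),
        (1, 2, 5.0, 0.0, "FoxP3+"),
        (1, 3, 8.0, 8.0, "CD8+"),
        (2, 1, 1.0, 1.0, "CD8+"),
    ])
    print(cells_within(con, slide_id=1, x0=0.0, y0=0.0, radius=10.0))
```

A database built for billions of cells would more likely use a dedicated spatial index (such as an R-tree) and partition data across servers; the point here is only that every query runs against one consistent store.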
The Johns Hopkins team has developed a number of algorithms for pre-processing tissue images captured on Vectra systems, including automated image alignment and improved flat-fielding for more uniform pixel intensity. Spatial polygons are created for every cell in the database. Several geometries are represented, including primary regions, manual annotations of good tissue and tumor boundaries, automatic tumor boundaries, and membrane and nucleus outlines for each cell. The distance from the tumor boundary is calculated for each cell, and the database can also compute each cell's spatial neighbors within a specified distance.
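The two per-cell computations mentioned here can be sketched with plain geometry. This minimal example assumes cell centroids and a tumor boundary given as a closed polygon of vertices, all in the same units; it computes an unsigned distance to the boundary and finds neighbors within a radius using a uniform grid (bin size equal to the radius, so only the 3x3 surrounding bins need checking). The function names are hypothetical, and this is not AstroPath's actual implementation, which works on full cell polygons inside its database.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(p: Point, boundary: List[Point]) -> float:
    """Unsigned distance from p to a closed polygon boundary (vertex list)."""
    n = len(boundary)
    return min(
        point_segment_distance(p, boundary[i], boundary[(i + 1) % n]) for i in range(n)
    )


def neighbors_within(cells: List[Point], radius: float) -> Dict[int, List[int]]:
    """For each cell index, indices of other cells within `radius`, via grid binning."""
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(cells):
        grid[(math.floor(x / radius), math.floor(y / radius))].append(i)
    result: Dict[int, List[int]] = {}
    for i, (x, y) in enumerate(cells):
        gx, gy = math.floor(x / radius), math.floor(y / radius)
        result[i] = [
            j
            for ox in (-1, 0, 1)
            for oy in (-1, 0, 1)
            for j in grid.get((gx + ox, gy + oy), [])
            if j != i and math.dist(cells[i], cells[j]) <= radius
        ]
    return result


if __name__ == "__main__":
    tumor = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
    print(distance_to_boundary((50.0, -10.0), tumor))
    print(neighbors_within([(0.0, 0.0), (5.0, 0.0), (30.0, 0.0)], radius=10.0))
```

Grid binning keeps each neighbor query roughly proportional to local cell density instead of the total number of cells on the slide, which matters at the scale of millions of cells per slide.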
In its current state, the database contains approximately 200 slides, or 100 million cells. The automated processing workflow is close to completion, and experts in machine vision are working on deep learning for cell recognition. The team has also been building an interactive spatial browser for cells, called CellView, which aims to provide visual tools for exploring whole slide images in detailed spatial context.
Identifying phenotypic signatures
With a robust analysis workflow in AstroPath, the team sought to answer two key questions: (1) Are PD-1 and PD-L1 expression intensities on specific cell subsets associated with clinical outcomes following anti-PD-1 therapy? (2) What is the optimal tumor microenvironment sampling strategy?
Their first cohort consisted of pre-treatment specimens from 73 patients with advanced melanoma. The samples were stained with a six-plex assay containing PD-1, PD-L1, CD8, FoxP3, CD163, and a tumor marker.
Tiled images were captured across the entire tumor microenvironment (TME) and ranked by CD8+ cell density in decreasing order, representing a hot spot approach. However, this ranking was not user-driven; it was performed algorithmically and should therefore be highly reproducible, said Dr. Taube. The team also assessed representative fields, a mixed selection of inflamed and non-inflamed areas from both the central and peripheral regions of the tumor.
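The algorithmic hot spot ranking can be sketched as a deterministic sort. This example assumes each field has a CD8+ cell count and a tissue area, so density is count divided by area; the `Field` record, the area normalization, and the tie-breaking by field ID are assumptions for illustration, not details given in the talk.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    field_id: str
    cd8_count: int          # CD8+ cells detected in the field
    tissue_area_mm2: float  # tissue area in the field (assumed normalization)


def cd8_density(f: Field) -> float:
    # CD8+ cells per mm^2 of tissue in this field.
    return f.cd8_count / f.tissue_area_mm2


def rank_hotspots(fields: List[Field]) -> List[Field]:
    """Rank fields by CD8+ density, highest first. Ties are broken by field_id,
    so the same input always yields the same order with no user selection."""
    return sorted(fields, key=lambda f: (-cd8_density(f), f.field_id))


if __name__ == "__main__":
    fields = [Field("a", 10, 1.0), Field("b", 50, 1.0), Field("c", 30, 0.5)]
    top_two = rank_hotspots(fields)[:2]
    print([f.field_id for f in top_two])
```

Because the ordering depends only on the measured densities and a fixed tie-breaker, rerunning it on the same images reproduces the same hot spot selection, which is the reproducibility property Dr. Taube highlighted.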
The team then used ROC curves to determine how well the density of cells positive for a given marker predicted response to anti-PD-1 therapy. They looked at the AUC for an increasing number of hot spots for their six-plex multiplex IF assay and found that the optimal sampling strategy for finding the most predictive biomarkers uses 20–25 high-power fields (HPFs) selected with a ranked hot spot approach.
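The "AUC versus number of hot spots" analysis can be sketched as follows, assuming each patient has a list of marker densities already ordered by CD8+ hot spot rank and a known response label. Summarizing each patient by the mean density over the top k fields is an assumption made for this example; the AUC is computed with the Mann-Whitney formulation (the probability that a responder scores higher than a non-responder, with ties counted as one half). The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(responder score > non-responder score), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def auc_by_num_hotspots(
    ranked_densities: Dict[str, List[float]],  # patient -> densities in hot spot order
    responders: Dict[str, bool],               # patient -> responded to anti-PD-1?
    max_k: int,
) -> List[float]:
    """AUC of the mean marker density over the top-k ranked fields, for k = 1..max_k."""
    aucs = []
    for k in range(1, max_k + 1):
        pos, neg = [], []
        for patient, dens in ranked_densities.items():
            top = dens[:k]  # the k highest-ranked fields for this patient
            score = sum(top) / len(top)
            (pos if responders[patient] else neg).append(score)
        aucs.append(auc_mann_whitney(pos, neg))
    return aucs


if __name__ == "__main__":
    densities = {"p1": [5.0, 1.0], "p2": [3.0, 3.0]}
    labels = {"p1": True, "p2": False}
    print(auc_by_num_hotspots(densities, labels, max_k=2))
```

Plotting the returned AUCs against k shows where adding more fields stops improving prediction, which is the kind of curve that points to an optimal number of fields such as the 20–25 HPFs reported here.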
The key phenotypic signature they identified for anti-PD-1 response was CD8+/FoxP3+/PD-1+ cells, which represent effector T cells at the earliest stages after priming and make up approximately 3% of all CD8+ cells in the melanoma tumor microenvironment.
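A phenotype's share of a parent population, such as CD8+/FoxP3+/PD-1+ cells as a fraction of all CD8+ cells, can be computed once each cell has been called positive or negative for each marker. This sketch assumes such thresholded calls exist and represents each cell as the set of markers it is positive for; the function name and example cells are hypothetical.

```python
from typing import Iterable, List, Set


def phenotype_fraction(cells: Iterable[Set[str]], parent: str, phenotype: Set[str]) -> float:
    """Fraction of cells positive for `parent` that are positive for every marker in `phenotype`."""
    parent_cells: List[Set[str]] = [c for c in cells if parent in c]
    if not parent_cells:
        return 0.0
    # `phenotype <= c` is a subset test: the cell carries all required markers.
    return sum(1 for c in parent_cells if phenotype <= c) / len(parent_cells)


if __name__ == "__main__":
    cells = [
        {"CD8", "FoxP3", "PD-1"},
        {"CD8"},
        {"CD8", "PD-1"},
        {"FoxP3"},
    ]
    print(phenotype_fraction(cells, "CD8", {"CD8", "FoxP3", "PD-1"}))
```

With real whole-slide data, the same calculation over millions of cells would yield percentages like the roughly 3% figure cited for melanoma.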
With TMB, gene expression profiling, and multiplex IHC/IF, we have more tools than ever to predict immunotherapy response. “We are at this point of information explosion,” said Dr. Taube. “The question is going to be how best to boil these down to an actionable biomarker which we can optimize and deploy in a clinical context.”