The D³b Data Tracker Tool: Streamlining Data Collection and Management through Innovative Technology
The Center for Data Driven Discovery in Biomedicine (D³b) is dedicated to accelerating cures and increasing the well-being of children battling the rarest and most difficult-to-treat conditions. D³b is well-poised to lead the way in the creation and innovation of medical databases and portals, as well as a seamless infrastructure to maximize collaboration across global research communities.
Co-Director of D³b Dr. Adam Resnick sees personalized medicine that integrates real-time data sharing as not only an important goal in biomedicine, but an attainable one through the use of cloud-based analysis tools and resources. D³b manages and coordinates large-scale, multi-omic data environments like the Children’s Brain Tumor Network (CBTN) and the Gabriella Miller Kids First Data Resource Center (Kids First DRC), which makes it a natural testing ground for these goals through the development of projects like the Data Tracker. Data Tracker is a new tool designed to bring together data across many sources, accelerating the pace at which data can be used for further study.
The data generated through medical research and in clinical settings are broad, complex, and collected over time, which makes putting them into a usable context labor-intensive and time-consuming. Dr. Resnick refers to these intersecting types of data as “asynchronous” and “multi-modal,” a description that covers genomics, imaging data, and even clinical records and biospecimens.
For a data environment like the Kids First DRC, which facilitates data sharing between investigators in the fields of pediatric cancer and structural birth defects, the Data Tracker will standardize the way data are integrated, allowing them to be harmonized quickly and with much less time-consuming manual effort.
SIMPLIFYING & DEMOCRATIZING THE DATA INTEGRATION PROCESS
D³b’s Director of Data Technology and Innovation Dr. Allison Heath not only led the development of the Kids First DRC, but has also been a guiding voice and decision-maker in the development of the Data Tracker. Through the Kids First DRC, Dr. Heath and her team observed the challenges they and other researchers face in the pediatric cancer and structural birth defect research space, and began to understand that their problems were far from isolated. She recognizes an often-overlooked problem: researchers lack tools to easily track and coordinate the release of the data they share.
Currently, if a patient is enrolled in a study, there may be a long lag between data generation and entry into any database. In that process, researchers need the ability to initiate data integration, track where data are within the process pipeline, and continually touch base on all requirements. This process often takes place over email and with significant manual effort. The Data Tracker, by contrast, is meant to function as a truly collaborative data hub, connecting people across teams and institutions to turn their research data into rich, usable information without delay.
Dr. Heath says her team is aiming to lower the energy barrier for data entry, making it simpler for even the rarest types of data to be easily integrated into the framework. If it is simpler and more efficient to enter data, less data will be lost over time.
While there is much standardization across the practice of science, and particularly medical research, the breadth and depth of data, even within disease areas, still leave room for many differences in how researchers collect, organize, and categorize their data. However, Dr. Resnick notes that years of manual standardization have yielded, and continue to yield, invaluable experience related to the needs and challenges of data management. It is the experience of those who have worked to manually input data that allows for the development of automated processes within Data Tracker.
IMPLEMENTATION ACROSS THE D³B RESEARCH ENVIRONMENT
The current iteration of the Data Tracker project has expanded well beyond its original purpose as a data entry point into a full collaborative hub where all stakeholders have immediate access to updated processes and the latest information. Its utility for data management and integration within the Kids First DRC has led to expanded use across D³b-managed platforms and datasets. Of particular importance is the Data Tracker’s integration with the Pediatric Brain Tumor Atlas (PBTA), which includes all the data generated through CBTN efforts, making it the largest source of pediatric brain tumor data in the world.
A key strength of the PBTA is its richness: it provides a multidimensional view of subjects by connecting longitudinal clinical data with molecular characterization data, imaging data, methylation data, and preclinical models, giving researchers the opportunity to understand which treatments worked for patients and which did not. At the same time, these data allow researchers to examine the molecular characteristics of each tumor to identify trends and potential clinical strategies. CBTN is integral to the continued development of the Data Tracker tool, particularly as the dataset grows with new data generated as a result of NIH’s support for sequencing the remaining collection of biospecimens in CBTN’s cohort.
The sequencing of these 7,000+ specimens represents the largest effort of its kind in the history of the NIH’s data generation program and will effectively quadruple the size of the PBTA. When the new data are returned, each subject’s molecular data will need to be linked to the corresponding clinical data, imaging, and preclinical models. The success of future research rests on ensuring that all data are harmonized efficiently and effectively.
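To make the linking step concrete, the following is a minimal sketch, not a description of how the Data Tracker itself is built. It assumes every record carries a shared subject identifier (the `subject_id` key, the function names, and the placeholder records are all hypothetical) and shows how records arriving from separate sources could be grouped per subject, with any modality that has not yet arrived flagged for follow-up.

```python
from collections import defaultdict


def link_by_subject(sources):
    """Group records from several modalities under a shared subject ID.

    `sources` maps a modality name (e.g. "clinical") to a list of records,
    each a dict carrying a hypothetical "subject_id" key.
    """
    linked = defaultdict(dict)
    for modality, records in sources.items():
        for record in records:
            linked[record["subject_id"]].setdefault(modality, []).append(record)
    return dict(linked)


def missing_modalities(linked, required):
    """Report, per subject, which required modalities have not arrived yet."""
    gaps = {}
    for subject, by_modality in linked.items():
        missing = sorted(set(required) - set(by_modality))
        if missing:
            gaps[subject] = missing
    return gaps


# Placeholder records, invented purely for illustration.
sources = {
    "clinical": [{"subject_id": "S1"}, {"subject_id": "S2"}],
    "molecular": [{"subject_id": "S1"}],
    "imaging": [{"subject_id": "S2"}],
}

linked = link_by_subject(sources)
gaps = missing_modalities(linked, ["clinical", "molecular", "imaging"])
```

In this sketch, `gaps` lists the subjects whose records are still incomplete, which is the kind of status that would otherwise be chased manually over email.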
D³b’s Data Tracker project is now beginning its next evolution with a Data Tracker 2.0 effort supported by the National Cancer Institute (NCI), aiming to build the extensive data infrastructure needed to support the real-time integration of asynchronous sources. Responding positively to the Data Tracker proof-of-concept submitted by the D³b team, the NCI intends to incorporate the Data Tracker into its existing data process flow within its Childhood Cancer Data Initiative (CCDI).
As part of this developing partnership with NCI, the team has also begun to draw on more of the data available through D³b to develop ways for the Data Tracker to handle a growing range of data types. This includes additional CBTN data focusing on pediatric cancers of the central nervous system, as well as data from the INCLUDE (Investigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) Data Coordinating Center.
The vision for the Data Tracker 2.0 framework is to enable patients enrolled in new studies to have their existing clinical data immediately entered into D³b’s data infrastructure. As more data about that patient are generated, through genomics, sampling, imaging, etc., they can be incorporated into the database in real time.
Expanding the types of data being tested through the Data Tracker will allow developers to adapt to a broader range of condition types. The Data Tracker is filling a foundational need for D³b, but its impacts will support innovation within and beyond the field of pediatric cancer research.