The Pistoia Alliance’s Semantic Enrichment of ELN Data (SEED) project is moving forward into an exciting new third phase. The SEED project addresses the challenge that R&D organizations face from the vast volumes of captured experimental data locked in Electronic Lab Notebooks (ELNs). It is still common for scientific research to be recorded as unstructured, non-machine-readable text with inconsistent vocabularies, which vastly reduces the potential for direct intelligent analysis. These unusable and unsearchable data entries have been a significant barrier, resulting in duplicated experiments and vast amounts of time spent tracking down and extracting data.
Project Champion Steve Penn, Director, Medicinal Sciences Information Strategy Lead at Pfizer, said the SEED project is important to the pharma and life sciences industries because it helps accelerate digital transformation. When project member companies collaborate on data standards and ontologies, improvements to the common data infrastructure create richly annotated and structured data. This, in turn, is a precursor to project member data being unambiguously interpreted and used throughout the drug development lifecycle.
The project contributors include Pfizer, AstraZeneca, Bristol Myers Squibb, SciBite, Bayer, Biogen, Southampton University, GSK, CDD, Elsevier, Linguamatics, Merck, Sanofi, and Takeda.
Pistoia Alliance member Sanofi is already realizing value from the project and plans to align its ADME assay metadata with the new ontology classes that the SEED project added to the BioAssay Ontology (BAO). This will make Sanofi’s assay data compliant with the FAIR principles.
“Phase 3 will build on the success of phase 2, by agreeing upon a high-level experimental data schema, establishing a consistent top-down framework for building all experiment ontologies,” Penn said. “The goal is to make semantically rich data linked to a backlog of priority domains across industry members.”
Phase 2 added data in the form of attributes, mappings, and annotations that create relationships between ontology classes and help describe and define them. Delivery was centered on the ADME, pharmacology, and drug safety domains, Penn said.
The first phase delivered a publicly available ADME and pharmacology assay ontology (industry standard, in BAO). The ontology work for ADME and PD assay mapping incorporated the hierarchical taxonomy and integrated synonyms and mappings to CDISC SEND Study Type (SSTYP) and eCTD M4. The ontology development was followed by a working prototype of an agnostic solution for semantic enrichment of unstructured data within an ELN using the ontologies developed (thanks to Scilligence and SciBite for this collaborative effort). This solution unlocked the data’s value for future analysis, in line with the FAIR principles.
SEED is an excellent example of the benefits Pistoia Alliance members can gain through collaboration, Penn said. “Before, enhancements on the data standards and ontologies were being developed individually, and ELN companies received numerous overlapping requirements across pharma,” he said. “Now, co-development of data standards, ontologies, and strategies is having significant positive impacts and leaning towards influencing ELN providers’ roadmaps. The Pistoia Alliance continues to be the fulcrum for this delivery.”
Penn is calling on Pistoia Alliance member companies to join this exciting new phase of the SEED project and bring ideas, funding, and user stories to the table as new participants. “With relatively small investments across industry members we can make huge collaborative strides on the data standards and common data infrastructure to the greater benefit of the drug development lifecycle,” Penn said.
To find out more about the SEED project, email Project Manager Gabrielle Whittick via ProjectInquiry@PistoiaAlliance.org.