Semantic Enrichment of Electronic Lab Notebook Data (SEED)

This project helped the life sciences industry unlock the value of data in Electronic Lab Notebooks (ELNs) using data standards and semantic enrichment.

Overview

The Semantic Enrichment of ELN Data (SEED) project addressed a challenge across the R&D industry: vast volumes of captured experimental data are locked in Electronic Lab Notebooks (ELNs). Unusable and unsearchable data sets have been a significant barrier to digital transformation, resulting in duplicated experiments and time spent tracking down and wrangling data.

This project was an important milestone in helping life sciences accelerate digital transformation. It demonstrated the power of project member companies collaborating on data standards and ontologies to improve the common data infrastructure and create richly annotated, structured data that is subsequently used throughout the drug development lifecycle.

Project contributors included Pfizer, AstraZeneca, Bristol Myers Squibb, SciBite, Bayer, Biogen, Southampton University, GSK, CDD, Elsevier, Linguamatics, Merck, Sanofi, and Takeda.

SEED enables comprehensive semantic capture and translation of data across ELN providers at the point of entry, aligned with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Computer-readable, standardized data makes it easier to track provenance and connect attributes, giving researchers better insights and analysis. This results in higher-quality experiments, reduced rework, and better decision-making.

The first phase of the project delivered a publicly available ADME and pharmacology assay ontology, built within the industry-standard BioAssay Ontology (BAO). The ontology work for ADME and PD assays incorporated a hierarchical taxonomy and integrated synonyms and mappings to CDISC SEND Study Type (SSTYP) and eCTD M4. Ontology development was followed by a working prototype of a provider-agnostic solution for semantic enrichment of unstructured data within an ELN, using the newly developed ontologies to unlock the data's value for future analysis in line with the FAIR principles.
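As a rough illustration of what semantic enrichment of unstructured ELN text can look like, the sketch below tags free text with ontology terms by matching labels and synonyms, then walks the hierarchy to find broader classes. It is not the SEED prototype: the term identifiers (EX:0001 and so on), the three example terms, and the function names are hypothetical placeholders chosen for illustration, and a real solution would load the published ontology rather than a hard-coded list.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OntologyTerm:
    term_id: str                      # hypothetical ID, not a real BAO identifier
    label: str
    parent_id: Optional[str] = None   # link to the broader class in the taxonomy
    synonyms: Tuple[str, ...] = ()


# Illustrative terms only; the real ontology is far larger.
TERMS = [
    OntologyTerm("EX:0001", "ADME assay"),
    OntologyTerm("EX:0002", "metabolic stability assay", "EX:0001",
                 ("microsomal stability assay",)),
    OntologyTerm("EX:0003", "permeability assay", "EX:0001",
                 ("Caco-2 assay",)),
]


def build_index(terms: List[OntologyTerm]) -> dict:
    """Map every lower-cased label and synonym to its term."""
    index = {}
    for term in terms:
        for name in (term.label, *term.synonyms):
            index[name.lower()] = term
    return index


def annotate(text: str, terms: List[OntologyTerm]) -> list:
    """Return (start, end, matched_text, term_id) for each term mention."""
    index = build_index(terms)
    # Try longer names first so specific phrases win over shorter ones.
    names = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), index[m.group(0).lower()].term_id)
        for m in pattern.finditer(text)
    ]


def ancestors(term_id: str, terms: List[OntologyTerm]) -> List[str]:
    """Walk parent links upward to collect all broader classes."""
    by_id = {t.term_id: t for t in terms}
    chain = []
    current = by_id[term_id].parent_id
    while current is not None:
        chain.append(current)
        current = by_id[current].parent_id
    return chain


if __name__ == "__main__":
    note = "Ran a Caco-2 assay and a microsomal stability assay on the compound."
    for start, end, mention, term_id in annotate(note, TERMS):
        print(mention, "->", term_id, "broader:", ancestors(term_id, TERMS))
```

Matching on synonyms as well as preferred labels is what lets differently worded notebook entries resolve to the same class, and the parent links let a search for a broad class also retrieve entries tagged with its narrower classes.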

Phase 2 added attributes, mappings, and annotations that create relationships between ontology classes and help describe and define them. Delivery centered on the ADME, pharmacology, and drug safety domains.

“Before, enhancements on the data standards and ontologies were being developed individually, and ELN companies received numerous overlapping requirements across pharma. Now, co-development of data standards, ontologies, and strategies is having significant positive impacts and leaning towards influencing ELN providers’ roadmaps.” Project Champion, Steve Penn, Pfizer.

The Journal of Biomedical Semantics published a paper on the SEED project, “An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows”. The paper highlights the rich set of expertise that the project brought together.