Delivering Data Driven Value

Data Curation as a Key Element in Successful Data Science Strategy

Data science is becoming a key discipline in pharmaceutical research. A successful data science strategy requires high-quality, structured, integrated data collected from internal sources (for example, bioassays) and external sources (literature, patents, drug labels, etc.). Manual curation by domain experts is the key approach that allows companies to get from unstructured data spread across thousands of sources to high-quality datasets, enabling them to gain new insights, make discoveries, and speed up drug development.

Curation at scale is a complex process that involves the development of strict protocols and standards, sophisticated infrastructure, and management of a large number of curators. The complexity of manual curation makes it a costly – but necessary – process, and that is why it is so important to get it right. This webinar will explore the processes, best practices, and pitfalls of manual data curation.

Data science is the future: join this webinar to learn how to get there with high-quality manual curation.

Featured Topics

  • Key learnings about manual curation best practices from the leader in manual curation, drawing on 20 years of experience curating biological knowledge and data.
  • FAIRification of Clinical Trial Data at Roche

Learning Objective

At the conclusion of this session, participants should be able to:

  • Employ key strategies and avoid pitfalls when curating data to generate high-quality structured knowledge and datasets.


Speaker Bios

Frank Schacherer, PhD, VP Products and Solutions, QIAGEN Digital Insights, QIAGEN GmbH
Dr. Frank Schacherer leads QIAGEN’s program for information systems, knowledge content and machine learning in discovery research. He joined QIAGEN in 2014 with the acquisition of BIOBASE where he served as a managing director. Dr. Schacherer has more than 20 years of experience in management, software, and database development. He holds a Ph.D. in bioinformatics. His current interest is in translating the promise of data science and AI into useful solutions for understanding biological systems.

Rama Balakrishnan, PhD, Biomedical Ontology Specialist, Genentech
Rama received her Ph.D. in Biophysics from SUNY Buffalo (NY) and was a post-doctoral researcher in the Biochemistry Department at Stanford University (CA). She then moved on to managing genomics databases and developing ontologies for biomedical domains, also at Stanford. She continues to contribute to data curation and ontology development at Genentech/Roche.

Joshua Bernal, Data Curator, Genentech
Josh studied Biology at UC Berkeley and moved into data management shortly after. He has 15 years of combined CRO, vendor, and pharma experience in data management and data curation.

HELM Wiki

Home of the HELM notation – representing complex biomolecules

Collaborative Observational Health Research Using OHDSI Methods

This webinar will present clinical data standardization and harmonization using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), standard vocabularies, and standardized analytics. It will include contributions to the European Health Data & Evidence Network (EHDEN), an Innovative Medicines Initiative (IMI 2) consortium with 22 partners operating in Europe.
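To make the standardization idea concrete, here is a minimal, hypothetical sketch (not taken from the webinar) of mapping a site-specific source record into OMOP CDM-style person and condition_occurrence rows. The table and column names follow the public OMOP CDM; the source fields and the tiny concept lookups are illustrative stand-ins for the OMOP standardized vocabularies:

```python
from datetime import date

# Hypothetical source record from a site-specific EHR extract.
source = {"patient_ref": "P-0042", "sex": "F", "birth_year": 1968,
          "dx_code": "E11.9", "dx_date": "2021-03-15"}

# Toy lookup tables standing in for the OMOP standardized vocabularies.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}       # OMOP gender concept IDs
CONDITION_CONCEPTS = {"E11.9": 201826}         # source code -> standard concept (illustrative)

def to_omop(rec, person_id):
    """Map one source record to OMOP CDM person / condition_occurrence rows."""
    person = {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS[rec["sex"]],
        "year_of_birth": rec["birth_year"],
    }
    condition = {
        "person_id": person_id,
        "condition_concept_id": CONDITION_CONCEPTS[rec["dx_code"]],
        "condition_start_date": date.fromisoformat(rec["dx_date"]),
    }
    return person, condition

person, condition = to_omop(source, person_id=1)
```

Once all sources are expressed in this common shape, the same standardized analytics can run unchanged across every participating site, which is the core of the OHDSI approach.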

Speaker

Maxim Moinat is a Data Engineer specializing in Medical Informatics at The Hyve in Utrecht, NL.

The Architecture of FAIR Data Platforms: COLID and EDISON at Bayer and Roche

Bayer and Roche are leading biopharmaceutical companies, each of which has a diverse and distributed ecosystem of platforms to manage data and metadata used by different parts of the organization.


Corporate Linked Data (COLID) was developed at Bayer as an open-source technical solution for corporate environments that provides a FAIR metadata repository for corporate assets based upon semantic models. COLID assigns URIs as persistent and globally unique identifiers to any resource. The incorporated network proxy ensures that these URIs are resolvable and can be used to directly access those assets. The data model of COLID uses RDF, and content is provided to consumers through a SPARQL endpoint. COLID is both a management system for resolvable identifiers and an asset catalog. It is the core service for realizing Linked Data in corporate environments and therefore an essential cornerstone of FAIR data management at Bayer.
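The identifier-resolution idea can be illustrated with a small sketch: a persistent URI acts as a key in a registry that a network proxy redirects to the asset's current location. This is an illustration of the principle only, not COLID's actual implementation, and all names and URLs below are hypothetical:

```python
# Hypothetical registry mapping persistent URIs (PIDs) to current asset
# locations; in COLID, the incorporated network proxy plays this role.
PID_REGISTRY = {
    "https://pid.example.com/resource/4711":
        "https://dataplatform.example.com/assets/study-data",
}

def resolve(pid_uri: str) -> str:
    """Return the redirect target for a persistent URI.

    A real proxy would answer with an HTTP 302 redirect; because the
    location is looked up at request time, assets can move without
    invalidating the identifiers that reference them.
    """
    try:
        return PID_REGISTRY[pid_uri]
    except KeyError:
        raise LookupError(f"unregistered PID: {pid_uri}")
```

The benefit of the indirection is that consumers only ever store the persistent URI; when an asset is migrated, only the registry entry changes.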

The EDISON platform at Roche enables prospective FAIRification of data at the point of entry to the company, by harmonizing, automating, and integrating very heterogeneous and complex processes across multiple departments, building in data standards and quality checks at every step of the process. The EDISON platform is built as an ecosystem of self-contained microservices to ensure maximum performance, scalability, and low maintenance. The current scope of EDISON is clinical non-CRF data, but the platform is scalable and flexible to cover a large variety of data models, both clinical and non-clinical.

This webinar will present the technical details of each of these FAIR data platforms. Each enables seamless access across its respective corporate data ecosystem. Both exploit machine-readable FAIR Knowledge Graphs to access and combine multiple disparate reference data systems, and both serve non-experts with intuitive, user-friendly ways of finding and exploring FAIR data. The two webinar presentations will be followed by a Q&A panel discussion.

Speakers

  • Goekhan Coskun, Principal Information Strategist, Bayer, Germany
  • Holmfridur Thorsteinsdottir, Head of Clinical & Biomarker Informatics, Roche, Switzerland

For more information about the Pistoia Alliance’s FAIR Implementation project, please contact us.

DataFAIRy Bioassay Annotation Pilot Project

In 2020, a team of scientists from AstraZeneca, Bristol Myers Squibb, Novartis, and Roche set forth to find a way to convert unstructured biological assay descriptions into FAIR information objects.

In this talk, we will present the lessons learned in the pilot project to annotate bioassay descriptions en masse and chart a way to expand this effort in the future.

  • Isabella Feierberg, Associate Principal Scientist, AstraZeneca
  • Dana Vanderwall, Director of Biology & Preclinical IT, Bristol Myers Squibb
  • Rama Balakrishnan, Biomedical Ontology Specialist, Genentech
  • Martin Romacker, Senior Principal Scientist in Scientific Solution Engineering and Architecture, Roche
  • Samantha Jeschonek, Research Scientist, Collaborative Drug Discovery
  • Timothy Ikeda, Automation Principal Scientist, AstraZeneca
  • Gabriel Backiananthan, Novartis
  • Anosha Siripala, Technical Associate Director, Scientific Products, Novartis Institutes for BioMedical Research (NIBR)

Lynx: A FAIR-Fueled Reference Data Integration & Lookup Service at Roche

Roche, as a leading biopharmaceutical company and member of the Pistoia Alliance, has a diverse and distributed ecosystem of platforms to manage reference data standards used in different parts of the organization. These diverse reference data standards include ontologies and vocabularies to capture specifics of the research environment and also to describe how clinical trial data are collected, tabulated, analyzed, and finally submitted to regulatory authorities. In the context of the EDIS program, Roche has bridged these parts to improve reverse translation from studies back into research and has also embraced FAIR to emphasize machine-actionability and data-driven processes.

In this webinar, we will present and provide technical details of Lynx, a FAIR-fueled system to enable seamless access across that ecosystem. On the one hand, Lynx exploits machine-readable, FAIR Knowledge Graphs to allow for accessing and combining multiple and disparate reference data systems. On the other hand, Lynx bridges the gap for non-experts with an intuitive and user-friendly way of finding and exploring FAIR data.

Speakers

Dr. Javier D. Fernández is a Senior Information Architect at Roche in Basel, Switzerland.

Ontologies Mapping Resources

This area contains public resources for ontology consumers and providers to support practical application and mapping.

The Ontologies Mapping project was set up to create better mapping tools and services, and to establish best practices for ontology management in the Life Sciences. For our purposes, ontologies can include hierarchical relationships, taxonomies, classifications, and/or vocabularies, which are becoming increasingly important for supporting research and development.
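As a toy illustration of what mapping between two vocabularies involves, the sketch below matches terms by normalized label. Real mapping services (for example, EMBL-EBI's OxO) combine curated cross-references with lexical and semantic matching; the term IDs and labels here are for illustration only:

```python
# Illustrative term sets: a source ontology and a target vocabulary,
# each mapping a term ID to a human-readable label.
SOURCE_TERMS = {"HP:0001945": "Fever", "HP:0002315": "Headache"}
TARGET_TERMS = {"MESH:D005334": "fever", "MESH:D006261": "headache"}

def map_terms(src: dict, dst: dict) -> dict:
    """Map each source term ID to a target term ID by case-insensitive
    label match; unmatched terms map to None."""
    by_label = {label.lower(): term_id for term_id, label in dst.items()}
    return {term_id: by_label.get(label.lower())
            for term_id, label in src.items()}

mapping = map_terms(SOURCE_TERMS, TARGET_TERMS)
```

Exact-label matching is only the simplest baseline; production mappings must also handle synonyms, near-matches, and one-to-many relationships, which is why curated tools and best practices matter.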

Informed Consent in Clinical Trials – Application of Blockchain Technology

Patient ownership and control of personal data and increased regulatory compliance are key areas of improvement in clinical trials. Blockchain is a form of Distributed Ledger Technology (DLT) that supports trust and immutability of transactions and prevents a single point of failure.

Blockchain technology involves the implementation of Decentralized Identifiers (DID), ‘virtual’ wallets, Verifiable Credentials (VC), smart contracts, and a blockchain layer to record transactions between parties during the clinical trial Informed Consent process. Blockchain technology puts patients in control of their identity and their data and has the potential to significantly change how patients are enrolled and participate in clinical trials. Further, blockchain technology puts sponsors in control of the Informed Consent process and documentation and has the potential to increase speed and compliance through ‘anytime’, real-time auditing.
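The immutability property described above can be sketched with a minimal append-only, hash-chained ledger: each block commits to the hash of its predecessor, so altering any earlier consent event breaks the chain. This is an illustration of the principle only, not a production DLT, and the DIDs and event names are hypothetical:

```python
import hashlib
import json

def _hash(block: dict) -> str:
    """Deterministic SHA-256 digest of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class ConsentLedger:
    """Append-only chain of consent events; each block commits to its predecessor."""

    def __init__(self):
        self.chain = [{"index": 0, "event": "genesis", "prev_hash": "0" * 64}]

    def record(self, did: str, event: str):
        """Append a consent event for a patient's Decentralized Identifier (DID)."""
        self.chain.append({
            "index": len(self.chain),
            "did": did,                      # e.g. "did:example:123" (hypothetical)
            "event": event,                  # e.g. "consent_granted", "consent_withdrawn"
            "prev_hash": _hash(self.chain[-1]),
        })

    def verify(self) -> bool:
        """Recompute the hash links; tampering with any earlier block breaks them."""
        return all(self.chain[i]["prev_hash"] == _hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))
```

Because every auditor can recompute the links, this structure supports the 'anytime', real-time auditing of the consent trail mentioned above.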

QDatE Best Practice Guidelines

These Best Practice Guidelines are intended to provide a strategy that delivers as much value as possible to the study collecting the data, the participants who supply the data, and any external users who reuse the data outside of its original intended use.

QDatE Code of Ethics

This Quality Data Generation and Ethical Use (QDatE) code of ethics is complementary to the Best Practice Guidelines and will ensure that the sensor-generated data from remote monitoring technologies (SDRM) is collected, stored, governed, used, and reused in a way that utilises the data to the best of its potential.