Date Submitted: November 23, 2024
Authors: Vladimir Makarov, Pistoia Alliance, and Alexandros Karargyris, MLCommons
Idea Originators (and Companies): Abhishek Pandey, Abbvie
Other supporting Individuals/Companies: Bayer, Amgen, AstraZeneca, Roche, US FDA
Identified Funders: Abbvie
Problem Statement
Many AI tools and software solutions are used for drug discovery and manufacturing today, and investment in AI by the biotechnology and pharmaceutical industries continues to grow. However, the performance of many AI solutions is unclear, which increases investment risk, hinders adoption, and slows the progress and business impact of AI.
Verification of performance claims made by the authors of such software requires effort. In an ideal world this testing would only need to be done once; however, in the competitive environment of the pharmaceutical industry it has to be performed individually by each company, resulting in duplication of effort. Available public leaderboards address only the easiest use case, testing publicly available models on publicly available data sets, ignoring use cases with real-world business impact. Testing proprietary models against proprietary data requires strong IP protection, confidentiality, and integrity measures. Without objective AI model performance data, the FDA and other regulatory bodies have limited ability to assess AI models in medicinal product approvals.
The proposed project aims to close these gaps through a collaborative effort organized by two major non-profit organizations (Pistoia Alliance and MLCommons) to ensure neutrality while enabling commercial opportunities.
Idea Proposal and Value Proposition
We propose establishing a system for benchmarking AI tools on private assets (models and data) across a multiplicity of use cases along the drug discovery, manufacturing, and clinical research value chain. The proposed vendor-neutral system should include benchmark data sets designed specifically for the named use cases, evaluation metrics, and infrastructure for secure compute and IP protection, allowing AI tools to be benchmarked without disclosure of the model code or the test data sets.
- Pharmaceutical and biotechnology firms and contract research organizations will benefit because they will be able to make AI technology investment decisions based on objective evaluation of technologies, including the ability to assess performance on proprietary data sets without disclosing the data themselves. This de-risks AI investments and provides critical information before procurement and development decisions.
- Technology vendors will benefit by knowing where their offerings stand in comparison with other solutions, and thus be able to plan product improvements.
- Regulators will be able to assess AI model quality with little duplicated effort.
- Patient advocacy groups can provide input and direction on benchmark use cases and metrics that capture patient population interests
- The overall quality and speed of drug discovery R&D will be improved
Targeted Outputs
- List of candidate AI/ML use cases in drug discovery, manufacturing, and clinical research that would benefit the most from benchmarking
- List of benchmarks for these use cases that already exist
- White paper describing the problem space and the proposed solution
- A community system for benchmarking, including the infrastructure for secure compute, homomorphic data transformation, or other appropriate IP protection tools; benchmark data sets for the specific use cases; and metadata management tools necessary for review of models and benchmark data sets without disclosure of any proprietary information
- Metrics useful for evaluation of specific use cases
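To make the IP-protection goal above concrete, the following minimal sketch (our illustration, not a committed design; the name `evaluate_in_enclave` is hypothetical) shows a benchmark harness in which the vendor's model and the data owner's test set remain opaque to each other, and only aggregate metrics leave the secure environment:

```python
from typing import Callable, Iterable, List, Tuple

def evaluate_in_enclave(
    predict: Callable[[List[list]], List[float]],    # vendor model: opaque to the data owner
    test_set: Iterable[Tuple[list, float]],          # proprietary data: opaque to the vendor
) -> dict:
    """Runs inside a secure compute environment; releases only aggregates."""
    errors = []
    for features, target in test_set:
        pred = predict([features])[0]
        errors.append(abs(pred - target))
    n = len(errors)
    # Only aggregate metrics are returned -- never raw records or predictions.
    return {"n": n, "mae": sum(errors) / n}

# Illustrative usage with a dummy model and dummy data:
dummy_model = lambda batch: [sum(x) for x in batch]
private_data = [([1.0, 2.0], 3.5), ([0.0, 1.0], 1.0)]
result = evaluate_in_enclave(dummy_model, private_data)
```

In a real deployment the same contract could be enforced by confidential computing, homomorphic transformation, or federated evaluation, as the outputs list suggests; the sketch only captures the information-flow constraint.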
Example Use Case(s)
Possible Topics for Future Benchmarking of AI models
- ADMET prediction
- Compound melting point prediction
- Solubility prediction
- pH alteration prediction
- Phenotypic profiling
- High content imaging
- Rare event detection
This proposal is not limited to these ideas; it includes actions to identify which use cases in the drug discovery industry would benefit the most from benchmarking of AI/ML tools.
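Several of the candidate use cases above (solubility, melting point, ADMET endpoints) are regression tasks, for which a benchmark might report both an error metric and a rank correlation. The helpers below are a hedged sketch of such metrics using only the standard library (a production benchmark would likely use an established library and handle ties in the rank correlation):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error for a regression benchmark."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def spearman(y_true, y_pred):
    """Spearman rank correlation (assumes no tied values, for simplicity)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rt, rp = ranks(y_true), ranks(y_pred)
    n = len(y_true)
    d2 = sum((a - b) ** 2 for a, b in zip(rt, rp))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Rank correlation matters for compound prioritization (which candidates to test first), while RMSE matters when the predicted value itself drives a decision; a benchmark metric suite would be chosen per use case.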
Critical Success Factors:
Financial and/or intellectual participation of:
- Multiple pharmaceutical firms
- AI software vendor firms
- Technology companies (e.g. Intel, Google, Nvidia, Microsoft)
- Regulatory agencies
- Patient advocacy groups
Why this is a good idea / Why Now
As of today, AI solutions are widely used in life sciences research. Over 75 AI-discovered molecules have entered clinical trials since 2015, with a compound annual growth rate of over 60% and a Phase I success rate of 80-90%, significantly higher than the historical average of 40-65%. (Reference: https://www.drugdiscoverytrends.com/six-signs-ai-driven-drug-discovery-trends-pharma-industry).
As a result of these early successes, investments in AI in biotech and pharma are increasing. According to a Pistoia Alliance survey (200 expert opinions across Europe, the Americas, and APAC; reference: https://www.drugtargetreview.com/news/153454/the-pistoia-alliance-key-findings-on-ai/), 62% of respondents plan to invest in AI in the next two years. Despite this success, the growth of the field is limited by the lack of standardized verification techniques for AI quality. Leaderboards like Hugging Face require standardized public benchmarks, which for pharma R&D are often not available, or not even permissible given privacy considerations. (See the limitations of ADMET property prediction and related use cases above.)
Testing private AI/ML models, or testing models on private data, requires IP protection measures that are not possible on public leaderboards. However, requiring each individual company to perform such testing and benchmarking behind in-house firewalls results in duplication of resources and effort.