Project Charter
Explore the use of Large Language Models in biological research and define best practices for doing so, using target discovery and validation as the initial use case.
Process: From ideation to delivery
Project Workflow
- Ideation
- Discussion & Validation
- Problem Statement & Business Case
- Member Funding
- Project Live
The Challenge
Large Language Models (LLMs), such as GPT-4, present a significant opportunity for pharmaceutical R&D, particularly in target discovery and validation. Target discovery, a foundational process in drug development, requires synthesizing large, complex datasets and interpreting proprietary research in the broader context of public information.
Proposed Approach:
We aim to harness prompt-tuned LLMs combined with Retrieval-Augmented Generation (RAG) methodologies to generate plain-English answers to typical target discovery questions. By focusing on highly structured public datasets, this project will establish a scalable, open-source pipeline tailored for the demands of target discovery.
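As an illustration only, the retrieval-and-prompt step of such a pipeline might look like the minimal Python sketch below. Everything in it is an assumption made for the example rather than part of the project's design: the `Record` fields, the placeholder rows (which are not real biological findings), the simple keyword-overlap scoring, and the `retrieve` and `build_prompt` helper names. The call to the LLM itself is deliberately left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row from a structured public dataset (fields are hypothetical)."""
    target: str
    disease: str
    evidence: str


# Made-up placeholder rows for illustration; not real biological findings.
RECORDS = [
    Record("GENE_A", "disease X", "genetic association reported in cohort study"),
    Record("GENE_B", "disease Y", "expression elevated in affected tissue"),
    Record("GENE_C", "disease X", "knockout model shows reduced symptoms"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question: str, records: list[Record], k: int = 2) -> list[Record]:
    """Return up to k records sharing the most tokens with the question.

    Keyword overlap stands in for a real retriever (for example, an
    embedding index); ties keep the original record order.
    """
    q_tokens = tokenize(question)
    scored = []
    for index, rec in enumerate(records):
        rec_tokens = tokenize(f"{rec.target} {rec.disease} {rec.evidence}")
        overlap = len(q_tokens & rec_tokens)
        if overlap > 0:
            scored.append((overlap, index, rec))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [rec for _, _, rec in scored[:k]]


def build_prompt(question: str, context: list[Record]) -> str:
    """Assemble a prompt that grounds the model in the retrieved records."""
    lines = [f"- {r.target} | {r.disease} | {r.evidence}" for r in context]
    return (
        "Answer in plain English using only the records below.\n"
        "Records:\n" + "\n".join(lines) + f"\nQuestion: {question}\n"
    )


question = "Which targets are linked to disease X?"
prompt = build_prompt(question, retrieve(question, RECORDS))
```

The resulting `prompt` string would then be passed to whichever LLM the pipeline uses. Keeping retrieval separate from prompt assembly means either part could be swapped out without changing the other.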
Key Objectives:
- Guidelines for LLM Integration: Develop a robust framework outlining the most effective strategies for deploying LLMs in target discovery, ensuring reproducibility and transparency.
- Open-Source Innovation: Deliver a practical, community-driven pipeline leveraging LLMs for target discovery, reducing redundancy and promoting collaborative advancements in biological research.
Why Target Discovery?
Target discovery is relevant across pharmaceutical R&D and exemplifies the challenges LLMs can address, namely mining large and intricate datasets to produce actionable insights. Solving these challenges paves the way for broader applications of LLMs in scientific and industrial research.
Through this initiative, we aim to define the role of LLMs in pre-competitive research, demonstrating their potential to accelerate drug discovery and enhance collaboration across the life sciences sector.