Designing Trust: A step-by-step guide for applying Human-Centered Design principles in creating benchmarking datasets for training and testing large language models to be used in clinical decision support
Purpose and objectives
This playbook provides a methodology for applying human-centered design (HCD) to develop locally relevant datasets for training and evaluating medical large language models (LLMs). The objectives of this playbook are to:
- Establish processes for gathering contextually appropriate medical scenarios from frontline health care workers (FLWs)
- Ensure that the benefits of artificial intelligence (AI) reach underrepresented populations
- Address bias in existing LLMs by incorporating diverse medical knowledge
- Create frameworks for ongoing dataset improvement
- Empower local stakeholders to shape AI tools that reflect their needs
How to use this playbook
Use this resource as:
- A guide for creating inclusive medical datasets
- A complement to evaluation frameworks
- A roadmap for replicating methodologies across contexts
Publication date: September 2025