Using human-centered design to shape locally relevant AI in health care

September 23, 2025 by Stella Wanjiru and Tara Newton

By embedding human-centered design in large language model development, Living Labs is pioneering a replicable model to ensure artificial intelligence benefits are more equitably distributed.

Oscar Macharia (left) and Naomi Nduitha (right), clinical staff at Penda Medical Centre in Nairobi, use AI to help diagnose patients and determine treatment plans. Photo: PATH.

In many parts of the world, health care workers deliver care under challenging conditions, often facing time constraints, staff shortages, and limited resources. As artificial intelligence (AI) tools become more integrated into health care, they offer new opportunities to support diagnosis and treatment. Among these innovations are tools powered by large language models (LLMs), applications of artificial intelligence trained on vast amounts of data to respond to queries and interact with users in natural, human-like language.

At PATH, the LLM for Health Equity initiative is exploring how these technologies can be used safely and effectively in low-resource settings to support clinical decision-making, with a focus on health care systems in Africa.

Adapting innovations to local context

Most LLMs are trained on datasets dominated by Western medical knowledge, which often excludes the nuances of health care delivery in low- and middle-income countries (LMICs). For instance, existing AI for health tools may not account for diseases that are uncommon in Western countries or for clinical practices that are routine in African health care settings.

In Kenya, the Living Labs team is addressing this challenge by co-creating locally relevant datasets and tools. By working directly with health care providers, the team aims to ensure that LLMs are trained on data that reflects the clinical realities and needs of frontline workers, particularly in LMICs.

The Living Labs team focused on identifying and engaging the right end users, frontline health care providers and technicians, to co-develop scenarios that would train the models. The aim was to have health care workers share common scenarios and questions they encounter in their daily workflow, for example, the questions they would ask a patient to determine which tests, treatments, or referrals to consider. These scenarios and questions are then used to teach the LLM to think like an expert-level Kenyan health care provider.
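To make the shape of such a dataset concrete, here is a minimal, hypothetical sketch of how a single co-created scenario might be captured as structured data. The field names and example content are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ClinicalVignette:
    """One co-created scenario from a frontline provider (illustrative schema only)."""
    scenario: str                  # the situation as described by the nurse, in their own words
    question: str                  # the decision the provider needs help with
    facility_level: str            # e.g., dispensary, health centre, county referral hospital
    county: str                    # where the scenario was collected
    reference_answer: str = ""     # clinician-reviewed answer used as the benchmark
    guideline_refs: list[str] = field(default_factory=list)  # national guidelines cited by reviewers

# A hypothetical example entry (not real project data):
example = ClinicalVignette(
    scenario="A child presents with fever and cough; the nurse is working alone on a night shift.",
    question="Which laboratory test should be ordered first, and when is referral warranted?",
    facility_level="health centre",
    county="Kisumu",
)
```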

To kick off this work, Living Labs weighed important decisions: how many health care workers to involve, from which level of the health care delivery system, from which regions, and how to balance their time and availability with project resources. The goal was to root the dataset in real-life service delivery while ensuring inclusivity and feasibility.

Co-creating with nurses

The Living Labs team applied its human-centered design (HCD) approach to every phase of the project:

  • Introduction and discovery: The team began by introducing the concept of AI and LLMs for clinical decision support and gathering real clinical scenarios during workshops held in three Kenyan counties.
  • Engagement: We partnered with primary health care (PHC) nurses, who helped us develop realistic clinical scenarios and questions they have during patient encounters. A group of clinicians reviewed the scenarios and responded to the accompanying questions. These “vignettes” captured the nuances of care in Kenyan facilities, like deciding which lab test to order when working alone or when to refer a patient with overlapping symptoms.
  • Testing: We tested five LLMs using these real-life cases. A panel of expert clinicians reviewed both the nurses’ questions and the LLMs’ answers, evaluating them against national standards.
  • Evaluation: With support from the University of Birmingham, we applied a rigorous framework to assess the quality, relevance, and clinical safety of each LLM's responses (a simplified sketch of this kind of testing-and-scoring loop follows this list).
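For readers curious what such a loop might look like in practice, below is a minimal, hypothetical sketch that takes vignettes like the one outlined earlier, sends them to several models, and leaves space for clinician ratings. The model names, the `query_model` helper, and the scoring labels are placeholders for illustration; the project's actual tooling and evaluation framework are documented in the playbook and manuscript.

```python
# Illustrative only: pair each vignette with each model's answer for clinician review.
# query_model() is a hypothetical helper standing in for whichever model APIs are used.

SCORING_DIMENSIONS = ["quality", "relevance", "clinical_safety"]  # assumed rubric labels
MODELS = ["model_a", "model_b", "model_c", "model_d", "model_e"]  # five anonymized LLMs

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for calling an LLM endpoint; returns the model's answer text."""
    raise NotImplementedError("Wire this up to the actual model endpoints.")

def collect_responses(vignettes):
    """Build a review sheet: every vignette crossed with every model's answer."""
    review_sheet = []
    for v in vignettes:
        prompt = f"Scenario: {v.scenario}\nQuestion: {v.question}"
        for model in MODELS:
            review_sheet.append({
                "vignette": v,
                "model": model,
                "answer": query_model(model, prompt),
                # Expert clinicians fill these in against national guidelines:
                "scores": {dim: None for dim in SCORING_DIMENSIONS},
            })
    return review_sheet
```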

A unique approach

One of the first of its kind, the project broke new ground by integrating HCD into both dataset development and the testing of AI-enabled tools. It was also a personal milestone for team members: working at the intersection of AI and HCD was a new, rewarding experience. The project showed how LLMs, if designed correctly, could support overstretched, under-resourced health systems.

For nearly all participating nurses, this was their first experience with AI. The team began with the basics by explaining what LLMs are and how they could support daily tasks. Despite the learning curve, the response was overwhelmingly positive. Nurses were excited about the potential for support, especially in situations where they had to make clinical decisions without immediate access to a doctor, such as deciding which medication to prioritize for overlapping symptoms or which further tests to order.

Workshops allowed nurses to articulate their knowledge and decision-making needs, using audio recordings to capture scenarios in their own words. Photo: PATH/Wilkister Musau.

Outputs, impact, and next steps

This work will generate three key outputs:

  • A contextualized dataset of clinical scenarios grounded in Kenya’s medical realities and guidelines.
  • A playbook documenting the process, tools, and evaluation criteria to support replication in other health care contexts.
  • A manuscript detailing the study protocols, design, methods, and outputs.

We have already shared lessons with teams in Rwanda and Nigeria that have conducted similar studies. In Kenya, the focus has been on PHC nurses, while in the other countries the work targets community health workers. Each context requires unique datasets, but the co-creation approach remains the same.

By embedding HCD in LLM development, Living Labs is pioneering a replicable model to ensure AI benefits are more equitably distributed. The project serves as a blueprint for designing AI that understands and serves the realities of health care providers across the globe.

We hope the newly released playbook will serve as a resource for creating inclusive medical datasets, a complement to evaluation frameworks, and a roadmap for replicating methodologies across contexts.