Data science: Generating insights for better health

January 31, 2022 by Emily Carnahan and Anna Volbrecht

Data science is an essential ingredient in the pandemic response—and for health system strengthening.

A health worker in Zambia uses a mobile phone to gather data about malaria transmission. Photo: PATH/Gabe Bienczycki.

Editor's note: This article was originally published on December 14, 2021.

When health system managers need to make decisions quickly, they often rely on data and the insights it generates. While the proliferation of digital technologies has put such insights at their fingertips, it is the underlying data science that makes those insights rich and meaningful.

Data science—the tools and approaches used to collect, manage, analyze, and use data—is a building block for data-led health systems and services. As the global health community focuses on building the digital future of health, we must include data science tools and approaches.

Data science and the pandemic

On 9 January 2020, the WHO published a statement about a cluster of pneumonia cases, an outbreak that has since evolved into the COVID-19 pandemic. But nine days earlier, on 31 December 2019, a small Canadian start-up called BlueDot had already received an alert about a potentially spreading novel coronavirus. Using an AI-based algorithm, BlueDot analyzed media reports and other resources to identify a potential outbreak. Then, it used travel data to understand how this new virus might spread.

Fast-forward to December 2021: the world is entering the third year of a global pandemic. And while BlueDot made headlines early in the pandemic, data science tools have supported our global response every day since.

However, access to data science tools and expertise varies across contexts.

“Despite decades of investment, COVID-19 has revealed the great gaps that exist in the world’s ability to forecast, detect, assess and respond to outbreaks that threaten people worldwide.”
— Dr. Michael Ryan, Executive Director, WHO Health Emergencies Programme*

To better understand how data science can be used for pandemic and epidemic response, PATH’s Digital Square initiative mapped different types of digital and data tools across each phase of an outbreak. The Digital Applications and Tools Across an Epidemiological Curve (DATEC) provides a guide to how digital (and data science) tools should be leveraged throughout the pandemic.

Digital Applications and Tools Across an Epidemiological Curve. Seattle: PATH/Digital Square; 2021.

There are many examples of how data science is used in real life. Geospatial modeling can identify “hot spots,” and predictive models identify the most at-risk communities. Screening tools, like THINKMD’s COVID-19 app, allow users to assess their symptoms. Organizations like Praekelt and Johns Hopkins have created chatbots to answer questions and address misinformation.

The future of infrastructure

The COVID-19 pandemic has also highlighted the need for digital health and data tools, along with a robust, sector-wide backbone referred to as digital public infrastructure (DPI).

Similar to physical infrastructure like roads, bridges, and water systems, DPI is widely used and benefits the whole of society. DPI includes the connections between digital systems (interoperability and global standards), the digital tools and approaches themselves (like global goods or digital public goods), and, critically, the data science assets and expertise required to create data-driven insights.

By ensuring data science is a central component of DPI, the global community will create new opportunities to strengthen the use of data science across health systems.

Expanding access to data science assets

In late 2019, Digital Square and the Rockefeller Foundation brought together an interdisciplinary group of experts to discuss the need for a shared approach to accelerate the use of data science in public health. Experts in digital health, data science, and public health, representing the public and private sectors from countries worldwide, iterated on a concept for a Health Data Science Exchange.

A Health Data Science Exchange: Value Proposition and Compendium of Assets. Seattle: PATH/Digital Square; 2020.

With the COVID-19 pandemic, it became clear that countries exist along a spectrum of data science expertise and use. India, for example, brought together the government, private sector, and NGO partners to gather and analyze data to forecast COVID-19 cases, model potential scenarios, and monitor mobility patterns to reduce the virus’s spread. We also saw countries learning from one another, sharing tools and approaches that allowed them to use data for action quickly.

This type of inter-country sharing is precisely what a Health Data Science Exchange was envisioned to enable: a platform, or exchange, where stakeholders could convene to learn about existing data science tools and approaches, understand how they have been applied in other contexts, and share their own experiences with data science.

With greater recognition of the importance of data science, a platform like the Health Data Science Exchange could create new mechanisms for countries to learn from one another and for the global community to coordinate support. Such coordination is happening across our sector—from the new WHO Berlin Hub for Pandemic and Epidemic Intelligence to the Digital Health Center of Excellence (DICE) launched by UNICEF and the WHO. These initiatives strive to support greater sharing and use of data and capacity enhancement in countries around the world.

As countries continue to mature their data capabilities and demand for data insights continues to grow, we have an opportunity to promote and share high-quality, reusable, and proven data science assets.

*Quote from "WHO, Germany open Hub for Pandemic and Epidemic Intelligence in Berlin."