High-quality data is essential for identifying at-risk populations and guiding effective prevention. In our latest blog, ARC East of England researchers Jonathan Goodman and Dr. Jo Reid explore the challenges of incomplete datasets, the important role of ethnicity data, and how we can better measure and understand people’s health and care needs.
Image (left to right): Jonathan Goodman and Dr. Jo Reid
Jonathan Goodman is an ARC East of England researcher, and Dr. Jo Reid is the Deputy Theme Lead of the ARC East of England Measurement in Health and Social Care theme. Both are based at the University of Cambridge.
An ongoing challenge in public health is determining the most effective ways to identify the risks and factors that contribute to health inequity. For example, people in lower socioeconomic groups are more likely to develop chronic illnesses such as type II diabetes. Similarly, a person’s ethnic background can predispose them to various conditions, including certain cancers.
In theory, analysing health data can help researchers and clinicians to identify risks at the population level. For example, if we know that elderly people living in a particular part of the country are at a greater risk of respiratory illnesses, we can target interventions to those areas.
“By tailoring preventative practices to local populations, we can reduce the risk of many negative health outcomes, both physical and psychological, and lessen the societal burden of diseases. However, in practice there is an absence of complete, representative datasets, which makes projects like this difficult and, in some populations, impossible.”
Jonathan Goodman and Dr. Jo Reid, ARC East of England researchers
Datasets may be incomplete because data were not collected or were not provided by individuals or particular groups. This is especially problematic if these gaps are more common in some populations than in others.
Tackling the challenges of data collection for equality
To address this issue, our research aimed to better understand how to improve data quality for detecting, monitoring, and addressing health inequalities. We found that gaps in data can arise from various factors, including technical challenges in linking existing datasets and social problems such as low trust in health and scientific institutions, especially among people with lower levels of education and members of historically marginalised communities.
These problems pose a serious roadblock to our collective efforts to improve health equity through the effective use of data and data-driven models. If we cannot see the health issues that particular communities face, we cannot introduce measures to prevent them, which puts the broader promise of data-powered public health in jeopardy.
However, we are working on addressing this. Over the last year, we have consulted with colleagues working in healthcare in the East of England to develop case studies on how to address problems with data collection, maintenance, and linkage in order to improve population health.
The important role of ethnicity in data collection
The first of these relates to ethnicity. We know that this characteristic is critical to understanding population-specific health risks. Ethnicity can indicate an increased likelihood of developing particular genetic diseases, such as Tay-Sachs disease in people of Ashkenazi Jewish descent, or it can help clinicians target measures, such as mental health interventions, towards historically mistreated populations. While ‘ethnicity’ has no universally accepted definition in the clinical or public health sciences, using it as a category in data collection and analysis is essential for identifying population subgroups that face shared challenges and may benefit as a group from targeted interventions.
Recognising this, the Performance and Analytics team at the Cambridgeshire and Peterborough NHS Foundation Trust (CPFT), who specialise in using data to identify risks for specific subgroups, found that clinical departments within the Trust varied in how well they collected and reported ethnicity data. To improve reporting, the team developed a dashboard so that teams could view their own ethnicity data collection rates, and ran workshops to train staff on the importance of collecting ethnicity data. Staff received these efforts positively, which encouraged better data collection. In 2022, 18% of patients had their ethnicity recorded as ‘not stated’ in the Trust’s databases; by the end of 2023, this had dropped to less than 5%.
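The Trust’s dashboard is not described in technical detail, so the following is only a minimal Python sketch of the kind of metric such a dashboard might display: the share of each department’s records that hold a usable ethnicity value. The record layout (`department` and `ethnicity` fields) and the set of values treated as unrecorded are assumptions for illustration, not CPFT’s actual schema or coding.

```python
from collections import defaultdict
from typing import Iterable

# Assumed placeholder values meaning "ethnicity not recorded" (hypothetical coding).
UNRECORDED = {"", "not stated"}


def completeness_by_department(records: Iterable[dict]) -> dict:
    """Return, per department, the share of records with a usable ethnicity value."""
    totals = defaultdict(int)
    recorded = defaultdict(int)
    for rec in records:
        dept = rec["department"]
        totals[dept] += 1
        value = rec.get("ethnicity")
        # Missing fields and placeholder values both count as unrecorded.
        if value is not None and value.strip().lower() not in UNRECORDED:
            recorded[dept] += 1
    return {dept: recorded[dept] / totals[dept] for dept in totals}


# Hypothetical example records; "department" and "ethnicity" are assumed field names.
records = [
    {"department": "A", "ethnicity": "White British"},
    {"department": "A", "ethnicity": "Not stated"},
    {"department": "B", "ethnicity": None},
    {"department": "B", "ethnicity": "Indian"},
    {"department": "B", "ethnicity": "Pakistani"},
]

rates = completeness_by_department(records)
for dept, rate in sorted(rates.items()):
    print(f"{dept}: {rate:.0%}")  # prints "A: 50%" then "B: 67%"
```

Showing each team its own figure, rather than a single Trust-wide average, is one way a dashboard could make gaps visible to the people best placed to close them.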
Led by staff from the Suffolk and North East Essex Integrated Care System, another initiative aimed to link datasets across the local region to improve the completeness of ethnicity data across the integrated care system. Partnering with the private technology firm Optum, the project leads successfully linked GP, Trust, and other local datasets, increasing ethnicity data completeness from 70% in local datasets to nearly 94% across the area.
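The linkage method used in this project is not described here, so the sketch below illustrates only the general idea under simplifying assumptions: each source is a hypothetical mapping from a shared patient identifier to an ethnicity value, sources are ordered by priority, and the first usable value wins. Real linkage across GP and Trust systems typically also involves identifier matching, data governance, and resolving conflicting records, all of which this toy example leaves out.

```python
from typing import Dict, List, Optional

# Assumed placeholder values meaning "ethnicity not recorded" (hypothetical coding).
UNRECORDED = {"", "not stated"}


def usable(value: Optional[str]) -> bool:
    """True if an ethnicity value is present and is not a placeholder."""
    return value is not None and value.strip().lower() not in UNRECORDED


def link_ethnicity(sources: List[Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Merge patient-id -> ethnicity maps, keeping the first usable value per patient.

    `sources` is ordered by priority (e.g. a GP extract before a Trust extract).
    """
    linked: Dict[str, Optional[str]] = {}
    for source in sources:
        for patient_id, ethnicity in source.items():
            if usable(ethnicity) and not usable(linked.get(patient_id)):
                linked[patient_id] = ethnicity  # fill a gap from this source
            else:
                linked.setdefault(patient_id, ethnicity)  # record the patient, keep any existing value
    return linked


def completeness(mapping: Dict[str, Optional[str]]) -> float:
    """Share of patients with a usable ethnicity value."""
    if not mapping:
        return 0.0
    return sum(usable(v) for v in mapping.values()) / len(mapping)


# Hypothetical extracts keyed by an assumed shared patient identifier.
gp = {"p1": "Indian", "p2": None, "p3": "Not stated"}
trust = {"p2": "Black African", "p3": None, "p4": "White Irish"}

linked = link_ethnicity([gp, trust])
print(f"{completeness(gp):.0%} -> {completeness(linked):.0%}")  # prints "33% -> 75%"
```

A patient who is missing from one source, or recorded there as ‘not stated’, can still gain a value from another source, which is how linkage can raise completeness across an area even when no single dataset changes.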
“Both cases highlight the importance of senior-level staff buy-in, effective training, and improved dataset linkage and compatibility for improving the completion rates of critical information (e.g., ethnicity) for combatting health inequities.”
Jonathan Goodman and Dr. Jo Reid, ARC East of England researchers
Utilising data for health equity
Our research highlights two further steps that are essential for improving health equity across the country and beyond. First, we recommend establishing Data Co-ordinator positions that work across NHS Trusts, GP services, and universities to drive improved data linkage and to design and deliver workshops training staff on the importance of accurate data collection. Such roles would improve the completeness of recorded data, strengthening the analyses used to inform public health interventions. They would also save organisations considerable resources, because senior staff currently often spend a significant amount of their time on the administrative tasks involved in linking datasets.
For example, a recent study in New Zealand found that linking administrative datasets for a cohort of 859 participants required around 26 staff hours for coordination, with data access taking between 96 and 854 days and manual review needed for 6% to 78% of records, depending on dataset quality. While no equivalent UK figures exist, these results offer a useful benchmark for estimating the time and resources required for similar projects.
Second, we highlight the critical role of engaging with patients about their concerns regarding data collection and storage, in the hope that effective policy and ethical practice can rebuild trust with the public, especially among groups with historical reasons for mistrust, such as Roma communities.
We are currently working on a project exploring how patients decide which people and institutions to trust with their data. It uses a mixed-methods approach, combining focus groups with a discrete choice experiment, a type of survey used to understand people’s preferences on a particular topic. Through both projects, we aim to lay the groundwork for building trust in healthcare institutions, in the hope that doing so will help close the gaps in data that prevent us from finding the groups that need our help the most.
To hear more about the studies, email Jonathan Goodman at jrg74@cam.ac.uk.