Our programs produce a vast amount of data. There is huge potential to unlock insights from this data in new and innovative ways. Last year, together with Dimagi, we tested a new algorithm designed to be run on community health datasets. Tracey Li, D-tree senior data lead, shares her reflections from the test below.
Recently, we have been thinking a lot about how to maximize the impact of the data that is collected within the digital community health programs that are at the heart of D-tree’s work. We have been investing more effort in building capacity within governments so that they are able to more effectively monitor programs using the available data and we are actively working to improve data governance within the Zanzibar health system. We have also been exploring the use of machine learning to enable differentiated care to be provided to clients, based on individuals’ health needs.
Continuing our focus on data, we partnered with Dimagi, as part of a Community Health Impact Coalition initiative, to test an outlier detection algorithm that Dimagi co-developed, based on data science techniques described in Ben Birnbaum’s 2012 research paper. The algorithm is designed to be run on a community health dataset and identifies whether the data reported by each Community Health Worker looks ‘normal’ compared to the data reported by all the other Community Health Workers, flagging data that looks unusual. For example, if most Community Health Workers reported last month that around two percent of the children they visited had symptoms of malnutrition, the algorithm should highlight data from those who reported a much higher rate. However, the algorithm had undergone little real-world testing, so no one knew for sure what sort of outliers it would identify, or whether that information would be useful. We were excited to be one of the first organizations to try it out and test it on data from the Jamii ni Afya program in Zanzibar.
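To make the flagging step concrete, here is a minimal, hypothetical sketch of peer-comparison outlier detection. It is not the method from Birnbaum’s paper, which is more sophisticated; it simply flags any worker whose reported rate has a large modified z-score (based on the median absolute deviation) relative to the group. All worker IDs and rates below are invented for illustration.

```python
from statistics import median

def flag_outliers(rates, threshold=3.5):
    """Flag workers whose reported rate deviates strongly from their peers.

    Uses the modified z-score 0.6745 * (rate - median) / MAD, where MAD is
    the median absolute deviation; values beyond the threshold are flagged.
    """
    med = median(rates.values())
    mad = median(abs(r - med) for r in rates.values())
    if mad == 0:
        return {}  # all workers report identical rates; nothing to compare
    scores = {w: 0.6745 * (r - med) / mad for w, r in rates.items()}
    return {w: s for w, s in scores.items() if abs(s) > threshold}

# Hypothetical monthly rates: fraction of visited children with
# malnutrition symptoms, per Community Health Worker.
rates = {"chw_01": 0.020, "chw_02": 0.025, "chw_03": 0.018,
         "chw_04": 0.021, "chw_05": 0.120}  # chw_05 reports a much higher rate

print(flag_outliers(rates))  # only chw_05 is flagged
```

A real deployment would also need to account for small caseloads, where one or two unusual clients can swing a worker’s rate dramatically.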
The algorithm generated some insights about the population that surprised us, and many insights about the performance of Community Health Workers. One surprising insight about the population was revealed when the algorithm highlighted a Community Health Worker who had enrolled a lot more boys than girls into the community health program. Most enroll roughly equal numbers of boys and girls, which is what we expect because boys and girls are distributed roughly equally throughout the population and the health workers should not have any bias towards either gender. Therefore, it is strange if there is a large imbalance. We wondered whether this worker was experiencing a technical problem which was resulting in her incorrectly reporting the gender of some children, or whether she was biased in her service delivery. When we looked into this, it turned out that there were actually just a lot more boys than girls in that particular village.
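A simple statistical check can indicate when an enrollment imbalance like this is too large to be chance alone. The sketch below is purely illustrative and not part of the algorithm we tested: it runs a two-sided exact binomial test against the assumption that boys and girls are equally likely to be enrolled. The counts are invented.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test (method of small p-values).

    Returns the probability, under Binomial(n, p), of any outcome at
    most as likely as observing k successes out of n.
    """
    def pmf(i):
        return comb(n, i) * p**i * (1 - p)**(n - i)
    pk = pmf(k)
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= pk + 1e-12)

# Hypothetical enrollments: one worker enrolled 38 boys out of 50 children.
p_value = binomial_two_sided_p(38, 50)
print(p_value)  # a small p-value means the split is surprising under 50/50
```

Of course, as our experience showed, a statistically surprising imbalance is not necessarily an error: in this case the village’s population really did contain many more boys than girls, which is why human follow-up on every flag matters.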
“We wondered whether the health worker was experiencing a technical problem which was resulting in her incorrectly reporting the gender of some children, or whether she was biased in her service delivery”
In other cases, the algorithm identified Community Health Workers who were issuing referrals to an unusually high percentage of their clients. We thought this might indicate a high prevalence of serious health conditions in those areas, which is something that health officials should be notified about. However, when we interviewed these Community Health Workers, we found that they did not fully understand how to screen a client’s symptoms to determine whether a referral is required, and had therefore been issuing some referrals unnecessarily. We noted that they require additional training and supervision on the topic of referral screening.
“Our findings showed that the algorithm has the potential to be extremely valuable if it can be operationalized, as it would enable us to systematically monitor data at the level of individual Community Health Workers.”
Our findings showed that the algorithm has the potential to be extremely valuable if it can be operationalized, as it would enable us to systematically monitor data at the level of individual Community Health Workers. Although we have systems to continuously monitor indicators at the district and national levels, we do not currently have an efficient system for monitoring at a more granular level. The example described above, where a health worker was found to be issuing an unusually high number of referrals, shows how this information would be useful to a supervisor. If the referrals had been issued correctly, meaning that a high number of clients really were experiencing serious symptoms, the supervisor could take the necessary actions if the cause was, for example, a disease outbreak. And in the case that we actually found, where the Community Health Worker had been wrongly issuing referrals, the supervisor would know to provide that particular person with additional training. In either case, the information would help the supervisor use their time effectively. This is vitally important in places like Zanzibar, where all supervisors are full-time members of staff at health facilities and have heavy workloads due to a lack of resources.
We see great potential for an algorithm such as this to be operationalized, leveraging the power of machine learning to systematically monitor the data captured. While supervision is often adequate at the health system level, digital health programs need a way to flag these types of outliers so that supervisors can identify where more supervision or oversight is needed, especially where they carry heavy workloads.
We thank Dimagi and the Community Health Impact Coalition (CHIC) for providing us with the opportunity to participate in this initiative, as well as the Johnson & Johnson Foundation for its generous support, and we look forward to continuing this work on behalf of all digital health partners.