AI4TB: Moving forward on technical and decision-making challenges

Jspiegel
Dec 11, 2019

As explained in our previous post, “Harnessing artificial intelligence systems to detect tuberculosis among ex-miners in South Africa”, vast numbers of people from across southern Africa who came to work in South Africa’s gold mines were left with devastating occupational lung disease, particularly silicosis and tuberculosis (TB). Until 2012, little was done to address this legacy of injustice, leaving hundreds of thousands of active and former miners without the health and social benefits to which they are entitled. Despite improvements in the processing of compensation claims, which not long ago took more than four years, the average time from assessment to adjudicative decision is still more than 500 days (see the video interview with Dr. Barry Kistnasamy).

The overall goal of this project is to use artificial intelligence (AI) and other state-of-the-art technology to find and assess these miners more efficiently and to speed up the processing of their claims. Achieving this requires careful attention to accuracy, scrupulous transparency about the scientific and technical details that underlie decision-making, and identifying and beginning to address the political, social, economic and ethical issues involved.

The past six weeks have been an exciting time for this project, with progress on several fronts. Specifically, we advanced four aspects of the challenge:

1) Using demographic and exposure information to determine the likelihood of having a compensable lung disease, so that this can be used both to prioritize clinical assessment (what our lead knowledge user calls “eligibility” for assessment) and potentially to improve the accuracy of clinical diagnosis;

2) Using computer-assisted detection (CAD) on chest X-rays to make a preliminary diagnosis of TB, silicosis, or another condition, so that cases can be triaged as part of a more efficient process that makes fewer demands on scarce medical expertise;

3) Using lung function test results more efficiently, to speed up assessment of the degree of impairment, which is also necessary for determining benefits; and finally, and perhaps most importantly:

4) Identifying the arguments for and against using AI in this context, and exploring what, if any, action can be taken to mitigate residual concerns.

The centrepiece of the past six weeks (“Sprint 2”) was a trip by the international team to South Africa, where we met a wide variety of stakeholders: mining company personnel, occupational hygienists, occupational medicine experts, database managers, front-line nurses and medical practitioners, and many others. The trip included a visit to a mine site, discussions with occupational medical practitioners based at the mining company’s on-site health centre, and a visit to the One-Stop centre in Carletonville. We saw first-hand both the diversity of available resources and the legacy of challenges, including piles of files. The visit culminated in a presentation and discussion on 14 November with the trustees and technical advisors of the Tshiamiso Trust, which was established following the class action suit calling for social justice for gold miners, and whose members are potentially major users of the technology being developed. The discussion was lively and took place in an atmosphere of collaborative commitment to the tasks at hand.

Regarding the demographic/exposure algorithm, considerable progress has been made, including the development of a web-based prototype for determining, or self-determining, the likelihood of having a compensable disease. So far, however, its accuracy (balancing sensitivity and specificity) hovers at only about 70%. This is far better than the current situation, in which considerably fewer than 30% of the claims reviewed through a highly labour-intensive process turn out to be compensable. Nonetheless, more work is needed to improve the tool.
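The post does not specify how the roughly 70% figure was calculated. One common way to “balance” sensitivity and specificity is balanced accuracy, the simple average of the two. The Python sketch below assumes that definition; the `ConfusionCounts` class and all counts in it are hypothetical illustrations, not project data. It also shows why balancing matters when few claims are compensable: a tool that flags nobody can still look 75% “accurate”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts comparing tool predictions with adjudicated outcomes."""
    tp: int  # compensable, flagged by the tool
    fn: int  # compensable, missed by the tool
    tn: int  # not compensable, not flagged
    fp: int  # not compensable, but flagged

    def sensitivity(self) -> float:
        # Share of truly compensable cases that the tool flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-compensable cases that the tool correctly leaves unflagged.
        return self.tn / (self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity, so the majority class cannot dominate.
        return (self.sensitivity() + self.specificity()) / 2

    def raw_accuracy(self) -> float:
        # Plain share of correct predictions, which rewards ignoring rare cases.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical cohort: 25 of 100 claims compensable, and the tool flags nobody.
flag_nobody = ConfusionCounts(tp=0, fn=25, tn=75, fp=0)
print(flag_nobody.raw_accuracy())       # 0.75
print(flag_nobody.balanced_accuracy())  # 0.5
```

Under this definition, a rule that flags everybody or nobody scores 50% however rare compensable cases are, so a balanced accuracy of about 70% reflects real discriminating power, with room to improve.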

Some important observations emerged that were consistent with what we learned on the ground. For example, because occupational medical practitioners at the mining companies are comfortable relying on their clinical judgement, a much higher proportion of the active miners’ cases they send to the MBOD (Medical Bureau for Occupational Diseases) for compensation turn out to be compensable, compared with cases of ex-miners, whose files often have not been pre-assessed by a medical practitioner at all. As a result, although one would intuitively expect ex-miners to have a higher risk of compensable disease, our algorithm showed the opposite. This underlines the importance of analysing these different situations separately rather than pooling data from disparate sources, and calls for further analysis, especially training the system on databases other than “files sent to the MBOD”. During the last six weeks we also conducted a field study, reaching out to ex-miners in Stilfontein and inviting them to be assessed. This will help in testing the tool, although the numbers are small (about 70 ex-miners). The results also suggested the need to determine whether incorporating job risk data improves the models and, if so, how best to structure the job exposure data. This will be the subject of upcoming experiments.

With respect to CAD for chest X-rays, our work in this second Sprint began by engaging an independent expert to review the false negatives in the AI trial of 330 chest X-rays completed in Sprint 1. A key finding was that the false negatives were not just milder cases of disease; some were very severe, including cases of silico-tuberculosis. Further training of the AI systems is therefore paramount. We note that the academic article we submitted about this experiment has now been peer-reviewed and accepted for publication in a highly ranked journal devoted to TB and other lung diseases. To address some of the concerns identified by us and by other experts worldwide, we set up an experiment in which CAD is applied first and used to triage cases to a “2-panel” or a “radiologist + 4-panel” assessment. Completion of this work has been delayed by resource limitations within the adjudicating body itself, so we have proposed a modification to maximize the likelihood that the experiment is completed successfully in the next round.
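The triage design can be sketched as a simple routing rule. The Python example below is a minimal illustration under stated assumptions: the post does not give the routing rule, so it assumes each chest X-ray receives a CAD abnormality score between 0 and 1 and that higher-scoring cases go to the more intensive “radiologist + 4-panel” review. The class names, case IDs, scores and the 0.5 threshold are all hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    TWO_PANEL = "2-panel"
    RADIOLOGIST_PLUS_FOUR_PANEL = "radiologist + 4-panel"


@dataclass(frozen=True)
class ChestXray:
    case_id: str
    cad_score: float  # hypothetical CAD abnormality score in [0, 1]


def triage(xray: ChestXray, threshold: float = 0.5) -> Route:
    """Route one case by its CAD score; the threshold is a hypothetical placeholder."""
    if not 0.0 <= xray.cad_score <= 1.0:
        raise ValueError(f"CAD score out of range for case {xray.case_id}")
    # Assumption: a higher score means more suspicion, hence the more intensive review.
    if xray.cad_score >= threshold:
        return Route.RADIOLOGIST_PLUS_FOUR_PANEL
    return Route.TWO_PANEL


def triage_batch(xrays: Iterable[ChestXray], threshold: float = 0.5) -> dict[Route, list[str]]:
    """Group case IDs by assigned review route, preserving input order."""
    routes: dict[Route, list[str]] = {route: [] for route in Route}
    for xray in xrays:
        routes[triage(xray, threshold)].append(xray.case_id)
    return routes


cases = [ChestXray("A", 0.12), ChestXray("B", 0.81), ChestXray("C", 0.50)]
assigned = triage_batch(cases)
print(assigned[Route.TWO_PANEL])                    # ['A']
print(assigned[Route.RADIOLOGIST_PLUS_FOUR_PANEL])  # ['B', 'C']
```

In this design every case still receives a human panel review; the CAD score only decides how intensive that review is. Given the Sprint 1 finding that some severe cases were among the false negatives, any such threshold would likely need to be tuned on local data rather than fixed in advance.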

The state of lung function testing was presented to and discussed with technical advisors and leading experts in South Africa. We made good progress in understanding the issues, and concluded that the AI in this area needs further development before our group can evaluate it, and especially before it can give technicians the feedback needed to ensure adequate quality of their readings. We are therefore unlikely to include an experiment in this domain in the next six-week period, but will return to it in a few months. Meanwhile, we have decided to redouble our efforts on CAD for chest X-rays (CXRs) and on exposure-based predictions, and to focus less on lung function.

A key observation from this period was that various stakeholders have considerable concerns that need to be addressed before any widespread implementation can be pursued, as is appropriate when introducing any new technology. These concerns include the potential de-skilling of existing medical personnel and a weakened emphasis on developing more occupational lung disease experts to meet ongoing and future diagnostic needs; transparency and accountability for clinical decisions; data security; the impact of private ownership of the technology on public sector funding; and the biases and reliability of the AI, with the implications of inaccuracy of particular concern. In favour of AI, however, is the sheer magnitude of the task at hand and the need for technological assistance to cut processing times that still average more than 500 days. Other arguments in favour of AI are the potential to create tools that ex-miners can use themselves to assess their likelihood of having a compensable disease, and the potential for more consistent adjudication than when decisions are left to diverse practitioners. We have developed a plan for an even-handed assessment of each of these issues, their counter-arguments and the rebuttals to those, and, importantly, a careful description of how these issues apply in the current context and what mitigating measures can be taken.

Our overall conclusion for Sprint 2 is that, while further experiments are needed, excellent progress is being made towards the development, implementation and evaluation of AI to meet the challenges at hand. Moreover, the various parties are committed to working together, so the prospect of a major breakthrough is very much alive.

--

Jspiegel

I am a Professor in the School of Population and Public Health at the University of British Columbia where I co-direct the Global Health Research Program.