The utility of clinical support systems is well established, but their adjunctive value in supporting human decision-making is less clear.
There have been many advances in artificial intelligence in recent years, especially in the development of algorithms to support clinical decision-making. In fact, some research has gone as far as suggesting that the diagnostic performance of deep learning models is equivalent to that of healthcare professionals. Given that machine learning-based clinical support systems are designed to help health professionals, there are clear advantages to such systems, especially for population-based diagnostic screening tasks. Nevertheless, until support systems fully replace encounters with healthcare professionals, the latter retain ultimate responsibility for any diagnostic or future care plans, and it would perhaps be unwise to follow, without further evidence, the advice provided by the support system. In other words, it is still the healthcare professional’s assessment and interpretation of the support system’s output, not the output itself, that is the final arbiter of future patient care. Rather than focusing on the ability of clinical support systems themselves, which is well documented, a team from the Department of Surgical Sciences, University of Oxford, UK, wondered about the available evidence on the adjunctive value of clinical support systems in practice, i.e., a comparison of the human alone vs the human plus a support system. The researchers conducted a systematic review of the literature, searching for articles in which machine learning-based diagnostic support systems were used as the intervention and where this had been compared to human performance alone. The support systems were considered to be diagnostic if the output was qualitative information, e.g., benign vs malignant.
A total of 37 eligible studies were identified, each of which described support systems based on imaging modalities, with breast and pulmonary diseases being the most common medical conditions examined. Of 107 results that were reported as being statistically significant (p < 0.05), there were 54 (50%) instances where human decision-making was enhanced by the use of a support system, 4 (4%) where it worsened, but 49 (46%) where there was no difference. In discussing these findings, the authors commented on how the data clearly showed an absence of robust evidence that machine learning-based algorithms improved clinician diagnostic performance. However, they also reported that many of the studies were at a high risk of bias and may not be generalisable because of the small number of participants involved. They concluded that rather than simply dismissing the value of machine learning support systems, future studies should include better methodologies and evaluations to fully showcase the value of these systems.
Vasey B et al. Association of clinician diagnostic performance with machine learning-based decision support systems: a systematic review. JAMA Netw Open 2021.