Here we present the MI-CLAIM checklist, a tool intended to improve transparent reporting of AI algorithms in medicine.
A schematic representation of the six components of a clinical AI study.
The application of artificial intelligence (AI) in medicine is an old idea, but past approaches involved programming computers with patterns or rules ascertained from human experts, which resulted in deterministic, rules-based systems. The study of AI in medicine has grown tremendously in the past few years, driven by increasingly available datasets from medical practice, including clinical images, genetics, and electronic health records, as well as by the maturity of methods that use data to teach computers. The use of data labeled by clinical experts to train machine-learning, probabilistic, and statistical models is called ‘supervised machine learning’. Successful applications of these newer machine-learning approaches include targeted real-time early-warning systems for adverse events, the detection of diabetic retinopathy, the classification of pathology and other images, the prediction of the near-term future state of patients with rheumatoid arthritis, patient discharge disposition, and more.
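The ‘supervised machine learning’ described above can be sketched in a few lines: expert-labeled examples are used to fit a model, which can then predict labels for new, unseen cases. This is a minimal illustrative sketch only; the toy feature vectors, labels, and the choice of scikit-learn’s logistic regression are assumptions for exposition, not real clinical data or part of MI-CLAIM itself.

```python
# Minimal sketch of supervised machine learning.
# Features and labels below are illustrative, not real clinical data.
from sklearn.linear_model import LogisticRegression

# Hypothetical feature vectors (e.g., two simple measurements per patient)
X_train = [[0.2, 1.1], [0.4, 0.9], [1.8, 3.2], [2.1, 2.9]]
y_train = [0, 0, 1, 1]  # labels assigned by a clinical expert

model = LogisticRegression()
model.fit(X_train, y_train)          # 'training' on the labeled data

print(model.predict([[2.0, 3.0]]))   # predicted label for a new, unseen case
```

The key point for transparent reporting is that every step here — how the labeled cohort was selected, how the model was fitted, and how it was tested on held-out cases — must be documented for readers to interpret the result.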
These newer machine-learning methods have clear advantages over older rules-based systems, including higher levels of performance, adaptability to more complex inputs (such as images), and scalability. However, older rules-based systems had one clear advantage: by definition, the methodologies implemented in the programming code were more interpretable by medical professionals, as they actually came from experts. Newer methodologies risk becoming more complex and less interpretable, even when sophisticated interpretation techniques are used. Indeed, the potential lack of method interpretability has been called out as an area of concern. Unclear documentation on training and test-cohort selection, development methodology, and how systems were validated has added to the confusion. This is particularly important as more of these models make their way into clinical testing and into medical products and services, with many already approved by the US Food and Drug Administration in the past few years. More calls for transparency regarding the ‘explainability’, and probably the interpretability, of machine-learning models can be expected, as such models in other fields have shown serious shortcomings when researchers have attempted to generalize across populations.
As the field progresses, an increasing number of machine-learning models are being tested in interventional clinical trials, and new reporting guidelines have now been proposed for clinical-trial protocols and trial reports involving AI as an intervention. However, there is still a need for guidelines that better inform readers and users about the machine-learning models themselves, especially about how they were developed and tested in retrospective studies. In the past, ‘minimum information’ guidelines have substantially improved the downstream utility, transparency, and interpretability of data deposited in repositories and reported in publications across many other research domains, including data on randomized controlled trials, RNA (gene) expression, diagnostic accuracy, observational studies, and meta-analyses.
Here we propose the first steps toward a minimum set of documentation to bring similar levels of transparency and utility to the application of AI in medicine: minimum information about clinical artificial intelligence modeling (MI-CLAIM). With this work, we are targeting medical-algorithm designers, repository managers, manuscript writers and readers, journal editors, and model users.
A public GitHub repository (https://github.com/beaunorgeot/MI_CLAIM) has been set up to coincide with the release of this manuscript; it will allow the community to comment on existing sections and suggest additions.