BIDS Senior Fellow Bin Yu, together with her current and former group members W. James Murdoch (Clientelligent), Chandan Singh, Karl Kumbier (UCSF), and Reza Abbasi-Asl (UCSF), has recently published a paper, "Definitions, methods, and applications in interpretable machine learning," in PNAS.
Machine-learning models accurately predict a wide variety of complex phenomena. However, these models are "black boxes" whose reliability is difficult for humans to judge, particularly when applied to new data (or in a transfer learning setting). As such, model interpretations are crucial in high-stakes settings, such as medicine, policymaking, and science. Moreover, interpretations can help humans audit models to ensure their safety and fairness. In light of these issues, the field of interpretable machine learning has emerged to make machine learning understandable to humans.
To address the confusion caused by this recent surge in interpretability research, in particular what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models, Yu and her team construct a unifying framework that highlights the underappreciated role played by human audiences. The article reviews existing work and suggests directions for future research, organizing existing methods into two classes (model-based and post hoc) and providing guidance for selecting and evaluating interpretation methods against three requirements: predictive accuracy, descriptive accuracy, and relevancy.
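The paper itself is a conceptual framework rather than a software release, but the two method classes can be illustrated with standard tools. The sketch below, which is not the authors' implementation and only assumes scikit-learn's built-in diabetes dataset, contrasts a model-based interpretation (the nonzero coefficients of a sparse linear model, readable directly from the model's form) with a post hoc interpretation (permutation importance computed on a fitted black-box model).

```python
# Illustrative sketch only (not from the PNAS paper): model-based vs. post hoc
# interpretation, using scikit-learn and its built-in diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model-based: interpretability comes from the model's form itself.
# A sparse linear model's nonzero coefficients directly describe its predictions.
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("Model-based (Lasso) nonzero coefficients:", (lasso.coef_ != 0).sum())

# Post hoc: the fitted model is treated as a black box, and an interpretation
# (here, permutation feature importance) is computed after the fact.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
print("Post hoc (permutation) importances:", result.importances_mean.round(3))
```

In the framework's terms, one would weigh such choices by asking how well each model fits the data (predictive accuracy), how faithfully the interpretation reflects what the model has actually learned (descriptive accuracy), and how useful the resulting explanation is to its intended human audience (relevancy).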