We use persistent homology to generate features for a database of drug-like molecules. Our goal is to separate out likely drugs, and the particular problem we tackle is complicated by multi-species effects. We match state-of-the-art computational chemistry accuracy on this problem while offering a new geometric view of the space of compounds. The techniques apply more broadly to machine learning on spaces of shapes. No knowledge of persistent homology, topological data analysis, or drug discovery is assumed for this presentation.
Anthony Bak is in the machine learning group at Palantir, where he solves client problems and develops machine learning products. Prior to Palantir, he was data science R&D lead at Ayasdi, a machine learning platform company that uses topological and geometric methods in conjunction with standard methods. He has a PhD in mathematics from the University of Pennsylvania and has held academic positions at Stanford University, the American Institute of Mathematics, and the Max Planck Institute for Mathematics. He is a frequent speaker at academic and industry conferences.