Toward Natural Language Query Processing for Bioinformatics Polystores

BIDS Data Science Lecture


July 18, 2018
3:00pm to 4:00pm
190 Doe Library

When medical doctors, biologists, or pharmaceutical researchers want to analyze gene data, they typically need to query complex bioinformatics data stores. Most of these data stores are based either on relational database technologies, which are common in industry, or on so-called semantic web technologies, which have recently gained traction thanks to the wide use of linked open data by practitioners in the life sciences, medicine, and health care around the globe. To query these heterogeneous data stores (so-called polystores) efficiently, end users need to know the relevant query language, SQL or SPARQL. However, both query languages require significant technical know-how as well as deep insight into the structure of the underlying data stores. These technical hurdles make it practically impossible for non-technical end users to query the data efficiently.

In this talk, we present a new query interface called Bio-SODA, which enables end users to query polystores using keyword queries. In particular, we demonstrate how to intuitively query UniProt, one of the world’s most commonly used bioinformatics knowledge bases, using keywords. Moreover, we discuss how to automatically translate keywords into the technical query languages SQL and SPARQL. Our proposed approach is not specific to bioinformatics databases; it is generic and can therefore be applied in other settings where research institutions, enterprises, or governments need an intuitive way to get aggregate insights from queries that span multiple heterogeneous data stores.
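To give a rough feel for what translating keywords into SQL and SPARQL can involve, here is a minimal Python sketch. It is not Bio-SODA's actual method, which this abstract does not describe. It assumes a tiny hand-written keyword index, and every name in it (the `protein` table, its columns, the `ex:` vocabulary, and the functions `known_keywords`, `to_sql`, and `to_sparql`) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical keyword index: each keyword maps to one query fragment per
# target language. All table, column, and vocabulary names are made up.
INDEX: Dict[str, Dict[str, str]] = {
    "human": {
        "sql": "organism = 'Homo sapiens'",
        "sparql": '?p ex:organism "Homo sapiens" .',
    },
    "kinase": {
        "sql": "function LIKE '%kinase%'",
        "sparql": '?p ex:function ?f . FILTER(CONTAINS(?f, "kinase"))',
    },
}


def known_keywords(query: str) -> List[str]:
    """Lowercase the query, split on whitespace, keep only indexed keywords."""
    return [word for word in query.lower().split() if word in INDEX]


def to_sql(query: str) -> str:
    """Combine the SQL fragments of all recognized keywords with AND."""
    conditions = [INDEX[word]["sql"] for word in known_keywords(query)]
    where = " AND ".join(conditions) if conditions else "1 = 1"
    return f"SELECT name FROM protein WHERE {where}"


def to_sparql(query: str) -> str:
    """Combine the SPARQL patterns of all recognized keywords in one WHERE block."""
    patterns = [INDEX[word]["sparql"] for word in known_keywords(query)]
    body = " ".join(["?p a ex:Protein ."] + patterns)
    return f"PREFIX ex: <http://example.org/> SELECT ?p WHERE {{ {body} }}"


if __name__ == "__main__":
    print(to_sql("human kinase"))
    print(to_sparql("human kinase"))
```

A real system would presumably build such an index from the data stores themselves and would have to rank ambiguous matches; the sketch skips both and simply ignores unknown keywords.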


Kurt Stockinger

Professor of Computer Science and Director of Studies in Data Science
Zurich University of Applied Sciences (ZHAW)

Prof. Dr. Kurt Stockinger is Professor of Computer Science and Director of Studies in Data Science at Zurich University of Applied Sciences (ZHAW). His research focuses on Data Science with emphasis on Big Data, data warehousing, business intelligence, and advanced analytics. He is also on the Advisory Board of Callista Group AG. Previously, Kurt Stockinger worked at Credit Suisse in Zurich, Switzerland; at Lawrence Berkeley National Laboratory in Berkeley, California; at California Institute of Technology in Pasadena, California; and at CERN in Geneva, Switzerland. He holds a Ph.D. in computer science from CERN / University of Vienna.