Beyond clinical data mining: data integration and electronic phenotyping for research cohort identification

by John Holmes (University of Pennsylvania)

16:00 (60 min) in Daysh G.07

The availability of ever-increasing amounts of highly heterogeneous clinical data poses both opportunities and challenges for the data scientist and clinical researcher. Electronic medical records are more prevalent than ever, and other data sources now contribute greatly to the clinical research enterprise. These sources provide genetic, image, and environmental data, just to name three. The opportunity for enhanced clinical research is manifest in this expanding data and information ecosystem. Taken together, these data are syntactically and semantically heterogeneous on a scale that is still not fully appreciated, nor is their integration fully realized. This is because few tools currently exist to harmonize these data in a way that captures and preserves the concepts they contain and makes them usable by clinicians and researchers. For the most part, these tools focus on manually created ontologies that require substantial skill and effort to build, and even then, they tend to be specialized to one domain or another.

This presentation will describe an approach to ontology creation for data harmonization and integration that is grounded in artificial intelligence methods of concept formation and discovery. We will explore in detail the opportunities and challenges that informatics and clinical researchers face as they confront these seemingly endless sources of data. We will also discuss novel approaches to mining these complex, heterogeneous data for the purpose of constructing cohorts for research.