Engineering ontologies for big data

by Phil Lord

16:00 (40 min) in USB 5.008

Ontologies are widely used in biomedicine, but they come with a significant problem: using them tends to be slow, particularly when they are highly expressive. This also makes them harder to develop, because authoring becomes disconnected from feedback: the author often has to wait for the software to catch up before seeing whether their changes make sense. Alternatively, authors introduce performance hacks; for instance, biological ontologies rarely include the taxonomy of the species because it is too big. Part of the reason for this is that most of the software is written in Java...

In this seminar, I will describe some early attempts to build a new library for representing ontologies, written in Rust, a systems programming language designed to be fast. My library has also been written to be fast: several layers of abstraction are missing compared to the Java equivalent. This speed also means it can be simpler; in-memory caching is minimal, for instance. Although the library is in early development, it appears to be between one and two orders of magnitude faster than the Java equivalent.