One issue that may haunt a researcher attempting to integrate and explore data across ontologies is that it can be very difficult to tell where a class originates. Naturally, many ontologies import classes from other ontologies rather than redefining them, which avoids needless duplication of data.
In some cases, the referenced ontologies are actually imported, and all is well. However, ontologies using the MIREOT system reference individual classes without importing the external ontology. Worse, in practice an important tenet of the MIREOT guidelines has not been adhered to: recording the IRI of the source ontology from which each class was referenced.
This is a problem for anyone seeking the source definition of a class. Searching other ontologies for the class does not help either, because several other ontologies are likely to include it too!
For example, the ERO ontology references a class ‘oryctolagus cuniculus’ (better known as the ‘rabbit’) with the IRI http://purl.obolibrary.org/obo/NCBITaxon_9986. Under this tainted use of MIREOT, a class reference looks exactly like a class definition, and since many ontologies make use of the rabbit, we find equivalent class references in all of the following ontologies: PR, NCBITAXON, VO, MICRO, PXO, NIFSTD, ERO, NMOBR, NMOSP, CCONT. We know they all refer to one rabbit from one ontology, but how are we to know which one?
To solve this problem, I have created a dataset of the namespaces that ontologies use when defining their own classes. It allows researchers to find the source ontology of a class reference by matching the class IRI against the namespaces in the dataset.
So, for the example above, we can use this dataset and simple string matching to discover that the IRI of this rabbit falls within the NCBITAXON namespace, and therefore so does the class!
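The matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used to build the dataset: it assumes the dataset has been loaded into a dictionary mapping AberOWL ontology IDs to lists of namespace prefixes, and both the entries shown and the function name find_source_ontology are hypothetical. It picks the longest namespace that is a prefix of the class IRI, so that a general namespace cannot shadow a more specific one.

```python
from typing import Dict, List, Optional

# Hypothetical, illustrative excerpt of the namespace dataset:
# AberOWL ontology ID -> namespace prefixes used for that ontology's own classes.
# The real dataset's format and contents may differ.
NAMESPACES: Dict[str, List[str]] = {
    "NCBITAXON": ["http://purl.obolibrary.org/obo/NCBITaxon_"],
    "ERO": ["http://purl.obolibrary.org/obo/ERO_"],
    "VO": ["http://purl.obolibrary.org/obo/VO_"],
    "PR": ["http://purl.obolibrary.org/obo/PR_"],
}


def find_source_ontology(class_iri: str, namespaces: Dict[str, List[str]]) -> Optional[str]:
    """Return the ontology ID whose namespace is the longest prefix of class_iri.

    Preferring the longest match guards against one namespace being a prefix
    of another (e.g. a bare base IRI versus a more specific one).
    Returns None when no namespace matches.
    """
    best_id: Optional[str] = None
    best_len = 0
    for onto_id, prefixes in namespaces.items():
        for prefix in prefixes:
            # Keep this candidate only if it matches and is more specific
            # than any match found so far.
            if class_iri.startswith(prefix) and len(prefix) > best_len:
                best_id, best_len = onto_id, len(prefix)
    return best_id


if __name__ == "__main__":
    rabbit = "http://purl.obolibrary.org/obo/NCBITaxon_9986"
    print(find_source_ontology(rabbit, NAMESPACES))  # NCBITAXON
```

For the rabbit IRI from the example this prints NCBITAXON. An IRI that matches none of the namespaces yields None, which could be used to flag classes whose namespace is missing from the dataset.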
This may seem like a simple matter, since in this particular example the IRI contains the name of the source ontology, but that is not always the case: the dataset makes clear that ontology namespaces vary wildly in style and taste. Furthermore, class IRIs are not required to be, and very often are not, resolvable links to anywhere on the web (this is also the case for our example).
The dataset attempts to cover all consistent ontologies contained in AberOWL (around 447; note that ontology acronyms correspond to AberOWL ontology IDs), though it was not possible to determine namespaces for all of them, usually because the ontology did not actually define any classes of its own. The data was developed partly through automation and partly through manual curation.