HeTOP Developer's guide

This guide is for you if you want to understand how HeTOP works and what's in its guts.

Let's rock

To get the best out of HeTOP, you have to learn how it handles its data. The main difficulty is understanding the meta-model. Then you will have to manipulate "CISMeF ids". There's a lot of stuff to know, but it's very easy: just follow me (if you dare).

The multi-terminology meta-model

Let's see how this thing works. It's not very complicated, but you need to understand it to go further.

Because ontologies give headaches...

Real (formal) ontologies are very complex to handle natively in a system, and HeTOP's purpose is not to use ontology functions and rules, so the HeTOP data model only includes resources on a terminological level. In other words, real ontologies (if any actually exist) are degraded to a terminology structure to be usable. HeTOP cares about the lexicons and the semantic relations. That's all. If you care about ontology principles and their original purposes, don't go further (and don't use HeTOP).
For the rest of the lesson, we will use the abbreviation T/O for terminology/ontology; this will be easier.

Terminology models

Each T/O is based on a specific model made of concept types, relationships and data types (attributes). We group these things under the term metadata.
As HeTOP is designed to integrate all T/O into a single system, we made a choice: use one model that gathers all T/O models. We were largely inspired by the UMLS Metathesaurus, which is a great tool for dealing with multiple terminologies. But we wanted to go further, and we couldn't afford to create and maintain new CUIs (the UMLS concept unique identifiers; if you don't know what I'm talking about, it's no big deal).

One (meta-)model to rule them all

The meta-model is pretty simple: a T/O is composed of Concept Types, which contain Concepts, which are defined by attributes (data properties) and relations to other concepts (hierarchies and other relations). Take a look at the figures below: Figure 1 represents the meta-model and Figure 2 shows an example of instantiation for an ICD-10 code.

Fig1. HeTOP multi-terminology meta-model (figure unavailable)
Fig2. HeTOP multi-terminology meta-model instantiation example (figure unavailable)
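
If you prefer code to pictures, the containment described above can be sketched with a few Python dataclasses. This is only an illustration, not HeTOP's actual schema: the class and field names (Terminology, ConceptType, Concept, attributes, relations) are made up for the example, and the identifiers are borrowed from the ICD-10 chapter example used later in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept, defined by attributes and relations to other concepts."""
    concept_id: str
    # Attribute TYPE_ID -> list of values (hypothetical storage layout).
    attributes: dict[str, list[str]] = field(default_factory=dict)
    # Relations kept as (relation or hierarchy TYPE_ID, target concept id) pairs.
    relations: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class ConceptType:
    """A concept type, which contains concepts."""
    type_id: str
    concepts: dict[str, Concept] = field(default_factory=dict)


@dataclass
class Terminology:
    """A T/O, composed of concept types."""
    type_id: str
    concept_types: dict[str, ConceptType] = field(default_factory=dict)


# Instantiate the ICD-10 chapter C00-D48 (neoplasms).
neoplasms = Concept(
    concept_id="ICD_CH_C00-D48",
    attributes={"RDFS_LABEL": ["neoplasms"]},
)
chapter_type = ConceptType(type_id="T_DESC_ICD10_CHAPTER")
chapter_type.concepts[neoplasms.concept_id] = neoplasms

icd10 = Terminology(type_id="TER_ICD")
icd10.concept_types[chapter_type.type_id] = chapter_type
```

Every T/O, whatever its original model, fits into these same three levels; only the TYPE_IDs and ids change.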

CISMeF ids

CISMeF ids are internal unique ids that designate (T/O) concepts or metadata types.

Concept ids

As HeTOP is based on a multi-terminology model, all CISMeF concept ids are prefixed by a short string that refers to the T/O of the concept. In addition, as each T/O is based on a dedicated model, CISMeF concept ids also include a short string that refers to the concept type of the concept. For example, the ICD-10 chapter C00-D48 (neoplasms), whose T/O is "ICD-10" and whose concept type is "ICD-10 chapter", is stored in the HeTOP database under the unique concept id ICD_CH_C00-D48 (ICD stands for ICD-10 and CH for ICD-10 chapter). The general syntax is: <T/O acronym>_<concept type acronym>_<original concept id>. There are 2 exceptions to this rule: a) MeSH ids, for historical reasons, have their own rule: MSH_D_001249 stands for the original MeSH id D001249; b) slashes "/" in original ids are replaced by dashes "-".
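
The syntax above is easy to play with in code. Here is a minimal Python sketch; the function names are hypothetical and nothing here is HeTOP's own API. Beware of two assumptions: the MeSH rule is inferred from the single example D001249 -> MSH_D_001249 (leading letters, underscore, digits), and the splitter assumes that neither acronym contains an underscore.

```python
import re


def make_concept_id(terminology: str, concept_type: str, original_id: str) -> str:
    """Build <T/O acronym>_<concept type acronym>_<original concept id>."""
    # Exception b: slashes in original ids are replaced by dashes.
    return f"{terminology}_{concept_type}_{original_id.replace('/', '-')}"


def make_mesh_concept_id(original_id: str) -> str:
    """Exception a, guessed from the one documented example.

    Assumption: the leading letters of the MeSH id are kept as the middle
    part and the digits follow after an underscore.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", original_id)
    if match is None:
        raise ValueError(f"unexpected MeSH id: {original_id!r}")
    letters, digits = match.groups()
    return f"MSH_{letters}_{digits}"


def split_concept_id(concept_id: str) -> tuple[str, str, str]:
    """Split an id into (T/O acronym, concept type acronym, remainder).

    Assumption: only the first two underscores are separators. The remainder
    is not always the original id, since a dash may have been a slash.
    """
    parts = concept_id.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a CISMeF concept id: {concept_id!r}")
    return parts[0], parts[1], parts[2]


print(make_concept_id("ICD", "CH", "C00-D48"))
print(make_mesh_concept_id("D001249"))
print(split_concept_id("ICD_CH_C00-D48"))
```

Note that the slash replacement is one-way: once a "/" has become a "-", the id alone does not tell you which dashes were in the original.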

Metadata ids (TYPE_IDs)

Each piece of HeTOP metadata is represented by a TYPE_ID. This is a short name for any concept type, relation type, hierarchy, datatype property or T/O. As the HeTOP data model has been built on a generic level, the whole complexity of the database schema is pushed into the TYPE_IDs. It is almost impossible to know them all, but a few tricks let you quickly understand what each TYPE_ID is for. And don't worry, convenient labels are always available for each TYPE_ID. The following table gives an example of TYPE_ID for each category of HeTOP metadata:

Metadata category | TYPE_ID | Description | Rule
T/O | TER_ICD | The ICD-10 terminology | TER_ prefix for all T/O
Concept type | T_DESC_ICD10_CHAPTER | The ICD-10 chapter concept type | T_DESC_ prefix for all concept types except GENE and PROTEIN
Relation type | T_REL_PTS_TO_PTS | CISMeF automatic mappings | T_REL_ prefix for all relation types
Hierarchy type | BTNT_ICD | ICD-10 hierarchy | BTNT_ prefix for all hierarchy types
Datatype property | RDFS_LABEL | Preferred label | No real rule for datatype properties except these: RDFS_LABEL is the common TYPE_ID for preferred labels; the T_UF prefix is used for "used for" terms, meaning synonyms and other entry terms; otherwise the T_ATT_ prefix is commonly used for all other datatype properties
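
The prefix rules from the table translate directly into a small lookup. The Python sketch below is only a guess-by-prefix helper: the function name is made up, and it assumes that the two concept type exceptions appear literally as the TYPE_IDs GENE and PROTEIN, which this guide does not spell out. Since datatype properties have "no real rule", don't expect it to recognize every TYPE_ID.

```python
def type_id_category(type_id: str) -> str:
    """Guess the metadata category of a TYPE_ID from its prefix."""
    if type_id.startswith("TER_"):
        return "T/O"
    # Assumption: the two exceptions are the literal TYPE_IDs GENE and PROTEIN.
    if type_id.startswith("T_DESC_") or type_id in ("GENE", "PROTEIN"):
        return "concept type"
    if type_id.startswith("T_REL_"):
        return "relation type"
    if type_id.startswith("BTNT_"):
        return "hierarchy type"
    # RDFS_LABEL for preferred labels, T_UF for synonyms and entry terms,
    # T_ATT_ for most other datatype properties.
    if type_id == "RDFS_LABEL" or type_id.startswith(("T_UF", "T_ATT_")):
        return "datatype property"
    return "unknown"


examples = (
    "TER_ICD",
    "T_DESC_ICD10_CHAPTER",
    "T_REL_PTS_TO_PTS",
    "BTNT_ICD",
    "RDFS_LABEL",
)
for example in examples:
    print(example, "->", type_id_category(example))
```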

Want more?

If you want to go deeper, have a look at our publications page.