For those who want to understand how HeTOP works and what's in its guts.
If you want to get the best out of HeTOP, you have to learn how HeTOP handles its data. The main difficulty is understanding the meta-model. Then you will have to manipulate "CISMeF ids". There's a lot of stuff to know, but it's very easy: just follow me (if you dare).
Let's see how this thing works. It's not very complicated, but you need to understand it to go further.
As real (formal) ontologies are very complex to handle natively in a system, and as HeTOP's purpose is not to use ontology functions and rules, the HeTOP data model only includes resources on a terminological level.
Well, it means that real ontologies (if any actually exist) are degraded to a terminology structure to be usable. HeTOP cares about the lexicons and the semantic relations. That's all. If you care about ontology principles and their original purposes, don't go further (and don't use HeTOP).
For the rest of the lesson, we will use the abbreviation T/O for terminology/ontology; this will be easier.
Each T/O is based on a specific model made of concept types, relationships and data types (attributes). We can group these things under the term metadata.
As HeTOP is designed to integrate all T/O into a unique system, we made a choice: use a single model that will gather all T/O models.
We were largely inspired by the UMLS Metathesaurus, which is a great tool for dealing with multiple terminologies. But we wanted to go further and we couldn't afford the creation/maintenance of new CUIs (if you don't know what I'm talking about, it's no big deal).
The meta-model is pretty simple: T/O are composed of Concept Types, which contain Concepts that are defined by attributes (data properties) and relations to other concepts (hierarchies and other relations). Just take a look at the figures below: Figure 1 is a representation of the meta-model and Figure 2 shows an example of instantiation for an ICD-10 code.
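To make the nesting concrete, here is a minimal sketch of the meta-model as plain Python data classes. The class and field names (Terminology, ConceptType, Concept, attributes, relations) are hypothetical and chosen only for illustration; they are not HeTOP's actual schema or API. The instance at the end uses the ICD-10 chapter C00-D48 (neoplasms).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept, defined by attributes and relations to other concepts."""
    original_id: str
    # Attribute name -> value (data properties). Names here are illustrative.
    attributes: dict[str, str] = field(default_factory=dict)
    # (relation or hierarchy name, target concept id) pairs.
    relations: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class ConceptType:
    """A concept type, which contains concepts."""
    name: str
    concepts: list[Concept] = field(default_factory=list)


@dataclass
class Terminology:
    """A T/O, which is composed of concept types."""
    name: str
    concept_types: list[ConceptType] = field(default_factory=list)


# Hypothetical instantiation: one ICD-10 chapter inside the ICD-10 T/O.
chapter = Concept("C00-D48", attributes={"preferred label": "Neoplasms"})
icd10 = Terminology("ICD-10", [ConceptType("ICD-10 chapter", [chapter])])
```

Every T/O fits the same three levels, which is the point of using a single model for all of them.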
CISMeF ids are internal unique ids that designate (T/O) concepts or metadata types.
As HeTOP is based on a multi-terminology model, all CISMeF concept ids are prefixed by a short string that refers to the T/O of each concept. In addition, as each T/O is based on a dedicated model, CISMeF concept ids also include a short string that refers to the concept type of each concept.
For example, the ICD-10 chapter C00-D48 (neoplasms), whose T/O is "ICD-10" and whose concept type is "ICD-10 chapter", is stored in the HeTOP database under the unique concept id ICD_CH_C00-D48 (ICD stands for ICD-10 and CH for ICD-10 chapter).
The general syntax is: <T/O acronym>_<concept type acronym>_<original concept id>. There are two exceptions to this rule: a) MeSH ids, for historical reasons, have their own rule: MSH_D_001249 stands for the original MeSH id D001249; b) slashes "/" in original ids are replaced by dashes "-".
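The id syntax can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the MeSH branch generalises from the single example given here (D001249 becomes MSH_D_001249) by splitting the leading letters from the digits, and the splitter assumes that neither acronym contains an underscore.

```python
import re


def build_cismef_id(to_acronym: str, type_acronym: str, original_id: str) -> str:
    """Compose <T/O acronym>_<concept type acronym>_<original concept id>."""
    # Exception b: slashes in original ids are replaced by dashes.
    safe_id = original_id.replace("/", "-")
    return f"{to_acronym}_{type_acronym}_{safe_id}"


def build_mesh_id(original_id: str) -> str:
    """Exception a: MeSH ids follow their own rule.

    ASSUMPTION: the original id is letters followed by digits, and the
    letters and digits are separated by an underscore in the CISMeF id.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", original_id)
    if match is None:
        raise ValueError(f"unexpected MeSH id: {original_id!r}")
    letters, digits = match.groups()
    return f"MSH_{letters}_{digits}"


def split_cismef_id(cismef_id: str) -> tuple[str, str, str]:
    """Split an id into (T/O acronym, concept type acronym, remainder).

    ASSUMPTION: the two acronyms never contain an underscore.
    """
    parts = cismef_id.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a CISMeF concept id: {cismef_id!r}")
    return parts[0], parts[1], parts[2]
```

With these helpers, build_cismef_id("ICD", "CH", "C00-D48") gives ICD_CH_C00-D48 and build_mesh_id("D001249") gives MSH_D_001249, matching the two examples above.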
Each piece of HeTOP metadata is represented by a TYPE_ID. This is a short name for any concept type, relation type, hierarchy, datatype property or T/O. As the HeTOP data model has been built on a generic level, the whole complexity of the database schema is pushed into the TYPE_IDs. It is almost impossible to know them all, but a few tricks help you quickly understand the use of each TYPE_ID. And don't worry, convenient labels are always available for each TYPE_ID. The following table gives an example of TYPE_ID for each category of HeTOP metadata:
Metadata category | TYPE_ID | Description | Rule |
---|---|---|---|
T/O | TER_ICD | The ICD-10 terminology | TER_ prefix for all T/O |
Concept type | T_DESC_ICD10_CHAPTER | The ICD-10 chapter concept type | T_DESC_ prefix for all concept types except GENE and PROTEIN |
Relation type | T_REL_PTS_TO_PTS | CISMeF automatic mappings | T_REL_ prefix for all relation types |
Hierarchy type | BTNT_ICD | ICD-10 hierarchy | BTNT_ prefix for all hierarchy types |
Datatype property | RDFS_LABEL | Preferred label | Well, there's no real rule for datatype properties except these: RDFS_LABEL is the common TYPE_ID for preferred labels; the T_UF prefix is used for "used for" terms, meaning synonyms and other entry terms; otherwise the T_ATT_ prefix is commonly used for all other datatype properties |
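The prefix rules in the table can be turned into a small lookup. The sketch below is an illustration only: the function name is hypothetical, the category strings are ours, and it assumes that GENE and PROTEIN are themselves the TYPE_IDs of the two concept types that lack the T_DESC_ prefix.

```python
def categorize_type_id(type_id: str) -> str:
    """Guess the metadata category of a TYPE_ID from the prefix rules."""
    # ASSUMPTION: GENE and PROTEIN are concept type TYPE_IDs without prefix.
    if type_id in {"GENE", "PROTEIN"}:
        return "concept type"
    # RDFS_LABEL is the common TYPE_ID for preferred labels.
    if type_id == "RDFS_LABEL":
        return "datatype property"
    prefix_rules = [
        ("TER_", "T/O"),
        ("T_DESC_", "concept type"),
        ("T_REL_", "relation type"),
        ("BTNT_", "hierarchy type"),
        ("T_UF", "datatype property"),   # "used for" terms (synonyms)
        ("T_ATT_", "datatype property"),  # most other datatype properties
    ]
    for prefix, category in prefix_rules:
        if type_id.startswith(prefix):
            return category
    # Datatype properties have no strict rule, so some ids stay unresolved.
    return "unknown"
```

Because datatype properties follow no strict rule, an "unknown" result does not mean the TYPE_ID is invalid; in that case the label attached to the TYPE_ID is the thing to check.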
If you want to go deeper, have a look at our publications page.