Terminology

[Source:  D3.1 Best practice report - Terminology (PDF, 3451 kb)]

The main outcome of the Athena WP4 activity was a set of recommendations addressed to European museums. We give here a short reminder of these recommendations, which we have updated thanks to the larger scope of the Linked Heritage project. These revised recommendations are of high importance for finalising the technical specifications of the WP3 Terminology Management Platform.

The analysis of the survey results makes it clear that monolingual in-house terminology resources are a reality that we cannot ignore at the European level.

These recommendations therefore take this reality into account and give guidelines to institutions so that they can keep their in-house terminologies adapted to their needs and means while still making them compliant with the requirements of the Semantic Web.

These recommendations have been published as a booklet “Your terminology as a part of the Semantic Web: recommendations for design and management” within the Linked Heritage project.

You can find a detailed presentation of these recommendations, with examples and tools, in this booklet, which is available in both print and digital form.

We structured the recommendations according to the main stages in the “lifetime” of a terminology resource. The following diagram presents these main stages:

Terminology - Steps

A : CONCEIVE YOUR TERMINOLOGY

The first stage, “Conceive your terminology”, gives the main considerations and requirements to keep in mind in order to create a terminology resource in the best way, i.e. as a thesaurus, the form we recommend. Here are the tasks inherent to this first stage:

A1: Define your domains

This step is important for defining the overall strategy of the terminology. If the domain of a terminology is too broad, the terminology will not be efficient for describing collections. Conversely, if a terminology is too specialised and focused on a single domain, it may be too limited, and another terminology may be needed alongside it. Defining the domains covered by your cataloguing and indexing process is therefore important for creating the general structure and hierarchy of the terminology.

A2: Identify your users’ expectations

The target audience of the terminology is important as well. A terminology aimed at professionals only will be much more precise than one aimed at the general public. It is therefore important to define at the conception stage whether the terminology will be used only by professionals for cataloguing and indexing, or whether the general public will also use it to access the collections of the institution. This may also matter for the choice of licence for the terminology.

A3: Define your connection with the datamodel

Institutions use terminology to describe a collection or an object. This description is generally governed by a datamodel, and some fields of this datamodel require terms from a controlled vocabulary. At the conception stage it is important to define which fields of the datamodel will use the terminology, in order to settle its domains and terms.

A4: Choose the terms for the semantic description of your digital resources

This task follows from the previous ones: the choice of terms depends on the domain(s) covered by the terminology, the users who will be using it and the fields of the datamodel that require a controlled vocabulary. This task is crucial both for the indexing process and for information retrieval, but its result is not definitive: a terminology, much like a language, needs to evolve over time.

A5: Organise your terms into a thesaurus structure

As the thesaurus is the kind of terminology that we recommend, it follows that terms and domains should be organised within a thesaurus structure. The more connections a term has to other terms, the more exploitable your terminology will be, by human users and by machines alike. Thesauri offer both hierarchical and associative relationships. Making the best use of these features can improve the efficiency of the terminology.
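
The two kinds of relationship can be pictured with a small data structure. The following Python sketch is only an illustration, not a prescribed implementation: the class name `Thesaurus`, its methods and the sample terms are assumptions made for the example.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus with hierarchical and associative links (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its more general terms
        self.narrower = defaultdict(set)  # term -> its more specific terms
        self.related = defaultdict(set)   # term -> associated terms

    def add_hierarchy(self, narrow, broad):
        # Hierarchical links are stored in both directions so they stay consistent.
        self.broader[narrow].add(broad)
        self.narrower[broad].add(narrow)

    def add_association(self, a, b):
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def connections(self, term):
        """Number of direct links a term has to other terms."""
        return (len(self.broader[term]) + len(self.narrower[term])
                + len(self.related[term]))


# Invented sample terms.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("oil painting", "painting")
thesaurus.add_hierarchy("watercolour", "painting")
thesaurus.add_association("painting", "easel")
```

A term with no connections at all is a good candidate for review, since neither a human user nor a machine can navigate to or from it.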

A6: Find equivalent terms in other languages

Very few of the terminologies described in the survey results are multilingual. Some countries with several official languages have to provide multilingual content, and therefore multilingual terminologies as well.

One best practice would be to enrich a terminology with equivalent terms in other languages, even if this is not mandatory under the policy of the country. Reference terminologies, and other terminologies that correspond to the domains and are available in the terminology registry, could be used for this multilingual enrichment.

A7: Implement your thesaurus

The final task of the conception stage is the technical implementation of the thesaurus. The technical format (spreadsheet, XML, database, …) has to be defined here in order to make the thesaurus technically available. Several standards cover the whole process of conceiving a terminology, but the latest one, ISO 25964-1, which we already mentioned, is the most suitable as it takes into account the technological reality of the institutions. After this serialization process, the terminology can be integrated into the collections/objects management system.

B : MAKE IT INTEROPERABLE

The second stage consists in making a terminology interoperable. This consists mainly in SKOSifying, i.e. converting into SKOS, the thesaurus that was technically implemented in the previous stage.

B1: Evaluate how far SKOS is compliant with your terminology features

The first task is to determine whether SKOS is the most convenient format for the kind of terminology you have. For instance, an authority file of author names may need a more appropriate format such as FOAF. The benefits of using SKOS must therefore be evaluated, making sure that the SKOS datamodel does not cause any loss of information or imply wrong information or inferences.

B2: Roughly SKOSify your terminology

This is the SKOSification task itself. We suggest a rough SKOSification first, as tools exist that help to SKOSify a thesaurus automatically. By rough SKOSification we mean an automatic process for converting a terminology into SKOS; a detailed SKOSification would be one validated by a human expert. The Terminology Management Platform (TMP) of Linked Heritage will have a dedicated SKOSification module, so that this step can be done at the lowest possible cost and with the fewest possible resources.
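
To give an idea of what an automatic conversion involves, here is a minimal Python sketch that turns a spreadsheet export into SKOS written in Turtle syntax. It does not describe the TMP module or any existing tool: the column names (`id`, `term`, `lang`, `broader_id`), the namespace `http://example.org/thesaurus/` and the sample rows are all assumptions made for the example.

```python
import csv
import io

BASE = "http://example.org/thesaurus/"  # hypothetical namespace

# Hypothetical spreadsheet export: one row per term.
SAMPLE_CSV = """id,term,lang,broader_id
c1,painting,en,
c2,oil painting,en,c1
"""


def escape(text):
    # Escape backslashes and double quotes for a Turtle string literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def skosify(csv_text, base=BASE):
    """Convert thesaurus rows into a SKOS document in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements = [
            "a skos:Concept",
            f'skos:prefLabel "{escape(row["term"])}"@{row["lang"]}',
        ]
        if row["broader_id"]:
            statements.append(f'skos:broader <{base}{row["broader_id"]}>')
        lines.append(f'<{base}{row["id"]}> ' + " ;\n    ".join(statements) + " .")
    return "\n".join(lines) + "\n"


turtle = skosify(SAMPLE_CSV)
```

The output of such a rough conversion still has to be reviewed by a human expert, which is the difference between rough and detailed SKOSification described above.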

B3: Define with precision the labels expressing concepts

This task is directly correlated with task A4: Choose the terms for the semantic description of your digital resources. The terms of the thesaurus will be the labels expressing the concepts. This task must therefore be done with care, since the SKOS datamodel has some requirements regarding labels and their languages. You can refer to the second deliverable of the Athena project, ‘D4.2 Guidelines for mapping into SKOS, dealing with translations’, for more detailed information on SKOS and precise guidelines for SKOSification.
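
Two of the label requirements of the SKOS datamodel are that a concept has no more than one preferred label per language, and that the same label is not both preferred and alternative for one concept. The Python sketch below checks these two rules for a single concept; the function name `check_labels` and the sample labels are assumptions made for the example, and the sketch leaves hidden labels aside.

```python
def check_labels(pref_labels, alt_labels):
    """Return a list of problems found in one concept's labels.

    Both arguments are lists of (text, language_tag) pairs.
    """
    problems = []
    seen_languages = set()
    for text, lang in pref_labels:
        if lang in seen_languages:
            problems.append(f"more than one preferred label for language '{lang}'")
        seen_languages.add(lang)
    for label in set(pref_labels) & set(alt_labels):
        problems.append(f"'{label[0]}'@{label[1]} is both preferred and alternative")
    return problems


# Invented sample labels.
ok = check_labels([("painting", "en"), ("peinture", "fr")], [("pictures", "en")])
bad = check_labels([("painting", "en"), ("picture", "en")], [("painting", "en")])
```

Here `ok` is an empty list, while `bad` reports both a duplicated English preferred label and a label used in two roles.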

B4: Identify your concepts and validate the structure

This task results from the transition from a descriptor- or term-based resource to a concept-based one. In a thesaurus, terms are descriptors, i.e. keywords used for description; in the SKOS model, these terms and descriptors become labels expressing concepts. This small difference of perception may imply some modifications to your modelling. This is why the concepts of a terminology have to be identified, in order to consolidate their organisation.

The question of persistent identifiers, which give a unique identifier to each concept of a terminology, has been raised several times in the framework of the Thematic Working Group. This unique identifier is required by the principles of the Semantic Web and Linked Data. We therefore strongly recommend using a persistent identifier system to identify the concepts within a terminology.

You can refer to the booklet that was published in the framework of the Athena WP3 (Workpackage dedicated to the standards) on ‘Persistent identifiers: recommendations’.
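
One practical consequence is that the identifier of a concept should not be derived from its label, since labels change over time. The Python sketch below illustrates this principle only; it is not a persistent identifier system in itself, and the class name `ConceptRegistry`, the namespace and the numbering scheme are assumptions made for the example.

```python
class ConceptRegistry:
    """Assigns each concept an opaque identifier that never changes afterwards."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._next_number = 1
        self.labels = {}  # concept URI -> current preferred label

    def mint(self, label):
        # The URI is built from a counter, not from the label,
        # so renaming the concept later does not break links to it.
        uri = f"{self.base_uri}c{self._next_number:06d}"
        self._next_number += 1
        self.labels[uri] = label
        return uri

    def rename(self, uri, new_label):
        # Only the label changes; the identifier stays the same.
        self.labels[uri] = new_label


registry = ConceptRegistry("http://example.org/thesaurus/")  # hypothetical namespace
uri = registry.mint("aeroplane")
registry.rename(uri, "airplane")
```

After the rename, other resources that refer to the concept by its URI still point to the right concept.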

B5: Ensure the documentation of concepts

As we have already mentioned, a terminology will evolve through time, as language does. This is why it is important to keep track of details and information that may be useful, for example to document an obsolete label or to remove the ambiguity between two identical labels expressing two different concepts. SKOS offers a wide choice of notes for documenting concepts.

Elements inherent to the language issue (orthography, grammar, …) can be recorded here.
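
The note types defined by SKOS are skos:note, skos:changeNote, skos:definition, skos:editorialNote, skos:example, skos:historyNote and skos:scopeNote. As an illustration, the Python sketch below records notes on two concepts that share the same label and uses their scope notes to tell them apart. The concept identifiers, the helper names and the note texts are invented for the example.

```python
# Documentation properties defined by SKOS.
SKOS_NOTE_TYPES = {
    "note", "changeNote", "definition", "editorialNote",
    "example", "historyNote", "scopeNote",
}

# Two invented concepts with identical labels.
concepts = {
    "c10": {"prefLabel": "bow", "notes": []},
    "c11": {"prefLabel": "bow", "notes": []},
}


def add_note(concept_id, note_type, text):
    if note_type not in SKOS_NOTE_TYPES:
        raise ValueError(f"unknown SKOS note type: {note_type}")
    concepts[concept_id]["notes"].append((note_type, text))


def disambiguate(label):
    """Map each concept carrying this label to its scope notes."""
    return {
        cid: [text for kind, text in c["notes"] if kind == "scopeNote"]
        for cid, c in concepts.items()
        if c["prefLabel"] == label
    }


add_note("c10", "scopeNote", "Weapon used to shoot arrows.")
add_note("c11", "scopeNote", "Decorative knot made of ribbon.")
add_note("c11", "historyNote", "Label was 'ribbon bow' in an earlier version.")
```

Calling `disambiguate("bow")` returns one scope note per concept, while the history note keeps track of the obsolete label.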

B6: Map your concepts

This task is correlated with task A5: Organise your terms into a thesaurus structure. In that task, the general structure and organisation of terms within the thesaurus were defined. The mapping of concepts is a refinement of this structure, made possible by the features of SKOS.

This mapping can be done through hierarchical (skos:broader, skos:narrower) or associative (skos:related) relationships.

B7: Map your (multilingual) terms

Once the concepts have been mapped in the previous task, the terms can be mapped. This mainly consists in arranging the labels. This task is particularly important for multilingualism, as the mapping of terms can help to enrich the terminology with multilingual labels.

This task is correlated with task A6: Find equivalent terms in other languages. It is about transposing these equivalences into the SKOS structure of the terminology, respecting its datamodel and keeping all the relevant information of your thesaurus.

B8: Validate your SKOSification

The benchmark carried out in the framework of Athena WP4 showed that several tools exist for validating the final SKOS output of a terminology. The simplest one is PoolParty, which can perform syntax validation online on an RDF file uploaded from a local repository. The upcoming SKOSification module of the TMP will perform this validation of SKOS consistency on the fly, during the SKOSification process.
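
Beyond syntax, a consistency check looks at the SKOS datamodel itself. One of its rules is that two concepts cannot be linked by skos:related if one is a direct or indirect broader concept of the other. The Python sketch below checks this single rule as an illustration; it does not describe what the tools mentioned above actually verify, and the function names and sample concepts are assumptions made for the example.

```python
def broader_transitive(concept, broader):
    """All ancestors of a concept, following broader links transitively."""
    ancestors, stack = set(), list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in ancestors:
            ancestors.add(current)
            stack.extend(broader.get(current, ()))
    return ancestors


def related_conflicts(broader, related):
    """Pairs that are related although one is an ancestor of the other."""
    conflicts = set()
    for a, b in related:
        if b in broader_transitive(a, broader) or a in broader_transitive(b, broader):
            conflicts.add(frozenset((a, b)))
    return conflicts


# Invented sample data: concept -> set of broader concepts, and related pairs.
broader = {"oil painting": {"painting"}, "painting": {"artwork"}}
related = [("oil painting", "artwork"), ("painting", "easel")]
conflicts = related_conflicts(broader, related)
```

In this sample, the pair "oil painting" and "artwork" is reported, because "artwork" is already an indirect broader concept of "oil painting".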

C : LINK IT TO A NETWORK

This last stage is the one that allows an institution to publish a terminology and make it available on the Web. As the previous stage ensured the interoperability and SKOSification of the terminology, the terminology is now fully compliant with the principles of the Semantic Web and Linked Data. This final stage gives the last recommendations for making the terminology part of the Semantic Web by linking it to existing networks of terminologies.

C1: Definition of metadata on your terminology

This task is intended to provide basic information about the terminology so that it can be searched for and retrieved easily within a terminology registry. The first step in linking a terminology to a network of terminologies is to provide a description of it, in particular the date of creation, the authors and the domains covered. The Dublin Core fields are usually relevant and complete enough to provide quality metadata for the terminology. The terminology registry of the TMP will also provide a metadata form, so that institutions uploading their terminology can feed the registry with both the terminology and its metadata.
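
As an illustration, a description of a terminology based on the fifteen Dublin Core elements can be checked before it is submitted to a registry. The Python sketch below is not the TMP metadata form: the set of expected fields (a title plus the date, authors and domains mentioned above) and the sample values are assumptions made for the example.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumption for this sketch: a title plus the fields singled out in the text
# (date of creation, authors, domains covered).
EXPECTED = {"title", "creator", "date", "subject"}


def check_metadata(record):
    """Return (unknown_fields, missing_fields) for a terminology description."""
    unknown = set(record) - DC_ELEMENTS
    missing = EXPECTED - set(record)
    return unknown, missing


# Invented sample values.
record = {
    "title": "Example museum thesaurus",
    "creator": ["Example Museum"],
    "date": "2012-01-01",
    "subject": ["painting", "sculpture"],
    "language": "en",
}
unknown, missing = check_metadata(record)
```

A record that uses a field outside Dublin Core, or omits one of the expected fields, is reported through the two returned sets.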

C2: Identification of resources for mapping

This task consists in identifying all the terminology resources that could be mapped to the terminology just created. It involves browsing terminology registries to find resources that cover the same domains, in order to enrich your own terminology with missing concepts or to ensure multilingualism with equivalent terms in other languages. Another use case is the integration of a related domain into your terminology, if it is in the same language as your terminology. This task is connected to A1: Define your domains and A2: Identify your users’ expectations, since other terminology resources can help in achieving these tasks.

C3: Mapping with other resources

This task refers directly to B3: Define with precision the labels expressing concepts and B6: Map your concepts. It is about finding, manually or automatically, all the concepts that are worth integrating or simply mapping to: concepts from the same domain, concepts from a different but related domain, or concepts expressed in several languages, through which the terminology can be enriched and become multilingual. In this perspective, the use of a unique and persistent identifier is crucial for mapping two different terminology resources.
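
The automatic part of this search can be as simple as pairing concepts that share a label in the same language. The Python sketch below produces such candidate pairs for later validation by a human expert; it is one possible approach among others, and the function names, the URIs and the sample labels are assumptions made for the example.

```python
def normalise(label):
    # Lower-case the label and collapse repeated whitespace.
    return " ".join(label.lower().split())


def candidate_mappings(local, external):
    """Pair concepts from two terminologies that share a label in the same language.

    Each argument maps a concept URI to a set of (label, language) pairs.
    The result is a sorted list of (local_uri, external_uri) candidates.
    """
    index = {}
    for uri, labels in external.items():
        for text, lang in labels:
            index.setdefault((normalise(text), lang), set()).add(uri)
    pairs = set()
    for uri, labels in local.items():
        for text, lang in labels:
            for other in index.get((normalise(text), lang), ()):
                pairs.add((uri, other))
    return sorted(pairs)


# Invented sample terminologies with hypothetical URIs.
local = {"http://example.org/museum/c1": {("Oil painting", "en")}}
external = {
    "http://example.org/reference/300": {
        ("oil painting", "en"),
        ("peinture à l'huile", "fr"),
    },
    "http://example.org/reference/301": {("watercolour", "en")},
}
candidates = candidate_mappings(local, external)
```

Once a candidate has been validated, it can be recorded with a SKOS mapping property such as skos:exactMatch or skos:closeMatch, and the French label of the reference concept becomes available for enriching the local terminology. The pairs are expressed with concept URIs, which is why persistent identifiers matter here.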

C4: Validation of the interoperability

This validation step, like B8: Validate your SKOSification, is the final task in making a terminology interoperable and part of a network of terminologies. The only way to check and validate interoperability is to integrate the terminology into a search engine, run queries, and then test all the semantic inferences that can be made through the semantic mapping resulting from the SKOSification and the mapping. The Terminology Management Platform intends to provide all the necessary features for these stages of the terminology, especially for ensuring interoperability and providing the required mapping features. As a search and visualization interface will be developed, SKOSification and interoperability will be easy to validate within a single user interface.
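
A simple inference to test in a search engine is query expansion: a query on a concept should also return the objects indexed with its narrower concepts. The Python sketch below shows the idea on invented data; it does not describe the TMP search interface, and the function names, the concepts and the object identifiers are assumptions made for the example.

```python
def expand_query(concept, narrower):
    """The concept plus all its narrower concepts, followed transitively."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(narrower.get(current, ()))
    return result


def search(concept, narrower, index):
    """Objects indexed with the concept or with any of its narrower concepts."""
    hits = set()
    for c in expand_query(concept, narrower):
        hits |= index.get(c, set())
    return hits


# Invented sample data: narrower links and an index of catalogued objects.
narrower = {"painting": {"oil painting", "watercolour"}}
index = {
    "oil painting": {"object-17"},
    "watercolour": {"object-42"},
    "easel": {"object-99"},
}
hits = search("painting", narrower, index)
```

If a query on "painting" does not return the objects indexed with "oil painting" and "watercolour", the hierarchy was lost somewhere between the SKOSification and the integration into the search engine.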