Publication of linked data

[Source: D2.1 Best practice report on cultural heritage linked data and metadata standards (PDF, 992 kb)]

The publication of linked data is still at the experimental stage. Best practice can only be said to be emerging. Therefore the recommendations given in this section are based on:

  • Common practice in the general linked data community, as represented by The Cloud;
  • The practice of cultural heritage organisations that have published linked data;
  • The general practice of the cultural heritage sector.

Some of the recommendations offer a range of options, with no ‘right’ choice. The choice an organisation makes is dependent on individual circumstances, and may be affected by legal and ethical considerations.

The recommendations can be separated into three ‘choice areas’:

WHAT INFORMATION TO PUBLISH AS LINKED DATA

Looking at what kind of information is being published as linked data in The Cloud, and especially the
relatively small part which is about cultural heritage, two main types of information should be considered:

Collections information

This will be the bulk of the information that will be published by cultural heritage organisations. However they should also consider publishing information about:

  • Surrogates – the results of digitisation;
  • Supporting material – including exhibition catalogues, books, history files, and learning units;
  • User generated content – reactions to the collections (permissions having been gained to publish).

Terminological information

In The Cloud, a large component comes from terminological resources used by cultural heritage
organisations. These can be the result of international, national, thematic, or organisational initiatives. The
effort to publish them is particularly strong in the library and archive domains, and includes the publication of name authorities.

This work also gives the opportunity for cooperative, possibly international and multilingual, publication,
perhaps in the context of EC-funded projects. Topics for terminological publication include: object types;
event methods (e.g. creation method); places; organisations; events; materials; iconography; and many others.

The primary advice in choosing what kind of data to publish as linked data is:


  • Consider publishing information about all aspects of collections and their related
    materials;
  • Consider publishing terminological information, and seek partners to cooperate with in
    order to avoid duplication.

WHAT LICENCE SHOULD THERE BE FOR THE LINKED DATA

This section deals with the licensing arrangements that are associated with the publication of linked data.

Choices made in this area are affected by general considerations of how much control the publisher of linked
data wants to have over its data, but are also affected by what kind of data is being published.

As was seen in the analysis of The Cloud, a large part of published linked data does not seem to have a
licence for its use. The result is that it is unclear what can be done with this data. In these litigious times
users are particularly careful not to do anything that will leave them exposed to a possible loss of
organisational reputation or even a lawsuit.

The primary advice about licensing is:

  • Any publication of linked data must be accompanied by a licence which makes it clear
    what uses can be made of the data.
  • The licence may be standard, e.g. provided by Creative Commons, or one created
    specifically by the publisher.
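
As a minimal sketch of what "accompanied by a licence" can look like inside the data itself, the snippet below writes a single N-Triples statement that links a dataset to its licence using the Dublin Core Terms `license` property. The dataset URI is a hypothetical placeholder, and the Creative Commons URI is used only as an example of a standard licence, not as a recommendation.

```python
# Sketch: attach a machine-readable licence statement to a published dataset.
# DATASET_URI is a hypothetical placeholder, not a real resource.
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"
CC0_URI = "http://creativecommons.org/publicdomain/zero/1.0/"
DATASET_URI = "http://example.org/dataset/collections"


def licence_triple(dataset_uri: str, licence_uri: str) -> str:
    """Return one N-Triples line stating which licence applies to the dataset."""
    return f"<{dataset_uri}> <{DCTERMS_LICENSE}> <{licence_uri}> ."


print(licence_triple(DATASET_URI, CC0_URI))
```

A publisher using its own licence would point the same property at the URI of a page where that licence is published.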

In general terms the two classes for the licence are:

  • Open licence – This allows any use of the data, especially including commercial use, sometimes with restrictions about attribution and misuse.
  • Not-open licence – This restricts uses to non-commercial only, with similar requirements for attribution and misuse.

With both classes there is a range of standard licences, e.g. those provided by Creative Commons and
GNU, and the option of a specific organisational licence.

For an organisation to decide which particular licence it should give with its publication of linked data it is
suggested that they follow these steps:

Step 1 – Decide what uses of the metadata you want to allow

An organisation may use the chart below to come to a decision about the licence it should use:

Figure: Linked data licence decision tree

Note that it is helpful for a user of the linked data to know if a non-standard licence is compatible with a
standard one.

Step 2 – Examine the rights environment of the data to be published

Step 1 assumes that an organisation is able to make a choice of licence without restriction. To test this
assumption the organisation should seek to find answers to these questions:

  • Is the data the organisation’s intellectual property?

Usually the data that an organisation uses for describing its collections is its intellectual property.
However some partners in the Linked Heritage project have said that this is not the case. This is
the situation for aggregators, where the metadata comes from their providers. The aggregator
should already have a licence for its use of its providers’ data. This can restrict the licence the
aggregator can give for its linked data publication.

The same situation can also arise where volunteers (i.e. not members of staff) have been
involved in creating the data. Best practice here would be for volunteers to have assigned their
rights, or given an open licence, to the organisation.

Another situation that arises is where data from two sources is mixed. An organisation may begin
with externally-supplied data and enrich it. Rights over the enriched data are complex.

If any of these situations is present then the advice is:

The organisation must either accept the restrictions imposed by the original creation of
the data or seek to renegotiate the licence it has.

  • Are there any legal or other restrictions on the type of licence that can be offered?

It may be that an organisation is operating in a rights environment which forbids the use of a type of
licence. This seems to be particularly the situation where a standard licence, e.g. Creative
Commons, is being considered.

If this situation is present then the advice is:

Consider using a non-standard licence that meets local needs.

Commercial use may also be specifically excluded by law for some types of data.

If this situation is present then the advice is:

The organisation cannot use a licence which allows commercial use.

  • Is the organisation able to make a decision about licensing even when it has the rights in the data?

The survey of partners also brought to light the situation where a cultural heritage organisation
does not have the authority to decide on licensing independently of its superior body. This is
particularly the case where the cultural heritage body is owned by a national or regional
government.

The superior body may mandate a more or less restrictive licence than wished for by the cultural
heritage organisation. It is possible that the cultural heritage organisation’s data is viewed as an
exception to general rules, and it might be possible to negotiate an exception.

If this situation is present then the advice is:

The organisation must use the licence that its superior body supports.
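
The questions in Step 2 can be read as a checklist that narrows the licence choice made in Step 1. The sketch below is one hypothetical way to encode that checklist. The class, its field names, and the wording of the returned constraints are illustrative assumptions that paraphrase the advice above; the function is an aid to discussion, not a substitute for legal advice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RightsEnvironment:
    """Answers to the Step 2 questions (field names are illustrative)."""
    owns_all_data: bool              # is the data the organisation's own IP?
    standard_licence_allowed: bool   # may a standard licence (e.g. CC) be used?
    commercial_use_allowed: bool     # does the law permit commercial use?
    superior_body_decides: bool      # does a superior body set the licence?


def licence_constraints(env: RightsEnvironment) -> List[str]:
    """Collect the pieces of advice from the text that apply to this environment."""
    constraints = []
    if not env.owns_all_data:
        constraints.append("accept the existing restrictions or renegotiate the licence")
    if not env.standard_licence_allowed:
        constraints.append("consider a non-standard licence that meets local needs")
    if not env.commercial_use_allowed:
        constraints.append("do not use a licence that allows commercial use")
    if env.superior_body_decides:
        constraints.append("use the licence that the superior body supports")
    return constraints


# Example: an aggregator whose metadata comes from its providers.
aggregator = RightsEnvironment(
    owns_all_data=False,
    standard_licence_allowed=True,
    commercial_use_allowed=True,
    superior_body_decides=False,
)
for item in licence_constraints(aggregator):
    print("-", item)
```

An empty result means that none of the Step 2 situations applies and the choice made in Step 1 can stand.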

HOW TO PUBLISH THE LINKED DATA

In this area a potential publisher of linked data has three choices to make:

Which format standards to use

It is inconceivable that they will not use the basic standards such as RDF, RDFS, and OWL. However for the
‘descriptive’ formats it is advised to:

  • Avoid creating a proprietary format which is only intended to be used for your package;
  • Use standard format(s) appropriate for the type of data being published. Looking at what is in use, a few formats seem to be good suggestions:

Web resources: Dublin Core;
Persons: Friend of a Friend;
Terminological resources: Simple Knowledge Organization System;
Bibliographic resources: Bibliographic Ontology;
Music: Music Ontology.

These recommendations are based on the current, in-use, formats. However there is a ‘gap in the market’ for a format for cultural heritage linked data.

Consider using a cultural heritage specific format for linked data. Possible candidates are formats
based on EDM, CIDOC CRM, and LIDO.
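
To illustrate the reuse of standard formats rather than a proprietary one, the sketch below describes a web page, a person, and a term with properties from Dublin Core Terms, Friend of a Friend (FOAF), and the Simple Knowledge Organization System (SKOS). The namespace URIs are the published ones for those vocabularies; every `example.org` URI and every literal value is invented for the example.

```python
# Sketch: describe resources with existing vocabularies.
# All example.org URIs and literal values are made up for illustration.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/"

# Each triple is (subject URI, predicate URI, object, object_is_uri).
triples = [
    (BASE + "page/42", DCTERMS + "title", "Bronze brooch", False),
    (BASE + "page/42", DCTERMS + "creator", BASE + "person/7", True),
    (BASE + "person/7", RDF_TYPE, FOAF + "Person", True),
    (BASE + "person/7", FOAF + "name", "Jane Example", False),
    (BASE + "term/brooch", RDF_TYPE, SKOS + "Concept", True),
    (BASE + "term/brooch", SKOS + "prefLabel", "brooch", False),
]


def namespace_of(uri):
    """Return the namespace part of a URI (up to the last '#' or '/')."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut]


# List the vocabularies the description relies on.
used = sorted({namespace_of(p) for _, p, _, _ in triples})
for ns in used:
    print(ns)
```

Because every property comes from a published vocabulary, a consumer that already understands those vocabularies can interpret the data without package-specific documentation.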

RDF serialisations to publish

On the basis of common practice it is advised to:

Publish the linked data in the RDF/XML and N-Triples serialisations.
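
The two serialisations carry the same statements in different syntaxes. As a rough sketch, the code below writes a pair of triples first as N-Triples and then as RDF/XML, using only the Python standard library. The `example.org` URI is a placeholder, and the hand-written serialisers are simplified for illustration (for instance, they ignore language tags, datatypes, and blank nodes); a production publisher would normally rely on an RDF library instead.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (subject URI, predicate URI, object, object_is_uri); example.org is a placeholder.
TRIPLES = [
    ("http://example.org/term/brooch", RDF + "type", SKOS + "Concept", True),
    ("http://example.org/term/brooch", SKOS + "prefLabel", "brooch", False),
]


def to_ntriples(triples):
    """Serialise triples as N-Triples: one statement per line."""
    lines = []
    for s, p, o, is_uri in triples:
        if is_uri:
            obj = f"<{o}>"
        else:
            # Escape backslashes first, then quotes and newlines.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def to_rdfxml(triples):
    """Serialise triples as RDF/XML, one rdf:Description per statement."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("skos", SKOS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for s, p, o, is_uri in triples:
        # Split the predicate URI into namespace and local name.
        cut = max(p.rfind("#"), p.rfind("/")) + 1
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": s})
        prop = ET.SubElement(desc, f"{{{p[:cut]}}}{p[cut:]}")
        if is_uri:
            prop.set(f"{{{RDF}}}resource", o)
        else:
            prop.text = o
    return ET.tostring(root, encoding="unicode")


print(to_ntriples(TRIPLES))
print(to_rdfxml(TRIPLES))
```

N-Triples is easy to process line by line, while RDF/XML fits existing XML toolchains; publishing both lets consumers pick whichever suits their tools.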

How to link the package into The Cloud

One issue that was brought out by discussions of the WP 2 Working Group was: which are the ‘trusted’
packages in The Cloud? One measure of trust is whether one knows the publisher of a package. Linking
on this basis seems to be very common in all parts of The Cloud and leads to the formation of mini-clouds of
interlinked packages. There seems to be a cultural heritage mini-cloud forming. A possible reason for this formation is the Europeana initiative.

Other very important issues are:

  • The identification of resources: are the identifiers you use compatible with the identifiers used in a package you may link to?
  • The compatibility of the semantics of the packages. For example, if one wishes to identify
    ‘personas’ (public identities), is that the same as FOAF, which says it identifies people?
  • The accessibility of a package: it has to be open to queries.

Therefore we advise:

  • Link to general-purpose packages that are often linked to, e.g. DBpedia, GeoNames Semantic Web, and national sources of terminology (e.g. UK Postcodes);
  • Link to known packages in cultural heritage, e.g. Library of Congress Subject Headings, VIAF (The Virtual International Authority File), and Dewey Decimal Classification;
  • Provide a SPARQL endpoint to the package.
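
The advice above can be sketched in two small steps: stating a link from a local resource to an external package, and building the request a client would send to a SPARQL endpoint. In the code below the local term URI and the endpoint address are hypothetical, and the DBpedia URI is assumed, for illustration only, to identify the matching concept. The link uses `skos:closeMatch`, a deliberately cautious choice given the concerns about semantic compatibility; `owl:sameAs` would assert a much stronger identity. The request URL follows the SPARQL Protocol convention of passing the query text in a `query` parameter, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical local term and endpoint; the DBpedia URI is illustrative.
LOCAL_TERM = "http://example.org/term/brooch"
DBPEDIA_TERM = "http://dbpedia.org/resource/Brooch"
ENDPOINT = "http://example.org/sparql"


def link_triple(local_uri, predicate, external_uri):
    """Return an N-Triples statement linking a local resource to an external one."""
    return f"<{local_uri}> <{predicate}> <{external_uri}> ."


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL (the query travels in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


# Ask the endpoint which external resources the local term is linked to.
query = f"SELECT ?target WHERE {{ <{LOCAL_TERM}> <{SKOS_CLOSE_MATCH}> ?target }}"

print(link_triple(LOCAL_TERM, SKOS_CLOSE_MATCH, DBPEDIA_TERM))
print(sparql_get_url(ENDPOINT, query))
```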

Obviously the final task is to make an entry for the package in The Data Hub registry!

CONTRIBUTING TO EUROPEANA

In the context of the Linked Heritage project (or any Europeana Group project) the requirements of
Europeana are important, as it will be publishing the metadata that it has aggregated as linked data.

From December 2011 contribution of metadata will be governed by the Europeana Data Exchange
Agreement. Metadata aggregated before this date will have to conform to this by the end of June 2012 at the latest.

It is worth stating those requirements as they impact on providers:

Licensing requirements

Europeana wishes to publish providers’ metadata as linked data using the CC0 licence. As mentioned
above this means that any use of the metadata, including commercial use, is possible. Also the use does not require attribution.

If a provider intends to publish their metadata, e.g. as linked data, with the CC0 licence then they can sign the agreement without difficulty.
However this situation seems, from the Linked Heritage partners’ survey, to be uncommon. The result is that providers to Europeana have two options:

  1. Remove all their metadata from Europeana.
  2. Only give Europeana the metadata that they agree to publish under the CC0 licence.

Option 1 is a difficult option to take, both reputationally and, sometimes, contractually. Partners may have
contractual obligations with the Commission which mean they must give metadata to Europeana.

Option 2 is more attractive, but providers should be aware that the Agreement requires that metadata
supplied to Europeana must conform to their published metadata specifications.