Main Content
What are metadata, metadata schemas, controlled vocabularies and documentation?
Metadata are data on other data or resources, in this case research data. They describe the research data in order to optimise their retrievability, to ensure the understanding of the data for subsequent users and also enable the linking of similar research data when using the same, standardised metadata schemes. The most basic metadata information includes, for example, title, author/primary researcher, institution, persistent identifier, location & time period, subject, rights, file names, formats, etc.
Metadata schemas (or metadata standards) are compilations of categories for describing data. A distinction is made between interdisciplinary or independent standards and discipline-specific or dependent standards. Metadata schemas are intended to ensure that all researchers use the same descriptive vocabulary in order to guarantee interoperability and thus comparability of data sets.
The following table lists some examples of metadata standards from different disciplines. If your discipline is not listed, the listing of the Digital Curation Centres (DCC) can usually provide information on which standards are applicable to your field of science.
Scientific or technical discipline | Name of the standard(s) |
---|---|
interdisciplinary standards | DataCite Schema, Dublin Core, MARC21, RADAR |
Humanities | EAD, TEI P5, TEI Lex-0 |
Earth Sciences | AgMES, CSDGM, ISO 19115 |
Climate science | CF Conventions |
Arts & Cultural Studies | CDWA, MIDAS-Heritage |
Natural sciences | CIF, CSMD, Darwin Core, EML, ICAT Schema |
X-ray, neutron and muon research | NeXus |
Social and economic sciences | DDI |
Before starting to document your data, you should search for existing metadata schemas. This approach ensures better interoperability of the research data you are creating with data already created in the same discipline and saves you the work of developing your own metadata schema. If an existing metadata standard does not provide the descriptive categories that are necessary for your research, it is still worthwhile to use a reputable, already existing subject-specific standard as a basis and build on it, for example by incorporating additional categories and communicating this to those responsible for the standard so that they can extend the schema. This is because metadata standards are living entities that can be adapted or enriched with new categories depending on the needs of the researchers. Be careful not to make any changes to existing elements or attributes so as not to jeopardise interoperability.
It is possible and usually necessary to use several metadata schemas. At the very least, you should always use a subject-independent metadata schema (preferably Dublin Core) to describe your data, as this covers the general categories of description mentioned in the first paragraph of this section about the research data created. Fig. 4 shows a section for metadata in the Dublin Core standard for illustration purposes. Subject-specific metadata standards, on the other hand, allow you to structure your data with descriptive categories that are on a more content-related level and may differ from discipline to discipline.
Fig. 4: Example metadata in the Dublin Core Standard
Type of controlled vocabulary | Name of the controlled vocabulary |
---|---|
Clear identification of persons, Objects or places |
Gemeinsame Normdatei (GND) GeoNames / International Standard Name Identifier (ISNI, ISO 27729) / |
General, interdisciplinary Classification systems |
Dewey-Dezimalklassifikation (DDC) / |
Subject-specific classification systems | |
Subject-specific vocabularies |
Agricultural Information Management Standard (AGROVOC) Standard-Thesaurus-Wirtschaft (STW) Thesaurus Sozialwissenschaften (TheSoz) |
Metadata schemas thus determine what information is to be provided. For optimal search and use of the data, however, it is also important that this information is rendered using a vocabulary that is as uniform as possible. For this purpose, a number of discipline-specific and cross-disciplinary controlled vocabularies are available in the form of thesauri, classifications and standard data.
An overview of different systems is provided, for example, by the Basel Register of Thesauri, Ontologies & Classifications (BARTOC).
Documentation usually goes beyond the description of the data via metadata. It represents a deeper (scientific) indexing, in the framework of which, for example, the context of origin, variables, instruments, methods etc. are described in detail and thus the provenance of the data becomes apparent. In many cases, such a description is indispensable in order to understand, verify and, if necessary, use the data.
Introductions to the topic of metadata are offered, for example, by the JISC Guide or the interactive Mantra course of the University of Edinburgh.