Thing 12: Vocabularies for data description

Overview

Teaching: min
Exercises: min
Questions
  • What are controlled vocabularies and how are they applied to research data?

Objectives
  • Learn about controlled vocabularies for data description

Data descriptor, keyword, subject … these are all terms commonly used when discussing metadata. Learn about the use of controlled vocabularies to enhance data discovery.

Getting started: Controlled vocabularies for data description

In addition to selecting a metadata standard or schema, you should use a controlled vocabulary whenever possible. A controlled vocabulary is an agreed list of terms that provides a consistent way to describe aspects of data such as location, time, place name and subject.

Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data, such as plants, animals, medical conditions or places.
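To see why a shared vocabulary helps discovery, here is a minimal sketch in Python. It assumes a tiny, hypothetical vocabulary in which one preferred term ("Gymnorhina tibicen") is linked to the everyday words people might type; the dataset titles and keywords are invented for illustration. A plain text search for "gymnorhina tibicen" would find neither bird dataset, because neither uses that exact string, but mapping every keyword to its controlled term first finds both.

```python
# Hypothetical controlled vocabulary: each preferred term maps to the
# free-text variants people tend to use. Real vocabularies are far larger.
VOCABULARY = {
    "Gymnorhina tibicen": {"magpie", "australian magpie", "gymnorhina tibicen"},
}


def to_controlled_term(keyword):
    """Return the preferred term for a free-text keyword, or None if unknown."""
    needle = keyword.strip().lower()
    for preferred, variants in VOCABULARY.items():
        if needle in variants:
            return preferred
    return None


def search(datasets, query):
    """Return titles of datasets whose keywords map to the query's controlled term."""
    target = to_controlled_term(query)
    if target is None:
        return []
    return [
        title
        for title, keywords in datasets.items()
        if target in {to_controlled_term(k) for k in keywords}
    ]


# Hypothetical datasets described by different people in different words.
datasets = {
    "Backyard bird survey": ["Magpie", "suburbs"],
    "Canberra bird counts": ["Australian magpie"],
    "Regional weather log": ["rainfall"],
}

print(search(datasets, "gymnorhina tibicen"))
```

Both bird datasets are returned and the weather log is not, even though the three records never share a keyword string.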

Browse Controlled Vocabularies

  1. Start by browsing Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to section 5 (Conclusion), which is worth a read.
  2. We are going to see some controlled vocabularies in action in the Atlas of Living Australia.
    • Go to Browse Datasets and type magpie in the search box. Choose your favorite magpie species and click the (red text) View record link.
    • Any metadata field where you see Supplied… indicates that the information supplied by the person who submitted the record (often a ‘citizen scientist’) has been converted to the controlled vocabulary used for that field, for example in Observer, Record date and Common name.
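The "Supplied" pattern above can be sketched as a record that keeps the submitter's original values alongside controlled versions. This is a minimal Python illustration under stated assumptions: the field names, the `COMMON_NAMES` lookup table and the accepted date formats are all hypothetical and are not the Atlas of Living Australia's actual processing rules. Dates are normalised to ISO 8601 (YYYY-MM-DD) as one example of a controlled format.

```python
from datetime import datetime

# Hypothetical lookup table standing in for a common-name vocabulary;
# these are NOT the Atlas of Living Australia's real mappings.
COMMON_NAMES = {"maggie": "Australian Magpie", "magpie": "Australian Magpie"}

# Hypothetical set of everyday date formats a submitter might use.
DATE_FORMATS = ("%d/%m/%Y", "%d %B %Y", "%Y-%m-%d")


def normalise_record(supplied):
    """Keep the submitter's original values and add controlled-vocabulary versions."""
    record = {"supplied": dict(supplied)}  # original values are never overwritten

    name = supplied.get("common_name", "").strip().lower()
    record["common_name"] = COMMON_NAMES.get(name)  # None when there is no match

    raw_date = supplied.get("record_date", "").strip()
    record["record_date"] = None
    for fmt in DATE_FORMATS:
        try:
            # Store the date in ISO 8601 form (YYYY-MM-DD).
            record["record_date"] = datetime.strptime(raw_date, fmt).date().isoformat()
            break
        except ValueError:
            continue  # try the next accepted format
    return record


print(normalise_record({"common_name": "Maggie", "record_date": "3/9/2015"}))
```

Keeping the supplied values next to the converted ones means the original observation is never lost, and anyone can check how a term was mapped.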

How do you think we could encourage people to use controlled vocabularies in their data descriptions?

Learn more: What controlled vocabularies exist?

Think about (or find out!) what standard vocabularies are used, or could be used, by research groups in a discipline which interests you. Note that there may be more than one vocabulary used in a discipline.

Explore controlled vocabularies

  1. Choose a vocabulary and determine whether it would be useful to people working in a specific field. Try JISC’s Directory of Metadata Vocabularies or the University of North Carolina’s Metadata Tutorial to browse for discipline-specific and general vocabularies.
  2. Research Vocabularies Australia (RVA) is a service that helps you find, access, and reuse vocabularies for research. Go to the RVA Portal, use “Browse all Vocabularies” and see if your chosen vocabulary is included.
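When judging whether a vocabulary would be useful, it helps to know what a vocabulary entry typically contains. Many published vocabularies follow the W3C SKOS model, where each concept has a URI, one preferred label, optional alternative labels and links to broader concepts. The Python sketch below is a simplified, SKOS-inspired structure, not a full SKOS implementation; the `example.org` URIs and the two concepts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A simplified, SKOS-inspired concept (not a full SKOS implementation)."""
    uri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # URI of the broader concept, if any


# Hypothetical concepts under example.org; a real vocabulary publishes its own URIs.
CONCEPTS = {
    c.uri: c
    for c in [
        Concept("http://example.org/voc/birds", "Birds"),
        Concept(
            "http://example.org/voc/magpie",
            "Australian Magpie",
            ["Magpie", "Gymnorhina tibicen"],
            "http://example.org/voc/birds",
        ),
    ]
}


def lookup(label):
    """Find a concept whose preferred or alternative label matches, ignoring case."""
    needle = label.casefold()
    for concept in CONCEPTS.values():
        labels = [concept.pref_label, *concept.alt_labels]
        if needle in (lab.casefold() for lab in labels):
            return concept
    return None


def broader_chain(concept):
    """List preferred labels from a concept up through its broader concepts."""
    chain = []
    while concept is not None:
        chain.append(concept.pref_label)
        concept = CONCEPTS.get(concept.broader) if concept.broader else None
    return chain


print(broader_chain(lookup("gymnorhina tibicen")))
```

Alternative labels let searchers find a concept by whatever name they know, while the broader links let a search for "Birds" be widened or narrowed consistently.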

Consider: why should (or shouldn’t) your vocabulary be included in RVA?

Supporting multiple vocabularies

The Science Keyword Aggregator (KWA) was developed by CSIRO and released as open source so that others can adapt and reuse it. It is a service that lets users search for defined keywords across a range of managed vocabularies.

A widget that uses the service is also available for system owners to embed in their own applications, giving their users a term search. Start by viewing the metadata record describing the KWA, noting its rich description, e.g. links to the source code and related materials. Then go to the KWA web page to read more about the KWA and try out the widget. Finally, take a look at the service documentation for the KWA web service.
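The idea of searching one query across several managed vocabularies can be sketched in a few lines of Python. This is only an illustration of the concept, not the KWA's actual API or behaviour: the three vocabularies and their terms are hypothetical, and a real aggregator would query remote vocabulary services rather than in-memory lists.

```python
# Hypothetical vocabularies held in memory; a real aggregator would query
# remote vocabulary services instead.
VOCABULARIES = {
    "Hypothetical soil vocabulary": ["Soil moisture", "Soil carbon", "Soil texture"],
    "Hypothetical water vocabulary": ["Groundwater", "Soil moisture", "Streamflow"],
    "Hypothetical climate vocabulary": ["Rainfall", "Air temperature"],
}


def search_all(query):
    """Return (vocabulary, term) pairs whose term contains the query, ignoring case."""
    needle = query.casefold()
    return [
        (vocab, term)
        for vocab, terms in VOCABULARIES.items()
        for term in terms
        if needle in term.casefold()
    ]


for vocab, term in search_all("soil"):
    print(f"{term}  [{vocab}]")
```

Each hit keeps the name of the vocabulary it came from, because the same label (here "Soil moisture") can appear in more than one vocabulary, and users need to know which source's definition they are choosing.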

Consider: Is this a tool that could be implemented in your organization? How would you use it?

Key Points

  • First key point.