Thing 1: Ready Set Data
Overview
Teaching: 10 min
Exercises: 20 min
Questions
What is research data?
Where can you find research data in your discipline?
Objectives
Learn what research data is.
Explore and examine discipline-based research data.
Explore the technical aspects of data management and assess your own skills.
Data is the central currency of science, but the nature of scientific data has changed dramatically with the rapid pace of technology.
– Ted Hart (@emhart), et al., Ten Simple Rules for Digital Data Storage. PLoS Comput Biol 12(10): e1005097. doi:10.1371/journal.pcbi.1005097
Access to the computational steps taken to process data and generate findings is as important as access to data themselves.
– Victoria Stodden, et al., Enhancing reproducibility for computational methods. Science, 09 Dec 2016: 1240-1241. doi:10.1126/science.aah6168
Getting started: What is research data?
The concept of research data is complex and fluid. Virtually all types of digital information have the potential to be research data if they are being used as a primary resource for research.
– Research Data Strategy Working Group, Mapping the Data Landscape: Report of the 2011 Canadian Research Data Summit
What “research data” are we talking about?
- Read Research Data - Definitions from University of Leicester (5 min)
- Watch this video on research data produced by University of California librarians (5 min)
Research data can be
- Observational: Captured in real-time, typically outside the lab
  - Examples: sensor readings, survey results, images, audio, video
- Experimental: Typically generated in the lab or under controlled conditions
  - Examples: test results, clinical trials, field experiments in international development
- Simulation: Machine generated from test models
  - Examples: climate models, economic models
- Derived/Compiled: Generated from existing datasets
  - Examples: text as data (data mining), compiled database, 3D models
Research data come in many formats
- Text: field or laboratory notes, survey responses
- Numeric: tables, counts, measurements
- Audiovisual: images, sound recordings, video
- Models, computer code, geospatial data
- Discipline-specific: FITS in astronomy, CIF in chemistry, FASTA files in bioinformatics
- Instrument-specific: equipment outputs
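As a rough illustration of the format groups above, the following Python sketch maps a file extension to one of the listed groups. The extension-to-group mapping and the function name `format_group` are assumptions made for this example only; real collections will need their own mapping, and many extensions are ambiguous.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the format groups listed above.
# This is illustrative, not exhaustive or authoritative.
FORMAT_GROUPS = {
    ".txt": "Text",
    ".csv": "Numeric",
    ".jpg": "Audiovisual",
    ".wav": "Audiovisual",
    ".mp4": "Audiovisual",
    ".py": "Models, computer code, geospatial data",
    ".shp": "Models, computer code, geospatial data",
    ".fits": "Discipline-specific",
    ".cif": "Discipline-specific",
    ".fasta": "Discipline-specific",
}


def format_group(filename: str) -> str:
    """Return the format group for a file name, or 'Unknown' if unmapped."""
    suffix = Path(filename).suffix.lower()  # compare extensions case-insensitively
    return FORMAT_GROUPS.get(suffix, "Unknown")


for name in ["survey_responses.txt", "image.FITS", "reads.fasta", "output.xyz"]:
    print(f"{name}: {format_group(name)}")
```

Files that come back as "Unknown" are good candidates for a web search, since instrument-specific outputs often use extensions that are not widely documented.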
Let’s take a look at research data
Note: We’ll be using the UCSD Library Digital Collections for this exercise. Feel free to swap in other institutional or discipline-based repositories and collections.
- Open up one of these collections:
  - Data from: A Living Vector Field Reveals Constraints on Galactose Network Induction in Yeast - The research data collected are microscopy images of field cells, a database file, and analysis code used to investigate how cell-level gene expression dynamics produce population-level phenotypes.
  - Santa Fe Light Cone Simulation research project files - The project consists of data in three broad categories: the simulation data (“Data at Redshift” components); analysis tools and example scripts (Data Processing Tools) for processing the data; and project administration and background documents (Historical Documents) related to the project.
- Write down the specific data formats, the software used, and whether the collection has a code book or data dictionary. If you aren’t familiar with the data formats or software, try a web search to figure out what they are. Capture your findings.
- Explore the metadata representing the collection. Besides the title and description, what other elements are described?
- Share with the class what you’ve found.
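If you download a collection's files, one way to capture the data formats is to count files by extension. The sketch below is a minimal example under that assumption; the function name `inventory_formats` and the folder name in the usage comment are hypothetical, and an extension is only a hint about the actual format.

```python
from collections import Counter
from pathlib import Path


def inventory_formats(root):
    """Count the files under `root` (recursively) by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder label.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


# Example usage, assuming the collection was saved to a folder named
# "downloaded_collection" (a hypothetical path):
# for ext, n in inventory_formats("downloaded_collection").most_common():
#     print(f"{ext}\t{n}")
```

The resulting counts give a quick starting list of formats to look up and to compare against any code book or data dictionary the collection provides.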
Heavy Metals
One interesting data collection in the UCSD Digital Collections is the Heavy Metals in the Ocean Insect, Halobates collection. It is a unique example of ~40-year-old computer-printed data that was digitized for sharing and preservation.
Effect of complexity and formats on re-use
How do the complexity and range of data formats affect access and re-use possibilities?
Learn More: Data across research disciplines
- Choose one of the 4 specialized data repositories below, or find another data repository of interest, particularly one in a discipline you are unfamiliar with. Spend some time browsing your chosen repository to get a feel for the data available.
- Think about how the data here differ from data you are familiar with. Consider, for example, format, size, and access method.
- Record your reflections in the class etherpad or through discussion.
Discipline-based data conventions
How could cross-disciplinary research be affected by discipline-specific data conventions? Also, think of one way cross-disciplinary data access could be facilitated.
Challenge me: Let’s talk tech!
Get the ball rolling to expand awareness of the technical aspects of data management and of this rapidly growing community of tech-aware data enthusiasts. What is…
- your favorite research data tech or software story or experience
- a software tool or service for research data you think others might be interested in
- a question or research data problem to crowdsource a solution.
Personal technical data audit
Conduct a personal audit of the technical data skills you have and the skills you want to learn.
Key Points
Research data is heterogeneous in form.
Research data can be categorized into many types, such as observational, experimental, simulation, derived or compiled, and reference.
Research data are often contextualized within communities.
Technical data skills are increasingly needed to create, work with, understand and make sense of research data.