Data Intro for Librarians

Introduction to Data - Handout and Quiz

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • what does Fr[ea]nc[eh] match?

Objectives
  • understand terms, phrases, and concepts in software development and data science

  • identify and use best practice in data structures

  • use regular expressions in searches

Library Carpentry Week One: Introduction to Data

Schedule

Regular Expressions

Check your regex with:

Test yourself with:

Exercise

What does Fr[ea]nc[eh] match?

What does Fr[ea]nc[eh]$ match?
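One way to check your answers to the two questions above is to run the patterns against a few test strings. The following is a minimal sketch, assuming Python and its built-in re module; the sample strings and the helper name matching are made up for illustration and are not part of the handout.

```python
import re

# Hypothetical sample strings, chosen for illustration only.
samples = ["France", "French", "Frence", "Franch", "French fries", "francs"]

# The two patterns from the exercise.
pattern = re.compile(r"Fr[ea]nc[eh]")
anchored = re.compile(r"Fr[ea]nc[eh]$")  # $ ties the match to the end of the string


def matching(regex, strings):
    """Return the strings in which the regex is found anywhere."""
    return [s for s in strings if regex.search(s)]


unanchored_hits = matching(pattern, samples)
anchored_hits = matching(anchored, samples)

print(unanchored_hits)
print(anchored_hits)
```

Comparing the two printed lists shows which strings drop out once the $ anchor is added.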

What would match only the strings French and France, and only when they appear at the beginning of a line?

How do you match the whole words colour and color (case insensitive)?
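A possible approach to the colour and color question, sketched in Python under the assumption that the re module is available. The pattern \bcolou?r\b is one candidate answer rather than the only one, and the sample strings are hypothetical.

```python
import re

# \b marks a word boundary; u? makes the letter u optional.
# re.IGNORECASE makes the match case insensitive.
colour = re.compile(r"\bcolou?r\b", re.IGNORECASE)

# Hypothetical sample strings, chosen for illustration only.
samples = ["Colour", "the COLOR red", "colours", "discolor"]

colour_hits = [s for s in samples if colour.search(s)]
print(colour_hits)
```

The last two samples are not matched because the word boundaries stop the pattern from matching inside a longer word.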

How would you find the whole word headrest or the 2-gram head rest, but not head  rest (that is, with two spaces between head and rest)?
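For the headrest question, one candidate pattern allows at most one space between head and rest. This is a sketch in Python using the re module; the pattern and the sample strings are assumptions for illustration, and other answers are possible.

```python
import re

# " ?" allows zero or one space; \b keeps the match to whole words.
headrest = re.compile(r"\bhead ?rest\b")

# Hypothetical sample strings; the third contains two spaces.
samples = ["headrest", "head rest", "head  rest", "headrests"]

headrest_hits = [s for s in samples if headrest.search(s)]
print(headrest_hits)
```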

How would you find a 4 letter word that ends a string and is preceded by at least one zero?

How do you match any 4 digit string anywhere?

How would you match the date format dd-MM-yyyy?

How would you match the date format dd-MM-yyyy or dd-MM-yy at the end of a string only?
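The two date questions above can be explored with a short Python sketch, assuming the re module. The patterns are candidate answers only, and they check the shape of the string (digits and hyphens), not whether the date is valid. The sample strings are hypothetical.

```python
import re

# dd-MM-yyyy anywhere in the string, as a whole token.
full_year = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

# dd-MM-yyyy or dd-MM-yy, but only at the end of the string.
at_end = re.compile(r"\b\d{2}-\d{2}-(?:\d{4}|\d{2})$")

# Hypothetical sample strings, chosen for illustration only.
samples = [
    "Received 03-11-2015",
    "03-11-15",
    "03-11-2015 was a Tuesday",
    "2015-11-03",
]

full_year_hits = [s for s in samples if full_year.search(s)]
at_end_hits = [s for s in samples if at_end.search(s)]

print(full_year_hits)
print(at_end_hits)
```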

How would you match publication formats such as British Library : London, 2015 and Manchester University Press: Manchester, 1999?
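A sketch of one way to approach the publication format question, assuming Python and the re module. The group names publisher, place, and year are invented for this example, and the pattern is only one of several that would work on the two strings given.

```python
import re

# publisher, optional space, colon, space, place, comma, space, four-digit year.
publication = re.compile(
    r"(?P<publisher>[\w ]+?)\s?:\s(?P<place>[\w ]+),\s(?P<year>\d{4})"
)

# The two strings from the exercise.
samples = [
    "British Library : London, 2015",
    "Manchester University Press: Manchester, 1999",
]

parsed = []
for s in samples:
    m = publication.fullmatch(s)
    if m:
        parsed.append(m.groupdict())

for record in parsed:
    print(record)
```

The optional \s? before the colon is what lets the same pattern cover both the spaced and unspaced forms.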

Next Week

Installation (to be completed before the session)

Windows users, see the section entitled ‘Installing Git Bash’ in the Programming Historian lesson Introduction to the Bash Command Line. OS X and Linux users, simply make sure you know how to find your ‘Terminal’.

Multiple Choice Quiz

This multiple choice quiz is designed to embed the regex knowledge you learned during this module. We recommend you work through it some time after class (within a week or so). Answers are on the answer sheet.

Q1. What is the special character that matches zero or more characters?

Q2. Which of the following matches any space, tab, or newline?

Q3. How do you match the string Foobar appearing at the beginning of a line?

Q4. How do you match the word Foobar appearing at the beginning of a line?

Q5. What does the regular expression [a-z] match?

Q6. Which of these will match the strings revolution, revolutionary, and revolutionaries?

Q7. Which of these will match the strings revolution, Revolution, and their plural variants only?

Q8. What regular expression matches the strings dog or cat?

Q9. What regular expression matches the whole words dog or cat?

Q10. What do we put after a character to match strings where that character appears 2 to 4 times in sequence?

Q11. The regular expression \d{4} will match what?

Q12. If brackets are used to define a group, what would match the regular expression (,\s[0-9]{1,4}){4},\s[0-9]{1,3}\.[0-9]?

References

James Baker, “Preserving Your Research Data,” Programming Historian (30 April 2014), http://programminghistorian.org/lessons/preserving-your-research-data.html. The sub-sections ‘Plain text formats are your friend’ and ‘Naming files sensible things is good for you and for your computers’ are reworked from this lesson.

Owen Stephens, “Working with Data using OpenRefine,” Overdue Ideas (19 November 2014), http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/. The section on ‘Regular Expressions’ is reworked from this lesson developed by Owen Stephens on behalf of the British Library.

Andromeda Yelton, “Coding for Librarians: Learning by Example”, Library Technology Reports 51:3 (April 2015), doi: 10.5860/ltr.51n3

Fiona Tweedie, “Why Code?”, The Research Bazaar (October 2014), http://melbourne.resbaz.edu.au/post/95320810834/why-code

Key Points