Working with Persistent Identifiers - Curl

This lesson takes you through the steps to create and administer PIDs using the RESTful HTTP API.

Warming-up: Using PIDs

Below you will find three different PIDs and their corresponding global resolvers.

You can either open the resolver in your web browser and type in the PID to get to the data behind it, or concatenate the resolver URL and the PID into a single address.
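
Concatenating resolver and PID is plain string joining. The following is a minimal sketch, assuming the global Handle resolver at https://hdl.handle.net and the DOI resolver at https://doi.org. The function name resolver_url and both PIDs are hypothetical placeholders, not identifiers from this lesson.

```shell
#!/bin/sh
# Join a resolver base URL and a PID into one resolvable URL.
# A trailing slash on the resolver is dropped so we never emit "//".
resolver_url() {
    resolver=${1%/}
    pid=$2
    printf '%s/%s\n' "$resolver" "$pid"
}

# Hypothetical PIDs, used only to show the shape of the result.
resolver_url "https://hdl.handle.net" "11111/example-suffix"
resolver_url "https://doi.org/" "10.12345/example"
```

Pasting the printed address into a browser has the same effect as typing the PID into the resolver's form.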

Try to resolve the handle PID with the DOI resolver and vice versa.

In the handle resolver you will find a box labelled “Don’t redirect to URLs”. If you tick this box, what information do you get?

Each PID consists of a prefix, which is linked to an administrative domain (e.g. a journal), and a suffix. The prefix is handed out by an issuer such as CNRI for handles or DataCite for DOIs. Once you are the administrator of a prefix, you can register as many data objects as you want by extending the prefix with a suffix. Note that the suffix needs to be unique for each data object. The epic client helps you with that.
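
To make the prefix/suffix structure concrete, here is a small sketch that splits a PID at the slash using shell parameter expansion. The PID value is a hypothetical example under the training prefix 846.

```shell
#!/bin/sh
# Split a PID at the first slash: the part before it is the prefix,
# the part after it is the suffix.
pid="846/example-suffix"   # hypothetical PID, for illustration only

prefix=${pid%%/*}   # remove the longest match of "/*" from the end
suffix=${pid#*/}    # remove the shortest match of "*/" from the start

echo "prefix: $prefix"
echo "suffix: $suffix"
```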

Managing PIDs

Prerequisites

The examples are based on cURL, an open source command line tool and library for transferring data with URL syntax. It can be used on the command line or in scripts.

Install cURL dependencies

On the training machines

apt-get install curl
apt-get install uuid-runtime

Own laptop

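If you are working on your own laptop, install curl and uuidgen with your system's package manager (they are command line tools, not Python packages). macOS ships with both tools. On Debian or Ubuntu, for example:

sudo apt-get install curl uuid-runtime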

Final check

curl --help
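
Beyond running curl --help, you can check that both tools are on the PATH. This is a small sketch using the standard command -v lookup; the helper name check_tool is hypothetical.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The two tools the examples in this lesson rely on.
for tool in curl uuidgen; do
    check_tool "$tool"
done
```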

If you write the code described below in a file, don’t forget to change its permissions: each script file has to be executable.

Suppose you have a file filename.sh. You can make it executable with:

chmod +x filename.sh

It will then run when you type ./filename.sh
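
The effect of chmod +x can be tried out safely in a temporary directory. This is only a sketch; the script content is a placeholder, and it assumes the temporary directory allows executing files.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)

# Create a tiny script; the quoted 'EOF' keeps the content literal.
cat > "$workdir/filename.sh" <<'EOF'
#!/bin/sh
echo "hello from filename.sh"
EOF

chmod +x "$workdir/filename.sh"       # add the execute permission
output=$("$workdir/filename.sh")      # the file can now be run by its path
echo "$output"

rm -rf "$workdir"
```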

Example workflow

  1. Obtain a prefix from a resolver administrator
  2. Set up a connection to the PID server with a client
  3. Create a PID
  4. Link PID and location of the data object

In the tutorial below we will work with a test handle server located at SURFsara. That means the PIDs we create are not resolvable via the global handle resolver or via the DOI resolver.

For resolving PIDs please use:

http://epic3.storage.surfsara.nl:8001

Main Parameters of CURL

The main command

curl [options] [URL...]

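Before we start, let's explain the main cURL options used in the examples:

  • -v: verbose output, showing the request and response headers
  • -u USER:PASSWORD: the credentials for server authentication
  • -H HEADER: an extra header to send with the request
  • -X METHOD: the request method to use (for example GET, PUT or POST)
  • --data DATA: the data to send in the body of the request

These are the main options we are going to use in our examples. For more options, please check the cURL documentation (https://curl.haxx.se/).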

Connect to the SURFsara handle server

To connect to the epic server you need to provide a prefix and a password. If you use the example files, this information is stored in a config.txt file and should look like this:

USERNAME=846
PASSWORD=xxxx
FILENAME=surveys.csv #the file (and its location) we are going to use in the examples
PID_SERVER=https://epic3.storage.surfsara.nl/v2_test/handles/846 #be careful not to add a trailing slash
PID_SUFFIX=XXXX #the suffix of the first created handle
PID2_SUFFIX=YYYY #the suffix of the second handle

You can find the username and password on the user interface machine in credentials/cred_epic/cred_file.json.

Every request in the examples sends the following headers to the handle server URL:

-H "Accept: application/json" -H "Content-Type: application/json"
https://epic3.storage.surfsara.nl/v2_test/handles/846/
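
Because config.txt consists of KEY=value lines with shell-style comments, a script can read it by sourcing it. The sketch below assumes that layout and uses a throwaway file with placeholder values rather than real credentials. Only the two values needed to build a handle URL are included.

```shell
#!/bin/sh
# Write a throwaway config file with placeholder values.
config=$(mktemp)
cat > "$config" <<'EOF'
PID_SERVER=https://epic3.storage.surfsara.nl/v2_test/handles/846 #no trailing slash
PID_SUFFIX=XXXX #placeholder suffix
EOF

# Sourcing the file turns each KEY=value line into a shell variable;
# text after " #" is treated as a comment.
. "$config"
rm -f "$config"

# Strip a trailing slash defensively, then append the suffix.
handle_url="${PID_SERVER%/}/$PID_SUFFIX"
echo "$handle_url"
```

Stripping the trailing slash in the script guards against the mistake the comment in config.txt warns about.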

Registering a file

We will register a public file from figshare.

First, prepare the data to register in JSON format.

'[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}]'

We are going to create a new PID with a PUT request, so the request method is -X PUT, followed by the actual JSON data:

-X PUT --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}]'
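
When the URL changes between runs, the JSON can be generated instead of typed by hand. This is a minimal sketch; the helper name make_url_record is hypothetical, and the value is inserted without JSON escaping, which is fine for ordinary URLs but not for arbitrary text.

```shell
#!/bin/sh
# Build the JSON array holding a single URL record.
# Note: the value is inserted as-is; a value containing '"' or '\'
# would need escaping first.
make_url_record() {
    printf '[{"type":"URL","parsed_data":"%s"}]\n' "$1"
}

payload=$(make_url_record "https://ndownloader.figshare.com/files/2292172")
echo "$payload"
```

The resulting string can be passed to curl as --data "$payload".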

Building the PID:

We now have an opaque string which is unique to our resolver (846/$SUFFIX ) since the prefix is unique (handed out by administrators of the resolver).

The URL in the CURL request:

https://epic3.storage.surfsara.nl/v2_test/handles/846/$SUFFIX

The complete script:

#!/bin/bash

# Generate a unique suffix for the new PID
SUFFIX=$(uuidgen)

# Create the handle 846/$SUFFIX and point it to the figshare file
curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X PUT --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}]' \
		"https://epic3.storage.surfsara.nl/v2_test/handles/846/$SUFFIX"

The result of this request is a new handle named 846/UUID1, where UUID1 stands for the suffix generated by uuidgen.
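
In a script it is often more convenient to look at the HTTP status code than at the verbose output. The sketch below relies only on general HTTP status classes, not on any server-specific behaviour. The function names classify_status and register_pid are hypothetical, and register_pid is defined but not called here because it needs valid credentials.

```shell
#!/bin/sh
# Map an HTTP status code to a short, human-readable outcome.
classify_status() {
    case "$1" in
        2??) echo "success" ;;
        401) echo "authentication failed" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "unexpected status" ;;
    esac
}

# Send the PUT request quietly and print only the status code.
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints the numeric HTTP status.
# Arguments: 1 = "user:password", 2 = JSON data, 3 = handle URL
register_pid() {
    curl -s -o /dev/null -w '%{http_code}' \
        -u "$1" \
        -H "Accept:application/json" -H "Content-Type:application/json" \
        -X PUT --data "$2" "$3"
}

classify_status 201
classify_status 401
```

A call could then look like status=$(register_pid "YOURUSERNAME:YOURPASSWORD" "$payload" "$url") followed by classify_status "$status".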

Responses

Let’s go to the resolver at http://epic3.storage.surfsara.nl:8001 and see what is stored there. The resolver gives us some information on the data, and we can retrieve the data object itself via the web browser.

Download the file via the resolver. Try to use wget when working remotely on our training machine.

How is the data stored when downloading via the browser and how via wget?

Have a look at the metadata stored in the PID entry.

What happens if you try to reregister the file with the same PID?

Don't forget to replace UUID1 with the correct suffix value.

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X PUT --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}]'\
		 https://epic3.storage.surfsara.nl/v2_test/handles/846/UUID1

(Use example2.sh)

Store some handy information with your file

Let's say that we want to add a new entry of type TYPE with the data ‘Data Carpentry pandas example file’. We have to update the JSON data:

[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"},
 {"type":"TYPE","parsed_data":"Data Carpentry pandas example file"}]

And the actual request is:

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X PUT --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}, {"type":"TYPE","parsed_data":"Data Carpentry pandas example file"}]'\
		 https://epic3.storage.surfsara.nl/v2_test/handles/846/UUID1

Use example3.sh

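Next, we also store the MD5 checksum of the local file with the PID:

# Compute the checksum (on macOS use: md5value=$(md5 -q surveys.csv))
md5value=$(md5sum surveys.csv | awk '{ print $1 }')
# The single-quoted JSON is closed around "$md5value" so that the variable is expanded
curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X POST --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"},
 					    {"type":"TYPE","parsed_data":"Data Carpentry pandas example file"},
 					    {"type":"MD5","parsed_data":"'"$md5value"'"}]' \
		 https://epic3.storage.surfsara.nl/v2_test/handles/846/UUID1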

Use example4.sh

Reverse look-ups

The epic API extends the handle API with reverse look-ups. Assume you only know some of the metadata stored with a PID but not the full PID. How can you get to the URL field to retrieve the data?

We can look up the PID entries that carry a certain checksum:

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X GET \
		"https://epic3.storage.surfsara.nl/v2_test/handles/846/?MD5=MD5VALUE"

Use example5.sh
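
The look-up URL is the handle server URL followed by a query string. This small sketch assembles it; the helper name lookup_url is hypothetical, and values containing special characters would additionally need URL encoding.

```shell
#!/bin/sh
# Build the reverse look-up URL for a given field name and value.
# Arguments: 1 = handle server URL including the prefix, 2 = field, 3 = value
lookup_url() {
    base=${1%/}   # drop a trailing slash so the result has exactly one
    printf '%s/?%s=%s\n' "$base" "$2" "$3"
}

url=$(lookup_url "https://epic3.storage.surfsara.nl/v2_test/handles/846" "MD5" "MD5VALUE")
echo "$url"
```

When passing the result to curl, keep it in double quotes ("$url") so that the shell does not treat the ? as a wildcard.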

Responses

Updating PID entries

To update an entry, send the new data to the existing handle. Here we replace the URL with the path of a local copy of the file:

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X POST --data '[{"type":"URL","parsed_data":"/<PATH>/surveys.csv"}]'\
		 https://epic3.storage.surfsara.nl/v2_test/handles/846/UUID1

Use example6.sh

Responses

Try to fetch some metadata on the file from the resolver.

Try to resolve directly to the file. What happens?

We updated the “URL” field with a local path on a personal machine. That means you can no longer download the data directly via the resolver, but you still have access to the metadata stored in the PID entry.
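
A script that consumes PID entries may want to check whether the URL field points to the web or to a local path before trying to download. This is a minimal sketch; the function name location_kind is hypothetical, and it only looks at the http:// or https:// scheme.

```shell
#!/bin/sh
# Decide whether a stored location can be fetched over HTTP(S).
location_kind() {
    case "$1" in
        http://*|https://*) echo "web" ;;
        *)                  echo "local" ;;
    esac
}

location_kind "https://ndownloader.figshare.com/files/2292172"
location_kind "/<PATH>/surveys.csv"
```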

Linking two files

We know that the file in the figshare repository and our local file are identical. We want to store this information in the PIDs.

First, create a second PID that points to the figshare file:

SUFFIX=$(uuidgen)

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X PUT --data '[{"type":"URL","parsed_data":"https://ndownloader.figshare.com/files/2292172"}]' \
		"https://epic3.storage.surfsara.nl/v2_test/handles/846/$SUFFIX"


- Record that the local file is the same as the figshare file by updating the data of the first handle.

First update the JSON:

[{"type":"URL","parsed_data":"/<PATH>/surveys.csv"},{"type":"SAME_AS","parsed_data":"846/newhandle"}]

Then send the request:

curl -v -u "YOURUSERNAME:YOURPASSWORD" -H "Accept:application/json" \
		-H "Content-Type:application/json" \
		-X POST --data '[{"type":"URL","parsed_data":"/<PATH>/surveys.csv"},{"type":"SAME_AS","parsed_data":"846/newhandle"}]' \
		 https://epic3.storage.surfsara.nl/v2_test/handles/846/UUID1
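
The linking data can also be generated from two variables, which avoids JSON typos such as a stray comma. This is a sketch only; the helper name make_same_as_record is hypothetical, and the values are inserted without JSON escaping.

```shell
#!/bin/sh
# Build the JSON for a handle that stores a location and a SAME_AS link.
# Arguments: 1 = location of the file, 2 = PID of the identical object
make_same_as_record() {
    printf '[{"type":"URL","parsed_data":"%s"},{"type":"SAME_AS","parsed_data":"%s"}]\n' "$1" "$2"
}

payload=$(make_same_as_record "/<PATH>/surveys.csv" "846/newhandle")
echo "$payload"
```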

These examples are adjusted to the functionality in the EUDAT B2SAFE service, but can serve as a reference implementation for other use cases.
