Exercises - Provenance

1) ORCID

If you don’t already have an ORCID, go to the ORCID website and register now. If you do have one, log in and make sure that your details and publication record are up to date.

2) A FAIR test

An online questionnaire for measuring the extent to which datasets are FAIR has been created by the Australian Research Data Commons. Fill in the questionnaire for a dataset you have published or that you use often.

3) Evaluate a project’s data provenance

This exercise is modified from [Wickes and Stein, 2016] and explores the dataset from [Meili, 2016]. Go to the dataset’s page at http://doi.org/10.3886/E17507V2 and download the files. You will need to create an ICPSR account and agree to their data agreement before you can download.

Review the dataset’s main page to get a sense of the study, then review the spreadsheet file and the coded response file.

Who are the participants of this study?
What types of data were collected and used for analysis?
Can you find information on the demographics of the interviewees?
This dataset is clearly in support of an article. What information can you find about it, and can you find a link to it?

4) Evaluate a project’s code provenance

The GitHub repository borstlab/reversephi_paper provides the code and data for the paper @Leon2017. Browse the repository and answer the following questions:

1. Where is the software environment described? What files would you need to re-create the software environment?
2. Where are the data processing steps described? How could you re-create the results included in the manuscript?
3. How are the scripts and data archived? That is, where can you download the version of the code and data as it was when the manuscript was published?

To get a feel for the different approaches to code provenance, answer the same three questions for each of the following:

The figshare page that accompanies the paper [Irving et al., 2019].
The GitHub repo blab/h3n2-reassortment that accompanies the paper [Potter et al., 2019].
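
When answering the question about the software environment, it can help to scan a local copy of a repository for files that commonly describe one. The following is a minimal sketch, not a definitive check: the function name `find_environment_files` is hypothetical, and the list of file names is an assumption about common conventions rather than anything these particular repositories are known to follow.

```python
from pathlib import Path

# Assumed, non-exhaustive list of file names that often describe a
# software environment; a given project may use something else entirely.
ENVIRONMENT_FILES = (
    "environment.yml",
    "environment.yaml",
    "requirements.txt",
    "Pipfile",
    "pyproject.toml",
    "setup.py",
    "Dockerfile",
)


def find_environment_files(repo_dir):
    """Return sorted relative paths of likely environment files in repo_dir."""
    root = Path(repo_dir)
    found = [
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.name in ENVIRONMENT_FILES
    ]
    return sorted(found)
```

After cloning or downloading a repository, calling `find_environment_files("path/to/checkout")` gives a starting list of files to read. An empty result does not mean the environment is undocumented; it may be described in a README or in the paper itself.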

5) Making permanent links

The link to the UK Home Office’s accessibility guideline posters might change in the future. Use the Wayback Machine to find a link that is more likely to remain usable in the long run.
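
The Wayback Machine can be searched through its website, but the Internet Archive also offers an availability API that returns the closest archived snapshot of a page as JSON. The sketch below assumes that API's `https://archive.org/wayback/available` endpoint and its `archived_snapshots` / `closest` response layout; the function names are hypothetical, and `https://example.org/page` in the usage note is a placeholder, not the posters' address.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Wayback Machine availability API (assumed unchanged).
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking for the closest snapshot of page_url."""
    params = {"url": page_url}
    if timestamp is not None:
        # Optional target date, e.g. "20200101" for 1 January 2020.
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the API over the network and return the closest snapshot URL."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

For example, `find_snapshot("https://example.org/page")` would return an archived URL if a snapshot exists, or `None` otherwise. Because the archived URL contains the capture timestamp, it keeps pointing at the same version of the page even if the original link later moves.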

6) Create an archive of your Zipf’s analysis

A slightly less permanent alternative to having a DOI for your analysis code is to provide a link to a GitHub release. Follow the instructions on GitHub to create a release for the current state of your zipf/ project.

Once you’ve created the release, read about how to link to it. What is the URL that allows direct download of the zip archive of your release?
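
Once you have downloaded the zip archive, it is worth confirming that it contains what you expect before sharing the link. The following is a small sketch using Python's standard `zipfile` module. The function names are hypothetical, and the matching rule assumes that the archive wraps the project in a single top-level directory, so a file is treated as present if any member path ends with its name.

```python
import zipfile


def archive_members(zip_path):
    """Return the sorted names of all files stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


def missing_files(zip_path, expected):
    """Return the expected relative paths that no archive member matches."""
    members = archive_members(zip_path)
    return [
        name
        for name in expected
        # Allow for an enclosing top-level directory inside the archive.
        if not any(m == name or m.endswith("/" + name) for m in members)
    ]
```

For instance, `missing_files("release.zip", ["README.md"])` returns an empty list when the archive includes a `README.md`; both the archive name and the expected file here are placeholders for your own project's files.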

What about getting a DOI?

Creating a GitHub release is also a necessary step to get a DOI through the Zenodo/GitHub integration (Section @ref(provenance-code-scripts)). We are stopping short of getting the DOI here, since nobody reading this book needs to formally cite or archive the example Zipf’s Law software we’ve been developing. Also, if every reader of the book generated a DOI, we’d have many DOIs pointing to the same code!

7) Publishing your code

Think about a project that you’re currently working on.

How would you go about publishing the code associated with that project (i.e., the software description, analysis scripts, and data processing steps)?