Open research / April 2, 2020

Credit for the institutional data repository

As well as providing data repository functionality for organisations ranging from the NIH to Springer Nature, we also have a free product offering, which we refer to internally as ‘figshare.com’. This allows researchers from around the world to make data available at no cost, get a DOI for their published research, and track its impact. What it does not come with is curation.

In an ideal world, researchers would make their data available as follows:

  • In a curated subject-specific repository, with community-defined metadata best practice and curation by subject experts
  • In a university, funder, or publisher repository with metadata improvement (e.g. https://nih.figshare.com)
  • In a generalist repository like Figshare or Zenodo

Giving researchers choice is a fantastic way to ensure that all research outputs are as open as possible and as closed as necessary. However, in the academic world, credit for research impact is a key driver at both the individual and institutional levels. As with traditional academic publishing, institutions end up with their content distributed across many platforms and struggle to (a) get a copy of their own research or (b) measure the impact of all of their research outputs.

Inspired by the wonderful work done by the team at Unpaywall, we want to make it easy for every institution to have a copy of all of its research outputs in its own repository.

“Institutional Repository managers can use Unpaywall data to find OA resources that faculty have posted online, without depositing in their IR. These can be automatically ingested, significantly increasing IR coverage without needing to convince faculty to deposit. Repositories of all sizes have used Unpaywall data in this way.”

Fortunately, in the research data world, much data is made available without restrictions. The data repository Dryad, for example, releases all published data under a CC0 license. This allows anyone using the data to aggregate content and map it to institutions, providing an auto-populating repository.

Of course, it will be difficult to aggregate all outputs in this manner, as there are still some grey areas around institutions hosting CC BY-NC copies of their own data (the “NC” stands for non-commercial).

A recent session at the Research Data Alliance (RDA) on this topic identified three key areas where attention should be focused: authority, identity and ethics. The session has helped guide our approach to aggregating content from other generalist open data repositories and sources of open research outputs. As such, our efforts will focus on these principles:

  • Aggregation of only CC0 or CC-BY content
  • No new persistent identifiers; in most cases, this means keeping the original DOI
  • Mirroring of metadata so that no context is lost
  • Updating to new versions via open APIs, where possible
  • Clear labelling of the original publication point
  • Implementing best practice in metrics through the ‘Make Data Count’ project
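To make the first few principles more concrete, here is a minimal Python sketch of how an aggregator might filter harvested records by license and mirror them without minting new identifiers. It is an illustration only, not figshare's implementation: the `SourceRecord` and `MirroredRecord` structures, their field names, the `aggregate` function and the use of SPDX-style license strings (`CC0-1.0`, `CC-BY-4.0`) are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

# Assumption: licenses are expressed as SPDX-style identifiers.
# Only CC0 and CC BY are eligible; CC BY-NC and others are skipped.
ALLOWED_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class SourceRecord:
    """Hypothetical record harvested from another open repository."""
    doi: str
    license: str
    source_url: str
    metadata: dict = field(default_factory=dict)


@dataclass
class MirroredRecord:
    """Hypothetical local mirror entry that reuses the original DOI."""
    doi: str
    metadata: dict
    original_publication: str


def aggregate(records):
    """Keep only CC0/CC BY records and mirror them without new identifiers."""
    mirrored = []
    for rec in records:
        if rec.license not in ALLOWED_LICENSES:
            continue  # e.g. CC BY-NC content sits in the grey area, so skip it
        mirrored.append(
            MirroredRecord(
                doi=rec.doi,  # no new persistent identifier: keep the source DOI
                metadata=dict(rec.metadata),  # copy the metadata so no context is lost
                original_publication=rec.source_url,  # label the original publication point
            )
        )
    return mirrored


records = [
    SourceRecord("10.1234/a", "CC0-1.0", "https://repo.example.org/a", {"title": "A"}),
    SourceRecord("10.1234/b", "CC-BY-NC-4.0", "https://repo.example.org/b"),
    SourceRecord("10.1234/c", "CC-BY-4.0", "https://repo.example.org/c", {"title": "C"}),
]
mirrored = aggregate(records)
```

In this sketch the non-commercial record is dropped, while the other two keep their original DOIs and carry a pointer back to where they were first published. Picking up new versions via open APIs and reporting metrics would sit on top of a filter like this and are not shown.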

As the open research world is developing rapidly, we will continue to update our thinking and respond to the needs of our clients, research data librarians and other experts in the field.
