By Andrew Mckenna-Foster and Adrian Clark
Note: We think you shouldn’t be forced to keep a data repository separate from your institutional repository. An institutional repository should be able to handle data properly along with papers, presentations, software, etc. This post is written with that in mind.
Switching repository platforms is no walk in the park! Institutional repositories combine technical infrastructure with institutional workflows, have accounts for both faculty/students and administrators, and require file storage, sometimes at very large scales.
If you are migrating to a vendor-supported platform, like Figshare, this process should be a close collaboration between you and the vendor. Figshare regularly helps institutions switch repository platforms, and we wanted to share some of what we’ve learned over the years.
Even if you are looking to set up an open source platform on your own, this article may still be relevant to you, so please read on!
Sections in this article:
- Articulate the purpose for your repository
- Gather feedback and create buy-in from stakeholders
- Take time to think through how to organize the new repository
- Plan your migration carefully – make it an opportunity!
- Think about the future
- Pricing: infrastructure and resources
- Putting it all together: Requirements matrix
Articulate the repository’s purpose
The following tips help you make sure a platform will meet your repository’s goals, before you go too far into the weeds.
Use cases: Articulate your main use cases for the repository. Will it primarily be a catalog of academic outputs managed by one person or will it be a ‘living’ record repository where researchers are submitting and versioning records, curator(s) are assisting and checking records, and admins are managing workflows and system integrations? Or a combination of both?
Workflow requirements: Workflows should support efficiency and appropriate control. In our experience, workflows may become entrenched because of limitations with your current system, or because that’s the way it’s always been done. Before engaging with a repository provider, take the opportunity to step back and review whether your workflows are really working for you! Start with your ideal, and speak with a range of providers about how they can best match what you are looking to achieve. This will help you determine how the repository will fit into or enhance your workflows. For example, knowing that there are multiple ways users and admins can submit records is not enough: do the repository submission options enhance your current workflows, or will they just require headache-inducing workarounds? Or, if you need a ‘request access’ feature for records, how will the repository fit into your access request fulfillment workflow? Again, it might bring efficiency or it might cause unnecessary headaches. As a rule of thumb, we feel a provider should manage at least an 80/20 split, with the platform supporting roughly 80% of your ideal workflows. Applying that threshold should give you a good idea of which providers can best meet your needs.
Key advice: Define your primary workflows so that you’ll be able to ask potential vendors to show you exactly how the platform will work with those workflows. Or, if you are building the repository yourself, work with your IT team to understand exactly how a platform will work for you.
Internal user needs and compliance: It’s important to understand how your internal users will interact with the infrastructure. Do researchers need individual accounts? Can they log in via single sign-on (SSO)? Do all users use the same SSO or do you have more than one system? Will you have significantly different needs for graduate students compared to faculty? All of these can have ramifications for managing your repository and successfully engaging your community. Additionally, will the repository be a key component in open access compliance? Repositories can help your researchers make open access papers available and publish open data to comply with funder policies.
Usability is at risk of being overlooked when considering user needs. “User needs” can often, and for good reason, become a proxy for “what do our researchers need to be compliant with?” It’s important to remember the human in the machine here, as well as the more functional side of things.
Integration needs: Finally, what systems will your repository integrate with? It is common for repositories to integrate with a discovery layer, a current research information management system (CRIS or RIM), a thesis or dissertation submission system, and possibly internal reporting dashboards. Do you need to connect with other administrative units to gather use cases or needs? Institutions that use Figshare typically use the OAI-PMH endpoints to surface records in their discovery layer (e.g. Primo). Another common integration is using the Figshare API to tie in with an existing thesis and dissertation submission process. Scholarly communication is moving towards an ever more interconnected ecosystem, so having an idea of what your repository must, should, and could integrate with can be a helpful way to frame your thinking.
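To give a feel for what an OAI-PMH integration involves, here is a minimal Python sketch that builds a `ListRecords` request and reads identifiers and titles out of a Dublin Core response. The base URL and the sample XML are placeholders we made up for illustration, not a real repository's endpoint or output, and a production harvester would also need to follow resumption tokens and handle errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Hand-written sample response (illustrative only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Placeholder endpoint; substitute the URL your platform documents.
print(list_records_url("https://repository.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

When evaluating a platform, it can be worth asking which metadata formats and sets its OAI-PMH endpoint exposes, since that determines what your discovery layer can actually harvest.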
Gather feedback and create buy-in from stakeholders
Once you’ve outlined the scope of your repository, it’s time to shore up support for the project. A repository cannot be successful if no one is submitting any records for publication. It also cannot be successful if people are submitting things but the process to publish them is too complex. Success depends on collaboration and communication. Thus, you need as many of the stakeholders as possible to provide some input or express their needs during this repository selection process. Giving people a chance to take part in the process will lead to higher engagement when the repository is launched.
Key advice: You may need to re-evaluate the scope of your repository after this. It’s an iterative process!
Here are some ways to gather input:
- Send out a survey to faculty to understand how they view the current repository (if it exists). Also ask about how they perceive open access, data sharing, and open science. Gauge the knowledge around repositories and open access to inform your selection process and future engagement efforts.
- Ask the main stakeholders of your current repository to list three ways the repository is important to them and three ways the repository (or the information and services you provide) could be better.
- Create a repository committee or working group with representatives from each stakeholder group to take part in demos and deliberations – keep in mind that including too many people may slow down the process without adding much benefit! Include people who you know will:
  - Be engaged with the process
  - Inform key tasks, like procurement, IT, and compliance
  - Help make the repository a success after launch
Take time to think through how to organize the new repository
How will end users, both internal and external, access and use your repository? And how will this change in the future? Once a repository structure is set up, it can be a major project to reorganize, so it’s good to dive deep on this and think about future needs.
Browse and search: You may want to organize records with an eye towards users browsing your repository. However, most repository users access records directly through search engines, so it may be more useful to organize things in a way that helps you manage and track records internally. Being able to extend the repository structure to accept new types of research outputs and handle the needs of diverse disciplines is also essential. When implementing a Figshare repository, our Implementation Managers talk through these options and all the ramifications to make sure your repository is useful for you.
Submission and reporting considerations: Will some records need to go through review while others do not? Make sure the repository can provide that flexibility. In Figshare, the review functionality is flexible and can be configured to your needs. It can be set up with multiple reviewers or just one. Almost every institution using Figshare has the review workflow turned on for at least one part of the repository. Will you need to report on specific sets of records? How you organize the repository may make reporting easier.
Key advice: When evaluating platforms, ask to see how your current repository organization would work and ask how other customers commonly organize things.
Plan your migration carefully – make it an opportunity!
If you have current repository records, it is essential that you can export the metadata and files from your legacy system. Our Migration Team often runs into situations where the legacy platform cannot export the metadata in a usable way. Or, even worse, the old vendor is not very responsive to your requests to export the metadata your institution owns! Figshare provides an openly documented API that makes sure you can always access your data. We know how important it is for an institution to maintain stewardship over the files and metadata.
It’s also important that you know your metadata really well, public and private. Fields will need to be mapped to the new system’s metadata schema. You might need to merge some fields or split some fields. In the Figshare case, the Implementation Manager helps coordinate this and we run multiple test migrations to make sure everything will migrate properly. For those more technically minded, Figshare offers tools that you can use to do the migration yourself.
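As a rough illustration of what field mapping looks like in practice, the Python sketch below renames, merges, and splits fields from one legacy record. Every field name here (`dc_title`, `abstract`, `notes`, `subjects`, and the target names) is hypothetical, as is the assumption that all values are strings; your own schemas will differ. The function also reports any legacy fields it did not handle, which is one way to catch metadata that would otherwise be dropped silently.

```python
# Hypothetical legacy-to-target field names, for illustration only.
RENAMES = {"dc_title": "title", "dc_date": "publication_date"}
MERGED = ("abstract", "notes")   # merged into one "description" field
SPLIT = "subjects"               # one ';'-delimited string -> list of keywords


def map_record(legacy):
    """Map one legacy record (a dict of strings) onto the target schema.

    Returns the mapped record and a sorted list of legacy fields that
    were not covered by any mapping rule.
    """
    mapped = {new: legacy.get(old, "").strip() for old, new in RENAMES.items()}

    # Merge: join the non-empty source fields with a blank line between them.
    parts = [legacy.get(field, "").strip() for field in MERGED]
    mapped["description"] = "\n\n".join(part for part in parts if part)

    # Split: break the delimited string apart and drop empty entries.
    raw_keywords = legacy.get(SPLIT, "").split(";")
    mapped["keywords"] = [kw.strip() for kw in raw_keywords if kw.strip()]

    handled = set(RENAMES) | set(MERGED) | {SPLIT}
    unmapped = sorted(set(legacy) - handled)
    return mapped, unmapped


legacy_record = {
    "dc_title": " Soil cores 2019 ",
    "dc_date": "2019-06-01",
    "abstract": "Core samples.",
    "notes": "Collected in June.",
    "subjects": "soil; geology ;",
    "internal_flag": "x",
}
mapped, unmapped = map_record(legacy_record)
print(mapped)
print("Fields with no mapping rule:", unmapped)
```

Running a script like this across a full export before each test migration gives you a list of unmapped fields to discuss with whoever is coordinating the migration.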
No matter what, a migration is a great opportunity to clean and update your metadata! If you can get all of your metadata into one or more spreadsheets, you can use a tool like OpenRefine or tools in your spreadsheet program to:
- Make spelling consistent
- Clean up author names
- Add ORCIDs to author names
- Add new fields and content across records
- Reorganize records
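If you prefer scripting to spreadsheet tools, a few of these clean-up steps can be sketched in Python. The function names and the table of spelling variants below are our own illustrative assumptions. The check-digit calculation follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; it confirms an iD is well formed, not that it belongs to a particular author.

```python
import re


def canonicalize(value, variants):
    """Replace a known spelling variant with its canonical form."""
    cleaned = value.strip()
    return variants.get(cleaned.lower(), cleaned)


def normalize_name(name):
    """Collapse stray whitespace and turn 'Last, First' into 'First Last'."""
    name = re.sub(r"\s+", " ", name).strip()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}".strip()
    return name


def is_valid_orcid(orcid):
    """Check the format and ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected


# Illustrative variant table: lower-cased variant -> canonical spelling.
DEPARTMENT_VARIANTS = {"dept. of biology": "Department of Biology"}

print(canonicalize(" Dept. of Biology ", DEPARTMENT_VARIANTS))
print(normalize_name("  Carberry ,  Josiah "))
# 0000-0002-1825-0097 is the sample iD ORCID uses for its fictitious researcher.
print(is_valid_orcid("0000-0002-1825-0097"))
```

Checks like these are most useful for flagging records for a person to review, since automated name clean-up can easily get unusual names wrong.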
Think about the future
The repository space is changing rapidly, in technology as well as in policy and practice. Finding a platform that adapts to the needs of your institution is important. Figshare development is customer driven through a feature request forum, user group meetings, and customer product testing. The Product Team prioritizes new features based on all this feedback, security requirements, and changes to technology.
As you evaluate platforms, ask yourself these questions:
- Will your new platform keep pace with the evolving repository space?
- Is the platform updated regularly and always available?
- Will you be able to move off the new platform without much difficulty?
- If you have staff turnover, in the library or IT, will the repository be able to continue functioning unhindered?
Pricing: infrastructure and resources
Pricing is always a top priority, and budget restrictions can force institutions to choose a platform that may not meet all their needs. When pricing a platform, it’s important to consider all the costs, not just the obvious fees. A fully functional repository requires a mix of IT, library science, and scholarly communication expertise. It also requires digital infrastructure like software, paid services (like DOI minting), and storage.
Well-resourced institutions that can afford and want to dedicate IT staff to repository infrastructure can successfully build their own platform or use one of the many open source options. Institutions with fewer resources may need to either run their own platform and settle for a very basic implementation or pay a third party to set up and maintain a platform for them.
Figshare’s model is to eliminate as much of the standard IT costs and burden as possible for an institution by providing the software, maintenance, support, and options for storage. This frees up an institution’s IT resources to focus on other research infrastructure. An institution can also dedicate those IT resources to getting the most out of their repository through integrations. Small, medium-sized, and large Figshare customers often have different drivers related to their scale, but one thing they do have in common is a preference for infrastructure that is secure, up to date, and reliable, with as little drag on IT resources as possible.
Putting it all together: Requirements matrix
Now that you’ve gathered feedback, built internal support, and examined your requirements, it is time to create a requirements matrix that will help you select a platform. The table below is a summary of requirements Figshare often sees in RFPs and questionnaires from around the world. Classifying your requirements in some way can be really helpful, especially if different parts of your organisation will be contributing to the evaluation exercise. Speak to your Procurement colleagues about best practices in putting your requirements together; a well-written specification will make it easier to identify the best options for your organisation and research community.
Category | Requirement |
Metadata | Standard fields available, custom fields controlled by admins |
Metadata | Add controlled vocabularies to custom fields |
Metadata | Support for adding PIDs like DOIs, ORCIDs, ROR |
Metadata | Capture funding related metadata |
Metadata | Optimizes metadata for discoverability in search engines |
Workflows | User focused interface and tools – does the platform serve the main users? |
Workflows | Streamlined deposit workflows |
Workflows | Curation workflows for admins |
Workflows | Collaborative workspaces |
Workflows | Create collections of related works |
Workflows | Option to give access privately for restricted files |
Reporting | Tracks usage metrics, Citations, Altmetrics |
Reporting | Dashboards for reporting |
Content Adaptability | Supports uploads of any file type |
Content Adaptability | Preview wide range of files in browser |
Content Adaptability | Open access & restricted publishing options |
Content Adaptability | Tools for large file deposits |
Content Adaptability | A variety of standard record types available |
Interoperability | Open API for custom integrations |
Interoperability | RIMS system integration(s) |
Interoperability | ORCID Integration |
Interoperability | Researcher tool integrations: e.g. GitHub, lab notebooks |
Interoperability | OAI-PMH |
Standards | Accessibility compliance |
Standards | Funder or government policy compliance? |
Standards | Strong Security: HECVAT Lite or similar and Security Certifications |
Availability and Updates | Fast loading, responsive, and scalable platform |
Availability and Updates | Public status page available |
Availability and Updates | Regular platform updates |
Availability and Updates | No planned downtime |
Availability and Updates | Are platform upgrades to new versions required? |
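One way to put a matrix like this to work is to classify each requirement and score each platform against it. The Python sketch below assumes a must/should/could classification, a 0 to 2 score per requirement (0 = not supported, 1 = workaround needed, 2 = supported), and weights of 3, 2, and 1. All of these are illustrative choices of ours, as are the example requirements and scores; adjust them to whatever scheme your Procurement colleagues recommend.

```python
# Illustrative weights for each priority class (an assumption, not a standard).
WEIGHTS = {"must": 3, "should": 2, "could": 1}
MAX_SCORE = 2  # 0 = not supported, 1 = workaround needed, 2 = supported


def score_platform(requirements, scores):
    """Score one platform against classified requirements.

    requirements: {requirement name: "must" | "should" | "could"}
    scores:       {requirement name: 0, 1, or 2}; missing entries count as 0
    """
    earned = sum(
        WEIGHTS[priority] * scores.get(name, 0)
        for name, priority in requirements.items()
    )
    possible = sum(WEIGHTS[priority] * MAX_SCORE for priority in requirements.values())
    unmet_musts = [
        name
        for name, priority in requirements.items()
        if priority == "must" and scores.get(name, 0) == 0
    ]
    return {"percent": round(100 * earned / possible), "unmet_musts": unmet_musts}


# Hypothetical requirements and platform scores.
requirements = {
    "Open API": "must",
    "OAI-PMH": "must",
    "ORCID integration": "should",
    "Dashboards": "could",
}
platform_a = {"Open API": 2, "OAI-PMH": 2, "ORCID integration": 1, "Dashboards": 0}
platform_b = {"Open API": 2, "OAI-PMH": 0, "ORCID integration": 2, "Dashboards": 2}

result_a = score_platform(requirements, platform_a)
result_b = score_platform(requirements, platform_b)
print("Platform A:", result_a)
print("Platform B:", result_b)
```

Listing unmet "must" requirements separately from the percentage keeps a platform from looking acceptable on the overall score while missing something you cannot do without.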