The University of Arizona
KMap

Grant

EAGER. Collaborative Research: Evaluating Identifier Services for the Life Cycle of Biological Data

Sponsored by National Science Foundation

$65.4K Funding
1 Person
External

Abstract

Unique identifiers are key to current and future access to and use of research data, which are often distributed across a landscape of storage and analysis resources, publishing platforms, and repository services. In the biology domain, researchers and data managers have expressed the need to use identifiers from the moment of data creation and throughout the research lifecycle. The biological sciences use a wide range of methods to produce many different kinds of data, which may require different types of identifiers to connect physical samples, digital data, analyses, and publications. This project will develop and evaluate proof-of-concept and prototype services with a particular focus on DNA/RNA sequence data. This will expand on data modeling work done as part of the iPlant Data Commons, using real-world biology datasets from iPlant, the Texas Advanced Computing Center (TACC), and the National Ecological Observatory Network (NEON). Project results will be disseminated across the biology and information science communities. Software generated during this project will be maintained in an open-source software repository for further development by the community. Some of these products will benefit smaller organizations that provide repository services but have limited software development staffing. This research will inform the development of similar services for different data types and in other domains dealing with identification issues throughout a project's lifecycle. There is a growing need for services to verify, track, and report events (i.e., provenance) in relation to identified datasets over time. Such services should start as early as possible in the life of a research project and be automated as much as possible. Much of the current research and development around digital identifiers focuses on facilitating data citation and discovery post-publication.
This project will address problems arising for large, dispersed biology datasets and changing events. Instead of assigning identifiers only at the last stage for curated datasets, the use of different identifiers will be assessed throughout the continuum of data management, publication, archiving, and reuse. The project will prototype an identifier infrastructure for identifier management, permutation, and data validation/authentication across time. The implementation and evaluation of these services will test the use of identifiers beyond the "data publication stage," to connect dispersed data objects as they transition through the continuum of data management, publication, and archiving. This project will develop and evaluate a set of proofs of concept and prototypes to 1) model identifiers for the lifecycle management of biological data, including their transition into global, unique, and persistent identifiers; 2) conduct automated verification of the data linked to those identifiers to track presence at registered locations and integrity and identity over time; and 3) assess how collection creators use identifiers and respond to identifier services.
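
As an illustration only, the automated verification described in aim 2 could be approached with a checksum-based fixity check. The sketch below is not the project's software: the `Registration` record, the `verify` function, and the status labels are hypothetical names chosen for this example, and it assumes each registered location is a local file path and that a SHA-256 checksum was recorded when the identifier was assigned.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Registration:
    """Hypothetical record linking one identifier to its registered copies."""
    identifier: str   # e.g. a locally minted ID, later promoted to a persistent one
    locations: tuple  # file paths where copies of the data object should exist
    sha256: str       # checksum recorded when the identifier was assigned


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large sequence files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(registration):
    """Report presence and integrity for every registered location."""
    report = {}
    for location in registration.locations:
        path = Path(location)
        if not path.is_file():
            report[location] = "missing"   # presence check failed
        elif sha256_of(path) != registration.sha256:
            report[location] = "altered"   # integrity check failed
        else:
            report[location] = "ok"
    return report
```

Running `verify` on a schedule and storing each report with a timestamp would give a simple event history for an identified dataset. A real service would also need to handle remote locations and identifier resolution, which this sketch leaves out.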

People