The following provides information on preparing datasets and associated documentation for submission to the NIDDK Central Repository (NIDDK-CR). The overall goal is to produce research datasets and documentation that allow qualified outside researchers to perform new research while protecting the privacy of the research participants. This is accomplished by providing complete and accurate study data (and related biospecimens), prepared according to best practices that replace, remove, or otherwise protect any directly identifiable participant-level data.
The data submission process involves the following steps:
The documentation should be comprehensive and sufficiently clear to enable investigators who are not familiar with the study data to use it. The following types of documents will need to be assembled for submission to NIDDK. Documents should be in their original electronic format.
To facilitate the transfer of study materials, please review the Study Data Checklist. This checklist should be used to specify the items included in the submission and may be uploaded along with the data package. NIDDK-CR will provide instructions for upload after initial contact regarding the submission.
The materials needed for submission are:
Each of these categories is described in more detail below.
Provide study documentation in PDF format, except where noted below. Documentation should include:
Note that selected study documentation will be posted on the NIDDK-CR website and used to describe the study. This includes documentation of the datasets to be shared, but not documentation of the pre-redacted (private) study datasets. Examples include Forms, Data Dictionaries, Descriptive Statistics, and the Study Protocol.
To see the contents and appearance of a typical study listing, click on any listed study at the Study Search Page.
Certain personal information about study participants, or about their relatives, employers, or household members, must be removed from all items, including data, images, and documentation, before submission to NIDDK-CR. Below is a list of the information that falls into this category.
Submitted datasets should include all raw data and analysis-level files from all study visits, laboratory measurements, study procedures, and outcome elements, along with other final supplemental files (for example, required calculated variables), so that users can approximate published results and conduct new analyses. Submitted datasets must be redacted to remove the personal information specified above and any data collected solely for administrative purposes, and they must conform to the restrictions of each participant's informed consent. Public-use datasets may contain recodes of selected low-frequency data values where necessary to protect participant privacy and minimize re-identification risk. Because of this redaction, published results may not be exactly reproducible from the public-use data.
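Recoding low-frequency values, as described above, can be sketched in Python. This is a minimal illustration that assumes each variable is held as a plain list of values; the threshold of 5, the "Other" label, and the ceiling of 90 are hypothetical choices, not NIDDK-CR requirements. The appropriate rules depend on each study's own disclosure review.

```python
from collections import Counter


def collapse_rare_categories(values, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label.

    min_count=5 is a hypothetical threshold chosen for illustration only.
    Missing values (None) are left untouched.
    """
    counts = Counter(v for v in values if v is not None)
    return [
        other_label if v is not None and counts[v] < min_count else v
        for v in values
    ]


def top_code(values, ceiling):
    """Cap numeric values at ceiling so that extreme, identifying values are grouped."""
    return [None if v is None else min(v, ceiling) for v in values]


# Example: one participant reports a rare category and one an extreme value.
occupations = ["teacher"] * 5 + ["astronaut"]
print(collapse_rare_categories(occupations))
print(top_code([45, 62, 97, None], ceiling=90))
```

Whatever rules are applied, the recodes belong in the modified dataset documentation, and the original values belong in the separate file of redacted data.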
If a study wishes to prepare the public-use datasets, it should also prepare modified dataset documentation that reflects the changes made to the included variables and any recodes. This documentation will be provided to approved requestors along with the public-use datasets. A summary document describing the changes and deletions applied during redaction should also be included.

In addition, a summary documentation file, usually called a README file, should be submitted. The README should give a complete overview of the data and describe their use at a level appropriate for investigators who are not familiar with the dataset. It should describe any significant events that may not be documented in the protocol or other documents but that would help users understand the submitted data. Examples include addenda describing significant changes in study procedures; cautionary notes on interpreting data elements, or notes explaining apparent inconsistencies or frequently missing data; the abandonment of selected data collections at one or more sites; and modifications to questionnaires over time, if not documented elsewhere.
The README should also contain a brief description of the study (a general orientation to the study, its components, and its examination and follow-up timeline); a list of all files provided; a description of system requirements; program code for generating a SAS data file from the SAS export (transport) file, if appropriate; and frequency distributions for selected key variables.
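A frequency distribution for a key variable can be produced with the Python standard library alone. The sketch below assumes each dataset has been loaded as a list of dictionaries (one per record) and reports missing values under a "Missing" label; the variable name `sex` and the layout of the printed table are hypothetical.

```python
from collections import Counter


def frequency_table(records, variable):
    """Return (value, count, percent) rows for one variable, sorted by value.

    Missing values (None or absent keys) are counted under the label "Missing".
    """
    counts = Counter(
        "Missing" if r.get(variable) is None else r[variable] for r in records
    )
    total = sum(counts.values())
    return [
        (value, n, round(100 * n / total, 1))
        for value, n in sorted(counts.items(), key=lambda kv: str(kv[0]))
    ]


def format_frequency_table(records, variable):
    """Render frequency_table output as fixed-width text suitable for a README."""
    lines = [f"{variable:<12}{'N':>6}{'%':>8}"]
    for value, n, pct in frequency_table(records, variable):
        lines.append(f"{str(value):<12}{n:>6}{pct:>8.1f}")
    return "\n".join(lines)


records = [{"sex": "F"}, {"sex": "M"}, {"sex": "F"}, {"sex": None}]
print(format_frequency_table(records, "sex"))
```

Generating these tables with a script, rather than by hand, makes it easy to rerun them after the final redaction pass so that the README matches the submitted files.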
Once the datasets and documentation have been prepared, the files are ready for transfer.
Once the files are transferred, NIDDK-CR support staff will review the submission to verify the transferred records and included study variables, regenerate frequencies for comparison with those generated by study staff, and check the datasets for additional items that may need to be redacted or recoded. For studies with multiple datasets, NIDDK-CR will assess whether the datasets can be linked to one another. NIDDK will also examine variables that appear in multiple datasets, such as Participant ID and visit, to ensure that they are formatted consistently across all datasets.
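Study staff may find it useful to run similar consistency checks before transfer. This sketch assumes each dataset is a list of dictionaries stored under a dataset name, and that Participant IDs follow the "010-001" pattern used in the example Biospecimen Linkage File later in this guidance; the regular expression, the field name `participant_id`, and the dataset names are hypothetical.

```python
import re

# Hypothetical pattern modeled on the example IDs in this guidance ("010-001"):
# three digits, a hyphen, then three more digits.
PARTICIPANT_ID_PATTERN = re.compile(r"\d{3}-\d{3}")


def misformatted_ids(datasets, field="participant_id", pattern=PARTICIPANT_ID_PATTERN):
    """Return {dataset name: sorted values of field that do not fully match pattern}."""
    problems = {}
    for name, rows in datasets.items():
        bad = sorted({str(r[field]) for r in rows if not pattern.fullmatch(str(r[field]))})
        if bad:
            problems[name] = bad
    return problems


def ids_missing_from_reference(datasets, reference, field="participant_id"):
    """Return {dataset name: IDs absent from the reference dataset}, i.e. unlinkable records."""
    reference_ids = {r[field] for r in datasets[reference]}
    missing = {}
    for name, rows in datasets.items():
        if name == reference:
            continue
        absent = sorted({r[field] for r in rows} - reference_ids)
        if absent:
            missing[name] = absent
    return missing


datasets = {
    "enrollment": [{"participant_id": "010-001"}, {"participant_id": "010-002"}],
    "labs": [{"participant_id": "010-001"}, {"participant_id": "10-2"}],
}
print(misformatted_ids(datasets))
print(ids_missing_from_reference(datasets, "enrollment"))
```

The same functions can be applied to a visit variable by passing, for example, `field="visit_code"` and a pattern that matches the study's own visit codes.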
Pre-redacted (private) data will be held securely by NIDDK-CR and used to facilitate comparison of the submitted data with publications, to help make appropriate resources available to requestors, and to support long-term management of the repository resources.
NIDDK-CR shares data and biospecimens (repository resources) with qualified researchers under a data use agreement that restricts use to the research the researcher has proposed. Each request for resources undergoes a merit review, and recipients agree not to attempt to identify or contact participants, not to link to other resources not specified in their proposal, and not to share these data with others. Example templates of the Data and Resources Use Agreement can be found on the Information for Requestors using NIDDK-CR Resources for Research (R4R) page, under the Helpful Information tab.
If a study wishes to prepare the public-use datasets, all redacted data (such as full dates, center values, and original values for grouped data) should be provided in a separate file, along with a link between the original Participant IDs and the newly assigned (randomized) IDs. When biospecimens are also submitted, the Biospecimen Linkage File (described below) should include any redacted biospecimen collection dates and their associated study time points for each participant. When other study-related data types (e.g., genomic or other omics data) are provided to other NIDDK-approved repositories, a linkage file should be provided that maps the data provided to NIDDK-CR to the data provided to the other repository or repositories. These files will be stored securely for archival purposes and may be used to facilitate comparison of the submitted data with publications, to help make appropriate resources available to requestors, and to support long-term management of the repository resources.
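One way to create randomized IDs and the accompanying link file is sketched below. It is a minimal illustration, not an NIDDK-CR procedure: the `P` prefix, zero-padding, and CSV column names are hypothetical. New IDs are sequence numbers handed out in shuffled order, so they reveal neither site nor enrollment order.

```python
import csv
import io
import random


def assign_random_ids(original_ids, prefix="P", seed=None):
    """Map each distinct original Participant ID to a new, randomly ordered ID.

    The prefix and zero-padding are hypothetical formatting choices.
    """
    unique_ids = sorted(set(original_ids))
    numbers = list(range(1, len(unique_ids) + 1))
    random.Random(seed).shuffle(numbers)  # seed only for reproducible testing
    width = len(str(len(unique_ids)))
    return {orig: f"{prefix}{n:0{width}d}" for orig, n in zip(unique_ids, numbers)}


def linkage_csv(mapping):
    """Render the original-to-new ID mapping as CSV text for the separate linkage file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["original_participant_id", "new_participant_id"])
    for orig, new in sorted(mapping.items()):
        writer.writerow([orig, new])
    return buffer.getvalue()


mapping = assign_random_ids(["010-001", "010-002", "020-001", "010-001"])
print(linkage_csv(mapping))
```

`random.Random` is not cryptographically secure. Where the ordering itself must be unpredictable, `random.SystemRandom().shuffle` can be substituted. Whichever generator is used, the same mapping must be applied to every dataset in the submission.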
Biospecimen Linkage File (if biospecimens are submitted for storage at the NIDDK Biorepository)
Biospecimen ID | Participant ID | Visit Code | Collection Date |
---|---|---|---|
A-0001 | 010-001 | BV | 01/01/2001 |
A-0027 | 010-001 | FV01 | 02/10/2001 |
A-0078 | 010-001 | FV02 | 06/01/2001 |
A-0002 | 010-002 | BV | 01/01/2002 |
B-0001 | 020-001 | BV | 01/01/2003 |
B-0023 | 020-001 | FV01 | 02/10/2003 |
B-0002 | 020-002 | BV | 01/01/2004 |
B-0025 | 020-002 | FV01 | 02/10/2004 |
B-0041 | 020-002 | FV02 | 06/01/2004 |
I-0001 | 090-001 | BV | 01/01/2003 |
I-0007 | 090-001 | BV | 01/01/2002 |
Note that biospecimen collection dates are a required component of the Biospecimen Manifest that accompanies each shipment of biospecimens to the NIDDK Biorepository, as described in the Biospecimen Submission Label, Manifest, and Shipping Guidance. Alternatively, the full collection date may be provided as part of the Biospecimen Linkage File.
The biospecimen linkage file should include a simple table showing the description for each visit code. The visit code description table for the previous example might look like:
Visit Code | Description |
---|---|
BV | Baseline Visit (enrollment) |
FV01 | Follow-up Visit #1 (week 6) |
FV02 | Follow-up Visit #2 (month 6) |
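Before transfer, a study might check its Biospecimen Linkage File against its visit code table. The sketch below is a minimal illustration that assumes each linkage row is a dictionary with hypothetical keys matching the example columns, and that dates use the MM/DD/YYYY layout shown in the example table. It reports duplicate Biospecimen IDs, visit codes without a description, and dates that do not parse.

```python
from datetime import datetime

# Visit code descriptions from the example table above.
VISIT_CODES = {
    "BV": "Baseline Visit (enrollment)",
    "FV01": "Follow-up Visit #1 (week 6)",
    "FV02": "Follow-up Visit #2 (month 6)",
}


def validate_linkage_rows(rows, visit_codes):
    """Return human-readable problems found in Biospecimen Linkage File rows."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        specimen_id = row["biospecimen_id"]
        if specimen_id in seen_ids:
            problems.append(f"row {i}: duplicate Biospecimen ID {specimen_id}")
        seen_ids.add(specimen_id)
        if row["visit_code"] not in visit_codes:
            problems.append(f"row {i}: visit code {row['visit_code']!r} has no description")
        try:
            datetime.strptime(row["collection_date"], "%m/%d/%Y")
        except ValueError:
            problems.append(
                f"row {i}: collection date {row['collection_date']!r} is not a valid MM/DD/YYYY date"
            )
    return problems


rows = [
    {"biospecimen_id": "A-0001", "participant_id": "010-001", "visit_code": "BV", "collection_date": "01/01/2001"},
    {"biospecimen_id": "A-0027", "participant_id": "010-001", "visit_code": "FV01", "collection_date": "02/10/2001"},
]
print(validate_linkage_rows(rows, VISIT_CODES))
```

Returning a list of messages, rather than stopping at the first error, lets study staff fix all problems in the file in a single pass.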
(if image data were collected for storage in NIDDK-CR)
If study-related data have been or will be provided to other NIDDK-approved repositories such as dbGaP, a link must be provided between the IDs submitted to NIDDK-CR and the IDs used at the other NIDDK-approved repository. This link will be used to facilitate combining data from the two sources for approved analyses.
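When a study assigns new IDs for NIDDK-CR but keeps its original IDs, the required link can be derived by joining two mappings that share the original ID. The sketch assumes both mappings are available as dictionaries; `SUBJ-0007` is a made-up identifier, and the actual ID conventions at the other repository are set by that repository's own submission process.

```python
def link_repository_ids(original_to_niddk, original_to_external):
    """Join two mappings that share original study Participant IDs.

    original_to_niddk:    original Participant ID -> ID submitted to NIDDK-CR
    original_to_external: original Participant ID -> ID used at the other repository
    Returns (linked (NIDDK-CR ID, external ID) pairs, NIDDK-CR IDs with no external match).
    """
    linked, unmatched = [], []
    for orig, niddk_id in sorted(original_to_niddk.items()):
        external_id = original_to_external.get(orig)
        if external_id is None:
            unmatched.append(niddk_id)
        else:
            linked.append((niddk_id, external_id))
    return linked, unmatched


linked, unmatched = link_repository_ids(
    {"010-001": "P2", "010-002": "P1"},
    {"010-001": "SUBJ-0007"},  # hypothetical subject ID at the other repository
)
print(linked)
print(unmatched)
```

Unmatched IDs are not necessarily errors, because some participants may not have data at the other repository, but they are worth reviewing before the linkage file is submitted.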
(Rev 02/25/2020)