Submission Requirements for the NIDDK Central Repository
Thank you for contributing to the NIDDK Central Repository. This document describes the materials needed for a submission. Please send any questions to firstname.lastname@example.org.
Archive Data Checklist
To facilitate the transfer of study materials, please review the Study Data Checklist before preparing your submission. You will use this checklist to specify the items included in the submission.
The materials needed for submission are:
- Study Data
  - data collected on forms
  - data resulting from lab tests, genotyping, etc.
  - analysis data
- Study Documentation
- Study Forms
- Sample and/or Image Linkage Files, if applicable
Each of these categories is described in more detail below.
Study Data
- Data provided to the Central Repository must be de-identified in accordance with HIPAA regulations for limited data sets (Attachment A).
- Please provide SAS data sets. Contact the Central Repository if you need to use an alternative format.
- Data set variables should have variable and value labels.
- Name data sets so that each can be matched to its study form. For example, Form02_v2.pdf and Form02_v2.sas7bdat could represent the study form and study data set for "Form 02: Medical History, Version 2".
- Provide all data associated with the study data collection forms.
- Provide other relevant data, such as Central Biospecimen Laboratory (CBL) data.
- Provide analysis data sets whenever possible. These supplemental files, constructed by study investigators, are typically associated with a peer-reviewed publication and may include records merged across several data sets. Include documentation showing how analysis variables are derived from the form (raw) variables; see "Study Documentation".
- To ensure that the transferred data are complete and valid, a Central Repository statistician will replicate selected tables from published results.
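If data sets and forms follow the naming convention above, a quick pre-submission check can list forms that lack a data set and data sets that lack a form. The sketch below is a minimal illustration, not a Central Repository tool: it assumes that a form and its data set share an identical file stem (as in Form02_v2.pdf and Form02_v2.sas7bdat), and the function name `match_forms_to_datasets` and the example file names are hypothetical.

```python
from pathlib import PurePath


def match_forms_to_datasets(filenames):
    """Pair form PDFs with SAS data sets that share the same file stem.

    Hypothetical helper. Returns (matched, forms_without_data,
    data_without_forms), each a sorted list of file stems.
    """
    forms = {PurePath(f).stem for f in filenames if PurePath(f).suffix.lower() == ".pdf"}
    datasets = {PurePath(f).stem for f in filenames if PurePath(f).suffix.lower() == ".sas7bdat"}
    matched = sorted(forms & datasets)
    return matched, sorted(forms - datasets), sorted(datasets - forms)


if __name__ == "__main__":
    # Hypothetical file listing for illustration.
    files = ["Form02_v2.pdf", "Form02_v2.sas7bdat", "Form03_v1.pdf", "cbl_results.sas7bdat"]
    matched, missing_data, extra_data = match_forms_to_datasets(files)
    print(matched)       # ['Form02_v2']
    print(missing_data)  # ['Form03_v1']
    print(extra_data)    # ['cbl_results']
```

Data sets without a matching form (for example, CBL results or analysis data sets) are expected, so the third list is a prompt for review rather than an error.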
Study Data Collection Forms
- Provide copies of data collection forms in PDF format.
- Provide form instructions that are not included in the Manual of Operations/Procedures as PDF files.
- Identify copyrighted forms/scales. Copyrighted forms are not provided to requestors of Central Repository data.
- Annotate forms with variable names and value labels whenever possible.
Study Documentation
Provide study documentation in PDF format, except where noted below. Documentation should include:
- Study protocol
- Manual of Operations/Procedures (MOP/SOP)
- Study System User Guide for electronic data capture, if separate from the MOP/SOP
- Bibliography of publications (Word format)
- A contents file for each data set, showing the number of observations, the variables, variable labels, and value label names
- Data dictionary or codebook showing:
  - Descriptive statistics (means or frequencies) for each variable
  - Explanations of how analysis variables are derived from the form (raw) variables. A log file for the program used to create the data set may be provided in lieu of a description.
  - Value labels (i.e., SAS user-defined formats)
- Identification of the publication(s) associated with each analysis data set, if applicable
- URL for the public study website, if applicable
- Any additional information about the study or data sets that will facilitate the use of the data
- Selected documentation will be posted on the NIDDK Central Repository Portal to aid researchers in selecting data sets appropriate for their research.
To see the contents and appearance of a typical study listing, select any study listed on the Study Search Page.
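The contents file and the codebook's descriptive statistics can usually be generated directly from the data; in SAS, PROC CONTENTS, PROC MEANS, and PROC FREQ are common choices. The Python sketch below only illustrates what such a summary contains, under stated assumptions: each data set is loaded as a list of dictionaries, `None` marks a missing value, numeric variables receive a mean, and all other variables receive value frequencies. Variable and value labels are not covered. The function `summarize` and the example records are hypothetical.

```python
from collections import Counter
from statistics import mean


def summarize(records):
    """Build a contents/codebook-style summary of a data set (hypothetical helper).

    records: list of dicts, one per observation; None marks a missing value.
    Numeric variables get a mean; all other variables get value frequencies.
    """
    summary = {"n_obs": len(records), "variables": {}}
    names = sorted({name for rec in records for name in rec})
    for name in names:
        values = [rec.get(name) for rec in records]
        present = [v for v in values if v is not None]
        entry = {"n_missing": len(values) - len(present)}
        # Treat a variable as numeric only if every non-missing value is a number.
        numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if numeric:
            entry["mean"] = mean(present)
        else:
            entry["frequencies"] = dict(Counter(present))
        summary["variables"][name] = entry
    return summary


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"SUBJID": "S001", "AGE": 34, "SEX": "F"},
        {"SUBJID": "S002", "AGE": 41, "SEX": "M"},
        {"SUBJID": "S003", "AGE": None, "SEX": "F"},
    ]
    result = summarize(records)
    print(result["n_obs"])             # 3
    print(result["variables"]["AGE"])  # {'n_missing': 1, 'mean': 37.5}
    print(result["variables"]["SEX"])  # {'n_missing': 0, 'frequencies': {'F': 2, 'M': 1}}
```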
Sample Linkage File
(if bio-samples are submitted for storage at the NIDDK central bio-repositories)
- For completed studies, deliver the linkage file at the same time as the study data. For ongoing studies, deliver it at the time of the initial communication between the Central Repository and the study's Data Coordinating Center (DCC).
- Provide a sample linkage file that uniquely maps each Sample ID to the corresponding Subject ID. For studies where samples are collected longitudinally over several timepoints, the linkage file should uniquely map each Sample ID to the corresponding Subject ID and timepoint. Hereafter, a timepoint is referred to as a "study visit".
- Study visits should be represented by unique visit codes. A visit code is a label that identifies the study visit during which the sample was collected. Sample collection date is not a substitute for visit code.
- Example: Suppose Study "A" collected serum samples longitudinally from participants at baseline, week 6, and month 6. A portion of the sample linkage file might then look like:
| Sample ID | Subject ID | Visit Code |
|---|---|---|
The sample linkage file should include a simple table showing the description for each visit code. The visit code description table for the previous example might look like:
| Visit Code | Description |
|---|---|
| BV | Baseline Visit (enrollment) |
| FV01 | Follow-up Visit #1 (week 6) |
| FV02 | Follow-up Visit #2 (month 6) |
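Before delivery, it can help to confirm that the linkage file meets the rules above: every Sample ID appears once, every row has a Subject ID, and every visit code is defined in the visit code description table. The sketch below is an illustrative check, not a Central Repository specification. It assumes the file has been read into a list of dictionaries keyed by the column names Sample ID, Subject ID, and Visit Code; the function `check_linkage` and the Sample and Subject IDs in the example are hypothetical, while the visit codes come from the example table.

```python
# Hypothetical checker for a sample (or image) linkage file; an illustrative
# sketch, not an official Central Repository validation tool.

def check_linkage(rows, visit_codes, id_field="Sample ID"):
    """Return a list of problems found in a linkage table.

    rows: list of dicts with keys id_field, "Subject ID", and "Visit Code".
    visit_codes: dict mapping each defined visit code to its description.
    """
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        item_id = row.get(id_field)
        # Each ID must appear exactly once so it maps to one subject and visit.
        if not item_id:
            problems.append(f"row {i}: missing {id_field}")
        elif item_id in seen_ids:
            problems.append(f"row {i}: duplicate {id_field} {item_id}")
        else:
            seen_ids.add(item_id)
        if not row.get("Subject ID"):
            problems.append(f"row {i}: missing Subject ID")
        # A visit code is required; a collection date is not a substitute.
        code = row.get("Visit Code")
        if not code:
            problems.append(f"row {i}: missing Visit Code")
        elif code not in visit_codes:
            problems.append(f"row {i}: undefined Visit Code {code}")
    return problems


VISIT_CODES = {
    "BV": "Baseline Visit (enrollment)",
    "FV01": "Follow-up Visit #1 (week 6)",
    "FV02": "Follow-up Visit #2 (month 6)",
}

if __name__ == "__main__":
    # Hypothetical rows; the third repeats a Sample ID and uses an undefined code.
    rows = [
        {"Sample ID": "A0001", "Subject ID": "101", "Visit Code": "BV"},
        {"Sample ID": "A0002", "Subject ID": "101", "Visit Code": "FV01"},
        {"Sample ID": "A0002", "Subject ID": "102", "Visit Code": "FV03"},
    ]
    for problem in check_linkage(rows, VISIT_CODES):
        print(problem)
    # row 3: duplicate Sample ID A0002
    # row 3: undefined Visit Code FV03
```

Because the ID column name is a parameter, the same check could be pointed at the Image ID column of an image linkage file, described in the next section.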
Image Linkage File
(if image data was collected for storage in the data repository)
- Provide an image linkage file that maps each Image ID to the corresponding Subject ID and visit code. The file should follow the structure described above for sample linkage files, using Image ID in place of Sample ID.
- A standard image format such as JPEG is preferred, but any format used in the study will be accepted.
Attachment A: Remove Identifiers from Data Prior to Submission
The following direct identifiers of study individuals or of their relatives, employers, or household members must be removed from all items, including data, images, and documentation, before submission to the Central Repository. If you require assistance with this process, contact the Central Repository.
- Names
- Postal address information, other than town or city, State, and zip code
- Telephone numbers
- Fax numbers
- Electronic mail addresses
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web Universal Resource Locators (URLs)
- Internet Protocol (IP) address numbers
- Biometric identifiers, including finger and voice prints; and
- Full face photographic images and any comparable images.
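Some of the identifiers listed above follow recognizable text patterns, so an automated scan can flag candidate values in free-text fields before submission. The sketch below is only a screening aid under explicit assumptions: the regular expressions are illustrative, target common U.S. formats, and will both miss identifiers and flag harmless values (for example, a version number can look like an IP address). Names, postal addresses, biometric identifiers, and images cannot be caught this way, so manual review is still required. The pattern set and the function `flag_identifiers` are hypothetical.

```python
import re

# Illustrative screening patterns (an assumption, not an official list). They
# target common U.S. formats and will miss some identifiers and flag some
# harmless values, so they support, but do not replace, manual review.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone_or_fax": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "url": re.compile(r"\bhttps?://\S+|\bwww\.\S+", re.IGNORECASE),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def flag_identifiers(text):
    """Return sorted (label, matched_text) pairs for substrings that look like identifiers."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical free-text note for illustration.
    note = "Contact jdoe@example.org or (301) 555-0100; SSN 123-45-6789; host 192.168.0.1."
    for label, value in flag_identifiers(note):
        print(label, value)
    # email jdoe@example.org
    # ip_address 192.168.0.1
    # phone_or_fax (301) 555-0100
    # ssn 123-45-6789
```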