The NIDDK Central Repository (CR) plays a crucial role in making data Findable, Accessible, Interoperable, and Reusable (FAIR). To enhance the FAIRness of studies in the Repository, NIDDK and Booz Allen Hamilton, the current Data Repository contractor, have piloted a natural language processing (NLP) pipeline that harmonizes study variables with NIH Common Data Elements (CDEs) and identifies potential new CDEs from dataset variables mapped to ontology concepts. Initial results show highly specific mapping of variables to CDEs and successful identification of relevant concepts for new CDEs. The pipeline can be refined and applied to other studies, potentially improving the FAIRness of the NIDDK CR.
Contact Dr. Rebecca Rodriguez with any questions.