DPUK Synthetic Dataset

Data from three population cohorts were used to create a synthetic dataset (synth1.0) for system development, methods development and training purposes. The synthetic dataset does not contain any personally identifiable information.

Variables were selected to represent those typically used in dementia research and to cover survey, biomarker and imaging data modalities. Sixty-one variables were modelled; a unique participant identifier was then added, giving a dataset of 62 variables. Variable distributions and covariance from the source cohorts were used to parameterize the model. Generated data were checked for range, distributions, and associations. The generated dataset comprised 150,618 individuals.
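
As a rough illustration of the idea of parameterizing a generator from distributions and covariance, the sketch below draws rows from a multivariate normal distribution, adds a unique identifier, and then checks the generated means and covariance against the inputs. This is not the DPUK model: the multivariate normal assumption, the three toy variables, the parameter values, and the function names generate_synthetic and check_generated are all hypothetical, chosen only to show the general pattern.

```python
import numpy as np


def generate_synthetic(means, cov, n_rows, seed=0):
    """Draw n_rows samples from a multivariate normal (an assumed model)
    and pair each row with a unique participant identifier."""
    rng = np.random.default_rng(seed)
    values = rng.multivariate_normal(means, cov, size=n_rows)
    ids = np.arange(1, n_rows + 1)  # hypothetical identifier scheme: 1..n_rows
    return ids, values


def check_generated(values, means, cov, tol=0.2):
    """Return True if sample means and covariance are within tol of the targets."""
    mean_ok = np.allclose(values.mean(axis=0), means, atol=tol)
    cov_ok = np.allclose(np.cov(values, rowvar=False), cov, atol=tol)
    return bool(mean_ok and cov_ok)


# Toy parameters for three made-up variables (not taken from any cohort).
means = np.array([10.0, 0.0, 2.5])
cov = np.array([
    [4.0, 1.0, 0.20],
    [1.0, 1.0, 0.10],
    [0.2, 0.1, 0.25],
])

if __name__ == "__main__":
    ids, values = generate_synthetic(means, cov, n_rows=150_618)
    print("rows:", values.shape[0], "columns incl. id:", values.shape[1] + 1)
    print("means and covariance within tolerance:", check_generated(values, means, cov))
```

Real cohort variables are often categorical, bounded, or skewed, so a practical generator would need per-variable distributions and explicit range checks in addition to the mean and covariance comparison shown here.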

To learn more, visit the DPUK Cohort Directory here.

Manuscripts citing this dataset

Machine learning for the life-time risk prediction of Alzheimer’s disease: a systematic review. 2021. DOI: https://doi.org/10.1093/braincomms/fcab246
Global dialogue on data sharing for dementia research: Transcript The dementia landscape Project. World Dementia Council. 2021. https://www.worlddementiacouncil.org/sites/default/files/2021-06/DLP%20-%20Transcript%20-%20Data.pdf
International Population Data Linkage Network 2018 Conference. 2018. https://ipdln.org/sites/default/files/2018ConcurrentSessions/AT-A-GLANCE-Final.pdf

Request Access

Data access can be requested via the AD Workbench FAIR portal here. Access requests are reviewed by the DPUK team, and the dataset is delivered automatically to your workspace Inbox upon approval.

More information on DPUK’s data access policy can be found here.

Data Use Agreement

Information on DPUK’s data management and use can be found here. The cohort metadata access agreement can be found here.

Publishing results using this dataset?

DPUK asks that any publication of the metadata acknowledge the use of DPUK resources, including the source data. For detailed information on how to acknowledge DPUK’s data correctly, visit here.

Discuss

Post a question or thought about this dataset here.