Data from three population cohorts were used to create a synthetic dataset (synth1.0) for system development, methods development and training purposes. The synthetic dataset does not contain any personally identifiable information.

Variables were selected to represent those typically used in dementia research and to cover survey, biomarker and imaging data modalities. Sixty-one variables were modelled, and a unique participant identifier was added, giving a dataset of 62 variables. Variable distributions and covariance from the source cohorts were used to parameterize the model. Generated data were checked for range, distributions, and associations. The generated dataset comprised 150,618 individuals.
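
The general workflow described above (parameterize from source distributions and covariance, generate, then check) can be illustrated with a minimal sketch. This is not the model used for synth1.0, which is not specified here. The sketch assumes a multivariate normal model purely for illustration, uses an invented three-variable toy "source" in place of real cohort data, and the function names `fit_parameters` and `generate_synthetic` are hypothetical.

```python
import numpy as np

def fit_parameters(source):
    """Estimate per-variable means and the covariance matrix from source data."""
    return source.mean(axis=0), np.cov(source, rowvar=False)

def generate_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic rows from an ASSUMED multivariate normal model
    and attach a unique participant identifier to each row."""
    rng = np.random.default_rng(seed)
    values = rng.multivariate_normal(mean, cov, size=n)
    ids = np.arange(1, n + 1)  # one unique identifier per synthetic individual
    return ids, values

# Toy stand-in for source cohort data (invented values, 3 variables).
toy_rng = np.random.default_rng(42)
toy_cov = np.array([[1.0, 0.6, 0.2],
                    [0.6, 2.0, -0.3],
                    [0.2, -0.3, 1.5]])
source = toy_rng.multivariate_normal([70.0, 27.0, 5.0], toy_cov, size=5000)

mean, cov = fit_parameters(source)
ids, synth = generate_synthetic(mean, cov, n=10_000, seed=1)

# Checks analogous to those named in the text: range, distributions, associations.
ranges = np.stack([synth.min(axis=0), synth.max(axis=0)])    # range per variable
mean_gap = np.abs(synth.mean(axis=0) - mean).max()           # distribution (location)
corr_gap = np.abs(np.corrcoef(synth, rowvar=False)
                  - np.corrcoef(source, rowvar=False)).max() # associations

print("min/max per variable:\n", ranges)
print("largest mean difference:", mean_gap)
print("largest correlation difference:", corr_gap)
```

A multivariate normal only suits continuous, roughly symmetric variables. Categorical survey items or skewed biomarkers would need a different approach, and this sketch does not attempt one.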

