Introduction to Cohort Builder
The FAIR Cohort Builder gives researchers the ability to qualify a dataset before submitting a data access request, to determine if there is a subset of records within a dataset that meet the requirements for their project. We call this subset a cohort.
The Cohort Builder provides users with a query builder and dynamic visualisations to explore datasets and understand the shape of the data, without giving users access to record level data. The following articles provide:
- An overview of Cohort Builder features
- An example of how to build a cohort
Cohort Builder Overview
Accessing Cohort Builder
Users can access Cohort Builder in two ways:
- By selecting New Cohort from the Cohorts drop down on the menu bar
- Directly from a dataset via the Cohort tab
Users accessing via the Cohorts drop down must select a dataset and a dictionary before launching Cohort Builder.
Cohort Builder UI
The Cohort Builder UI has two main components:
- Query builder
- Visualisations
Query Builder
The query builder allows users to:
- Create multi-clause queries
- Query across multiple dictionaries in the same dataset
- Create multiple sub-cohorts to compare
Visualisations:
Users do not need to apply any filters in the query builder to use the visualisations, and can create the following chart types:
- Bar chart
- Stacked bar chart
- Box plot
- Histogram
Users can toggle between the visualisations by using the drop down menus at the top of the display window.
In addition, users can download their visualisations in a number of file types.
Saving and Sharing Cohorts
To share or request a cohort, users must save it using the Save option at the top of the cohort tab.
When saving a cohort, users must give the cohort a name, and choose a level of visibility.
A cohort can be:
- Private - only visible to the creator and other users they share with
- Internal - visible to all logged in users
Unlike datasets, cohorts cannot be made public.
Saved cohorts can be accessed via My Cohorts on the the Cohorts dropdown. The My Cohorts page displays all cohorts that a user has access to, including those created by other users:
Users can search and filter the cohort list using the filter option on the top right of the My Cohorts tab
Requesting and Transferring Cohorts
The process for requesting a cohort is the same as a dataset, and users can request any cohort they have access to.
To request a cohort, users need to open it and choose the Request option from the cohort screen.
When a cohort request is submitted it contains a link to the cohort, so that the approver can review the cohort in detail. A link to the cohort is also included on the data transfer receipt.
Building a Cohort
This article steps users through the process of creating a cohort in Cohort Builder.
In this example the dataset has two dictionaries:
- Participant Profile
- Comorbidities
These will be used to demonstrate the following steps:
- Using the visualisation tool
- Building a single clause query
- Building a multi clause query
- Adding a dictionary to a cohort
- Querying across dictionaries
- Duplicating and comparing cohorts
Using the visualisation tool
Upon opening the cohort screen the user can immediately use the visualisation tool to preview the data held in the dataset.
In this example, the user wants to know how many participants in the study have a family history of dementia. To do this they select Bar Chart from the chart's Select a Visualisation drop down and then Family History from the Choose a Field drop down:
This creates the following chart:
The user can toggle between the different charts before building a query. For example, a box plot comparing the the number of years in education vs family history of dementia:
Building a Single Clause Query
Now that the user has some understanding of the data, they want to start building their query, which will allow them remove participants from the cohort who do not meet their criteria.
In this example the user wants participants who have both:
- a family history of dementia
- 12 or more years of education
Above you can see the following:
- The user has created a single clause query which requires subjects to have a family history of dementia AND 12 or more years of education
- This has reduced the number of participants to 643 from a possible 2097
Applying this filter has also updated the visualisation to reflect the updated query:
Building a multi-clause query
In the example above the user created a cohort where the subjects have both a family history of dementia and 12 or more years of education.
If they want a cohort which contained all participants who had a family history of dementia OR 12 or more years of education then a multi-clause query is required:
Above you can the see the following:
- The user has created a multi-clause query that contains users who have either a family history of dementia OR 12 or more years of education
- There are now three counts, one for each clause and a top level cohort count of 1706 which includes the total number of participants who meet at least one of the criteria. Participants who meet both criteria are only counted once in the top level count.
Adding another dictionary to a cohort
The examples above are queries on a single dictionary, "Participant Profile". If the user wants to explore the "Comorbidities" dictionary they need to add it to the query.
To do this they need to choose the dictionary from the Add table drop down on the top right of the query builder.
This adds the "Comorbidities" dictionary to the query builder:
The user can now use the visualisation tool to explore the "Comorbidities" dictionaries.
Querying Across Dictionaries
By following the steps above the user is able to profile the data in both dictionaries independently. However, Cohort Builder also allows users to query across dictionaries.
The query builder has two filters which allow the user to compare data in different dictionaries:
- Includes
- Excludes
In the example below the user is comparing the values in the participant ID columns of the "Participant Profile" and "Comorbidities" dictionaries, and choosing to include any IDs in their cohort that appear in both dictionaries:
Exclude performs the opposite function, excluding any matching IDs from the cohort.
The user can employ the Include and Exclude filters to query any fields which contain the same data type e.g. text fields can be compared with text fields, integer fields with integer fields.
Duplicating and Comparing Cohorts
The user can also easily duplicate their cohort by selecting Duplicate from the Cohort drop down
This will create a duplicate of the existing cohort, and the visualisation will update to display both cohorts:
The filters in each cohort can be changed independently, allowing the user to easily visualise the impact on the cohort of any changes to the query.