Using ehrQL in OpenSAFELY projects
This page describes how ehrQL fits in with a full OpenSAFELY project.
In one sentence:
Researchers develop an ehrQL query and analysis code on their own computers using dummy tables, then submit it to the OpenSAFELY jobs site to run against real tables in an OpenSAFELY backend.
Project workflow summary
The workflow for a single study using ehrQL is much like that for existing studies that use cohort-extractor.
In summary:
- Create a Git repository from the template repository provided and clone it on your local machine.
- Write a dataset definition in ehrQL that specifies what data you want to extract from the database. Only this step is specific to ehrQL.
- Develop analysis scripts in R, Stata, or Python to process and analyse the dummy datasets created by ehrQL.
- Test the code by running the analysis steps specified in the project pipeline.
- Execute the analysis on the real tables via OpenSAFELY's jobs site. This will generate outputs on the secure server.
- Check the outputs for disclosure risk within the server, and redact them if necessary.
- Release the outputs on the jobs site.
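As an illustration of the analysis-script step above, here is a minimal Python sketch that summarises a dummy dataset. It assumes the dataset is a CSV file with one row per patient; the `age` and `sex` columns, the inline sample data, and the `summarise` helper are hypothetical and stand in for whatever your own dataset definition produces.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a dummy dataset file produced by an ehrQL action.
# The age and sex columns are assumptions made for this illustration.
DUMMY_DATASET_CSV = """patient_id,age,sex
1,34,female
2,71,male
3,58,female
4,45,male
"""


def summarise(csv_file):
    """Count patients by sex and compute the mean age."""
    rows = list(csv.DictReader(csv_file))
    counts = Counter(row["sex"] for row in rows)
    mean_age = sum(int(row["age"]) for row in rows) / len(rows)
    return {"n": len(rows), "by_sex": dict(counts), "mean_age": mean_age}


# In a real project the script would open the file written by the ehrQL
# action instead of this in-memory sample.
summary = summarise(io.StringIO(DUMMY_DATASET_CSV))
print(summary)
```

Because the script only depends on the shape of the file, the same code can later run unchanged against the real dataset on the secure server.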
Dummy datasets
Because OpenSAFELY doesn't allow researchers direct access to patient data, researchers must use dummy datasets to develop their analysis code on their own computers.
When an ehrQL action is executed on a researcher's computer (see Running ehrQL), ehrQL can generate dummy datasets based on the properties of the tables used in the dataset definition. Alternatively, users can provide their own dummy tables.
This allows the dataset definition to be checked for errors, and produces dummy datasets that can be used to test downstream actions that depend on the output of the ehrQL action.
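For users who provide their own dummy tables, the following sketch shows one way to prepare such a table in Python. It assumes that a dummy table is supplied as a CSV file named after the table, with a `patient_id` column; the `write_dummy_table` helper, the `patients` table name, and the column values are hypothetical, so check the ehrQL documentation for the exact format it expects.

```python
import csv
import tempfile
from pathlib import Path


def write_dummy_table(directory, table_name, rows):
    """Write one dummy table as <directory>/<table_name>.csv and return its path."""
    path = Path(directory) / f"{table_name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical rows: the column names and values are assumptions for illustration.
patients_rows = [
    {"patient_id": 1, "date_of_birth": "1980-05-01", "sex": "female"},
    {"patient_id": 2, "date_of_birth": "1955-11-01", "sex": "male"},
]

# A temporary directory keeps the sketch self-contained; a real project
# would write to a folder inside the study repository.
with tempfile.TemporaryDirectory() as tmp:
    table_path = write_dummy_table(tmp, "patients", patients_rows)
    written = table_path.read_text().splitlines()

print(written[0])
```

Hand-written tables like this are useful when the generated dummy data does not exercise a particular branch of the analysis code.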
Real tables
Executing a dataset definition against real tables in an OpenSAFELY backend involves running the study on the OpenSAFELY jobs site. More information about the jobs site and how to run a study can be found in the OpenSAFELY documentation.