Data Management and Cleaning for Analysis with R (STAN106)

Course content

This self-paced, free online course provides an introduction to data management and cleaning for analysis using R software. Each of the four modules includes a PowerPoint slide deck, training data, R code and associated exercises for practice.

Topics covered include:

  • Introduction and theory of data cleaning and management
  • Getting started with R software
  • Subsetting variables and data cleaning
  • Creating variables, subsetting observations and data cleaning
  • Merging, joining and reshaping data

Course format

The self-paced course includes 4 modules. After reviewing each module's slide deck, you can practice using R software with the training data set and R code. You can do this either by downloading the course resources and using R software on your own computer or by accessing Population Data BC’s Remote Training Lab (RTL).

Training time

Each module can be reviewed in approximately 30 minutes, for a total training time of approximately 2 hours. Additional practice time with R software, the training data set and the related exercises depends on your individual needs.

You may wish to complete the modules all together or as separate training sessions over a period of several days or weeks to best fit your schedule or learning preferences.

Access fee

Access to this guide is free. Go to: and, if you do not already have a my.popdata account, you will need to sign up and create one.

Once you have a my.popdata account, go to the Education & Training section of the my.popdata site at You can then log in with your PopData account username and passphrase and self-enroll to access the guide/course.

Course developer

Megan Striha currently works as a Data Analyst. She has a Master of Public Health degree and three years of experience in health data analysis, including work with survey, administrative and census data.