Data Management and Cleaning for Analysis with R (STAN 106)

Course content

This free, self-paced online course provides an introduction to data management and cleaning for analysis using R software. Each of the four modules includes a PowerPoint slide deck, training data, R code and associated exercises for practice.

Topics covered include:

  • Introduction and theory of data cleaning and management
  • Getting started with R software
  • Subsetting variables and data cleaning
  • Creating variables, subsetting observations and data cleaning
  • Merging, joining and reshaping data

Course format

The self-paced course includes 4 modules. After reviewing each module's slide deck, you can practice using R software with the training data set and R code. You can do this either by downloading the course resources and using R software on your own computer or by accessing Population Data BC’s Remote Training Lab (RTL).

Training time

Each module can be reviewed in approximately 30 minutes, for a total training time of approximately 2 hours. Additional practice time with R software, the training data set and related exercises will depend on your individual needs.

You may wish to complete the modules all together or as separate training sessions over a period of several days or weeks to best fit your schedule or learning preferences.

Course Developer

Megan Striha currently works as a Data Analyst. She has a Master of Public Health degree and three years of experience in health data analysis, including work with survey, administrative and census data.

Access fee

Access to this training resource is free. If you do not already have a my.popdata account, you will need to sign up and create one.

Once you have a my.popdata account, go to the Education & Training section of the my.popdata site. You can then log in with your PopData account username and passphrase and self-enroll to access the training resource.

Page last revised: December 19, 2018