Introduction to Data Science (STAN107)


Data Science is concerned with analyzing and reporting on many kinds of data, including structured data stored in organizational databases and unstructured data that is often text-rich and not collected according to a particular data model. Work in this field requires specialized techniques and tools that draw on both statistical and computational methods to address complex real-world problems, and it employs multidisciplinary analytics to derive knowledge from large sources of data (big data).

The following Data Science modules will provide an introduction to this rapidly growing field with a particular focus on machine learning methods and analytic techniques that can serve the needs of health and environmental researchers working to understand trends in society, health and human behavior.  

This free, self-paced online course will provide you with an introduction to Data Science using R software.

The course includes four modules, each with an introductory session and a practicum session. Each module focuses on applying specific machine learning methods and analytic techniques; general formulas are presented, but the modules do not delve into the underlying statistical theory.

Each of the four modules includes PowerPoint slide decks, training data, R and Python code, and associated references for further study.

Topics include:

  • Introduction to Machine Learning
  • Regression and Regularization Algorithms
  • Advanced Supervised Learning 
  • Advanced Unsupervised Learning 

Course format

The self-paced course includes four modules. After reviewing each module's slide deck and webinar recordings, you can practice using R software with the training data set and R code. You can do this either by downloading the course resources and using R software on your own computer or by accessing Population Data BC’s Remote Training Lab (RTL).

Training time

The first session of each module can be reviewed in approximately 30 minutes. The second session of each module can be viewed in 2 hours. How much additional time you spend practicing with R software, the training data and related analytic activities depends on your individual needs.

You may wish to complete the modules all together or as separate training sessions over a period of several days or weeks to best fit your schedule or learning preferences.

Access fee

Access to this course is free. Go to the my.popdata site and, if you do not already have a my.popdata account, sign up to create one.

Once you have a my.popdata account, go to the Education & Training section of the my.popdata site. You can then log in with your PopData account username and passphrase and self-enroll to access the course.

Course presenter

Aman Verma is a Data Engineer with a PhD in Epidemiology from McGill University and an undergraduate degree in Computer Science. He has experience in developing machine learning systems with large databases, particularly for scientific data in healthcare. While he’s comfortable learning any programming language, he’s recently become particularly interested in R. Aman is currently involved in a number of projects, including measuring how following opioid prescription guidelines can decrease the risk of opioid overdose, modelling trajectories of chronic obstructive pulmonary disease, and assessing how to best prioritize ambulance calls using secondary healthcare data.