Webinar Series - Introduction to Data Science

Presented by Population Data BC and the Canadian Urban Environmental Health Research Consortium. This FREE webinar series aims to highlight the value of specific data science methods and techniques for health and environmental research.


Overview

Data science is concerned with analyzing and reporting on many kinds of data, including structured data stored in organizational databases and unstructured data, which is often text-rich and not collected according to a particular data model. Work in this field requires specialized techniques and tools that draw on both statistical and computational methods to address complex real-world problems, and it employs multidisciplinary analytics to derive knowledge from large sources of data (big data).

This Data Science webinar series provides an introduction to this rapidly growing field, with a particular focus on machine learning methods and analytic techniques that can serve the needs of health and environmental researchers working to understand trends in society, health and human behavior.

The presentations are intended for those interested in a broad overview of basic data science analytics. The series comprises four modules, each with an introductory session and a practicum session. Each module focuses on applying specific machine learning methods and analytic techniques; general formulas are presented, but the sessions do not delve into the underlying statistical theory.

Who should attend?

The webinar series will benefit health and environmental researchers, analysts and related professionals who want an introduction to data science approaches for data analytics using R software (Python code will also be provided). 

Requirements

To benefit from the webinar presentations, registrants should have knowledge of simple and multiple linear regression models and categorical data analysis such as logistic regression.

No prior working knowledge of R or Python is required, but some familiarity with R would be beneficial for following the practicum sessions.

As a supplemental resource for this series, you may wish to review our new free online resource: Data Management and Cleaning for Analysis with R software.

Webinar format

The live, interactive GoToWebinar software will provide remote access for participants to view the instructor's screen, listen to the lecture in real time, and ask questions via the online chat function. Registrants will receive a copy of the presentation slides, training data, R and Python code, and related resources for study and practice.

Module format

Each module will include two live webinar sessions:

  • Session 1: A one-hour introductory presentation with question period
  • Session 2: A two-hour practicum session that includes a focus on applied analytics using training data and code provided to all registrants.

Dates/Times

  • All introductory sessions will run from 11:00 am to 12:00 noon PST
  • All practicum sessions will run from 11:00 am to 1:00 pm PST

Presenter

Aman Verma is a Data Engineer with a PhD in Epidemiology from McGill University, and an undergraduate degree in Computer Science. He has experience in developing machine learning systems with large databases, particularly for scientific data in healthcare. While he’s comfortable learning any programming language, he’s recently become particularly interested in R. Aman is currently involved in a number of projects, including measuring how following opioid prescription guidelines can decrease the risk of opioid overdose, modelling trajectories of chronic obstructive pulmonary disease, and assessing how to best prioritize ambulance calls using secondary healthcare data.

Seminar schedule

Click on the session date to view the archived recording of the session.

Module 1: Introduction to Machine Learning

  • What is machine learning?
  • Supervised vs unsupervised learning
  • Model- and kernel-based methods
  • Measures of accuracy (train/test split and cross-validation)
  • Causality and accuracy
  • Unsupervised learning as feature reduction

Session 1: January 15, 2019
Session 2: January 17, 2019
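
As a small taste of the "measures of accuracy" topic in Module 1, the sketch below estimates out-of-sample error with k-fold cross-validation. It is not taken from the webinar materials: the function names (fit_line, k_fold_indices, cross_validated_mse) and the toy data are illustrative assumptions, and a single-predictor least-squares line stands in for whatever model the sessions actually use.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Average held-out squared error across k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # Fit only on rows outside the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        # Score only on the held-out rows.
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical noise-free toy data, so the held-out error is essentially zero.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_mse = cross_validated_mse(xs, ys, k=5)
```

Each row is scored by a model that never saw it during fitting, which is what separates this estimate from in-sample accuracy.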

Module 2: Regression and Regularization Algorithms

  • Regression with many correlated variables
  • Automatic variable selection, early approaches and problems
  • Gradient descent
  • Regularization (L1 vs L2 vs Elastic Net)

Session 1: January 29, 2019
Session 2: January 31, 2019
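
To illustrate the L1 vs L2 contrast listed in Module 2, the sketch below shows how each penalty shrinks a single coefficient. It assumes the simplified one-coefficient problem of minimizing 0.5 * (b - z)^2 plus a penalty, where z is the unpenalized estimate; the function names are hypothetical and this is not the webinar's own code.

```python
def soft_threshold(z, lam):
    """L1 (lasso) solution: shift toward zero by lam, and clip to exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """L2 (ridge) solution for penalty 0.5 * lam * b^2: proportional shrinkage."""
    return z / (1.0 + lam)

def elastic_net_shrink(z, l1, l2):
    """Elastic net combines both: soft-threshold first, then rescale."""
    return soft_threshold(z, l1) / (1.0 + l2)

small, large = 0.3, 2.0
lasso_small = soft_threshold(small, 0.5)   # small coefficient is set to zero
ridge_small = ridge_shrink(small, 0.5)     # small coefficient shrinks but stays nonzero
lasso_large = soft_threshold(large, 0.5)
ridge_large = ridge_shrink(large, 0.5)
```

Under these assumptions, the L1 penalty can remove a variable entirely, while the L2 penalty only shrinks it, which is why L1 is often associated with variable selection.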

Module 3: Advanced Supervised Learning 

  • Decision trees
  • Problems with overfitting
  • Random forests
  • Out-of-bag error vs cross-validation

Session 1: February 12, 2019
Session 2: February 14, 2019
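
The out-of-bag idea from Module 3 rests on a simple fact: a bootstrap sample drawn with replacement leaves some rows out, and those rows can serve as a built-in test set for the tree grown on that sample. The sketch below shows only the sampling step, with a hypothetical function name and row count; it does not fit any trees.

```python
import random

def bootstrap_split(n_rows, seed=0):
    """Draw n_rows indices with replacement; return in-bag and out-of-bag rows."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows never drawn are "out of bag" for this bootstrap sample.
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

n_rows = 1000  # hypothetical dataset size
in_bag, out_of_bag = bootstrap_split(n_rows)
# For large n, roughly exp(-1), about 37%, of rows are expected to be out of bag.
oob_fraction = len(out_of_bag) / n_rows
```

In a random forest, each tree would be evaluated on its own out-of-bag rows, giving an error estimate without a separate cross-validation loop.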

Module 4: Advanced Unsupervised Learning 

  • Who uses unsupervised learning?
  • K-means
  • Expectation-maximization
  • Susceptibility to outliers
  • Dangers of labeling clusters

Session 1: February 26, 2019
Session 2: February 28, 2019
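
The K-means and "susceptibility to outliers" topics in Module 4 can be seen together in a one-dimensional toy example. The sketch below is a minimal version of Lloyd's algorithm with fixed starting centers; the function name, data and starting values are illustrative assumptions rather than material from the sessions.

```python
def k_means_1d(values, centers, iterations=20):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers

values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # two well-separated groups
clean_centers = k_means_1d(values, centers=[0.0, 6.0])

# Adding one extreme value changes the picture entirely.
outlier_centers = k_means_1d(values + [100.0], centers=[0.0, 6.0])
```

Without the outlier, the centers settle near 1 and 5. With it, one center is pulled out to the outlier and the two real groups are merged under the other center, near 3.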

Post-webinar resources

Data Management and Cleaning for Analysis with R

This free, self-paced online course provides an introduction to data management and cleaning for analysis using R software. Each of the four modules includes a PowerPoint slide deck, training data, R code and associated exercises for practice.

Topics covered include:

  • Introduction and theory of data cleaning and management
  • Getting started with R software
  • Subsetting variables and data cleaning
  • Creating variables, subsetting observations and data cleaning
  • Merging, joining and reshaping data

To access this resource, please create a Population Data BC account here: https://my.popdata.bc.ca/accounts/register/

Once your account has been approved, you will be able to access the Education and Training site and self-enrol in this and other free online courses.

Did you miss the live sessions?

All webinars have been recorded and posted on PopData's YouTube channel, and they are linked on the Canadian Urban Environmental Health Research Consortium (CANUE) website for future reference. Once you have viewed the recorded sessions, please take a few minutes to tell us what you think. Your feedback will help us develop future webinars.

Additional training resources for each webinar session can be accessed by enrolling in the free Intro to Data Science online course. To access this resource, please create a Population Data BC account here: https://my.popdata.bc.ca/accounts/register/

Once your account has been approved, you will be able to access the Education and Training site and self-enrol in this and other free online courses.

We'd love to hear about your additional training interests! 

Please take a few minutes to complete our survey.


Page last revised: November 14, 2019