Workshop: Introduction to Machine Learning

UC Davis DataLab
Spring 2024
Instructor: Nick Ulle
Maintainer: Nick Ulle <naulle@ucdavis.edu>

This workshop provides an overview of contemporary machine learning methods. We'll cover important terminology and popular methods so that you can determine whether machine learning is relevant to your research and what to learn more about if it is. This is a concept-focused, non-technical workshop. No laptops needed.

After this workshop, learners should be able to:

  • Define the following terms: observation, feature, machine learning, supervised learning, unsupervised learning, regression, classification, clustering, training set, validation set, test set, cross-validation, overfitting, underfitting, model bias, model variance, bias-variance tradeoff, ensemble model;
  • Explain the difference between supervised and unsupervised learning;
  • Explain the difference between regression and classification;
  • List and briefly describe popular machine learning methods;
  • Give an example of an ensemble model;
  • Explain what cross-validation is used for and give an overview of the procedure;
  • Assess whether and which machine learning methods might be helpful for a given research problem.

This two-part workshop series provides an introduction to using R for two popular machine learning techniques: clustering and classification. 

Clustering involves identifying groups of similar observations (called clusters) within data. Clustering can be an effective tool for finding patterns and an important part of exploratory data analysis. Classification refers to modeling categorical variables. Classification models can provide insight into the relationship between the predictors and response, as well as a way to make predictions about new observations.
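As a taste of the first session's material, here is a minimal clustering sketch using only base R: k-means applied to the built-in iris measurements. The choice of k-means, the iris data, and three clusters are illustrative assumptions, not the workshop's specific examples.

```r
# Minimal k-means sketch in base R (illustrative; not the workshop's exact example).
features <- scale(iris[, 1:4])   # numeric measurements, standardized so no feature dominates
set.seed(42)                     # k-means starts from random centers; fix the seed
fit <- kmeans(features, centers = 3, nstart = 25)  # 3 clusters is an assumption

# Compare discovered clusters to the known species labels
table(cluster = fit$cluster, species = iris$Species)
```

Standardizing the features first matters because k-means measures distance in raw units, so an unscaled feature with a large range would dominate the clustering.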

In the first session, we'll begin with the advantages and disadvantages of several popular clustering algorithms, and work through examples of how to run them in R. In the second session, we'll provide an overview of popular classification models, and then delve into the details of actually using them. We'll cover how to choose a model, how to partition data into training and test sets, how to use cross-validation to tune model hyperparameters, and how to evaluate the performance of models in R. We'll also explain some strategies you can use to improve model performance. This series concludes with a brief discussion of the machine learning landscape and how you can continue to learn more about machine learning and its application to your research.
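The cross-validation procedure mentioned above can be sketched in a few lines of base R. This is a hedged illustration, not the workshop's exact code: logistic regression (`glm`) stands in for whichever classifier you choose, the two-species iris subset is a stand-in data set, and 5 folds is an arbitrary choice.

```r
# Sketch of 5-fold cross-validation for a classifier, using only base R.
# Logistic regression (glm) is an illustrative stand-in for any model.
data <- subset(iris, Species != "setosa")           # reduce to a binary problem
data$y <- as.integer(data$Species == "virginica")

set.seed(42)
folds <- sample(rep(1:5, length.out = nrow(data)))  # random fold assignment

acc <- sapply(1:5, function(k) {
  train <- data[folds != k, ]                       # fit on 4 folds...
  test  <- data[folds == k, ]                       # ...evaluate on the held-out fold
  model <- glm(y ~ Sepal.Length + Sepal.Width, data = train, family = binomial)
  pred  <- predict(model, test, type = "response") > 0.5
  mean(pred == test$y)                              # fold accuracy
})

mean(acc)  # cross-validated accuracy estimate
```

Averaging accuracy over folds gives a more stable performance estimate than a single train/test split, because every observation is used for evaluation exactly once.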

After this workshop series, learners should be able to:

  • Assess whether classification or clustering is relevant to their research problems and data sets;
  • Explain the tradeoffs between popular clustering algorithms;
  • Run a clustering algorithm on their data;
  • Build and train a classification model on their data;
  • Use cross-validation to estimate accuracy and tune hyperparameters for classification models;
  • Identify strategies to improve results from classification models.

Contributing

The course reader is a live webpage, hosted through GitHub, where you can enter curriculum content and post it to a public-facing site for learners.