851401 Introduction to statistical learning with R

Lecture and exercise
Semester hours
Lecturer (assistant)
Offered in
Winter semester 2022/23
Languages of instruction


We live in an age of (big) data: every minute, for example, 300 hours of video are uploaded to YouTube, 350,000 tweets are posted on Twitter, and about 200,000 photos are uploaded to Facebook; the sensor systems at CERN may produce 1 petabyte of data per second. Data without evaluation and interpretation, however, is of little value.
This lecture deals with two important fields in the study of data: supervised and unsupervised (statistical) learning. In the supervised case, one wishes to predict a certain quantity (the response) on the basis of measurements of other variables (the predictors); in the unsupervised case, there is no response, and the aim is simply to find structure (e.g. groups) in the data.
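
The distinction can be illustrated with a minimal sketch in R. It is not taken from the course material: the built-in data sets `cars` and `iris`, the linear model, and k-means with three clusters are assumptions chosen here purely for illustration.

```r
# Supervised learning: predict a response (stopping distance)
# from a predictor (speed) using the built-in 'cars' data set.
fit <- lm(dist ~ speed, data = cars)
pred <- predict(fit, newdata = data.frame(speed = 15))
print(pred)

# Unsupervised learning: look for groups in the four numeric
# columns of 'iris' without using the species labels.
set.seed(1)  # k-means starts from random centres
groups <- kmeans(iris[, 1:4], centers = 3)
print(table(groups$cluster))
```

In the first part a response variable guides the fit; in the second, only the measurements themselves are used.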

The emphasis of the lecture will be on practical applications, with a minimum of (mathematical/statistical) theory. Topics:

- Introduction to statistical learning: classification versus regression, responses/predictors, supervised and unsupervised learning; literature on statistical learning and the software environment R
- Shrinkage Methods: Ridge regression, least absolute shrinkage and selection operator (LASSO), least angle regression (LARS), elastic net
- Regression and Classification Trees
- Validation Methods & Model Selection
- Support Vector Machines
- Neural Networks

Previous knowledge expected

Basic knowledge of statistics and, ideally, of the statistical programming language/environment R is expected. It is recommended (but not required) to attend one of the introductory lectures 851309 ("Statistics with R") or 851402 ("Statistik mit R") and the advanced course 851321 ("Programmieren mit R"). A minimum level of R knowledge can be obtained in the block course First Steps with R (851016).

Objective (expected results of study and acquired competences)

Students should know the most important methods of supervised and unsupervised learning, understand their advantages and disadvantages/limitations, and be able to apply these methods to real-world problems/data using the statistical programming language R.
More details, such as the schedule and information about exams, are available on the course page in BOKUonline.