851401 Introduction to statistical learning with R (in English)

Lecture and exercise
Lecturer (contributors)
Melcher, Michael
Offered in semester
Winter semester 2020/21
Languages of instruction
English

We are currently living in an age of (big) data: every minute, about 300 hours of video are uploaded to YouTube, 350,000 tweets are posted on Twitter and about 200,000 photos are uploaded to Facebook, and the sensor systems at CERN may produce 1 petabyte of data per second. Data without evaluation and interpretation, however, is meaningless.
This lecture deals with two important fields in the study of data - supervised and unsupervised (statistical) learning problems. In the former case one wishes to predict a certain quantity (response) on the basis of measurements of other variables (predictors); in the latter (unsupervised) case, the aim is simply to find structure (e.g. groups) in the data.
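As a minimal illustration of the distinction between supervised and unsupervised learning, the following R sketch uses the built-in `mtcars` data set (an illustrative choice, not taken from the course material). A linear model predicts the response `mpg` from the predictors `wt` and `hp` (supervised), while `kmeans()` looks for groups among the cars using only `wt` and `hp`, with no response involved (unsupervised). The choice of variables, the hypothetical new car and the number of clusters (two) are assumptions made for this example.

```r
# Illustrative data: the built-in mtcars data set (assumed example, not course data)
data(mtcars)

# Supervised learning: predict the response mpg from the predictors wt and hp
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)  # intercept plus one slope per predictor

# Prediction for a hypothetical car (wt in 1000 lbs, hp in horsepower)
new_car <- data.frame(wt = 3, hp = 150)
pred <- predict(fit, newdata = new_car)
pred

# Unsupervised learning: ignore mpg and look for groups among the cars.
# wt and hp are standardised so that both variables contribute equally.
set.seed(1)
X <- scale(mtcars[, c("wt", "hp")])
km <- kmeans(X, centers = 2, nstart = 20)  # 2 clusters, 20 random starts
table(km$cluster)  # number of cars in each cluster
```

Note that the number of clusters is chosen by the analyst; k-means itself does not determine it.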

The emphasis of the lecture will be on practical applications with a minimum of (mathematical/statistical) theory. The course is organized as follows:

1. Introduction to statistical learning: classification versus regression, responses/predictors, supervised and unsupervised learning; literature on statistical learning and the software environment R
2. (A very short) introduction to R - installation, usage with the IDE RStudio, data types, functions, graphics, statistical distributions, data import and export
3. Linear Models: simple and multiple linear regression, regression diagnostics, methods of variable selection, factors or polynomials as regressors, robust regression
4. Shrinkage Methods: Ridge regression, least absolute shrinkage and selection operator (LASSO), least angle regression (LARS), elastic net
5. Classification: Logistic regression, linear and quadratic discriminant analysis (LDA/QDA)
6. Regression and Classification Trees
7. Unsupervised Learning: Principal Component Analysis (PCA), Hierarchical and K-Means Clustering
8. Validation Methods & Model Selection
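Two topics of the outline, logistic regression (item 5) and validation (item 8), can be combined in one small workflow. The sketch below is a hedged illustration, not course material: it simulates a binary response from an assumed logistic model, holds out 30% of the observations as a test set (the validation set approach), fits `glm(..., family = binomial)` on the remaining data and estimates the misclassification rate on the held-out observations. The simulated model, the 30% split and the 0.5 classification threshold are all assumptions.

```r
# Simulated data (assumed model): one predictor x, binary response y
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
dat <- data.frame(x = x, y = y)

# Validation set approach: hold out 30% of the observations for testing
test_idx <- sample(n, size = round(0.3 * n))
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Logistic regression fitted on the training data only
fit <- glm(y ~ x, family = binomial, data = train)

# Predicted probabilities for the test data, classified with a 0.5 threshold
p_hat <- predict(fit, newdata = test, type = "response")
y_hat <- as.integer(p_hat > 0.5)

# Test misclassification rate: an estimate of the error on new data
test_error <- mean(y_hat != test$y)
test_error
```

Because the test observations play no role in fitting, `test_error` gives a more honest estimate of predictive performance than the error on the training data; cross-validation repeats this idea over several splits.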

Prerequisites (expected knowledge)

Basic knowledge of statistics and, ideally, of the statistical programming language/environment R is expected. Attending one of the introductory courses 851309 ("Statistics with R") or 851013 ("Statistik mit R") and the advanced course 851321 ("Programmieren mit R") is recommended but not required.


Students should know the most important methods of supervised and unsupervised learning, along with their advantages and limitations, and be able to apply these methods to real-world problems and data using the statistical programming language R.
Further information on the course, such as dates or examination details, can be found on the course page in BOKUonline.