851401 Introduction to statistical learning with R (in Eng.)


Type
Lecture and exercise
Semester hours
2
Lecturer (assistant)
Organisation
Offered in semester
Winter semester 2022/23
Languages of instruction
English

Course content

We are currently living in an age of (big) data: every minute, for example, 300 hours of video are uploaded to YouTube, 350,000 tweets are posted on Twitter and about 200,000 photos are uploaded to Facebook, and the sensor systems at CERN may produce 1 petabyte of data per second. Data without evaluation/interpretation, however, is meaningless.
This lecture deals with two important fields in the study of data - supervised and unsupervised (statistical) learning problems. In the former case one wishes to predict a certain quantity (response) on the basis of measurements of other variables (predictors); in the latter (unsupervised) case, the aim is simply to find structure (e.g. groups) in the data.
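
The distinction between the two settings can be illustrated with a minimal sketch in base R. The data sets (`cars`, `iris`), the choice of a linear model and of k-means, and the number of clusters (3) are assumptions made purely for illustration; they are not taken from the course material.

```r
# Supervised learning: predict a response from a predictor.
# 'cars' ships with base R: stopping distance (dist) and speed.
fit <- lm(dist ~ speed, data = cars)
new_obs <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_obs)    # predicted stopping distances

# Unsupervised learning: no response, the aim is only to find groups.
# Use the four 'iris' measurements and ignore the Species label.
set.seed(1)                                # k-means starts from random centres
x <- scale(iris[, 1:4])                    # standardise the measurements
km <- kmeans(x, centers = 3, nstart = 20)  # 3 groups is an assumed choice
groups <- km$cluster                       # cluster label per flower
table(groups)                              # group sizes
```

In the first part a known response (`dist`) guides the fit; in the second part the algorithm sees only the measurements and returns a grouping that still has to be interpreted.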

The emphasis of the lecture will be on practical applications with a minimum of (mathematical/statistical) theory. Topics:

- Introduction to statistical learning: classification versus regression, responses/predictors, supervised and unsupervised learning; literature on statistical learning and the software environment R
- Shrinkage Methods: Ridge regression, least absolute shrinkage and selection operator (LASSO), least angle regression (LARS), elastic net
- Regression and Classification Trees
- Validation Methods & Model Selection
- Support Vector Machines
- Neural Networks

Prerequisites (expected knowledge)

Basic knowledge of statistics and, ideally, of the statistical programming language/environment R is expected. It is recommended (but not required) to attend one of the introductory lectures 851309 ("Statistics with R") or 851013 ("Statistik mit R") and the advanced course 851321 ("Programmieren mit R"). A minimum level of R knowledge can be obtained in the block course First Steps with R (851016).

Learning objective

Students should know the most important methods of supervised and unsupervised learning, understand their advantages and disadvantages/limitations, and be able to apply these methods to real-world problems/data using the statistical programming language R.
Further information on the course, such as dates or details on examinations, can be found on the course page in BOKUonline.