432064 VU Aktuelle Entwicklungen in Wirtschaft und Gesellschaft: Maschinelles Lernen für Prognosen und Kausalanalysen

Winter semester 2023/2024 | As of: 12.09.2023
VU 2
3
Block
annually
English

The course provides an in-depth understanding of the foundations, scope and approaches of machine learning for prediction and causal analysis, with a focus on applications to problems in the social sciences and economics. Starting from the basics of linear regression, which underlies many machine learning models, the course introduces students to high-dimensional predictive and causal problems. In particular, it conveys the basic ideas and intuition behind modern machine learning methods as well as an understanding of how, why, and when they work in practice. Students will not only gain a deep understanding of the foundational aspects of machine learning but also acquire the practical skills needed to apply it successfully to real-world problems.

For example, we could download historical weather data to estimate the chance that air pollution levels will be dangerously high, given the conditions we expect tomorrow. That is prediction, but it is usually not what we care about in economic or social science applications, where the questions are often causal. We do not just want to know what would usually happen in a given situation; we want to understand counterfactuals, that is, what would happen if we changed the system. For instance, we could ask not only whether tomorrow might have high pollution levels, but also whether we can change this by, say, reducing the number of cars on the road. In other words, we are most interested in interventions we might take to improve the situation. As we will see many times in the course, naively applying machine learning methods designed for prediction to causal tasks does not work well. We will also see that, if done right, machine learning methods can be usefully adapted to causal tasks and can be very helpful to the analyst. Thus, the goal of this course is to teach students how ideas from machine learning can be used not only for prediction but also for the study of what-if (that is, causal) questions.
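To make the pollution example concrete, here is a minimal simulation sketch. It is not course material, and every number in it is an assumption chosen for illustration: a hypothetical `weather` variable drives both the number of cars on the road and pollution, and the true effect of cars on pollution is fixed at 1.0. The course exercises use R; the sketch uses plain Python only to stay free of dependencies. A naive regression of pollution on cars mixes the causal effect with the influence of weather, whereas first removing the part of both variables explained by weather (partialling out) gives a slope close to the assumed effect.

```python
import random


def slope(x, y):
    """OLS slope of y on x in a regression with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def residuals(x, y):
    """Residuals of y after an OLS regression on x (with intercept)."""
    b = slope(x, y)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=1.0, seed=0):
    """Hypothetical data: weather confounds the cars -> pollution relation."""
    rng = random.Random(seed)
    weather = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: bad weather puts more cars on the road ...
    cars = [w + rng.gauss(0, 1) for w in weather]
    # ... and also raises pollution directly (coefficient 2.0 is made up).
    pollution = [
        true_effect * c + 2.0 * w + rng.gauss(0, 1)
        for c, w in zip(cars, weather)
    ]
    # Naive approach: regress pollution on cars and ignore weather.
    naive = slope(cars, pollution)
    # Adjusted approach: partial weather out of both variables first.
    adjusted = slope(residuals(weather, cars), residuals(weather, pollution))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"naive slope:    {naive:.2f}")
    print(f"adjusted slope: {adjusted:.2f}")
```

Under these assumed coefficients the naive slope converges to 2.0 rather than 1.0, so a purely predictive fit would overstate what removing cars from the road achieves, while the adjusted slope stays near the assumed effect of 1.0.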


The course will also help students make judgments and develop an in-depth, critical understanding of the scope and challenges of machine learning and data-driven analytics. Throughout the course, students will be invited to assess the strengths and weaknesses of the methods presented in class. The difference between predicting observable outcomes and estimating causal effects, which are unobservable parameters, will be discussed to make clear that correlation alone is not causation.

Finally, the course will help students improve their communication skills. It gives students the opportunity to learn how to communicate science, namely how to effectively present their ideas, findings, proposals, analysis and critical reasoning in the area of data-driven analytics. Special emphasis will be given to oral presentations and pitches in group projects, and to writing scientific papers.

The course will cover the following topics:

  • Differences between Statistics, Econometrics and Machine Learning

  • Linear Regression, Assumptions, and Flexibility

  • Machine Learning Methods for Prediction

      • Non-parametric methods: CART and Random Forests

      • Parametric methods: LASSO and other regression-based methods

  • Machine Learning Methods for Causal Analysis

      • Double/Debiased Machine Learning

      • Causal Forests

  • Applications in Social Sciences and Economics (e.g. Environmental Policy Evaluation, Crime Detection in the Shadow Economy, Drug Costs in the Health Sector)

Learning skills: This course will empower students with the capability to learn several analytical tools and to apply them to real-world problems in an independent and critical way.

The content is divided into three Modules: I, II, and III.

Learning objectives:


Module I: Predictive vs. Causal Problems

  • Distinguish prediction problems from causal problems

  • Explain why correlation is not causation

  • Describe why standard statistical methods, such as linear regression, fail in high dimensions

  • Perform data analysis in R and apply simple predictive machine learning algorithms

  • Analyze algorithms’ outputs and compare outputs across methods
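The point about linear regression in high dimensions can be illustrated with a small sketch (an illustration under assumed settings, not course material; the exercises themselves use R). The outcome and all regressors below are independent random noise, so no regressor has any true explanatory power. Still, the in-sample R² of an OLS fit climbs toward 1 as the number of regressors p approaches the sample size n. The function name `in_sample_r2` and the values n = 50, p = 2 and p = 49 are hypothetical choices, and the OLS fit is computed by Gram-Schmidt orthogonalization only to avoid external libraries.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def in_sample_r2(n, p, seed=0):
    """In-sample R^2 of OLS (with intercept) of pure noise on p noise regressors.

    Requires p <= n - 1. The fit is built one regressor at a time:
    each new regressor is centered, orthogonalized against the earlier
    ones (modified Gram-Schmidt), and then projected out of the residual.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]      # residual after the intercept
    total_ss = dot(resid, resid)         # total sum of squares
    basis = []
    for _ in range(p):
        x = [rng.gauss(0, 1) for _ in range(n)]
        mean_x = sum(x) / n
        x = [v - mean_x for v in x]
        for q in basis:                  # remove overlap with earlier regressors
            c = dot(x, q)
            x = [xi - c * qi for xi, qi in zip(x, q)]
        norm = math.sqrt(dot(x, x))
        q = [xi / norm for xi in x]
        basis.append(q)
        c = dot(resid, q)                # project the new direction out of y
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
    return 1.0 - dot(resid, resid) / total_ss


if __name__ == "__main__":
    for p in (2, 10, 25, 40, 49):
        print(f"n = 50, p = {p:2d}: in-sample R^2 = {in_sample_r2(50, p):.3f}")
```

With p = n - 1 the regressors and the intercept span the whole sample space, so the fit reproduces the noise exactly. A high in-sample fit therefore says little about out-of-sample prediction, which is the motivation for regularized methods such as LASSO.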


Module II: Causal Models and Machine Learning Methods

  • Describe how causal models fit into machine learning

  • Distinguish how to use machine learning for prediction vs. for causal effect estimation

  • Describe the justification behind double machine learning methods

  • Describe the justification behind causal forests

  • Perform causal effect analysis with machine learning in R

  • Analyze applications of causal problems in social science


Module III: Applied Project

  • Develop your own research project: examine and investigate a real-world question, and develop your answer in a team

  • Present your research project 

  • Describe the employed methods and dataset as well as formulate and argue your findings in a short written paper

The course consists of lectures complemented by practical sessions and group project work. The exercises in the course will be conducted in R. A few introductory sessions on R will be provided. However, I recommend that participants familiarize themselves with the software beforehand (more details in the Prerequisites section). Please make sure that R and RStudio are installed on your laptop. To download R, go to https://www.r-project.org/; for RStudio, go to https://www.rstudio.com/products/rstudio/download/.

The final sessions will provide an in-depth understanding of how to conduct empirical research, present research ideas, and use the available datasets.

The final exam is a group project (oral presentation + short written paper):

  • 50% oral exam (early February): presentation of an original empirical research question using the methods learned in the course (groups of two or more students).

  • 50% written paper (late February): This written article (max. 10 pages) should contain the motivation, economic and econometric mechanisms, data description, and full analysis of the empirical research question.

Moreover, there will be an optional midterm homework that will allow you to boost your grade.

In the project students are required to demonstrate that:

  • they are able to perform simple yet innovative research using data-driven analytics and causal learning techniques;

  • they are able to describe, critically and intuitively, the strengths and weaknesses of different machine learning techniques for solving concrete problems;

  • they can apply machine learning techniques to data-driven problems independently;

  • they can effectively communicate their ideas, findings, proposals, analysis and critical reasoning.

The overall assessment will take into account the level of knowledge and understanding of machine learning techniques acquired by the students; their capacity for thinking creatively, analytically, and critically; their capacity to design and evaluate solutions for concrete data-driven problems and to make reasoned judgements about them; and their capacity to present findings and conclusions effectively.

Lecture notes, research papers and course material will be made available on the e-learning platform. Inspirational readings (recommended):


The full list of references will be provided in the course. Selected references:

No previous knowledge of machine learning is required, since this is an introductory class. I expect students to have completed an undergraduate-level introduction to statistics or econometrics. The course requires basic knowledge of linear regression estimated by OLS. Having taken 198841 VU Data Analysis I: Data Analytics is not necessary but is advantageous. Prior experience with R is not a prerequisite either, but it is certainly an advantage.


I therefore recommend that participants take this course after completing 198803 VU Introduction to Programming: Programming in R (https://orawww.uibk.ac.at/public_prod/owa/lfuonline_lv.details?lvnr_id_in=198803&sprache_in=en), or after familiarizing themselves with the software using free online tools, e.g. https://www.datacamp.com/courses/free-introduction-tor (sign up and start the free course on Introduction to R) or https://swirlstats.com/


Readings: 


see Dates
Group 0
Date            Time            Location
Fri 06.10.2023  09.00 - 13.00   eLecture - online
Fri 06.10.2023  13.15 - 16.30   eLecture - online
Fri 20.10.2023  09.00 - 13.00   eLecture - online
Fri 20.10.2023  13.15 - 16.30   eLecture - online
Fri 10.11.2023  09.00 - 13.00   eLecture - online
Fri 10.11.2023  13.15 - 16.30   eLecture - online
Fri 12.01.2024  09.00 - 13.00   eLecture - online
Fri 12.01.2024  13.15 - 16.30   eLecture - online