Machine Learning

More and more legal text is available in digital form. This holds not only for scholarly work, but also for statutes, court rulings, and administrative documents. This creates an opportunity for scaling up traditional legal analysis, and for qualitatively new forms of legal scholarship. At the same time, the availability of high-powered, networked computing is changing legal practice. While the delegation of legal decision-making to computers still sounds futuristic, machine decision aids for human legal actors are gaining prominence, for instance in the area of predictive policing.

All these developments require more than the availability of big legal datasets. The data must be analysed with the help of techniques known as machine learning. This condensed course explains the logic of the most characteristic techniques, and shows hands-on how they can be implemented. Prior knowledge of computer science, or of machine learning more specifically, is not required. But participants should have an understanding of (at least elementary) statistics. For the hands-on part, the course will use R, a free statistical software environment. Participants should have it installed on their computers. Most users of R find it convenient to access the programme through RStudio, which is also free.

There is one limitation. Most data a legal scholar may want to use must first be translated from words into numbers. This translation exercise is both theoretically and practically challenging. The course will not cover this part of the research design, commonly called natural language processing. It does so on the premise that a researcher must first understand what she can achieve with her data before she goes out and creates her dataset (or lays her hands on an existing dataset for which this translation has already been made).

There are five course units:

1. motivation; explanation vs. prediction; bias-variance tradeoff; validation
2. regression extensions: dimension reduction; semiparametric regression
3. pure prediction: nearest neighbour methods; trees; naive Bayes; support vector machines; neural networks
4. unsupervised learning: principal components; k-means; hierarchical clustering; association learning
5. outlier detection; applications

3 February 2021: 11.00 – 13.00 / 14.30 – 16.30
4 February 2021: 11.00 – 13.00 / 14.30 – 16.30
5 February 2021: 11.00 – 13.00

Taught By: Professor Dr Dr Christoph Engel (Director)