Topics for Bachelor Theses, Master Theses and Lab Rotations in Statistics
This page lists different topics that can be turned into bachelor theses, master theses and lab rotations for students in applied statistics, data science, economics, etc., depending on individual qualifications. If you are interested, get in contact with the responsible person listed for the topic.
- Title: Analysis of Sea Turtle Movement with Machine Learning (PyTorch or Keras)
Short description: Animal movement has been studied through the lens of anomalous diffusion. Sea turtle movement presents a particular challenge as these trajectories can be temporally and spatially noisy due to the loss of signal when the animals dive. Previous work has shown that machine learning is robust to noise and can be used to characterize underlying diffusive models and infer the anomalous diffusion exponent of a trajectory. This project involves training an existing or new model to study the movement of sea turtles. Beyond the technical challenge of analyzing these trajectories, it would also be interesting to study the ergodicity of the movement and how findings using modern techniques compare to older research. This project would be conducted in collaboration with the Applied Mathematics Department and the Coastal Research Department of the Polytechnic University of Valencia, Spain (UPV).
This project is ideal for a Master’s student with a strong computational background, as coding is required. Above all, the student should be interested in spatial statistics and diffusion, with the intent to publish the results.
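The anomalous diffusion exponent α is defined through the scaling of the mean squared displacement, MSD(τ) ∝ τ^α. Normal diffusion has α = 1, subdiffusion α < 1 and superdiffusion α > 1. A classical, non-machine-learning baseline estimates α as the slope of a log-log regression of the time-averaged MSD on the lag τ. The plain-Python sketch below is not the project's model. It simulates an ordinary 2D random walk, for which α should come out close to 1. The function names are hypothetical.

```python
import math
import random

def time_averaged_msd(xs, ys, lag):
    """Time-averaged mean squared displacement of a 2D trajectory at a given lag."""
    n = len(xs) - lag
    total = 0.0
    for i in range(n):
        dx = xs[i + lag] - xs[i]
        dy = ys[i + lag] - ys[i]
        total += dx * dx + dy * dy
    return total / n

def estimate_alpha(xs, ys, lags):
    """Least-squares slope of log(MSD) against log(lag), i.e. the exponent alpha."""
    log_t = [math.log(t) for t in lags]
    log_m = [math.log(time_averaged_msd(xs, ys, t)) for t in lags]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    cov = sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
    var = sum((a - mean_t) ** 2 for a in log_t)
    return cov / var

# Simulated Brownian motion (independent Gaussian steps), so alpha should be close to 1.
random.seed(1)
xs, ys = [0.0], [0.0]
for _ in range(5000):
    xs.append(xs[-1] + random.gauss(0.0, 1.0))
    ys.append(ys[-1] + random.gauss(0.0, 1.0))

alpha = estimate_alpha(xs, ys, lags=range(1, 21))
print(f"estimated alpha: {alpha:.2f}")
```

Localization noise, such as the position errors expected in diving-turtle tracks, adds a roughly constant offset to the MSD. This biases the estimator at short lags, which is one reason learned estimators are attractive for noisy trajectories.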
Contact: Nicolás Firbas (nicolas.firbas@uni-goettingen.de), Thomas Kneib (tkneib@uni-goettingen.de)
- Title: Pathogen and Pest Dispersal Estimation Using Fat-Tailed Dispersal Kernels
Short description: Agricultural pests such as the spotted wing fruit fly (D. suzukii) have been found to take refuge in surrounding forests, where they can escape pesticide applications and re-invade crops, causing massive economic damage. This project aims to use the integration of fat-tailed dispersal kernels to predict invasion and guide pesticide application. The theoretical results will be compared to trapping data from blueberry farms, which are monitored by Rutgers, The State University of New Jersey (USA). Additional work on dispersal kernel estimation is also possible.
This project is suitable for a Bachelor’s or Master’s thesis for a student interested in spatial statistics with applications to the agricultural sector. The student should have some coding background. Ideally, the project will use R, though another language could be used if there is a compelling reason. The objective of the project is to publish the results.
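A dispersal kernel describes the distribution of dispersal distances. The practical difference between a thin-tailed and a fat-tailed kernel shows up in the probability of rare long-distance dispersal events. The sketch below compares the exceedance probability P(R > r) of an exponential distance kernel with that of a Lomax (Pareto type II) kernel with the same mean distance. It is written in Python only for illustration. The choice of Lomax as the fat-tailed example, the parameter values and the function names are assumptions, not the kernels used in the project.

```python
import math

def exponential_tail(r, mean):
    """P(R > r) for an exponential distance kernel with the given mean distance."""
    return math.exp(-r / mean)

def lomax_tail(r, mean, shape=3.0):
    """P(R > r) for a Lomax (Pareto II) distance kernel with the given mean distance.

    The Lomax mean is scale / (shape - 1), so shape must exceed 1.
    """
    scale = mean * (shape - 1.0)
    return (1.0 + r / scale) ** (-shape)

mean_distance = 1.0  # arbitrary distance unit
for r in (1, 5, 10, 20):
    thin = exponential_tail(r, mean_distance)
    fat = lomax_tail(r, mean_distance)
    print(f"r = {r:>2}: exponential {thin:.2e}, Lomax {fat:.2e}, ratio {fat / thin:.1f}")
```

At ten times the mean distance, the fat-tailed kernel makes dispersal roughly a hundred times more likely than the exponential one. This is the kind of difference that matters for predicting re-invasion of crops from forest refuges.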
Contact: Nicolás Firbas (nicolas.firbas@uni-goettingen.de), Thomas Kneib (tkneib@uni-goettingen.de)
- Title: Expected to Benefit Sets in Distributional Regression
Short description: In many situations, the effect of a treatment is not homogeneous across the complete population of subjects under study but varies heterogeneously across subjects. While the goal of statistical investigations is often to estimate this heterogeneity of treatment effects, one may also be interested in inverting the relationship to identify those subjects who will benefit the most from a treatment, including an appropriate quantification of uncertainty. This leads to so-called "expected to benefit" sets of observations. The goal of this master thesis is to implement and evaluate Bayesian approaches for the identification of expected to benefit sets for distributional regression models.
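A very simple decision rule illustrates the idea. Suppose MCMC draws of each subject's treatment effect are available. A subject is then included if the posterior probability that its effect exceeds a threshold is at least some level. The sketch below applies this rule to hypothetical draws. The threshold, the level and the function name are assumptions, and the rule is not the approach the thesis will develop.

```python
import random

def expected_to_benefit(draws_by_subject, threshold=0.0, level=0.9):
    """Return the subjects whose posterior probability of an effect above `threshold` is >= `level`.

    draws_by_subject maps a subject id to MCMC draws of that subject's treatment effect.
    """
    selected = []
    for subject, draws in draws_by_subject.items():
        prob = sum(d > threshold for d in draws) / len(draws)
        if prob >= level:
            selected.append(subject)
    return selected

# Hypothetical posterior draws: "a" clearly benefits, "b" is uncertain, "c" is harmed.
random.seed(0)
draws = {
    "a": [random.gauss(1.0, 0.3) for _ in range(4000)],
    "b": [random.gauss(0.1, 0.5) for _ in range(4000)],
    "c": [random.gauss(-0.8, 0.3) for _ in range(4000)],
}
print(expected_to_benefit(draws))  # ['a']
```

This rule thresholds each subject separately, so it ignores the joint uncertainty across subjects. It therefore does not give a set-level credibility statement.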
Contact: Thomas Kneib (tkneib@uni-goettingen.de)
- Title: LASSO regularization and group fixed effects
Short description: Fixed effects specifications in panel data make it possible to control for various types of unobserved heterogeneity, but they considerably inflate the number of parameters to be estimated. To overcome this problem, group fixed effects approaches aim to identify subgroups in the data that share the same fixed effects structure. In this thesis, regularization approaches such as the fused LASSO will be investigated with respect to their ability to identify group fixed effects in panel data.
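One way to write the idea assumes a linear panel model with individual fixed effects α_i, common slopes β, individuals i = 1, …, N and periods t = 1, …, T (notation introduced here for illustration). The classical fused LASSO penalizes differences between neighbours in a fixed ordering. Panel units have no natural ordering, so this sketch penalizes the differences between all pairs of fixed effects:

```latex
\min_{\beta,\,\alpha_1,\dots,\alpha_N}\;
\sum_{i=1}^{N}\sum_{t=1}^{T}\bigl(y_{it}-x_{it}^{\top}\beta-\alpha_i\bigr)^2
\;+\;\lambda\sum_{1\le i<j\le N}\bigl\lvert\alpha_i-\alpha_j\bigr\rvert
```

The absolute-value penalty can set differences α_i − α_j exactly to zero, and individuals whose estimated fixed effects coincide then form a group. The tuning parameter λ ≥ 0 controls how strongly effects are fused. With λ = 0 the problem reduces to the ordinary fixed effects estimator, and a sufficiently large λ fuses all α_i into a common intercept.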
Contact: Thomas Kneib (tkneib@uni-goettingen.de)
- Title: Bayesian Quantile Regression with Errors in Variables
Short description: When covariates are measured with error, this can induce considerable bias in the estimated effects of these covariates in a regression model. In a Bayesian setup, a statistical model can be assumed for the measurement error, such that the true covariate values become part of the set of unknown parameters to be estimated. In particular, the true values can be sampled within a Markov chain Monte Carlo simulation algorithm. In this thesis, existing models shall be extended in at least one of two directions: (i) implement Bayesian error correction schemes for Bayesian quantile regression or (ii) implement a flexible Dirichlet process mixture prior for the true covariate values.
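The bias is easy to see in the simplest case of classical additive measurement error, where the observed covariate is the true covariate plus independent noise. Ordinary least squares on the error-prone covariate then shrinks the slope towards zero by the reliability ratio var(X) / (var(X) + var(U)). The sketch below simulates this with hypothetical values (true slope 2, equal variances of covariate and error, so the expected slope is about 1). It uses least squares, not the Bayesian quantile regression the thesis targets.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(42)
n = 20000
beta = 2.0
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]      # true covariate, variance 1
y = [beta * x + random.gauss(0.0, 1.0) for x in x_true]  # response
x_obs = [x + random.gauss(0.0, 1.0) for x in x_true]     # observed with error, error variance 1

slope_true = ols_slope(x_true, y)  # close to beta = 2
slope_obs = ols_slope(x_obs, y)    # close to beta * 1 / (1 + 1) = 1
print(f"true covariate: {slope_true:.2f}, error-prone covariate: {slope_obs:.2f}")
```

A Bayesian error correction avoids this attenuation by treating `x_true` as unknown and sampling it alongside the regression parameters.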
Contact: Thomas Kneib (tkneib@uni-goettingen.de)
- Title: Topics of Bayesian statistics in economics
Short description: We want to explore the rise of Bayesianism and its topics in the field of economics over the last 20 years. By analysing a large data set of articles in economic science, we want to distinguish topics in which Bayesian methods were used from those in which non-Bayesian methods were used. To this end, appropriate metrics need to be developed that can be used in the context of machine learning algorithms.
Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
- Title: Development of Drinking Water Consumption in Göttingen
Short description: Modelling drinking water consumption in Göttingen based on the drinking water reservoirs in the Harz mountains. The aim is to analyse drinking water consumption over the course of the year, with particular attention to extreme events such as very high temperatures.
Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
- Title: Analysis of Wage and Staff Structures in the Lower Saxony Healthcare Sector
Short description: As part of a (bachelor) thesis, data on the wage and staff structure of hospitals and comparable healthcare facilities in Lower Saxony are to be collected and analysed.
Contact: Alexander Silbersdorff (asilbersdorff@uni-goettingen.de)
- Title: What catches the learning eye
Short description: Using PyGaze, the eye movements of students watching introductory mathematics and statistics lectures are to be recorded and analysed with respect to the students' learning success.
Contact: Alexander Silbersdorff (asilbersdorff@uni-goettingen.de)
- Title: Using Random/Causal Forests to Analyse Heterogeneous Policy Effects of Chinese Environmental Policy
Short description: Using Chinese firm-level data, the heterogeneous effects of Chinese environmental policies will be investigated. Data can be provided for the analysis.
A comprehensive literature review on Random Forests, Causal Forests and Dynamic Causal Forests (e.g. Breiman 2001; Wager & Athey 2018; Gavrilova et al. 2023) is expected. The method will then be applied to the data analysis.
Contact: Isea Cieply (isea.cieply@uni-goettingen.de)
- Title: The Gender Pay Gap in Germany: What factors contribute to gender inequalities in academia (in economics or STEM)?
Short description: The thesis involves a comprehensive review of existing studies on the gender pay gap in academia or in specific disciplines such as economics or STEM. After independently selecting suitable variables and preparing the given data, the data will be analysed first cross-sectionally and then as panel data using classical regression methods. The German Socio-Economic Panel (SOEP) can be used for the data analysis.
Contact: Isea Cieply (isea.cieply@uni-goettingen.de)
- Title: Outlier detection in time series using machine learning
Short description: Tree growth data from dendrometers usually have a very high resolution and measure movements on the micrometer scale. However, dendrometers are very sensitive and need to be reinstalled over time, which can cause point outliers as well as whole sequences of outliers. The growth of a tree tends to be very similar to that of nearby trees of the same species and similar age. We have a data set with growth data from multiple trees in close proximity. Therefore, we want to use a multivariate time series model to detect outliers based on machine learning algorithms.
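A simple non-machine-learning baseline that already exploits neighbouring trees compares each step of a tree's readings with the median step of its neighbours. It then flags steps whose residual is extreme relative to a robust scale (the median absolute deviation). The sketch below is only such a baseline, not the multivariate model the topic asks for. The data, threshold and function name are hypothetical.

```python
import math
import statistics

def flag_outliers(target, neighbours, threshold=5.0):
    """Flag time steps where a tree's increment deviates strongly from its neighbours.

    target: dendrometer readings of one tree.
    neighbours: reading lists of nearby trees, each the same length as target.
    Returns indices i such that the step from reading i - 1 to reading i is flagged.
    """
    def increments(series):
        return [b - a for a, b in zip(series, series[1:])]

    target_inc = increments(target)
    neighbour_inc = [increments(s) for s in neighbours]
    # Residual: the tree's increment minus the median increment of its neighbours at that step.
    residuals = [
        inc - statistics.median(col)
        for inc, col in zip(target_inc, zip(*neighbour_inc))
    ]
    centre = statistics.median(residuals)
    # Median absolute deviation, scaled to match the standard deviation under normality.
    mad = 1.4826 * statistics.median(abs(r - centre) for r in residuals)
    return [i + 1 for i, r in enumerate(residuals) if abs(r - centre) > threshold * mad]

# Hypothetical readings in micrometers: three neighbours growing 2.9, 3.0 and 3.1 per step,
# and a target tree growing 3.0 per step with small fluctuations plus a +150 shift
# from reading 10 onward, as might happen after a reinstallation.
neighbours = [[rate * t for t in range(20)] for rate in (2.9, 3.0, 3.1)]
target = [3.0 * t + math.sin(t) + (150.0 if t >= 10 else 0.0) for t in range(20)]
print(flag_outliers(target, neighbours))  # [10]
```

A level shift shows up here as a single flagged step. Identifying which whole sequence of readings is affected, and correcting it, needs more than this baseline.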
Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
- Title: Uncertainty Estimation in (Medical) Image Classification
Short description: Over the last decade, neural networks have reached almost every field of science and have become a crucial part of various real-world applications. With their increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks either do not deliver uncertainty estimates or suffer from over- or underconfidence, i.e. they are badly calibrated. This thesis investigates and extends existing approaches for measuring uncertainty in deep neural networks applied to (medical) image classification tasks.
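Calibration is commonly summarized by the expected calibration error (ECE). Predictions are grouped into confidence bins, and the weighted average gap between each bin's mean confidence and its accuracy is computed: ECE = Σ_b (n_b / n) · |acc_b − conf_b|. The plain-Python sketch below uses equal-width bins. The bin count of 10 and the function name are assumptions, and ECE is only one of several calibration measures.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probability of the predicted class, in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Toy example: the model always reports 90% confidence but is right only 60% of the time.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4
print(expected_calibration_error(confidences, correct))  # about 0.3, i.e. overconfident
```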
Contact: Michael Schlee (michael.schlee@uni-goettingen.de)
- Title: Hashtag-weighted topic extraction
Short description: In recent years, social media platforms have witnessed an exponential growth in user-generated content, leading to a vast amount of information available online. Extracting relevant topics from this pool of data has become a crucial task for various applications, including sentiment analysis, trend detection, and opinion mining. Traditional methods for topic extraction rely on techniques such as keyword matching and statistical algorithms, which often fail to capture the dynamic nature and contextual relevance of topics. This master thesis will investigate how the metadata inherent to social media content, specifically hashtags, influences the accuracy and relevance of topic extraction.
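One simple way to let hashtags influence topic extraction is to up-weight them in the document-term counts that a topic model consumes. The sketch below builds such weighted bag-of-words counts. The weight of 3, the whitespace tokenization and the function name are hypothetical choices. Whether and how much such weighting helps is exactly what the thesis would investigate.

```python
from collections import Counter

def weighted_term_counts(posts, hashtag_weight=3.0):
    """Bag-of-words counts in which hashtag tokens receive extra weight.

    Hashtags are stripped of '#' so that '#climate' and 'climate' map to the same term.
    """
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            if token.startswith("#") and len(token) > 1:
                counts[token[1:]] += hashtag_weight
            else:
                counts[token] += 1.0
    return counts

posts = [
    "heat wave in the city #climate",
    "new phone released #tech",
    "record temperatures again #climate #heatwave",
]
counts = weighted_term_counts(posts)
print(counts.most_common(1))  # [('climate', 6.0)]
```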
Contact: Michael Schlee (michael.schlee@uni-goettingen.de)
- Title: Tree instance segmentation from forest point clouds using deep learning
Short description: With recent advances in laser scanning, it is possible to create three-dimensional point clouds of the surfaces in a forest. To monitor and understand changes in forest composition and structure, it is often useful to segment this forest point cloud into individual trees. In this work, the aim is to build upon an existing deep-learning-based segmentation method. Possible avenues for research include self- and semi-supervised learning strategies as well as the exploration of new model architectures. This topic can be worked on during a lab rotation or as a master thesis.
Contact: Jonathan Henrich (jonathan.henrich@uni-goettingen.de)
- Title: Machine learning applications for image and video analysis in livestock farming
Short description: Monitoring the behavior of animals is crucial in livestock farming. Among other things, the information collected can be used for the development of new farming methods or assistance systems. With the rise of powerful machine learning methods, there is the potential to increasingly automate monitoring tasks using tools for image and video analysis. Tasks that are of interest include automatic animal tracking and action recognition. This topic can be worked on during a lab rotation or as a bachelor or master thesis.
Contact: Jonathan Henrich (jonathan.henrich@uni-goettingen.de)
- Title: Implement Bayesian Discrete Choice Models in Liesel
Short description: Liesel is a Python framework for efficient probabilistic programming that consists of a model-building library and a library for Markov chain Monte Carlo (MCMC) algorithms. This thesis implements functionality for setting up and sampling discrete choice models with hierarchical priors and mixture-of-normals priors in the Liesel framework. It validates their behavior through simulations and comparisons with existing implementations in the R package `bayesm`. Since this thesis has a strong focus on programming in Python, prior programming experience in Python is recommended.
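The basic building block of such models is the multinomial logit choice probability P(j) = exp(v_j) / Σ_k exp(v_k), where v_j = x_jᵀβ is the systematic utility of alternative j. The plain-Python sketch below is not Liesel code, and no Liesel API is assumed. It computes these probabilities with the usual max-subtraction for numerical stability. A hierarchical prior would place a distribution, for example a mixture of normals, on β across decision makers. The attribute values and coefficients are hypothetical.

```python
import math

def utilities(beta, attributes):
    """Linear utility v_j = x_j' beta for each alternative's attribute vector x_j."""
    return [sum(b * x for b, x in zip(beta, row)) for row in attributes]

def choice_probabilities(utils):
    """Multinomial logit probabilities for one choice situation."""
    m = max(utils)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(u - m) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: three products described by (price, quality),
# with a negative price coefficient and a positive quality coefficient.
beta = [-1.0, 2.0]
attributes = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0]]
probs = choice_probabilities(utilities(beta, attributes))
print([round(p, 3) for p in probs])
```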
Contact: Johannes Brachem (brachem@uni-goettingen.de)
- Title: Bayesian Penalized Transformation Models for Bounded Responses
Short description: Penalized Transformation Models (PTMs) are a novel form of location-scale regression. They allow researchers to place covariate models on the location and scale of a response variable, while estimating the response's conditional distribution directly from the data. Thus, unlike existing location-scale regression models, they do not require the assumption of a parametric distribution. This thesis explores the application of PTMs to bounded response variables, meaning, for example, that the response can only take positive values or values between 0 and 1. To this end, the concept of link functions known from Generalized Additive Models is applied to PTMs. The model is implemented in Python using JAX and Liesel. Previous experience with Python is recommended.
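The link-function idea can be illustrated with a change of variables. Suppose Y takes values in (0, 1) and the model is specified for the unbounded variable Z = logit(Y). Then f_Y(y) = f_Z(logit(y)) / (y(1 − y)). The sketch below uses a normal location-scale density for Z purely as a stand-in. In a PTM, f_Z would instead come from the estimated transformation, so this is not the PTM itself. The function names and parameter values are hypothetical.

```python
import math

def logit(y):
    return math.log(y / (1.0 - y))

def normal_pdf(z, loc, scale):
    return math.exp(-0.5 * ((z - loc) / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))

def bounded_density(y, loc, scale):
    """Density of Y in (0, 1) when logit(Y) ~ Normal(loc, scale).

    The factor 1 / (y * (1 - y)) is the derivative of logit(y), i.e. the Jacobian
    of the change of variables from Z = logit(Y) back to Y.
    """
    return normal_pdf(logit(y), loc, scale) / (y * (1.0 - y))

# Numerical check with the midpoint rule: the density should integrate to about 1 over (0, 1).
n = 100000
total = sum(bounded_density((i + 0.5) / n, loc=0.5, scale=1.0) for i in range(n)) / n
print(f"integral over (0, 1): {total:.4f}")
```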
Contact: Johannes Brachem (brachem@uni-goettingen.de)
- Title: Bayesian Penalized Transformation Models with different reference distributions
Short description: Penalized Transformation Models (PTMs) are a novel form of location-scale regression. They allow researchers to place covariate models on the location and scale of a response variable, while estimating the response's conditional distribution directly from the data. This is achieved by estimating a transformation function that relates the conditional distribution of the data to a fully specified reference distribution. Notably, the reference distribution determines the tail behavior of the model. Commonly, the standard normal distribution is used as the reference distribution. This thesis explores the application of different reference distributions. The model is implemented in Python using JAX and Liesel. Previous experience with Python is recommended.
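The reference distribution's tails carry over to the model, so a quick way to compare candidates is to look at their upper-tail probabilities. The sketch below compares the standard normal with a logistic distribution rescaled to unit variance (scale √3/π), so the comparison is not driven by different spreads. The logistic is only one illustrative alternative, not necessarily one the thesis will use.

```python
import math

LOGISTIC_SCALE = math.sqrt(3.0) / math.pi  # gives the logistic distribution unit variance

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def logistic_upper_tail(z):
    """P(Z > z) for a logistic Z with location 0 and unit variance."""
    return 1.0 / (1.0 + math.exp(z / LOGISTIC_SCALE))

for z in (2.0, 4.0, 6.0):
    print(f"z = {z}: normal {normal_upper_tail(z):.2e}, logistic {logistic_upper_tail(z):.2e}")
```

Both distributions have mean 0 and variance 1, yet at z = 4 the logistic tail probability is more than an order of magnitude larger. This matters when the response has extreme values.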
Contact: Johannes Brachem (brachem@uni-goettingen.de)
- Title: (Generalized) Linear Models via Stochastic Variational Inference Exploiting Sparse Matrix Representations
Short description: Stochastic Variational Inference (SVI) is a technique for approximating posterior distributions through optimization. The idea is to define a family of densities over the latent variables, indexed by a vector of variational parameters, and then to use stochastic optimization to find the parameter settings that make the variational distribution close to the posterior. In this project, the student will exploit sparse matrix representations of the design and precision matrices to implement a fast Python library for (Generalized) Linear Models based on SVI.
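The core SVI loop can be shown on a toy conjugate model where the exact posterior is known. The model assumes y_i ~ Normal(θ, 1) and θ ~ Normal(0, 1), with variational family q(θ) = Normal(m, s²). Each step draws one reparameterized sample θ = m + s·ε and takes a gradient ascent step on an unbiased estimate of the ELBO gradient. The model, step size, iteration count and function name are assumptions for illustration, and no sparse matrices are involved here.

```python
import math
import random

def svi_normal_mean(y, steps=10000, lr=0.002, seed=0):
    """SVI for y_i ~ Normal(theta, 1), theta ~ Normal(0, 1), with q(theta) = Normal(m, s^2).

    Parameters are (m, log s). Returns (m, s), averaged over the second half of the run.
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    m, log_s = 0.0, 0.0
    m_trace, log_s_trace = [], []
    for step in range(steps):
        s = math.exp(log_s)
        eps = rng.gauss(0.0, 1.0)
        theta = m + s * eps  # reparameterized draw from q
        # d/dtheta of log p(y, theta) = sum(y_i - theta) - theta
        dlogp = sum_y - n * theta - theta
        grad_m = dlogp
        # Chain rule through theta = m + exp(log_s) * eps, plus d/dlog_s of q's entropy (= 1).
        grad_log_s = dlogp * s * eps + 1.0
        m += lr * grad_m
        log_s += lr * grad_log_s
        if step >= steps // 2:
            m_trace.append(m)
            log_s_trace.append(log_s)
    return sum(m_trace) / len(m_trace), math.exp(sum(log_s_trace) / len(log_s_trace))

data_rng = random.Random(1)
y = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
m_hat, s_hat = svi_normal_mean(y)
post_mean = sum(y) / (len(y) + 1)      # exact posterior mean
post_sd = 1.0 / math.sqrt(len(y) + 1)  # exact posterior standard deviation
print(f"SVI: mean {m_hat:.3f}, sd {s_hat:.3f}; exact: mean {post_mean:.3f}, sd {post_sd:.3f}")
```

For a linear model y ~ Normal(Xβ, I), the term `sum_y - n * theta` generalizes to Xᵀ(y − Xβ). That matrix-vector product is where a sparse representation of the design matrix would pay off.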
Contact: Gianmarco Callegher (gianmarco.callegher@uni-goettingen.de)
- Title: Leaf shape heterogeneity analysis
Short description: Leaf shape variability is an important characteristic of plant development and health. It is driven mostly by genetic and environmental factors. Studying and modelling this variability can help to understand and forecast the growth of a tree. Fresh leaves from juvenile beech trees grown in an experiment were harvested in August-September 2021, with all leaves of each tree harvested at the same time. For each tree, a sample of 60 to 120 leaves was scanned on a flatbed scanner, and the dataset consists of the raw images of these scans. The aim of this project is to extract the shape, size, and average colour of the leaves from the images and to estimate the Fréchet mean and variance via the elastic metric approach. In addition, the modes of variation will be studied through Geodesic Principal Component Analysis (GPCA).
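The Fréchet mean generalizes the average to curved spaces. It is the point that minimizes the sum of squared geodesic distances to the observations, and the Fréchet variance is that minimum divided by the sample size. The elastic shape space used in the project is far more complex. The sketch below therefore only illustrates the definitions on the simplest curved space, the circle, where the naive arithmetic average of angles fails. The grid search and the function names are assumptions.

```python
import math

TWO_PI = 2.0 * math.pi

def circle_distance(a, b):
    """Geodesic (arc-length) distance between two angles on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def frechet_mean_circle(angles, grid_size=3600):
    """Approximate Fréchet mean by grid search: minimize the sum of squared geodesic distances."""
    candidates = [TWO_PI * k / grid_size for k in range(grid_size)]
    return min(candidates, key=lambda c: sum(circle_distance(c, a) ** 2 for a in angles))

def frechet_variance_circle(angles, mean):
    """Fréchet variance: mean squared geodesic distance to the Fréchet mean."""
    return sum(circle_distance(mean, a) ** 2 for a in angles) / len(angles)

# Hypothetical orientations at 350, 355, 5 and 10 degrees.
angles = [math.radians(d) for d in (350, 355, 5, 10)]
mu = frechet_mean_circle(angles)
print(f"arithmetic mean: {math.degrees(sum(angles) / len(angles)):.1f} deg")  # 180.0, misleading
print(f"Fréchet mean: {math.degrees(mu):.1f} deg")                           # 0.0
print(f"Fréchet variance: {frechet_variance_circle(angles, mu):.5f} rad^2")
```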
Contact: Alejandro Pereira (alejandro.pereira@uni-goettingen.de)
- Title: Extending the Graphical Conditional Transformation Model
Short description: In multivariate data analysis, especially in genomics, graphical models are essential for understanding the data in terms of the dependencies between different dimensions. A benchmark model is the Gaussian Graphical Model (GGM). However, the GGM only models linear relationships, and hence we have developed an extension based on transformation models, the Graphical Conditional Transformation Model (GCTM), which can model complex nonlinear dependencies. The goal of this master thesis is to build upon this work and its code base, to extend the model in one direction and to apply it to real-world data. Possible extensions are the addition of covariates, a different transformation layer with more interpretable properties, or a more robust version of the model's conditional independence penalty. A data application analysing genomics data can be carried out in cooperation with Bioinformatics at the UMG. Other applications in economics are also possible.
Keywords: Multivariate, Graphical Models, Copulas, Transformation Models, Normalising Flows
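In a GGM, conditional independence is read off the precision matrix Ω, the inverse covariance matrix. Variables i and j are conditionally independent given all others exactly when Ω_ij = 0, and their partial correlation is ρ_ij = −Ω_ij / √(Ω_ii Ω_jj). The sketch below evaluates this formula for a small hypothetical precision matrix. Estimating Ω, for example with a sparsity penalty, and the nonlinear GCTM extension are not shown.

```python
import math

def partial_correlations(precision):
    """Partial correlation matrix from a precision (inverse covariance) matrix."""
    p = len(precision)
    return [
        [
            # Written as 0.0 - x so that a zero entry yields 0.0 rather than -0.0.
            1.0 if i == j
            else 0.0 - precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(p)
        ]
        for i in range(p)
    ]

# Hypothetical chain X1 - X2 - X3: the zero entries encode "X1 independent of X3 given X2".
precision = [
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, -0.8],
    [0.0, -0.8, 2.0],
]
for row in partial_correlations(precision):
    print([round(v, 2) for v in row])
```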
Contact: Matthias Herp (matthias.herp@bioinf.med.uni-goettingen.de)