Geodesic regression on Riemannian manifolds with applications to Forestry and Ecology

PhD student

Alejandro Pereira

Research Outline

We aim to develop geodesic regression models to analyze complex data arising in forestry and ecology.

The rise of complex data has driven growing interest in statistical analysis on nonstandard spaces. Among such spaces, manifolds are of particular interest, since many types of data relevant to forestry and ecology naturally live on Riemannian manifolds.
Examples of such data include phylogenetic trees, compositional data on simplices, directional data, and shapes.

We focus on extending the classical (Euclidean) regression model, where the response variable is a point in Euclidean space, to geodesic regression, where the response is a point on a suitable Riemannian manifold.
Geodesic models are an attractive approach because they carry over ideas from standard regression, such as the interpretation of intercept and slope coefficients, and thus allow for analogous estimation and inference procedures, such as gradient descent and maximum likelihood. We begin by presenting a thorough introduction to geodesic regression modelling on Riemannian manifolds and describe some model extensions, such as nonparametric, locally linear, and longitudinal models. We review some estimation procedures and goodness-of-fit measures, with special emphasis on the correspondence between the Euclidean and manifold versions of the definitions.
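
As a sketch of this correspondence (the notation $p$, $v$ and $\operatorname{Exp}$ is introduced here for illustration and follows one common formulation of geodesic regression, not necessarily the exact parametrization used in this project), the Euclidean model with intercept $p$ and slope $v$ and its manifold counterpart can be written as
$$y = p + x v + \epsilon \quad\longrightarrow\quad y = \operatorname{Exp}\big(\operatorname{Exp}(p, x v), \epsilon\big),$$
where $p \in \mathcal{M}$ is a base point, $v \in T_p\mathcal{M}$ is a tangent vector at $p$, $\operatorname{Exp}(p, u)$ denotes the Riemannian exponential map at $p$ applied to the tangent vector $u$, and $\epsilon$ is a random tangent vector at $\operatorname{Exp}(p, x v)$. Under these assumptions, a least-squares estimate replaces squared Euclidean residuals with squared geodesic distances:
$$(\hat{p}, \hat{v}) = \arg\min_{(p, v)} \sum_{i=1}^n d^2\big(\operatorname{Exp}(p, x_i v), y_i\big).$$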

Next, we showcase the geodesic model by studying the shape of the leaves of juvenile beech trees (Fagus sylvatica) as continuous curves, which are elements of an infinite-dimensional Riemannian manifold known as shape space. Using the so-called elastic metric to compare shapes and compute distances, we can define the mean shape, and by including covariates such as treatment effects we can formulate the analysis as a geodesic mixed-model regression. The data come from the "EnriCo Pot Experiment".

Finally, from a more theoretical point of view, we study the score matching estimation procedure for a general class of distributions defined over arbitrary surfaces.
Since manifolds are in general not vector spaces, the standard definition of the mean, $\bar{x} = \dfrac{1}{n} \sum_{i=1}^n x_i$, is no longer valid. A suitable extension is the empirical Fréchet mean, defined as the minimizer of the empirical Fréchet variance, $$\bar{x} = \arg\min_{p \in \mathcal{M}} \sum_{i=1}^n d^2(p, x_i),$$ where $d$ is an appropriate metric on the manifold $\mathcal{M}$.
By leveraging this definition, we can extend location-scale random variables to take values on general manifolds.
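
To make the Fréchet mean concrete, the following is a minimal sketch for the simplest curved example, the unit sphere with the great-circle distance. It is an illustration only and not the estimation code used in this project: the function names (`sphere_exp`, `sphere_log`, `frechet_mean`), the fixed step size of one, and the choice of starting point are all assumptions made for this example. The iteration repeatedly averages the log-mapped data in the tangent space at the current estimate and moves along the resulting direction with the exponential map.

```python
import numpy as np


def sphere_exp(p, v):
    """Exponential map on the unit sphere: start at p, follow tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)


def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q,
    with length equal to the great-circle distance d(p, q)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = q - cos_theta * p  # component of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        # q equals p (or is antipodal, where the log map is not unique)
        return np.zeros_like(p)
    return theta * w / norm_w


def frechet_mean(points, n_iter=100, tol=1e-10):
    """Gradient-descent sketch for the empirical Frechet mean on the sphere.

    Assumes the rows of `points` are unit vectors concentrated well within
    a hemisphere, so that the minimizer is unique and the normalized
    Euclidean mean is a sensible starting point.
    """
    p = points.mean(axis=0)
    p = p / np.linalg.norm(p)
    for _ in range(n_iter):
        # Mean of the log-mapped data: the descent direction at p
        step = np.mean([sphere_log(p, x) for x in points], axis=0)
        if np.linalg.norm(step) < tol:
            break
        p = sphere_exp(p, step)
    return p


# Usage: two copies of one point and a single second point on the equator.
# In closed form, the minimizer lies on the connecting great circle at
# angle pi/6 from the repeated point.
data = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mean_point = frechet_mean(data)
```

In Euclidean space the same iteration reduces to the ordinary arithmetic mean after one step, which is the sense in which the Fréchet mean generalizes $\bar{x}$.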

Principal Investigator / Supervisor

Prof Thomas Kneib, Chairs of Statistics and Econometrics. Faculty of Business and Economics.