Showing 1–24 of 24 results for author: Richter, L

  1. arXiv:2407.17458

    cs.LG

    EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification

    Authors: Joana Reuss, Jan Macdonald, Simon Becker, Lorenz Richter, Marco Körner

    Abstract: We introduce EuroCropsML, an analysis-ready remote sensing machine learning dataset for time series crop type classification of agricultural parcels in Europe. It is the first dataset designed to benchmark transnational few-shot crop type classification algorithms that supports advancements in algorithmic development and research comparability. It comprises 706 683 multi-class labeled data points…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

  2. arXiv:2407.07873

    cs.LG math.DS math.OC math.PR stat.ML

    Dynamical Measure Transport and Neural PDE Solvers for Sampling

    Authors: Jingtong Sun, Julius Berner, Lorenz Richter, Marius Zeinhofer, Johannes Müller, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: The task of sampling from a probability density can be approached as transporting a tractable density function to the target, known as dynamical measure transport. In this work, we tackle it through a principled unified framework using deterministic or stochastic evolutions described by partial differential equations (PDEs). This framework incorporates prior trajectory-based sampling methods, such…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2406.11292

    cs.RO

    Daedalus 2: Autorotation Entry, Descent and Landing Experiment on REXUS29

    Authors: Philip Bergmann, Clemens Riegler, Zuri Klaschka, Tobias Herbst, Jan M. Wolf, Maximilian Reigl, Niels Koch, Sarah Menninger, Jan von Pichowski, Cedric Bös, Bence Barthó, Frederik Dunschen, Johanna Mehringer, Ludwig Richter, Lennart Werner

    Abstract: In recent years, interplanetary exploration has gained significant momentum, leading to a focus on the development of launch vehicles. However, the critical technology of EDL mechanisms has not received the same level of attention and remains less mature and capable. To address this gap, we took advantage of the REXUS program to develop a pioneering EDL mechanism. We propose an alternative to conv…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  4. arXiv:2406.04940

    cs.LG cs.AI

    CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

    Authors: Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

    Abstract: Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO₂ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 9 content pages, 11 reference pages, 9 appendix pages

  5. arXiv:2405.03549

    stat.ML cs.LG math.DS math.PR

    Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models

    Authors: Ludwig Winkler, Lorenz Richter, Manfred Opper

    Abstract: Generative modeling via stochastic processes has led to remarkable empirical results as well as to recent advances in their theoretical understanding. In principle, both space and time of the processes can be discrete or continuous. In this work, we study time-continuous Markov jump processes on discrete state spaces and investigate their correspondence to state-continuous diffusion processes give…

    Submitted 6 May, 2024; originally announced May 2024.
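
The Ehrenfest process in the title is a classic example of a time-continuous Markov jump process on the discrete state space {0, 1, ..., N}. As a minimal sketch, assuming the textbook rates rather than the exact parameterization used in the paper (each of N particles switches independently at a hypothetical rate `lam`, so the state moves k → k+1 at rate lam·(N − k) and k → k−1 at rate lam·k), one path can be simulated with the Gillespie algorithm:

```python
import random


def simulate_ehrenfest(n, k0, lam, t_end, seed=0):
    """Gillespie simulation of an Ehrenfest jump process on {0, ..., n}.

    Assumed (textbook) rates with hypothetical parameter `lam` > 0:
      k -> k + 1 at rate lam * (n - k)
      k -> k - 1 at rate lam * k
    Returns a list of (jump_time, state) pairs starting with (0.0, k0).
    """
    rng = random.Random(seed)
    t, k = 0.0, k0
    path = [(t, k)]
    while True:
        up, down = lam * (n - k), lam * k
        total = up + down                 # always lam * n for this model
        t += rng.expovariate(total)       # exponential holding time
        if t > t_end:
            return path
        # pick the jump direction with probability proportional to its rate
        k = k + 1 if rng.random() < up / total else k - 1
        path.append((t, k))


path = simulate_ehrenfest(n=10, k0=0, lam=1.0, t_end=5.0)
print(len(path), path[-1])
```

The holding times are exponential with total rate lam·N and the jump direction is drawn in proportion to the two rates. Under a suitable centering and rescaling by √N, the Ehrenfest process is known to approach an Ornstein-Uhlenbeck diffusion, which is the kind of discrete/continuous correspondence the abstract refers to.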

  6. arXiv:2403.15881

    cs.LG stat.ML

    Fast and Unified Path Gradient Estimators for Normalizing Flows

    Authors: Lorenz Vaitl, Ludwig Winkler, Lorenz Richter, Pan Kessel

    Abstract: Recent work shows that path gradient estimators for normalizing flows have lower variance compared to standard estimators for variational inference, resulting in improved training. However, they are often prohibitively more expensive from a computational point of view and cannot be applied to maximum likelihood training in a scalable manner, which severely hinders their widespread adoption. In thi…

    Submitted 23 March, 2024; originally announced March 2024.

  7. arXiv:2403.08763

    cs.LG cs.AI cs.CL

    Simple and Scalable Strategies to Continually Pre-train Large Language Models

    Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  8. arXiv:2308.04014

    cs.CL cs.LG

    Continual Pre-Training of Large Language Models: How to (re)warm your model?

    Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data t…

    Submitted 6 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.
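
The "(re)warm" in the title refers to increasing the learning rate again when pre-training resumes on new data. A common schedule of this kind is a linear warmup followed by cosine decay; the sketch below implements one such schedule with hypothetical parameter names and values, and is not claimed to match the exact schedule or hyperparameters studied in the paper:

```python
import math


def warmup_cosine_lr(step, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup from min_lr to max_lr, then cosine decay back to min_lr."""
    if step < warmup_steps:
        # (re-)warming phase: linear increase
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # decay phase: progress goes from 0 (end of warmup) to 1 (total_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical values; calling the schedule again from step 0 when a new
# dataset arrives is what "re-warms" the learning rate.
schedule = [warmup_cosine_lr(s, 3e-4, 3e-5, 100, 1000) for s in range(1001)]
print(schedule[0], schedule[100], schedule[1000])
```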

  9. arXiv:2307.15496

    cs.LG math.NA math.PR stat.ML

    From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs

    Authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken

    Abstract: The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue…

    Submitted 28 July, 2023; originally announced July 2023.

  10. arXiv:2307.02454

    cs.LG

    Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

    Authors: Carsten Hartmann, Lorenz Richter

    Abstract: The recent advances in machine learning in various fields of applications can be largely attributed to the rise of deep learning (DL) methods and architectures. Despite being a key technology behind autonomous cars, image processing, speech recognition, etc., a notorious problem remains the lack of theoretical understanding of DL and related interpretability and (adversarial) robustness issues. Un…

    Submitted 5 July, 2023; originally announced July 2023.

  11. arXiv:2307.01198

    cs.LG math.OC math.PR stat.ML

    Improved sampling via learned diffusions

    Authors: Lorenz Richter, Julius Berner

    Abstract: Recently, a series of papers proposed deep learning-based approaches to sample from target distributions using controlled diffusion processes, being trained only on the unnormalized target densities without access to samples. Building on previous work, we identify these approaches as special cases of a generalized Schrödinger bridge problem, seeking a stochastic evolution between a given prior dis…

    Submitted 23 May, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Accepted at ICLR 2024

    Journal ref: International Conference on Learning Representations, 2024

  12. arXiv:2306.00637

    cs.CV

    Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

    Authors: Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

    Abstract: We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly…

    Submitted 29 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Corresponding to "Würstchen v2"

    Journal ref: The Twelfth International Conference on Learning Representations (ICLR), 2024

  13. arXiv:2211.14487

    cs.CV cs.AI cs.LG

    Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

    Authors: Mats L. Richter, Christopher Pal

    Abstract: Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer), can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experi…

    Submitted 26 November, 2022; originally announced November 2022.

  14. arXiv:2211.11183

    cs.LG

    Causal Fairness Assessment of Treatment Allocation with Electronic Health Records

    Authors: Linying Zhang, Lauren R. Richter, Yixin Wang, Anna Ostropolets, Noemie Elhadad, David M. Blei, George Hripcsak

    Abstract: Healthcare continues to grapple with the persistent issue of treatment disparities, sparking concerns regarding the equitable allocation of treatments in clinical practice. While various fairness metrics have emerged to assess fairness in decision-making processes, a growing focus has been on causality-based fairness concepts due to their capacity to mitigate confounding effects and reason about b…

    Submitted 7 January, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

  15. arXiv:2211.01364

    cs.LG math.OC stat.ML

    An optimal control perspective on diffusion-based generative modeling

    Authors: Julius Berner, Lorenz Richter, Karen Ullrich

    Abstract: We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows to transfer methods from optimal control t…

    Submitted 26 March, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted for oral presentation at NeurIPS 2022 Workshop on Score-Based Methods

    Journal ref: Transactions on Machine Learning Research, 2024

  16. arXiv:2206.10588

    cs.LG math.NA stat.ML

    Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning

    Authors: Lorenz Richter, Julius Berner

    Abstract: The combination of Monte Carlo methods and deep learning has recently led to efficient algorithms for solving partial differential equations (PDEs) in high dimensions. Related learning problems are often stated as variational formulations based on associated stochastic differential equations (SDEs), which allow the minimization of corresponding losses using gradient-based optimization methods. In…

    Submitted 5 August, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, 2022, pp. 18649-18666
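
The link between linear PDEs and SDEs that such variational formulations build on can be illustrated with the Feynman-Kac formula in its simplest case. For the backward heat equation ∂_t u + ½Δu = 0 on [0, T] × ℝ^d with terminal condition u(T, x) = g(x), the solution is u(t, x) = E[g(x + W_{T−t})] for a standard d-dimensional Brownian motion W. The sketch below estimates this expectation by plain Monte Carlo, with no neural network and none of the paper's losses; the terminal condition g(x) = |x|² is a hypothetical choice for which the exact solution |x|² + d(T − t) is known:

```python
import math
import random


def heat_mc(x, t, T, g, n_samples=100_000, seed=0):
    """Monte Carlo estimate of u(t, x) = E[g(x + W_{T-t})] (Feynman-Kac)."""
    rng = random.Random(seed)
    sd = math.sqrt(T - t)   # each coordinate of W_{T-t} is N(0, T - t)
    total = 0.0
    for _ in range(n_samples):
        y = [xi + rng.gauss(0.0, sd) for xi in x]
        total += g(y)
    return total / n_samples


def g(y):
    """Hypothetical terminal condition g(x) = |x|^2."""
    return sum(yi * yi for yi in y)


x, t, T = [1.0, -0.5, 2.0], 0.0, 1.0
estimate = heat_mc(x, t, T, g)
exact = g(x) + len(x) * (T - t)   # |x|^2 + d (T - t)
print(estimate, exact)
```

This pointwise estimator only shows the probabilistic representation; learning-based approaches such as those described in the abstract instead fit a function approximator across many points using losses derived from the associated SDEs.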

  17. arXiv:2106.12307

    cs.LG cs.AI cs.NE stat.ML

    Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

    Authors: Mats L. Richter, Julius Schöning, Anna Wiedenroth, Ulf Krumnack

    Abstract: When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and deploy, with diminishing beneficial effects on the predictive performance. The features a convolutional layer can process are strictly limited by its re…

    Submitted 5 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Preprint

  18. arXiv:2106.09526

    cs.LG cs.AI

    Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

    Authors: Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack

    Abstract: In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given…

    Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  19. arXiv:2102.11830

    stat.ML cs.LG math.NA math.PR

    Solving high-dimensional parabolic PDEs using the tensor train format

    Authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken

    Abstract: High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science and engineering. However, their numerical treatment poses formidable challenges since traditional grid-based methods tend to be frustrated by the curse of dimensionality. In this paper, we argue that tensor trains provide an appealing approximation framework for parabolic PDEs: the combination of reformulat…

    Submitted 17 July, 2021; v1 submitted 23 February, 2021; originally announced February 2021.
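
In the tensor train (TT) format, a tensor over a d-dimensional grid is stored as d three-way cores G_k of shape (r_{k−1}, n_k, r_k) with r_0 = r_d = 1, and a single entry is the matrix product G_1[:, x_1, :] G_2[:, x_2, :] ... G_d[:, x_d, :]. The sketch below evaluates entries that way and checks them against the fully contracted tensor; the random cores and ranks are hypothetical, and the paper's procedure for fitting the cores to a PDE is not shown:

```python
import numpy as np


def tt_entry(cores, index):
    """Evaluate one entry of a tensor stored in TT format."""
    v = np.ones((1, 1))
    for core, i in zip(cores, index):
        v = v @ core[:, i, :]   # (1, r_{k-1}) @ (r_{k-1}, r_k) -> (1, r_k)
    return float(v[0, 0])


def tt_full(cores):
    """Contract all cores into the full tensor (only feasible for small d)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))  # join over the shared rank index
    return full[0, ..., 0]                                # drop the boundary ranks r_0 = r_d = 1


rng = np.random.default_rng(0)
shape, ranks = (4, 5, 6), (1, 3, 2, 1)
cores = [rng.normal(size=(ranks[k], shape[k], ranks[k + 1])) for k in range(len(shape))]
print(tt_entry(cores, (1, 2, 3)), tt_full(cores)[1, 2, 3])
```

Here the cores hold 1·4·3 + 3·5·2 + 2·6·1 = 54 numbers instead of the 4·5·6 = 120 entries of the full tensor; for fixed ranks the storage grows linearly rather than exponentially with the dimension, which is why the format is attractive against the curse of dimensionality.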

  20. Size Matters

    Authors: Mats L. Richter, Wolf Byttner, Ulf Krumnack, Ludwdig Schallner, Justin Shenk

    Abstract: Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no sim…

    Submitted 9 February, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: Preprint

    Journal ref: Artificial Neural Networks and Machine Learning, ICANN 2021, pp. 133-144
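
The starting point of the abstract, that a fully convolutional network accepts inputs of any size, follows from the standard output-size formula for a convolution or pooling layer with kernel k, stride s and padding p: out = ⌊(in + 2p − k)/s⌋ + 1. The sketch below tracks the spatial size through a hypothetical stem and three stride-2 stages (not an architecture from the paper). A global pooling layer at the end would reduce each final map to one feature vector, so the classifier runs at every size, but the final map covers the image with a different number of positions depending on the input scale:

```python
def feature_map_sizes(n, layers):
    """Spatial size after each (kernel, stride, padding) layer for an n x n input."""
    sizes = []
    for k, s, p in layers:
        n = (n + 2 * p - k) // s + 1   # standard conv/pool output-size formula
        sizes.append(n)
    return sizes


# hypothetical stack: 7x7 stride-2 stem, then three 3x3 stride-2 stages
layers = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
for size in (64, 128, 224):
    print(size, feature_map_sizes(size, layers))
```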

  21. arXiv:2010.10436

    stat.ML cs.LG math.ST

    VarGrad: A Low-Variance Gradient Estimator for Variational Inference

    Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, Ömer Deniz Akyildiz

    Abstract: We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the log-variance loss. Under…

    Submitted 29 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.
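
The equivalence stated in the abstract can be checked numerically in a toy setting. The sketch below assumes a one-dimensional Gaussian family q = N(mu, sigma²) with sigma fixed, a hypothetical unnormalized target log p, and gradients with respect to mu only. It computes (a) the score-function ELBO gradient with leave-one-out control variates and (b) the gradient of the sample variance of log q − log p with the samples held fixed. With this normalization of the variance the two agree up to a factor of −2 (the sign flips because the loss is minimized while the ELBO is maximized), and any normalizing constant of p cancels in both:

```python
import math
import random


def log_q(z, mu, sigma):
    """Log-density of the variational family N(mu, sigma^2)."""
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def score_mu(z, mu, sigma):
    """d/dmu log q(z) with the sample z held fixed."""
    return (z - mu) / sigma ** 2


def loo_elbo_gradient(mu, sigma, log_p, zs):
    """Score-function ELBO gradient in mu with leave-one-out control variates."""
    S = len(zs)
    f = [log_p(z) - log_q(z, mu, sigma) for z in zs]   # log p - log q per sample
    total = sum(f)
    return sum((f_s - (total - f_s) / (S - 1)) * score_mu(z, mu, sigma)
               for f_s, z in zip(f, zs)) / S


def log_variance_gradient(mu, sigma, log_p, zs):
    """Gradient in mu of the sample variance of log q - log p, samples held fixed."""
    S = len(zs)
    w = [log_q(z, mu, sigma) - log_p(z) for z in zs]   # log q - log p per sample
    w_bar = sum(w) / S
    return 2.0 / (S - 1) * sum((w_s - w_bar) * score_mu(z, mu, sigma)
                               for w_s, z in zip(w, zs))


def log_p(z):
    """Hypothetical unnormalized target: N(2, 1) up to an additive constant."""
    return -0.5 * (z - 2.0) ** 2


mu, sigma = 0.0, 1.0
rng = random.Random(0)
zs = [rng.gauss(mu, sigma) for _ in range(64)]
g_loo = loo_elbo_gradient(mu, sigma, log_p, zs)
g_lv = log_variance_gradient(mu, sigma, log_p, zs)
print(g_loo, g_lv)   # g_lv equals -2 * g_loo up to rounding
```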

  22. arXiv:2006.08679

    cs.LG cs.CV cs.NE stat.ML

    Feature Space Saturation during Training

    Authors: Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss

    Abstract: We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we…

    Submitted 22 November, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 45 pages, 41 figures; author order changed in v5 to reflect additional contribution; for code see http://github.com/MLRichter/phd-lab and http://github.com/delve-team/delve

    MSC Class: 68T07

    ACM Class: I.2.6

    Journal ref: British Machine Vision Conference (BMVC) 2021
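
The idea of approximating a layer's variance (covariance) matrix during training and restricting the layer's output to its eigenspace can be sketched with running sums updated once per mini-batch. The class below is a minimal, hypothetical implementation of that idea, not the code from the repositories linked above; the toy outputs are generated to lie in a 3-dimensional subspace of an 8-dimensional feature space, so projecting onto the top three eigenvectors loses nothing:

```python
import numpy as np


class RunningCovariance:
    """Accumulate per-batch sums to estimate the covariance of a layer's outputs."""

    def __init__(self, dim):
        self.n = 0
        self.sum = np.zeros(dim)
        self.outer = np.zeros((dim, dim))

    def update(self, batch):
        # batch: (batch_size, dim) array of layer outputs
        self.n += batch.shape[0]
        self.sum += batch.sum(axis=0)
        self.outer += batch.T @ batch

    def mean(self):
        return self.sum / self.n

    def covariance(self):
        m = self.mean()
        return (self.outer - self.n * np.outer(m, m)) / (self.n - 1)


rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 8))                      # hypothetical rank-3 structure
batches = [rng.normal(size=(32, 3)) @ mixing for _ in range(10)]

acc = RunningCovariance(dim=8)
for batch in batches:
    acc.update(batch)

eigvals, eigvecs = np.linalg.eigh(acc.covariance())   # eigenvalues in ascending order
top = eigvecs[:, -3:]                                 # eigenvectors of the 3 largest eigenvalues
x = batches[0]
x_proj = acc.mean() + (x - acc.mean()) @ top @ top.T  # restrict outputs to that eigenspace
print(np.abs(x - x_proj).max())                       # near 0 for this rank-3 toy data
```

Keeping raw sums is cheap but can lose precision for large activations; a Welford-style update is a numerically more careful alternative.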

  23. arXiv:2005.05409

    math.OC cs.LG math.NA math.PR stat.ML

    Solving high-dimensional Hamilton-Jacobi-Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space

    Authors: Nikolas Nüsken, Lorenz Richter

    Abstract: Optimal control of diffusion processes is intimately connected to the problem of solving certain Hamilton-Jacobi-Bellman equations. Building on recent machine learning inspired approaches towards high-dimensional PDEs, we investigate the potential of iterative diffusion optimisation techniques, in particular considering applications in importance sampling and rare event simulation, and…

    Submitted 29 January, 2023; v1 submitted 11 May, 2020; originally announced May 2020.

  24. arXiv:1907.08589

    cs.LG stat.ML

    Spectral Analysis of Latent Representations

    Authors: Justin Shenk, Mats L. Richter, Anders Arpteg, Mikael Huss

    Abstract: We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers. Saturation is based on spectral analysis and can be computed efficiently, making live analysis of the representations practical during training. We provide an outlook fo…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: 13 pages, 16 figures, code: https://github.com/delve-team/delve
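
The definition above, the proportion of eigenvalues needed to explain 99% of the variance of a layer's latent representations, can be written in a few lines. This sketch takes a matrix of layer outputs (one row per sample), uses NumPy's covariance and symmetric eigenvalue routines, and illustrates the stated definition rather than reproducing the delve library's implementation:

```python
import numpy as np


def layer_saturation(features, threshold=0.99):
    """Fraction of eigenvalues needed to explain `threshold` of the variance.

    features: (num_samples, dim) array of a layer's outputs.
    """
    cov = np.cov(features, rowvar=False)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)[::-1]  # largest first, no tiny negatives
    explained = np.cumsum(eigvals) / np.sum(eigvals)             # cumulative explained variance
    k = int(np.searchsorted(explained, threshold)) + 1           # smallest k reaching the threshold
    return min(k, features.shape[1]) / features.shape[1]


rng = np.random.default_rng(0)
active = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # hypothetical: 4 of 8 directions used
print(layer_saturation(rng.normal(size=(5000, 8)) * active))  # 0.5
print(layer_saturation(rng.normal(size=(5000, 8))))           # 1.0
```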