I’m a Machine Learning Researcher at Fraunhofer IAIS, focused on large-scale pretraining of multilingual large language models (LLMs). I currently work on the Eurolingua project, where we're developing a new family of European, multilingual LLMs.
I’m the creator and lead developer of Modalities, an open-source training framework designed for efficient and reproducible LLM pretraining at scale. Initially a focused engineering effort, Modalities now supports distributed training across thousands of GPUs and handles billion-parameter models with state-of-the-art techniques such as FSDP, tensor parallelism, and activation checkpointing. These days, Modalities powers all Eurolingua pretraining runs on Europe's largest HPC clusters.
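To give a flavor of what two of these techniques look like in practice, here is a minimal sketch in plain PyTorch (not the Modalities API; `Block` and all hyperparameters are placeholders) showing how FSDP sharding and activation checkpointing compose:

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

class Block(nn.Module):
    """Placeholder transformer block; stands in for a real decoder layer."""
    def __init__(self, dim: int = 1024) -> None:
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=16, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = x + attn_out
        return x + self.mlp(x)

# Assumes torch.distributed.init_process_group(...) has already run,
# e.g. under torchrun with one process per GPU.
model = nn.Sequential(*[Block() for _ in range(12)])
# Shard parameters, gradients, and optimizer state; each Block is its own FSDP unit.
model = FSDP(model, auto_wrap_policy=ModuleWrapPolicy({Block}))
# Recompute activations during the backward pass to trade compute for memory.
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda m: isinstance(m, Block),
)
```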
Earlier, I completed my PhD at the University of Bonn, where I worked on uncertainty estimation (i.e., teaching neural networks awareness of what they don’t know). Before that, I received a Master’s in Computer Science (Intelligence Engineering) from Hamburg University of Technology.
Mehdi Ali*, Manuel Brack*, Max Lübbering*, Elias Wendt*, Abbas Goher Khan*, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm, Kristian Kersting
arXiv preprint arXiv:2505.22232 (under review), 2025
Max Lübbering, Vijul Shah, Moinam Chatterjee, Priya Priya, Osama Soliman, Rafet Sifa
IEEE 22nd International Conference on Software Architecture Companion, 2025
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, Richard Rutmann, Charvi Jain, Max Lübbering, Daniel Steinigen, Johannes Leveling, Katrin Klug, Jasper Schulze Buschhoff, Lena Jurkschat, Hammam Abdelwahab, Benny Jörg Stein, Karl-Heinz Sylla, Pavel Denisov, Nicolo' Brandizzi, Qasid Saleem, Anirban Bhowmick, Lalith Manjunath, Samuel Weinbach, Carolin Penke, Oleg Filatov, Shima Asaadi, Fabio Barth, Rafet Sifa, Fabian Küch, Andreas Herten, René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr
Accepted at European Conference on Artificial Intelligence (ECAI), 2025
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Buschhoff, Charvi Jain, Alexander Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr
Findings of the Association for Computational Linguistics: NAACL 2024
Max Lübbering, Mehdi Ali, Felix Stollenwerk, Michael Fromm, Alexander Arno Weber, Richard Rutmann
GitHub repository (paper in progress), 2024
A PyTorch-native framework for distributed and reproducible Large Language Model (pre-)training at scale.
A Python framework for distributed and reproducible machine learning model training in deep learning research.
As part of my PhD, I proposed several architectural modifications to deep neural networks, along with adaptations of the training objective, with the goal of teaching neural networks to better recognize what they don’t know (i.e., unknown unknowns).
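As a hedged illustration of the general idea (a textbook technique, not the specific methods from my thesis), Monte Carlo dropout keeps dropout active at inference time and uses the entropy of the averaged predictive distribution as an uncertainty score; the classifier below is a hypothetical stand-in just to make the sketch runnable:

```python
import torch
import torch.nn as nn

def mc_dropout_entropy(model: nn.Module, x: torch.Tensor, n_samples: int = 20) -> torch.Tensor:
    """Predictive entropy under MC dropout; higher values flag unfamiliar inputs."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        mean_probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)  # average predictive distribution over stochastic passes
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)

# Hypothetical classifier with dropout, purely for illustration.
clf = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Dropout(0.3), nn.Linear(128, 10))
uncertainty = mc_dropout_entropy(clf, torch.randn(8, 32))  # one score per input
```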
For a complete publication list, please see my Google Scholar profile.
You can drop me a line via the form or see my email below.