Machine Learning Scientist, Software Engineer

I'm a Principal Research SDE at Microsoft (previously Columbia, NYU, Amazon), and author of the O'Reilly book "Introduction to Machine Learning with Python", which describes a practical approach to machine learning with Python and scikit-learn. I am one of the core developers of the scikit-learn machine learning library, and I have been co-maintaining it for several years. I'm also a Software Carpentry instructor. You can find my full CV here.

My mission is to create open tools that lower the barrier to entry for machine learning applications, promote reproducible science and democratize access to high-quality machine learning algorithms.

I am currently part of the Gray Systems Lab at Microsoft.

You can find my previous institute website and information about the courses I was teaching at at

Introduction to Machine Learning with Python


Introduction to Machine Learning with Python provides a practical view of engineering machine learning systems in Python. The premise of the book is to enable people to learn the basics of machine learning without requiring a lot of mathematics. We therefore keep the number of formulas to a minimum, and instead rely on code and illustrations to convey the driving principles behind applying machine learning. We focus heavily on the scikit-learn machine learning library, and give a detailed tour of its main modules and how to piece them together into a successful machine learning pipeline.

Book website Github repository with all code Buy on Amazon


scikit-learn logo


role: core developer, co-maintainer

Scikit-learn has emerged as one of the most popular open source machine learning toolkits, and is widely used in academia and industry. Scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.

Project website Github Repository

pystruct logo


role: author and maintainer

PyStruct is an easy-to-use Python library for performing structured learning and prediction. While this is a very active area of machine learning, few high-quality and easy-to-use tools exist. PyStruct provides a common interface for several widely used algorithms and use cases.

Project website Github Repository

wordcloud logo


role: author and maintainer

A side project that started as a weekend hack, this simple word-cloud generator has found many friends in the Python community. It uses a very different algorithm than the popular D3 variants inspired by Wordle, and allows arbitrary shapes and very dense packing of words.

Project website Github Repository

Selected Talks


Advanced Machine Learning with Scikit Learn: Tools and Techniques for Predictive Analytics in Python

O'Reilly Video Series. Free preview (~40min), full series is 3:45h

In this Advanced Machine Learning with scikit-learn training course, expert author Andreas Mueller will teach you how to choose and evaluate machine learning models. This course is designed for users that already have experience with Python.

You will start by learning about model complexity, overfitting and underfitting. From there, Andreas will teach you about pipelines, advanced metrics and imbalanced classes, and model selection for unsupervised learning. This video tutorial also covers dealing with categorical variables, dictionaries, and incomplete data, and how to handle text data. Finally, you will learn about out-of-core learning, including the scikit-learn interface for out-of-core learning and kernel approximations for large-scale non-linear classification.

Once you have completed this computer based training course, you will have learned everything you need to know to be able to choose and evaluate machine learning models. Working files are included, allowing you to follow along with the author throughout the lessons.

O'Reilly Shop

Engineering Open Source Machine Learning Software

Data Science Summit 2016 invited talk

This talk lays out the principles guiding the design of scikit-learn, focussing on usability and maintainability. I also discuss some continuing challenges, like feature creep and increasing complexity, and future directions.

slides (with notes)

Automatic Machine Learning?

SciPy 2016 contributed talk

In this talk I discuss the why and how of automatic machine learning. I start with an explanation of the goals of automatic machine learning, and introduce meta-learning. The talk goes on to discuss recent research, available implementations and what I think we should be working on in this area.

slides (with notes)

Machine Learning with Scikit Learn 2016

SciPy 2016 Tutorial with Sebastian Raschka

The two four-hour sessions of the introductory tutorial from SciPy 2016, going from what machine learning is to model building and more advanced topics. This is a somewhat updated version of last year's tutorial.

The session is split up into two videos.

Notebooks Second half

Machine Learning with Scikit Learn

SciPy 2015 Tutorial with Kyle Kastner

The two four-hour sessions of the introductory tutorial from SciPy 2015, going from what machine learning is to model building and more advanced topics. The material is loosely based on the 2013 SciPy tutorial by Olivier Grisel, Gaël Varoquaux and Jake Vanderplas.

The session is split up into two videos.

Notebooks Second half

Machine Learning with Scikit Learn (short)

ODSC West 2015 Introduction to scikit-learn (90min)

This introduction covers data representation, the basic API for supervised and unsupervised learning, cross-validation, grid search, pipelines, text processing and details about some of the most popular machine learning models. The talk concludes with remarks on scaling up computation to large datasets, and how to perform out-of-core learning with scikit-learn.

Slides Notebooks

Large-Scale Non-Linear Learning on a single CPU

PyGotham 2015 contributed talk

In the days of the "big data" buzz, many people build data-driven applications on clusters from the start. However, working with distributed computing is not only pricey, but also requires a large engineering effort and removes interactivity from the data exploration process. In this talk I will demonstrate how to learn powerful nonlinear models on a single machine, even with large data sets. This can be achieved using the partial_fit interface provided by scikit-learn, which implements stochastic updates. Together with stateless transformations of the data, such as hashing, kernel approximation and random projections, this allows a model to be built incrementally without the need to store all the data in memory, or even on disk.
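
As a rough illustration of the idea (a minimal sketch, not code from the talk), the snippet below combines a stateless hashing transformer with partial_fit so that only one mini-batch is held in memory at a time. The iter_batches generator and its toy data are hypothetical stand-ins for a real data stream, and the choice of SGDClassifier and of 2**10 hashed features is an assumption made for the example.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_batches():
    """Hypothetical stream of (texts, labels) mini-batches.

    In a real application these would be read lazily from disk or a network
    source, so the full data set never has to fit in memory.
    """
    yield ["good great fine", "bad awful poor"], [1, 0]
    yield ["great good nice", "poor bad terrible"], [1, 0]
    yield ["fine nice great", "awful terrible bad"], [1, 0]


# HashingVectorizer is stateless: it needs no fit step and stores no vocabulary.
vectorizer = HashingVectorizer(n_features=2**10, alternate_sign=False)

# SGDClassifier supports incremental (stochastic) updates through partial_fit.
clf = SGDClassifier(random_state=0)

# All class labels must be declared up front, because a single mini-batch
# may not contain every class.
classes = [0, 1]

for texts, labels in iter_batches():
    X_batch = vectorizer.transform(texts)
    clf.partial_fit(X_batch, labels, classes=classes)

# New data goes through the same stateless transformation before prediction.
predictions = clf.predict(vectorizer.transform(["good nice", "bad awful"]))
```

The same loop structure should carry over to other estimators that expose partial_fit; only the transformation step needs to stay stateless (or be fitted independently of the data stream).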



News from Scikit-Learn 0.16 and Soon-To-Be Gems for the Next Release

O'Reilly Webcast, live April 2, 2015

Olivier Grisel and I give an overview of the 0.16 release, moderated by Ben Lorica. We also show off a couple of cool features in the PRs, and there is a Q&A with the viewers at the end.

You need to register to see the video, but it is free.

O'Reilly Webcast

Advanced Scikit-Learn v3

PyData Amsterdam 2016

A somewhat expanded version of my PyData NYC talk "Advanced scikit-learn", the talk briefly explains the scikit-learn API and goes into some depth on pipelining and grid-searches. I describe the advantages of randomized parameter search, when it is applicable, and which metrics to use for model selection. At the end, I talk about how to do out-of-core learning.
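
A minimal sketch of how pipelining and randomized parameter search fit together might look as follows. It is not taken from the talk materials: the synthetic data set, the LogisticRegression model, the log-uniform range for C and the roc_auc metric are all assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Putting preprocessing inside the pipeline means the scaler is refit on each
# training fold, so no information leaks from the validation folds.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Randomized search samples parameter settings from distributions instead of
# exhaustively enumerating a grid; "clf__C" addresses the C parameter of the
# pipeline step named "clf".
search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__C": loguniform(1e-3, 1e3)},
    n_iter=10,
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)

best_C = search.best_params_["clf__C"]
```

After fitting, search.best_estimator_ is the whole pipeline refit on all the data with the best sampled setting, so it can be used directly for prediction.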

Slides Materials


Introduction to Scikit-Learn

Strata San Jose 2015

This tutorial covers basic concepts of machine learning, such as supervised and unsupervised learning, cross-validation and model selection. I talk about how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline. I also go in-depth on a couple of algorithms and describe what overfitting and underfitting look like for these.

Slides Materials

Advanced Scikit-Learn

PyData NYC 2014

A brief introduction to scikit-learn, and the basics of pipelining and grid searches. I talk about the bias-variance tradeoff, and helper functions in scikit-learn to diagnose overfitting and underfitting. I also talk about randomized hyperparameter search and out-of-core learning. The last part of the talk is a brief introduction to PyStruct and structured prediction in general.

Slides Materials

Keynote: Commodity Machine Learning

PyData Berlin 2014

I talk about recent developments in commoditizing machine learning, and what I think needs to be done to help non-experts to apply machine-learning more easily and more effectively.

Publications



Gaël Varoquaux, Lars Buitinck, Gilles Louppe, Olivier Grisel, Fabian Pedregosa, and Andreas C. Müller:
Scikit-learn: Machine Learning Without Learning the Machinery
GetMobile: Mobile Computing and Communications, 2015.

Alexandre Abraham, Fabian Pedregosa, Michael Eickenberg, Philippe Gervais, Andreas C. Müller, Jean Kossaifi, Alexandre Gramfort, Bertrand Thirion, Gaël Varoquaux:
Machine learning for neuroimaging with scikit-learn
Frontiers in Neuroinformatics, 2014.

Andreas C. Müller:
Methods for Learning Structured Prediction in Semantic Segmentation of Natural Images
PhD Thesis. Published 2014.

Andreas C. Müller and Sven Behnke:
PyStruct - Learning Structured Prediction in Python
Journal of Machine Learning Research (JMLR), 2014.

Andreas C. Müller and Sven Behnke:
Learning Depth-Sensitive Conditional Random Fields for Semantic Segmentation of RGB-D Images
In Proceedings of IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, May 2014.

Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas C. Müller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux:
API design for machine learning software: experiences from the scikit-learn project
ECML PKDD 2013 Workshop on Languages for Data Mining and Machine Learning.

Andreas Müller and Sven Behnke:
Learning a Loopy Model For Semantic Segmentation Exactly
In Proceedings of 9th International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, January 2014.

Andreas Müller, Sebastian Nowozin and Christoph H. Lampert:
Information Theoretic Clustering using Minimum Spanning Trees
DAGM-OAGM, 2012.

Andreas Müller and Sven Behnke:
Multi-Instance Methods for Partially Supervised Image Segmentation
First IAPR Workshop on Partially Supervised Learning (PSL), Ulm 2011.

Hannes Schulz, Andreas Müller, and Sven Behnke:
Exploiting Local Structure in Boltzmann Machines
Neurocomputing 74(9):1411-1417, Elsevier, April 2011.

Hannes Schulz, Andreas Müller, and Sven Behnke:
Investigating Convergence of Restricted Boltzmann Machine Learning
NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning, Whistler, Canada, December 2010.

Dominik Scherer, Andreas Müller, and Sven Behnke:
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
20th International Conference on Artificial Neural Networks (ICANN), Thessaloniki, Greece, September 2010.

Andreas Müller, Hannes Schulz, and Sven Behnke:
Topological Features in Locally Connected RBMs
International Joint Conference on Neural Networks (IJCNN), 2010.

Hannes Schulz, Andreas Müller, and Sven Behnke:
Exploiting local structure in stacked Boltzmann machines
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium.

Pet Projects


The (notorious) cheat-sheet

A guide to picking a model in scikit-learn based on the dataset and task. It tries to give beginners a starting point, not absolute rules, so take it with a grain of salt.

Blog post Interactive version