The following books will help you further your understanding of the material:

  • Müller, Guido: Introduction to Machine Learning with Python (IMLP) (available for free for Columbia students via Safari Books Online)
  • Kuhn, Johnson: Applied Predictive Modeling (APM) (available for free at Springer Link)
  • Provost / Fawcett: Data Science for Business (DSfB)
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning (ESL)

The course will closely follow IMLP, which also comes with Python code and uses scikit-learn (as we will). APM goes into more detail than IMLP but contains only R code, which we will not use in this course. DSfB focuses on a higher-level perspective and the practical impact of data science, while ESL provides a rigorous mathematical treatment of the machine learning methods.