I have always wanted to be a data scientist, the sexiest job of the 21st century. The problem is, I wasn’t a data science major, so the stuff I learned in school wasn’t really relevant to the field. Luckily, massive open online courses (MOOCs) were there so I could learn things on my own.
I list here three amazing data science online courses that I’ve taken.
Online Course 1: Applied Data Science with Python
This offering from the University of Michigan is a five-course specialization on the data science pipeline. The first two courses deal with data manipulation and plotting, focusing on the most common tools for the job – Pandas and Matplotlib. The third course deals with the basics of machine learning. The fourth and fifth courses cover more specialized topics – text mining and social network analysis.
As of the writing of this article, only the first two courses have been released. I’ve completed both and am currently waiting for the third to start. Thinking about it, I liked two things about this specialization. First, the instructor, Christopher Brooks, is engaging and teaches in a very clear manner. Second, the exercises and assignments are nontrivial and require deep thought. I’ve taken a lot of online courses, and one thing I’ve noticed is that many of them assess learning through very easy quizzes. This is not the case here. Applied Data Science with Python actually requires you to write code to do interesting tasks.
The background knowledge required for the specialization is light, but you do need some prior exposure to simple programming – Brooks doesn’t really spend a lot of time on introductory Python. So it’s best to practice basic Python before taking the specialization.
Online Course 2: Machine Learning
Stanford’s Machine Learning course focuses on the standard machine learning algorithms. The instructor, Andrew Ng, introduces each algorithm, describes its mathematical representation, and teaches its implementation. This is probably the most popular MOOC on machine learning, and it’s not hard to see why. Andrew Ng is very systematic in how he teaches. He goes into enough mathematical detail for each algorithm that we know what happens under the hood, but not to the point where only math PhDs are able to follow.
The course goes from the most basic algorithm to the most complex – starting from simple linear regression up to deep learning and neural networks. Along the way, Ng provides very interesting applications of machine learning, such as self-driving cars and movie recommendation engines. He knows how to keep his students motivated.
The only downside is the use of Octave as the implementation language, which is not really a very popular option for data science. However, the course doesn’t dwell too much on data wrangling – the focus is on machine learning algorithms, so the choice of language isn’t really that big of a deal.
Before taking this course, brush up on your basic math, probably up through differential calculus.
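To give a feel for why differential calculus matters here, below is a minimal sketch of simple linear regression fitted by gradient descent. It is written in Python rather than the Octave used in the course, and it is not taken from the course materials. The function name fit_line, the learning rate, the step count, and the toy data are all my own assumptions for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by batch gradient descent.

    Hypothetical helper for illustration; minimizes the mean squared
    error cost J = (1 / (2n)) * sum((slope * x + intercept - y) ** 2).
    """
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Prediction error for every data point under the current line.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to slope and intercept.
        grad_slope = sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = sum(errors) / n
        # Step downhill along each derivative.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The two gradient lines are just the partial derivatives of the cost function, which is about as much calculus as you need to follow along.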
Online Course 3: Text Mining and Analytics
Text Mining and Analytics, taught by ChengXiang Zhai, is the third course of the six-course specialization, Data Mining, offered by the University of Illinois. The topics in this course are quite specialized, dealing solely with analytics applied to textual data.
The course is relatively mathematical. It focuses more on model formulation than on implementation. The topics use a lot of probability theory, so it’s best to read up on that beforehand.
Currently, I’m auditing the course, and I think that it’s a great introduction to text analytics for the mathematically inclined. For those who are not, I would probably wait for the fourth course in Christopher Brooks’ Applied Data Science with Python specialization. Brooks’ approach, judging from the courses I’ve taken, aims toward a more computational understanding.
The topics covered are very interesting – word association mining, topic modeling, and sentiment analysis, among others – but be prepared to pause a lot, since Zhai talks quite fast.
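For a taste of what word association mining can look like in code, here is a minimal sketch that scores word pairs by pointwise mutual information (PMI) over document-level co-occurrence. This is one common way to measure association, not necessarily the formulation used in the course. The function name pmi_scores and the tiny corpus are assumptions I made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI (in bits) for every word pair sharing at least one document.

    Hypothetical helper for illustration. Probabilities are estimated as
    the fraction of documents that contain a word (or both words).
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Each word counts once per document; sorting fixes the pair order.
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        # Positive PMI: the pair co-occurs more often than chance predicts.
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


# Made-up toy corpus, one "document" per string.
docs = [
    "the cat eats fish",
    "the dog eats meat",
    "the cat chases the dog",
    "the bird sings",
]
scores = pmi_scores(docs)
for pair in [("bird", "sings"), ("cat", "fish"), ("cat", "the")]:
    print(pair, scores[pair])
```

A word like "the" appears in every document, so its PMI with any other word comes out to zero, which is why raw co-occurrence counts alone are a poor measure of association.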