Advanced Techniques for Analyzing Data with Udacity
Kyiv Startup Week 2016



Pattern Recognition for Fun and Profit with Udacity

Open, ongoing enrollment

10 weeks

To succeed in this course, you should be proficient in Python programming and comfortable with basic statistics. If you need a refresher on either of these topics, you can check out these courses:

- Intro to Computer Science (you should know basic data structures and control statements, and be able to write and import functions)

- Inferential Statistics

- Descriptive Statistics

One additional course that would be nice to have is Intro to Data Science, as it will get you familiar with scientific problem-solving. However, completing that class isn't required for success. We will also use a tiny bit of git, which you can also learn about on Udacity.

One thing that we don’t require is previous exposure to machine learning. If you’re a machine learning beginner, you’re in the right place.


In this course, you’ll learn by doing! We’ll bring machine learning to life by showing you fascinating use cases and tackling interesting real-world problems like self-driving cars. For your final project, you’ll mine the email inboxes and financial data of Enron to identify persons of interest in one of the greatest corporate fraud cases in American history.

When you finish this introductory course, you’ll be able to analyze data using machine learning techniques, and you’ll also be prepared to take our Data Analyst Nanodegree. We’ll get you started on your machine learning journey by teaching you how to use helpful tools, such as pre-written algorithms and libraries, to answer interesting questions.
You’ll learn how to start with a question and/or a dataset, and use machine learning to turn them into insights.

Lessons 1-4: Supervised Classification

Naive Bayes: We jump in headfirst, learning perhaps the world’s greatest algorithm for classifying text.
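As a sketch of what this looks like in practice, here is a tiny text classifier using scikit-learn's multinomial Naive Bayes (the library choice, corpus, and labels are our own illustration, not part of the course outline):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up miniature corpus: 1 = about machine learning, 0 = not
docs = ["machine learning is fun",
        "learning algorithms classify text",
        "the weather is sunny today",
        "rain and sunshine tomorrow"]
labels = [1, 1, 0, 0]

# Turn raw text into word-count features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit Naive Bayes on the counts, then classify a new sentence
clf = MultinomialNB()
clf.fit(X, labels)
pred = clf.predict(vectorizer.transform(["classify text with machine learning"]))
```

The classifier assigns the new sentence to class 1, since its words all appear in the machine-learning documents.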

Support Vector Machines (SVMs): One of the top 10 algorithms in machine learning, and a must-try for many classification tasks. What makes it special? The ability to generate new features independently and on the fly.
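That "new features on the fly" ability is the kernel trick. A minimal sketch, assuming scikit-learn (parameter values here are arbitrary, chosen only to make the toy example work):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: no straight line can separate the two classes
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])

# An RBF kernel implicitly maps the points into a higher-dimensional
# space where they become separable
clf = SVC(kernel="rbf", C=10.0, gamma=10.0)
clf.fit(X, y)
```

A linear kernel would fail on this data; the RBF kernel classifies all four training points correctly.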

Decision Trees: Extremely straightforward, often just as accurate as an SVM but (usually) way faster. The launch point for more sophisticated methods, like random forests and boosting.
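The "extremely straightforward" claim holds in code too; a sketch with scikit-learn on invented data:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [hours studied, hours slept] -> passed exam?
X = [[8, 7], [7, 8], [9, 6], [2, 4], [1, 9], [3, 3]]
y = [1, 1, 1, 0, 0, 0]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)
pred = clf.predict([[8, 8]])
```

The tree learns a threshold on hours studied, so a well-prepared student is predicted to pass.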

Lesson 5: Datasets and Questions
Behind any great machine learning project is a great dataset that the algorithm can learn from. We were inspired by a treasure trove of email and financial data from the Enron corporation, which would normally be strictly confidential but became public when the company went bankrupt in a blizzard of fraud. Follow our lead as we wrestle this dataset into a machine-learning-ready format, in anticipation of trying to predict cases of fraud.
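The wrangling step might look something like this sketch. The records and field names below are invented, but the pattern — a dict of people, "NaN" strings for missing values, flattened into numeric feature vectors plus labels — mirrors the kind of cleanup described above:

```python
# Invented miniature of an Enron-style dataset: one dict per person,
# with the string "NaN" standing in for missing values
data_dict = {
    "PERSON A": {"salary": 250000, "bonus": 600000, "poi": True},
    "PERSON B": {"salary": "NaN", "bonus": 100000, "poi": False},
}

def feature_vector(record, keys):
    # Replace missing entries with 0 so algorithms see plain numbers
    return [0 if record[k] == "NaN" else record[k] for k in keys]

features = [feature_vector(r, ["salary", "bonus"]) for r in data_dict.values()]
labels = [r["poi"] for r in data_dict.values()]
```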

Lessons 6 and 7: Regressions and Outliers
Regressions are some of the most widely used machine learning algorithms, and rightly share prominence with classification. What’s a fast way to make mistakes in regression, though? Have troublesome outliers in your data. We’ll tackle how to identify and clean away those pesky data points.
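One simple cleaning recipe — fit, measure residuals, drop the worst offenders, refit — can be sketched with scikit-learn on synthetic data (the 90% cutoff is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(0.0, 0.5, 20)
y[5] += 40.0                      # plant a single large outlier

# First fit, then measure each point's residual
reg = LinearRegression().fit(X, y)
residuals = np.abs(y - reg.predict(X))

# Keep the 90% of points with the smallest residuals and refit
keep = residuals.argsort()[: int(0.9 * len(y))]
cleaned = LinearRegression().fit(X[keep], y[keep])
```

The planted outlier has by far the largest residual, so it is dropped and the refit slope returns close to the true value of 3.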

Lesson 8: Unsupervised Learning

K-Means Clustering: The flagship algorithm when you don’t have labeled data to work with, and a quick method for pattern-searching when approaching a dataset for the first time.
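A quick sketch of that pattern search, assuming scikit-learn and two made-up blobs of points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious blobs of made-up points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

# No labels are given; k-means discovers the grouping on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Each blob ends up in its own cluster, with no supervision required.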

Lessons 9-12: Features, Features, Features

Feature Creation: Taking your human intuition about the world and turning it into data that a computer can use.
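For instance, intuition like "what fraction of someone's messages go to suspicious people?" becomes a single numeric feature. The helper below is hypothetical, written in the spirit of this lesson:

```python
def fraction(part, total):
    # Collapse two raw counts into one ratio feature; guard against
    # missing values (the "NaN" string) and empty totals
    if part == "NaN" or total in ("NaN", 0):
        return 0.0
    return part / total
```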

Feature Selection: Einstein said it best: make everything as simple as possible, and no simpler. In this case, that means identifying the most important features of your data.
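One way to make things "as simple as possible" is univariate selection: score every feature against the labels and keep only the top scorers. A sketch assuming scikit-learn and its built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each of the 4 iris features against the labels, keep the best 2
selector = SelectKBest(f_classif, k=2).fit(X, y)
X_new = selector.transform(X)
```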

Principal Component Analysis: A more sophisticated take on feature selection, and one of the crown jewels of unsupervised learning.
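PCA's core promise — a few directions carrying most of the variance — shows up clearly on synthetic correlated data (a sketch assuming scikit-learn; the data is invented):

```python
import numpy as np
from sklearn.decomposition import PCA

# Two strongly correlated features: almost all the variance
# lies along a single direction
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = np.hstack([base, base + 0.1 * rng.normal(size=(200, 1))])

pca = PCA(n_components=2).fit(X)
```

The first principal component explains well over 90% of the variance, so one component could stand in for both original features.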

Feature Scaling: Simple tricks for making sure your data and your algorithm play nicely together.
Learning from Text: More information is in text than any other format, and there are some effective but simple tools for extracting that information.
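One of those simple-but-effective tools is tf-idf vectorization; a sketch assuming scikit-learn and a made-up three-document corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Tf-idf downweights words that appear everywhere; English stop
# words ("the", etc.) are dropped from the vocabulary entirely
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
```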

Lessons 13-14: Validation and Evaluation

Training/testing data split: How do you know that what you’re doing is working? You don’t, unless you validate. The train-test split is simple to do, and the gold standard for understanding your results.
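The split itself is one function call in scikit-learn (the 30% holdout below is an arbitrary illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data; evaluate only on points the model never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```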

Cross-validation: Take the training/testing split and put it on steroids. Validate your machine learning results like a pro.
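"On steroids" here means every point gets a turn in the test set; a sketch assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, test on the 5th, rotate
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
```

Averaging the five fold scores gives a far more stable estimate than any single train/test split.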

Precision, recall, and F1 score: After all this data-driven work, quantify your results with metrics tailored to what is most important to you.
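These three metrics are all one-liners in scikit-learn; the labels below are made up to show how they differ:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)  # 2 of 3 predicted positives are real
r = recall_score(y_true, y_pred)     # 2 of 4 real positives were found
f = f1_score(y_true, y_pred)         # harmonic mean of the two
```

If missing a fraudster is worse than a false alarm, you optimize recall; if false accusations are worse, precision; F1 balances both.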

Lesson 15: Wrapping it all Up
We take a step back and review what we’ve learned, and how it all fits together.


Mini-project at the end of each lesson

Final project: searching for signs of corporate fraud in Enron data
