Presentation of Andrew Ng's weekly courses
Week 1 : Welcome
Welcome to Machine Learning! This week, we introduce the core idea of teaching a computer to learn concepts using data—without being explicitly programmed.
We are going to start by covering linear regression with one variable. Linear regression predicts a real-valued output based on an input value. We discuss the application of linear regression to housing price prediction, present the notion of a cost function, and introduce the gradient descent method for learning.
- To Begin with :
Welcome to Machine Learning (1min)
In this episode, the great Andrew Ng presents the basics of Machine Learning and explains what we're going to learn in his upcoming courses.
Welcome (7min)
Machine learning is the science of getting computers to act without being explicitly programmed. Many researchers also think it is the best way to make progress towards human-level AI. Finally, Andrew Ng will tell you about some of Silicon Valley's best practices in innovation as it pertains to machine learning and AI.
- Introduction :
What is Machine Learning? (7min)
In this episode, Andrew Ng defines Machine Learning using the definitions of Arthur Samuel and Tom Mitchell, and walks through them with examples. He distinguishes the main categories of Machine Learning algorithms: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Supervised Learning (12min)
In this video, Andrew Ng presents the differences between Supervised Learning and Unsupervised Learning algorithms, and highlights Supervised Learning with several examples.
Unsupervised Learning (14min)
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.
- Model and Cost Function :
Model Representation (8min)
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y.
Cost Function (8min)
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference of all the results of the hypothesis with inputs from x's and the actual output y's.
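The squared-error cost can be sketched in a few lines of Python (our own illustration; the course uses Octave, and the function name compute_cost is ours):

```python
def compute_cost(xs, ys, theta0, theta1):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)
    for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# A perfect fit yields zero cost:
print(compute_cost([1, 2, 3], [2, 4, 6], 0.0, 2.0))  # -> 0.0
```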
Cost Function - Intuition I (8min)
Learn how to fit the data! The best possible line will be such that the average squared vertical distance of the scattered points from the line is the least.
Cost Function - Intuition II (11min)
Introduction to contour plots: a contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points on the same line.
- Linear Algebra :
Matrices and Vectors (11min)
Let's get started with our linear algebra review. In this video I want to tell you what are matrices and what are vectors.
Addition and Scalar Multiplication (12min)
In this video we'll talk about matrix addition and subtraction, as well as how to multiply a matrix by a number, also called Scalar Multiplication.
Matrix Vector Multiplication (11min)
Here, we will be talking about how to multiply together two matrices. We'll start with a special case of that, of matrix vector multiplication - multiplying a matrix together with a vector.
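As a sketch (our own Python, not course code), each entry of the result is the dot product of one row of the matrix with the vector:

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [3, 4], [5, 6]]
v = [1, 1]
print(mat_vec(A, v))  # -> [3, 7, 11]
```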
Matrix Matrix Multiplication (13min)
In this video we'll talk about matrix-matrix multiplication, or how to multiply two matrices together. Later, this operation will let us solve for the linear regression parameters theta 0 and theta 1 all in one shot, without needing an iterative algorithm like gradient descent.
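A minimal Python sketch of the row-by-column rule (our own illustration; mat_mul is a name we made up):

```python
def mat_mul(A, B):
    """Multiply A (m x n) by B (n x p): entry (i, j) is the dot
    product of row i of A with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))  # -> [[2, 1], [4, 3]]
print(mat_mul(B, A))  # -> [[3, 4], [1, 2]]  (order matters: AB != BA)
```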
Matrix Multiplication Properties (9min)
Matrix multiplication is really useful, since you can pack a lot of computation into just one matrix multiplication operation. But you should be careful about how you use it. In this video, you will learn about a few properties of matrix multiplication.
Inverse and Transpose (10min)
In this video, we are going to talk about a couple of special matrix operations, called the matrix inverse and the matrix transpose operation.
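A small Python sketch of both operations, restricted to a 2x2 matrix for the inverse (our own code; real implementations use a numerical library):

```python
def transpose(A):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula; assumes det != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 7], [2, 6]]
print(transpose(A))    # -> [[4, 2], [7, 6]]
print(inverse_2x2(A))  # -> [[0.6, -0.7], [-0.2, 0.4]]
```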
Week 2 : Linear Regression with Multiple Variables
Welcome to week 2! I hope everyone has been enjoying the course and learning a lot! This week we’re covering linear regression with multiple variables. We’ll show how linear regression can be extended to accommodate multiple input features. We also discuss best practices for implementing linear regression.
- Multivariate Linear Regression :
Multiple Features (8min)
Linear regression with multiple variables is also known as "multivariate linear regression".
We now introduce notation for equations where we can have any number of input variables.
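With the convention x0 = 1, the hypothesis becomes h(x) = θᵀx. A minimal Python sketch (the names are ours, not course code):

```python
def hypothesis(theta, x):
    """h(x) = theta^T x, with the convention that x0 = 1 is prepended."""
    x = [1.0] + list(x)
    return sum(t * xi for t, xi in zip(theta, x))

theta = [1.0, 2.0, 3.0]   # theta0 (intercept), theta1, theta2
print(hypothesis(theta, [10, 20]))  # -> 81.0
```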
- Gradient Descent :
Gradient Descent for Multiple Variables (5min)
In this episode, Andrew Ng explains how to properly set up the gradient descent algorithm with multiple variables.
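One simultaneous update step can be sketched in Python as follows (our own translation of the update rule; the course implements it in Octave):

```python
def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j:
    theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij,
    where each row of X already includes x0 = 1."""
    m = len(X)
    h = [sum(t * xij for t, xij in zip(theta, xi)) for xi in X]
    errors = [hi - yi for hi, yi in zip(h, y)]
    return [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j, t in enumerate(theta)]

# Fit y = 2x on tiny made-up data:
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta = [0.0, 0.0]
for _ in range(2000):
    theta = gradient_descent_step(theta, X, y, alpha=0.1)
print(theta)  # approaches [0.0, 2.0]
```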
Gradient Descent in Practice I - Feature Scaling (8min)
We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
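Mean normalization is one common way to do this; a minimal Python sketch with made-up example values:

```python
def mean_normalize(values):
    """Scale one feature to roughly [-0.5, 0.5]: x := (x - mean) / range."""
    mean = sum(values) / len(values)
    rng = max(values) - min(values)
    return [(v - mean) / rng for v in values]

sizes = [1000.0, 2000.0, 3000.0]   # e.g. house sizes in square feet
print(mean_normalize(sizes))       # -> [-0.5, 0.0, 0.5]
```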
Gradient Descent in Practice II - Learning Rate (8min)
To debug gradient descent, make a plot with the number of iterations on the x-axis. Now plot the cost function, J(θ), over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
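A tiny Python experiment illustrating the idea on the one-parameter cost J(θ) = θ² (a stand-in of our own, not the course's data):

```python
def descend(alpha, steps=10, theta=1.0):
    """Minimize J(theta) = theta^2 by gradient descent (gradient = 2*theta),
    recording J at every iteration -- the quantity you would plot."""
    history = []
    for _ in range(steps):
        theta -= alpha * 2 * theta
        history.append(theta ** 2)
    return history

good = descend(alpha=0.1)
bad = descend(alpha=1.1)   # too large: theta overshoots and diverges
print(good[-1] < good[0])  # -> True  (J decreases each iteration)
print(bad[-1] > bad[0])    # -> True  (J increases: decrease alpha)
```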
Features and Polynomial Regression (7min)
We can improve our features and the form of our hypothesis function in a couple different ways.
We can combine multiple features into one. For example, we can combine x1 and x2 into a new feature x3 by taking x3 = x1 * x2. Our hypothesis function need not be linear if that does not fit the data well.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).
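For instance, a single feature x can be mapped to polynomial terms (a Python sketch of our own; note the resulting ranges differ wildly, which is why feature scaling matters here):

```python
def poly_features(x):
    """Map one feature x to [x, x^2, x^3], so the linear model
    h = theta^T x can fit a cubic curve in the original feature."""
    return [x, x ** 2, x ** 3]

print(poly_features(2.0))  # -> [2.0, 4.0, 8.0]
```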
- Computing Parameters Analytically :
Normal Equation (16min)
Gradient descent gives one way of minimizing J. In this video, Andrew Ng presents a second way: the normal equation, which minimizes J analytically by solving for the parameters in one step, θ = (XᵀX)⁻¹Xᵀy, without resorting to an iterative algorithm.
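For one feature plus an intercept, the normal equation reduces to the familiar least-squares formulas; a Python sketch of our own for illustration (the general case uses the full matrix formula):

```python
def normal_equation_1feature(xs, ys):
    """Closed-form least squares for h(x) = theta0 + theta1 * x.
    This is the one-feature special case of theta = (X^T X)^-1 X^T y."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    theta1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
              / sum((x - x_mean) ** 2 for x in xs))
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Data lying exactly on y = 1 + 2x:
print(normal_equation_1feature([1, 2, 3], [3, 5, 7]))  # -> (1.0, 2.0)
```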
Normal Equation Noninvertibility (5min)
When implementing the normal equation, XᵀX may be noninvertible. Common causes are redundant (linearly dependent) features, or too many features relative to the number of training examples; the fixes are deleting some features or using regularization. In practice, computing a pseudo-inverse (Octave's pinv) still gives a correct value of θ.
Week 3 : Logistic Regression
Welcome to week 3! This week, we’ll be covering logistic regression. Logistic regression is a method for classifying data into discrete outcomes. For example, we might use logistic regression to classify an email as spam or not spam. In this module, we introduce the notion of classification, the cost function for logistic regression, and the application of logistic regression to multi-class classification.
- Classification and Representation :
Classification (8min)
To attempt classification, one method is to use linear regression and map all predictions greater than 0.5 as a 1 and all less than 0.5 as a 0. However, this method doesn't work well because classification is not actually a linear function.
Hypothesis Representation (7min)
We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. However, it is easy to construct examples where this method performs very poorly.
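Logistic regression instead passes θᵀx through the sigmoid (logistic) function, keeping outputs in (0, 1), interpreted as P(y = 1 | x). A minimal Python sketch:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))           # -> 0.5
print(sigmoid(10) > 0.99)   # -> True
print(sigmoid(-10) < 0.01)  # -> True
```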
Decision Boundary (14min)
The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
- Logistic Regression Model :
Cost Function (10min)
We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.
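Instead we use the cross-entropy cost, which is convex. A Python sketch of our own (here h_vals are precomputed hypothesis outputs):

```python
import math

def logistic_cost(h_vals, y_vals):
    """Cross-entropy cost J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(y_vals)
    return -sum(y * math.log(h) + (1 - y) * math.log(1 - h)
                for h, y in zip(h_vals, y_vals)) / m

# Confident, correct predictions give near-zero cost:
print(logistic_cost([0.99, 0.01], [1, 0]))  # close to 0
```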
Simplified Cost Function and Gradient Descent (12min)
Notice that the gradient descent update rule here looks identical to the one we used in linear regression (only the hypothesis h changes). We still have to simultaneously update all values in theta.
Advanced Optimization (14min)
"Conjugate gradient", "BFGS", and "L-BFGS" are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they're already tested and highly optimized. Octave provides them.
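The course demonstrates this with Octave's fminunc; as a rough Python analogue (with a stand-in cost of our own, not the course's), SciPy's minimize can run BFGS when handed the cost and its gradient:

```python
from scipy.optimize import minimize

def cost(theta):
    """A hypothetical stand-in for J(theta), minimized at theta = (3, -1)."""
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2

def grad(theta):
    """Gradient of the stand-in cost."""
    return [2 * (theta[0] - 3.0), 2 * (theta[1] + 1.0)]

result = minimize(cost, x0=[0.0, 0.0], jac=grad, method='BFGS')
print(result.x)  # close to [3, -1]
```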
- Multiclass Classification :
Multiclass Classification: One-vs-all (6min)
Now we will approach the classification of data when we have more than two categories. Instead of y = {0,1} we will expand our definition so that y = {0,1...n}.
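Prediction then runs every binary classifier and picks the most confident one. A Python sketch with made-up, hypothetical parameter vectors:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """Run one binary classifier per class and return the class whose
    hypothesis h(x) = sigmoid(theta^T x) is highest; x includes x0 = 1."""
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    return max(range(len(scores)), key=lambda c: scores[c])

# Three hypothetical trained classifiers over one feature (x = [1, x1]):
thetas = [[4.0, -2.0],   # class 0: likes small x1
          [0.0, 0.0],    # class 1: indifferent
          [-4.0, 2.0]]   # class 2: likes large x1
print(predict_one_vs_all(thetas, [1.0, 0.0]))  # -> 0
print(predict_one_vs_all(thetas, [1.0, 4.0]))  # -> 2
```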
- Solving the Problem of Overfitting :
The Problem of Overfitting (9min)
This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting: 1. Reduce the number of features, 2. Regularization.
Cost Function (10min)
If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.
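A Python sketch of the regularized squared-error cost for one feature (our own code; by convention theta0 is not penalized):

```python
def regularized_cost(xs, ys, theta, lam):
    """Squared-error cost for h(x) = theta0 + theta1 * x, plus the
    regularization term (lambda / (2m)) * theta1^2 (theta0 not penalized)."""
    m = len(xs)
    theta0, theta1 = theta
    sq_err = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sq_err / (2 * m) + lam * theta1 ** 2 / (2 * m)

theta = [0.0, 2.0]   # fits y = 2x exactly
print(regularized_cost([1, 2], [2, 4], theta, lam=0.0))  # -> 0.0
print(regularized_cost([1, 2], [2, 4], theta, lam=4.0))  # -> 4.0
```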
Regularized Linear Regression (10min)
We can apply regularization to both linear regression and logistic regression. That's what we will detail later on.
Regularized Logistic Regression (8min)
We can regularize logistic regression in a similar way that we regularize linear regression. As a result, we can avoid overfitting.