#concept-pamphlet #class #todo go through the entire CS229 lecture notes and UPDATE my definitions of various terms. please 🙏🏻 important to do before school starts
Linear regression
What is the notation for a training set?
Includes training examples with input and output features
?
$\{(x^{(i)}, y^{(i)})\,;\ i = 1, \dots, m\}$, a set of $m$ training examples, where $x^{(i)}$ is the input and $y^{(i)}$ the output.
Note that the superscript "(i)" is simply an index / i-th item in the training set and has nothing to do with exponentiation.
What are the benefits of squaring something like in the cost function of the ordinary least squares regression model?
?
- The result of the square is always positive
- Squaring puts more weight on larger errors/differences
- The result is always differentiable
- The result corresponds to the assumption of normally distributed errors
What is the least-squares cost function J?
?
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
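To make the definition concrete, here is a tiny numeric check of the least-squares cost $J(\theta) = \frac{1}{2}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$ for a linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ (a sketch; the data and function names are my own):

```python
def least_squares_cost(theta0, theta1, xs, ys):
    # J(theta) = 1/2 * sum over training examples of (h(x) - y)^2
    return 0.5 * sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]  # exactly y = 1 + 2x

print(least_squares_cost(1.0, 2.0, xs, ys))  # 0.0 -- perfect fit, zero cost
print(least_squares_cost(0.0, 0.0, xs, ys))  # 0.5*(1 + 9 + 25) = 17.5
```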
What is learning rate, in the context of training neural nets? In gradient descent?
?
A hyperparameter that determines the size of the steps taken during gradient descent. A higher learning rate might converge faster but can overshoot the minimum, while a lower rate converges more slowly but more reliably.
In the context of gradient descent, it is the α in the update rule $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$.
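A quick way to see the tradeoff is to run gradient descent on a toy objective, here $J(\theta) = \theta^2$ (my own choice of example): a moderate α converges, a large one overshoots and diverges.

```python
def gd(alpha, steps=20, theta=1.0):
    # Gradient descent on J(theta) = theta^2, whose gradient is 2*theta.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(gd(0.1))   # small steps: slowly shrinks toward the minimum at 0
print(gd(0.45))  # near-optimal: converges very fast
print(gd(1.1))   # too large: each step overshoots, |theta| grows
```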
What is the equation for gradient descent?
?
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \text{(simultaneously for all } j\text{)}$$
where J is the cost function and α is the learning rate.
Or this for a single training example:
$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$
What is the difference between stochastic gradient descent and batch gradient descent?
?
Batch gradient descent scans the entire training set to compute the gradient before each parameter update; stochastic gradient descent updates the parameters after each individual training example. SGD is noisier per step but far cheaper, and often gets close to the minimum much faster on large datasets.
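A minimal sketch of both variants for one-feature linear regression $h_\theta(x) = \theta_0 + \theta_1 x$ (function names, step sizes, and toy data are my own):

```python
def batch_gd(xs, ys, alpha=0.5, iters=2000):
    # One parameter update per pass: gradient averaged over the whole set.
    t0 = t1 = 0.0
    m = len(xs)
    for _ in range(iters):
        g0 = sum((t0 + t1 * x - y) for x, y in zip(xs, ys)) / m
        g1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
        t0 -= alpha * g0
        t1 -= alpha * g1
    return t0, t1

def stochastic_gd(xs, ys, alpha=0.02, epochs=500):
    # One parameter update per training example (the LMS rule).
    t0 = t1 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - (t0 + t1 * x)
            t0 += alpha * err
            t1 += alpha * err * x
    return t0, t1

# Toy data generated from y = 1 + 2x; both should recover (1, 2).
xs = [i / 49 for i in range(50)]
ys = [1 + 2 * x for x in xs]
print(batch_gd(xs, ys))       # ≈ (1.0, 2.0)
print(stochastic_gd(xs, ys))  # ≈ (1.0, 2.0)
```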
What is the likelihood function?
What is the log likelihood function?
What is the maximum likelihood function?
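For linear regression with Gaussian noise, $y^{(i)} = \theta^\top x^{(i)} + \epsilon^{(i)}$ with $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, the definitions from the notes are:

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2} \right)$$

$$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2$$

Maximum likelihood estimation chooses θ to maximize $L(\theta)$ (equivalently $\ell(\theta)$), which under this noise model is the same as minimizing the least-squares cost $J(\theta)$.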
I don't understand: the LMS update rule / Widrow-Hoff learning rule, and how it was derived
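The derivation is short: for a single training example, differentiate the squared error with respect to $\theta_j$ (using $h_\theta(x) = \sum_k \theta_k x_k$):

$$\frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2 = \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_k \theta_k x_k - y \right) = \left( h_\theta(x) - y \right) x_j$$

Plugging this into $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ gives the LMS / Widrow-Hoff rule:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$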
Classification and logistic regression
perceptron learning algorithm
Newton's method - for finding a zero of a function
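The notes apply Newton's method to maximize $\ell(\theta)$ by finding a zero of $\ell'(\theta)$; the scalar iteration is $x := x - f(x)/f'(x)$. A minimal sketch (function names and the example are my own):

```python
def newton_zero(f, fprime, x0, tol=1e-12, max_iter=50):
    # Iterate x := x - f(x)/f'(x) until f(x) is near zero.
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fprime(x)
    return x

# Example: the positive zero of f(x) = x^2 - 2 is sqrt(2).
root = newton_zero(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # ≈ 1.4142135...
```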
hessian
fisher scoring
Generalized linear models
A class of distributions is in the exponential family if it can be written in the form:
$$p(y; \eta) = b(y) \exp\!\left( \eta^\top T(y) - a(\eta) \right)$$
where η is the natural parameter, T(y) the sufficient statistic, a(η) the log partition function, and b(y) the base measure.
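As a sanity check, the Bernoulli distribution with mean φ fits this form (the standard first example in the notes):

$$p(y; \phi) = \phi^y (1-\phi)^{1-y} = \exp\!\left( y \log\frac{\phi}{1-\phi} + \log(1-\phi) \right)$$

with $\eta = \log\frac{\phi}{1-\phi}$, $T(y) = y$, $a(\eta) = -\log(1-\phi) = \log(1 + e^\eta)$, and $b(y) = 1$.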