#concept-pamphlet #book Online textbook #todo compile alongside other deep learning resources

summary

ch1 terms

artificial intelligence
deep learning
knowledge base
machine learning
logistic regression
naive Bayes
representation
feature
representation learning
autoencoder
encoder
decoder
factors of variation
multilayer perceptron (MLP): math function mapping some set of input values to output values. the function is formed by composing many simpler functions.
visible layers
hidden layers
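The MLP definition above (a function built by composing many simpler functions) can be sketched in plain Python. This is a minimal illustration, not the book's code: it assumes a hypothetical network with 2 visible inputs, one hidden layer of 3 ReLU units, and 1 output, with arbitrary hand-picked (not learned) weights. `dense`, `relu`, `compose`, and `mlp` are illustrative names.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dense(weights: Matrix, bias: Vector) -> Callable[[Vector], Vector]:
    """Return the affine map x -> Wx + b (one of the 'simpler functions')."""
    def apply(x: Vector) -> Vector:
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(weights, bias)]
    return apply


def relu(x: Vector) -> Vector:
    """Elementwise max(0, x): the nonlinearity applied to the hidden layer."""
    return [max(0.0, v) for v in x]


def compose(*fs: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """compose(f1, f2, f3)(x) == f3(f2(f1(x))): apply the functions left to right."""
    def apply(x: Vector) -> Vector:
        for f in fs:
            x = f(x)
        return x
    return apply


# Hypothetical weights: visible layer (2 inputs) -> hidden layer (3 units) -> output (1 unit).
hidden = dense([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0])
output = dense([[1.0, 1.0, 1.0]], [0.0])
mlp = compose(hidden, relu, output)

print(mlp([1.0, 2.0]))  # [3.5]: hidden pre-activations [-1, 1.5, 2] -> ReLU [0, 1.5, 2] -> sum
```

Writing the network as `compose(...)` makes the "composition of simpler functions" view literal: adding depth just means passing more layers to `compose`.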

ch2

tensor
transpose
matrix product
dot product
identity matrix
matrix inverse
linear combination
linear dependence, linear independence
square matrix
singular matrix
norm
euclidean norm
max norm
frobenius norm
diagonal matrix
symmetric matrix
unit vector
unit norm
orthonormal
orthogonal matrix
eigen-decomposition
eigenvector
eigenvalue
left eigenvector
decompose
positive definite, positive semidefinite, negative definite, negative semidefinite
singular value decomposition
moore-penrose pseudoinverse
principal components analysis
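A few of the norms and matrix terms above, computed straight from their definitions: Euclidean norm ||x||_2 = sqrt(sum_i x_i^2), max norm ||x||_inf = max_i |x_i|, Frobenius norm ||A||_F = sqrt(sum_{i,j} A_ij^2), and the orthogonal-matrix condition A^T A = I. A minimal plain-Python sketch; the function names and example matrices are illustrative choices, not from the book.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def euclidean_norm(x: Vector) -> float:
    """L2 norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for v in x))


def max_norm(x: Vector) -> float:
    """L-infinity norm: largest absolute entry."""
    return max(abs(v) for v in x)


def frobenius_norm(a: Matrix) -> float:
    """Matrix analogue of the L2 norm: sqrt of the sum of all squared entries."""
    return math.sqrt(sum(v * v for row in a for v in row))


def transpose(a: Matrix) -> Matrix:
    """(A^T)[i][j] = A[j][i]."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product: C[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
            for row in a]


def is_orthogonal(a: Matrix, tol: float = 1e-9) -> bool:
    """Orthogonal matrix: A^T A = I, i.e. its columns are orthonormal."""
    product = matmul(transpose(a), a)
    n = len(a)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


x = [3.0, -4.0]
print(euclidean_norm(x))                          # 5.0
print(max_norm(x))                                # 4.0
print(frobenius_norm([[1.0, 2.0], [2.0, 4.0]]))   # 5.0

theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
print(is_orthogonal(rotation))                    # True: rotations are orthogonal
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))    # False: a shear is not
```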

ch3

3 possible sources of uncertainty:
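- inherent stochasticity: the system being modeled is itself random, e.g. subatomic particle movement (most interpretations of quantum mechanics treat it as probabilistic). related: chaos
- incomplete observability: game show (Monty Hall problem): the outcome given the contestant's choice is deterministic, but the contestant can't observe which door hides the prize, so from their point of view the outcome is uncertain
- incomplete modeling: the model discards some observed information, or doesn't weight it correctly, so its predictions become uncertain. e.g. discretizing space or time steps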

terms. include specific equations where you can, pulled from reference material. include examples where helpful.

degree of belief
frequentist probability
bayesian probability
random variables
- may be discrete (finite or countably infinite number of states) or continuous (real-valued)
probability distribution
probability mass function
joint probability distribution
normalized
uniform distribution
- example of
probability density function
marginal probability distribution
sum rule
conditional probability
chain rule / product rule
independent
conditionally independent
expectation / expected value
variance
standard deviation
covariance
correlation
covariance matrix
bernoulli distribution
multinoulli/categorical distribution
normal/Gaussian distribution
precision
central limit theorem
multivariate normal distribution
precision matrix
exponential distribution
laplace distribution
generalized function
empirical distribution
empirical frequency
mixture distribution
latent variable
Gaussian mixture
prior probability
posterior probability
universal approximator
logistic sigmoid
softplus function
logit
positive part
negative part
bayes rule
measure theory
measure zero
almost everywhere
Jacobian matrix
self-information
nats
bits / shannons
shannon entropy
differential entropy
Kullback-Leibler divergence
cross-entropy
structured probabilistic model / graphical model
directed models
undirected models
proportional
description
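The information-theory terms above for discrete distributions: self-information I(x) = -log P(x) (natural log gives nats, base 2 gives bits/shannons), Shannon entropy H(P) = -sum_x P(x) log P(x), KL divergence D_KL(P||Q) = sum_x P(x)(log P(x) - log Q(x)), and cross-entropy H(P, Q) = -sum_x P(x) log Q(x) = H(P) + D_KL(P||Q). A minimal sketch assuming distributions are given as plain lists of probabilities over the same outcomes; the function names and the example distributions are illustrative.

```python
import math
from typing import Sequence


def self_information(p: float, base: float = math.e) -> float:
    """I(x) = -log P(x). Natural log gives nats; base 2 gives bits (shannons)."""
    return -math.log(p, base)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(P) in nats, using the convention 0 * log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats: non-negative, zero iff P == Q, and not symmetric."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """H(P, Q) = -sum_x P(x) log Q(x) = H(P) + D_KL(P || Q)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]   # fair coin
q = [0.9, 0.1]   # biased coin
print(self_information(0.5, base=2))                # 1.0 (one bit)
print(entropy(p))                                   # log 2, about 0.693 nats
print(kl_divergence(p, q), kl_divergence(q, p))     # two different values: KL is asymmetric
print(math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True
```

Because H(P) does not depend on Q, minimizing cross-entropy with respect to Q is the same as minimizing D_KL(P||Q).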

ch4

underflow
overflow
softmax function
condition number
objective / loss / error / cost function
derivative
gradient descent
critical / stationary points
local minimum
local maximum
saddle points
global minimum
partial derivatives
gradient
directional derivative
gradient descent
learning rate
line search
hill climbing
Jacobian matrix
curvature in relation to second derivatives
Hessian matrix
second derivative test
Newton’s method
first order optimization algorithms
second order optimization algorithms
lipschitz continuous
convex optimization
constrained optimization
Karush-Kuhn-Tucker approach
generalized Lagrange function
equality constraints and inequality constraints in the context of the Lagrangian
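Gradient descent, learning rate, and condition number together, on a small example of my own choosing (not from the book): f(x, y) = x^2 + 10y^2, whose second derivatives are 2 along x and 20 along y, so the Hessian has condition number 10. Each step moves against the gradient, x <- x - learning_rate * grad f(x). Along a direction with curvature c the step multiplies the error by (1 - learning_rate * c), so this sketch only converges when learning_rate < 2 / 20 = 0.1. The function names and the two learning rates are illustrative.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def f(p: Point) -> float:
    """Ill-conditioned quadratic: curvature 2 along x, 20 along y."""
    x, y = p
    return x * x + 10.0 * y * y


def grad_f(p: Point) -> Point:
    """Gradient: the vector of partial derivatives (df/dx, df/dy)."""
    x, y = p
    return (2.0 * x, 20.0 * y)


def gradient_descent(grad: Callable[[Point], Point], start: Point,
                     learning_rate: float, steps: int) -> Point:
    """Fixed-step gradient descent: repeatedly step against the gradient."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad((x, y))
        x, y = x - learning_rate * gx, y - learning_rate * gy
    return (x, y)


start = (1.0, 1.0)
good = gradient_descent(grad_f, start, learning_rate=0.09, steps=100)
bad = gradient_descent(grad_f, start, learning_rate=0.11, steps=100)
print(f(good))  # tiny: converged toward the global minimum at (0, 0)
print(f(bad))   # huge: the y coordinate oscillates with growing magnitude (factor -1.2 per step)
```

The step size is limited by the highest-curvature direction, while progress along the flattest direction is slow; that mismatch is what a large condition number means for first-order methods.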

what is softmax?
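softmax maps a vector of real scores x to a probability distribution: softmax(x)_i = exp(x_i) / sum_j exp(x_j). Computed naively it can overflow (large x_i) or underflow to a zero denominator (very negative x_i). Subtracting max_i x_i from every input leaves the output unchanged, makes the largest exponent exp(0) = 1, and so avoids both problems. A minimal stdlib sketch; `naive_softmax` and `softmax` are illustrative names.

```python
import math
from typing import List


def naive_softmax(x: List[float]) -> List[float]:
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j), computed directly (can overflow)."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax(x: List[float]) -> List[float]:
    """Same function evaluated as softmax(x - max(x)).

    Shifting every input by a constant does not change the result, and after
    the shift the largest exponent is exp(0) = 1, so nothing overflows and the
    denominator is at least 1.
    """
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1.0, 2.0, 3.0]))           # positive entries summing to 1, largest for x = 3
print(softmax([1000.0, 1001.0, 1002.0]))  # identical output: softmax ignores a constant shift
try:
    naive_softmax([1000.0, 1001.0, 1002.0])
except OverflowError:
    print("naive softmax overflowed")     # math.exp(1000.0) raises OverflowError
```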

ch5

classification
classification with missing inputs
regression
transcription
machine translation
structured output
anomaly detection
synthesis and sampling
imputation of missing values
denoising
density estimation / probability mass function estimation
model accuracy: proportion of examples for which the model produces the correct output
error rate: proportion of examples for which the model produces an incorrect output.
unsupervised vs supervised learning
reinforcement learning
design matrix
linear regression - include equations
parameters in deep learning
weights in deep learning
bias in deep learning
mean squared error
normal equations - include examples
generalization
training error
generalization error
statistical learning theory
data generating process, data generating distribution
i.i.d. assumption: the examples in each dataset are independent of each other, and the training set and test set are identically distributed, i.e. drawn from the same probability distribution. This assumption lets us describe the data-generating process with a probability distribution over a single example.
underfitting
overfitting
capacity in relation to model
hypothesis space
representational capacity
effective capacity
occam’s razor
Vapnik-Chervonenkis dimension
non-parametric models
nearest neighbor regression
Bayes error
no free lunch theorem
weight decay
regularizer
regularization
hyperparameter
capacity hyperparameter / degree of the polynomial
validation set vs training set
point estimator / statistic
bias of an estimator
asymptotically unbiased
sample mean
sample variance
unbiased sample variance
variance
standard error
mean squared error
consistency
almost sure convergence
maximum likelihood estimation
statistical efficiency
parametric case
frequentist statistics
Bayesian statistics
prior probability distribution
bayesian linear regression
posterior distribution
maximum a posteriori estimation (MAP)
logistic regression
support vector machine
kernel trick
Gaussian kernel
kernel machines / kernel methods
support vectors
decision tree
principal components analysis (PCA)
stochastic gradient descent
minibatch
curse of dimensionality
local kernels
manifold
manifold learning
manifold hypothesis
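Linear regression equations and normal equations (to-dos above): the model predicts y_hat = w^T x + b, training minimizes MSE_train = (1/m) sum_i (y_hat_i - y_i)^2, and setting its gradient to zero gives the normal equations X^T X theta = X^T y, so theta = (X^T X)^{-1} X^T y when X^T X is invertible. The sketch below is a minimal stdlib version restricted to a single input feature: each row of the design matrix X is [x_i, 1] (the column of ones turns the bias into an ordinary weight), so X^T X is 2x2 and can be solved with Cramer's rule. Function names and data are made up for illustration.

```python
from typing import List, Tuple


def fit_linear_regression(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y_hat = w * x + b by solving the normal equations (X^T X) theta = X^T y.

    Each row of the design matrix X is [x_i, 1], so theta = (w, b).
    """
    m = float(len(xs))
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[sum_xx, sum_x], [sum_x, m]] and X^T y = [sum_xy, sum_y].
    det = sum_xx * m - sum_x * sum_x
    if det == 0:
        raise ValueError("X^T X is singular (need at least two distinct x values)")
    # Solve the 2x2 linear system with Cramer's rule.
    w = (sum_xy * m - sum_x * sum_y) / det
    b = (sum_xx * sum_y - sum_x * sum_xy) / det
    return w, b


def mean_squared_error(xs: List[float], ys: List[float], w: float, b: float) -> float:
    """MSE = (1/m) * sum_i (y_hat_i - y_i)^2."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up toy data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]
w, b = fit_linear_regression(xs, ys)
print(w, b)                              # w ≈ 2.2, b ≈ 0.7
print(mean_squared_error(xs, ys, w, b))  # ≈ 0.45, the smallest MSE any line achieves on this data
```

With more features the same equations hold with a larger X^T X; in practice a least-squares solver is preferred over explicitly inverting X^T X.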

cards

spaced repetition

🧚🏼‍♀️ Belinda Mo

Deep Learning Textbook (Goodfellow 2016)
