# Machine Learning

| Operator / Function | Definition | Usage / Intuition | Example |
|---|---|---|---|
| \( \min_x f(x) \) | Minimum value of a function | Find the smallest value of the objective | \( \min_x (x-3)^2 = 0 \) |
| \( \max_x f(x) \) | Maximum value of a function | Find the largest value of the objective | \( \max_x -(x-3)^2 = 0 \) |
| \( \arg\min_x f(x) \) | Input where the function is minimized | Optimization to find the best parameters | \( \arg\min_x (x-3)^2 = 3 \) |
| \( \arg\max_x f(x) \) | Input where the function is maximized | Find the best parameter location | \( \arg\max_x -(x-3)^2 = 3 \) |
| \( \frac{d}{dx} f(x) \) | Derivative w.r.t. a scalar | Slope / rate of change | \( \frac{d}{dx} (x^2) = 2x \) |
| \( \frac{\partial f}{\partial x_i} \) | Partial derivative | Multivariate rate of change | \( \frac{\partial}{\partial x} (x^2 + y^2) = 2x \) |
| \( \nabla f(x) \) | Gradient vector | Direction of steepest ascent | \( \nabla (x^2 + y^2) = [2x, 2y] \) |
| \( \theta_{t+1} = \theta_t - \eta \nabla_\theta L \) | Gradient descent update | Iteratively minimize the loss | Linear regression update |
| \( \langle u, v \rangle \) | Dot product / inner product | Similarity / projection | \( \langle [1,2],[3,4] \rangle = 11 \) |
| \( |x|_2 \) | L2 norm (Euclidean) | Magnitude of a vector | \( |[3,4]|_2 = 5 \) |
| \( |x|_1 \) | L1 norm (Manhattan) | Sum of absolute values | \( |[3,-4]|_1 = 7 \) |
| \( A^\top \) | Matrix transpose | Swap rows and columns | \( [[1,2],[3,4]]^\top = [[1,3],[2,4]] \) |
| \( \text{Tr}(A) \) | Trace of a matrix | Sum of the diagonal | \( \text{Tr}([[1,2],[3,4]]) = 5 \) |
| \( \det(A) \) | Determinant | Scaling factor of the linear map | \( \det([[1,2],[3,4]]) = -2 \) |
| \( \mathbb{E}[X] \) | Expectation / mean | Average value | \( \mathbb{E}[X] = \sum_i x_i P(x_i) \) |
| \( \text{Var}(X) \) | Variance | Spread of \( X \) | \( \text{Var}([1,2,3]) = 2/3 \) (population variance) |
| \( \text{Cov}(X,Y) \) | Covariance | How \( X \) and \( Y \) vary together | \( \text{Cov}(X,Y) = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] \) |
| \( \mathbb{P}(A) \) | Probability | Chance of an event | \( \mathbb{P}(X>0) \) |
| \( L(y,\hat{y}) \) | Loss function | Measures prediction error | MSE, cross-entropy |
| \( r_{im} = - \frac{\partial L}{\partial F(x_i)} \) | Pseudo-residuals (boosting) | Direction to reduce the loss | Gradient boosting step |
| \( F_m(x) = F_{m-1}(x) + \nu \gamma_m h_m(x) \) | Boosted model update | Add the new tree's contribution | Gradient boosting |
| \( \text{sign}(x) \) | Sign function | Direction of a number | \( \text{sign}(-5) = -1 \) |
| \( \mathbf{1}_{\{\text{condition}\}} \) | Indicator function | 1 if true, 0 if false | \( \mathbf{1}_{\{x>0\}} \) |
| \( \sigma(x) \) | Sigmoid function | Maps to a probability in \( [0,1] \) | \( \sigma(0) = 0.5 \) |
| \( \text{softmax}(z)_i \) | Softmax function | Multi-class probabilities | \( \text{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j} \) |
| \( \text{ReLU}(x) \) | Rectified linear unit | Nonlinear activation | \( \text{ReLU}(-2) = 0,\ \text{ReLU}(3) = 3 \) |
| \( \hat{y} = F_M(x) \) | Regression prediction | Final model output | Gradient boosting regressor |
| \( \hat{y} = \mathbf{1}[\sigma(F_M(x)) > 0.5] \) | Binary classification prediction | Threshold the probability | Gradient boosting classifier |
| \( \hat{y}_i = \text{softmax}(F_M(x))_i \) | Multi-class classification | Probability per class | Gradient boosting multi-class |
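The optimization entries above can be checked numerically. Below is a minimal sketch of the gradient descent update \( \theta_{t+1} = \theta_t - \eta \nabla_\theta L \) applied to the table's running example \( f(x) = (x-3)^2 \); the function names and the learning rate \( \eta = 0.1 \) are illustrative choices, not fixed by the table.

```python
def f(x):
    # objective from the table: minimum value 0, argmin 3
    return (x - 3) ** 2

def grad(x):
    # derivative df/dx = 2(x - 3)
    return 2 * (x - 3)

x = 0.0     # initial guess theta_0
eta = 0.1   # learning rate

# gradient descent: theta_{t+1} = theta_t - eta * gradient
for _ in range(100):
    x -= eta * grad(x)

print(round(x, 4))     # → 3.0 (the argmin)
print(round(f(x), 4))  # → 0.0 (the min)
```

After enough steps, `x` recovers \( \arg\min_x f(x) = 3 \) and `f(x)` recovers \( \min_x f(x) = 0 \), matching the first and third rows of the table.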
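The linear-algebra rows (dot product, norms, trace, determinant) can likewise be verified with plain Python, using the same example vectors and the 2×2 matrix \( [[1,2],[3,4]] \) from the table:

```python
# dot product: <[1,2], [3,4]> = 1*3 + 2*4 = 11
u, v = [1, 2], [3, 4]
dot = sum(a * b for a, b in zip(u, v))

# L2 norm: |[3,4]|_2 = sqrt(9 + 16) = 5
l2 = sum(a * a for a in [3, 4]) ** 0.5

# L1 norm: |[3,-4]|_1 = 3 + 4 = 7
l1 = sum(abs(a) for a in [3, -4])

A = [[1, 2], [3, 4]]
trace = A[0][0] + A[1][1]                    # Tr(A) = 1 + 4 = 5
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # det(A) = 1*4 - 2*3 = -2

print(dot, l2, l1, trace, det)  # → 11 5.0 7 5 -2
```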
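The activation-function rows can be implemented in a few lines. This sketch uses only the standard library; the max-subtraction in `softmax` is a common numerical-stability trick, not something the table prescribes.

```python
import math

def sigmoid(x):
    # maps any real number to (0, 1); sigma(0) = 0.5
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j);
    # subtracting max(z) avoids overflow without changing the result
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def relu(x):
    # ReLU(x) = max(0, x)
    return max(0.0, x)

print(sigmoid(0))          # → 0.5
print(softmax([1, 2, 3]))  # ≈ [0.090, 0.245, 0.665], sums to 1
print(relu(-2), relu(3))   # → 0.0 3
```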
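Finally, the boosting rows fit together as follows: for the squared loss \( L = \frac{1}{2}(y - F(x))^2 \), the pseudo-residuals reduce to \( r_i = y_i - F(x_i) \), and each round adds a shrunken weak learner via \( F_m(x) = F_{m-1}(x) + \nu \gamma_m h_m(x) \). The toy below replaces the regression tree with a constant weak learner (the mean residual, with \( \gamma_m \) absorbed into \( h_m \)), so it is a sketch of the update rule rather than a real gradient boosting implementation:

```python
y = [1.0, 2.0, 4.0, 7.0]   # training targets
nu = 0.5                   # learning rate (shrinkage)
F = [0.0] * len(y)         # F_0: initial model predicts 0

for m in range(50):
    # pseudo-residuals for squared loss: r_i = y_i - F(x_i)
    r = [yi - fi for yi, fi in zip(y, F)]
    # "weak learner": a depth-0 tree, i.e. the mean residual
    h = sum(r) / len(r)
    # boosted update: F_m = F_{m-1} + nu * h_m
    F = [fi + nu * h for fi in F]

print([round(fi, 3) for fi in F])  # each prediction approaches mean(y) = 3.5
```

With a constant learner every prediction converges to the target mean, which is exactly what minimizing squared loss with this function class allows; real gradient boosting swaps in trees fit to the residuals so predictions can differ per input.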