Machine Learning#
Machine Learning is a subset of Artificial Intelligence (AI) that focuses on teaching computers to learn patterns from data and make predictions or decisions without being explicitly programmed with fixed rules.
Model#
A model is the mathematical function that learns patterns from data and predicts outputs for new inputs.
It represents the relationship between inputs (x) and outputs (y).
Training shapes the model into a function (y = f_\theta(x)).
The term “model” may refer to:
the model type (e.g., linear regression),
the model architecture (e.g., number of layers),
or the final trained model used for predictions.
More parameters → higher capacity → can learn more complex patterns.
Parameters#
Parameters are the internal values the model learns during training.
They define the exact behavior of the model.
Examples:
(\theta_0, \theta_1) in linear regression
Weights and biases in neural networks
These values are updated by the optimization algorithm.
Simply: parameters = learned values.
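As a minimal sketch, a linear model's parameters are just two numbers; the values below are illustrative stand-ins, not learned from real data:

```python
# A linear model y = theta0 + theta1 * x, where (theta0, theta1)
# are the parameters that training would normally determine.

def predict(x, theta0, theta1):
    """Return the model's prediction f_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

theta0, theta1 = 1.0, 2.0            # parameters (here: hand-picked for illustration)
print(predict(3.0, theta0, theta1))  # 1.0 + 2.0 * 3.0 = 7.0
```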
Hyperparameters#
Hyperparameters are settings chosen before training that control how the model learns.
They are not learned by the model.
Examples:
Learning rate
Number of layers
Batch size
Dropout rate
Regularization strength (\lambda)
Choosing the right hyperparameters is called hyperparameter tuning.
Simply: hyperparameters = knobs that control training.
Loss Function#
The loss function measures how wrong the model’s predictions are.
Training tries to minimize this value.
It compares predictions vs. ground truth and returns a single number.
Examples:
MSE for regression
Cross-entropy for classification
Sometimes a simpler surrogate loss (e.g., cross-entropy) is used instead of the true task loss (e.g., accuracy).
Simply: loss tells the optimizer what direction to improve.
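A minimal sketch of MSE, assuming plain Python lists for targets and predictions:

```python
# Mean squared error: compares predictions with ground truth
# and collapses the comparison into a single number.

def mse(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0]
good = mse(y_true, [2.9, 5.1])  # close predictions -> small loss
bad = mse(y_true, [0.0, 0.0])   # far predictions  -> large loss
print(good, bad)
```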
Optimization#
Optimization is the process of adjusting parameters to minimize the loss.
Gradient Descent and its variants (SGD, Adam, RMSProp) are the standard methods.
The optimizer updates parameters using the gradient of the loss.
The learning rate controls how big each update step is.
Simply: optimization = algorithmic process that learns the best parameters.
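The loop below sketches gradient descent on a one-parameter model (y = \theta x); the data, learning rate, and iteration count are illustrative:

```python
# Gradient descent minimizing MSE for y = theta * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by y = 2x, so theta should approach 2
theta = 0.0                   # initial parameter
lr = 0.05                     # learning rate (a hyperparameter)

for _ in range(200):
    # gradient of MSE w.r.t. theta: (2/n) * sum((theta*x - y) * x)
    grad = 2 * sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta -= lr * grad        # step against the gradient

print(round(theta, 3))        # converges near 2.0
```

A larger learning rate takes bigger steps but can overshoot and diverge; a smaller one converges more slowly.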
Underfitting / Overfitting#
These describe whether the model is too simple or too complex compared to the data.
Underfitting#
Model is too simple → cannot capture patterns.
High training error and high test error.
Overfitting#
Model is too complex → memorizes noise in training data.
Low training error but high test error.
Simply:
Underfitting = too simple
Overfitting = too complex
Bias–Variance Tradeoff#
This explains how model complexity affects errors.
Bias = error from wrong assumptions (model too simple).
Variance = error from being too sensitive to training data (model too complex).
Generalization error ≈ bias² + variance + irreducible noise.
Reducing bias tends to increase variance
Reducing variance tends to increase bias
The goal is the sweet spot between the two.
Simply: balance simplicity and flexibility for best performance.
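For squared-error loss, the decomposition in this section can be written as:

\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2 \]

where (\sigma^2) is the irreducible noise in the data.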
Generalization#
Generalization is the model’s ability to perform well on unseen data.
Good training performance is not enough; test performance matters.
Generalization error = performance on new data.
ML’s primary goal is good generalization, not just memorization.
Simply: generalization = how well the model works on new data.
Regularization#
Regularization includes techniques that reduce overfitting and improve generalization by restricting model complexity.
A common approach adds a penalty to the loss for large weights:
L2 (weight decay) → penalty on squared weights
L1 → penalty on absolute weights
Other techniques:
Dropout
Early stopping
Data augmentation
Controlled by a hyperparameter like (\lambda).
Simply: regularization makes the model simpler and prevents overfitting.
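A sketch of the L2 idea, assuming an MSE data loss; the helper name `l2_penalized_loss` and all numbers are illustrative:

```python
# L2 regularization: add lambda * sum(w^2) to the data loss,
# penalizing large weights and nudging the model toward simpler solutions.

def mse(y_true, y_pred):
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def l2_penalized_loss(y_true, y_pred, weights, lam):
    """Data loss plus an L2 penalty controlled by the hyperparameter lam."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty

# Perfect predictions, but large weights still incur a penalty:
loss = l2_penalized_loss([1.0, 2.0], [1.0, 2.0], weights=[3.0, -4.0], lam=0.1)
print(loss)  # data loss 0.0 + 0.1 * (9 + 16) = 2.5
```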
Example to Understand ML#
Traditional programming:
Rules (explicitly coded) + Data → Output
Machine Learning:
Data + Output (examples) → Algorithm learns rules → Predict new output
✨ Example: Predicting house prices
Input: Size, Location, Number of rooms
Output: House Price
ML learns the mapping function:
\[ Price = f(Size, Location, Rooms) \]
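A toy version of this mapping, fitting price from size alone by ordinary least squares; the data points are made up for illustration:

```python
# Simple linear regression via the closed-form slope/intercept formulas.
sizes = [50.0, 80.0, 120.0]          # e.g., square meters
prices = [150.0, 240.0, 360.0]       # e.g., thousands of dollars (here price = 3 * size)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
        sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))  # recovers the underlying rule
print(intercept + slope * 100.0)             # predicted price for a 100 m^2 house
```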
Below is a structured explanation of the main ML learning paradigms. These seven categories cover almost all learning types used in practice and interviews.
Supervised Learning#
Learning from a dataset that contains input–output pairs. The model learns a function (y = f(x)) that maps each input to its label or value.
Goal
Predict the correct label/value for new inputs.
Examples
Classification → spam/not spam
Regression → house prices
Models → Linear Regression, Random Forests, CNNs, Transformers
Key Idea
Model learns with supervision (labeled data).
Unsupervised Learning#
Learning from data that has no labels. The model tries to discover structure, patterns, or groups.
Examples
Clustering (K-Means, DBSCAN)
Dimensionality reduction (PCA, t-SNE)
Anomaly detection
Topic modeling (LDA)
Key Idea
Model learns without supervision, finding hidden structure.
Semi-Supervised Learning#
Learning from a mix of:
a small amount of labeled data, and
a large amount of unlabeled data.
Useful when labeling data is expensive.
Examples
Text classification with few labeled examples
Image labeling with only 1% labeled images
Algorithms → pseudo-labeling, consistency training
Key Idea
Use unlabeled data to improve learning accuracy.
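A sketch of pseudo-labeling, using a 1-D nearest-centroid classifier as a stand-in for a real model; all data values are illustrative:

```python
# Pseudo-labeling: train on the small labeled set, label the unlabeled
# pool with the model's own predictions, then retrain on everything.

labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]   # (feature, label)
unlabeled = [1.5, 2.5, 7.5, 8.5]

def centroids(points):
    """Mean feature value per class."""
    out = {}
    for c in {lbl for _, lbl in points}:
        vals = [x for x, lbl in points if lbl == c]
        out[c] = sum(vals) / len(vals)
    return out

def predict(x, cents):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda c: abs(x - cents[c]))

cents = centroids(labeled)                            # step 1: labeled data only
pseudo = [(x, predict(x, cents)) for x in unlabeled]  # step 2: pseudo-labels
cents = centroids(labeled + pseudo)                   # step 3: retrain on both
print(cents)
```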
Self-Supervised Learning#
Definition
A special case of unsupervised learning where the model creates its own labels from the data.
It learns by solving pretext tasks, such as predicting a masked part of the input.
Examples
Masked Language Modeling (BERT)
Next Sentence Prediction
Contrastive learning (SimCLR)
Masked image modeling (MAE)
Key Idea
Model generates internal supervision from the raw data. This is how modern large models (LLMs, vision transformers) are pre-trained.
Reinforcement Learning (RL)#
Learning by interacting with an environment and receiving reward or penalty.
The agent learns a policy (\pi(a \mid s)) that maps states to actions.
Goal
Maximize cumulative reward over time.
Examples
Robotics
Game playing (Chess, Go, Atari)
Recommendation systems
Autonomous vehicles
Key Idea
Learning through trial and error, not through labeled data.
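The tabular Q-learning update rule illustrates this: the agent nudges its value estimate toward the observed reward plus the discounted best future value. The states, actions, and reward below are illustrative:

```python
# Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

Q = defaultdict(float)          # Q-table, unseen entries default to 0.0
alpha, gamma = 0.5, 0.9         # learning rate and discount (hyperparameters)

def update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

actions = ["left", "right"]
update("s0", "right", 1.0, "s1", actions)  # agent got reward 1 moving right
print(Q[("s0", "right")])                  # 0 + 0.5 * (1 + 0.9*0 - 0) = 0.5
```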
Online Learning#
Model learns incrementally from streaming data, one sample (or mini-batch) at a time.
Used when data arrives continuously or is too large to store.
Examples
Fraud detection (transaction by transaction)
Real-time anomaly detection
Clickstream prediction
IoT sensor data
Key Idea
Training happens continuously instead of batch training.
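A sketch of incremental learning: per-sample SGD on a one-parameter model, where each example is used as it "arrives" and then discarded. The stream and learning rate are illustrative:

```python
# Online learning: update the parameter from one streaming sample at a time.
theta = 0.0
lr = 0.1

def learn_one(x, y):
    """Update the parameter from a single streaming example."""
    global theta
    grad = 2 * (theta * x - y) * x   # gradient of the squared error on this sample
    theta -= lr * grad

stream = [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0)] * 20   # simulated stream, y = 2x
for x, y in stream:
    learn_one(x, y)                  # no dataset is ever stored

print(round(theta, 3))               # approaches 2.0 as samples accumulate
```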
Transfer Learning#
Reusing a model trained on one task/data (source) and adapting it to a new, related task (target).
Two steps
Pre-train on large dataset
Fine-tune on small dataset for new task
Examples
Using ImageNet-pretrained ResNet for medical images
Using BERT/GPT embeddings for text classification
Using foundation models (LLMs, CLIP) for custom tasks
Key Idea
Reuse learned knowledge to reduce data and training cost.
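A sketch of the freeze-and-fine-tune pattern: the "pre-trained" feature weights below are illustrative stand-ins for a real network, and only a new output head is trained on the target task:

```python
# Transfer learning sketch: frozen feature extractor + trainable head.
pretrained_w = [0.5, -0.25]          # pretend these came from a large source dataset

def features(x):
    """Frozen feature extractor reused from the source task."""
    return [w * x for w in pretrained_w]

h = [0.0, 0.0]                       # new head for the target task
lr = 0.1
data = [(1.0, 1.0), (2.0, 2.0)] * 50 # small target dataset, y = x

for x, y in data:
    f = features(x)
    pred = sum(hi * fi for hi, fi in zip(h, f))
    err = pred - y
    # gradient step on the head only; pretrained_w stays fixed
    h = [hi - lr * 2 * err * fi for hi, fi in zip(h, f)]

pred = sum(hi * fi for hi, fi in zip(h, features(3.0)))
print(round(pred, 3))                # close to 3.0 after fine-tuning the head
```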
Key Components of ML#
Dataset → Collection of examples (features + labels).
Model → Mathematical representation that makes predictions.
Training → Process of learning patterns (adjusting model parameters).
Evaluation → Measuring performance (accuracy, error, etc.).
Prediction → Using the trained model on unseen data.
Why is ML important?#
Handles large, complex data humans cannot analyze manually.
Automates tasks (spam filtering, recommendation systems, fraud detection).
Improves over time as it sees more data.
Summary#

| Type | Uses | Key Idea |
|---|---|---|
| Supervised | Classification, regression | Learn from labeled data |
| Unsupervised | Clustering, anomalies | Discover structure without labels |
| Semi-Supervised | Low-label tasks | Combine labeled + unlabeled data |
| Self-Supervised | Pre-training LLMs, vision models | Model generates its own labels |
| Reinforcement Learning | Games, robotics | Learn via reward/penalty |
| Online Learning | Streaming data | Incremental learning |
| Transfer Learning | Fine-tuning pre-trained models | Reuse knowledge for new tasks |
List of Machine Learning Algorithms#

| Category | Sub-type | Algorithms |
|---|---|---|
| Supervised Learning | Regression | Linear Regression, Polynomial Regression, Ridge, Lasso, Elastic Net, SVR, Decision Tree Regression, Random Forest Regression, Gradient Boosting (XGBoost, LightGBM, CatBoost), kNN Regression, Bayesian Regression, Neural Networks |
| | Classification | Logistic Regression, kNN, SVM, Decision Trees (CART, ID3, C4.5), Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), Naive Bayes (Gaussian, Multinomial, Bernoulli), Perceptron, Multi-layer Perceptrons, Ensemble Methods (Bagging, Stacking, Voting), Probabilistic Graphical Models (Bayesian Networks, CRFs) |
| Unsupervised Learning | Clustering | k-Means, Hierarchical Clustering, DBSCAN, OPTICS, Gaussian Mixture Models, Mean-Shift, Spectral Clustering, BIRCH, Affinity Propagation |
| | Dimensionality Reduction | PCA, Kernel PCA, ICA, SVD, Factor Analysis, t-SNE, UMAP, Autoencoders |
| | Association Rules | Apriori, Eclat, FP-Growth |
| | Density Estimation | KDE, Expectation-Maximization (EM), Hidden Markov Models (unsupervised setting) |
| Semi-Supervised Learning | — | Self-training, Co-training, Label Propagation/Spreading, Semi-supervised SVM, Graph-based methods, Semi-supervised Deep Learning (Consistency Regularization, Pseudo-labeling) |
| Reinforcement Learning | Value-based | Q-Learning, SARSA, Deep Q-Networks (DQN) |
| | Policy-based | Policy Gradient (REINFORCE), Actor–Critic (A2C, A3C), Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO) |
| | Model-based / Advanced | DDPG, TD3, SAC, Monte Carlo Tree Search, Multi-agent RL |
| Other Methods | Ensemble Methods | Bagging, Boosting (AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost), Stacking, Blending, Voting Classifier |
| | Probabilistic / Bayesian | Naive Bayes, Bayesian Networks, Gaussian Processes, HMMs, Markov Random Fields |
| | Deep Learning | Feedforward NN, CNN, RNN, LSTM, GRU, Transformers (BERT, GPT), Variational Autoencoders (VAE), Generative Adversarial Networks (GANs) |