Machine Learning#

Machine Learning is a subset of Artificial Intelligence (AI) that focuses on teaching computers to learn patterns from data and make predictions or decisions without being explicitly programmed with fixed rules.

Model#

A model is the mathematical function that learns patterns from data and predicts outputs for new inputs.

  • It represents the relationship between inputs (x) and outputs (y).

  • Training shapes the model into a function (y = f_\theta(x)).

  • The term “model” may refer to:

    • the model type (e.g., linear regression),

    • the model architecture (e.g., number of layers),

    • or the final trained model used for predictions.

  • More parameters → higher capacity → can learn more complex patterns.


Parameters#

Parameters are the internal values the model learns during training.

  • They define the exact behavior of the model.

  • Examples:

    • (\theta_0, \theta_1) in linear regression

    • Weights and biases in neural networks

  • These values are updated by the optimization algorithm.

Simply: parameters = learned values.
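
To make this concrete, here is a minimal plain-Python sketch of a linear model; the parameter values are illustrative only:

```python
# A linear model y = theta0 + theta1 * x is fully defined by its parameters.
def predict(x, theta0, theta1):
    """Prediction of the model with the given parameter values."""
    return theta0 + theta1 * x

# Different parameter values = different models, same model type.
print(predict(3.0, theta0=1.0, theta1=2.0))   # 1.0 + 2.0 * 3.0 = 7.0
print(predict(3.0, theta0=0.0, theta1=5.0))   # 0.0 + 5.0 * 3.0 = 15.0
```

Training is simply the search for the parameter values that make such predictions match the data.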


Hyperparameters#

Hyperparameters are settings chosen before training that control how the model learns.

  • They are not learned by the model.

  • Examples:

    • Learning rate

    • Number of layers

    • Batch size

    • Dropout rate

    • Regularization strength (\lambda)

  • Choosing the right hyperparameters is called hyperparameter tuning.

Simply: hyperparameters = knobs that control training.
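
As a sketch of tuning, the snippet below (plain Python, toy objective f(x) = (x - 3)^2, all values illustrative) tries a few learning rates and keeps the one that lands closest to the minimum:

```python
# Toy objective: f(x) = (x - 3)^2, minimized at x = 3.
# The learning rate is a hyperparameter (set before training);
# x plays the role of the parameter that training updates.
def train(learning_rate, steps=50):
    x = 0.0                          # parameter, learned
    for _ in range(steps):
        grad = 2 * (x - 3)           # gradient of (x - 3)^2
        x -= learning_rate * grad
    return x

# Crude hyperparameter tuning: try a few learning rates, keep the best.
candidates = [0.001, 0.1, 0.95]
results = {lr: train(lr) for lr in candidates}
best_lr = min(candidates, key=lambda lr: (results[lr] - 3) ** 2)
print(best_lr)   # 0.1: too small barely moves, too large oscillates
```

Grid search and random search automate exactly this loop over many hyperparameters at once.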


Loss Function#

The loss function measures how wrong the model’s predictions are.

  • Training tries to minimize this value.

  • It compares predictions vs. ground truth and returns a single number.

  • Examples:

    • MSE for regression

    • Cross-entropy for classification

  • Sometimes a differentiable surrogate loss (e.g., cross-entropy) is optimized instead of the true task metric (e.g., accuracy), which is not differentiable.

Simply: loss tells the optimizer what direction to improve.
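
Both example losses can be written in a few lines of plain Python (toy inputs, for illustration):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared difference (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy: punishes confident wrong predictions (classification)."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, p_pred)) / len(y_true)

print(mse([3.0, 5.0], [2.5, 5.5]))         # (0.25 + 0.25) / 2 = 0.25
print(cross_entropy([1, 0], [0.9, 0.1]))   # small: predictions agree with labels
```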


Optimization#

Optimization is the process of adjusting parameters to minimize the loss.

  • Gradient Descent and its variants (SGD, Adam, RMSProp) are the standard methods.

  • The optimizer updates parameters using the gradient of the loss.

  • The learning rate controls how big each update step is.

Simply: optimization = algorithmic process that learns the best parameters.
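
Gradient descent for the linear model y = theta0 + theta1 * x can be sketched on tiny synthetic data generated from y = 1 + 2x, so the recovered parameters are easy to check:

```python
# Synthetic data following y = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = 0.0, 0.0        # parameters, initialized arbitrarily
learning_rate = 0.05             # hyperparameter: step size

for _ in range(2000):
    # Gradients of the MSE loss with respect to each parameter.
    errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    grad0 = 2 * sum(errors) / len(xs)
    grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    # Step against the gradient, scaled by the learning rate.
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(round(theta0, 3), round(theta1, 3))   # approaches 1.0 2.0
```

SGD, Adam, and RMSProp refine this same loop (per-sample gradients, adaptive step sizes), but the core idea is unchanged.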


Underfitting / Overfitting#

These describe whether the model is too simple or too complex compared to the data.

Underfitting#

  • Model is too simple → cannot capture patterns.

  • High training error and high test error.

Overfitting#

  • Model is too complex → memorizes noise in training data.

  • Low training error but high test error.

Simply:

  • Underfitting = too simple

  • Overfitting = too complex
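
A small numpy sketch makes both failure modes visible (synthetic noisy sine data, assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)    # noisy training set
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                   # noise-free ground truth

def errors(degree):
    """Train/test MSE of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    train = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

for d in (1, 3, 15):   # too simple, about right, too flexible
    tr, te = errors(d)
    print(f"degree {d}: train={tr:.3f}  test={te:.3f}")
```

Degree 1 shows high error on both sets (underfitting), while the degree-15 fit drives training error down by chasing the noise, so its training error falls well below its test error (overfitting).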


Bias–Variance Tradeoff#

This explains how model complexity affects errors.

  • Bias = error from wrong assumptions (model too simple).

  • Variance = error from being too sensitive to training data (model too complex).

Generalization error ≈ bias² + variance + irreducible noise.

  • Reducing bias → increases variance

  • Reducing variance → increases bias

  • The goal is the sweet spot between the two.

Simply: balance simplicity and flexibility for best performance.


Generalization#

Generalization is the model’s ability to perform well on unseen data.

  • Good training performance is not enough; test performance matters.

  • Generalization error = the model's error on new, unseen data.

  • ML’s primary goal is good generalization, not just memorization.

Simply: generalization = how well the model works on new data.


Regularization#

Regularization includes techniques that reduce overfitting and improve generalization by restricting model complexity.

  • Works by adding a penalty to the loss for large weights:

    • L2 (weight decay) → penalty on squared weights

    • L1 → penalty on absolute weights

  • Other techniques:

    • Dropout

    • Early stopping

    • Data augmentation

  • Controlled by a hyperparameter like (\lambda).

Simply: regularization makes the model simpler and prevents overfitting.
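
The L2 penalty can be sketched in plain Python (toy values; real training would apply this inside the optimization loop):

```python
def l2_regularized_loss(y_true, y_pred, weights, lam):
    """MSE plus an L2 penalty (weight decay) on the model weights."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

y_true = [1.0, 2.0]
y_pred = [1.0, 2.0]                 # both models fit the data perfectly
small_w, large_w = [0.1, 0.1], [5.0, 5.0]

# At equal fit, the penalty makes the large-weight model look worse.
print(l2_regularized_loss(y_true, y_pred, small_w, lam=0.01))   # ≈ 0.0002
print(l2_regularized_loss(y_true, y_pred, large_w, lam=0.01))   # 0.5
```

Because large weights now cost extra loss, the optimizer is pushed toward simpler functions, which is exactly the anti-overfitting effect described above.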



Example to Understand ML#

  • Traditional programming:

    • Rules (explicitly coded) + Data → Output

  • Machine Learning:

    • Data + Output (examples) → Algorithm learns rules → Predict new output

✨ Example: Predicting house prices

  • Input: Size, Location, Number of rooms

  • Output: House Price

  • ML learns the mapping function:

    \[ Price = f(Size, Location, Rooms) \]
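
This mapping can be sketched as a least-squares fit on made-up numbers (Location is omitted here because it would need a numeric encoding; all values are illustrative):

```python
import numpy as np

# Toy data: [size_m2, rooms] -> price (illustrative numbers only).
X = np.array([[50, 1], [80, 2], [120, 3], [160, 4]], dtype=float)
y = np.array([150_000, 240_000, 360_000, 480_000], dtype=float)

# Learn Price = f(Size, Rooms) as a linear least-squares fit.
X1 = np.hstack([X, np.ones((len(X), 1))])        # append an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

new_house = np.array([100.0, 2.0, 1.0])          # 100 m2, 2 rooms, intercept term
predicted_price = float(new_house @ coef)
print(round(predicted_price))                    # close to 300000 for this toy data
```

No pricing rule was ever written by hand; the coefficients (the "rules") were learned from the example data.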

The following sections explain each major ML learning paradigm. These seven categories cover almost all learning types used in practice and in interviews.


Supervised Learning#

Learning from a dataset that contains input–output pairs. The model learns a function:

\[ f: X \rightarrow Y \]

Goal

Predict the correct label/value for new inputs.

Examples

  • Classification → spam/not spam

  • Regression → house prices

  • Models → Linear Regression, Random Forests, CNNs, Transformers

Key Idea

Model learns with supervision (labeled data).


Unsupervised Learning#

Learning from data that has no labels. The model tries to discover structure, patterns, or groups.

Examples

  • Clustering (K-Means, DBSCAN)

  • Dimensionality reduction (PCA, t-SNE)

  • Anomaly detection

  • Topic modeling (LDA)

Key Idea

Model learns without supervision, finding hidden structure.
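
As one example, here is a minimal k-means sketch on 1-D toy data with k = 2 (real implementations pick initial centroids automatically and handle empty clusters):

```python
# Minimal k-means on 1-D toy data with k = 2 (illustrative sketch).
data = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]
c1, c2 = 0.0, 10.0   # initial centroids, chosen by hand here

for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    g1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
    g2 = [x for x in data if abs(x - c1) > abs(x - c2)]
    # Update step: each centroid moves to the mean of its group.
    c1 = sum(g1) / len(g1)
    c2 = sum(g2) / len(g2)

print(round(c1, 2), round(c2, 2))   # two cluster centers, near 1.0 and 8.0
```

Note that no labels were used: the two groups emerge purely from the structure of the data.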


Semi-Supervised Learning#

Learning from a mix of:

  • a small amount of labeled data, and

  • a large amount of unlabeled data.

Useful when labeling data is expensive.

Examples

  • Text classification with few labeled examples

  • Image labeling with only 1% labeled images

  • Algorithms → pseudo-labeling, consistency training

Key Idea

Use unlabeled data to improve learning accuracy.


Self-Supervised Learning#

Definition

A special case of unsupervised learning where the model creates its own labels from the data.

It learns by solving pretext tasks, such as predicting a masked part of the input.

Examples

  • Masked Language Modeling (BERT)

  • Next Sentence Prediction

  • Contrastive learning (SimCLR)

  • Masked image modeling (MAE)

Key Idea

Model generates internal supervision from the raw data. This is how modern large models (LLMs, vision transformers) are pre-trained.


Reinforcement Learning (RL)#

Learning by interacting with an environment and receiving reward or penalty.

The agent learns a policy:

\[ \pi(s) \rightarrow a \]

Goal

Maximize cumulative reward over time.

Examples

  • Robotics

  • Game playing (Chess, Go, Atari)

  • Recommendation systems

  • Autonomous vehicles

Key Idea

Learning through trial and error, not through labeled data.
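
The trial-and-error loop can be sketched with tabular Q-learning on a tiny made-up corridor environment (the states, rewards, and hyperparameter values here are all illustrative):

```python
import random

# Tabular Q-learning on a toy 1-D corridor.
# States 0..4; reaching state 4 gives reward 1 and ends the episode.
# Actions: 0 = move left, 1 = move right.
random.seed(0)
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # step size, discount, exploration rate

for _ in range(500):                     # episodes
    s = 0
    while s != 4:
        # Epsilon-greedy: explore with small probability, else act greedily.
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        reward = 1.0 if s_next == 4 else 0.0
        # Q-update: nudge Q[s][a] toward reward + discounted best future value.
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# After training, "right" should score higher than "left" in every state.
print([1 if Q[s][1] > Q[s][0] else 0 for s in range(4)])
```

The agent never sees a labeled "correct action"; the policy emerges from rewards alone.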


Online Learning#

Model learns incrementally from streaming data, one sample (or mini-batch) at a time.

\[ \theta_{t+1} = \theta_t - \eta \nabla L(x_t) \]

Used when data arrives continuously or is too large to store.

Examples

  • Fraud detection (transaction by transaction)

  • Real-time anomaly detection

  • Clickstream prediction

  • IoT sensor data

Key Idea

Training happens continuously instead of batch training.
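
The per-sample update rule above can be sketched in plain Python (the stream here is simulated; a real online learner would consume live data):

```python
# Online (per-sample) SGD for a running linear fit y ≈ theta * x.
# Each incoming sample triggers one immediate update,
# theta <- theta - eta * grad L(x_t); no dataset is stored.
theta, eta = 0.0, 0.005

def stream():
    """Simulated data stream following y = 4x (stand-in for live data)."""
    for t in range(1000):
        x = (t % 10) + 1.0
        yield x, 4.0 * x

for x_t, y_t in stream():
    grad = 2 * (theta * x_t - y_t) * x_t   # gradient of squared error on this sample
    theta -= eta * grad

print(round(theta, 3))   # ≈ 4.0
```

Contrast this with batch training, which would wait for the whole dataset before computing any update.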


Transfer Learning#

Reusing a model trained on one task/data (source) and adapting it to a new, related task (target).

Two steps

  1. Pre-train on large dataset

  2. Fine-tune on small dataset for new task

Examples

  • Using ImageNet-pretrained ResNet for medical images

  • Using BERT/GPT embeddings for text classification

  • Using foundation models (LLMs, CLIP) for custom tasks

Key Idea

Reuse learned knowledge to reduce data and training cost.


Key Components of ML#

  1. Dataset → Collection of examples (features + labels).

  2. Model → Mathematical representation that makes predictions.

  3. Training → Process of learning patterns (adjusting model parameters).

  4. Evaluation → Measuring performance (accuracy, error, etc.).

  5. Prediction → Using the trained model on unseen data.


Why is ML important?#

  • Handles large, complex data humans cannot analyze manually.

  • Automates tasks (spam filtering, recommendation systems, fraud detection).

  • Improves over time as it sees more data.

Summary#

| Type | Uses | Key Idea |
| --- | --- | --- |
| Supervised | Classification, regression | Learn from labeled data |
| Unsupervised | Clustering, anomalies | Discover structure without labels |
| Semi-Supervised | Low-label tasks | Combine labeled + unlabeled data |
| Self-Supervised | Pre-training LLMs, vision models | Model generates its own labels |
| Reinforcement Learning | Games, robotics | Learn via reward/penalty |
| Online Learning | Streaming data | Incremental learning |
| Transfer Learning | Fine-tuning pre-trained models | Reuse knowledge for new tasks |

List of Machine Learning Algorithms#

| Category | Sub-type | Algorithms |
| --- | --- | --- |
| Supervised Learning | Regression | Linear Regression, Polynomial Regression, Ridge, Lasso, Elastic Net, SVR, Decision Tree Regression, Random Forest Regression, Gradient Boosting (XGBoost, LightGBM, CatBoost), kNN Regression, Bayesian Regression, Neural Networks |
| Supervised Learning | Classification | Logistic Regression, kNN, SVM, Decision Trees (CART, ID3, C4.5), Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), Naive Bayes (Gaussian, Multinomial, Bernoulli), Perceptron, Multi-layer Perceptrons, Ensemble Methods (Bagging, Stacking, Voting), Probabilistic Graphical Models (Bayesian Networks, CRFs) |
| Unsupervised Learning | Clustering | k-Means, Hierarchical Clustering, DBSCAN, OPTICS, Gaussian Mixture Models, Mean-Shift, Spectral Clustering, BIRCH, Affinity Propagation |
| Unsupervised Learning | Dimensionality Reduction | PCA, Kernel PCA, ICA, SVD, Factor Analysis, t-SNE, UMAP, Autoencoders |
| Unsupervised Learning | Association Rules | Apriori, Eclat, FP-Growth |
| Unsupervised Learning | Density Estimation | KDE, Expectation-Maximization (EM), Hidden Markov Models (unsupervised setting) |
| Semi-Supervised Learning | | Self-training, Co-training, Label Propagation/Spreading, Semi-supervised SVM, Graph-based methods, Semi-supervised Deep Learning (Consistency Regularization, Pseudo-labeling) |
| Reinforcement Learning | Value-based | Q-Learning, SARSA, Deep Q-Networks (DQN) |
| Reinforcement Learning | Policy-based | Policy Gradient (REINFORCE), Actor–Critic (A2C, A3C), Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO) |
| Reinforcement Learning | Model-based / Advanced | DDPG, TD3, SAC, Monte Carlo Tree Search, Multi-agent RL |
| Other Methods | Ensemble Methods | Bagging, Boosting (AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost), Stacking, Blending, Voting Classifier |
| Other Methods | Probabilistic / Bayesian | Naive Bayes, Bayesian Networks, Gaussian Processes, HMMs, Markov Random Fields |
| Other Methods | Deep Learning | Feedforward NN, CNN, RNN, LSTM, GRU, Transformers (BERT, GPT), Variational Autoencoders (VAE), Generative Adversarial Networks (GANs) |