Machine Learning#
Machine Learning is a subset of Artificial Intelligence (AI) that focuses on teaching computers to learn patterns from data and make predictions or decisions without being explicitly programmed with fixed rules.
Model#
A model is the mathematical function that learns patterns from data and predicts outputs for new inputs.
It represents the relationship between inputs (x) and outputs (y).
Training shapes the model into a function (y = f_\theta(x)).
The term “model” may refer to:
the model type (e.g., linear regression),
the model architecture (e.g., number of layers),
or the final trained model used for predictions.
More parameters → higher capacity → can learn more complex patterns.
Parameters#
Parameters are the internal values the model learns during training.
They define the exact behavior of the model.
Examples:
(\theta_0, \theta_1) in linear regression
Weights and biases in neural networks
These values are updated by the optimization algorithm.
Simply: parameters = learned values.
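As a minimal sketch, a linear model's parameters are just two numbers; the values below are illustrative stand-ins, not learned from real data:

```python
# A linear model y = theta0 + theta1 * x, where (theta0, theta1)
# are the parameters that training would normally determine.

def predict(x, theta0, theta1):
    """Return the model's prediction f_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

theta0, theta1 = 1.0, 2.0            # parameters (here: hand-picked for illustration)
print(predict(3.0, theta0, theta1))  # 1.0 + 2.0 * 3.0 = 7.0
```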
Hyperparameters#
Hyperparameters are settings chosen before training that control how the model learns.
They are not learned by the model.
Examples:
Learning rate
Number of layers
Batch size
Dropout rate
Regularization strength (\lambda)
Choosing the right hyperparameters is called hyperparameter tuning.
Simply: hyperparameters = knobs that control training.
Loss Function#
The loss function measures how wrong the model’s predictions are.
Training tries to minimize this value.
It compares predictions vs. ground truth and returns a single number.
Examples:
MSE for regression
Cross-entropy for classification
Sometimes a simpler surrogate loss (e.g., cross-entropy) is used instead of the true task loss (e.g., accuracy).
Simply: loss tells the optimizer what direction to improve.
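A minimal sketch of MSE, assuming plain Python lists for targets and predictions:

```python
# Mean squared error: compares predictions with ground truth
# and collapses the comparison into a single number.

def mse(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0]
good = mse(y_true, [2.9, 5.1])  # close predictions -> small loss
bad = mse(y_true, [0.0, 0.0])   # far predictions  -> large loss
print(good, bad)
```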
Optimization#
Optimization is the process of adjusting parameters to minimize the loss.
Gradient Descent and its variants (SGD, Adam, RMSProp) are the standard methods.
The optimizer updates parameters using the gradient of the loss.
The learning rate controls how big each update step is.
Simply: optimization = algorithmic process that learns the best parameters.
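The loop below sketches gradient descent on a one-parameter model (y = \theta x); the data, learning rate, and iteration count are illustrative:

```python
# Gradient descent minimizing MSE for y = theta * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by y = 2x, so theta should approach 2
theta = 0.0                   # initial parameter
lr = 0.05                     # learning rate (a hyperparameter)

for _ in range(200):
    # gradient of MSE w.r.t. theta: (2/n) * sum((theta*x - y) * x)
    grad = 2 * sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta -= lr * grad        # step against the gradient

print(round(theta, 3))        # converges near 2.0
```

A larger learning rate takes bigger steps but can overshoot and diverge; a smaller one converges more slowly.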
Underfitting / Overfitting#
These describe whether the model is too simple or too complex compared to the data.
Underfitting#
Model is too simple → cannot capture patterns.
High training error and high test error.
Overfitting#
Model is too complex → memorizes noise in training data.
Low training error but high test error.
Simply:
Underfitting = too simple
Overfitting = too complex
Bias–Variance Tradeoff#
This explains how model complexity affects errors.
Bias = error from wrong assumptions (model too simple).
Variance = error from being too sensitive to training data (model too complex).
Generalization error ≈ bias² + variance + irreducible noise.
Reducing bias tends to increase variance
Reducing variance tends to increase bias
The goal is the sweet spot between the two.
Simply: balance simplicity and flexibility for best performance.
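For squared-error loss, the decomposition in this section can be written as:

\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2 \]

where (\sigma^2) is the irreducible noise in the data.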
Generalization#
Generalization is the model’s ability to perform well on unseen data.
Good training performance is not enough; test performance matters.
Generalization error = performance on new data.
ML’s primary goal is good generalization, not just memorization.
Simply: generalization = how well the model works on new data.
Regularization#
Regularization includes techniques that reduce overfitting and improve generalization by restricting model complexity.
A common approach adds a penalty to the loss for large weights:
L2 (weight decay) → penalty on squared weights
L1 → penalty on absolute weights
Other techniques:
Dropout
Early stopping
Data augmentation
Controlled by a hyperparameter like (\lambda).
Simply: regularization makes the model simpler and prevents overfitting.
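A sketch of the L2 idea, assuming an MSE data loss; the helper name `l2_penalized_loss` and all numbers are illustrative:

```python
# L2 regularization: add lambda * sum(w^2) to the data loss,
# penalizing large weights and nudging the model toward simpler solutions.

def mse(y_true, y_pred):
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def l2_penalized_loss(y_true, y_pred, weights, lam):
    """Data loss plus an L2 penalty controlled by the hyperparameter lam."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty

# Perfect predictions, but large weights still incur a penalty:
loss = l2_penalized_loss([1.0, 2.0], [1.0, 2.0], weights=[3.0, -4.0], lam=0.1)
print(loss)  # data loss 0.0 + 0.1 * (9 + 16) = 2.5
```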
Example to Understand ML#
Traditional programming:
Rules (explicitly coded) + Data → Output
Machine Learning:
Data + Output (examples) → Algorithm learns rules → Predict new output
✨ Example: Predicting house prices
Input: Size, Location, Number of rooms
Output: House Price
ML learns the mapping function:
\[ Price = f(Size, Location, Rooms) \]
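A toy version of this mapping, fitting price from size alone by ordinary least squares; the data points are made up for illustration:

```python
# Simple linear regression via the closed-form slope/intercept formulas.
sizes = [50.0, 80.0, 120.0]          # e.g., square meters
prices = [150.0, 240.0, 360.0]       # e.g., thousands of dollars (here price = 3 * size)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
        sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))  # recovers the underlying rule
print(intercept + slope * 100.0)             # predicted price for a 100 m^2 house
```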
Below is a structured explanation of the main ML learning paradigms. These seven categories cover almost all learning types used in practice and interviews.
Supervised Learning#
Learning from a dataset that contains input–output pairs. The model learns a function (y = f(x)) that maps each input to its label or value.
Goal
Predict the correct label/value for new inputs.
Examples
Classification → spam/not spam
Regression → house prices
Models → Linear Regression, Random Forests, CNNs, Transformers
Key Idea
Model learns with supervision (labeled data).
Unsupervised Learning#
Learning from data that has no labels. The model tries to discover structure, patterns, or groups.
Examples
Clustering (K-Means, DBSCAN)
Dimensionality reduction (PCA, t-SNE)
Anomaly detection
Topic modeling (LDA)
Key Idea
Model learns without supervision, finding hidden structure.
Semi-Supervised Learning#
Learning from a mix of:
a small amount of labeled data, and
a large amount of unlabeled data.
Useful when labeling data is expensive.
Examples
Text classification with few labeled examples
Image labeling with only 1% labeled images
Algorithms → pseudo-labeling, consistency training
Key Idea
Use unlabeled data to improve learning accuracy.
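A sketch of pseudo-labeling, using a 1-D nearest-centroid classifier as a stand-in for a real model; all data values are illustrative:

```python
# Pseudo-labeling: train on the small labeled set, label the unlabeled
# pool with the model's own predictions, then retrain on everything.

labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]   # (feature, label)
unlabeled = [1.5, 2.5, 7.5, 8.5]

def centroids(points):
    """Mean feature value per class."""
    out = {}
    for c in {lbl for _, lbl in points}:
        vals = [x for x, lbl in points if lbl == c]
        out[c] = sum(vals) / len(vals)
    return out

def predict(x, cents):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda c: abs(x - cents[c]))

cents = centroids(labeled)                            # step 1: labeled data only
pseudo = [(x, predict(x, cents)) for x in unlabeled]  # step 2: pseudo-labels
cents = centroids(labeled + pseudo)                   # step 3: retrain on both
print(cents)
```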
Self-Supervised Learning#
Definition
A special case of unsupervised learning where the model creates its own labels from the data.
It learns by solving pretext tasks, such as predicting a masked part of the input.
Examples
Masked Language Modeling (BERT)
Next Sentence Prediction
Contrastive learning (SimCLR)
Masked image modeling (MAE)
Key Idea
Model generates internal supervision from the raw data. This is how modern large models (LLMs, vision transformers) are pre-trained.
Reinforcement Learning (RL)#
Learning by interacting with an environment and receiving reward or penalty.
The agent learns a policy (\pi(a \mid s)) that maps states to actions.
Goal
Maximize cumulative reward over time.
Examples
Robotics
Game playing (Chess, Go, Atari)
Recommendation systems
Autonomous vehicles
Key Idea
Learning through trial and error, not through labeled data.
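The tabular Q-learning update rule illustrates this: the agent nudges its value estimate toward the observed reward plus the discounted best future value. The states, actions, and reward below are illustrative:

```python
# Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

Q = defaultdict(float)          # Q-table, unseen entries default to 0.0
alpha, gamma = 0.5, 0.9         # learning rate and discount (hyperparameters)

def update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

actions = ["left", "right"]
update("s0", "right", 1.0, "s1", actions)  # agent got reward 1 moving right
print(Q[("s0", "right")])                  # 0 + 0.5 * (1 + 0.9*0 - 0) = 0.5
```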
Online Learning#
Model learns incrementally from streaming data, one sample (or mini-batch) at a time.
Used when data arrives continuously or is too large to store.
Examples
Fraud detection (transaction by transaction)
Real-time anomaly detection
Clickstream prediction
IoT sensor data
Key Idea
Training happens continuously instead of batch training.
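A sketch of incremental learning: per-sample SGD on a one-parameter model, where each example is used as it "arrives" and then discarded. The stream and learning rate are illustrative:

```python
# Online learning: update the parameter from one streaming sample at a time.
theta = 0.0
lr = 0.1

def learn_one(x, y):
    """Update the parameter from a single streaming example."""
    global theta
    grad = 2 * (theta * x - y) * x   # gradient of the squared error on this sample
    theta -= lr * grad

stream = [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0)] * 20   # simulated stream, y = 2x
for x, y in stream:
    learn_one(x, y)                  # no dataset is ever stored

print(round(theta, 3))               # approaches 2.0 as samples accumulate
```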
Transfer Learning#
Reusing a model trained on one task/data (source) and adapting it to a new, related task (target).
Two steps
Pre-train on large dataset
Fine-tune on small dataset for new task
Examples
Using ImageNet-pretrained ResNet for medical images
Using BERT/GPT embeddings for text classification
Using foundation models (LLMs, CLIP) for custom tasks
Key Idea
Reuse learned knowledge to reduce data and training cost.
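A sketch of the freeze-and-fine-tune pattern: the "pre-trained" feature weights below are illustrative stand-ins for a real network, and only a new output head is trained on the target task:

```python
# Transfer learning sketch: frozen feature extractor + trainable head.
pretrained_w = [0.5, -0.25]          # pretend these came from a large source dataset

def features(x):
    """Frozen feature extractor reused from the source task."""
    return [w * x for w in pretrained_w]

h = [0.0, 0.0]                       # new head for the target task
lr = 0.1
data = [(1.0, 1.0), (2.0, 2.0)] * 50 # small target dataset, y = x

for x, y in data:
    f = features(x)
    pred = sum(hi * fi for hi, fi in zip(h, f))
    err = pred - y
    # gradient step on the head only; pretrained_w stays fixed
    h = [hi - lr * 2 * err * fi for hi, fi in zip(h, f)]

pred = sum(hi * fi for hi, fi in zip(h, features(3.0)))
print(round(pred, 3))                # close to 3.0 after fine-tuning the head
```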
Key Components of ML#
Dataset → Collection of examples (features + labels).
Model → Mathematical representation that makes predictions.
Training → Process of learning patterns (adjusting model parameters).
Evaluation → Measuring performance (accuracy, error, etc.).
Prediction → Using the trained model on unseen data.
Why is ML important?#
Handles large, complex data humans cannot analyze manually.
Automates tasks (spam filtering, recommendation systems, fraud detection).
Improves over time as it sees more data.
Summary#

| Type | Uses | Key Idea |
|---|---|---|
| Supervised | Classification, regression | Learn from labeled data |
| Unsupervised | Clustering, anomalies | Discover structure without labels |
| Semi-Supervised | Low-label tasks | Combine labeled + unlabeled data |
| Self-Supervised | Pre-training LLMs, vision models | Model generates its own labels |
| Reinforcement Learning | Games, robotics | Learn via reward/penalty |
| Online Learning | Streaming data | Incremental learning |
| Transfer Learning | Fine-tuning pre-trained models | Reuse knowledge for new tasks |
List of Machine Learning Algorithms#

| Category | Sub-type | Algorithms |
|---|---|---|
| Supervised Learning | Regression | Linear Regression, Polynomial Regression, Ridge, Lasso, Elastic Net, SVR, Decision Tree Regression, Random Forest Regression, Gradient Boosting (XGBoost, LightGBM, CatBoost), kNN Regression, Bayesian Regression, Neural Networks |
| | Classification | Logistic Regression, kNN, SVM, Decision Trees (CART, ID3, C4.5), Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), Naive Bayes (Gaussian, Multinomial, Bernoulli), Perceptron, Multi-layer Perceptrons, Ensemble Methods (Bagging, Stacking, Voting), Probabilistic Graphical Models (Bayesian Networks, CRFs) |
| Unsupervised Learning | Clustering | k-Means, Hierarchical Clustering, DBSCAN, OPTICS, Gaussian Mixture Models, Mean-Shift, Spectral Clustering, BIRCH, Affinity Propagation |
| | Dimensionality Reduction | PCA, Kernel PCA, ICA, SVD, Factor Analysis, t-SNE, UMAP, Autoencoders |
| | Association Rules | Apriori, Eclat, FP-Growth |
| | Density Estimation | KDE, Expectation-Maximization (EM), Hidden Markov Models (unsupervised setting) |
| Semi-Supervised Learning | — | Self-training, Co-training, Label Propagation/Spreading, Semi-supervised SVM, Graph-based methods, Semi-supervised Deep Learning (Consistency Regularization, Pseudo-labeling) |
| Reinforcement Learning | Value-based | Q-Learning, SARSA, Deep Q-Networks (DQN) |
| | Policy-based | Policy Gradient (REINFORCE), Actor–Critic (A2C, A3C), Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO) |
| | Model-based / Advanced | DDPG, TD3, SAC, Monte Carlo Tree Search, Multi-agent RL |
| Other Methods | Ensemble Methods | Bagging, Boosting (AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost), Stacking, Blending, Voting Classifier |
| | Probabilistic / Bayesian | Naive Bayes, Bayesian Networks, Gaussian Processes, HMMs, Markov Random Fields |
| | Deep Learning | Feedforward NN, CNN, RNN, LSTM, GRU, Transformers (BERT, GPT), Variational Autoencoders (VAE), Generative Adversarial Networks (GANs) |