
Feature Scaling in Machine Learning

Feature scaling is the process of normalizing or standardizing the range of independent variables (features) in a dataset. It ensures that no single feature dominates others due to differences in magnitude, which can significantly affect the performance of many machine learning algorithms.


Why is Feature Scaling Important?

Consider a dataset with two features:

  • Age: ranges from 20 to 60
  • Income: ranges from 30,000 to 150,000

Without scaling, algorithms that rely on distances or gradients will treat Income as far more important simply because its values are numerically larger — not because it’s actually more informative. Scaling puts all features on a level playing field.


Types of Feature Scaling

1. Min-Max Normalization (Rescaling)

Transforms features to a fixed range, typically [0, 1].

Formula:

X_scaled = (X − X_min) / (X_max − X_min)

Example:

Age (original)    Age (scaled)
20                0.0
40                0.5
60                1.0

When to use: When you know the distribution is not Gaussian and the algorithm requires bounded inputs (e.g., neural networks, image pixel values).

Drawback: Sensitive to outliers. A single extreme value compresses all other values.
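The Age example above can be reproduced with a minimal pure-Python sketch (in practice you would typically reach for scikit-learn's MinMaxScaler, which implements the same formula):

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 40, 60]
print(min_max_scale(ages))  # [0.0, 0.5, 1.0]
```

Note the outlier sensitivity: appending a single age of 600 would compress 20, 40, and 60 into the narrow band [0.0, 0.069].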


2. Standardization (Z-Score Normalization)

Centers data around mean = 0 and scales to standard deviation = 1.

Formula:

Z = (X − μ) / σ

Example:

Salaries: [40k, 50k, 60k, 70k, 80k] → Mean = 60k, Std = 14.14k

Salary    Z-Score
40,000    −1.41
60,000     0.00
80,000    +1.41

When to use: When the algorithm assumes normally distributed data (e.g., Linear/Logistic Regression, SVM, PCA).

Advantage: Less affected by outliers compared to Min-Max.
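The salary z-scores above can be checked with a short stdlib-only sketch (scikit-learn's StandardScaler does the same, also using the population standard deviation):

```python
import math

def standardize(values):
    """Z-score normalization: subtract the mean, divide by the population std."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

salaries = [40_000, 50_000, 60_000, 70_000, 80_000]
print([round(z, 2) for z in standardize(salaries)])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```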


3. Robust Scaling

Uses the median and Interquartile Range (IQR) instead of mean and std — making it robust to outliers.

Formula:

X_scaled = (X − median) / IQR

where IQR = Q3 − Q1

Example: A salary dataset with an outlier of 10,000,000 won’t skew all other values when using robust scaling.

When to use: When your dataset contains significant outliers.
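A quick sketch of the salary-outlier example, using the stdlib statistics module (scikit-learn's RobustScaler is the production equivalent; quartile conventions vary, and this sketch assumes the "inclusive" method):

```python
import statistics

def robust_scale(values):
    """Scale using the median and IQR (Q3 - Q1), so outliers barely move the rest."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - med) / (q3 - q1) for v in values]

salaries = [40_000, 50_000, 60_000, 70_000, 10_000_000]
print(robust_scale(salaries))
# [-1.0, -0.5, 0.0, 0.5, 497.0] — the outlier stays extreme,
# but the four typical salaries land neatly in [-1, 1]
```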


4. MaxAbs Scaling

Scales each feature by its maximum absolute value, resulting in values in [-1, 1].

Formula:

X_scaled = X / max(|X|)

When to use: Works well with sparse data and preserves zero entries (common in text/NLP data).
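A minimal sketch of MaxAbs scaling (scikit-learn's MaxAbsScaler applies the same rule per feature); note that zero entries stay exactly zero, which is why it suits sparse data:

```python
def max_abs_scale(values):
    """Divide by the maximum absolute value, mapping values into [-1, 1]."""
    m = max(abs(v) for v in values)
    return [v / m for v in values]

data = [-4, 0, 2, 8]
print(max_abs_scale(data))  # [-0.5, 0.0, 0.25, 1.0]
```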


5. Log Transformation

Applies a logarithm to compress wide-ranging values.

Formula:

X_scaled = log(X + 1)

Example: Website traffic [100, 1000, 10000, 1000000] → after a base-10 log: [2.0, 3.0, 4.0, 6.0]. (The +1 in the formula only matters when the data can contain zeros, since log(0) is undefined.)

When to use: Highly skewed distributions (e.g., income, population, prices).
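The traffic example uses a base-10 log; a short sketch (for data that may contain zeros, math.log1p or numpy.log1p computes log(X + 1) safely):

```python
import math

traffic = [100, 1_000, 10_000, 1_000_000]
log_traffic = [math.log10(x) for x in traffic]
print(log_traffic)
# a six-orders-of-magnitude spread collapses to roughly [2, 6]
```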


Which Algorithms Need Feature Scaling?

Algorithm                     Needs Scaling?   Reason
Linear Regression             ✅ Yes           Gradient descent converges faster
Logistic Regression           ✅ Yes           Gradient-based; scale affects convergence and regularization
SVM                           ✅ Yes           Maximizes margin using distances
K-Nearest Neighbors           ✅ Yes           Purely distance-based
Neural Networks               ✅ Yes           Gradient sensitivity
PCA                           ✅ Yes           Variance-based; large scales dominate
Decision Trees                ❌ No            Split-based, not distance-sensitive
Random Forests                ❌ No            Ensemble of trees
Naive Bayes                   ❌ No            Probability-based
Gradient Boosting (XGBoost)   ❌ No            Tree-based, scale-invariant

Practical Example: KNN Without vs. With Scaling

Dataset:

Person    Age    Income ($)    Bought?
A         25     40,000        No
B         45     80,000        Yes
C         30     60,000        ?

Without scaling, the Euclidean distance between C and A:

d = sqrt((30 − 25)² + (60,000 − 40,000)²) = sqrt(25 + 400,000,000) ≈ 20,000

The Income completely dominates Age — Age contributes almost nothing.

With Min-Max scaling (Age → [0,1], Income → [0,1]):

d = sqrt((0.25 − 0)² + (0.5 − 0)²) = sqrt(0.0625 + 0.25) ≈ 0.56

Now both features contribute meaningfully to the distance.
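Both distances are easy to verify in a few lines of Python (the scaled coordinates assume Min-Max fitting over the Age range [25, 45] and Income range [40,000, 80,000] from the table):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Raw features: (age, income) for persons A and C
a, c = (25, 40_000), (30, 60_000)
print(round(euclidean(a, c)))          # ~20000 — income dominates entirely

# After Min-Max scaling both features to [0, 1]
a_scaled, c_scaled = (0.0, 0.0), (0.25, 0.5)
print(round(euclidean(a_scaled, c_scaled), 2))  # 0.56 — both features matter
```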


Key Takeaways

  • Always fit the scaler on training data only, then transform both train and test sets — to prevent data leakage.
  • Standardization is the most general-purpose choice.
  • Min-Max is best when you need bounded outputs.
  • Robust Scaling is best with outliers.
  • Tree-based models are naturally immune to feature scale.
  • Feature scaling doesn’t change the information content — it just reframes the numeric range for algorithms to interpret fairly.