Feature scaling is the process of normalizing or standardizing the range of independent variables (features) in a dataset. It ensures that no single feature dominates others due to differences in magnitude, which can significantly affect the performance of many machine learning algorithms.
Why is Feature Scaling Important?
Consider a dataset with two features:
- Age: ranges from 20 to 60
- Income: ranges from 30,000 to 150,000
Without scaling, algorithms that rely on distances or gradients will treat Income as far more important simply because its values are numerically larger — not because it’s actually more informative. Scaling puts all features on a level playing field.
Types of Feature Scaling
1. Min-Max Normalization (Rescaling)
Transforms features to a fixed range, typically [0, 1].
Formula:
X′ = (X − Xmin) / (Xmax − Xmin)
Example:
| Age (original) | Age (scaled) |
|---|---|
| 20 | 0.0 |
| 40 | 0.5 |
| 60 | 1.0 |
When to use: When you know the distribution is not Gaussian and the algorithm requires bounded inputs (e.g., neural networks, image pixel values).
Drawback: Sensitive to outliers. A single extreme value compresses all other values.
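A minimal sketch of Min-Max normalization in plain Python (the helper name `min_max_scale` is illustrative, not a library API):

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

ages = [20, 40, 60]
print(min_max_scale(ages))  # [0.0, 0.5, 1.0]
```

This reproduces the Age table above: the minimum maps to 0.0, the maximum to 1.0, and everything else falls proportionally in between.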
2. Standardization (Z-Score Normalization)
Centers data around mean = 0 and scales to standard deviation = 1.
Formula:
X′ = (X − μ) / σ
Example:
Salaries: [40k, 50k, 60k, 70k, 80k] → Mean = 60k, Std = 14.14k
| Salary | Z-Score |
|---|---|
| 40,000 | -1.41 |
| 60,000 | 0.00 |
| 80,000 | +1.41 |
When to use: When the algorithm benefits from zero-centered, unit-variance features or assumes roughly Gaussian inputs (e.g., Linear/Logistic Regression, SVM, PCA).
Advantage: Less affected by outliers compared to Min-Max.
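The salary example above can be sketched with the standard library's `statistics` module (using the population standard deviation, which matches the ≈14.14k figure):

```python
import statistics

def z_score_scale(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

salaries = [40_000, 50_000, 60_000, 70_000, 80_000]
scaled = z_score_scale(salaries)
print([round(z, 2) for z in scaled])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```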
3. Robust Scaling
Uses the median and Interquartile Range (IQR) instead of mean and std — making it robust to outliers.
Formula:
X′ = (X−median) / IQR
where IQR = Q3 − Q1
Example: In a salary dataset containing a 10,000,000 outlier, the other values won’t be compressed into a tiny band, because the median and IQR are barely affected by the extreme point.
When to use: When your dataset contains significant outliers.
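A sketch of robust scaling using the standard library (`robust_scale` is an illustrative helper; `statistics.quantiles` with `n=4` returns the quartiles):

```python
import statistics

def robust_scale(values):
    """Scale using the median and IQR (Q3 - Q1), so outliers barely move the bulk."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return [(x - med) / iqr for x in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one extreme outlier
scaled = robust_scale(data)
# The nine ordinary values land roughly within [-1, 1];
# only the outlier itself sits far away.
```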
4. MaxAbs Scaling
Scales each feature by its maximum absolute value, resulting in values in [-1, 1].
Formula:
X′ = X / max(∣X∣)
When to use: Works well with sparse data and preserves zero entries (common in text/NLP data).
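A sketch of MaxAbs scaling in plain Python (helper name illustrative). Note that zeros remain exactly zero, which is why this scaler preserves sparsity:

```python
def max_abs_scale(values):
    """Divide each value by the maximum absolute value; results lie in [-1, 1]."""
    m = max(abs(x) for x in values)
    return [x / m for x in values]

print(max_abs_scale([-4, 0, 2]))  # [-1.0, 0.0, 0.5]
```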
5. Log Transformation
Applies a logarithm to compress wide-ranging values.
Formula:
X′ = log(X+1)
Example: Website traffic [100, 1000, 10000, 1000000] → after a base-10 log: ≈ [2.0, 3.0, 4.0, 6.0] (the +1 in the formula is negligible at these magnitudes; it exists so that zeros remain valid inputs).
When to use: Highly skewed distributions (e.g., income, population, prices).
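The traffic example can be reproduced with a base-10 log (the formula above writes `log(X+1)` without fixing a base; base 10 is what yields the 2.0/3.0/4.0/6.0 values):

```python
import math

traffic = [100, 1_000, 10_000, 1_000_000]
# log10(x + 1): six orders of magnitude compress into the range [2, 6]
compressed = [math.log10(x + 1) for x in traffic]
print([round(v, 2) for v in compressed])  # [2.0, 3.0, 4.0, 6.0]
```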
Which Algorithms Need Feature Scaling?
| Algorithm | Needs Scaling? | Reason |
|---|---|---|
| Linear Regression | ✅ Yes | Gradient descent converges faster |
| Logistic Regression | ✅ Yes | Gradient descent converges faster; regularization assumes comparable scales |
| SVM | ✅ Yes | Maximizes margin using distances |
| K-Nearest Neighbors | ✅ Yes | Purely distance-based |
| Neural Networks | ✅ Yes | Gradient sensitivity |
| PCA | ✅ Yes | Variance-based; large scales dominate |
| Decision Trees | ❌ No | Split-based, not distance-sensitive |
| Random Forests | ❌ No | Ensemble of trees |
| Naive Bayes | ❌ No | Probability-based |
| Gradient Boosting (XGBoost) | ❌ No | Tree-based, scale-invariant |
Practical Example: KNN Without vs. With Scaling
Dataset:
| Person | Age | Income ($) | Bought? |
|---|---|---|---|
| A | 25 | 40,000 | No |
| B | 45 | 80,000 | Yes |
| C | 30 | 60,000 | ? |
Without scaling, the Euclidean distance between C and A:
d = sqrt( (30−25)² + (60000−40000)² ) = sqrt(25 + 400,000,000) ≈ 20,000
The Income completely dominates Age — Age contributes almost nothing.
With Min-Max scaling (Age → [0,1], Income → [0,1]):
d = sqrt( (0.25−0)² + (0.5−0)² ) = sqrt(0.0625 + 0.25) ≈ 0.56
Now both features contribute meaningfully to the distance.
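The before/after distances can be checked in a few lines of plain Python (the helper names are illustrative; scaling uses the [25, 45] Age range and [40k, 80k] Income range from persons A and B, as in the text):

```python
import math

# Raw feature vectors: (age, income)
A, B, C = (25, 40_000), (45, 80_000), (30, 60_000)

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Unscaled: income dominates the distance entirely
print(round(euclidean(C, A)))  # 20000

def scale_point(p, lo, hi):
    """Min-max scale each coordinate of a point given per-feature ranges."""
    return tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi))

lo, hi = (25, 40_000), (45, 80_000)
A_s, C_s = scale_point(A, lo, hi), scale_point(C, lo, hi)
print(round(euclidean(C_s, A_s), 2))  # 0.56
```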
Key Takeaways
- Always fit the scaler on training data only, then transform both train and test sets — to prevent data leakage.
- Standardization is the most general-purpose choice.
- Min-Max is best when you need bounded outputs.
- Robust Scaling is best with outliers.
- Tree-based models are naturally immune to feature scale.
- Feature scaling doesn’t change the information content — it just reframes the numeric range for algorithms to interpret fairly.
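The first takeaway, fitting on training data only, can be sketched with a hand-rolled one-dimensional scaler (illustrative class; scikit-learn's `StandardScaler` follows the same fit/transform pattern):

```python
import statistics

class StandardScaler1D:
    """Learns mean/std from one split, then applies them to any split."""
    def fit(self, values):
        self.mu = statistics.fmean(values)
        self.sigma = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(x - self.mu) / self.sigma for x in values]

train = [40_000, 50_000, 60_000, 70_000, 80_000]
test = [55_000, 90_000]

scaler = StandardScaler1D().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)     # reuses the train mean/std
```

Calling `fit` on the test set (or on the full dataset before splitting) would leak test statistics into preprocessing, which is exactly the data-leakage problem the first takeaway warns about.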