RealLabelNormalization.jl
A Julia package for robust normalization of real-valued labels, such as the targets of regression tasks. It provides min-max and z-score normalization with built-in outlier clipping and NaN handling.
Features
- Multiple normalization methods: Min-max and Z-score normalization
- Flexible normalization modes: Global, column-wise, or row-wise normalization
- Robust outlier handling: Configurable quantile-based clipping
- NaN handling: Preserves NaN values while computing statistics on valid data
- Consistent train/test normalization: Save statistics from training data and apply to test data
Quick Start (Stats-Based Workflow)
using RealLabelNormalization
# Training labels with outlier
train_labels = [1.5, 2.3, 4.1, 3.7, 5.2, 100.0]
test_labels = [2.1, 3.9, 4.5]
# Step 1: Compute stats from TRAINING DATA ONLY
stats = compute_normalization_stats(train_labels; method=:zscore, clip_quantiles=(0.01, 0.99))
# Step 2: Apply SAME stats to training data
train_normalized = apply_normalization(train_labels, stats)
# Step 3: Apply SAME STATS to test data (prevents data leakage!)
test_normalized = apply_normalization(test_labels, stats)
# Step 4: Train model on normalized data
# model = train_your_model(X_train, train_normalized)
# Step 5: Denormalize predictions back to original scale using SAME stats
predictions_normalized = model(X_test) # Model outputs normalized predictions
predictions_original = denormalize_labels(predictions_normalized, stats)
Installation
using Pkg
Pkg.add("RealLabelNormalization")
API Reference
RealLabelNormalization._apply_training_clip_bounds
RealLabelNormalization._clip_outliers
RealLabelNormalization.apply_normalization
RealLabelNormalization.compute_normalization_stats
RealLabelNormalization.denormalize_labels
RealLabelNormalization.normalize_labels
RealLabelNormalization._apply_training_clip_bounds
— Method
Apply training clip bounds to validation/test data.
RealLabelNormalization._clip_outliers
— Method
Clip outliers using quantiles before normalization.
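As a rough illustration of quantile-based clipping, the sketch below clamps a vector to its own lower and upper quantiles using only the Statistics standard library. The helper name clip_to_quantiles and its return shape are assumptions made for this example; it is not the package's internal _clip_outliers.

```julia
using Statistics

# Hypothetical helper for illustration only (not the package's internal function).
# Clamps every value to the [lower, upper] quantile bounds of the data and
# returns the bounds so they can be reused on other data later.
function clip_to_quantiles(x::AbstractVector{<:Real}, q::Tuple{Real,Real})
    lo, hi = quantile(x, [q[1], q[2]])
    return clamp.(x, lo, hi), (lo, hi)
end

labels = [1.0, 5.0, 3.0, 8.0, 2.0, 100.0]   # 100.0 is an outlier
clipped, bounds = clip_to_quantiles(labels, (0.05, 0.95))
```

Values inside the bounds are left unchanged; only the tails are pulled in, so the outlier no longer dominates the minimum, maximum, mean, or standard deviation computed afterwards.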
RealLabelNormalization.apply_normalization
— Method
apply_normalization(labels, stats)
Apply pre-computed normalization statistics to new data (validation/test sets).
Ensures consistent normalization across train/validation/test splits using only training statistics. This includes applying the same clipping bounds if they were used during training.
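The idea of reusing training statistics can be sketched without the package at all. In the example below, the named tuple fields (mu, sigma, clip_bounds) and the helper apply_zscore are assumptions chosen for illustration; the structure the package actually stores may differ.

```julia
using Statistics

train = [1.5, 2.3, 4.1, 3.7, 5.2, 100.0]
test  = [2.1, 3.9, 500.0]

# Everything is derived from the TRAINING labels only.
lo, hi = quantile(train, [0.01, 0.99])
clipped = clamp.(train, lo, hi)
stats = (mu = mean(clipped), sigma = std(clipped), clip_bounds = (lo, hi))

# Hypothetical helper: clip with the stored bounds, then z-score with the
# stored mean and standard deviation. Nothing is recomputed from `x`.
apply_zscore(x, s) = (clamp.(x, s.clip_bounds[1], s.clip_bounds[2]) .- s.mu) ./ s.sigma

train_z = apply_zscore(train, stats)
test_z  = apply_zscore(test, stats)
```

Because the test value 500.0 is clipped to the training upper bound before scaling, it maps to the same normalized value as the training outlier, and no information from the test set leaks into the statistics.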
RealLabelNormalization.compute_normalization_stats
— Method
compute_normalization_stats(labels; method=:minmax, mode=:global, range=(-1, 1), clip_quantiles=(0.01, 0.99))
Compute normalization statistics from training data for later application to validation/test sets.
Inputs
- labels: Vector or matrix where the last dimension is the number of samples
- method::Symbol: Normalization method
  - :minmax: Min-max normalization (default)
  - :zscore: Z-score normalization (mean=0, std=1)
- range::Tuple{Real,Real}: Target range for min-max normalization (default (-1, 1))
  - (-1, 1): Scaled min-max to [-1,1] (default)
  - (0, 1): Standard min-max to [0,1]
  - Custom ranges: e.g., (-2, 2)
- mode::Symbol: Normalization scope
  - :global: Normalize across all values (default)
  - :columnwise: Normalize each column independently
  - :rowwise: Normalize each row independently
- clip_quantiles::Union{Nothing,Tuple{Real,Real}}: Quantile values (between 0 and 1) for outlier clipping before normalization
  - (0.01, 0.99): Clip to 1st-99th percentiles (default)
  - (0.05, 0.95): Clip to 5th-95th percentiles (more aggressive)
  - nothing: No clipping
Returns
- Named tuple with normalization parameters that can be used with apply_normalization
Example
# Compute stats from training data with outlier clipping
train_stats = compute_normalization_stats(train_labels; method=:zscore, mode=:columnwise, clip_quantiles=(0.05, 0.95))
# Apply to validation/test data (uses same clipping bounds)
val_normalized = apply_normalization(val_labels, train_stats)
test_normalized = apply_normalization(test_labels, train_stats)
RealLabelNormalization.denormalize_labels
— Method
denormalize_labels(normalized_labels, stats)
Convert normalized labels back to original scale using stored statistics.
Useful for interpreting model predictions in original units.
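Denormalization is the algebraic inverse of the forward mapping. The sketch below shows this for min-max scaling to a target range (a, b); the function names minmax_scale and minmax_unscale are hypothetical and only illustrate the arithmetic, not the package's code.

```julia
# Forward mapping: send [lo, hi] linearly onto the target range [a, b].
minmax_scale(x, lo, hi, a, b) = a .+ (x .- lo) .* (b - a) ./ (hi - lo)

# Inverse mapping: send [a, b] back onto the original range [lo, hi].
minmax_unscale(y, lo, hi, a, b) = lo .+ (y .- a) .* (hi - lo) ./ (b - a)

labels = [1.0, 5.0, 3.0, 8.0, 2.0]
lo, hi = extrema(labels)

scaled   = minmax_scale(labels, lo, hi, -1, 1)
restored = minmax_unscale(scaled, lo, hi, -1, 1)
```

The round trip recovers the original values up to floating-point error. Clipping, by contrast, is not invertible: a label that was clipped before scaling comes back as the clip bound, not as its original value.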
RealLabelNormalization.normalize_labels
— Method
normalize_labels(labels; method=:minmax, range=(-1, 1), mode=:global, clip_quantiles=(0.01, 0.99))
Normalize labels with various normalization methods and modes. Handles NaN values by ignoring them in statistical computations and preserving them in the output.
Arguments
- labels: Vector or matrix where the last dimension is the number of samples
- method::Symbol: Normalization method
  - :minmax: Min-max normalization (default)
  - :zscore: Z-score normalization (mean=0, std=1)
- range::Tuple{Real,Real}: Target range for min-max normalization (default: (-1, 1))
  - (-1, 1): Scaled min-max to [-1,1] (default)
  - (0, 1): Standard min-max to [0,1]
  - Custom ranges: e.g., (-2, 2)
- mode::Symbol: Normalization scope
  - :global: Normalize across all values (default)
  - :columnwise: Normalize each column independently
  - :rowwise: Normalize each row independently
- clip_quantiles::Union{Nothing,Tuple{Real,Real}}: Quantile values (between 0 and 1) for outlier clipping before normalization
  - (0.01, 0.99): Clip to 1st-99th percentiles (default)
  - (0.05, 0.95): Clip to 5th-95th percentiles (more aggressive)
  - nothing: No clipping
NaN Handling
- NaN values are ignored when computing statistics (min, max, mean, std, quantiles)
- NaN values are preserved in the output (remain as NaN)
- If all values in a column are NaN, appropriate warnings are issued and NaN is returned
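The NaN behavior listed above can be pictured with a minimal sketch that uses plain Julia and the Statistics standard library: statistics come from the valid entries only, and broadcasting leaves NaN entries as NaN. This illustrates the idea and is not the package's implementation.

```julia
using Statistics

labels = [1.0, NaN, 3.0, 5.0]

# Compute statistics on the non-NaN entries only.
valid = filter(!isnan, labels)
mu, sigma = mean(valid), std(valid)

# Arithmetic on NaN yields NaN, so missing labels stay NaN in the output
# while the valid entries are z-scored.
normalized = (labels .- mu) ./ sigma
```

The output has the same length as the input, and the position of each NaN is unchanged, so normalized labels stay aligned with their samples.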
Returns
- Normalized labels with same shape as input
Examples
# Vector labels (single target)
labels = [1.0, 5.0, 3.0, 8.0, 2.0, 100.0] # 100.0 is outlier
# Min-max to [-1,1] with outlier clipping (default)
normalized = normalize_labels(labels)
# Min-max to [0,1]
normalized = normalize_labels(labels; range=(0, 1))
# Z-score normalization with outlier clipping
normalized = normalize_labels(labels; method=:zscore)
# Matrix labels (multi-target)
labels_matrix = [1.0 10.0; 5.0 20.0; 3.0 15.0; 8.0 25.0; 1000.0 5.0] # Outlier in col 1
# Global normalization with clipping
normalized = normalize_labels(labels_matrix; mode=:global)
# Column-wise normalization with clipping
normalized = normalize_labels(labels_matrix; mode=:columnwise)
# Row-wise normalization with clipping
normalized = normalize_labels(labels_matrix; mode=:rowwise)
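To see how the three modes differ, the following sketch applies the same min-max scaling to [0, 1] over the whole matrix, per column, and per row, using Base's mapslices. The helper unit_scale is hypothetical, and clipping is left out to keep the comparison simple.

```julia
X = [1.0 10.0; 5.0 20.0; 3.0 30.0]

# Hypothetical helper: scale a collection to [0, 1] using its own min and max.
unit_scale(v) = (v .- minimum(v)) ./ (maximum(v) - minimum(v))

global_scaled = unit_scale(X)                      # one min/max for the whole matrix
col_scaled    = mapslices(unit_scale, X; dims=1)   # one min/max per column
row_scaled    = mapslices(unit_scale, X; dims=2)   # one min/max per row
```

With global scaling the first column is squeezed near zero because the second column sets the maximum; column-wise scaling gives each target its own full range, which is usually what multi-target regression with differently scaled targets calls for.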