RealLabelNormalization.jl

A Julia package for robust normalization of real-valued labels, commonly used in regression tasks. This package provides various normalization methods with built-in outlier handling and NaN support.

Features

  • Multiple normalization methods: Min-max and Z-score normalization
  • Flexible normalization modes: Global or column-wise normalization
  • Robust outlier handling: Configurable quantile-based clipping
  • NaN handling: Preserves NaN values while computing statistics on valid data
  • Consistent train/test normalization: Save statistics from training data and apply to test data

Quick Start (Stats-Based Workflow)

using RealLabelNormalization

# Training labels with an outlier (100.0)
train_labels = [1.5, 2.3, 4.1, 3.7, 5.2, 100.0]
test_labels = [2.1, 3.9, 4.5]

# Step 1: Compute stats from TRAINING DATA ONLY
stats = compute_normalization_stats(train_labels; method=:zscore, clip_quantiles=(0.01, 0.99))

# Step 2: Apply SAME stats to training data
train_normalized = apply_normalization(train_labels, stats)

# Step 3: Apply SAME STATS to test data (prevents data leakage!)
test_normalized = apply_normalization(test_labels, stats)

# Step 4: Train model on normalized data
# model = train_your_model(X_train, train_normalized)

# Step 5: Denormalize predictions back to original scale using SAME stats
# predictions_normalized = model(X_test)   # model outputs predictions on the normalized scale
# predictions_original = denormalize_labels(predictions_normalized, stats)
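
As a quick sanity check, denormalizing with the same stats inverts the normalization. A minimal sketch using the variables above; values clipped during normalization (here the outlier 100.0) come back at the clipping bound rather than at their raw value:

# Round trip: recovers the (clipped) training labels up to floating-point error
train_roundtrip = denormalize_labels(train_normalized, stats)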

Installation

using Pkg
Pkg.add("RealLabelNormalization")
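
Equivalently, from the Julia Pkg REPL (press ] at the julia> prompt):

pkg> add RealLabelNormalization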

API Reference

RealLabelNormalization.apply_normalization (Method)
apply_normalization(labels, stats)

Apply pre-computed normalization statistics to new data (validation/test sets).

Ensures consistent normalization across train/validation/test splits using only training statistics. This includes applying the same clipping bounds if they were used during training.

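A minimal usage sketch, with illustrative variable names (train_labels, val_labels, test_labels) standing in for your own data splits:

train_stats = compute_normalization_stats(train_labels)          # statistics from training data only
val_normalized  = apply_normalization(val_labels, train_stats)   # same parameters and clipping bounds
test_normalized = apply_normalization(test_labels, train_stats)  # nothing is recomputed from test data
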
RealLabelNormalization.compute_normalization_stats (Method)
compute_normalization_stats(labels; method=:minmax, mode=:global, range=(-1, 1), clip_quantiles=(0.01, 0.99))

Compute normalization statistics from training data for later application to validation/test sets.

Arguments

  • labels: Vector or matrix where the last dimension is the number of samples
  • method::Symbol: Normalization method
    • :minmax: Min-max normalization (default)
    • :zscore: Z-score normalization (mean=0, std=1)
  • range::Tuple{Real,Real}: Target range for min-max normalization (default (-1, 1))
    • (-1, 1): Scaled min-max to [-1,1] (default)
    • (0, 1): Standard min-max to [0,1]
    • Custom ranges: e.g., (-2, 2)
  • mode::Symbol: Normalization scope
    • :global: Normalize across all values (default)
    • :columnwise: Normalize each column independently
    • :rowwise: Normalize each row independently
  • clip_quantiles::Union{Nothing,Tuple{Real,Real}}: Quantile levels between 0 and 1 used to clip outliers before normalization
    • (0.01, 0.99): Clip to 1st-99th percentiles (default)
    • (0.05, 0.95): Clip to 5th-95th percentiles (more aggressive)
    • nothing: No clipping

Returns

  • Named tuple with normalization parameters that can be used with apply_normalization

Example

# Compute stats from training data with outlier clipping
train_stats = compute_normalization_stats(train_labels; method=:zscore, mode=:columnwise, clip_quantiles=(0.05, 0.95))

# Apply to validation/test data (uses same clipping bounds)
val_normalized = apply_normalization(val_labels, train_stats)
test_normalized = apply_normalization(test_labels, train_stats)
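
For multi-target (matrix) labels, mode=:columnwise keeps separate statistics per column. A small illustrative sketch, assuming the samples-by-targets layout used in the normalize_labels examples below:

# Two targets on very different scales; each column is normalized independently
train_matrix = [1.0 100.0; 2.0 250.0; 3.0 175.0; 4.0 300.0]
matrix_stats = compute_normalization_stats(train_matrix; mode=:columnwise, clip_quantiles=nothing)
train_matrix_normalized = apply_normalization(train_matrix, matrix_stats)
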
RealLabelNormalization.denormalize_labels (Method)
denormalize_labels(normalized_labels, stats)

Convert normalized labels back to original scale using stored statistics.

Useful for interpreting model predictions in original units.

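A minimal round-trip sketch, assuming no clipping so the inversion is exact up to floating-point error:

labels = [10.0, 20.0, 30.0]
stats = compute_normalization_stats(labels; method=:zscore, clip_quantiles=nothing)
normalized = apply_normalization(labels, stats)
recovered = denormalize_labels(normalized, stats)   # ≈ [10.0, 20.0, 30.0]
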
RealLabelNormalization.normalize_labels (Method)
normalize_labels(labels; method=:minmax, range=(-1, 1), mode=:global, clip_quantiles=(0.01, 0.99))

Normalize labels with various normalization methods and modes. Handles NaN values by ignoring them in statistical computations and preserving them in the output.

Arguments

  • labels: Vector or matrix where the last dimension is the number of samples
  • method::Symbol: Normalization method
    • :minmax: Min-max normalization (default)
    • :zscore: Z-score normalization (mean=0, std=1)
  • range::Tuple{Real,Real}: Target range for min-max normalization (default: (-1, 1))
    • (-1, 1): Scaled min-max to [-1,1] (default)
    • (0, 1): Standard min-max to [0,1]
    • Custom ranges: e.g., (-2, 2)
  • mode::Symbol: Normalization scope
    • :global: Normalize across all values (default)
    • :columnwise: Normalize each column independently
    • :rowwise: Normalize each row independently
  • clip_quantiles::Union{Nothing,Tuple{Real,Real}}: Quantile levels between 0 and 1 used to clip outliers before normalization
    • (0.01, 0.99): Clip to 1st-99th percentiles (default)
    • (0.05, 0.95): Clip to 5th-95th percentiles (more aggressive)
    • nothing: No clipping

NaN Handling

  • NaN values are ignored when computing statistics (min, max, mean, std, quantiles)
  • NaN values are preserved in the output (remain as NaN)
  • If all values in a column are NaN, a warning is issued and NaN values are returned for that column
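
A small sketch of this behavior (the exact normalized values depend on the chosen method and range):

labels_with_nan = [1.0, NaN, 3.0, 5.0]
normalized = normalize_labels(labels_with_nan; clip_quantiles=nothing)
# Statistics are computed from [1.0, 3.0, 5.0]; the NaN entry stays NaN in the output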

Returns

  • Normalized labels with same shape as input

Examples

# Vector labels (single target)
labels = [1.0, 5.0, 3.0, 8.0, 2.0, 100.0]  # 100.0 is an outlier

# Min-max to [-1,1] with outlier clipping (default)
normalized = normalize_labels(labels)

# Min-max to [0,1] 
normalized = normalize_labels(labels; range=(0, 1))

# Z-score normalization with outlier clipping
normalized = normalize_labels(labels; method=:zscore)

# Matrix labels (multi-target)
labels_matrix = [1.0 10.0; 5.0 20.0; 3.0 15.0; 8.0 25.0; 1000.0 5.0]  # Outlier in col 1

# Global normalization with clipping
normalized = normalize_labels(labels_matrix; mode=:global)

# Column-wise normalization with clipping 
normalized = normalize_labels(labels_matrix; mode=:columnwise)

# Row-wise normalization with clipping
normalized = normalize_labels(labels_matrix; mode=:rowwise)
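
Note that normalize_labels returns only the normalized labels, not the statistics, so it cannot be inverted without recomputing them. To denormalize later, or to normalize validation/test data consistently, use the stats-based workflow; a short sketch with illustrative names:

stats = compute_normalization_stats(labels; method=:minmax, range=(0, 1))
normalized = apply_normalization(labels, stats)
original = denormalize_labels(normalized, stats)   # back to the original scale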