Optimal Binning for Numerical Variables using Isotonic Regression
optimal_binning_numerical_ir.Rd
Implements an advanced binning algorithm for numerical variables using isotonic regression to ensure monotonicity in bin event rates. This method is particularly valuable for risk modeling, credit scoring, and other applications where monotonic relationships between features and target variables are expected or preferred.
Usage
optimal_binning_numerical_ir(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  auto_monotonicity = TRUE,
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)
Arguments
- target
Binary integer vector (0 or 1) representing the target variable.
- feature
Numeric vector of values to be binned.
- min_bins
Minimum number of bins to generate (default: 3).
- max_bins
Maximum number of bins allowed (default: 5).
- bin_cutoff
Minimum frequency fraction for each bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20).
- auto_monotonicity
Automatically determine monotonicity direction (default: TRUE).
- convergence_threshold
Convergence threshold for optimization (default: 1e-6).
- max_iterations
Maximum number of iterations allowed (default: 1000).
Value
A list containing:
- id
Numeric identifiers for each bin (1-based).
- bin
Character vector with the bin intervals.
- woe
Numeric vector with Weight of Evidence values for each bin.
- iv
Numeric vector with Information Value contribution for each bin.
- count
Integer vector with the total number of observations in each bin.
- count_pos
Integer vector with the positive class counts in each bin.
- count_neg
Integer vector with the negative class counts in each bin.
- cutpoints
Numeric vector with the bin cutpoints (excluding ±Inf).
- converged
Logical value indicating whether the algorithm converged.
- iterations
Integer with the number of optimization iterations performed.
- total_iv
Total Information Value of the binning solution.
- monotone_increasing
Logical indicating whether the bin event rates are monotonically increasing (TRUE) or decreasing (FALSE).
Details
Algorithm Overview
The algorithm transforms a continuous feature into discrete bins that maximize its predictive relationship with a binary target (as measured by Information Value) while enforcing monotonicity constraints. It operates through five phases:
1. Pre-Binning: Initial segmentation based on quantiles or unique feature values (a minimal sketch follows this list)
2. Frequency Stabilization: Merging of low-frequency bins to ensure statistical reliability
3. Monotonicity Enforcement: Application of isotonic regression via Pool Adjacent Violators (PAV)
4. Bin Optimization: Adjustments to meet the min_bins and max_bins constraints
5. Information Value Calculation: Computation of WoE and IV metrics for each bin
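As an illustration of the pre-binning phase, here is a minimal quantile-based sketch in R. The helper pre_bin_edges is hypothetical, and the tie handling and open outer edges are assumptions for illustration, not the package's internals.

# Quantile-based pre-binning sketch; `pre_bin_edges` is a hypothetical
# helper, not part of the package.
pre_bin_edges <- function(feature, max_n_prebins = 20L) {
  probs <- seq(0, 1, length.out = max_n_prebins + 1L)
  edges <- unique(quantile(feature, probs = probs, na.rm = TRUE, names = FALSE))
  # Open the outer edges so every value falls into some bin
  edges[1] <- -Inf
  edges[length(edges)] <- Inf
  edges
}
set.seed(42)
x <- rnorm(500)
table(cut(x, breaks = pre_bin_edges(x)))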
Mathematical Foundation
The core mathematical concepts employed in this algorithm are:
1. Isotonic Regression
Isotonic regression solves the following optimization problem:
$$\min_{\mu} \sum_{i=1}^{n} w_i (y_i - \mu_i)^2$$
Subject to: $$\mu_1 \leq \mu_2 \leq \ldots \leq \mu_n$$ (for increasing monotonicity)
Where:
\(y_i\) is the original event rate in bin \(i\)
\(w_i\) is the weight (observation count) of bin \(i\)
\(\mu_i\) is the isotonic (monotone) estimate for bin \(i\)
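Base R's stats::isoreg solves the unweighted increasing case. A minimal weighted PAV sketch is shown below; the function pava is a hypothetical illustration, not the package's internal routine.

# Weighted pool-adjacent-violators sketch (increasing direction).
pava <- function(y, w = rep(1, length(y))) {
  val <- numeric(0); wt <- numeric(0); len <- integer(0)
  for (i in seq_along(y)) {
    val <- c(val, y[i]); wt <- c(wt, w[i]); len <- c(len, 1L)
    k <- length(val)
    # Merge adjacent blocks while the increasing constraint is violated
    while (k > 1 && val[k - 1] > val[k]) {
      w_new <- wt[k - 1] + wt[k]
      val[k - 1] <- (wt[k - 1] * val[k - 1] + wt[k] * val[k]) / w_new
      wt[k - 1] <- w_new
      len[k - 1] <- len[k - 1] + len[k]
      val <- val[-k]; wt <- wt[-k]; len <- len[-k]
      k <- k - 1
    }
  }
  rep(val, times = len)  # expand block means back to the original length
}
# Event rates 0.10, 0.25, 0.20, 0.40 with bin counts as weights:
pava(c(0.10, 0.25, 0.20, 0.40), w = c(100, 80, 120, 90))
# [1] 0.10 0.22 0.22 0.40

For a decreasing fit, the same routine can be applied to -y, negating the result afterwards.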
2. Weight of Evidence (WoE)
For each bin \(i\), the Weight of Evidence is defined as:
$$WoE_i = \ln\left(\frac{p_i/P}{n_i/N}\right)$$
Where:
\(p_i\): Number of positive cases in bin \(i\)
\(P\): Total number of positive cases
\(n_i\): Number of negative cases in bin \(i\)
\(N\): Total number of negative cases
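The per-bin Information Value contribution (returned as iv, with total_iv as their sum) follows from the same quantities:
$$IV_i = \left(\frac{p_i}{P} - \frac{n_i}{N}\right) \cdot WoE_i, \qquad IV = \sum_{i} IV_i$$
A minimal sketch of both computations from per-bin counts; the helper woe_iv is hypothetical, not the package's internal routine:

# WoE/IV from per-bin counts; illustrative only.
woe_iv <- function(count_pos, count_neg) {
  dist_pos <- count_pos / sum(count_pos)  # p_i / P
  dist_neg <- count_neg / sum(count_neg)  # n_i / N
  woe <- log(dist_pos / dist_neg)
  iv  <- (dist_pos - dist_neg) * woe
  list(woe = woe, iv = iv, total_iv = sum(iv))
}
woe_iv(count_pos = c(10, 40, 80), count_neg = c(90, 60, 20))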
Key Features
Automatic Monotonicity Direction: Determines the optimal direction of monotonicity (increasing or decreasing) from the data (one plausible rule is sketched after this list)
Robust Handling of Edge Cases: Special processing for few unique values, missing data, etc.
Optimal Information Preservation: Merges bins to minimize information loss while meeting constraints
Statistical Reliability: Ensures each bin has sufficient observations for stable estimates
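One plausible way to determine the direction automatically (an assumption for illustration, not necessarily the rule used internally) is to follow the sign of the rank correlation between feature and target:

# Plausible direction rule: sign of the Spearman rank correlation.
# An assumption for illustration, not the package's documented behavior.
pick_direction <- function(feature, target) {
  rho <- suppressWarnings(cor(feature, target, method = "spearman"))
  if (is.na(rho) || rho >= 0) "increasing" else "decreasing"
}
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1.5 * x))  # event rate falls as x rises
pick_direction(x, y)  # "decreasing" for data like this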
References
Barlow, R. E., & Brunk, H. D. (1972). The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337), 140-147.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. Wiley.
de Leeuw, J., Hornik, K., & Mair, P. (2009). Isotone optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. Journal of Statistical Software, 32(5), 1-24.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.
Thomas, L. C., Edelman, D. B., & Crook, J. N. (2002). Credit Scoring and Its Applications. Society for Industrial and Applied Mathematics.
Belkin, M., Hsu, D., & Mitra, P. (2018). Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate. Advances in Neural Information Processing Systems.
Examples
if (FALSE) { # \dontrun{
# Generate synthetic data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- rnorm(n)
# Basic usage
result <- optimal_binning_numerical_ir(target, feature)
print(result)
# Custom settings
result_custom <- optimal_binning_numerical_ir(
  target = target,
  feature = feature,
  min_bins = 2,
  max_bins = 6,
  bin_cutoff = 0.03,
  auto_monotonicity = TRUE
)
# Access specific components
bins <- result$bin
woe_values <- result$woe
is_increasing <- result$monotone_increasing
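# Reapply the learned cutpoints to new data. This sketch assumes
# right-closed intervals (low, high]; that convention is an assumption,
# not a documented guarantee.
new_feature <- rnorm(50)
bin_id <- findInterval(new_feature, result$cutpoints, left.open = TRUE) + 1L
new_woe <- result$woe[bin_id]  # WoE-encoded feature for downstream models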
} # }