Optimal Numerical Binning JEDI (Joint Entropy-Driven Interval Discretization)
optimal_binning_numerical_jedi.Rd
A numerical binning algorithm that maximizes Information Value (IV) while enforcing a monotonic Weight of Evidence (WoE) relationship. It combines quantile-based pre-binning with adaptive merging strategies to balance statistical stability and information retention.
Usage
optimal_binning_numerical_jedi(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the target variable.
- feature
Numeric vector representing the continuous predictor.
- min_bins
Minimum number of bins to create (default: 3).
- max_bins
Maximum number of bins allowed (default: 5).
- bin_cutoff
Minimum relative frequency per bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20).
- convergence_threshold
IV change threshold for convergence (default: 1e-6).
- max_iterations
Maximum number of optimization iterations (default: 1000).
Value
A list containing the following elements:
bin
: Character vector with the intervals of the bins.

woe
: Numeric vector with Weight of Evidence values.

iv
: Numeric vector with Information Value per bin.

count
: Integer vector with the observation counts per bin.

count_pos
: Integer vector with the positive class counts per bin.

count_neg
: Integer vector with the negative class counts per bin.

cutpoints
: Numeric vector with the cutpoints (excluding ±Inf).

converged
: Logical indicating whether the algorithm converged.

iterations
: Integer with the number of iterations performed.
Details
Mathematical Framework:
For a numerical variable \(X\) and a binary target \(Y \in \{0,1\}\), the algorithm creates \(K\) bins defined by \(K-1\) cutpoints where each bin \(B_i = (c_{i-1}, c_i]\) optimizes the information content, satisfying the following constraints:
Monotonic WoE: \(WoE_i \le WoE_{i+1}\) (or \(\ge\) for decreasing trends).
Minimum Bin Size: count\((B_i)/N \ge\) bin_cutoff.
Bin Quantity Limits: min_bins \(\le K \le\) max_bins.
Weight of Evidence (WoE) for bin \(i\): $$WoE_i = \ln\left(\frac{\text{Pos}_i / \sum \text{Pos}_i}{\text{Neg}_i / \sum \text{Neg}_i}\right)$$
Information Value (IV) per bin: $$IV_i = \left(\frac{\text{Pos}_i}{\sum \text{Pos}_i} - \frac{\text{Neg}_i}{\sum \text{Neg}_i}\right) \times WoE_i$$
Total IV: $$IV_{total} = \sum_{i=1}^K IV_i$$
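The WoE and IV formulas above can be illustrated with a short, self-contained computation (written in Python here purely for illustration; the `eps` floor is an assumption standing in for the package's internal numerical-stability smoothing, not its exact value):

```python
import math

def woe_iv(count_pos, count_neg, eps=1e-12):
    """Per-bin WoE and IV from positive/negative counts.

    Sketch of the formulas in the Details section. The eps floor
    avoids log(0) and division by zero; it is an assumption here,
    not the package's exact smoothing.
    """
    total_pos = sum(count_pos)
    total_neg = sum(count_neg)
    woe, iv = [], []
    for p, n in zip(count_pos, count_neg):
        dist_pos = max(p / total_pos, eps)
        dist_neg = max(n / total_neg, eps)
        w = math.log(dist_pos / dist_neg)
        woe.append(w)
        iv.append((dist_pos - dist_neg) * w)
    return woe, iv

# Three bins with an increasing positive-class share:
woe, iv = woe_iv([30, 50, 20], [70, 50, 10])
total_iv = sum(iv)  # IV_total is the sum of the per-bin contributions
```

Note that each per-bin IV term is non-negative (the sign of the distribution difference matches the sign of its WoE), so merging bins can only keep or reduce the total IV, which is why the merging phases below try to minimize IV loss.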
Algorithm Phases:
Quantile-based Pre-Binning: Initial segmentation with validation of minimum frequency.
Rare Bin Merging: Combines bins below the bin_cutoff to ensure statistical stability.
Monotonicity Enforcement: Adjusts bins to maintain monotonic WoE relationships.
Bin Count Optimization: Ensures the number of bins respects the min_bins and max_bins constraints.
Convergence Monitoring: Tracks IV stability to identify convergence.
Key Features:
Numerical Stability: WoE calculation includes epsilon to avoid division by zero.
Adaptive Merging Strategy: Minimizes IV loss during bin merging.
Robust Handling of Edge Cases: Designed to handle extreme values and skewed distributions effectively.
Efficient Binary Search: Used for bin assignments during pre-binning.
Early Convergence Detection: Stops iterations when IV stabilizes within the threshold.
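The monotonicity-enforcement phase can be sketched as a pool-adjacent-violators-style pass that merges any adjacent pair of bins whose WoE ordering breaks the trend (again a Python illustration of assumed logic; the package selects merges by minimal IV loss):

```python
import math

def enforce_monotone_woe(pos, neg, increasing=True, eps=1e-12):
    """Merge adjacent bins until per-bin WoE is monotonic.

    pos/neg: per-bin positive/negative counts. Returns merged counts.
    Assumed sketch; the actual merge selection may differ.
    """
    pos, neg = list(pos), list(neg)

    def woe(p, n):
        tp, tn = sum(pos), sum(neg)
        return math.log(max(p / tp, eps) / max(n / tn, eps))

    i = 0
    while i < len(pos) - 1:
        w0, w1 = woe(pos[i], neg[i]), woe(pos[i + 1], neg[i + 1])
        violated = w0 > w1 if increasing else w0 < w1
        if violated:  # merge bin i+1 into bin i, recheck the previous pair
            pos[i] += pos.pop(i + 1)
            neg[i] += neg.pop(i + 1)
            i = max(i - 1, 0)
        else:
            i += 1
    return pos, neg
```

Merging a violating pair can itself create a new violation with the bin to its left, which is why the index steps back after each merge.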
Parameters:
min_bins: Minimum number of bins to be created (default: 3, must be >= 2).
max_bins: Maximum number of bins allowed (default: 5, must be >= min_bins).
bin_cutoff: Minimum relative frequency required for a bin to remain standalone (default: 0.05).
max_n_prebins: Maximum number of pre-bins created before optimization (default: 20).
convergence_threshold: Threshold for IV change to determine convergence (default: 1e-6).
max_iterations: Maximum number of optimization iterations (default: 1000).
References
Information Theory and Statistical Learning (Cover & Thomas, 2006)
Optimal Binning for Scoring Models (Mironchyk & Tchistiakov, 2017)
Monotonic Scoring and Binning (Beltrami & Bassani, 2021)
Examples
if (FALSE) { # \dontrun{
# Basic usage with default parameters
result <- optimal_binning_numerical_jedi(
  target = c(1, 0, 1, 0, 1),
  feature = c(1.2, 3.4, 2.1, 4.5, 2.8)
)

# Custom configuration for finer granularity
result <- optimal_binning_numerical_jedi(
  target = target_vector,
  feature = feature_vector,
  min_bins = 5,
  max_bins = 10,
  bin_cutoff = 0.03
)
} # }