Optimal Binning for Numerical Variables using Dynamic Programming
optimal_binning_numerical_dp.Rd
Performs optimal binning for numerical variables using a Dynamic Programming approach. It creates optimal bins for a numerical feature based on its relationship with a binary target variable, maximizing predictive power while respecting user-defined constraints and, optionally, enforcing monotonicity.
Usage
optimal_binning_numerical_dp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
monotonic_trend = "auto"
)
Arguments
- target
An integer vector of binary target values (0 or 1).
- feature
A numeric vector of feature values.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of total observations for a bin to avoid being merged (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before the optimization process (default: 20).
- convergence_threshold
Convergence threshold for the algorithm (default: 1e-6).
- max_iterations
Maximum number of iterations allowed (default: 1000).
- monotonic_trend
Monotonicity direction. One of 'auto', 'ascending', 'descending', or 'none' (default: 'auto').
Value
A list containing the following elements:
- id
Numeric vector of bin identifiers (1 to n).
- bin
Character vector of bin ranges.
- woe
Numeric vector of Weight of Evidence (WoE) values for each bin.
- iv
Numeric vector of Information Value (IV) for each bin.
- count
Numeric vector of total observations in each bin.
- count_pos
Numeric vector of positive target observations in each bin.
- count_neg
Numeric vector of negative target observations in each bin.
- event_rate
Numeric vector of event rates (proportion of positive events) in each bin.
- cutpoints
Numeric vector of cut points to generate the bins.
- total_iv
Total Information Value across all bins.
- converged
Logical indicating if the algorithm converged.
- iterations
Integer number of iterations run by the algorithm.
- execution_time_ms
Execution time in milliseconds.
- monotonic_trend
The monotonic trend used ('auto', 'ascending', 'descending', 'none').
Details
The Dynamic Programming algorithm for numerical variables works as follows:
1. Create initial pre-bins based on equal-frequency binning of the feature distribution.
2. Calculate bin statistics: counts, event rates, WoE, and IV.
3. If monotonicity is required, determine the appropriate trend:
   - In 'auto' mode: calculate the correlation between feature and target to choose the direction.
   - In 'ascending'/'descending' mode: use the specified direction.
4. Enforce monotonicity by merging adjacent bins that violate the monotonic trend.
5. Ensure bin constraints are met:
   - If exceeding max_bins: merge the adjacent bins with the smallest WoE difference.
   - Handle rare bins: merge bins with fewer than bin_cutoff proportion of observations.
6. Calculate final statistics for the optimized bins.
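The merge rule used when the bin count exceeds max_bins can be sketched as follows. This is an illustrative implementation only, not the package's internal code; the function name `merge_smallest_woe_gap` is hypothetical, and the counts are taken from the example output on this page.

```r
# Illustrative sketch: merge the pair of adjacent bins whose WoE values are
# closest, reducing the bin count by one (the max_bins enforcement step above).
merge_smallest_woe_gap <- function(pos, neg) {
  # WoE per bin: log of (share of events / share of non-events)
  woe <- log((pos / sum(pos)) / (neg / sum(neg)))
  i <- which.min(abs(diff(woe)))  # adjacent pair with the smallest WoE gap
  pos[i] <- pos[i] + pos[i + 1]; pos <- pos[-(i + 1)]
  neg[i] <- neg[i] + neg[i + 1]; neg <- neg[-(i + 1)]
  list(count_pos = pos, count_neg = neg)
}

merged <- merge_smallest_woe_gap(c(27, 225, 224, 18), c(23, 225, 226, 32))
length(merged$count_pos)  # one fewer bin (3) than the input
```

With these counts, bins 2 and 3 have nearly identical WoE, so they are the pair merged.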
The Weight of Evidence (WoE) measures the predictive power of each bin and is calculated as:
$$WoE = \ln\left(\frac{\text{Distribution of Events}}{\text{Distribution of Non-Events}}\right)$$
The Information Value (IV) for each bin is calculated as:
$$IV = (\text{Distribution of Events} - \text{Distribution of Non-Events}) \times WoE$$
The total IV is the sum of bin IVs and measures the overall predictive power of the feature.
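The two formulas above can be reproduced directly in base R from bin counts. This is an illustrative computation, not the package's code; the counts come from the example output on this page, and the results match its woe and total_iv fields.

```r
# WoE and IV computed from bin counts, following the formulas above.
count_pos <- c(27, 225, 224, 18)   # events (target = 1) per bin
count_neg <- c(23, 225, 226, 32)   # non-events (target = 0) per bin

dist_pos <- count_pos / sum(count_pos)  # distribution of events
dist_neg <- count_neg / sum(count_neg)  # distribution of non-events

woe <- log(dist_pos / dist_neg)
iv  <- (dist_pos - dist_neg) * woe
total_iv <- sum(iv)

round(woe, 4)       #  0.1843  0.0240  0.0151 -0.5514
round(total_iv, 5)  #  0.01684
```

By the conventional rule of thumb (Siddiqi, 2017), a total IV below 0.02 indicates a weakly predictive feature, which is expected here since the example target is random.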
This implementation is based on the methodology described in:
Navas-Palencia, G. (2022). "OptBinning: Mathematical Optimization for Optimal Binning". Journal of Open Source Software, 7(74), 4101.
Siddiqi, N. (2017). "Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards". John Wiley & Sons, 2nd Edition.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2017). "Credit Scoring and Its Applications". SIAM, 2nd Edition.
Kotsiantis, S.B., & Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Monotonicity constraints are particularly important in credit scoring and risk modeling applications, as they ensure that the model behaves in an intuitive and explainable way.
Examples
# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- rnorm(n)
# Run optimal binning
result <- optimal_binning_numerical_dp(target, feature, min_bins = 2, max_bins = 4)
# Print results
print(result)
#> $id
#> [1] 1 2 3 4
#>
#> $bin
#> [1] "(-Inf;-1.691862]" "(-1.691862;0.031526]" "(0.031526;1.651915]"
#> [4] "(1.651915;+Inf]"
#>
#> $woe
#> [1] 0.18434380 0.02400115 0.01511220 -0.55136299
#>
#> $iv
#> [1] 0.0016962072 0.0002592498 0.0001027778 0.0147786563
#>
#> $count
#> [1] 50 450 450 50
#>
#> $count_pos
#> [1] 27 225 224 18
#>
#> $count_neg
#> [1] 23 225 226 32
#>
#> $event_rate
#> [1] 0.5400000 0.5000000 0.4977778 0.3600000
#>
#> $cutpoints
#> [1] -1.691862 0.031526 1.651915
#>
#> $total_iv
#> [1] 0.01683689
#>
#> $converged
#> [1] TRUE
#>
#> $iterations
#> [1] 17
#>
#> $execution_time_ms
#> [1] 0
#>
#> $monotonic_trend
#> [1] "descending"
#>