Optimal Binning for Numerical Variables using Dynamic Programming
optimal_binning_numerical_dp.Rd
Performs optimal binning for numerical variables using a Dynamic Programming approach. It creates optimal bins for a numerical feature based on its relationship with a binary target variable, maximizing predictive power while respecting user-defined constraints and, optionally, enforcing monotonicity.
Usage
optimal_binning_numerical_dp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
monotonic_trend = "auto"
)
Arguments
- target
An integer vector of binary target values (0 or 1).
- feature
A numeric vector of feature values.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of total observations for a bin to avoid being merged (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before the optimization process (default: 20).
- convergence_threshold
Convergence threshold for the algorithm (default: 1e-6).
- max_iterations
Maximum number of iterations allowed (default: 1000).
- monotonic_trend
Monotonicity direction. One of 'auto', 'ascending', 'descending', or 'none' (default: 'auto').
Value
A list containing the following elements:
- id
Numeric vector of bin identifiers (1 to n).
- bin
Character vector of bin ranges.
- woe
Numeric vector of Weight of Evidence (WoE) values for each bin.
- iv
Numeric vector of Information Value (IV) for each bin.
- count
Numeric vector of total observations in each bin.
- count_pos
Numeric vector of positive target observations in each bin.
- count_neg
Numeric vector of negative target observations in each bin.
- event_rate
Numeric vector of event rates (proportion of positive events) in each bin.
- cutpoints
Numeric vector of cut points to generate the bins.
- total_iv
Total Information Value across all bins.
- converged
Logical indicating if the algorithm converged.
- iterations
Integer number of iterations run by the algorithm.
- execution_time_ms
Execution time in milliseconds.
- monotonic_trend
The monotonic trend used ('auto', 'ascending', 'descending', 'none').
Details
The Dynamic Programming algorithm for numerical variables works as follows:
1. Create initial pre-bins based on equal-frequency binning of the feature distribution.
2. Calculate bin statistics: counts, event rates, WoE, and IV.
3. If monotonicity is required, determine the appropriate trend:
   - In 'auto' mode: calculate the correlation between feature and target to choose the direction.
   - In 'ascending'/'descending' mode: use the specified direction.
4. Enforce monotonicity by merging adjacent bins that violate the monotonic trend.
5. Ensure bin constraints are met:
   - If exceeding max_bins: merge the adjacent bins with the smallest WoE difference.
   - Handle rare bins: merge bins with fewer than bin_cutoff proportion of observations.
6. Calculate final statistics for the optimized bins.
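The merge rule used when the bin count exceeds max_bins can be sketched as follows. This is an illustrative implementation only, not the package's internal code; the function name `merge_smallest_woe_gap` is hypothetical, and the counts are taken from the example output on this page.

```r
# Illustrative sketch: merge the pair of adjacent bins whose WoE values are
# closest, reducing the bin count by one (the max_bins enforcement step above).
merge_smallest_woe_gap <- function(pos, neg) {
  # WoE per bin: log of (share of events / share of non-events)
  woe <- log((pos / sum(pos)) / (neg / sum(neg)))
  i <- which.min(abs(diff(woe)))  # adjacent pair with the smallest WoE gap
  pos[i] <- pos[i] + pos[i + 1]; pos <- pos[-(i + 1)]
  neg[i] <- neg[i] + neg[i + 1]; neg <- neg[-(i + 1)]
  list(count_pos = pos, count_neg = neg)
}

merged <- merge_smallest_woe_gap(c(27, 225, 224, 18), c(23, 225, 226, 32))
length(merged$count_pos)  # one fewer bin (3) than the input
```

With these counts, bins 2 and 3 have nearly identical WoE, so they are the pair merged.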
The Weight of Evidence (WoE) measures the predictive power of each bin and is calculated as:
$$WoE = \ln\left(\frac{\text{Distribution of Events}}{\text{Distribution of Non-Events}}\right)$$
The Information Value (IV) for each bin is calculated as:
$$IV = (\text{Distribution of Events} - \text{Distribution of Non-Events}) \times WoE$$
The total IV is the sum of bin IVs and measures the overall predictive power of the feature.
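The two formulas above can be reproduced directly in base R from bin counts. This is an illustrative computation, not the package's code; the counts come from the example output on this page, and the results match its woe and total_iv fields.

```r
# WoE and IV computed from bin counts, following the formulas above.
count_pos <- c(27, 225, 224, 18)   # events (target = 1) per bin
count_neg <- c(23, 225, 226, 32)   # non-events (target = 0) per bin

dist_pos <- count_pos / sum(count_pos)  # distribution of events
dist_neg <- count_neg / sum(count_neg)  # distribution of non-events

woe <- log(dist_pos / dist_neg)
iv  <- (dist_pos - dist_neg) * woe
total_iv <- sum(iv)

round(woe, 4)       #  0.1843  0.0240  0.0151 -0.5514
round(total_iv, 5)  #  0.01684
```

By the conventional rule of thumb (Siddiqi, 2017), a total IV below 0.02 indicates a weakly predictive feature, which is expected here since the example target is random.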
This implementation is based on the methodology described in:
Navas-Palencia, G. (2022). "OptBinning: Mathematical Optimization for Optimal Binning". Journal of Open Source Software, 7(74), 4101.
Siddiqi, N. (2017). "Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards". John Wiley & Sons, 2nd Edition.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2017). "Credit Scoring and Its Applications". SIAM, 2nd Edition.
Kotsiantis, S.B., & Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Monotonicity constraints are particularly important in credit scoring and risk modeling applications, as they ensure that the model behaves in an intuitive and explainable way.
Examples
# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- rnorm(n)
# Run optimal binning
result <- optimal_binning_numerical_dp(target, feature, min_bins = 2, max_bins = 4)
# Print results
print(result)
#> $id
#> [1] 1 2 3 4
#>
#> $bin
#> [1] "(-Inf;-1.691862]" "(-1.691862;0.031526]" "(0.031526;1.651915]"
#> [4] "(1.651915;+Inf]"
#>
#> $woe
#> [1] 0.18434380 0.02400115 0.01511220 -0.55136299
#>
#> $iv
#> [1] 0.0016962072 0.0002592498 0.0001027778 0.0147786563
#>
#> $count
#> [1] 50 450 450 50
#>
#> $count_pos
#> [1] 27 225 224 18
#>
#> $count_neg
#> [1] 23 225 226 32
#>
#> $event_rate
#> [1] 0.5400000 0.5000000 0.4977778 0.3600000
#>
#> $cutpoints
#> [1] -1.691862 0.031526 1.651915
#>
#> $total_iv
#> [1] 0.01683689
#>
#> $converged
#> [1] TRUE
#>
#> $iterations
#> [1] 17
#>
#> $execution_time_ms
#> [1] 0
#>
#> $monotonic_trend
#> [1] "descending"
#>