
Optimal Binning for Categorical Variables using Information Value Dynamic Programming
Source: R/obc_ivb.R

Performs supervised discretization of categorical variables using a dynamic programming algorithm designed to maximize total Information Value (IV). The implementation employs Bayesian smoothing for numerical stability, enforces a monotonic Weight of Evidence (WoE) constraint, and uses caching strategies to remain efficient on high-cardinality features.
Usage
ob_categorical_ivb(
  feature,
  target,
  min_bins = 3,
  max_bins = 5,
  bin_cutoff = 0.05,
  max_n_prebins = 20,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000
)

Arguments
- feature
A character vector or factor representing the categorical predictor variable to be binned. Missing values are automatically converted to the category "NA".
- target
An integer vector of binary outcomes (0/1) corresponding to each observation in feature. Missing values are not permitted.
- min_bins
Integer. Minimum number of bins to produce. Must be >= 2. The algorithm searches for solutions within [min_bins, max_bins] that maximize total IV. Defaults to 3.
- max_bins
Integer. Maximum number of bins to produce. Must be >= min_bins. Defines the upper bound of the search space. Defaults to 5.
- bin_cutoff
Numeric. Minimum proportion of total observations required for a category to remain separate. Categories below this threshold are pre-merged before the optimization phase. Must be in (0, 1). Defaults to 0.05.
- max_n_prebins
Integer. Maximum number of initial bins before dynamic programming optimization. Controls computational complexity for high-cardinality features. Must be >= 2. Defaults to 20.
- bin_separator
Character string used to concatenate category names when multiple categories are merged into a single bin. Defaults to "%;%"; see the usage sketch after this list for a non-default choice.
- convergence_threshold
Numeric. Convergence tolerance for the iterative optimization process, based on the change in IV. The algorithm stops when \(|\Delta IV| <\) convergence_threshold. Must be > 0. Defaults to 1e-6.
- max_iterations
Integer. Maximum number of optimization iterations. Prevents excessive computation. Must be > 0. Defaults to 1000.
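A usage sketch combining non-default values of these arguments (industry and default_flag are hypothetical placeholders for your own data vectors):

fit <- ob_categorical_ivb(
  feature = industry,      # categorical predictor (character or factor)
  target  = default_flag,  # binary 0/1 outcome
  min_bins = 2, max_bins = 6,
  bin_cutoff = 0.02,       # pre-merge categories below 2% of observations
  bin_separator = " | "    # merged bins then read as e.g. "Retail | Energy"
)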
Value
A list containing the binning results with the following components:
- id: Integer vector of bin identifiers (1-indexed)
- bin: Character vector of bin labels (merged category names)
- woe: Numeric vector of Weight of Evidence values per bin
- iv: Numeric vector of Information Value contributions per bin
- count: Integer vector of total observations per bin
- count_pos: Integer vector of positive cases (target = 1) per bin
- count_neg: Integer vector of negative cases (target = 0) per bin
- total_iv: Numeric total Information Value of the binning solution
- converged: Logical indicating algorithm convergence
- iterations: Integer count of optimization iterations performed
Details
The Information Value Binning (IVB) algorithm uses dynamic programming to find the globally optimal binning solution that maximizes total IV subject to constraints on bin count and monotonicity.
Algorithm Workflow:
1. Input validation and preprocessing
2. Single-pass category counting and statistics computation
3. Rare category pre-merging (frequencies < bin_cutoff)
4. Pre-bin limitation (if the number of categories exceeds max_n_prebins)
5. Category sorting by event rate
6. Cumulative statistics cache initialization
7. Dynamic programming table computation:
   - State: \(DP[i][k]\) = maximum IV using the first \(i\) categories in \(k\) bins
   - Transition: \(DP[i][k] = \max_j \{DP[j][k-1] + IV(j+1, i)\}\)
   - Banded optimization to skip infeasible splits
8. Backtracking to reconstruct the optimal bins
9. Adaptive monotonicity enforcement
10. Final metric computation with Bayesian smoothing
Dynamic Programming Formulation:
Let \(DP[i][k]\) represent the maximum total IV achievable using the first \(i\) categories (sorted by event rate) partitioned into \(k\) bins.
Recurrence relation: $$DP[i][k] = \max_{k-1 \leq j < i} \{DP[j][k-1] + IV(j+1, i)\}$$
Base case: $$DP[i][1] = IV(1, i) \quad \forall i$$
where \(IV(j+1, i)\) is the Information Value of a bin containing categories from \(j+1\) to \(i\).
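A minimal R sketch of this recurrence follows. It uses plain (unsmoothed) IV and omits the banding, memoization, and backtracking of the actual implementation; dp_max_iv and bin_iv are illustrative names, not package internals. Note how the cumulative sums make each \(IV(j+1, i)\) evaluation \(O(1)\), as claimed under Computational Complexity below.

dp_max_iv <- function(pos, neg, k_bins) {
  stopifnot(k_bins >= 2, length(pos) >= k_bins)
  n <- length(pos)
  cpos <- c(0, cumsum(pos))  # cumulative positive counts: cpos[i + 1] = sum over 1..i
  cneg <- c(0, cumsum(neg))
  # IV of a bin covering categories (j+1)..i, computed in O(1)
  bin_iv <- function(j, i) {
    p <- (cpos[i + 1] - cpos[j + 1]) / cpos[n + 1]
    q <- (cneg[i + 1] - cneg[j + 1]) / cneg[n + 1]
    if (p <= 0 || q <= 0) return(-Inf)  # without smoothing, a one-sided bin is infeasible
    (p - q) * log(p / q)
  }
  DP <- matrix(-Inf, nrow = n, ncol = k_bins)
  DP[, 1] <- vapply(seq_len(n), function(i) bin_iv(0, i), numeric(1))  # base case
  for (k in 2:k_bins) {
    for (i in k:n) {                # need at least k categories for k bins
      for (j in (k - 1):(i - 1)) {  # last bin spans (j+1)..i
        DP[i, k] <- max(DP[i, k], DP[j, k - 1] + bin_iv(j, i))
      }
    }
  }
  DP[n, k_bins]  # maximum total IV using exactly k_bins bins
}

For example, dp_max_iv(pos = c(2, 5, 9, 14, 20), neg = c(98, 95, 91, 86, 80), k_bins = 3) returns the best unsmoothed IV achievable with three bins over five event-rate-sorted categories.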
Bayesian Smoothing:
To prevent numerical instability and overfitting with sparse bins, WoE and IV are calculated using Bayesian smoothing with pseudocounts:
$$p'_i = \frac{n_{i,pos} + \alpha_p}{N_{pos} + \alpha_{total}}$$ $$n'_i = \frac{n_{i,neg} + \alpha_n}{N_{neg} + \alpha_{total}}$$
where \(\alpha_p\) and \(\alpha_n\) are prior pseudocounts proportional to the overall event rate, and \(\alpha_{total} = 1.0\) (prior strength).
$$WoE_i = \ln\left(\frac{p'_i}{n'_i}\right)$$ $$IV_i = (p'_i - n'_i) \times WoE_i$$
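These formulas can be reproduced directly. In the sketch below, splitting the total prior strength \(\alpha_{total}\) between the classes in proportion to the overall event rate is an assumption consistent with the description above, and smoothed_woe_iv is an illustrative name:

smoothed_woe_iv <- function(n_pos, n_neg, alpha_total = 1.0) {
  N_pos <- sum(n_pos)
  N_neg <- sum(n_neg)
  rate <- N_pos / (N_pos + N_neg)
  alpha_p <- alpha_total * rate        # event-class pseudocount (assumed split)
  alpha_n <- alpha_total * (1 - rate)  # non-event-class pseudocount (assumed split)
  p <- (n_pos + alpha_p) / (N_pos + alpha_total)  # smoothed share of events
  q <- (n_neg + alpha_n) / (N_neg + alpha_total)  # smoothed share of non-events
  woe <- log(p / q)
  data.frame(woe = woe, iv = (p - q) * woe)
}

# Even a bin with zero events now yields finite WoE and IV:
smoothed_woe_iv(n_pos = c(0, 40, 120), n_neg = c(300, 400, 350))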
Adaptive Monotonicity Enforcement:
After finding the optimal bins, the algorithm enforces WoE monotonicity by:
1. Computing the average WoE gap: \(\bar{\Delta} = \frac{1}{k-1}\sum_{i=1}^{k-1}|WoE_{i+1} - WoE_i|\)
2. Setting an adaptive threshold: \(\tau = \min(\epsilon, 0.01\bar{\Delta})\)
3. Identifying the worst violation: \(i^* = \arg\max_i \{WoE_i - WoE_{i+1}\}\)
4. Evaluating forward and backward merges by IV retention
5. Selecting the merge direction that maximizes total IV (see the sketch below)
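A sketch of the violation-detection step (items 1-3 above), assuming WoE should be non-decreasing; worst_violation is an illustrative name, not a package internal:

worst_violation <- function(woe, eps = 1e-6) {
  gaps <- abs(diff(woe))                  # |WoE_{i+1} - WoE_i|
  tau <- min(eps, 0.01 * mean(gaps))      # adaptive threshold
  drops <- head(woe, -1) - tail(woe, -1)  # WoE_i - WoE_{i+1}
  i_star <- which.max(drops)
  if (drops[i_star] > tau) i_star else NA_integer_  # NA: already monotone
}

worst_violation(c(-1.7, -0.5, 0.6, 0.4, 0.9))  # flags the drop between bins 3 and 4

Per steps 4-5, the implementation then compares the candidate forward and backward merges around position \(i^*\) and keeps whichever retains more total IV.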
Computational Complexity:
- Time: \(O(k^2 \cdot n)\), where \(k\) = max_bins and \(n\) = number of categories
- Space: \(O(k \cdot n)\) for the DP tables and cumulative caches
- Per-bin IV calculations are \(O(1)\) due to cumulative statistics caching
Advantages over Alternative Methods:
- Global optimality: guaranteed maximum IV (within the constraint space)
- Bayesian regularization: robust to sparse bins and class imbalance
- Efficient caching: cumulative statistics and IV memoization
- Banded optimization: reduced search space via feasibility pruning
- Adaptive monotonicity: context-aware threshold for enforcement
Comparison with Related Methods:
- vs DP (general): IVB optimizes IV specifically; the general DP supports other criteria
- vs GMB: IVB guarantees optimality; GMB is faster but approximate
- vs ChiMerge: IVB uses an IV criterion; ChiMerge uses the chi-square statistic
References
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulation and solution approach. Expert Systems with Applications, 158, 113508. doi:10.1016/j.eswa.2020.113508
Bellman, R. (1957). Dynamic Programming. Princeton University Press.
Siddiqi, N. (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards (2nd ed.). Wiley.
Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press.
Anderson, R. (2007). The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford University Press.
See also
ob_categorical_dp for general dynamic programming binning,
ob_categorical_gmb for greedy merge approximation,
ob_categorical_cm for ChiMerge-based binning
Examples
# \donttest{
# Example 1: Basic IV optimization with Bayesian smoothing
set.seed(42)
n_obs <- 1200
# Simulate industry sectors with varying default risk
industries <- c(
  "Technology", "Healthcare", "Finance", "Manufacturing",
  "Retail", "Energy"
)
default_rates <- c(0.03, 0.05, 0.08, 0.12, 0.18, 0.25)
cat_feature <- sample(industries, n_obs,
  replace = TRUE,
  prob = c(0.20, 0.18, 0.22, 0.18, 0.12, 0.10)
)
bin_target <- sapply(cat_feature, function(x) {
  rbinom(1, 1, default_rates[which(industries == x)])
})
# Apply IVB optimization
result_ivb <- ob_categorical_ivb(
  cat_feature,
  bin_target,
  min_bins = 3,
  max_bins = 4
)
# Display results
print(data.frame(
  Bin = result_ivb$bin,
  WoE = round(result_ivb$woe, 3),
  IV = round(result_ivb$iv, 4),
  Count = result_ivb$count,
  EventRate = round(result_ivb$count_pos / result_ivb$count, 3)
))
#> Bin WoE IV Count EventRate
#> 1 Technology -1.723 0.3003 231 0.022
#> 2 Finance%;%Healthcare -0.542 0.0965 487 0.068
#> 3 Manufacturing 0.397 0.0342 223 0.157
#> 4 Retail%;%Energy 0.879 0.2311 259 0.232
cat("\nTotal IV (maximized):", round(result_ivb$total_iv, 4), "\n")
#>
#> Total IV (maximized): 0.6621
cat("Converged:", result_ivb$converged, "\n")
#> Converged: TRUE
cat("Iterations:", result_ivb$iterations, "\n")
#> Iterations: 1
# Example 2: Comparing IV optimization with other methods
set.seed(123)
n_obs_comp <- 1500
regions <- c("North", "South", "East", "West", "Central")
cat_feature_comp <- sample(regions, n_obs_comp, replace = TRUE)
bin_target_comp <- rbinom(n_obs_comp, 1, 0.15)
# IVB (IV-optimized)
result_ivb_comp <- ob_categorical_ivb(
  cat_feature_comp, bin_target_comp,
  min_bins = 2, max_bins = 3
)
# GMB (greedy approximation)
result_gmb_comp <- ob_categorical_gmb(
  cat_feature_comp, bin_target_comp,
  min_bins = 2, max_bins = 3
)
# DP (general optimization)
result_dp_comp <- ob_categorical_dp(
  cat_feature_comp, bin_target_comp,
  min_bins = 2, max_bins = 3
)
cat("\nMethod comparison:\n")
#>
#> Method comparison:
cat(" IVB total IV:", round(result_ivb_comp$total_iv, 4), "\n")
#> IVB total IV: 0.053
cat(" GMB total IV:", round(result_gmb_comp$total_iv, 4), "\n")
#> GMB total IV: 0.053
cat(" DP total IV:", round(result_dp_comp$total_iv, 4), "\n")
#> DP total IV: 0.0531
cat("\nIVB typically achieves highest IV due to explicit optimization\n")
#>
#> IVB typically achieves highest IV due to explicit optimization
# }