Optimal Binning for Categorical Variables using Information Value Dynamic Programming
optimal_binning_categorical_ivb.Rd
Implements optimal binning for categorical variables using a dynamic programming approach to maximize Information Value (IV). The algorithm finds the globally optimal binning solution within the constraints of minimum and maximum bin counts.
Usage
optimal_binning_categorical_ivb(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
bin_separator = "%;%",
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the response variable.
- feature
Character vector or factor containing the categorical values of the explanatory variable.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum relative frequency for a category to remain a separate bin; rarer categories are merged (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20).
- bin_separator
Separator for merged category names (default: "%;%").
- convergence_threshold
Convergence threshold for IV (default: 1e-6).
- max_iterations
Maximum number of iterations in the search for the optimal solution (default: 1000).
Value
A list containing:
id: Numeric vector of bin identifiers.
bin: Character vector with the names of the formed bins.
woe: Numeric vector with the Weight of Evidence (WoE) of each bin.
iv: Numeric vector with the Information Value (IV) of each bin.
count: Numeric vector with the total number of observations in each bin.
count_pos: Numeric vector with the count of positive cases (target = 1) in each bin.
count_neg: Numeric vector with the count of negative cases (target = 0) in each bin.
total_iv: Total Information Value of the binning.
converged: Logical value indicating whether the algorithm converged.
iterations: Number of iterations performed.
Details
This implementation uses dynamic programming to find the optimal set of bins that maximizes the total Information Value. The algorithm guarantees a global optimum within the constraints of minimum and maximum bin counts.
The mathematical formulation of the dynamic programming algorithm is:
$$DP[i][k] = \max_{j<i} \{DP[j][k-1] + IV(j+1,i)\}$$
where:
\(DP[i][k]\) is the maximum IV achievable using k bins for the first i categories
\(IV(j+1,i)\) is the IV of a bin containing categories from index j+1 to i
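The recurrence can be sketched directly in R. The code below is illustrative only (bin_iv and dp_binning are not the package's internals) and assumes the categories arrive pre-sorted by event rate, with per-category positive and negative counts:
# Illustrative DP over n pre-sorted categories. bin_iv() transcribes the
# unsmoothed IV of one bin covering categories lo..hi; the package itself
# applies smoothing (see below).
bin_iv <- function(lo, hi, pos, neg, N_pos, N_neg) {
  p <- sum(pos[lo:hi]) / N_pos        # share of all positives in the bin
  q <- sum(neg[lo:hi]) / N_neg        # share of all negatives in the bin
  if (p == 0 || q == 0) return(0)     # guard against log(0) in this sketch
  (p - q) * log(p / q)
}
dp_binning <- function(pos, neg, max_bins) {
  n <- length(pos)
  stopifnot(n >= max_bins)            # sketch assumes enough categories
  N_pos <- sum(pos); N_neg <- sum(neg)
  DP <- matrix(-Inf, nrow = n, ncol = max_bins)
  for (i in seq_len(n))               # base case: a single bin over 1..i
    DP[i, 1] <- bin_iv(1, i, pos, neg, N_pos, N_neg)
  for (k in seq_len(max_bins)[-1]) {  # DP[i,k] = max_j DP[j,k-1] + IV(j+1,i)
    for (i in k:n) {
      for (j in (k - 1):(i - 1)) {
        cand <- DP[j, k - 1] + bin_iv(j + 1, i, pos, neg, N_pos, N_neg)
        if (cand > DP[i, k]) DP[i, k] <- cand
      }
    }
  }
  DP                                  # DP[n, k]: best total IV with k bins
}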
The Weight of Evidence (WoE) for a bin is defined as:
$$WoE_i = \ln\left(\frac{n^+_i/N^+}{n^-_i/N^-}\right)$$
where:
\(n^+_i\) is the number of positive cases in bin i
\(n^-_i\) is the number of negative cases in bin i
\(N^+\) is the total number of positive cases
\(N^-\) is the total number of negative cases
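As a direct transcription of this formula (without the smoothing discussed below), WoE can be computed from raw bin counts:
# Unsmoothed WoE per bin, computed straight from the definition above
woe <- function(count_pos, count_neg) {
  log((count_pos / sum(count_pos)) / (count_neg / sum(count_neg)))
}
woe(count_pos = c(30, 50, 20), count_neg = c(300, 400, 200))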
The Information Value (IV) is:
$$IV = \sum_{i=1}^{n} (p_i - q_i) \times WoE_i$$
where:
\(p_i = n^+_i/N^+\) is the proportion of positive cases in bin i
\(q_i = n^-_i/N^-\) is the proportion of negative cases in bin i
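Continuing the toy counts above, each bin's IV contribution and the total follow directly:
# Per-bin IV terms (p_i - q_i) * WoE_i and their sum
iv_terms <- function(count_pos, count_neg) {
  p <- count_pos / sum(count_pos)
  q <- count_neg / sum(count_neg)
  (p - q) * log(p / q)
}
sum(iv_terms(c(30, 50, 20), c(300, 400, 200)))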
This algorithm employs Bayesian smoothing for improved stability with small sample sizes or rare categories. The smoothing applies pseudo-counts based on the overall class prevalence.
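The exact pseudo-count scheme is internal to the package; one common variant consistent with this description adds alpha pseudo-observations per bin, split according to the overall prevalence:
# Prevalence-based pseudo-counts (illustrative scheme; the package's exact
# smoothing constants may differ)
smoothed_woe <- function(count_pos, count_neg, alpha = 0.5) {
  prev <- sum(count_pos) / (sum(count_pos) + sum(count_neg))
  sp <- count_pos + alpha * prev        # smoothed positive counts
  sn <- count_neg + alpha * (1 - prev)  # smoothed negative counts
  log((sp / sum(sp)) / (sn / sum(sn)))
}
smoothed_woe(c(0, 50, 20), c(300, 400, 200))  # finite even with a zero count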
The algorithm includes these main steps:
Preprocess data and calculate category statistics
Merge rare categories based on frequency threshold
Sort categories by event rate for monotonicity
Run dynamic programming to find optimal bin boundaries
Apply post-processing to ensure monotonicity of WoE
Calculate final WoE and IV values for each bin
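The first three steps can be sketched as follows (prebin and RARE_POOL are illustrative names, not the package's internals):
# Per-category stats, pooling of rare categories, and event-rate sorting
prebin <- function(target, feature, bin_cutoff = 0.05) {
  stats <- aggregate(target, by = list(category = feature),
                     FUN = function(y) c(n = length(y), pos = sum(y)))
  n   <- stats$x[, "n"]
  pos <- stats$x[, "pos"]
  bin <- as.character(stats$category)
  bin[n / sum(n) < bin_cutoff] <- "RARE_POOL"   # merge rare categories
  out <- aggregate(cbind(n, pos), by = list(bin = bin), FUN = sum)
  out$event_rate <- out$pos / out$n
  out[order(out$event_rate), ]                  # sorted for contiguous bins
}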
Advantages over greedy approaches:
Guaranteed global optimum (within bin count constraints)
Better handling of complex patterns in the data
More stable results with small sample sizes
References
Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.
Siddiqi, N. (2006). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.
Lin, X., Wang, G., & Zhang, T. (2022). Efficient monotonic binning for predictive modeling in high-dimensional spaces. Knowledge-Based Systems, 235, 107629.
Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53(284), 789-798.
Bellman, R. (1957). Dynamic Programming. Princeton University Press.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
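Examples
A minimal call on simulated data (illustrative values; the returned fields follow the Value section above):
# 1000 observations over five categories with distinct event rates
set.seed(123)
feature <- sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)
rates   <- c(A = 0.05, B = 0.10, C = 0.20, D = 0.35, E = 0.50)
target  <- rbinom(1000, 1, rates[feature])
result <- optimal_binning_categorical_ivb(target, feature,
                                          min_bins = 2L, max_bins = 4L)
result$bin       # merged category names, joined by "%;%"
result$woe       # Weight of Evidence per bin
result$total_iv  # total Information Value of the binning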