Optimal Binning for Categorical Variables using MILP
optimal_binning_categorical_milp.Rd
Performs optimal binning for categorical variables using an approach inspired by Mixed Integer Linear Programming (MILP), with enhanced statistical robustness. It creates optimal bins for a categorical feature based on its relationship with a binary target variable, maximizing predictive power while respecting user-defined constraints. The implementation includes Bayesian smoothing, which improves stability on small samples, and similarity-based merging strategies.
Usage
optimal_binning_categorical_milp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
bin_separator = "%;%",
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
An integer vector of binary target values (0 or 1).
- feature
A character vector of feature values.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of total observations for a bin to avoid being merged (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before the optimization process (default: 20).
- bin_separator
Separator used to join categories within a bin (default: "%;%").
- convergence_threshold
Threshold for convergence of total Information Value (default: 1e-6).
- max_iterations
Maximum number of iterations for the optimization process (default: 1000).
Value
A list containing the following elements:
id: Numeric vector of bin identifiers.
bin: Character vector of bin categories.
woe: Numeric vector of Weight of Evidence (WoE) values for each bin.
iv: Numeric vector of Information Value (IV) for each bin.
count: Integer vector of total observations in each bin.
count_pos: Integer vector of positive target observations in each bin.
count_neg: Integer vector of negative target observations in each bin.
total_iv: Total Information Value of the binning.
converged: Logical indicating whether the algorithm converged.
iterations: Integer indicating the number of iterations run.
Details
This enhanced version of the Optimal Binning algorithm for categorical variables implements several key improvements over traditional approaches:
Mathematical Framework:
The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:
$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$
where:
\(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of events in bin i
\(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of non-events in bin i
\(\pi = \frac{N^+}{N^+ + N^-}\) is the overall event rate
\(\alpha\) is the prior strength parameter (default: 0.5)
\(n_i^+\) is the count of events in bin i
\(n_i^-\) is the count of non-events in bin i
\(N^+\) is the total number of events
\(N^-\) is the total number of non-events
The Information Value (IV) for each bin is calculated as:
$$IV_i = (p_i^* - q_i^*) \times WoE_i$$
The total IV is:
$$IV_{total} = \sum_{i=1}^{k} |IV_i|$$
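As a numeric check of the formulas above, the following sketch (in Python for illustration only; it is not the package's implementation) computes the Bayesian-smoothed WoE and IV for two toy bins:

```python
import math

def smoothed_woe_iv(pos, neg, total_pos, total_neg, alpha=0.5):
    """WoE and IV for one bin with Bayesian smoothing (prior strength alpha)."""
    pi = total_pos / (total_pos + total_neg)            # overall event rate
    p = (pos + alpha * pi) / (total_pos + alpha)        # smoothed event proportion
    q = (neg + alpha * (1 - pi)) / (total_neg + alpha)  # smoothed non-event proportion
    woe = math.log(p / q)
    return woe, (p - q) * woe

# Two toy bins as (events, non-events); totals: 60 events, 140 non-events
bins = [(40, 60), (20, 80)]
N_pos, N_neg = 60, 140
total_iv = sum(abs(smoothed_woe_iv(p, n, N_pos, N_neg)[1]) for p, n in bins)
```

A bin with an event rate above the overall rate gets a positive WoE, one below gets a negative WoE, and the smoothing terms keep both proportions strictly positive even when a bin has zero events or zero non-events.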
Algorithm Phases:
1. Initialization: Create bins for each unique category with comprehensive statistics.
2. Pre-binning: Reduce to max_n_prebins by merging similar bins based on event rates.
3. Rare Category Merging: Combine categories with frequency below bin_cutoff using a similarity-based strategy.
4. Monotonicity Enforcement: Ensure a monotonic relationship in WoE across bins using adaptive thresholds.
5. Bin Optimization: Iteratively merge bins to maximize IV while respecting constraints.
6. Solution Tracking: Maintain the best solution found during optimization.
Key Enhancements:
Bayesian smoothing for robust estimation of WoE with small samples
Similarity-based bin merging rather than just adjacent bins
Adaptive monotonicity enforcement with violation severity prioritization
Best solution tracking to ensure optimal results
Comprehensive handling of edge cases and rare categories
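Similarity-based merging means any pair of bins with close event rates may be combined, not only adjacent ones. A hypothetical sketch of that strategy (Python for illustration; the package's actual merging also tracks IV and the other constraints):

```python
def merge_similar_bins(bins, max_bins, sep="%;%"):
    """bins: list of (categories, pos, neg). Merge closest-rate pairs
    until at most max_bins remain."""
    bins = [list(b) for b in bins]
    rate = lambda b: b[1] / (b[1] + b[2])  # bin event rate
    while len(bins) > max_bins:
        # find the pair (any pair, not just adjacent) with closest event rates
        i, j = min(
            ((a, b) for a in range(len(bins)) for b in range(a + 1, len(bins))),
            key=lambda ij: abs(rate(bins[ij[0]]) - rate(bins[ij[1]])),
        )
        merged = [bins[i][0] + sep + bins[j][0],   # join labels with bin_separator
                  bins[i][1] + bins[j][1],
                  bins[i][2] + bins[j][2]]
        bins = [b for k, b in enumerate(bins) if k not in (i, j)] + [merged]
    return bins

out = merge_similar_bins([("A", 10, 90), ("B", 12, 88), ("C", 50, 50)], max_bins=2)
```

Here "A" and "B" (event rates 0.10 and 0.12) are merged first, leaving "C" (rate 0.50) intact, which preserves more signal than merging by adjacency alone.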
References
Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., & Mahajan, A. (2013). Mixed-integer nonlinear optimization. Acta Numerica, 22, 1-131.
Mironchyk, P., & Tchistiakov, V. (2017). Monotone optimal binning algorithm for credit risk modeling. SSRN Electronic Journal. doi:10.2139/ssrn.2978774
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.
Examples
if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- sample(LETTERS[1:10], n, replace = TRUE)
# Run optimal binning
result <- optimal_binning_categorical_milp(target, feature, min_bins = 2, max_bins = 4)
# Print results
print(result)
# Handle rare categories with lower threshold
result2 <- optimal_binning_categorical_milp(
target, feature,
bin_cutoff = 0.02,
min_bins = 2,
max_bins = 5
)
} # }