
Performs optimal binning for categorical variables using an approach inspired by Mixed Integer Linear Programming (MILP), with enhanced statistical robustness. It creates optimal bins for a categorical feature based on its relationship with a binary target variable, maximizing predictive power while respecting user-defined constraints. This implementation includes Bayesian smoothing for improved stability on small samples and similarity-based merging strategies.

Usage

optimal_binning_categorical_milp(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A character vector of feature values.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum proportion of total observations for a bin to avoid being merged (default: 0.05).

max_n_prebins

Maximum number of pre-bins before the optimization process (default: 20).

bin_separator

Separator used to join categories within a bin (default: "%;%").

convergence_threshold

Threshold for convergence of total Information Value (default: 1e-6).

max_iterations

Maximum number of iterations for the optimization process (default: 1000).

Value

A list containing the following elements (see the example after the list):

  • id: Numeric vector of bin identifiers.

  • bin: Character vector of bin categories.

  • woe: Numeric vector of Weight of Evidence (WoE) values for each bin.

  • iv: Numeric vector of Information Value (IV) for each bin.

  • count: Integer vector of total observations in each bin.

  • count_pos: Integer vector of positive target observations in each bin.

  • count_neg: Integer vector of negative target observations in each bin.

  • total_iv: Total Information Value of the binning.

  • converged: Logical indicating whether the algorithm converged.

  • iterations: Integer indicating the number of iterations run.
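
Because the per-bin vectors are parallel, the result is easy to assemble into a data frame for inspection. A minimal sketch, assuming result holds the return value of a call like the ones under Examples:

bins_df <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)

# Recover the original categories grouped into each bin
# (split on the default bin_separator)
strsplit(bins_df$bin, "%;%", fixed = TRUE)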

Details

This enhanced version of the Optimal Binning algorithm for categorical variables implements several key improvements over traditional approaches:

Mathematical Framework:

The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:

$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$

where:

  • \(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of events in bin i

  • \(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of non-events in bin i

  • \(\pi = \frac{N^+}{N^+ + N^-}\) is the overall event rate

  • \(\alpha\) is the prior strength parameter (default: 0.5)

  • \(n_i^+\) is the count of events in bin i

  • \(n_i^-\) is the count of non-events in bin i

  • \(N^+\) is the total number of events

  • \(N^-\) is the total number of non-events

The Information Value (IV) for each bin is calculated as:

$$IV_i = (p_i^* - q_i^*) \times WoE_i$$

The total IV is:

$$IV_{total} = \sum_{i=1}^{k} |IV_i|$$
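
As a worked illustration, these formulas translate directly into R. The counts below are made up, and the snippet sketches the arithmetic rather than the package's internal code; alpha uses the documented default of 0.5:

# Made-up per-bin counts for illustration
n_pos <- c(40, 25, 10)   # events per bin (n_i^+)
n_neg <- c(160, 75, 90)  # non-events per bin (n_i^-)
alpha <- 0.5             # prior strength

N_pos  <- sum(n_pos)
N_neg  <- sum(n_neg)
pi_hat <- N_pos / (N_pos + N_neg)                           # overall event rate

p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)        # smoothed event proportions
q_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)  # smoothed non-event proportions

woe      <- log(p_star / q_star)    # WoE_i
iv       <- (p_star - q_star) * woe # IV_i
total_iv <- sum(abs(iv))            # IV_total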

Algorithm Phases:

  1. Initialization: Create bins for each unique category with comprehensive statistics.

  2. Pre-binning: Reduce the number of bins to at most max_n_prebins by merging bins with similar event rates.

  3. Rare Category Merging: Combine categories whose frequency falls below bin_cutoff using a similarity-based strategy.

  4. Monotonicity Enforcement: Ensure a monotonic WoE relationship across bins using adaptive thresholds.

  5. Bin Optimization: Iteratively merge bins to maximize IV while respecting constraints (a simplified sketch of this merge loop follows the list).

  6. Solution Tracking: Maintain the best solution found during optimization.
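
The core of phases 5 and 6 can be pictured as a greedy merge loop. The sketch below is a deliberate simplification: the function merge_to_max_bins, its adjacent-only merge rule, and its two-column input are illustrative assumptions, not the package's internals, which also apply similarity-based merging, monotonicity checks, and convergence tests.

# Greedy adjacent-merge loop (illustrative simplification)
merge_to_max_bins <- function(bins, max_bins = 5L, alpha = 0.5) {
  # Total IV of a bin table, with the same Bayesian smoothing as above
  total_iv <- function(b) {
    pi_hat <- sum(b$n_pos) / (sum(b$n_pos) + sum(b$n_neg))
    p <- (b$n_pos + alpha * pi_hat) / (sum(b$n_pos) + alpha)
    q <- (b$n_neg + alpha * (1 - pi_hat)) / (sum(b$n_neg) + alpha)
    sum(abs((p - q) * log(p / q)))
  }
  while (nrow(bins) > max_bins) {
    best <- NULL
    for (i in seq_len(nrow(bins) - 1L)) {       # try each adjacent merge
      cand <- bins
      cand[i, ] <- cand[i, ] + cand[i + 1L, ]   # pool the two bins' counts
      cand <- cand[-(i + 1L), ]
      if (is.null(best) || total_iv(cand) > total_iv(best)) best <- cand
    }
    bins <- best                                # keep the merge losing least IV
  }
  bins
}

merge_to_max_bins(
  data.frame(n_pos = c(5, 40, 25, 10, 3),
             n_neg = c(45, 160, 75, 90, 37)),
  max_bins = 3L
)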

Key Enhancements:

  • Bayesian smoothing for robust estimation of WoE with small samples

  • Similarity-based bin merging rather than merging adjacent bins only (see the sketch after this list)

  • Adaptive monotonicity enforcement with violation severity prioritization

  • Best solution tracking to ensure optimal results

  • Comprehensive handling of edge cases and rare categories
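
As an example of the similarity criterion, bins can be compared by the distance between their event rates (the quantity used during pre-binning above). The helper most_similar_pair below is a hypothetical illustration, not an exported function:

# Find the pair of bins with the closest event rates (illustrative;
# the package's actual similarity measure may weigh more factors)
most_similar_pair <- function(n_pos, n_neg) {
  rate <- n_pos / (n_pos + n_neg)          # event rate per bin
  d <- abs(outer(rate, rate, "-"))         # pairwise rate distances
  d[lower.tri(d, diag = TRUE)] <- Inf      # consider each unordered pair once
  which(d == min(d), arr.ind = TRUE)[1, ]  # indices of the closest pair
}

most_similar_pair(n_pos = c(40, 25, 10), n_neg = c(160, 75, 90))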

References

  • Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., & Mahajan, A. (2013). Mixed-integer nonlinear optimization. Acta Numerica, 22, 1-131.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone optimal binning algorithm for credit risk modeling. SSRN Electronic Journal. doi:10.2139/ssrn.2978774

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

Examples

if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- sample(LETTERS[1:10], n, replace = TRUE)

# Run optimal binning
result <- optimal_binning_categorical_milp(target, feature, min_bins = 2, max_bins = 4)

# Print results
print(result)

# Handle rare categories with lower threshold
result2 <- optimal_binning_categorical_milp(
  target, feature, 
  bin_cutoff = 0.02,
  min_bins = 2, 
  max_bins = 5
)
} # }