Performs optimal binning for categorical variables using the Monotonic Optimal Binning (MOB) approach with enhanced statistical robustness. This implementation includes Bayesian smoothing for better stability with small samples, adaptive monotonicity enforcement, and sophisticated bin merging strategies.

Usage

optimal_binning_categorical_mob(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A character vector of categorical feature values.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum proportion of observations in a bin (default: 0.05).

max_n_prebins

Maximum number of pre-bins (default: 20).

bin_separator

Separator used when concatenating category names within a merged bin (default: "%;%"). For example, merging categories "A" and "B" produces the bin label "A%;%B".

convergence_threshold

Convergence threshold for the algorithm (default: 1e-6).

max_iterations

Maximum number of iterations for the algorithm (default: 1000).

Value

A list containing the following elements:

  • id: Numeric vector of bin identifiers.

  • bin: Character vector of bin names (merged categories).

  • woe: Numeric vector of Weight of Evidence (WoE) values for each bin.

  • iv: Numeric vector of Information Value (IV) for each bin.

  • count: Integer vector of total counts for each bin.

  • count_pos: Integer vector of positive target counts for each bin.

  • count_neg: Integer vector of negative target counts for each bin.

  • total_iv: Total Information Value of the binning.

  • converged: Logical value indicating whether the algorithm converged.

  • iterations: Integer value indicating the number of iterations run.
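
The bin-level vectors are parallel (element i of each describes bin i), so they can be assembled into a single summary table. A minimal sketch, assuming result holds the return value of a call like the one in the Examples section:

# Combine the parallel bin-level vectors into one summary data frame
bin_table <- data.frame(
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)

result$total_iv  # overall predictive strength of the binned variable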

Details

This enhanced version of the Monotonic Optimal Binning (MOB) algorithm implements several key improvements over traditional approaches:

Mathematical Framework:

The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:

$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$

where:

  • \(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of events in bin i

  • \(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of non-events in bin i

  • \(\pi = \frac{N^+}{N^+ + N^-}\) is the overall event rate

  • \(\alpha\) is the prior strength parameter (default: 0.5)

  • \(n_i^+\) is the count of events in bin i

  • \(n_i^-\) is the count of non-events in bin i

  • \(N^+\) is the total number of events

  • \(N^-\) is the total number of non-events

The Information Value (IV) for each bin is calculated as:

$$IV_i = (p_i^* - q_i^*) \times WoE_i$$
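
These two formulas can be reproduced directly in R. The sketch below is a standalone illustration, not the package's internal code; alpha = 0.5 mirrors the stated default prior strength:

# Smoothed WoE and IV for one bin, given bin counts and overall totals
smoothed_woe_iv <- function(n_pos, n_neg, N_pos, N_neg, alpha = 0.5) {
  pi_hat <- N_pos / (N_pos + N_neg)                           # overall event rate
  p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)        # smoothed event proportion
  q_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)  # smoothed non-event proportion
  woe <- log(p_star / q_star)
  c(woe = woe, iv = (p_star - q_star) * woe)
}

# Example: a bin with 30 events and 170 non-events, out of 400 events
# and 600 non-events overall
smoothed_woe_iv(n_pos = 30, n_neg = 170, N_pos = 400, N_neg = 600)

Because of the smoothing, WoE remains finite even when a bin contains zero events or zero non-events, which is what provides the stated stability on small samples.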

Algorithm Phases:

  1. Initialization: Calculate statistics for each category with Bayesian smoothing.

  2. Pre-binning: Create initial bins sorted by WoE.

  3. Rare Category Handling: Merge categories with frequency below bin_cutoff using a similarity-based approach.

  4. Monotonicity Enforcement: Ensure monotonic WoE across bins using adaptive thresholds and severity-based prioritization (see the sketch after this list).

  5. Bin Optimization: Reduce number of bins to max_bins while maintaining monotonicity.

  6. Solution Tracking: Maintain the best solution found during optimization.
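
To make phase 4 concrete, here is a deliberately simplified sketch of the core merge loop: find the first adjacent pair whose WoE decreases, merge that pair, and repeat. The real implementation adds adaptive thresholds and severity-based prioritization; the helper name and data layout here are illustrative only:

# Toy illustration of monotonicity enforcement by adjacent-bin merging.
# `bins` is a data.frame with columns bin, count_pos, count_neg, one row
# per bin, initially ordered by WoE.
enforce_monotone_woe <- function(bins, alpha = 0.5) {
  N_pos <- sum(bins$count_pos)
  N_neg <- sum(bins$count_neg)
  pi_hat <- N_pos / (N_pos + N_neg)
  woe_of <- function(pos, neg) {              # smoothed WoE, vectorized
    log(((pos + alpha * pi_hat) / (N_pos + alpha)) /
        ((neg + alpha * (1 - pi_hat)) / (N_neg + alpha)))
  }
  repeat {
    woe <- woe_of(bins$count_pos, bins$count_neg)
    i <- which(diff(woe) < 0)[1]              # first decreasing adjacent pair
    if (is.na(i) || nrow(bins) <= 2) break
    bins$count_pos[i] <- bins$count_pos[i] + bins$count_pos[i + 1]
    bins$count_neg[i] <- bins$count_neg[i] + bins$count_neg[i + 1]
    bins$bin[i] <- paste(bins$bin[i], bins$bin[i + 1], sep = "%;%")
    bins <- bins[-(i + 1), , drop = FALSE]    # drop the absorbed bin
  }
  bins
}

Each merge can create a new violation upstream of the merged pair, which is why the loop re-evaluates all WoE values after every merge rather than scanning once.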

Key Features:

  • Bayesian smoothing for robust WoE estimation with small samples

  • Similarity-based bin merging rather than just adjacent bins

  • Adaptive monotonicity enforcement with violation severity prioritization

  • Best solution tracking to ensure optimal results

  • Efficient uniqueness handling for categories

  • Comprehensive edge case handling

  • Strict enforcement of max_bins parameter

References

  • Bellotti, T., & Crook, J. (2009). Credit scoring with macroeconomic variables using survival analysis. Journal of the Operational Research Society, 60(12), 1699-1707.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone optimal binning algorithm for credit risk modeling. arXiv preprint arXiv:1711.05095.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.

Examples

if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- sample(LETTERS[1:5], 1000, replace = TRUE)

# Run optimal binning
result <- optimal_binning_categorical_mob(target, feature)

# View results
print(result)

# Force exactly 2 bins
result2 <- optimal_binning_categorical_mob(
  target, feature, 
  min_bins = 2, 
  max_bins = 2
)
} # }