Performs optimal binning for categorical variables using the Monotonic Optimal Binning (MOB) approach with enhanced statistical robustness. This implementation includes Bayesian smoothing for better stability with small samples, adaptive monotonicity enforcement, and sophisticated bin merging strategies.

Usage

optimal_binning_categorical_mob(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A character vector of categorical feature values.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum proportion of observations in a bin (default: 0.05).

max_n_prebins

Maximum number of pre-bins (default: 20).

bin_separator

Separator used when concatenating category names within a merged bin (default: "%;%"). For example, merging categories "A" and "B" produces the bin label "A%;%B".

convergence_threshold

Convergence threshold for the algorithm (default: 1e-6).

max_iterations

Maximum number of iterations for the algorithm (default: 1000).

Value

A list containing the following elements:

  • id: Numeric vector of bin identifiers.

  • bin: Character vector of bin names (merged categories).

  • woe: Numeric vector of Weight of Evidence (WoE) values for each bin.

  • iv: Numeric vector of Information Value (IV) for each bin.

  • count: Integer vector of total counts for each bin.

  • count_pos: Integer vector of positive target counts for each bin.

  • count_neg: Integer vector of negative target counts for each bin.

  • total_iv: Total Information Value of the binning.

  • converged: Logical value indicating whether the algorithm converged.

  • iterations: Integer value indicating the number of iterations run.
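
The bin-level vectors are parallel (element i of each describes bin i), so they can be assembled into a single summary table. A minimal sketch, assuming result holds the return value of a call like the one in the Examples section:

# Combine the parallel bin-level vectors into one summary data frame
bin_table <- data.frame(
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)

result$total_iv  # overall predictive strength of the binned variable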

Details

This enhanced version of the Monotonic Optimal Binning (MOB) algorithm implements several key improvements over traditional approaches:

Mathematical Framework:

The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:

$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$

where:

  • \(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of events in bin i

  • \(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of non-events in bin i

  • \(\pi = \frac{N^+}{N^+ + N^-}\) is the overall event rate

  • \(\alpha\) is the prior strength parameter (default: 0.5)

  • \(n_i^+\) is the count of events in bin i

  • \(n_i^-\) is the count of non-events in bin i

  • \(N^+\) is the total number of events

  • \(N^-\) is the total number of non-events

The Information Value (IV) for each bin is calculated as:

$$IV_i = (p_i^* - q_i^*) \times WoE_i$$
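
These two formulas can be reproduced directly in R. The sketch below is a standalone illustration, not the package's internal code; alpha = 0.5 mirrors the stated default prior strength:

# Smoothed WoE and IV for one bin, given bin counts and overall totals
smoothed_woe_iv <- function(n_pos, n_neg, N_pos, N_neg, alpha = 0.5) {
  pi_hat <- N_pos / (N_pos + N_neg)                           # overall event rate
  p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)        # smoothed event proportion
  q_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)  # smoothed non-event proportion
  woe <- log(p_star / q_star)
  c(woe = woe, iv = (p_star - q_star) * woe)
}

# Example: a bin with 30 events and 170 non-events, out of 400 events
# and 600 non-events overall
smoothed_woe_iv(n_pos = 30, n_neg = 170, N_pos = 400, N_neg = 600)

Because of the smoothing, WoE remains finite even when a bin contains zero events or zero non-events, which is what provides the stated stability on small samples.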

Algorithm Phases:

  1. Initialization: Calculate statistics for each category with Bayesian smoothing.

  2. Pre-binning: Create initial bins sorted by WoE.

  3. Rare Category Handling: Merge categories with frequency below bin_cutoff using a similarity-based approach.

  4. Monotonicity Enforcement: Ensure monotonic WoE across bins using adaptive thresholds and severity-based prioritization (see the sketch after this list).

  5. Bin Optimization: Reduce number of bins to max_bins while maintaining monotonicity.

  6. Solution Tracking: Maintain the best solution found during optimization.
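
To make phase 4 concrete, here is a deliberately simplified sketch of the core merge loop: find the first adjacent pair whose WoE decreases, merge that pair, and repeat. The real implementation adds adaptive thresholds and severity-based prioritization; the helper name and data layout here are illustrative only:

# Toy illustration of monotonicity enforcement by adjacent-bin merging.
# `bins` is a data.frame with columns bin, count_pos, count_neg, one row
# per bin, initially ordered by WoE.
enforce_monotone_woe <- function(bins, alpha = 0.5) {
  N_pos <- sum(bins$count_pos)
  N_neg <- sum(bins$count_neg)
  pi_hat <- N_pos / (N_pos + N_neg)
  woe_of <- function(pos, neg) {              # smoothed WoE, vectorized
    log(((pos + alpha * pi_hat) / (N_pos + alpha)) /
        ((neg + alpha * (1 - pi_hat)) / (N_neg + alpha)))
  }
  repeat {
    woe <- woe_of(bins$count_pos, bins$count_neg)
    i <- which(diff(woe) < 0)[1]              # first decreasing adjacent pair
    if (is.na(i) || nrow(bins) <= 2) break
    bins$count_pos[i] <- bins$count_pos[i] + bins$count_pos[i + 1]
    bins$count_neg[i] <- bins$count_neg[i] + bins$count_neg[i + 1]
    bins$bin[i] <- paste(bins$bin[i], bins$bin[i + 1], sep = "%;%")
    bins <- bins[-(i + 1), , drop = FALSE]    # drop the absorbed bin
  }
  bins
}

Each merge can create a new violation upstream of the merged pair, which is why the loop re-evaluates all WoE values after every merge rather than scanning once.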

Key Features:

  • Bayesian smoothing for robust WoE estimation with small samples

  • Similarity-based bin merging rather than just adjacent bins

  • Adaptive monotonicity enforcement with violation severity prioritization

  • Best solution tracking to ensure optimal results

  • Efficient uniqueness handling for categories

  • Comprehensive edge case handling

  • Strict enforcement of max_bins parameter

References

  • Bellotti, T., & Crook, J. (2009). Credit scoring with macroeconomic variables using survival analysis. Journal of the Operational Research Society, 60(12), 1699-1707.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone optimal binning algorithm for credit risk modeling. arXiv preprint arXiv:1711.05095.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.

Examples

if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- sample(LETTERS[1:5], 1000, replace = TRUE)

# Run optimal binning
result <- optimal_binning_categorical_mob(target, feature)

# View results
print(result)

# Force exactly 2 bins
result2 <- optimal_binning_categorical_mob(
  target, feature, 
  min_bins = 2, 
  max_bins = 2
)
} # }