Optimal Binning for Categorical Variables using Monotonic Binning Algorithm (MBA)
Performs optimal binning for categorical variables using a Monotonic Binning Algorithm (MBA), which combines Weight of Evidence (WoE) and Information Value (IV) methods with monotonicity constraints. This implementation includes Bayesian smoothing for robust estimation on small samples, adaptive monotonicity enforcement, and efficient handling of rare categories.
Usage
optimal_binning_categorical_mba(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
bin_separator = "%;%",
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
An integer vector of binary target values (0 or 1).
- feature
A character vector of categorical feature values.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of observations required for a category to be kept as a separate bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before merging (default: 20).
- bin_separator
String used to separate category names when merging bins (default: "%;%").
- convergence_threshold
Threshold for convergence in optimization (default: 1e-6).
- max_iterations
Maximum number of iterations for optimization (default: 1000).
Value
A list containing:
- id: Numeric vector of bin identifiers.
- bin: Character vector of bin labels.
- woe: Numeric vector of Weight of Evidence values for each bin.
- iv: Numeric vector of Information Value for each bin.
- count: Integer vector of total counts for each bin.
- count_pos: Integer vector of positive target counts for each bin.
- count_neg: Integer vector of negative target counts for each bin.
- total_iv: Total Information Value of the binning.
- converged: Logical value indicating whether the algorithm converged.
- iterations: Integer indicating the number of iterations run.
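These vectors are parallel, with one element per bin, so the result is easy to gather into a single table; a minimal sketch, assuming result holds the list returned above:

# Assemble the per-bin vectors into a data frame for inspection
bins <- with(result, data.frame(
  id = id, bin = bin, woe = woe, iv = iv,
  count = count, count_pos = count_pos, count_neg = count_neg
))
bins[order(bins$woe), ]   # bins sorted by Weight of Evidence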
Details
This algorithm implements an enhanced version of the monotonic binning approach with several key features:
Bayesian Smoothing: Applies prior pseudo-counts proportional to the overall class prevalence to improve stability for small bins and rare categories.
Adaptive Monotonicity: Uses context-aware thresholds based on the average WoE difference between bins to better handle datasets with varying scales (see the sketch after this list).
Similarity-Based Merging: Merges bins based on event rate similarity rather than just adjacency, which better preserves information content.
Best Solution Tracking: Maintains the best solution found during optimization, even if the algorithm doesn't formally converge.
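To make the adaptive check concrete, here is an illustrative version in base R. This is a sketch of the idea only, not the package internals; is_monotonic_adaptive and rel_tol are hypothetical names:

# A WoE sequence counts as monotonic if any violation is small relative to
# the average absolute gap between adjacent bins (context-aware tolerance)
is_monotonic_adaptive <- function(woe, rel_tol = 0.1) {
  gaps <- diff(woe)
  if (length(gaps) == 0) return(TRUE)
  tol <- rel_tol * mean(abs(gaps))
  all(gaps >= -tol) || all(gaps <= tol)
}
is_monotonic_adaptive(c(-0.80, -0.21, -0.25, 0.60))  # TRUE: tiny dip tolerated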
The mathematical foundation of the algorithm is based on the following concepts:
The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:
$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$
where:
\(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of positive cases in bin i
\(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of negative cases in bin i
\(\pi = \frac{N^+}{N^+ + N^-}\) is the overall positive rate
\(\alpha\) is the prior strength parameter (default: 0.5)
\(n_i^+\) is the count of positive cases in bin i
\(n_i^-\) is the count of negative cases in bin i
\(N^+\) is the total number of positive cases
\(N^-\) is the total number of negative cases
The Information Value (IV) for each bin is calculated as:
$$IV_i = (p_i^* - q_i^*) \times WoE_i$$
And the total IV, summed over the \(k\) final bins, is:
$$IV_{total} = \sum_{i=1}^{k} |IV_i|$$
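These formulas translate directly into a few lines of base R. Below is a minimal sketch for a single bin; smoothed_woe_iv is an illustrative helper, not part of the package, and alpha = 0.5 matches the default prior strength noted above:

# Smoothed WoE and IV for one bin, following the formulas above
smoothed_woe_iv <- function(n_pos, n_neg, N_pos, N_neg, alpha = 0.5) {
  pi_hat <- N_pos / (N_pos + N_neg)                     # overall positive rate
  p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)  # smoothed positive share
  q_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)
  woe <- log(p_star / q_star)
  c(woe = woe, iv = (p_star - q_star) * woe)
}
# A small bin (3 events, 40 non-events): smoothing pulls WoE slightly toward zero
smoothed_woe_iv(n_pos = 3, n_neg = 40, N_pos = 500, N_neg = 500)

The total IV is then the sum of the absolute per-bin IV values.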
The algorithm performs the following steps:
1. Input validation and preprocessing
2. Initial pre-binning based on category frequency
3. Merging of rare categories based on bin_cutoff
4. Calculation of WoE and IV with Bayesian smoothing
5. Enforcement of monotonicity constraints
6. Optimization of the bin count through iterative merging
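As a rough illustration of steps 2 and 3, the sketch below pools all rare levels into a single bin. merge_rare_categories is a hypothetical helper; the actual algorithm merges by event-rate similarity rather than into one pool:

# Pool categories whose relative frequency is below bin_cutoff, labelling
# the pooled bin with bin_separator (simplified: one pool for all rare levels)
merge_rare_categories <- function(feature, bin_cutoff = 0.05,
                                  bin_separator = "%;%") {
  freq <- table(feature) / length(feature)
  rare <- names(freq)[freq < bin_cutoff]
  if (length(rare) > 1) {
    feature[feature %in% rare] <- paste(rare, collapse = bin_separator)
  }
  feature
}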
References
Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring (Vol. 3). John Wiley & Sons.
Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.
Lin, X., Wang, G., & Zhang, T. (2022). Efficient monotonic binning for predictive modeling in high-dimensional spaces. Knowledge-Based Systems, 235, 107629.
Examples
if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- sample(LETTERS[1:5], 1000, replace = TRUE)
# Run optimal binning
result <- optimal_binning_categorical_mba(target, feature)
# View results
print(result)
# Handle rare categories more aggressively: skew the frequencies so that
# some levels fall below the cutoff
feature2 <- sample(LETTERS[1:6], 1000, replace = TRUE,
                   prob = c(0.30, 0.25, 0.20, 0.15, 0.06, 0.04))
result2 <- optimal_binning_categorical_mba(
  target, feature2,
  bin_cutoff = 0.1,
  min_bins = 2,
  max_bins = 4
)
} # }