Performs optimal binning for categorical variables using a Monotonic Binning Algorithm (MBA), which combines Weight of Evidence (WoE) and Information Value (IV) methods with monotonicity constraints. This implementation includes Bayesian smoothing for robust estimation with small samples, adaptive monotonicity enforcement, and efficient handling of rare categories.

Usage

optimal_binning_categorical_mba(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A character vector of categorical feature values.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum frequency (as a proportion of observations) required for a category to form its own bin (default: 0.05).

max_n_prebins

Maximum number of pre-bins before merging (default: 20).

bin_separator

String used to separate category names when merging bins (default: "%;%").

convergence_threshold

Threshold for convergence in optimization (default: 1e-6).

max_iterations

Maximum number of iterations for optimization (default: 1000).

Value

A list containing:

  • id: Numeric vector of bin identifiers.

  • bin: Character vector of bin labels.

  • woe: Numeric vector of Weight of Evidence values for each bin.

  • iv: Numeric vector of Information Value contributions for each bin.

  • count: Integer vector of total counts for each bin.

  • count_pos: Integer vector of positive target counts for each bin.

  • count_neg: Integer vector of negative target counts for each bin.

  • total_iv: Total Information Value of the binning.

  • converged: Logical value indicating whether the algorithm converged.

  • iterations: Integer indicating the number of iterations run.
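
For instance, once a result is available (see the Examples section below for a full call), the bin-level fields align element-wise and can be collected into a single data frame:

# Assuming `result` holds the list returned by
# optimal_binning_categorical_mba()
bins <- data.frame(
  id        = result$id,
  bin       = result$bin,
  woe       = result$woe,
  iv        = result$iv,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg
)

result$total_iv    # overall predictive strength of the binning
result$converged   # whether the optimization converged
result$iterations  # number of iterations run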

Details

This algorithm implements an enhanced version of the monotonic binning approach with several key features:

  1. Bayesian Smoothing: Applies prior pseudo-counts proportional to the overall class prevalence to improve stability for small bins and rare categories.

  2. Adaptive Monotonicity: Uses context-aware thresholds based on the average WoE difference between bins to better handle datasets with varying scales.

  3. Similarity-Based Merging: Merges bins based on event rate similarity rather than just adjacency, which better preserves information content (a standalone sketch of this idea follows the list).

  4. Best Solution Tracking: Maintains the best solution found during optimization, even if the algorithm doesn't formally converge.
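
As a rough illustration of point 3, the standalone sketch below repeatedly merges the pair of bins with the most similar event rates, wherever those bins sit in the ordering. The function name and logic are simplified illustrations, not the package's internal code:

# Merge the globally closest pair of bins (by event rate) until
# only `max_bins` bins remain. Note the pair need not be adjacent.
merge_most_similar <- function(count_pos, count_neg, max_bins = 5) {
  while (length(count_pos) > max_bins) {
    rate <- count_pos / (count_pos + count_neg)
    d <- abs(outer(rate, rate, "-"))   # pairwise event-rate distances
    diag(d) <- Inf                     # ignore self-pairs
    idx <- which(d == min(d), arr.ind = TRUE)[1, ]
    i <- min(idx); j <- max(idx)
    count_pos[i] <- count_pos[i] + count_pos[j]
    count_neg[i] <- count_neg[i] + count_neg[j]
    count_pos <- count_pos[-j]
    count_neg <- count_neg[-j]
  }
  list(count_pos = count_pos, count_neg = count_neg)
}

# Bins 1 and 3 have the closest event rates (0.05 vs 0.08) and are
# merged first, even though they are not adjacent
merge_most_similar(c(5, 40, 8, 60), c(95, 160, 92, 140), max_bins = 3)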

The mathematical foundation of the algorithm is based on the following concepts:

The Weight of Evidence (WoE) with Bayesian smoothing is calculated as:

$$WoE_i = \ln\left(\frac{p_i^*}{q_i^*}\right)$$

where:

  • \(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of positive cases in bin i

  • \(q_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of negative cases in bin i

  • \(\pi = \frac{N^+}{N^+ + N^-}\) is the overall positive rate

  • \(\alpha\) is the prior strength parameter (default: 0.5)

  • \(n_i^+\) is the count of positive cases in bin i

  • \(n_i^-\) is the count of negative cases in bin i

  • \(N^+\) is the total number of positive cases

  • \(N^-\) is the total number of negative cases

The Information Value (IV) for each bin is calculated as:

$$IV_i = (p_i^* - q_i^*) \times WoE_i$$

And the total IV is:

$$IV_{total} = \sum_{i=1}^{k} |IV_i|$$
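
These quantities are easy to compute directly. The standalone R sketch below mirrors the formulas above; it is an illustration, not the package's internal implementation:

# Smoothed WoE and IV from per-bin counts, following the formulas
# above; alpha = 0.5 matches the documented default prior strength
smoothed_woe_iv <- function(n_pos, n_neg, alpha = 0.5) {
  N_pos  <- sum(n_pos)
  N_neg  <- sum(n_neg)
  pi_hat <- N_pos / (N_pos + N_neg)                        # overall positive rate
  p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)     # smoothed positive share
  q_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)
  woe <- log(p_star / q_star)
  iv  <- (p_star - q_star) * woe
  list(woe = woe, iv = iv, total_iv = sum(abs(iv)))
}

# Example with three bins
smoothed_woe_iv(n_pos = c(30, 50, 20), n_neg = c(70, 150, 80))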

The algorithm performs the following steps:

  1. Input validation and preprocessing

  2. Initial pre-binning based on frequency

  3. Merging of rare categories based on bin_cutoff

  4. Calculation of WoE and IV with Bayesian smoothing

  5. Enforcement of monotonicity constraints (see the sketch after this list)

  6. Optimization of bin count through iterative merging
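
A minimal sketch of step 5, building on the smoothed_woe_iv() helper defined earlier: adjacent bins that break the monotone WoE ordering are merged until the sequence is non-decreasing. The real algorithm is more elaborate (adaptive thresholds, both directions, min_bins handling), so this is only an illustration:

# Merge the first adjacent pair that violates a non-decreasing WoE
# ordering, recompute, and repeat (simplified illustration)
enforce_monotonic <- function(n_pos, n_neg, alpha = 0.5) {
  repeat {
    woe  <- smoothed_woe_iv(n_pos, n_neg, alpha)$woe
    viol <- which(diff(woe) < 0)
    if (length(viol) == 0 || length(n_pos) <= 2) break
    j <- viol[1]
    n_pos[j] <- n_pos[j] + n_pos[j + 1]; n_pos <- n_pos[-(j + 1)]
    n_neg[j] <- n_neg[j] + n_neg[j + 1]; n_neg <- n_neg[-(j + 1)]
  }
  list(n_pos = n_pos, n_neg = n_neg)
}

# Bin 3's event rate (0.2) dips below bin 2's (0.3), so bins 2 and 3
# are merged, giving monotonically increasing WoE
enforce_monotonic(c(10, 30, 20, 60), c(90, 70, 80, 40))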

References

  • Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.

  • Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring (Vol. 3). John Wiley & Sons.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

  • Lin, X., Wang, G., & Zhang, T. (2022). Efficient monotonic binning for predictive modeling in high-dimensional spaces. Knowledge-Based Systems, 235, 107629.

Examples

if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- sample(LETTERS[1:5], 1000, replace = TRUE)

# Run optimal binning
result <- optimal_binning_categorical_mba(target, feature)

# View results
print(result)

# Handle rare categories more aggressively
result2 <- optimal_binning_categorical_mba(
  target, feature,
  bin_cutoff = 0.1, 
  min_bins = 2, 
  max_bins = 4
)
} # }