A robust categorical binning algorithm that optimizes Information Value (IV) while maintaining monotonic Weight of Evidence (WoE) relationships. This implementation employs Bayesian smoothing, adaptive monotonicity enforcement, and sophisticated information-theoretic optimization to create statistically stable and interpretable bins.

Usage

optimal_binning_categorical_jedi(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

Integer binary vector (0 or 1) representing the response variable

feature

Character vector of categorical predictor values

min_bins

Minimum number of output bins (default: 3). Automatically adjusted if the number of unique categories is less than min_bins

max_bins

Maximum number of output bins (default: 5). Must be >= min_bins

bin_cutoff

Minimum relative frequency threshold for individual bins (default: 0.05)

max_n_prebins

Maximum number of pre-bins before optimization (default: 20)

bin_separator

Delimiter for names of combined categories (default: "%;%")

convergence_threshold

IV difference threshold for convergence (default: 1e-6)

max_iterations

Maximum number of optimization iterations (default: 1000)

Value

A list containing:

  • id: Numeric vector with bin identifiers

  • bin: Character vector with bin names (concatenated categories)

  • woe: Numeric vector with Weight of Evidence values

  • iv: Numeric vector with Information Value per bin

  • count: Integer vector with observation counts per bin

  • count_pos: Integer vector with positive class counts per bin

  • count_neg: Integer vector with negative class counts per bin

  • total_iv: Total Information Value of the binning

  • converged: Logical indicating whether the algorithm converged

  • iterations: Integer count of optimization iterations performed
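
Downstream, these components can be combined into a category-level WoE lookup. The sketch below is illustrative only: it assumes result and feature are the objects from a call such as those in the Examples, and that the default bin_separator "%;%" was kept; lookup and feature_woe are hypothetical names, not part of the package.

# Build a category -> WoE lookup from the returned bins (assumes the
# default bin_separator "%;%"; adjust if another separator was used)
lookup <- do.call(rbind, lapply(seq_along(result$bin), function(i) {
  data.frame(
    category = strsplit(result$bin[i], "%;%", fixed = TRUE)[[1]],
    woe = result$woe[i],
    stringsAsFactors = FALSE
  )
}))

# Map raw feature values to their bin WoE; unseen categories become NA
feature_woe <- lookup$woe[match(feature, lookup$category)]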

Details

The algorithm employs a multi-phase optimization approach based on information theory principles:

Mathematical Framework:

For a bin i, the Weight of Evidence (WoE) is calculated with Bayesian smoothing as:

$$WoE_i = \ln\left(\frac{p_i^*}{n_i^*}\right)$$

where:

  • \(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of positive cases

  • \(n_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of negative cases

  • \(\pi = \frac{N^+}{N^+ + N^-}\) is the overall positive rate

  • \(\alpha\) is the prior strength parameter (default: 0.5)

  • \(n_i^+\) is the count of positive cases in bin i

  • \(n_i^-\) is the count of negative cases in bin i

  • \(N^+\) is the total number of positive cases

  • \(N^-\) is the total number of negative cases

The Information Value (IV) for each bin is calculated as:

$$IV_i = (p_i^* - n_i^*) \times WoE_i$$

And the total IV is:

$$IV_{total} = \sum_{i=1}^{k} IV_i$$
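
The quantities above can be reproduced directly in R. The helper below is a minimal, illustrative sketch of these formulas (compute_woe_iv is a hypothetical name, not the package's internal routine); it takes per-bin positive and negative counts and returns the smoothed WoE, per-bin IV, and total IV.

# Smoothed WoE and IV from per-bin counts (illustrative sketch)
compute_woe_iv <- function(n_pos, n_neg, alpha = 0.5) {
  N_pos <- sum(n_pos)
  N_neg <- sum(n_neg)
  pi_hat <- N_pos / (N_pos + N_neg)                           # overall positive rate
  p_star <- (n_pos + alpha * pi_hat) / (N_pos + alpha)        # smoothed positive share
  n_star <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)  # smoothed negative share
  woe <- log(p_star / n_star)
  iv  <- (p_star - n_star) * woe
  list(woe = woe, iv = iv, total_iv = sum(iv))
}

# Example: three bins with (positive, negative) counts
compute_woe_iv(n_pos = c(30L, 50L, 20L), n_neg = c(200L, 120L, 80L))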

Algorithm Phases:

  1. Initial Binning: Creates individual bins for unique categories with comprehensive statistics

  2. Low-Frequency Treatment: Combines rare categories (< bin_cutoff) to ensure statistical stability

  3. Optimization: Iteratively merges bins using adaptive IV loss minimization while ensuring WoE monotonicity (a simplified sketch follows this list)

  4. Final Adjustment: Ensures bin count constraints (min_bins <= bins <= max_bins) when feasible
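
A simplified sketch of the merging step in phase 3 is shown below, reusing the compute_woe_iv helper sketched earlier; merge_by_iv_loss is a hypothetical name. The sketch only minimizes IV loss between adjacent bins (assumed ordered by WoE) and omits the monotonicity enforcement and best-solution tracking performed by the actual algorithm.

# Greedy merging: repeatedly merge the adjacent pair losing the least IV
merge_by_iv_loss <- function(n_pos, n_neg, max_bins) {
  while (length(n_pos) > max_bins) {
    base_iv <- compute_woe_iv(n_pos, n_neg)$total_iv
    k <- length(n_pos)
    # Total IV lost by merging each adjacent pair (i, i + 1)
    loss <- sapply(seq_len(k - 1), function(i) {
      p <- n_pos; q <- n_neg
      p[i] <- p[i] + p[i + 1]
      q[i] <- q[i] + q[i + 1]
      base_iv - compute_woe_iv(p[-(i + 1)], q[-(i + 1)])$total_iv
    })
    i <- which.min(loss)                        # cheapest merge
    n_pos[i] <- n_pos[i] + n_pos[i + 1]; n_pos <- n_pos[-(i + 1)]
    n_neg[i] <- n_neg[i] + n_neg[i + 1]; n_neg <- n_neg[-(i + 1)]
  }
  list(n_pos = n_pos, n_neg = n_neg)
}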

Key Features:

  • Bayesian smoothing for robust WoE estimation with small samples

  • Adaptive monotonicity enforcement with violation severity prioritization (a quick check of the output is sketched after this list)

  • Information-theoretic merging strategy that minimizes information loss

  • Handling of edge cases including imbalanced datasets and sparse categories

  • Best-solution tracking to ensure optimal results even with early convergence
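
The monotonicity property can be verified on the returned bins with a short helper; is_monotone is a hypothetical name and result refers to the list described under Value.

# Are the returned WoE values monotone across the ordered bins?
is_monotone <- function(woe, tol = 1e-12) {
  d <- diff(woe)
  all(d >= -tol) || all(d <= tol)
}
is_monotone(result$woe)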

References

  • Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.

  • Siddiqi, N. (2006). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • García-Magariño, I., Medrano, C., Lombas, A. S., & Barrasa, A. (2019). A hybrid approach with agent-based simulation and clustering for sociograms. Information Sciences, 499, 47-61.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

Examples

if (FALSE) { # \dontrun{
# Basic usage
result <- optimal_binning_categorical_jedi(
  target = c(1,0,1,1,0),
  feature = c("A","B","A","C","B"),
  min_bins = 2,
  max_bins = 3
)

# Rare category handling
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  bin_cutoff = 0.03,  # More aggressive rare category treatment
  max_n_prebins = 15  # Limit on initial bins
)

# Working with more complex settings
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  min_bins = 3,
  max_bins = 10,
  bin_cutoff = 0.01,
  convergence_threshold = 1e-8,  # Stricter convergence
  max_iterations = 2000  # More iterations for complex problems
)
} # }