Optimal Categorical Binning JEDI (Joint Entropy-Driven Information Maximization)
optimal_binning_categorical_jedi.Rd
A robust categorical binning algorithm that optimizes Information Value (IV) while maintaining monotonic Weight of Evidence (WoE) relationships. This implementation employs Bayesian smoothing, adaptive monotonicity enforcement, and sophisticated information-theoretic optimization to create statistically stable and interpretable bins.
Usage
optimal_binning_categorical_jedi(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the response variable
- feature
Character vector of categorical predictor values
- min_bins
Minimum number of output bins (default: 3). Adjusted downward when the number of unique categories is less than min_bins
- max_bins
Maximum number of output bins (default: 5). Must be >= min_bins
- bin_cutoff
Minimum relative frequency threshold for individual bins (default: 0.05)
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20)
- bin_separator
Delimiter for names of combined categories (default: "%;%")
- convergence_threshold
IV difference threshold for convergence (default: 1e-6)
- max_iterations
Maximum number of optimization iterations (default: 1000)
Value
A list containing:
id: Numeric vector with bin identifiers
bin: Character vector with bin names (concatenated categories)
woe: Numeric vector with Weight of Evidence values
iv: Numeric vector with Information Value per bin
count: Integer vector with observation counts per bin
count_pos: Integer vector with positive class counts per bin
count_neg: Integer vector with negative class counts per bin
total_iv: Total Information Value of the binning
converged: Logical indicating whether the algorithm converged
iterations: Integer count of optimization iterations performed
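The per-bin vectors line up element by element, so they can be collected into a single table for inspection. A minimal sketch, assuming result holds the list returned by the function:

# Collect the per-bin output into one table (result from a previous call)
bins <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)
bins[order(bins$woe), ]

# Overall diagnostics
result$total_iv
result$converged
result$iterations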
Details
The algorithm employs a multi-phase optimization approach based on information theory principles:
Mathematical Framework:
For a bin i, the Weight of Evidence (WoE) is calculated with Bayesian smoothing as:
$$WoE_i = \ln\left(\frac{p_i^*}{n_i^*}\right)$$
where:
\(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of positive cases
\(n_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of negative cases
\(\pi = \frac{N^+}{N^+ + N^-}\) is the overall positive rate
\(\alpha\) is the prior strength parameter (default: 0.5)
\(n_i^+\) is the count of positive cases in bin i
\(n_i^-\) is the count of negative cases in bin i
\(N^+\) is the total number of positive cases
\(N^-\) is the total number of negative cases
The Information Value (IV) for each bin is calculated as:
$$IV_i = (p_i^* - n_i^*) \times WoE_i$$
And the total IV is:
$$IV_{total} = \sum_{i=1}^{k} IV_i$$
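To make the definitions concrete, the smoothed WoE and IV can be reproduced in base R from per-bin counts. This is only an illustration of the formulas above; the counts are invented for the example and alpha = 0.5 mirrors the stated default prior strength:

# Hypothetical per-bin counts (n_i^+ and n_i^-), for illustration only
n_pos <- c(30, 50, 20)     # positives per bin
n_neg <- c(170, 350, 380)  # negatives per bin
alpha <- 0.5               # prior strength

N_pos  <- sum(n_pos)
N_neg  <- sum(n_neg)
pi_hat <- N_pos / (N_pos + N_neg)  # overall positive rate

# Smoothed proportions, WoE and IV exactly as defined above
p_star   <- (n_pos + alpha * pi_hat) / (N_pos + alpha)
n_star   <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)
woe      <- log(p_star / n_star)
iv_bin   <- (p_star - n_star) * woe
total_iv <- sum(iv_bin)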
Algorithm Phases:
Initial Binning: Creates individual bins for unique categories with comprehensive statistics
Low-Frequency Treatment: Combines rare categories (relative frequency < bin_cutoff) to ensure statistical stability
Optimization: Iteratively merges bins using adaptive IV loss minimization while ensuring WoE monotonicity (see the sketch after this list)
Final Adjustment: Ensures bin count constraints (min_bins <= bins <= max_bins) when feasible
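The merging step can be sketched in plain R: among adjacent bins (ordered by WoE), merge the pair whose combination retains the most total IV, i.e. minimizes IV loss. This is a simplified illustration of the idea, not the package's internal implementation; woe_iv() is a hypothetical helper built from the formulas above:

# Hypothetical helper: total IV from per-bin counts, using the smoothing above
woe_iv <- function(pos, neg, alpha = 0.5) {
  pi_hat <- sum(pos) / (sum(pos) + sum(neg))
  p_star <- (pos + alpha * pi_hat) / (sum(pos) + alpha)
  n_star <- (neg + alpha * (1 - pi_hat)) / (sum(neg) + alpha)
  sum((p_star - n_star) * log(p_star / n_star))
}

# One merge step: 'bins' is a data.frame with count_pos/count_neg, ordered by WoE;
# the adjacent pair whose merge loses the least total IV is combined
merge_once <- function(bins) {
  best <- NULL
  for (i in seq_len(nrow(bins) - 1)) {
    cand <- bins
    cand$count_pos[i] <- cand$count_pos[i] + cand$count_pos[i + 1]
    cand$count_neg[i] <- cand$count_neg[i] + cand$count_neg[i + 1]
    cand <- cand[-(i + 1), , drop = FALSE]
    iv <- woe_iv(cand$count_pos, cand$count_neg)
    if (is.null(best) || iv > best$iv) best <- list(bins = cand, iv = iv)
  }
  best$bins
}

In the actual algorithm this step alternates with monotonicity checks and repeats until the bin-count constraints are met or the change in total IV falls below convergence_threshold.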
Key Features:
Bayesian smoothing for robust WoE estimation with small samples
Adaptive monotonicity enforcement with violation severity prioritization
Information-theoretic merging strategy that minimizes information loss
Handling of edge cases including imbalanced datasets and sparse categories
Best-solution tracking to ensure optimal results even with early convergence
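The monotonic WoE property can be verified directly on the returned object; a minimal check, assuming result is the list described under Value and that bins are returned in their binning order:

# WoE should be entirely non-decreasing or non-increasing across bins
all(diff(result$woe) >= 0) || all(diff(result$woe) <= 0)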
References
Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring (Vol. 3). John Wiley & Sons.
Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
García-Magariño, I., Medrano, C., Lombas, A. S., & Barrasa, A. (2019). A hybrid approach with agent-based simulation and clustering for sociograms. Information Sciences, 499, 47-61.
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.
Examples
if (FALSE) { # \dontrun{
# Basic usage
result <- optimal_binning_categorical_jedi(
  target = c(1, 0, 1, 1, 0),
  feature = c("A", "B", "A", "C", "B"),
  min_bins = 2,
  max_bins = 3
)
# Rare category handling
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  bin_cutoff = 0.03,   # Only categories below 3% frequency are treated as rare
  max_n_prebins = 15   # Limit on initial bins
)
# Working with more complex settings
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  min_bins = 3,
  max_bins = 10,
  bin_cutoff = 0.01,
  convergence_threshold = 1e-8, # Stricter convergence
  max_iterations = 2000         # More iterations for complex problems
)
} # }
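As a follow-up, the fitted WoE values can be mapped back onto the raw feature. A sketch assuming the default bin_separator ("%;%") and the result and feature_vector objects from the calls above:

# Split concatenated bin names back into their member categories
cats_per_bin <- strsplit(result$bin, "%;%", fixed = TRUE)

# Build a category -> WoE lookup and apply it to the original feature
woe_lookup <- setNames(rep(result$woe, lengths(cats_per_bin)), unlist(cats_per_bin))
feature_woe <- unname(woe_lookup[feature_vector])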