Optimal Categorical Binning JEDI (Joint Entropy-Driven Information Maximization)
optimal_binning_categorical_jedi.Rd
A robust categorical binning algorithm that optimizes Information Value (IV) while maintaining monotonic Weight of Evidence (WoE) relationships. This implementation employs Bayesian smoothing, adaptive monotonicity enforcement, and sophisticated information-theoretic optimization to create statistically stable and interpretable bins.
Usage
optimal_binning_categorical_jedi(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the response variable
- feature
Character vector of categorical predictor values
- min_bins
Minimum number of output bins (default: 3). Adjusted downward when the number of unique categories is less than min_bins
- max_bins
Maximum number of output bins (default: 5). Must be >= min_bins
- bin_cutoff
Minimum relative frequency threshold for individual bins (default: 0.05)
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20)
- bin_separator
Delimiter for names of combined categories (default: "%;%")
- convergence_threshold
IV difference threshold for convergence (default: 1e-6)
- max_iterations
Maximum number of optimization iterations (default: 1000)
Value
A list containing:
id: Numeric vector with bin identifiers
bin: Character vector with bin names (concatenated categories)
woe: Numeric vector with Weight of Evidence values
iv: Numeric vector with Information Value per bin
count: Integer vector with observation counts per bin
count_pos: Integer vector with positive class counts per bin
count_neg: Integer vector with negative class counts per bin
total_iv: Total Information Value of the binning
converged: Logical indicating whether the algorithm converged
iterations: Integer count of optimization iterations performed
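The per-bin vectors line up element by element, so they can be collected into a single table for inspection. A minimal sketch, assuming result holds the list returned by the function:

# Collect the per-bin output into one table (result from a previous call)
bins <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)
bins[order(bins$woe), ]

# Overall diagnostics
result$total_iv
result$converged
result$iterations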
Details
The algorithm employs a multi-phase optimization approach based on information theory principles:
Mathematical Framework:
For a bin i, the Weight of Evidence (WoE) is calculated with Bayesian smoothing as:
$$WoE_i = \ln\left(\frac{p_i^*}{n_i^*}\right)$$
where:
\(p_i^* = \frac{n_i^+ + \alpha \cdot \pi}{N^+ + \alpha}\) is the smoothed proportion of positive cases
\(n_i^* = \frac{n_i^- + \alpha \cdot (1-\pi)}{N^- + \alpha}\) is the smoothed proportion of negative cases
\(\pi = \frac{N^+}{N^+ + N^-}\) is the overall positive rate
\(\alpha\) is the prior strength parameter (default: 0.5)
\(n_i^+\) is the count of positive cases in bin i
\(n_i^-\) is the count of negative cases in bin i
\(N^+\) is the total number of positive cases
\(N^-\) is the total number of negative cases
The Information Value (IV) for each bin is calculated as:
$$IV_i = (p_i^* - n_i^*) \times WoE_i$$
And the total IV is:
$$IV_{total} = \sum_{i=1}^{k} IV_i$$
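To make the definitions concrete, the smoothed WoE and IV can be reproduced in base R from per-bin counts. This is only an illustration of the formulas above; the counts are invented for the example and alpha = 0.5 mirrors the stated default prior strength:

# Hypothetical per-bin counts (n_i^+ and n_i^-), for illustration only
n_pos <- c(30, 50, 20)     # positives per bin
n_neg <- c(170, 350, 380)  # negatives per bin
alpha <- 0.5               # prior strength

N_pos  <- sum(n_pos)
N_neg  <- sum(n_neg)
pi_hat <- N_pos / (N_pos + N_neg)  # overall positive rate

# Smoothed proportions, WoE and IV exactly as defined above
p_star   <- (n_pos + alpha * pi_hat) / (N_pos + alpha)
n_star   <- (n_neg + alpha * (1 - pi_hat)) / (N_neg + alpha)
woe      <- log(p_star / n_star)
iv_bin   <- (p_star - n_star) * woe
total_iv <- sum(iv_bin)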
Algorithm Phases:
Initial Binning: Creates individual bins for unique categories with comprehensive statistics
Low-Frequency Treatment: Combines rare categories (relative frequency < bin_cutoff) to ensure statistical stability
Optimization: Iteratively merges bins using adaptive IV loss minimization while ensuring WoE monotonicity (see the sketch after this list)
Final Adjustment: Ensures bin count constraints (min_bins <= bins <= max_bins) when feasible
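The merging step can be sketched in plain R: among adjacent bins (ordered by WoE), merge the pair whose combination retains the most total IV, i.e. minimizes IV loss. This is a simplified illustration of the idea, not the package's internal implementation; woe_iv() is a hypothetical helper built from the formulas above:

# Hypothetical helper: total IV from per-bin counts, using the smoothing above
woe_iv <- function(pos, neg, alpha = 0.5) {
  pi_hat <- sum(pos) / (sum(pos) + sum(neg))
  p_star <- (pos + alpha * pi_hat) / (sum(pos) + alpha)
  n_star <- (neg + alpha * (1 - pi_hat)) / (sum(neg) + alpha)
  sum((p_star - n_star) * log(p_star / n_star))
}

# One merge step: 'bins' is a data.frame with count_pos/count_neg, ordered by WoE;
# the adjacent pair whose merge loses the least total IV is combined
merge_once <- function(bins) {
  best <- NULL
  for (i in seq_len(nrow(bins) - 1)) {
    cand <- bins
    cand$count_pos[i] <- cand$count_pos[i] + cand$count_pos[i + 1]
    cand$count_neg[i] <- cand$count_neg[i] + cand$count_neg[i + 1]
    cand <- cand[-(i + 1), , drop = FALSE]
    iv <- woe_iv(cand$count_pos, cand$count_neg)
    if (is.null(best) || iv > best$iv) best <- list(bins = cand, iv = iv)
  }
  best$bins
}

In the actual algorithm this step alternates with monotonicity checks and repeats until the bin-count constraints are met or the change in total IV falls below convergence_threshold.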
Key Features:
Bayesian smoothing for robust WoE estimation with small samples
Adaptive monotonicity enforcement with violation severity prioritization
Information-theoretic merging strategy that minimizes information loss
Handling of edge cases including imbalanced datasets and sparse categories
Best-solution tracking to ensure optimal results even with early convergence
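The monotonic WoE property can be verified directly on the returned object; a minimal check, assuming result is the list described under Value and that bins are returned in their binning order:

# WoE should be entirely non-decreasing or non-increasing across bins
all(diff(result$woe) >= 0) || all(diff(result$woe) <= 0)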
References
Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring (Vol. 3). John Wiley & Sons.
Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and its Applications. SIAM.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
García-Magariño, I., Medrano, C., Lombas, A. S., & Barrasa, A. (2019). A hybrid approach with agent-based simulation and clustering for sociograms. Information Sciences, 499, 47-61.
Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.
Examples
if (FALSE) { # \dontrun{
# Basic usage
result <- optimal_binning_categorical_jedi(
  target = c(1, 0, 1, 1, 0),
  feature = c("A", "B", "A", "C", "B"),
  min_bins = 2,
  max_bins = 3
)
# Rare category handling
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  bin_cutoff = 0.03,   # Only categories below 3% frequency are treated as rare
  max_n_prebins = 15   # Limit on initial bins
)
# Working with more complex settings
result <- optimal_binning_categorical_jedi(
  target = target_vector,
  feature = feature_vector,
  min_bins = 3,
  max_bins = 10,
  bin_cutoff = 0.01,
  convergence_threshold = 1e-8, # Stricter convergence
  max_iterations = 2000         # More iterations for complex problems
)
} # }
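As a follow-up, the fitted WoE values can be mapped back onto the raw feature. A sketch assuming the default bin_separator ("%;%") and the result and feature_vector objects from the calls above:

# Split concatenated bin names back into their member categories
cats_per_bin <- strsplit(result$bin, "%;%", fixed = TRUE)

# Build a category -> WoE lookup and apply it to the original feature
woe_lookup <- setNames(rep(result$woe, lengths(cats_per_bin)), unlist(cats_per_bin))
feature_woe <- unname(woe_lookup[feature_vector])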