Optimal Binning for Categorical Variables using Sliding Window Binning (SWB)
optimal_binning_categorical_swb.Rd
This function performs optimal binning for categorical variables using a Sliding Window Binning (SWB) approach. The goal is to produce bins with strong predictive power, as measured by Information Value (IV), while keeping the Weight of Evidence (WoE) monotonic across bins. This implementation adds statistical robustness through Laplace smoothing of the WoE estimates and uses the Jensen-Shannon divergence to measure bin similarity.
Usage
optimal_binning_categorical_swb(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
bin_separator = "%;%",
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the response variable.
- feature
Character vector with the categories of the explanatory variable.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum frequency to consider a category as a separate bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before merging (default: 20).
- bin_separator
Separator used when concatenating category names in each bin (default: "%;%").
- convergence_threshold
Threshold for IV convergence (default: 1e-6).
- max_iterations
Maximum number of iterations for optimization (default: 1000).
Value
A list containing:
id: Numeric identifiers for each bin.
bin: String vector with the names of the bins.
woe: Numeric vector with WoE values for each bin.
iv: Numeric vector with IV values for each bin.
count: Integer vector with the total count in each bin.
count_pos: Integer vector with the count of positives (target=1) in each bin.
count_neg: Integer vector with the count of negatives (target=0) in each bin.
event_rate: Numeric vector with the event rate (proportion of target=1) in each bin.
converged: Logical value indicating whether the algorithm converged.
iterations: Integer value indicating how many iterations were executed.
total_iv: Total Information Value across all bins.
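A minimal usage sketch follows; the simulated data, the seed, and the object name result are illustrative, and it assumes the package exporting this function is loaded:

set.seed(42)
feature <- sample(c("A", "B", "C", "D", "E", "F"), size = 1000, replace = TRUE)
target  <- rbinom(1000, size = 1, prob = ifelse(feature %in% c("A", "B"), 0.30, 0.10))

result <- optimal_binning_categorical_swb(target, feature, min_bins = 3L, max_bins = 5L)

# Per-bin summary and overall predictive power
data.frame(bin = result$bin, count = result$count, woe = result$woe,
           iv = result$iv, event_rate = result$event_rate)
result$total_iv
result$converged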
Details
Statistical Methodology
The Sliding Window Binning (SWB) algorithm for categorical variables optimizes binning based on the statistical concepts of Weight of Evidence (WoE) and Information Value (IV):
Weight of Evidence measures the predictive power of a bin: $$WoE_i = \ln\left(\frac{P(X \in Bin_i | Y = 1)}{P(X \in Bin_i | Y = 0)}\right)$$
With Laplace smoothing applied for robustness: $$WoE_i = \ln\left(\frac{(n_{i+} + \alpha)/(n_{+} + 2\alpha)}{(n_{i-} + \alpha)/(n_{-} + 2\alpha)}\right)$$
Where:
\(n_{i+}\) is the number of positive cases (target=1) in bin i
\(n_{i-}\) is the number of negative cases (target=0) in bin i
\(n_{+}\) is the total number of positive cases
\(n_{-}\) is the total number of negative cases
\(\alpha\) is the Laplace smoothing parameter (default: 0.5)
Information Value measures the overall predictive power: $$IV_i = \left(P(X \in Bin_i | Y = 1) - P(X \in Bin_i | Y = 0)\right) \times WoE_i$$ $$IV_{total} = \sum_{i=1}^{k} IV_i$$
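For illustration only, the sketch below reproduces the smoothed WoE and IV calculations from per-bin counts; the counts are made up, and \(\alpha = 0.5\) matches the default noted above (the function computes all of this internally):

# Illustrative counts of positives (target = 1) and negatives (target = 0) per bin
count_pos <- c(40, 25, 10)
count_neg <- c(160, 225, 290)
alpha     <- 0.5                              # Laplace smoothing parameter

# Smoothed distributions P(X in Bin_i | Y = 1) and P(X in Bin_i | Y = 0)
dist_pos <- (count_pos + alpha) / (sum(count_pos) + 2 * alpha)
dist_neg <- (count_neg + alpha) / (sum(count_neg) + 2 * alpha)

woe      <- log(dist_pos / dist_neg)          # WoE_i
iv       <- (dist_pos - dist_neg) * woe       # IV_i
total_iv <- sum(iv)                           # IV_total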
Algorithm Steps
1. Initialize bins for each category, grouping rare categories (relative frequency below bin_cutoff); see the sketch after these steps.
2. Variables with only 1-2 levels receive special handling: no optimization is performed, only the metrics are calculated.
3. For variables with more levels:
   a. Sort bins by WoE value.
   b. Iteratively merge similar bins based on Jensen-Shannon divergence and IV loss.
   c. Enforce monotonicity of WoE across bins.
   d. Optimize until the constraints (min_bins, max_bins) are satisfied.
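A simplified sketch of the pre-binning step 1; the helper name group_rare_categories and the "Other" label are illustrative only, and in the function itself grouped category names appear in the output bin names concatenated with bin_separator:

# Group categories whose relative frequency falls below bin_cutoff into one pre-bin
group_rare_categories <- function(feature, bin_cutoff = 0.05, other_label = "Other") {
  freq <- table(feature) / length(feature)
  rare <- names(freq)[freq < bin_cutoff]
  ifelse(feature %in% rare, other_label, feature)
}

# Each remaining category then starts as its own pre-bin; steps 3b-3d merge and reorder them.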
Bin Similarity Measurement
Bins are merged based on statistical similarity measured using Jensen-Shannon divergence: $$JS(P||Q) = \frac{1}{2}KL(P||M) + \frac{1}{2}KL(Q||M)$$
Where:
\(KL\) is the Kullback-Leibler divergence
\(M = \frac{1}{2}(P+Q)\) is the midpoint distribution
\(P\) and \(Q\) are the event rate distributions of two bins
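A minimal sketch of this similarity measure, with illustrative function and variable names:

# Jensen-Shannon divergence between two discrete distributions p and q
js_divergence <- function(p, q) {
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))  # Kullback-Leibler divergence
  m  <- (p + q) / 2                                           # midpoint distribution M
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Example: event/non-event distributions of two candidate bins
js_divergence(c(0.20, 0.80), c(0.24, 0.76))  # small divergence -> strong merge candidates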
References
Beltrán, C., et al. (2022). Weight of Evidence (WoE) and Information Value (IV): A novel implementation for predictive modeling in credit scoring. Expert Systems with Applications, 183, 115351.
Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145-151.
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.