Optimal Binning for Numerical Variables using OSLP
optimal_binning_numerical_oslp.Rd
Performs optimal binning for numerical variables using the Optimal Supervised Learning Partitioning (OSLP) approach. This advanced binning algorithm creates bins that maximize predictive power while preserving interpretability through monotonic Weight of Evidence (WoE) values.
Usage
optimal_binning_numerical_oslp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
laplace_smoothing = 0.5
)
Arguments
- target
A numeric vector of binary target values (0 or 1).
- feature
A numeric vector of feature values.
- min_bins
Minimum number of bins (default: 3, must be >= 2).
- max_bins
Maximum number of bins (default: 5, must be > min_bins).
- bin_cutoff
Minimum proportion of total observations for a bin to avoid being merged (default: 0.05, must be in (0, 1)).
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20).
- convergence_threshold
Threshold for convergence (default: 1e-6).
- max_iterations
Maximum number of iterations (default: 1000).
- laplace_smoothing
Smoothing parameter for WoE calculation (default: 0.5).
Value
A list containing:
- id
Numeric vector of bin identifiers (1-based).
- bin
Character vector of bin labels.
- woe
Numeric vector of Weight of Evidence (WoE) values for each bin.
- iv
Numeric vector of Information Value (IV) for each bin.
- count
Integer vector of total count of observations in each bin.
- count_pos
Integer vector of positive class count in each bin.
- count_neg
Integer vector of negative class count in each bin.
- event_rate
Numeric vector of positive class rate in each bin.
- cutpoints
Numeric vector of cutpoints used to create the bins.
- total_iv
Numeric value of total Information Value across all bins.
- converged
Logical value indicating whether the algorithm converged.
- iterations
Integer value indicating the number of iterations run.
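The returned cutpoints and woe vectors are what you need to score new data. A minimal sketch (the cutpoint and WoE values below are made up for illustration, not output of the function):

```r
# Hypothetical fitted binning: 2 cutpoints define 3 bins
cutpoints <- c(-0.5, 0.5)          # as in result$cutpoints
woe       <- c(-0.8, 0.1, 0.7)     # as in result$woe, one value per bin

new_x <- c(-2, 0, 1)

# findInterval() returns 0 for values below the first cutpoint,
# so adding 1 gives a 1-based bin id matching the woe vector
bin_id  <- findInterval(new_x, cutpoints) + 1L
new_woe <- woe[bin_id]             # WoE-encoded feature: -0.8, 0.1, 0.7
```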
Details
Mathematical Framework:
Weight of Evidence (WoE): For a bin \(i\) with Laplace smoothing parameter \(\alpha\):
$$WoE_i = \ln\left(\frac{n_{1i} + \alpha}{n_{1} + m\alpha} \cdot \frac{n_{0} + m\alpha}{n_{0i} + \alpha}\right)$$
Where:
- \(n_{1i}\) is the count of positive cases in bin \(i\)
- \(n_{0i}\) is the count of negative cases in bin \(i\)
- \(n_{1}\) is the total count of positive cases
- \(n_{0}\) is the total count of negative cases
- \(m\) is the number of bins
- \(\alpha\) is the Laplace smoothing parameter
Information Value (IV): Summarizes predictive power across all bins:
$$IV = \sum_{i} \left(P(X|Y=1) - P(X|Y=0)\right) \times WoE_i$$
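The two formulas above can be sketched directly in base R. This is not the package's internal code; the bin counts are hypothetical, and the IV term uses the same Laplace-smoothed distributions as the WoE:

```r
# Hypothetical per-bin counts for a 3-bin solution
count_pos <- c(40, 30, 30)   # n_1i: positives per bin
count_neg <- c(10, 30, 60)   # n_0i: negatives per bin
alpha <- 0.5                 # laplace_smoothing
m <- length(count_pos)       # number of bins

# Smoothed class distributions: (n_ki + alpha) / (n_k + m * alpha)
dist_pos <- (count_pos + alpha) / (sum(count_pos) + m * alpha)
dist_neg <- (count_neg + alpha) / (sum(count_neg) + m * alpha)

woe <- log(dist_pos / dist_neg)          # WoE_i per bin
iv  <- sum((dist_pos - dist_neg) * woe)  # total IV
```

Note how the smoothing keeps woe finite even when a bin has zero positives or zero negatives, which is the point of the laplace_smoothing argument.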
Algorithm Steps:
1. Pre-binning: initial bins are created using a quantile-based approach.
2. Merge Small Bins: bins whose proportion of observations falls below bin_cutoff are merged with a neighbour.
3. Enforce Monotonicity: adjacent bins that violate WoE monotonicity are merged.
4. Optimize Bin Count: bins are merged while their number exceeds max_bins.
5. Calculate Metrics: final WoE, IV, and event rates are computed.
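Steps 1 and 2 can be illustrated with a rough sketch (illustrative only; the package's actual merging strategy may differ, e.g. in which neighbour it picks):

```r
set.seed(1)
feature <- rnorm(500)
max_n_prebins <- 20
bin_cutoff <- 0.05

# Step 1: quantile-based pre-bins
probs <- seq(0, 1, length.out = max_n_prebins + 1)
cuts  <- unique(quantile(feature, probs))
bins  <- cut(feature, breaks = cuts, include.lowest = TRUE)

# Step 2: merge any bin holding less than bin_cutoff of the observations
# by dropping one of its interior cutpoints, until all bins are big enough
prop <- table(bins) / length(feature)
while (any(prop < bin_cutoff) && length(cuts) > 3) {
  i    <- which.min(prop)        # smallest bin
  cuts <- cuts[-max(i, 2)]       # drop an interior cutpoint next to it
  bins <- cut(feature, breaks = cuts, include.lowest = TRUE)
  prop <- table(bins) / length(feature)
}
```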
References
Belcastro, L., Marozzo, F., Talia, D., & Trunfio, P. (2020). "Big Data Analytics." Handbook of Big Data Technologies. Springer.
Mironchyk, P., & Tchistiakov, V. (2017). "Monotone Optimal Binning Algorithm for Credit Risk Modeling." SSRN 2987720.
Good, I.J. (1952). "Rational Decisions." Journal of the Royal Statistical Society, Series B, 14, 107-114. (Origin of Laplace smoothing)
Thomas, L.C. (2009). "Consumer Credit Models: Pricing, Profit, and Portfolios." Oxford University Press.
Examples
if (FALSE) { # \dontrun{
# Sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- rnorm(n)
# Perform optimal binning
result <- optimal_binning_numerical_oslp(target, feature,
min_bins = 2, max_bins = 4)
# Print results
print(result)
# Visualize WoE against bins
barplot(result$woe, names.arg = result$bin, las = 2,
main = "Weight of Evidence by Bin",
ylab = "WoE")
abline(h = 0, lty = 2)
} # }