Optimal Binning for Numerical Variables using Branch and Bound Algorithm
optimal_binning_numerical_bb.Rd
Performs optimal binning for numerical variables using a Branch and Bound approach. This method transforms continuous features into discrete bins by maximizing the statistical relationship with a binary target variable while maintaining interpretability constraints. The algorithm optimizes Weight of Evidence (WoE) and Information Value (IV) metrics commonly used in risk modeling, credit scoring, and statistical analysis.
Usage
optimal_binning_numerical_bb(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
is_monotonic = TRUE,
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
An integer binary vector (0 or 1) representing the target variable.
- feature
A numeric vector of feature values to be binned.
- min_bins
Minimum number of bins to generate (default: 3).
- max_bins
Maximum number of bins to generate (default: 5).
- bin_cutoff
Minimum frequency fraction for each bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins generated before optimization (default: 20).
- is_monotonic
Logical value indicating whether to enforce monotonicity in WoE (default: TRUE).
- convergence_threshold
Convergence threshold for total Information Value (IV) change (default: 1e-6).
- max_iterations
Maximum number of iterations allowed for the optimization process (default: 1000).
Value
A list containing:
- id
Numeric identifiers for each bin (1-based).
- bin
Character vector with the intervals of each bin (e.g., "(-Inf; 0]", "(0; +Inf)").
- woe
Numeric vector with the Weight of Evidence values for each bin.
- iv
Numeric vector with the Information Value contribution for each bin.
- count
Integer vector with the total number of observations in each bin.
- count_pos
Integer vector with the number of positive observations in each bin.
- count_neg
Integer vector with the number of negative observations in each bin.
- cutpoints
Numeric vector of cut points between bins (excluding infinity).
- converged
Logical value indicating whether the algorithm converged.
- iterations
Number of iterations executed by the optimization algorithm.
- total_iv
The total Information Value of the binning solution.
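The bin-level components are parallel vectors (one element per bin), so they can be combined into a single summary table. A minimal sketch, assuming result holds the list returned by the function:
# Assemble the bin-level vectors into one summary data frame
bin_table <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)
bin_table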
Details
Algorithm Overview
The implementation follows a five-phase approach:
1. Input Validation: Ensures data integrity and parameter validity.
2. Pre-Binning (see the sketch after this list):
   - Creates initial bins using quantile-based division
   - Handles special cases for features with few unique values
   - Uses binary search for efficient observation assignment
3. Statistical Stabilization:
   - Merges bins with frequencies below the specified threshold
   - Ensures each bin has sufficient observations for reliable statistics
4. Monotonicity Enforcement (optional):
   - Ensures WoE values follow a consistent trend (increasing or decreasing)
   - Improves interpretability and aligns with business expectations
   - Selects the monotonicity direction that best preserves IV
5. Branch and Bound Optimization:
   - Iteratively merges bins with minimal IV contribution
   - Continues until the target number of bins is reached or convergence occurs
   - Preserves predictive power while reducing complexity
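A minimal sketch of the pre-binning phase in plain R (illustrative only, not the function's internal code; it assumes feature is the numeric vector being binned and uses max_n_prebins = 20):
# Illustrative pre-binning: cut points come from at most max_n_prebins quantiles,
# and duplicates are dropped to cope with features that have few unique values.
max_n_prebins <- 20
probs <- seq(0, 1, length.out = max_n_prebins + 1)
cuts  <- unique(quantile(feature, probs = probs, na.rm = TRUE))
cuts  <- cuts[-c(1, length(cuts))]            # keep interior cut points only
# findInterval() assigns each observation to its pre-bin via binary search;
# left.open = TRUE matches the "(a; b]" interval convention of the output.
prebin_id <- findInterval(feature, vec = cuts, left.open = TRUE) + 1L
table(prebin_id)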
Mathematical Foundation
The algorithm optimizes two key metrics:
Weight of Evidence (WoE) for bin \(i\): $$WoE_i = \ln\left(\frac{p_i/P}{n_i/N}\right)$$
Where:
\(p_i\): Number of positive cases in bin \(i\)
\(P\): Total number of positive cases
\(n_i\): Number of negative cases in bin \(i\)
\(N\): Total number of negative cases
Information Value (IV) for bin \(i\): $$IV_i = \left(\frac{p_i}{P} - \frac{n_i}{N}\right) \times WoE_i$$
The total Information Value is the sum across all bins: $$IV_{total} = \sum_{i=1}^{k} IV_i$$
Smoothing: The implementation applies Laplace smoothing to avoid undefined WoE values when a bin has zero counts, replacing the raw proportions \(p_i/P\) and \(n_i/N\) with: $$\frac{p_i + \alpha}{P + k\alpha}, \quad \frac{n_i + \alpha}{N + k\alpha}$$
Where:
\(\alpha\): Small constant (0.5 in this implementation)
\(k\): Number of bins
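The formulas above can be reproduced directly from bin counts. A minimal sketch, assuming vectors pos and neg of per-bin positive and negative counts and alpha = 0.5 as described above:
woe_iv <- function(pos, neg, alpha = 0.5) {
  k <- length(pos)
  # Laplace-smoothed class distributions per bin
  dist_pos <- (pos + alpha) / (sum(pos) + k * alpha)
  dist_neg <- (neg + alpha) / (sum(neg) + k * alpha)
  woe <- log(dist_pos / dist_neg)
  iv  <- (dist_pos - dist_neg) * woe
  list(woe = woe, iv = iv, total_iv = sum(iv))
}
# Example: three bins with positive/negative counts
woe_iv(pos = c(30, 50, 120), neg = c(270, 250, 180))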
Branch and Bound Strategy
The core optimization uses a greedy iterative approach (a simplified sketch appears after this list):
1. Start with more bins than needed (from pre-binning)
2. Identify the bin with the smallest IV contribution
3. Merge this bin with an adjacent bin
4. Recompute WoE and IV values
5. If monotonicity is required, enforce it
6. Repeat until the target number of bins is reached or convergence occurs
This approach minimizes information loss while reducing model complexity.
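A simplified R sketch of the greedy merge loop, ignoring monotonicity enforcement and convergence checks (woe_iv is the helper defined above; pos and neg are per-bin counts):
greedy_merge <- function(pos, neg, max_bins) {
  while (length(pos) > max_bins) {
    iv <- woe_iv(pos, neg)$iv
    i  <- which.min(iv)                             # bin with the smallest IV contribution
    j  <- if (i == length(pos)) i - 1L else i + 1L  # adjacent neighbour to merge with
    lo <- min(i, j); hi <- max(i, j)
    pos <- c(pos[seq_len(lo - 1L)], pos[lo] + pos[hi], pos[-seq_len(hi)])
    neg <- c(neg[seq_len(lo - 1L)], neg[lo] + neg[hi], neg[-seq_len(hi)])
  }
  list(pos = pos, neg = neg, stats = woe_iv(pos, neg))
}
# e.g. greedy_merge(pos = c(5, 30, 50, 120, 40), neg = c(95, 270, 250, 180, 60), max_bins = 3)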
References
Belson, W. A. (1959). Matching and prediction on the principle of biological classification. Journal of the Royal Statistical Society: Series C (Applied Statistics), 8(2), 65-75.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.
Thomas, L. C., Edelman, D. B., & Crook, J. N. (2002). Credit Scoring and Its Applications. Society for Industrial and Applied Mathematics.
Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization Techniques: A Recent Survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and Unsupervised Discretization of Continuous Features. Proceedings of the Twelfth International Conference on Machine Learning, 194-202.
Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039-1082.
Examples
if (FALSE) { # \dontrun{
# Generate synthetic data
set.seed(123)
n <- 10000
feature <- rnorm(n)
# Create target with logistic relationship
target <- rbinom(n, 1, plogis(0.5 * feature))
# Apply optimal binning
result <- optimal_binning_numerical_bb(target, feature, min_bins = 3, max_bins = 5)
print(result)
# Access specific components
bins <- result$bin
woe_values <- result$woe
total_iv <- result$total_iv
# Example with custom parameters
result2 <- optimal_binning_numerical_bb(
target = target,
feature = feature,
min_bins = 2,
max_bins = 8,
bin_cutoff = 0.02,
is_monotonic = TRUE
)
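# Usage sketch (an assumption, not part of the returned object): map feature
# values to WoE scores via the returned cutpoints. findInterval() assigns each
# value to a bin; left.open = TRUE mirrors the "(a; b]" interval labels.
bin_index   <- findInterval(feature, vec = result$cutpoints, left.open = TRUE) + 1L
feature_woe <- result$woe[bin_index]
head(feature_woe)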
} # }