Optimal Binning for Numerical Variables using Branch and Bound Algorithm
optimal_binning_numerical_bb.Rd
Performs optimal binning for numerical variables using a Branch and Bound approach. This method transforms continuous features into discrete bins by maximizing the statistical relationship with a binary target variable while maintaining interpretability constraints. The algorithm optimizes Weight of Evidence (WoE) and Information Value (IV) metrics commonly used in risk modeling, credit scoring, and statistical analysis.
Usage
optimal_binning_numerical_bb(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
is_monotonic = TRUE,
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
An integer binary vector (0 or 1) representing the target variable.
- feature
A numeric vector of feature values to be binned.
- min_bins
Minimum number of bins to generate (default: 3).
- max_bins
Maximum number of bins to generate (default: 5).
- bin_cutoff
Minimum frequency fraction for each bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins generated before optimization (default: 20).
- is_monotonic
Logical value indicating whether to enforce monotonicity in WoE (default: TRUE).
- convergence_threshold
Convergence threshold for total Information Value (IV) change (default: 1e-6).
- max_iterations
Maximum number of iterations allowed for the optimization process (default: 1000).
Value
A list containing:
- id
Numeric identifiers for each bin (1-based).
- bin
Character vector with the intervals of each bin (e.g., "(-Inf; 0]", "(0; +Inf)").
- woe
Numeric vector with the Weight of Evidence values for each bin.
- iv
Numeric vector with the Information Value contribution for each bin.
- count
Integer vector with the total number of observations in each bin.
- count_pos
Integer vector with the number of positive observations in each bin.
- count_neg
Integer vector with the number of negative observations in each bin.
- cutpoints
Numeric vector of cut points between bins (excluding infinity).
- converged
Logical value indicating whether the algorithm converged.
- iterations
Number of iterations executed by the optimization algorithm.
- total_iv
The total Information Value of the binning solution.
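The bin-level components are parallel vectors (one element per bin), so they can be combined into a single summary table. A minimal sketch, assuming result holds the list returned by the function:
# Assemble the bin-level vectors into one summary data frame
bin_table <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)
bin_table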
Details
Algorithm Overview
The implementation follows a five-phase approach:
1. Input Validation: Ensures data integrity and parameter validity.
2. Pre-Binning (see the sketch after this list):
   - Creates initial bins using quantile-based division
   - Handles special cases for features with few unique values
   - Uses binary search for efficient observation assignment
3. Statistical Stabilization:
   - Merges bins with frequencies below the specified threshold
   - Ensures each bin has sufficient observations for reliable statistics
4. Monotonicity Enforcement (optional):
   - Ensures WoE values follow a consistent trend (increasing or decreasing)
   - Improves interpretability and aligns with business expectations
   - Selects the monotonicity direction that best preserves IV
5. Branch and Bound Optimization:
   - Iteratively merges bins with minimal IV contribution
   - Continues until the target number of bins is reached or convergence occurs
   - Preserves predictive power while reducing complexity
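A minimal sketch of the pre-binning phase in plain R (illustrative only, not the function's internal code; it assumes feature is the numeric vector being binned and uses max_n_prebins = 20):
# Illustrative pre-binning: cut points come from at most max_n_prebins quantiles,
# and duplicates are dropped to cope with features that have few unique values.
max_n_prebins <- 20
probs <- seq(0, 1, length.out = max_n_prebins + 1)
cuts  <- unique(quantile(feature, probs = probs, na.rm = TRUE))
cuts  <- cuts[-c(1, length(cuts))]            # keep interior cut points only
# findInterval() assigns each observation to its pre-bin via binary search;
# left.open = TRUE matches the "(a; b]" interval convention of the output.
prebin_id <- findInterval(feature, vec = cuts, left.open = TRUE) + 1L
table(prebin_id)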
Mathematical Foundation
The algorithm optimizes two key metrics:
Weight of Evidence (WoE) for bin \(i\): $$WoE_i = \ln\left(\frac{p_i/P}{n_i/N}\right)$$
Where:
\(p_i\): Number of positive cases in bin \(i\)
\(P\): Total number of positive cases
\(n_i\): Number of negative cases in bin \(i\)
\(N\): Total number of negative cases
Information Value (IV) for bin \(i\): $$IV_i = \left(\frac{p_i}{P} - \frac{n_i}{N}\right) \times WoE_i$$
The total Information Value is the sum across all bins: $$IV_{total} = \sum_{i=1}^{k} IV_i$$
Smoothing: The implementation applies Laplace smoothing to avoid undefined WoE values when a bin has zero counts, replacing the raw proportions \(p_i/P\) and \(n_i/N\) with: $$\frac{p_i + \alpha}{P + k\alpha}, \quad \frac{n_i + \alpha}{N + k\alpha}$$
Where:
\(\alpha\): Small constant (0.5 in this implementation)
\(k\): Number of bins
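The formulas above can be reproduced directly from bin counts. A minimal sketch, assuming vectors pos and neg of per-bin positive and negative counts and alpha = 0.5 as described above:
woe_iv <- function(pos, neg, alpha = 0.5) {
  k <- length(pos)
  # Laplace-smoothed class distributions per bin
  dist_pos <- (pos + alpha) / (sum(pos) + k * alpha)
  dist_neg <- (neg + alpha) / (sum(neg) + k * alpha)
  woe <- log(dist_pos / dist_neg)
  iv  <- (dist_pos - dist_neg) * woe
  list(woe = woe, iv = iv, total_iv = sum(iv))
}
# Example: three bins with positive/negative counts
woe_iv(pos = c(30, 50, 120), neg = c(270, 250, 180))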
Branch and Bound Strategy
The core optimization uses a greedy iterative approach (a simplified sketch appears after this list):
1. Start with more bins than needed (from pre-binning)
2. Identify the bin with the smallest IV contribution
3. Merge this bin with an adjacent bin
4. Recompute WoE and IV values
5. If monotonicity is required, enforce it
6. Repeat until the target number of bins is reached or convergence occurs
This approach minimizes information loss while reducing model complexity.
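A simplified R sketch of the greedy merge loop, ignoring monotonicity enforcement and convergence checks (woe_iv is the helper defined above; pos and neg are per-bin counts):
greedy_merge <- function(pos, neg, max_bins) {
  while (length(pos) > max_bins) {
    iv <- woe_iv(pos, neg)$iv
    i  <- which.min(iv)                             # bin with the smallest IV contribution
    j  <- if (i == length(pos)) i - 1L else i + 1L  # adjacent neighbour to merge with
    lo <- min(i, j); hi <- max(i, j)
    pos <- c(pos[seq_len(lo - 1L)], pos[lo] + pos[hi], pos[-seq_len(hi)])
    neg <- c(neg[seq_len(lo - 1L)], neg[lo] + neg[hi], neg[-seq_len(hi)])
  }
  list(pos = pos, neg = neg, stats = woe_iv(pos, neg))
}
# e.g. greedy_merge(pos = c(5, 30, 50, 120, 40), neg = c(95, 270, 250, 180, 60), max_bins = 3)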
References
Belson, W. A. (1959). Matching and prediction on the principle of biological classification. Journal of the Royal Statistical Society: Series C (Applied Statistics), 8(2), 65-75.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.
Thomas, L. C., Edelman, D. B., & Crook, J. N. (2002). Credit Scoring and Its Applications. Society for Industrial and Applied Mathematics.
Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization Techniques: A Recent Survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and Unsupervised Discretization of Continuous Features. Proceedings of the Twelfth International Conference on Machine Learning, 194-202.
Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039-1082.
Examples
if (FALSE) { # \dontrun{
# Generate synthetic data
set.seed(123)
n <- 10000
feature <- rnorm(n)
# Create target with logistic relationship
target <- rbinom(n, 1, plogis(0.5 * feature))
# Apply optimal binning
result <- optimal_binning_numerical_bb(target, feature, min_bins = 3, max_bins = 5)
print(result)
# Access specific components
bins <- result$bin
woe_values <- result$woe
total_iv <- result$total_iv
# Example with custom parameters
result2 <- optimal_binning_numerical_bb(
target = target,
feature = feature,
min_bins = 2,
max_bins = 8,
bin_cutoff = 0.02,
is_monotonic = TRUE
)
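# Usage sketch (an assumption, not part of the returned object): map feature
# values to WoE scores via the returned cutpoints. findInterval() assigns each
# value to a bin; left.open = TRUE mirrors the "(a; b]" interval labels.
bin_index   <- findInterval(feature, vec = result$cutpoints, left.open = TRUE) + 1L
feature_woe <- result$woe[bin_index]
head(feature_woe)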
} # }