Optimal Binning for Numerical Variables using Equal-Width Binning
optimal_binning_numerical_ewb.Rd
Performs optimal binning for numerical variables using equal-width intervals as a starting point, followed by a series of statistical optimization steps. This method balances predictive power and interpretability by creating statistically stable bins with a strong relationship to the target variable. The algorithm is particularly useful for risk modeling, credit scoring, and feature engineering in classification tasks.
Usage
optimal_binning_numerical_ewb(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
is_monotonic = TRUE,
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
Integer binary vector (0 or 1) representing the target variable.
- feature
Numeric vector with the values of the feature to be binned.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum fraction of observations each bin must contain (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20).
- is_monotonic
Logical indicating whether to enforce monotonicity in WoE (default: TRUE).
- convergence_threshold
Convergence threshold for the optimization process (default: 1e-6).
- max_iterations
Maximum number of iterations allowed (default: 1000).
Value
A list containing:
- id
Numeric identifiers for each bin (1-based indexing).
- bin
Character vector with the interval specification of each bin (e.g., "(-Inf;0.5]").
- woe
Numeric vector with the Weight of Evidence values for each bin.
- iv
Numeric vector with the Information Value contribution for each bin.
- count
Integer vector with the total number of observations in each bin.
- count_pos
Integer vector with the number of positive observations in each bin.
- count_neg
Integer vector with the number of negative observations in each bin.
- cutpoints
Numeric vector with the cut points between bins (excluding infinity).
- converged
Logical value indicating whether the algorithm converged.
- iterations
Number of iterations performed by the algorithm.
- total_iv
Total Information Value of the binning solution.
Details
Algorithm Overview
The implementation follows a multi-stage approach:
1. Pre-processing: validation of inputs, handling of missing values, and special treatment of features with few unique values.
2. Equal-width binning: division of the feature range into intervals of equal width and initial assignment of observations to bins (see the sketch after this list).
3. Statistical optimization: merging of rare bins whose frequency falls below bin_cutoff, optional enforcement of WoE monotonicity, and further merging to satisfy the max_bins constraint.
4. Metric calculation: computation of Weight of Evidence (WoE) and Information Value (IV) for each bin.
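As a rough illustration of the equal-width pre-binning stage, the base R sketch below divides a feature's range into equally wide intervals and assigns observations to them. It is a simplified stand-in, not the function's internal implementation; feature and n_prebins are example placeholders for the feature argument and max_n_prebins.

# Illustrative sketch of equal-width pre-binning (not the package's internal code)
set.seed(42)
feature <- rnorm(1000)
n_prebins <- 20

# Divide the observed range into intervals of equal width
breaks <- seq(min(feature), max(feature), length.out = n_prebins + 1)
breaks[1] <- -Inf                 # open the first bin to the left
breaks[length(breaks)] <- Inf     # open the last bin to the right

# Initial assignment of observations to pre-bins (right-closed intervals)
pre_bin <- cut(feature, breaks = breaks, right = TRUE)
table(pre_bin)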
Mathematical Foundation
The algorithm uses two key metrics from information theory:
Weight of Evidence (WoE) for bin \(i\): $$WoE_i = \ln\left(\frac{p_i/P}{n_i/N}\right)$$
where:
- \(p_i\): number of positive cases in bin \(i\)
- \(P\): total number of positive cases
- \(n_i\): number of negative cases in bin \(i\)
- \(N\): total number of negative cases
Information Value (IV) for bin \(i\): $$IV_i = \left(\frac{p_i}{P} - \frac{n_i}{N}\right) \times WoE_i$$
The total Information Value is the sum across all bins: $$IV_{total} = \sum_{i=1}^{k} IV_i$$
Laplace Smoothing: To handle zero counts, the algorithm employs Laplace smoothing: $$\frac{p_i + \alpha}{P + k\alpha}, \frac{n_i + \alpha}{N + k\alpha}$$
where:
- \(\alpha\): smoothing factor (0.5 in this implementation)
- \(k\): number of bins
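For illustration, the formulas above can be reproduced from per-bin counts in a few lines of R. The counts below are made-up example values, and alpha = 0.5 matches the smoothing factor stated above; this is a didactic sketch, not the function's internal code.

# Didactic computation of WoE and IV from bin counts with Laplace smoothing
count_pos <- c(30, 55, 80, 40)     # p_i: positives per bin (example values)
count_neg <- c(170, 145, 120, 60)  # n_i: negatives per bin (example values)

alpha <- 0.5
k <- length(count_pos)             # number of bins
P <- sum(count_pos)                # total positives
N <- sum(count_neg)                # total negatives

# Smoothed class distributions per bin
dist_pos <- (count_pos + alpha) / (P + k * alpha)
dist_neg <- (count_neg + alpha) / (N + k * alpha)

woe <- log(dist_pos / dist_neg)
iv  <- (dist_pos - dist_neg) * woe
total_iv <- sum(iv)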
Monotonicity Enforcement
When is_monotonic = TRUE, the algorithm ensures that WoE values either consistently increase or consistently decrease across bins. This property is desirable for:
- Interpretability: monotonic relationships are easier to explain
- Robustness: reduces overfitting and improves stability
- Business logic: aligns with domain knowledge expectations
The algorithm determines the preferred monotonicity direction (increasing or decreasing) based on the initial bins and proceeds to merge bins that violate this pattern while minimizing information loss.
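A minimal sketch of the kind of monotonicity check described above is shown below; the merge step is only indicated in a comment, and this illustrates the idea rather than the package's actual merging logic.

# Simplified illustration of a WoE monotonicity check (not the package's merging logic)
is_monotone <- function(woe) {
  d <- diff(woe)
  all(d >= 0) || all(d <= 0)
}

woe <- c(-0.8, -0.2, 0.1, -0.05, 0.6)
is_monotone(woe)   # FALSE: bin 4 breaks the increasing pattern

# A merge-based fix would combine the violating bin with a neighbour
# (recomputing counts and WoE) and repeat until is_monotone() returns TRUE.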
References
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and Unsupervised Discretization of Continuous Features. Proceedings of the Twelfth International Conference on Machine Learning, 194-202.
García, S., Luengo, J., Sáez, J. A., López, V., & Herrera, F. (2013). A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734-750.
Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization Techniques: A Recent Survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.
Thomas, L. C. (2009). Consumer Credit Models: Pricing, Profit and Portfolios. Oxford University Press.
Zeng, Y. (2014). Univariate feature selection and binner. arXiv preprint arXiv:1410.5420.
Examples
if (FALSE) { # \dontrun{
# Generate synthetic data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- rnorm(1000)
# Basic usage
result <- optimal_binning_numerical_ewb(target, feature)
print(result)
# Custom parameters
result_custom <- optimal_binning_numerical_ewb(
target = target,
feature = feature,
min_bins = 2,
max_bins = 8,
bin_cutoff = 0.03,
is_monotonic = TRUE
)
# Extract cutpoints for use in prediction
cutpoints <- result$cutpoints
# Calculate total information value
total_iv <- result$total_iv
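# Illustrative addition (not part of the original example): use the returned
# cutpoints to assign new observations to the same bins.
# left.open = TRUE mirrors the right-closed intervals "(a;b]".
new_feature <- rnorm(100)
new_bin_id <- findInterval(new_feature, cutpoints, left.open = TRUE) + 1L
table(new_bin_id)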
} # }