
Performs optimal binning for numerical variables using equal-width intervals as a starting point, followed by a suite of optimization steps. This method balances predictive power and interpretability by creating statistically stable bins with a strong relationship to the target variable. The algorithm is particularly useful for risk modeling, credit scoring, and feature engineering in classification tasks.

Usage

optimal_binning_numerical_ewb(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  is_monotonic = TRUE,
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

Integer binary vector (0 or 1) representing the target variable.

feature

Numeric vector with the values of the feature to be binned.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum fraction of observations each bin must contain (default: 0.05).

max_n_prebins

Maximum number of pre-bins before optimization (default: 20).

is_monotonic

Logical indicating whether to enforce monotonicity in WoE (default: TRUE).

convergence_threshold

Convergence threshold for the optimization process (default: 1e-6).

max_iterations

Maximum number of iterations allowed (default: 1000).

Value

A list containing the following components (see the sketch after this list for combining the bin-level vectors into a summary table):

id

Numeric identifiers for each bin (1-based indexing).

bin

Character vector with the interval specification of each bin (e.g., "(-Inf;0.5]").

woe

Numeric vector with the Weight of Evidence values for each bin.

iv

Numeric vector with the Information Value contribution for each bin.

count

Integer vector with the total number of observations in each bin.

count_pos

Integer vector with the number of positive observations in each bin.

count_neg

Integer vector with the number of negative observations in each bin.

cutpoints

Numeric vector with the cut points between bins (excluding infinity).

converged

Logical value indicating whether the algorithm converged.

iterations

Number of iterations performed by the algorithm.

total_iv

Total Information Value of the binning solution.
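
Because the bin-level components are parallel vectors of equal length, they can be combined into a single summary table. A minimal sketch, assuming result is an object returned by the function (as in the Examples below):

# Assemble the parallel bin-level vectors returned by the function
# into one summary data frame (purely illustrative).
bin_summary <- data.frame(
  id        = result$id,
  bin       = result$bin,
  count     = result$count,
  count_pos = result$count_pos,
  count_neg = result$count_neg,
  woe       = result$woe,
  iv        = result$iv
)
bin_summary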

Details

Algorithm Overview

The implementation follows a multi-stage approach:

  1. Pre-processing:

    • Validation of inputs and handling of missing values

    • Special processing for features with few unique values

  2. Equal-Width Binning:

    • Division of the feature range into intervals of equal width (see the sketch after this list)

    • Initial assignment of observations to bins

  3. Statistical Optimization:

    • Merging of rare bins with frequencies below threshold

    • WoE monotonicity enforcement (optional)

    • Optimization to meet maximum bins constraint

  4. Metric Calculation:

    • Weight of Evidence (WoE) and Information Value (IV) computation
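
The equal-width split in stage 2 can be approximated in plain R. The following is only a sketch of the starting point, dividing the observed range into max_n_prebins intervals, not the package's exact implementation:

# Sketch of equal-width pre-binning (stage 2); not the package's exact code.
max_n_prebins <- 20
breaks <- seq(min(feature, na.rm = TRUE),
              max(feature, na.rm = TRUE),
              length.out = max_n_prebins + 1)
breaks[1] <- -Inf                     # open the first interval to the left
breaks[length(breaks)] <- Inf         # and the last one to the right
pre_bin <- cut(feature, breaks = breaks)
table(pre_bin)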

Mathematical Foundation

The algorithm uses two key metrics from information theory (a short sketch reproducing these calculations follows the list):

  1. Weight of Evidence (WoE) for bin \(i\): $$WoE_i = \ln\left(\frac{p_i/P}{n_i/N}\right)$$

    Where:

    • \(p_i\): Number of positive cases in bin \(i\)

    • \(P\): Total number of positive cases

    • \(n_i\): Number of negative cases in bin \(i\)

    • \(N\): Total number of negative cases

  2. Information Value (IV) for bin \(i\): $$IV_i = \left(\frac{p_i}{P} - \frac{n_i}{N}\right) \times WoE_i$$

    The total Information Value is the sum across all bins: $$IV_{total} = \sum_{i=1}^{k} IV_i$$

  3. Laplace Smoothing: To handle zero counts, the algorithm employs Laplace smoothing: $$\frac{p_i + \alpha}{P + k\alpha}, \frac{n_i + \alpha}{N + k\alpha}$$

    Where:

    • \(\alpha\): Smoothing factor (0.5 in this implementation)

    • \(k\): Number of bins
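
These formulas can be reproduced directly from bin counts. A minimal sketch, assuming count_pos and count_neg are vectors of per-bin positive and negative counts:

# WoE and IV per bin with Laplace smoothing (alpha = 0.5); illustrative only.
alpha <- 0.5
k <- length(count_pos)                        # number of bins
P <- sum(count_pos)                           # total positive cases
N <- sum(count_neg)                           # total negative cases
dist_pos <- (count_pos + alpha) / (P + k * alpha)
dist_neg <- (count_neg + alpha) / (N + k * alpha)
woe <- log(dist_pos / dist_neg)
iv  <- (dist_pos - dist_neg) * woe
total_iv <- sum(iv)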

Monotonicity Enforcement

When is_monotonic = TRUE, the algorithm ensures that WoE values either consistently increase or decrease across bins. This property is desirable for:

  • Interpretability: Monotonic relationships are easier to explain

  • Robustness: Reduces overfitting and improves stability

  • Business logic: Aligns with domain knowledge expectations

The algorithm determines the preferred monotonicity direction (increasing or decreasing) based on the initial bins and proceeds to merge bins that violate this pattern while minimizing information loss.
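
Whether the returned WoE values are indeed monotonic can be verified after the fact. The check below is an illustrative helper, not the merging logic the algorithm applies internally:

# Check whether a WoE vector is monotonic (non-decreasing or non-increasing).
is_woe_monotonic <- function(woe) {
  d <- diff(woe)
  all(d >= 0) || all(d <= 0)
}
is_woe_monotonic(result$woe)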

Handling Edge Cases

The algorithm includes special handling for:

  • Missing values (NaN)

  • Features with few unique values

  • Nearly constant features

  • Highly imbalanced target distributions

References

Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and Unsupervised Discretization of Continuous Features. Proceedings of the Twelfth International Conference on Machine Learning, 194-202.

García, S., Luengo, J., Sáez, J. A., López, V., & Herrera, F. (2013). A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734-750.

Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization Techniques: A Recent Survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.

Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.

Thomas, L. C. (2009). Consumer Credit Models: Pricing, Profit and Portfolios. Oxford University Press.

Zeng, Y. (2014). Univariate feature selection and binner. arXiv preprint arXiv:1410.5420.

Examples

if (FALSE) { # \dontrun{
# Generate synthetic data
set.seed(123)
target <- sample(0:1, 1000, replace = TRUE)
feature <- rnorm(1000)

# Basic usage
result <- optimal_binning_numerical_ewb(target, feature)
print(result)

# Custom parameters
result_custom <- optimal_binning_numerical_ewb(
  target = target,
  feature = feature,
  min_bins = 2,
  max_bins = 8,
  bin_cutoff = 0.03,
  is_monotonic = TRUE
)

# Extract cutpoints for use in prediction
cutpoints <- result$cutpoints

# Calculate total information value
total_iv <- result$total_iv
} # }
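
The returned cutpoints and woe vectors can also be reused to map new observations of the same feature onto WoE values. A hedged sketch, assuming the return structure documented above and the result object from the example:

# Sketch: apply the fitted binning to new data by reusing the returned
# cutpoints and WoE values (assumes result from the example above).
new_feature <- rnorm(100)
breaks <- c(-Inf, result$cutpoints, Inf)
bin_index <- cut(new_feature, breaks = breaks, labels = FALSE)
new_woe <- result$woe[bin_index]
head(new_woe)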