
This function implements an optimal binning algorithm for numerical variables using Monotonic Risk Binning with Likelihood Ratio Pre-binning (MRBLP). It transforms a continuous feature into discrete bins whose Weight of Evidence (WoE) is monotonic with respect to the target, while aiming to maximize predictive power as measured by Information Value (IV).

Usage

optimal_binning_numerical_mrblp(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  convergence_threshold = 1e-06,
  max_iterations = 1000L,
  laplace_smoothing = 0.5
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A numeric vector of the continuous feature to be binned.

min_bins

Integer. The minimum number of bins to create (default: 3).

max_bins

Integer. The maximum number of bins to create (default: 5).

bin_cutoff

Numeric. The minimum proportion of all observations that each bin must contain; bins below this proportion are merged (default: 0.05).

max_n_prebins

Integer. The maximum number of pre-bins to create during the initial binning step (default: 20).

convergence_threshold

Numeric. The threshold for convergence in the monotonic binning step (default: 1e-6).

max_iterations

Integer. The maximum number of iterations for the monotonic binning step (default: 1000).

laplace_smoothing

Numeric. Laplace smoothing parameter \(\alpha\) added to the bin counts in the WoE calculation (default: 0.5).

Value

A list containing the following elements:

id

Bin identifiers (1-based)

bin

A character vector of bin ranges

woe

A numeric vector of Weight of Evidence (WoE) values for each bin

iv

A numeric vector of Information Value (IV) for each bin

count

An integer vector of the total count of observations in each bin

count_pos

An integer vector of the count of positive observations in each bin

count_neg

An integer vector of the count of negative observations in each bin

event_rate

A numeric vector with the proportion of positive cases in each bin

cutpoints

A numeric vector of cutpoints used to create the bins

total_iv

The total Information Value of all bins combined

converged

A logical value indicating whether the algorithm converged

iterations

An integer value indicating the number of iterations run
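Several of these fields are related by definition: `count` should equal `count_pos + count_neg`, `event_rate` should equal `count_pos / count`, and `total_iv` should equal the sum of `iv`. The helper below is a hypothetical sketch (the name `binning_table` and its `tol` argument are not part of the package) that checks these relations, up to a floating-point tolerance, and returns the per-bin fields as a data frame.

```r
# Hypothetical helper (not part of the package): check the internal consistency
# of a result list with the fields documented above, then tabulate it per bin.
binning_table <- function(res, tol = 1e-8) {
  stopifnot(
    all(res$count == res$count_pos + res$count_neg),            # counts add up
    all(abs(res$event_rate - res$count_pos / res$count) < tol), # event rate
    abs(res$total_iv - sum(res$iv)) < tol                       # IV sums to total
  )
  data.frame(id = res$id, bin = res$bin, count = res$count,
             count_pos = res$count_pos, count_neg = res$count_neg,
             event_rate = res$event_rate, woe = res$woe, iv = res$iv)
}
```

For example, `binning_table(result)` with `result` from the Examples section; it stops with an error if any relation fails.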

Details

Mathematical Framework:

Weight of Evidence (WoE): For bin \(i\) with Laplace smoothing parameter \(\alpha\): $$WoE_i = \ln\left(\frac{n_{1i} + \alpha}{n_{1} + m\alpha} \cdot \frac{n_{0} + m\alpha}{n_{0i} + \alpha}\right)$$ where:

  • \(n_{1i}\) is the count of positive cases in bin \(i\)

  • \(n_{0i}\) is the count of negative cases in bin \(i\)

  • \(n_{1}\) is the total count of positive cases

  • \(n_{0}\) is the total count of negative cases

  • \(m\) is the number of bins

  • \(\alpha\) is the Laplace smoothing parameter

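Information Value (IV): Summarizes predictive power across all bins: $$IV = \sum_{i=1}^{m} \left(P(X \in \text{bin } i \mid Y=1) - P(X \in \text{bin } i \mid Y=0)\right) \times WoE_i$$ where \(P(X \in \text{bin } i \mid Y=1)\) and \(P(X \in \text{bin } i \mid Y=0)\) are the shares of positive and negative cases, respectively, that fall in bin \(i\).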
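Both formulas can be evaluated directly from per-bin counts. The sketch below is a plain base-R illustration, not the package's internal code; it assumes the IV uses the same smoothed proportions as the WoE formula, and the function name `woe_iv` is hypothetical.

```r
# Hypothetical sketch: smoothed WoE and IV per bin, following the formulas above.
# n1i, n0i: positive and negative counts per bin; alpha: Laplace smoothing.
woe_iv <- function(n1i, n0i, alpha = 0.5) {
  m  <- length(n1i)                       # number of bins
  n1 <- sum(n1i)                          # total positive cases
  n0 <- sum(n0i)                          # total negative cases
  p1 <- (n1i + alpha) / (n1 + m * alpha)  # smoothed share of positives per bin
  p0 <- (n0i + alpha) / (n0 + m * alpha)  # smoothed share of negatives per bin
  woe <- log(p1 / p0)                     # equals the WoE_i expression above
  iv  <- (p1 - p0) * woe                  # per-bin IV contribution (never negative)
  list(woe = woe, iv = iv, total_iv = sum(iv))
}

# Toy counts for three bins; the first bin has no positive cases
res <- woe_iv(n1i = c(0, 10, 30), n0i = c(50, 40, 20), alpha = 0.5)
res$woe                                               # finite and increasing
woe_iv(c(0, 10, 30), c(50, 40, 20), alpha = 0)$woe[1] # -Inf without smoothing
```

With `alpha = 0` a bin with no positive (or no negative) cases gets an infinite WoE, which is the situation the Laplace smoothing avoids.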

Algorithm Steps:

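  1. Pre-binning: Initial bins are created using equal-frequency binning, with at most max_n_prebins bins.

  2. Merge Small Bins: Bins containing less than the bin_cutoff proportion of observations are merged.

  3. Enforce Monotonicity: Adjacent bins that violate monotonicity in WoE are merged, iterating until convergence (convergence_threshold) or until max_iterations is reached.

  4. Adjust Bin Count: Bins are merged or split so that the final number of bins lies between min_bins and max_bins.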

  5. Calculate Metrics: Final WoE and IV values are computed with Laplace smoothing.
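The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: bins are right-closed intervals, pre-bins come from equal-frequency quantiles, a small bin is merged into its smaller neighbor, the monotonic direction is the sign of the last WoE minus the first, excess bins are merged at the adjacent pair with the closest WoE, and no bins are split to reach `min_bins`. `convergence_threshold` and `max_iterations` are omitted because every merge removes a bin, so each loop terminates. All function names are hypothetical.

```r
# Hypothetical, simplified sketch of the MRBLP steps (not the package code).
# A binning `b` holds interior cutpoints `cuts` and per-bin counts `pos`, `neg`.

# Merge bin i with bin i + 1: add their counts and drop the cutpoint between them.
merge_bins <- function(b, i) {
  b$pos[i] <- b$pos[i] + b$pos[i + 1]
  b$neg[i] <- b$neg[i] + b$neg[i + 1]
  b$pos <- b$pos[-(i + 1)]
  b$neg <- b$neg[-(i + 1)]
  b$cuts <- b$cuts[-i]
  b
}

# Laplace-smoothed per-bin shares of positives (p1) and negatives (p0).
smoothed_dist <- function(b, alpha) {
  m <- length(b$pos)
  list(p1 = (b$pos + alpha) / (sum(b$pos) + m * alpha),
       p0 = (b$neg + alpha) / (sum(b$neg) + m * alpha))
}

bin_woe <- function(b, alpha) {
  d <- smoothed_dist(b, alpha)
  log(d$p1 / d$p0)
}

mrblp_sketch <- function(target, feature, min_bins = 3L, max_bins = 5L,
                         bin_cutoff = 0.05, max_n_prebins = 20L,
                         laplace_smoothing = 0.5) {
  alpha <- laplace_smoothing

  # 1. Pre-binning: interior equal-frequency quantiles become cutpoints.
  probs <- seq(0, 1, length.out = max_n_prebins + 1)
  probs <- probs[-c(1, length(probs))]
  cuts <- unique(quantile(feature, probs, names = FALSE))
  idx <- findInterval(feature, cuts, left.open = TRUE) + 1L
  k <- length(cuts) + 1L
  b <- list(cuts = cuts,
            pos = tabulate(idx[target == 1], nbins = k),
            neg = tabulate(idx[target == 0], nbins = k))

  # 2. Merge small bins: fold the smallest bin into its smaller neighbor until
  #    every bin holds at least bin_cutoff of the data or min_bins is reached.
  repeat {
    n_bin <- b$pos + b$neg
    m <- length(n_bin)
    if (m <= min_bins || min(n_bin) / sum(n_bin) >= bin_cutoff) break
    i <- which.min(n_bin)
    j <- if (i == 1) 1 else if (i == m) m - 1 else
      if (n_bin[i - 1] <= n_bin[i + 1]) i - 1 else i
    b <- merge_bins(b, j)
  }

  # 3. Enforce monotonicity: merge the first adjacent pair whose WoE step goes
  #    against the overall direction; stops at min_bins even if violations remain.
  repeat {
    woe <- bin_woe(b, alpha)
    if (length(woe) <= min_bins) break
    bad <- which(diff(woe) * sign(woe[length(woe)] - woe[1]) < 0)
    if (length(bad) == 0) break
    b <- merge_bins(b, bad[1])
  }

  # 4. Adjust bin count: while above max_bins, merge the adjacent pair with the
  #    closest WoE values (this sketch never splits bins).
  while (length(b$pos) > max_bins) {
    b <- merge_bins(b, which.min(abs(diff(bin_woe(b, alpha)))))
  }

  # 5. Final metrics with Laplace smoothing.
  d <- smoothed_dist(b, alpha)
  woe <- log(d$p1 / d$p0)
  iv <- (d$p1 - d$p0) * woe
  list(cutpoints = b$cuts, count_pos = b$pos, count_neg = b$neg,
       woe = woe, iv = iv, total_iv = sum(iv))
}
```

Called as `mrblp_sketch(target, feature)` on the sample data from the Examples section, it returns at most `max_bins` bins; merging only adjacent bins keeps every bin a contiguous interval of the feature.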


Examples

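# Generate sample data: the log-odds of the target increase with the feature
set.seed(42)
n <- 10000
feature <- rnorm(n)
target <- rbinom(n, 1, plogis(0.5 + 0.5 * feature))

# Run optimal binning with the default settings
result <- optimal_binning_numerical_mrblp(target, feature)

# View the per-bin results as a table, plus the total IV
print(data.frame(bin = result$bin, count = result$count,
                 event_rate = result$event_rate,
                 woe = result$woe, iv = result$iv))
cat("Total IV:", result$total_iv, "\n")

# Plot Weight of Evidence against bins
plot(result$woe, type = "b", xlab = "Bin", ylab = "WoE",
     main = "Weight of Evidence by Bin")
abline(h = 0, lty = 2)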
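To apply a fitted binning to new data, each value can be located among `cutpoints` and mapped to the corresponding `woe`. The sketch below assumes the bins are right-closed intervals (-Inf, c1], (c1, c2], ..., (ck, Inf) defined by the finite cutpoints; check the labels in `bin` to confirm the convention before relying on it. `apply_woe` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: map feature values to the WoE of their bin, assuming
# right-closed bins (-Inf, c1], (c1, c2], ..., (ck, Inf).
apply_woe <- function(x, cutpoints, woe) {
  cutpoints <- sort(cutpoints[is.finite(cutpoints)])  # ignore -Inf/Inf endpoints
  stopifnot(length(woe) == length(cutpoints) + 1L)
  # left.open = TRUE counts cutpoints strictly below x, so x == c_j lands in bin j
  idx <- findInterval(x, cutpoints, left.open = TRUE) + 1L
  woe[idx]                                            # NA inputs stay NA
}
```

With the fitted example above, `apply_woe(feature, result$cutpoints, result$woe)` would return one WoE value per observation, ready to use as a model input.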