This function implements an optimal binning algorithm for numerical variables using an Unsupervised Decision Tree (UDT) approach with Weight of Evidence (WoE) and Information Value (IV) criteria. The algorithm creates bins that maximize the predictive power of the feature while maintaining interpretability.

Usage

optimal_binning_numerical_udt(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  laplace_smoothing = 0.5,
  monotonicity_direction = "none",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A numeric vector of feature values to be binned.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum frequency of observations in each bin as a proportion (default: 0.05).

max_n_prebins

Maximum number of pre-bins for initial discretization (default: 20).

laplace_smoothing

Smoothing parameter for WoE calculation to handle zero counts (default: 0.5).

monotonicity_direction

Specify monotonicity constraint: "none", "increasing", "decreasing", or "auto" (default: "none").

convergence_threshold

Threshold for convergence of the optimization process (default: 1e-6).

max_iterations

Maximum number of iterations for the optimization process (default: 1000).

Value

A list containing binning details:

id

A numeric vector of bin identifiers.

bin

A character vector of bin intervals.

woe

A numeric vector of Weight of Evidence values for each bin.

iv

A numeric vector of Information Value for each bin.

event_rate

A numeric vector of event rates (proportion of positives) for each bin.

count

An integer vector of total observations in each bin.

count_pos

An integer vector of positive observations in each bin.

count_neg

An integer vector of negative observations in each bin.

cutpoints

A numeric vector of cut points between bins.

total_iv

The total Information Value of the binning.

gini

The Gini coefficient measuring discrimination power.

ks

The Kolmogorov-Smirnov statistic measuring separation.

converged

A logical value indicating whether the algorithm converged.

iterations

An integer value of the number of iterations run.
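
The gini and ks entries can be recomputed from the per-bin counts in the usual way. The sketch below assumes result is the list returned by optimal_binning_numerical_udt, that bins are ordered by feature value, and that the standard trapezoidal/ROC conventions apply; the package's internal conventions may differ in detail.

# Recompute KS and Gini from the per-bin counts (illustrative sketch)
cum_pos <- cumsum(result$count_pos) / sum(result$count_pos)
cum_neg <- cumsum(result$count_neg) / sum(result$count_neg)
ks <- max(abs(cum_pos - cum_neg))

# Gini = |2 * AUC - 1|, with AUC from the trapezoidal rule on the ROC points
x <- c(0, cum_neg)
y <- c(0, cum_pos)
auc  <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
gini <- abs(2 * auc - 1)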

Details

The Unsupervised Decision Tree (UDT) binning algorithm discretizes a continuous variable into bins that maximize the Information Value (IV) while respecting constraints on the number and size of bins.

The algorithm follows these main steps (a simplified sketch in R appears after the list):

  1. Initial discretization using an entropy-based decision tree approach

  2. Merging of rare bins based on the bin_cutoff parameter

  3. Bin optimization using IV and WoE criteria

  4. Optional enforcement of monotonicity in WoE across bins

  5. Adjustment of the number of bins to be within the specified range
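
The following is a minimal sketch of steps 1-3 in R. It assumes quantile-based pre-binning in place of the entropy-based tree splits used internally, applies Laplace smoothing to the class counts, and omits monotonicity enforcement and the final bin-count adjustment; the helper name sketch_udt_binning is illustrative, not part of the package.

# Simplified sketch of steps 1-3 (illustrative only; not the internal implementation)
sketch_udt_binning <- function(target, feature, max_n_prebins = 20,
                               bin_cutoff = 0.05, eps = 0.5) {
  # Step 1: initial discretization into pre-bins (quantiles stand in for tree splits)
  breaks <- unique(quantile(feature,
                            probs = seq(0, 1, length.out = max_n_prebins + 1),
                            na.rm = TRUE))
  bins <- cut(feature, breaks = breaks, include.lowest = TRUE)

  # Step 2: merge rare bins (relative frequency below bin_cutoff) into a neighbour
  repeat {
    freq <- table(bins) / length(bins)
    if (all(freq >= bin_cutoff) || nlevels(bins) <= 2) break
    rare      <- which.min(freq)
    neighbour <- if (rare < nlevels(bins)) rare + 1L else rare - 1L
    levels(bins)[c(rare, neighbour)] <- paste(levels(bins)[rare],
                                              levels(bins)[neighbour])
  }

  # Step 3: WoE and IV per bin with Laplace-smoothed class counts
  pos <- tapply(target == 1, bins, sum)
  neg <- tapply(target == 0, bins, sum)
  k   <- nlevels(bins)
  p   <- (pos + eps) / (sum(pos) + k * eps)
  q   <- (neg + eps) / (sum(neg) + k * eps)
  woe <- log(p / q)
  data.frame(bin = levels(bins), count = as.integer(pos + neg),
             woe = woe, iv = (p - q) * woe)
}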

The mathematical formulation of the optimization problem is:

$$ \max_{\{c_1, c_2, ..., c_{m-1}\}} \sum_{i=1}^{m} (p_i - q_i) \cdot \ln\left(\frac{p_i + \epsilon}{q_i + \epsilon}\right) $$

Subject to:

  • \(min\_bins \leq m \leq max\_bins\)

  • \(\frac{n_i}{n} \geq bin\_cutoff\) for all i

  • Optionally, \(WoE_1 \leq WoE_2 \leq ... \leq WoE_m\) (for increasing monotonicity)

Where:

  • \(p_i = \frac{n_{i,1}}{n_1}\) is the share of all positive observations that fall in bin i

  • \(q_i = \frac{n_{i,0}}{n_0}\) is the share of all negative observations that fall in bin i

  • \(\epsilon\) is the Laplace smoothing parameter (laplace_smoothing)
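
For concreteness, the objective above can be evaluated directly for a candidate set of cut points. This is a minimal sketch that transcribes the formula; the helper name iv_objective is hypothetical and it is not the internal optimizer.

# Smoothed IV objective for a candidate set of cut points (formula transcription)
iv_objective <- function(target, feature, cutpoints, eps = 0.5) {
  bin <- findInterval(feature, sort(cutpoints)) + 1L  # bin index 1..m
  n1  <- sum(target == 1)
  n0  <- sum(target == 0)
  per_bin <- vapply(split(target, bin), function(t) {
    p <- sum(t == 1) / n1  # p_i: share of all positives falling in this bin
    q <- sum(t == 0) / n0  # q_i: share of all negatives falling in this bin
    (p - q) * log((p + eps) / (q + eps))
  }, numeric(1))
  sum(per_bin)
}

# Example: total smoothed IV for two cut points at -0.5 and 0.5
# iv_objective(target, feature, cutpoints = c(-0.5, 0.5))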

The algorithm includes special handling for missing values (NA/NaN) and extreme values (±Inf), as well as proper treatment of variables with very few unique values.
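
As an illustration of that kind of pre-treatment, a caller could drop missing pairs and clamp infinite values before binning; this is a hypothetical preprocessing step shown only to make the idea concrete, not something the function requires.

# Hypothetical pre-treatment mirroring the special handling described above
keep    <- !is.na(feature) & !is.na(target)   # is.na() also catches NaN
feat_ok <- feature[keep]
targ_ok <- target[keep]
finite  <- feat_ok[is.finite(feat_ok)]
feat_ok[feat_ok ==  Inf] <- max(finite)  # clamp +Inf to the largest finite value
feat_ok[feat_ok == -Inf] <- min(finite)  # clamp -Inf to the smallest finite value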

Examples

if (FALSE) { # \dontrun{
# Generate sample data
set.seed(123)
n <- 10000
feature <- rnorm(n)
target <- rbinom(n, 1, plogis(0.5 * feature))

# Apply optimal binning
result <- optimal_binning_numerical_udt(
  target, feature, 
  min_bins = 3, 
  max_bins = 5,
  monotonicity_direction = "auto",
  laplace_smoothing = 0.5
)

# View binning results
print(result)
} # }
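
Continuing the example above, the returned cutpoints can be used to map new observations onto the fitted bins and their WoE values. This sketch assumes cutpoints is sorted and that length(woe) equals the number of bins (length(cutpoints) + 1), as documented in the Value section.

# Assign new observations to the fitted bins and attach the corresponding WoE
# (sketch; run after the example above so that `result` exists)
new_feature <- rnorm(100)
bin_index   <- findInterval(new_feature, result$cutpoints) + 1L
new_woe     <- result$woe[bin_index]
head(data.frame(new_feature, bin = result$bin[bin_index], woe = new_woe))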