This function implements optimal binning for numerical variables using the Minimum Description Length Principle (MDLP) with optional monotonicity constraints on the Weight of Evidence (WoE).

Usage

optimal_binning_numerical_fast_mdlpm(
  target,
  feature,
  min_bins = 2L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 100L,
  convergence_threshold = 1e-06,
  max_iterations = 1000L,
  force_monotonicity = TRUE
)

Arguments

target

Binary target variable (0/1)

feature

Numerical feature to be binned

min_bins

Minimum number of bins (default: 2)

max_bins

Maximum number of bins (default: 5)

bin_cutoff

Minimum relative frequency for a bin. Not fully implemented; reserved for future extensions.

max_n_prebins

Maximum number of pre-bins. Not fully implemented; reserved for future extensions.

convergence_threshold

Convergence threshold for monotonicity enforcement

max_iterations

Maximum number of iterations for monotonicity enforcement

force_monotonicity

Whether to enforce monotonicity of Weight of Evidence

Value

A list containing:

id

Bin identifiers

bin

Bin interval representations

woe

Weight of Evidence values for each bin

iv

Information Value components for each bin

count

Total count in each bin

count_pos

Positive count in each bin

count_neg

Negative count in each bin

cutpoints

Cut points between bins

converged

Whether the algorithm converged

iterations

Number of iterations performed
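
As a rough illustration of how the woe and iv elements relate to count_pos and count_neg, the sketch below recomputes them from per-bin counts. The helper `woe_iv_from_counts` is hypothetical and not part of the package. It assumes the common convention WoE = ln(share of positives / share of negatives); the package may use the opposite sign or smooth empty bins, so treat this as a sanity check rather than a description of the implementation.

```r
# Hypothetical helper (not part of the package): recompute per-bin WoE and IV
# from positive/negative counts.
# Assumption: WoE = ln(share of positives / share of negatives), no smoothing.
woe_iv_from_counts <- function(count_pos, count_neg) {
  dist_pos <- count_pos / sum(count_pos)  # share of all positives in each bin
  dist_neg <- count_neg / sum(count_neg)  # share of all negatives in each bin
  woe <- log(dist_pos / dist_neg)
  iv <- (dist_pos - dist_neg) * woe       # per-bin IV contribution
  list(woe = woe, iv = iv, total_iv = sum(iv))
}

# Made-up counts for three bins, for illustration only
check <- woe_iv_from_counts(count_pos = c(20, 50, 80), count_neg = c(80, 50, 20))
print(check)
```

Summing the iv element gives the total Information Value of the binned feature under this convention.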

Details

The algorithm recursively partitions the feature by choosing, at each step, the cut point that maximizes information gain. A candidate cut is kept only if it satisfies the MDLP criterion, which accepts the split when the information gain exceeds the cost of encoding the extra partition and so guards against over-splitting. When force_monotonicity = TRUE, the bins are further constrained so that the WoE values are monotonic across bins (strictly increasing or strictly decreasing), a property that is often desirable in credit risk modeling.
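
The acceptance test for a single cut can be sketched as follows, using the stopping rule from Fayyad and Irani (1993). The helpers `entropy`, `mdlp_accepts_cut`, and `is_monotonic` are hypothetical names introduced for illustration; the package's internal implementation may differ in detail, for example in how ties or empty partitions are handled.

```r
# Illustrative sketch only; all function names here are hypothetical.

# Shannon entropy (in bits) of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}

# MDLP test (Fayyad & Irani, 1993) for splitting (x, y) at `cut`:
# accept when gain > (log2(n - 1) + delta) / n
mdlp_accepts_cut <- function(x, y, cut) {
  left <- y[x <= cut]
  right <- y[x > cut]
  n <- length(y)
  if (length(left) == 0 || length(right) == 0) return(FALSE)

  ent_s <- entropy(y)
  ent_l <- entropy(left)
  ent_r <- entropy(right)
  gain <- ent_s - (length(left) / n) * ent_l - (length(right) / n) * ent_r

  k <- length(unique(y))       # classes present in the full set
  k1 <- length(unique(left))   # classes present on the left side
  k2 <- length(unique(right))  # classes present on the right side
  delta <- log2(3^k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)

  gain > (log2(n - 1) + delta) / n
}

# Strict monotonicity check for a vector of per-bin WoE values
is_monotonic <- function(woe) {
  d <- diff(woe)
  all(d > 0) || all(d < 0)
}

x <- 1:100
y_clean <- c(rep(0L, 50), rep(1L, 50))  # perfectly separated at x = 50
y_noise <- rep(c(0L, 1L), 50)           # alternating labels, no signal

accept_clean <- mdlp_accepts_cut(x, y_clean, cut = 50)
accept_noise <- mdlp_accepts_cut(x, y_noise, cut = 50)
```

In this toy data the cut at 50 separates the classes perfectly in `y_clean`, so the gain easily clears the threshold, whereas in `y_noise` the gain is zero and the cut is rejected.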

References

Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022-1027.

Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization techniques: A recent survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.

Examples

if (FALSE) { # \dontrun{
# Generate sample data
set.seed(123)
feature <- rnorm(1000)
target <- as.integer(feature + rnorm(1000) > 0)

# Apply optimal binning
result <- optimal_binning_numerical_fast_mdlpm(target, feature, min_bins = 3, max_bins = 5)

# Print results
print(result)

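# Create WoE transformation
# Maps each value of x to the WoE of the bin it falls into, using the cut
# points instead of parsing the bin labels.
# Assumption: bins are right-closed, i.e. (lower, upper]; set right = FALSE
# if the intervals in result$bin are closed on the left instead.
woe_transform <- function(x, cutpoints, woe_values) {
  breaks <- c(-Inf, cutpoints, Inf)
  idx <- cut(x, breaks = breaks, labels = FALSE, right = TRUE)
  woe_values[idx]  # NA inputs stay NA
}

# Apply the transformation to the original feature
feature_woe <- woe_transform(feature, result$cutpoints, result$woe)
head(feature_woe)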
} # }