Optimal Binning for Numerical Variables using MDLP with Monotonicity — optimal_binning_numerical_fast

This function implements optimal binning for numerical variables using the Minimum Description Length Principle (MDLP) with optional monotonicity constraints on the Weight of Evidence (WoE).

Usage

optimal_binning_numerical_fast_mdlpm(
  target,
  feature,
  min_bins = 2L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 100L,
  convergence_threshold = 1e-06,
  max_iterations = 1000L,
  force_monotonicity = TRUE
)

Arguments

target: Binary target variable (0/1)
feature: Numerical feature to be binned
min_bins: Minimum number of bins (default: 2)
max_bins: Maximum number of bins (default: 5)
bin_cutoff: Minimum relative frequency for a bin (currently not fully implemented; reserved for future extensions)
max_n_prebins: Maximum number of pre-bins (currently not fully implemented; reserved for future extensions)
convergence_threshold: Convergence threshold for monotonicity enforcement
max_iterations: Maximum number of iterations for monotonicity enforcement
force_monotonicity: Whether to enforce monotonicity of Weight of Evidence
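The page does not describe how monotonicity is enforced internally. A common strategy is to repeatedly merge the first pair of adjacent bins that breaks the WoE trend, recompute WoE, and stop when the pattern is monotonic, when `min_bins` is reached, or when `max_iterations` merges have been done. The R sketch below shows that strategy under stated assumptions. The helpers `bin_woe`, `merge_pair` and `enforce_monotonic_woe` are hypothetical and not part of the package. The 0.5 smoothing term and inferring the trend direction from the first and last bins are also assumptions. `convergence_threshold` is not modeled.

```r
# Hypothetical helper (not part of the package): per-bin WoE with additive
# smoothing (eps) so that empty cells do not produce +/-Inf.
bin_woe <- function(count_pos, count_neg, eps = 0.5) {
  dist_pos <- (count_pos + eps) / sum(count_pos + eps)
  dist_neg <- (count_neg + eps) / sum(count_neg + eps)
  log(dist_pos / dist_neg)
}

# Merge element i with element i + 1 of a count vector.
merge_pair <- function(v, i) {
  c(v[seq_len(i - 1L)], v[i] + v[i + 1L], v[-seq_len(i + 1L)])
}

# Hypothetical sketch: merge the first adjacent pair that breaks the trend,
# recompute WoE, and repeat until WoE is strictly monotonic, min_bins is
# reached, or max_iterations merges have been performed.
enforce_monotonic_woe <- function(count_pos, count_neg,
                                  min_bins = 2L, max_iterations = 1000L) {
  woe <- bin_woe(count_pos, count_neg)
  increasing <- woe[length(woe)] >= woe[1]  # trend inferred from the end bins
  iterations <- 0L
  converged <- FALSE
  repeat {
    d <- diff(woe)
    bad <- if (increasing) which(d <= 0) else which(d >= 0)
    if (length(bad) == 0L) {
      converged <- TRUE
      break
    }
    if (iterations >= max_iterations || length(woe) <= min_bins) break
    i <- bad[1]
    count_pos <- merge_pair(count_pos, i)
    count_neg <- merge_pair(count_neg, i)
    woe <- bin_woe(count_pos, count_neg)
    iterations <- iterations + 1L
  }
  list(count_pos = count_pos, count_neg = count_neg, woe = woe,
       converged = converged, iterations = iterations)
}

res <- enforce_monotonic_woe(c(10, 30, 20, 60), c(90, 70, 80, 40))
res$count_pos  # bins 2 and 3 are merged: 10 50 60
```

Merging only the first violating pair per iteration keeps each step small, at the cost of more iterations than merging all violations at once.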

Value

A list containing:

id: Bin identifiers
bin: Bin interval representations
woe: Weight of Evidence values for each bin
iv: Information Value components for each bin
count: Total count in each bin
count_pos: Positive count in each bin
count_neg: Negative count in each bin
cutpoints: Cut points between bins
converged: Whether the algorithm converged
iterations: Number of iterations performed
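The `woe` and `iv` components can be derived from `count_pos` and `count_neg`. The R sketch below assumes the convention WoE = ln(share of positives / share of negatives) and IV component = (share of positives - share of negatives) * WoE. Some tools use the reciprocal ratio, which flips the sign of every WoE but leaves IV unchanged. The helper `woe_iv_table` is hypothetical. It does no smoothing, so a bin with zero positives or zero negatives yields an infinite WoE.

```r
# Hypothetical sketch: per-bin WoE and IV components from bin counts.
# Assumed convention: WoE = ln(dist_pos / dist_neg).
woe_iv_table <- function(count_pos, count_neg) {
  dist_pos <- count_pos / sum(count_pos)  # share of all positives in each bin
  dist_neg <- count_neg / sum(count_neg)  # share of all negatives in each bin
  woe <- log(dist_pos / dist_neg)
  iv <- (dist_pos - dist_neg) * woe       # each component is >= 0
  data.frame(count = count_pos + count_neg, count_pos, count_neg, woe, iv)
}

tab <- woe_iv_table(c(20, 30, 50), c(80, 70, 50))
sum(tab$iv)  # total Information Value, about 0.32
```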

Details

The algorithm recursively partitions the feature space. At each step it selects the cut point that maximizes information gain (equivalently, minimizes the weighted class entropy of the two resulting subsets). It keeps the cut only if it passes the MDLP criterion, which accepts a split when the information gained exceeds the cost of encoding it. When force_monotonicity = TRUE, the WoE values across bins are additionally constrained to follow a monotonic (strictly increasing or strictly decreasing) pattern, which is often desirable in credit risk modeling applications.
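For reference, the textbook Fayyad and Irani (1993) rule accepts a cut T of a set S with N examples when Gain > (log2(N - 1) + Delta) / N. Here Delta = log2(3^k - 2) - (k Ent(S) - k1 Ent(S1) - k2 Ent(S2)), and k, k1, k2 are the numbers of classes present in S and in the two subsets. The R sketch below implements that rule recursively. It is not the package's implementation. It tries every midpoint between distinct values rather than only class-boundary points, and it ignores `min_bins`, `max_bins` and pre-binning. All function names are hypothetical.

```r
# Base-2 entropy of a 0/1 vector.
entropy2 <- function(y) {
  p <- tabulate(y + 1L, nbins = 2L) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Fayyad-Irani MDLP acceptance test for splitting y into y[left] / y[!left].
mdlp_accept <- function(y, left) {
  n <- length(y)
  y1 <- y[left]
  y2 <- y[!left]
  ent <- entropy2(y); e1 <- entropy2(y1); e2 <- entropy2(y2)
  gain <- ent - (length(y1) / n) * e1 - (length(y2) / n) * e2
  k <- length(unique(y)); k1 <- length(unique(y1)); k2 <- length(unique(y2))
  delta <- log2(3^k - 2) - (k * ent - k1 * e1 - k2 * e2)
  gain > (log2(n - 1) + delta) / n
}

# Recursively find the best cut and keep it only if MDLP accepts it.
mdlp_cuts <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  ux <- unique(x)
  if (length(ux) < 2L) return(numeric(0))
  cand <- (head(ux, -1) + tail(ux, -1)) / 2  # midpoints of distinct values
  ent <- entropy2(y)
  best_gain <- -Inf
  best_cut <- NA_real_
  for (thr in cand) {
    left <- x <= thr
    g <- ent - mean(left) * entropy2(y[left]) - mean(!left) * entropy2(y[!left])
    if (g > best_gain) {
      best_gain <- g
      best_cut <- thr
    }
  }
  left <- x <= best_cut
  if (!mdlp_accept(y, left)) return(numeric(0))
  c(mdlp_cuts(x[left], y[left]), best_cut, mdlp_cuts(x[!left], y[!left]))
}

mdlp_cuts(1:20, rep(c(0, 1), each = 10))  # 10.5
```

Pure subsets have zero gain and fail the test, so the recursion stops there.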

References

Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022-1027.

Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization techniques: A recent survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.

Examples

if (FALSE) { # \dontrun{
# Generate sample data
set.seed(123)
feature <- rnorm(1000)
target <- as.integer(feature + rnorm(1000) > 0)

# Apply optimal binning
result <- optimal_binning_numerical_fast_mdlpm(target, feature, min_bins = 3, max_bins = 5)

# Print results
print(result)

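# Create WoE transformation: map each value to the WoE of its bin.
# Assumes result$cutpoints holds the sorted interior cut points and that
# bins are right-closed, i.e. (-Inf, c1], (c1, c2], ..., (ck, Inf).
woe_transform <- function(x, cutpoints, woe_values) {
  # findInterval(..., left.open = TRUE) returns 0 for x <= c1, so add 1
  idx <- findInterval(x, cutpoints, left.open = TRUE) + 1L
  woe_values[idx]
}

feature_woe <- woe_transform(feature, result$cutpoints, result$woe)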
} # }