Optimal Binning for Numerical Variables using MDLP with Monotonicity
optimal_binning_numerical_fast_mdlpm.Rd
This function implements optimal binning for numerical variables using the Minimum Description Length Principle (MDLP) with optional monotonicity constraints on the Weight of Evidence (WoE).
Usage
optimal_binning_numerical_fast_mdlpm(
target,
feature,
min_bins = 2L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 100L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
force_monotonicity = TRUE
)
Arguments
- target
Binary target variable (0/1)
- feature
Numerical feature to be binned
- min_bins
Minimum number of bins (default: 2)
- max_bins
Maximum number of bins (default: 5)
- bin_cutoff
Minimum relative frequency for a bin (not fully implemented; reserved for future extensions)
- max_n_prebins
Maximum number of pre-bins (not fully implemented; reserved for future extensions)
- convergence_threshold
Convergence threshold for monotonicity enforcement
- max_iterations
Maximum number of iterations for monotonicity enforcement
- force_monotonicity
Whether to enforce monotonicity of Weight of Evidence
Value
A list containing:
- id
Bin identifiers
- bin
Bin interval representations
- woe
Weight of Evidence values for each bin
- iv
Information Value components for each bin
- count
Total count in each bin
- count_pos
Positive count in each bin
- count_neg
Negative count in each bin
- cutpoints
Cut points between bins
- converged
Whether the algorithm converged
- iterations
Number of iterations performed
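The woe and iv components returned above follow the conventional definitions used in credit scoring: each bin's WoE is the log ratio of its share of positives to its share of negatives, and its IV contribution is the difference of those shares times the WoE. A minimal sketch of that computation from the per-bin counts (the helper name `compute_woe_iv` is illustrative, not part of the package API):

```r
# Sketch of the standard WoE / IV formulas applied to per-bin counts.
# WoE_i = ln( (pos_i / total_pos) / (neg_i / total_neg) )
# IV_i  = (pos_i / total_pos - neg_i / total_neg) * WoE_i
compute_woe_iv <- function(count_pos, count_neg) {
  dist_pos <- count_pos / sum(count_pos)  # share of positives in each bin
  dist_neg <- count_neg / sum(count_neg)  # share of negatives in each bin
  woe <- log(dist_pos / dist_neg)
  iv  <- (dist_pos - dist_neg) * woe
  list(woe = woe, iv = iv)
}
```

The total Information Value of the binning is `sum(iv)`; note the implementation must also guard against empty bins, where a zero count makes the log undefined.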
Details
The algorithm recursively partitions the feature space by finding cut points that maximize information gain, subject to the MDLP criterion that determines whether a cut is justified based on information theory principles. The monotonicity constraint ensures that the WoE values across bins follow a monotonic (strictly increasing or decreasing) pattern, which is often desirable in credit risk modeling applications.
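The MDLP criterion from Fayyad & Irani (1993) accepts a candidate cut only when the information gain exceeds log2(N - 1)/N plus a model-cost term Delta/N. A hedged sketch of that acceptance test for a binary target (the function names `entropy` and `mdlp_accept` are illustrative, not exported by the package):

```r
# Sketch of the Fayyad-Irani MDLP stopping criterion, assuming a binary
# target split into left/right partitions at a candidate cut point.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

mdlp_accept <- function(y_left, y_right) {
  y <- c(y_left, y_right)
  n <- length(y)
  ent_s <- entropy(y)
  ent_l <- entropy(y_left)
  ent_r <- entropy(y_right)
  # Information gain of the candidate cut
  gain <- ent_s - (length(y_left) / n) * ent_l -
                  (length(y_right) / n) * ent_r
  k  <- length(unique(y))       # classes present in the parent set
  k1 <- length(unique(y_left))  # classes present in the left partition
  k2 <- length(unique(y_right)) # classes present in the right partition
  delta <- log2(3^k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
  # Accept the cut only if the gain pays for its description length
  gain > (log2(n - 1) + delta) / n
}
```

Recursive partitioning stops on a subset once no candidate cut passes this test, which is what makes MDLP binning parameter-light compared with fixed-width or quantile binning.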
References
Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022-1027.
Kotsiantis, S., & Kanellopoulos, D. (2006). Discretization techniques: A recent survey. GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
Examples
if (FALSE) { # \dontrun{
# Generate sample data
set.seed(123)
feature <- rnorm(1000)
target <- as.integer(feature + rnorm(1000) > 0)
# Apply optimal binning
result <- optimal_binning_numerical_fast_mdlpm(target, feature, min_bins = 3, max_bins = 5)
# Print results
print(result)
# Apply the WoE transformation using the returned cut points
woe_transform <- function(x, cutpoints, woe_values) {
# findInterval() returns 0 for values below the first cut point,
# so adding 1 gives the 1-based bin index into woe_values
idx <- findInterval(x, cutpoints) + 1L
woe_values[idx]
}
feature_woe <- woe_transform(feature, result$cutpoints, result$woe)
} # }