Optimal Binning for Numerical Features using the Minimum Description Length Principle (MDLP)
optimal_binning_numerical_mdlp.Rd
This function performs optimal binning for numerical features using the Minimum Description Length Principle (MDLP).
It minimizes information loss by merging adjacent bins that reduce the MDL cost, while ensuring monotonicity in the Weight of Evidence (WoE).
The algorithm adjusts the number of bins between min_bins
and max_bins
and handles rare bins by merging them iteratively.
Designed for robust and numerically stable calculations, it incorporates protections for extreme cases and convergence controls.
Usage
optimal_binning_numerical_mdlp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
laplace_smoothing = 0.5
)
Arguments
- target
An integer binary vector (0 or 1) representing the target variable.
- feature
A numeric vector representing the feature to bin.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of records per bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before merging (default: 20).
- convergence_threshold
Convergence threshold for IV optimization (default: 1e-6).
- max_iterations
Maximum number of iterations allowed (default: 1000).
- laplace_smoothing
Smoothing parameter for WoE calculation (default: 0.5).
Value
A list with the following components:
id
: A numeric vector with bin identifiers (1-based).bin
: A vector of bin names representing the intervals.woe
: A numeric vector with the WoE values for each bin.iv
: A numeric vector with the IV values for each bin.count
: An integer vector with the total number of observations in each bin.count_pos
: An integer vector with the count of positive cases in each bin.count_neg
: An integer vector with the count of negative cases in each bin.cutpoints
: A numeric vector of cut points defining the bins.total_iv
: A numeric value representing the total information value of the binning.converged
: A boolean indicating whether the algorithm converged.iterations
: An integer with the number of iterations performed.
Details
Core Steps:
Input Validation: Ensures feature and target are valid, numeric, and binary respectively. Validates consistency between
min_bins
andmax_bins
.Pre-Binning: Creates pre-bins based on equal frequencies or unique values if there are few observations.
MDL-Based Merging: Iteratively merges bins to minimize the MDL cost, which combines model complexity and data fit quality.
Rare Bin Handling: Merges bins with frequencies below the
bin_cutoff
threshold to ensure statistical stability.Monotonicity Enforcement: Adjusts bins to ensure that the WoE values are monotonically increasing or decreasing.
Validation: Validates the final bin structure for consistency and correctness.
Mathematical Framework:
Entropy Calculation: For a bin \( i \) with positive (\( p \)) and negative (\( n \)) counts: $$Entropy = -p \log_2(p) - n \log_2(n)$$
MDL Cost: Combines the cost of the model and data description: $$MDL\_Cost = Model\_Cost + Data\_Cost$$ Where: $$Model\_Cost = \log_2(Number\_of\_bins - 1)$$ $$Data\_Cost = Total\_Entropy - \sum_{i} Count_i \times Entropy_i$$
Weight of Evidence (WoE): For a bin \( i \) with Laplace smoothing parameter: $$WoE_i = \ln\left(\frac{n_{1i} + a}{n_{1} + ma} \cdot \frac{n_{0} + ma}{n_{0i} + a}\right)$$ Where:
\(n_{1i}\) is the count of positive cases in bin \(i\)
\(n_{0i}\) is the count of negative cases in bin \(i\)
\(n_{1}\) is the total count of positive cases
\(n_{0}\) is the total count of negative cases
\(m\) is the number of bins
a is the Laplace smoothing parameter
Information Value (IV): Summarizes predictive power across all bins: $$IV = \sum_{i} (P(X|Y=1) - P(X|Y=0)) \times WoE_i$$
Features:
Merges bins iteratively to minimize the MDL cost.
Ensures monotonicity of WoE to improve model interpretability.
Handles rare bins by merging categories with low frequencies.
Stable against edge cases like all identical values or insufficient observations.
Efficiently processes large datasets with iterative binning and convergence checks.
Applies Laplace smoothing for robust WoE calculation in sparse bins.
References
Fayyad, U. & Irani, K. (1993). "Multi-interval discretization of continuous-valued attributes for classification learning." Proceedings of the International Joint Conference on Artificial Intelligence, 1022-1027.
Rissanen, J. (1978). "Modeling by shortest data description." Automatica, 14(5), 465-471.
Good, I.J. (1952). "Rational Decisions." Journal of the Royal Statistical Society, Series B, 14, 107-114. (Origin of Laplace smoothing/additive smoothing)
Examples
if (FALSE) { # \dontrun{
# Example usage
set.seed(123)
target <- sample(0:1, 100, replace = TRUE)
feature <- runif(100)
result <- optimal_binning_numerical_mdlp(target, feature, min_bins = 3, max_bins = 5)
print(result)
# With different parameters
result2 <- optimal_binning_numerical_mdlp(
target,
feature,
min_bins = 2,
max_bins = 10,
bin_cutoff = 0.03,
laplace_smoothing = 0.1
)
# Print summary statistics
print(paste("Total Information Value:", round(result2$total_iv, 4)))
print(paste("Number of bins created:", length(result2$bin)))
} # }