Optimal Binning for Categorical Variables with Multinomial Target using JEDI-MWoE
optimal_binning_categorical_jedi_mwoe.Rd
Implements an optimized categorical binning algorithm that extends the JEDI (Joint Entropy Discretization and Integration) framework to handle multinomial response variables using M-WOE (Multinomial Weight of Evidence). This implementation provides a robust solution for categorical feature discretization in multinomial classification problems while maintaining monotonic relationships and optimizing information value.
Usage
optimal_binning_categorical_jedi_mwoe(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
bin_separator = "%;%",
convergence_threshold = 1e-06,
max_iterations = 1000L
)
Arguments
- target
Integer vector of class labels (0 to n_classes-1). Must be consecutive integers starting from 0.
- feature
Character vector of categorical values to be binned. Must have the same length as target.
- min_bins
Minimum number of bins in the output (default: 3). Automatically adjusted downward if the number of unique categories is less than min_bins. Value must be >= 1.
- max_bins
Maximum number of bins allowed in the output (default: 5). Must be >= min_bins. Algorithm will merge bins if necessary to meet this constraint.
- bin_cutoff
Minimum relative frequency threshold for individual bins (default: 0.05). Categories with frequency below this threshold will be candidates for merging. Value must be between 0 and 1.
- max_n_prebins
Maximum number of pre-bins before optimization (default: 20). Controls initial complexity before optimization phase. Must be >= min_bins.
- bin_separator
String separator used when combining category names (default: "%;%"). Used to create readable bin labels.
- convergence_threshold
Convergence threshold for Information Value change (default: 1e-6). Algorithm stops when IV change is below this value.
- max_iterations
Maximum number of optimization iterations (default: 1000). Prevents infinite loops in edge cases.
Value
A list containing:
id: Numeric identifiers for each bin.
bin: Character vector of bin names (concatenated categories).
woe: Numeric matrix (n_bins × n_classes) of M-WOE values for each class.
iv: Numeric matrix (n_bins × n_classes) of IV contributions for each class.
count: Integer vector of total observation counts per bin.
class_counts: Integer matrix (n_bins × n_classes) of counts per class per bin.
class_rates: Numeric matrix (n_bins × n_classes) of class rates per bin.
converged: Logical indicating whether algorithm converged.
iterations: Integer count of optimization iterations performed.
n_classes: Integer indicating number of classes detected.
total_iv: Numeric vector of total IV per class.
Details
The algorithm implements an information-theoretic binning strategy that extends the traditional binary WOE to multiple classes.
Mathematical Framework
M-WOE Calculation (with Laplace smoothing): For each bin i and class k: $$\text{M-WOE}_{i,k} = \ln\left(\frac{P(X = x_i \mid Y = k)}{P(X = x_i \mid Y \neq k)}\right)$$ $$= \ln\left(\frac{(n_{k,i} + \alpha)/(N_k + 2\alpha)}{\left(\sum_{j \neq k} n_{j,i} + \alpha\right)/\left(\sum_{j \neq k} N_j + 2\alpha\right)}\right)$$
where:
\(n_{k,i}\) is the count of class k in bin i
\(N_k\) is the total count of class k
\(\alpha\) is the Laplace smoothing parameter (default: 0.5)
The denominator represents the proportion in all other classes combined
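For concreteness, the smoothed M-WOE of a single bin can be reproduced in R as below; mwoe_bin and its arguments are illustrative names, not part of the package API.

# Sketch: smoothed M-WOE for one bin, mirroring the formula above (alpha = 0.5)
mwoe_bin <- function(class_counts_bin, class_totals, k, alpha = 0.5) {
  n_ki   <- class_counts_bin[k]        # n_{k,i}: count of class k in bin i
  N_k    <- class_totals[k]            # N_k: total count of class k
  n_rest <- sum(class_counts_bin[-k])  # counts of all other classes in bin i
  N_rest <- sum(class_totals[-k])      # totals of all other classes
  log(((n_ki + alpha) / (N_k + 2 * alpha)) /
      ((n_rest + alpha) / (N_rest + 2 * alpha)))
}
mwoe_bin(c(10, 3, 2), c(50, 30, 20), k = 1)  # M-WOE of class 1 for this bin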
Information Value: For each class k: $$IV_k = \sum_{i=1}^{n} \left(P(X = x_i \mid Y = k) - P(X = x_i \mid Y \neq k)\right) \times \text{M-WOE}_{i,k}$$
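The per-class IV can be sketched the same way (smoothing omitted for brevity; class_counts is an illustrative n_bins x n_classes count matrix, not a package object):

iv_class <- function(class_counts, k) {
  p <- class_counts[, k] / sum(class_counts[, k])    # P(X = x_i | Y = k) per bin
  rest <- rowSums(class_counts[, -k, drop = FALSE])
  q <- rest / sum(rest)                              # P(X = x_i | Y != k) per bin
  sum((p - q) * log(p / q))                          # sum of per-bin contributions
}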
Jensen-Shannon Divergence: For measuring statistical similarity between bins: $$JS(P||Q) = \frac{1}{2}KL(P||M) + \frac{1}{2}KL(Q||M)$$
where:
\(KL\) is the Kullback-Leibler divergence
\(M = \frac{1}{2}(P+Q)\) is the midpoint distribution
\(P\) and \(Q\) are the class distributions of two bins
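A minimal R sketch of this similarity measure, assuming p and q are the class-count vectors of two bins (illustrative only):

js_divergence <- function(p, q) {
  p <- p / sum(p); q <- q / sum(q)  # normalize counts to distributions
  m <- (p + q) / 2                  # midpoint distribution M
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))  # KL(a||b)
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}
js_divergence(c(10, 3, 2), c(8, 4, 3))  # small value: similar bins, merge candidates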
Optimization Objective: $$\text{maximize} \sum_{k=1}^{K} IV_k$$ subject to:
Monotonicity constraints for each class
Minimum bin size constraints
Number of bins constraints
Algorithm Phases
1. Initial Binning: Creates individual bins for unique categories
2. Low Frequency Treatment: Merges rare categories based on bin_cutoff
3. Monotonicity Optimization: Iteratively merges bins while maintaining monotonicity (a simplified sketch of this merge loop follows the list)
4. Final Adjustment: Ensures constraints on the number of bins are met
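A simplified picture of phases 2-4 is a greedy loop that repeatedly fuses the most statistically similar bins. The sketch below reuses js_divergence from above; greedy_merge and bin_stats (an n_bins x n_classes count matrix) are illustrative names, not the package internals, and the real algorithm additionally enforces monotonicity and the IV convergence test.

greedy_merge <- function(bin_stats, max_bins) {
  while (nrow(bin_stats) > max_bins) {
    # JS divergence between each adjacent pair of bins
    js <- sapply(seq_len(nrow(bin_stats) - 1), function(i)
      js_divergence(bin_stats[i, ], bin_stats[i + 1, ]))
    i <- which.min(js)                                     # most similar pair
    bin_stats[i, ] <- bin_stats[i, ] + bin_stats[i + 1, ]  # merge counts
    bin_stats <- bin_stats[-(i + 1), , drop = FALSE]
  }
  bin_stats
}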
Note
Performance Considerations:
Time complexity: O(n_classes * n_samples * log(n_samples))
Space complexity: O(n_classes * n_bins)
For large datasets, the initial binning phase may be memory-intensive
Edge Cases:
Single category: Returns the original category as a single bin
All samples in one class: Produces a degenerate case with a warning
Missing values: Treated as a special category "MISSING"
References
Beltrami, M. et al. (2021). JEDI: Joint Entropy Discretization and Integration. arXiv preprint arXiv:2101.03228.
Thomas, L.C. (2009). Consumer Credit Models: Pricing, Profit and Portfolios. Oxford University Press.
Good, I.J. (1950). Probability and the Weighing of Evidence. Charles Griffin & Company.
Kullback, S. (1959). Information Theory and Statistics. John Wiley & Sons.
Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145-151.
See also
optimal_binning_categorical_jedi for binary classification
woe_transformation for applying WOE transformation
Examples
# Basic usage with 3 classes
feature <- c("A", "B", "A", "C", "B", "D", "A")
target <- c(0L, 1L, 2L, 1L, 0L, 2L, 1L)  # integer labels 0..2
result <- optimal_binning_categorical_jedi_mwoe(target, feature)
# With custom parameters
result <- optimal_binning_categorical_jedi_mwoe(
target = target,
feature = feature,
min_bins = 2,
max_bins = 4,
bin_cutoff = 0.1,
max_n_prebins = 15,
convergence_threshold = 1e-8
)
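# Inspect the returned structure (fields documented in the Value section)
result$bin       # bin labels: categories joined by bin_separator
result$woe       # n_bins x n_classes matrix of M-WOE values
result$total_iv  # total information value per class
result$converged # TRUE if the IV change fell below convergence_threshold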