Implements an optimized categorical binning algorithm that extends the JEDI (Joint Entropy Discretization and Integration) framework to handle multinomial response variables using M-WOE (Multinomial Weight of Evidence). This implementation provides a robust solution for categorical feature discretization in multinomial classification problems while maintaining monotonic relationships and optimizing information value.

Usage

optimal_binning_categorical_jedi_mwoe(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

Integer vector of class labels (0 to n_classes-1). Must be consecutive integers starting from 0.

feature

Character vector of categorical values to be binned. Must have the same length as target.

min_bins

Minimum number of bins in the output (default: 3). Automatically adjusted downward if the number of unique categories is less than min_bins. Value must be >= 1.

max_bins

Maximum number of bins allowed in the output (default: 5). Must be >= min_bins. The algorithm merges bins as needed to meet this constraint.

bin_cutoff

Minimum relative frequency threshold for individual bins (default: 0.05). Categories with frequency below this threshold will be candidates for merging. Value must be between 0 and 1.

max_n_prebins

Maximum number of pre-bins before optimization (default: 20). Controls initial complexity before optimization phase. Must be >= min_bins.

bin_separator

String separator used when combining category names (default: "%;%"). Used to create readable bin labels.

convergence_threshold

Convergence threshold for Information Value change (default: 1e-6). Algorithm stops when IV change is below this value.

max_iterations

Maximum number of optimization iterations (default: 1000). Prevents infinite loops in edge cases.

Value

A list containing:

  • id: Numeric identifiers for each bin.

  • bin: Character vector of bin names (concatenated categories).

  • woe: Numeric matrix (n_bins × n_classes) of M-WOE values for each class.

  • iv: Numeric matrix (n_bins × n_classes) of IV contributions for each class.

  • count: Integer vector of total observation counts per bin.

  • class_counts: Integer matrix (n_bins × n_classes) of counts per class per bin.

  • class_rates: Numeric matrix (n_bins × n_classes) of class rates per bin.

  • converged: Logical indicating whether algorithm converged.

  • iterations: Integer count of optimization iterations performed.

  • n_classes: Integer indicating number of classes detected.

  • total_iv: Numeric vector of total IV per class.

Details

The algorithm implements a sophisticated binning strategy based on information theory and extends the traditional binary WOE to handle multiple classes.

Mathematical Framework

  1. M-WOE Calculation (with Laplace smoothing): For each bin i and class k: $$\text{M-WOE}_{i,k} = \ln\left(\frac{P(X = x_i|Y = k)}{P(X = x_i|Y \neq k)}\right)$$ $$= \ln\left(\frac{(n_{k,i} + \alpha)/(N_k + 2\alpha)}{(\sum_{j \neq k} n_{j,i} + \alpha)/(\sum_{j \neq k} N_j + 2\alpha)}\right)$$

where:

  • \(n_{k,i}\) is the count of class k in bin i

  • \(N_k\) is the total count of class k

  • \(\alpha\) is the Laplace smoothing parameter (default: 0.5)

  • The denominator represents the proportion in all other classes combined
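As a standalone illustration of the formula above (not the package's internal implementation), the M-WOE matrix can be computed in plain R from a bin-by-class count table; the `mwoe` helper name and the toy counts are assumptions for the sketch:

```r
# Sketch: M-WOE with Laplace smoothing (alpha = 0.5, the stated default).
# Rows are bins, columns are classes; the counts are hypothetical.
counts <- matrix(c(30, 10,  5,
                   10, 25, 15,
                    5, 10, 40), nrow = 3, byrow = TRUE)

mwoe <- function(counts, alpha = 0.5) {
  out <- matrix(0, nrow(counts), ncol(counts))
  for (k in seq_len(ncol(counts))) {
    n_ki   <- counts[, k]              # n_{k,i}: class-k count in each bin
    n_rest <- rowSums(counts) - n_ki   # counts of all other classes per bin
    p_k    <- (n_ki + alpha) / (sum(n_ki) + 2 * alpha)
    p_rest <- (n_rest + alpha) / (sum(n_rest) + 2 * alpha)
    out[, k] <- log(p_k / p_rest)
  }
  out
}

w <- mwoe(counts)   # positive where a class is over-represented in a bin
```

A positive entry `w[i, k]` means class k is concentrated in bin i relative to the remaining classes; a negative entry means it is under-represented there.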

  2. Information Value: For each class k: $$IV_k = \sum_{i=1}^{n} \left(P(X = x_i|Y = k) - P(X = x_i|Y \neq k)\right) \times \text{M-WOE}_{i,k}$$

  3. Jensen-Shannon Divergence: For measuring statistical similarity between bins: $$JS(P||Q) = \frac{1}{2}KL(P||M) + \frac{1}{2}KL(Q||M)$$

where:

  • \(KL\) is the Kullback-Leibler divergence

  • \(M = \frac{1}{2}(P+Q)\) is the midpoint distribution

  • \(P\) and \(Q\) are the class distributions of two bins
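A minimal R sketch of this similarity measure, using hypothetical class-rate vectors for two bins (the `js_div` helper is illustrative, not the package's internal routine):

```r
# Sketch: Jensen-Shannon divergence between two bins' class distributions.
js_div <- function(p, q) {
  m  <- (p + q) / 2                # midpoint distribution M
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))  # KL divergence
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

p <- c(0.6, 0.3, 0.1)   # hypothetical class rates in bin 1
q <- c(0.1, 0.2, 0.7)   # hypothetical class rates in bin 2
d <- js_div(p, q)       # symmetric, zero iff p == q, bounded by log(2)
```

Bins with small JS divergence carry nearly the same class information and are therefore the preferred candidates for merging.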

  4. Optimization Objective: $$\text{maximize} \sum_{k=1}^{K} IV_k$$ subject to:

    • Monotonicity constraints for each class

    • Minimum bin size constraints

    • Number of bins constraints
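The per-class IV being maximized can be sketched in plain R from a hypothetical bin-by-class count table, applying the same Laplace smoothing as the M-WOE formula (the `iv_per_class` helper is illustrative, not the package's internals):

```r
# Sketch: per-class Information Value with Laplace smoothing (alpha = 0.5).
iv_per_class <- function(counts, alpha = 0.5) {
  sapply(seq_len(ncol(counts)), function(k) {
    n_ki   <- counts[, k]                    # class-k count per bin
    n_rest <- rowSums(counts) - n_ki         # all other classes per bin
    p_k    <- (n_ki + alpha) / (sum(n_ki) + 2 * alpha)
    p_rest <- (n_rest + alpha) / (sum(n_rest) + 2 * alpha)
    sum((p_k - p_rest) * log(p_k / p_rest))  # IV_k
  })
}

counts <- rbind(c(30, 10,  5),   # hypothetical bin-by-class counts
                c(10, 25, 15),
                c( 5, 10, 40))
iv <- iv_per_class(counts)       # one non-negative IV per class
```

Each summand pairs a difference in proportions with a log-ratio of the same sign, so every IV_k is non-negative; larger values indicate a binning that separates that class more strongly.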

Algorithm Phases

  1. Initial Binning: Creates individual bins for unique categories

  2. Low Frequency Treatment: Merges rare categories based on bin_cutoff

  3. Monotonicity Optimization: Iteratively merges bins while maintaining monotonicity

  4. Final Adjustment: Ensures constraints on number of bins are met
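Phases 1 and 2 can be sketched in a few lines of base R. The data, cutoff value, and variable names here are hypothetical, and pooling every rare category into a single bin is a simplification of the actual merge logic, which also runs the monotonicity and bin-count phases:

```r
# Sketch of pre-binning and rare-category merging (phases 1 and 2).
feature <- c("A", "A", "A", "B", "B", "C", "E", "D", "D", "D")
cutoff  <- 0.15                           # plays the role of bin_cutoff

freq <- table(feature) / length(feature)  # phase 1: one bin per category
rare <- names(freq)[freq < cutoff]        # categories below the cutoff

# Phase 2: pool the rare categories into one bin, labelled with the
# default bin_separator "%;%".
bins <- ifelse(feature %in% rare,
               paste(sort(rare), collapse = "%;%"),
               feature)
```

Here "C" and "E" each account for 10% of observations, so both fall below the cutoff and end up in a combined "C%;%E" bin, while the frequent categories keep their own bins.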

Merging Strategy

The algorithm alternates between two merging strategies:

  • Statistical similarity-based merging using Jensen-Shannon divergence

  • Information value-based merging that minimizes IV loss

Statistical Robustness

  • Employs Laplace smoothing for stable probability estimates

  • Uses epsilon protection against numerical instability

  • Detects and resolves monotonicity violations efficiently

Note

Performance Considerations:

  • Time complexity: O(n_classes * n_samples * log(n_samples))

  • Space complexity: O(n_classes * n_bins)

  • For large datasets, initial binning phase may be memory-intensive

Edge Cases:

  • Single category: Returns original category as single bin

  • All samples in one class: creates a degenerate case and issues a warning

  • Missing values: Treated as a special category "MISSING"

References

  • Beltrami, M. et al. (2021). JEDI: Joint Entropy Discretization and Integration. arXiv preprint arXiv:2101.03228.

  • Thomas, L.C. (2009). Consumer Credit Models: Pricing, Profit and Portfolios. Oxford University Press.

  • Good, I.J. (1950). Probability and the Weighing of Evidence. Charles Griffin & Company.

  • Kullback, S. (1959). Information Theory and Statistics. John Wiley & Sons.

  • Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145-151.

See also

  • optimal_binning_categorical_jedi for binary classification

  • woe_transformation for applying WOE transformation

Examples

# Basic usage with 3 classes
feature <- c("A", "B", "A", "C", "B", "D", "A")
target <- c(0, 1, 2, 1, 0, 2, 1)
result <- optimal_binning_categorical_jedi_mwoe(target, feature)

# With custom parameters
result <- optimal_binning_categorical_jedi_mwoe(
  target = target,
  feature = feature,
  min_bins = 2,
  max_bins = 4,
  bin_cutoff = 0.1,
  max_n_prebins = 15,
  convergence_threshold = 1e-8
)