
Implements optimal binning for categorical variables using a dynamic programming approach to maximize Information Value (IV). The algorithm finds the globally optimal binning solution within the constraints of minimum and maximum bin counts.

Usage

optimal_binning_categorical_ivb(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "%;%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)

Arguments

target

Integer binary vector (0 or 1) representing the response variable.

feature

Character vector or factor containing the categorical values of the explanatory variable.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum proportion of observations a category must have to form its own bin; rarer categories are merged (default: 0.05).

max_n_prebins

Maximum number of pre-bins before optimization (default: 20).

bin_separator

Separator for merged category names (default: "%;%").

convergence_threshold

Convergence threshold on the change in total IV between iterations (default: 1e-6).

max_iterations

Maximum number of iterations in the search for the optimal solution (default: 1000).

Value

A list containing:

  • id: Numeric vector of bin identifiers.

  • bin: Character vector with the names of the formed bins.

  • woe: Numeric vector with the Weight of Evidence (WoE) of each bin.

  • iv: Numeric vector with the Information Value (IV) of each bin.

  • count: Total count per bin.

  • count_pos: Positive class count per bin.

  • count_neg: Negative class count per bin.

  • total_iv: Total Information Value of the binning.

  • converged: Logical indicating whether the algorithm converged.

  • iterations: Number of iterations performed.
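
For illustration, the returned components can be assembled into a bin-level summary. This is a hedged sketch, assuming `result` holds the return value of a call such as the one shown in Examples:

# Inspect the fitted binning (assumes `result` from optimal_binning_categorical_ivb)
data.frame(bin = result$bin, woe = result$woe, iv = result$iv,
           count = result$count, count_pos = result$count_pos)
result$total_iv   # overall predictive strength of the binned variable
result$converged  # TRUE if the change in IV fell below convergence_threshold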

Details

This implementation uses dynamic programming to find the optimal set of bins that maximizes the total Information Value. The algorithm guarantees a global optimum within the constraints of minimum and maximum bin counts.

The mathematical formulation of the dynamic programming algorithm is:

$$DP[i][k] = \max_{j<i} \{DP[j][k-1] + IV(j+1,i)\}$$

where:

  • \(DP[i][k]\) is the maximum IV achievable using k bins for the first i categories

  • \(IV(j+1,i)\) is the IV of a bin containing categories from index j+1 to i
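
For illustration, a minimal R sketch of this recurrence follows. It is not the package's internal code, and iv_of_range() is a hypothetical helper returning the IV of a single bin spanning a contiguous run of sorted categories:

# Sketch of the DP recurrence; assumes max_bins <= n_categories and that
# iv_of_range(a, b) scores a bin covering sorted categories a..b (hypothetical)
dp_total_iv <- function(n_categories, max_bins, iv_of_range) {
  DP <- matrix(-Inf, nrow = n_categories, ncol = max_bins)
  # Base case: one bin holding the first i categories
  for (i in seq_len(n_categories)) DP[i, 1] <- iv_of_range(1, i)
  if (max_bins >= 2) {
    for (k in 2:max_bins) {
      for (i in k:n_categories) {
        for (j in (k - 1):(i - 1)) {
          # Split: first j categories in k-1 bins, categories j+1..i as bin k
          DP[i, k] <- max(DP[i, k], DP[j, k - 1] + iv_of_range(j + 1, i))
        }
      }
    }
  }
  DP[n_categories, max_bins]  # maximum total IV using exactly max_bins bins
}

Backtracking over the argmax of each DP cell recovers the bin boundaries themselves; the sketch returns only the optimal IV for brevity.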

The Weight of Evidence (WoE) for a bin is defined as:

$$WoE_i = \ln\left(\frac{n^+_i/N^+}{n^-_i/N^-}\right)$$

where:

  • \(n^+_i\) is the number of positive cases in bin i

  • \(n^-_i\) is the number of negative cases in bin i

  • \(N^+\) is the total number of positive cases

  • \(N^-\) is the total number of negative cases

The Information Value (IV) is:

$$IV = \sum_{i=1}^{n} (p_i - q_i) \times WoE_i$$

where:

  • \(p_i = n^+_i/N^+\) is the proportion of positive cases in bin i

  • \(q_i = n^-_i/N^-\) is the proportion of negative cases in bin i
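
Both quantities follow directly from bin counts. A small self-contained R illustration with toy numbers (the smoothing described next is omitted here):

# WoE and IV from raw bin counts (toy numbers, no smoothing)
count_pos <- c(30, 50, 20)        # n+_i per bin
count_neg <- c(70, 40, 90)        # n-_i per bin
p <- count_pos / sum(count_pos)   # p_i = n+_i / N+
q <- count_neg / sum(count_neg)   # q_i = n-_i / N-
woe <- log(p / q)
iv_total <- sum((p - q) * woe)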

This algorithm employs Bayesian smoothing for improved stability with small sample sizes or rare categories. The smoothing applies pseudo-counts based on the overall class prevalence.
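The exact prior used internally is not spelled out here; one common form, shown below as an assumption, adds pseudo-counts proportional to the overall event rate so that WoE stays finite even for bins with a zero cell:

# Hedged sketch of prevalence-based pseudo-counts; prior_weight = 0.5 is an
# assumed value, not necessarily what the package uses
count_pos <- c(30, 50, 0)         # a zero cell would make raw WoE infinite
count_neg <- c(70, 40, 90)
prior_weight <- 0.5
prevalence <- sum(count_pos) / (sum(count_pos) + sum(count_neg))
pos_s <- count_pos + prior_weight * prevalence
neg_s <- count_neg + prior_weight * (1 - prevalence)
woe_smooth <- log((pos_s / sum(pos_s)) / (neg_s / sum(neg_s)))  # all finite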

The algorithm includes these main steps (steps 1-3 are sketched in code after the list):

  1. Preprocess data and calculate category statistics

  2. Merge rare categories based on frequency threshold

  3. Sort categories by event rate for monotonicity

  4. Run dynamic programming to find optimal bin boundaries

  5. Apply post-processing to ensure monotonicity of WoE

  6. Calculate final WoE and IV values for each bin
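
A hedged miniature of steps 1-3 in base R; the pooled level name "Other" and the merge rule are assumptions, and the package may differ in detail:

# Steps 1-3 in miniature: per-category stats, rare-category merge, event-rate sort
feature <- c("A", "B", "A", "C", "B", "D", "C", "A", "D", "B", "E")
target  <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0)
stats <- aggregate(target, by = list(category = feature),
                   FUN = function(y) c(n = length(y), pos = sum(y)))
stats <- data.frame(category = stats$category,
                    n = stats$x[, "n"], pos = stats$x[, "pos"])
cutoff <- 0.10                              # frequency threshold (cf. bin_cutoff)
stats$category[stats$n / sum(stats$n) < cutoff] <- "Other"   # pool rare levels
stats <- aggregate(cbind(n, pos) ~ category, data = stats, FUN = sum)
stats$event_rate <- stats$pos / stats$n
stats <- stats[order(stats$event_rate), ]   # sorted input for the DP over splits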

Advantages over greedy approaches:

  • Guaranteed global optimum (within bin count constraints)

  • Better handling of complex patterns in the data

  • More stable results with small sample sizes

References

  • Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.

  • Siddiqi, N. (2006). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

  • Lin, X., Wang, G., & Zhang, T. (2022). Efficient monotonic binning for predictive modeling in high-dimensional spaces. Knowledge-Based Systems, 235, 107629.

  • Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53(284), 789-798.

  • Bellman, R. (1957). Dynamic Programming. Princeton University Press.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

Examples

if (FALSE) { # \dontrun{
# Example data
target <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
feature <- c("A", "B", "A", "C", "B", "D", "C", "A", "D", "B")

# Run optimal binning
result <- optimal_binning_categorical_ivb(target, feature, min_bins = 2, max_bins = 4)

# View results
print(result)
} # }