Optimal Binning for Categorical Variables using Dynamic Programming
optimal_binning_categorical_dp.Rd
Performs optimal binning for categorical variables using a dynamic programming approach with linear constraints. This algorithm finds the optimal grouping of categories that maximizes the Information Value (IV) while respecting constraints on the number of bins and monotonicity.
Usage
optimal_binning_categorical_dp(
target,
feature,
min_bins = 3L,
max_bins = 5L,
bin_cutoff = 0.05,
max_n_prebins = 20L,
convergence_threshold = 1e-06,
max_iterations = 1000L,
bin_separator = "%;%",
monotonic_trend = "auto"
)
Arguments
- target
An integer vector of binary target values (0 or 1).
- feature
A character vector of categorical feature values.
- min_bins
Minimum number of bins (default: 3).
- max_bins
Maximum number of bins (default: 5).
- bin_cutoff
Minimum proportion of total observations for a bin (default: 0.05).
- max_n_prebins
Maximum number of pre-bins before merging (default: 20).
- convergence_threshold
Convergence threshold for the dynamic programming algorithm (default: 1e-6).
- max_iterations
Maximum number of iterations for the dynamic programming algorithm (default: 1000).
- bin_separator
Separator for concatenating category names in bins (default: "%;%").
- monotonic_trend
Monotonic trend to enforce on bin event rates: 'auto', 'ascending', 'descending', or 'none' (default: 'auto').
Value
A data frame containing binning information with the following columns:
- id
Bin identifier (integer)
- bin
Bin name containing the concatenated categories
- woe
Weight of Evidence value for the bin
- iv
Information Value contribution of the bin
- count
Total number of observations in the bin
- count_pos
Number of positive events (target=1) in the bin
- count_neg
Number of negative events (target=0) in the bin
- event_rate
Rate of positive events in the bin (count_pos/count)
- total_iv
Total Information Value across all bins
- converged
Logical indicating whether the algorithm converged
- iterations
Number of iterations performed
- execution_time_ms
Execution time in milliseconds
Details
The algorithm uses dynamic programming to find the optimal binning solution that maximizes the total Information Value (IV) while respecting constraints. The process follows these steps:
1. Preprocess the data to count occurrences and merge rare categories.
2. Sort categories by their event rates (ratio of positive events to total events); see the sketch after this list.
3. Use dynamic programming to find the optimal partitioning of the sorted categories into bins.
4. Apply monotonicity constraints if specified.
5. Calculate WoE and IV metrics for each final bin.
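As a concrete illustration of steps 1 and 2, the following is a minimal R sketch under the documented definitions; it is not the package's internal code, and the helper name prebin_categories is hypothetical:

# Illustrative sketch only: count each category, flag rare levels
# (below the bin_cutoff proportion), and sort by event rate.
prebin_categories <- function(target, feature, bin_cutoff = 0.05) {
  counts <- table(feature)
  pos    <- tapply(target, feature, sum)
  stats  <- data.frame(
    category  = names(counts),
    count     = as.integer(counts),
    count_pos = as.integer(pos[names(counts)]),
    stringsAsFactors = FALSE
  )
  stats$count_neg  <- stats$count - stats$count_pos
  stats$event_rate <- stats$count_pos / stats$count
  stats$rare       <- stats$count / length(target) < bin_cutoff
  stats[order(stats$event_rate), ]  # sorted so bins can be contiguous runs
}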
The Weight of Evidence (WoE) for each bin is calculated as:
$$WoE = \ln\left(\frac{\text{Distribution of Events}}{\text{Distribution of Non-Events}}\right)$$
The Information Value (IV) contribution for each bin is:
$$IV = (\text{Distribution of Events} - \text{Distribution of Non-Events}) \times WoE$$
Here the distribution of events (non-events) is the bin's share of all positive (negative) cases in the dataset. Total IV is the sum of the IV contributions across all bins and measures the overall predictive power of the binned variable.
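For example, with three bins the two formulas reduce to a few lines of R (a worked illustration only, using made-up counts):

# Worked example of the WoE and IV formulas with illustrative counts
count_pos <- c(120, 250, 130)            # target = 1 per bin
count_neg <- c(280, 150, 70)             # target = 0 per bin
dist_pos  <- count_pos / sum(count_pos)  # distribution of events
dist_neg  <- count_neg / sum(count_neg)  # distribution of non-events
woe       <- log(dist_pos / dist_neg)
iv        <- (dist_pos - dist_neg) * woe
total_iv  <- sum(iv)                     # overall predictive power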
This implementation is based on the methodology described in:
Navas-Palencia, G. (2022). "OptBinning: Mathematical Optimization for Optimal Binning". Journal of Open Source Software, 7(74), 4101.
Siddiqi, N. (2017). "Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards". John Wiley & Sons, 2nd Edition.
Thomas, L.C., Edelman, D.B., & Crook, J.N. (2017). "Credit Scoring and Its Applications". SIAM, 2nd Edition.
Kotsiantis, S.B., & Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.
The dynamic programming algorithm optimally partitions the sorted categories into bins that maximize the total information value. The recurrence relation used is:
$$DP[i][k] = \max_{j<i} \{DP[j][k-1] + IV(\text{bin from } j \text{ to } i)\}$$
where \(DP[i][k]\) denotes the maximum total IV achievable when the first \(i\) sorted categories are partitioned into \(k\) bins.
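A compact R sketch of this recurrence, under the definitions above, could look like the following; it is an illustrative implementation, not the package's code, and the function name and the bin_iv helper are assumptions:

# Illustrative dynamic program: partition n sorted categories into
# at most max_bins bins so that the summed IV is maximized.
optimal_partition_iv <- function(count_pos, count_neg, max_bins) {
  n <- length(count_pos)
  tot_pos <- sum(count_pos)
  tot_neg <- sum(count_neg)
  # IV of a single bin formed from sorted categories a..b
  bin_iv <- function(a, b) {
    dp <- sum(count_pos[a:b]) / tot_pos  # distribution of events
    dn <- sum(count_neg[a:b]) / tot_neg  # distribution of non-events
    if (dp == 0 || dn == 0) return(-Inf) # guard against one-sided bins
    (dp - dn) * log(dp / dn)
  }
  # DP[i + 1, k + 1] = best total IV using the first i categories in k bins
  DP <- matrix(-Inf, nrow = n + 1, ncol = max_bins + 1)
  DP[1, 1] <- 0
  for (i in 1:n) {
    for (k in 1:max_bins) {
      for (j in 0:(i - 1)) {
        cand <- DP[j + 1, k] + bin_iv(j + 1, i)
        if (cand > DP[i + 1, k + 1]) DP[i + 1, k + 1] <- cand
      }
    }
  }
  max(DP[n + 1, 2:(max_bins + 1)])  # best total IV over 1..max_bins bins
}

In the actual function the search is further constrained by min_bins, bin_cutoff, and the monotonic_trend setting.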
Examples
if (FALSE) { # \dontrun{
# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- sample(c("A", "B", "C", "D", "E"), n, replace = TRUE)
# Perform optimal binning
result <- optimal_binning_categorical_dp(target, feature, min_bins = 2, max_bins = 4)
# View results
print(result)
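# The lines below are an illustrative extension (not part of the original
# example): map each bin's WoE back onto the raw feature by splitting the
# concatenated bin names on the default bin_separator "%;%".
cats <- strsplit(result$bin, "%;%", fixed = TRUE)
woe_map <- setNames(rep(result$woe, lengths(cats)), unlist(cats))
feature_woe <- unname(woe_map[feature])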
} # }