Performs optimal binning of a numerical variable using a dynamic programming approach. Bins are constructed from the feature's relationship with a binary target, maximizing predictive power (Information Value) while respecting user-defined constraints and, when requested, enforcing a monotonic Weight of Evidence trend.

Usage

optimal_binning_numerical_dp(
  target,
  feature,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  convergence_threshold = 1e-06,
  max_iterations = 1000L,
  monotonic_trend = "auto"
)

Arguments

target

An integer vector of binary target values (0 or 1).

feature

A numeric vector of feature values.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 5).

bin_cutoff

Minimum proportion of total observations for a bin to avoid being merged (default: 0.05).

max_n_prebins

Maximum number of pre-bins before the optimization process (default: 20).

convergence_threshold

Convergence threshold for the algorithm (default: 1e-6).

max_iterations

Maximum number of iterations allowed (default: 1000).

monotonic_trend

Monotonicity direction. One of 'auto', 'ascending', 'descending', or 'none' (default: 'auto').

Value

A list containing the following elements:

id

Numeric vector of bin identifiers (1 to n).

bin

Character vector of bin ranges.

woe

Numeric vector of Weight of Evidence (WoE) values for each bin.

iv

Numeric vector of Information Value (IV) for each bin.

count

Numeric vector of total observations in each bin.

count_pos

Numeric vector of positive target observations in each bin.

count_neg

Numeric vector of negative target observations in each bin.

event_rate

Numeric vector of event rates (proportion of positive events) in each bin.

cutpoints

Numeric vector of cut points to generate the bins.

total_iv

Total Information Value across all bins.

converged

Logical indicating if the algorithm converged.

iterations

Integer number of iterations run by the algorithm.

execution_time_ms

Execution time in milliseconds.

monotonic_trend

The monotonic trend used ('auto', 'ascending', 'descending', 'none').

Details

The Dynamic Programming algorithm for numerical variables works as follows:

  1. Create initial pre-bins based on equal-frequency binning of the feature distribution

  2. Calculate bin statistics: counts, event rates, WoE, and IV

  3. If monotonicity is required, determine the appropriate trend:

    • In 'auto' mode: Calculate correlation between feature and target to choose direction

    • In 'ascending'/'descending' mode: Use the specified direction

  4. Enforce monotonicity by merging adjacent bins that violate the monotonic trend

  5. Ensure bin constraints are met:

    • If exceeding max_bins: Merge bins with the smallest WoE difference

    • Handle rare bins: Merge bins with fewer than bin_cutoff proportion of observations

  6. Calculate final statistics for the optimized bins
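The monotonicity enforcement in step 4 can be sketched as follows. This is an illustrative R sketch, not the package's internal implementation; `enforce_descending` is a hypothetical helper that merges the first adjacent pair of bins violating a descending event-rate trend until none remain:

```r
# Illustrative sketch: enforce a descending event-rate trend by repeatedly
# merging adjacent bins that violate it (hypothetical helper, not package code).
enforce_descending <- function(counts, pos) {
  repeat {
    rate <- pos / counts
    viol <- which(diff(rate) > 0)          # bin i+1 has a higher rate than bin i
    if (length(viol) == 0) break
    i <- viol[1]                           # merge the first violating pair
    counts[i] <- counts[i] + counts[i + 1]
    pos[i]    <- pos[i]    + pos[i + 1]
    counts <- counts[-(i + 1)]
    pos    <- pos[-(i + 1)]
  }
  list(count = counts, count_pos = pos, event_rate = pos / counts)
}

# Example: the middle pair (0.45 then 0.55) violates the descending trend,
# so bins 2 and 3 are merged, leaving rates 0.6, 0.5, 0.2.
enforce_descending(counts = c(100, 200, 200, 100),
                   pos    = c(60, 90, 110, 20))
```

The same loop with the inequality reversed handles an ascending trend.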

The Weight of Evidence (WoE) measures the predictive power of each bin and is calculated as:

$$WoE = \ln\left(\frac{\text{Distribution of Events}}{\text{Distribution of Non-Events}}\right)$$

The Information Value (IV) for each bin is calculated as:

$$IV = (\text{Distribution of Events} - \text{Distribution of Non-Events}) \times WoE$$

The total IV is the sum of bin IVs and measures the overall predictive power of the feature.
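The two formulas above can be verified directly. The sketch below uses made-up counts for a hypothetical two-bin split (they do not come from the example in this page):

```r
# Worked example of the WoE/IV formulas, with hypothetical per-bin counts.
count_pos <- c(40, 60)   # events (target = 1) per bin
count_neg <- c(10, 90)   # non-events (target = 0) per bin

dist_pos <- count_pos / sum(count_pos)   # distribution of events
dist_neg <- count_neg / sum(count_neg)   # distribution of non-events

woe <- log(dist_pos / dist_neg)          # Weight of Evidence per bin
iv  <- (dist_pos - dist_neg) * woe       # Information Value per bin
total_iv <- sum(iv)

round(woe, 4)       # 1.3863 -0.4055
round(total_iv, 4)  # 0.5375
```

Note that both bin IVs are non-negative by construction, since the difference and the log ratio always share the same sign.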

This implementation is based on the methodology described in:

  • Navas-Palencia, G. (2022). "OptBinning: Mathematical Optimization for Optimal Binning". Journal of Open Source Software, 7(74), 4101.

  • Siddiqi, N. (2017). "Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards". John Wiley & Sons, 2nd Edition.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2017). "Credit Scoring and Its Applications". SIAM, 2nd Edition.

  • Kotsiantis, S.B., & Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering, 32(1), 47-58.

Monotonicity constraints are particularly important in credit scoring and risk modeling applications, as they ensure that the model behaves in an intuitive and explainable way.

Examples

# Create sample data
set.seed(123)
n <- 1000
target <- sample(0:1, n, replace = TRUE)
feature <- rnorm(n)

# Run optimal binning
result <- optimal_binning_numerical_dp(target, feature, min_bins = 2, max_bins = 4)

# Print results
print(result)
#> $id
#> [1] 1 2 3 4
#> 
#> $bin
#> [1] "(-Inf;-1.691862]"     "(-1.691862;0.031526]" "(0.031526;1.651915]" 
#> [4] "(1.651915;+Inf]"     
#> 
#> $woe
#> [1]  0.18434380  0.02400115  0.01511220 -0.55136299
#> 
#> $iv
#> [1] 0.0016962072 0.0002592498 0.0001027778 0.0147786563
#> 
#> $count
#> [1]  50 450 450  50
#> 
#> $count_pos
#> [1]  27 225 224  18
#> 
#> $count_neg
#> [1]  23 225 226  32
#> 
#> $event_rate
#> [1] 0.5400000 0.5000000 0.4977778 0.3600000
#> 
#> $cutpoints
#> [1] -1.691862  0.031526  1.651915
#> 
#> $total_iv
#> [1] 0.01683689
#> 
#> $converged
#> [1] TRUE
#> 
#> $iterations
#> [1] 17
#> 
#> $execution_time_ms
#> [1] 0
#> 
#> $monotonic_trend
#> [1] "descending"
#>
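The returned cutpoints and WoE values can be used to score new data. A minimal sketch, reusing the `cutpoints` and `woe` vectors from the printed result above; `findInterval()` with `left.open = TRUE` reproduces the `(lo; hi]` bin ranges shown in `$bin`:

```r
# Apply a fitted binning to new data: assign each value to a bin via the
# cutpoints, then substitute the bin's WoE (values copied from the result above).
cutpoints <- c(-1.691862, 0.031526, 1.651915)
woe <- c(0.18434380, 0.02400115, 0.01511220, -0.55136299)

new_feature <- c(-2.5, 0.0, 2.0)
# left.open = TRUE makes each interval right-closed, matching (lo; hi]
bin_index <- findInterval(new_feature, cutpoints, left.open = TRUE) + 1L
woe_values <- woe[bin_index]

bin_index   # 1 2 4
```

Replacing raw feature values with bin WoE in this way is the usual next step when building a scorecard or logistic regression model.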