This function implements a comprehensive suite of state-of-the-art algorithms for optimal binning and Weight of Evidence (WoE) calculation for both numerical and categorical variables. It maximizes predictive power while preserving interpretability through monotonic constraints, information-theoretic optimization, and statistical validation, and is primarily designed for credit risk modeling, classification problems, and predictive analytics applications.

Usage

obwoe(
  dt,
  target,
  features = NULL,
  min_bins = 3L,
  max_bins = 4L,
  method = "jedi",
  positive = "bad|1",
  preprocess = TRUE,
  progress = TRUE,
  trace = FALSE,
  outputall = TRUE,
  control = list()
)

Arguments

dt

A data.table containing the dataset.

target

The name of the target variable column (must be binary: 0/1).

features

Vector of feature names to process. If NULL, all features except the target will be processed.

min_bins

Minimum number of bins (default: 3).

max_bins

Maximum number of bins (default: 4).

method

The binning method to use. Either "auto" or one of the methods listed in the tables in the Details section (default: "jedi").

positive

Character string specifying which category should be considered as positive. Must be either "bad|1" or "good|1".

preprocess

Logical. Whether to preprocess the data before binning (default: TRUE).

progress

Logical. Whether to display a progress bar (default: TRUE).

trace

Logical. Whether to generate error logs when testing the available methods (default: FALSE).

outputall

Logical. If TRUE, returns a list with the WoE-transformed data, the gains table, and reports; if FALSE, returns only the optimal binning gains table (default: TRUE).

control

A list of additional control parameters (a short usage sketch follows this list):

  • cat_cutoff: Minimum frequency for a category (default: 0.05)

  • bin_cutoff: Minimum frequency for a bin (default: 0.05)

  • min_bads: Minimum proportion of bad cases in a bin (default: 0.05)

  • pvalue_threshold: P-value threshold for statistical tests (default: 0.05)

  • max_n_prebins: Maximum number of pre-bins (default: 20)

  • monotonicity_direction: Direction of monotonicity for some algorithms ("increase" or "decrease")

  • lambda: Regularization parameter for some algorithms (default: 0.1)

  • min_bin_size: Minimum bin size as a proportion of total observations (default: 0.05)

  • min_iv_gain: Minimum IV gain for bin splitting for some algorithms (default: 0.01)

  • max_depth: Maximum depth for tree-based algorithms (default: 10)

  • num_miss_value: Value to replace missing numeric values (default: -999.0)

  • char_miss_value: Value to replace missing categorical values (default: "N/A")

  • outlier_method: Method for outlier detection ("iqr", "zscore", or "grubbs")

  • outlier_process: Whether to process outliers (default: FALSE)

  • iqr_k: IQR multiplier for outlier detection (default: 1.5)

  • zscore_threshold: Z-score threshold for outlier detection (default: 3)

  • grubbs_alpha: Significance level for Grubbs' test (default: 0.05)

  • n_threads: Number of threads for parallel processing (default: 1)

  • is_monotonic: Whether to enforce monotonicity in binning (default: TRUE)

  • population_size: Population size for genetic algorithm (default: 50)

  • max_generations: Maximum number of generations for genetic algorithm (default: 100)

  • mutation_rate: Mutation rate for genetic algorithm (default: 0.1)

  • initial_temperature: Initial temperature for simulated annealing (default: 1)

  • cooling_rate: Cooling rate for simulated annealing (default: 0.995)

  • max_iterations: Maximum number of iterations for iterative algorithms (default: 1000)

  • include_upper_bound: Include upper bound for numeric bins (default: TRUE)

  • bin_separator: Separator used to join category labels within a bin for categorical variables (default: "%;%")

  • laplace_smoothing: Smoothing parameter for WoE calculation (default: 0.5)

  • sketch_k: Parameter controlling the accuracy of sketch-based algorithms (default: 200)

  • sketch_width: Width parameter for sketch-based algorithms (default: 2000)

  • sketch_depth: Depth parameter for sketch-based algorithms (default: 5)

  • polynomial_degree: Degree of polynomial for LPDB algorithm (default: 3)

  • auto_monotonicity: Auto-detect monotonicity direction (default: TRUE)

  • monotonic_trend: Monotonicity direction for DP algorithm (default: "auto")

  • use_chi2_algorithm: Whether to use enhanced Chi2 algorithm (default: FALSE)

  • chi_merge_threshold: Threshold for chi-merge algorithm (default: 0.05)

  • force_monotonic_direction: Force direction in MBLP (0=auto, 1=increasing, -1=decreasing)

  • monotonicity_direction: Monotonicity for UDT ("none", "increasing", "decreasing", "auto")

  • divergence_method: Divergence measure for DMIV ("he", "kl", "tr", "klj", "sc", "js", "l1", "l2", "ln")

  • bin_method: Method for WoE calculation in DMIV ("woe", "woe1")

  • adaptive_cooling: Whether to use adaptive cooling in SAB (default: TRUE)

  • enforce_monotonic: Whether to enforce monotonicity in various algorithms (default: TRUE)
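
Only the parameters you want to override need to be supplied in control; the remaining entries keep the defaults listed above. A minimal sketch using a handful of these parameters (the germancredit data from the scorecard package is used purely for illustration):

library(OptimalBinningWoE)
library(data.table)
data(germancredit, package = "scorecard")
dt <- as.data.table(germancredit)

# Override a few control parameters; everything else keeps its default
result <- obwoe(
  dt,
  target   = "creditability",
  method   = "jedi",
  positive = "bad|1",
  control  = list(
    bin_cutoff      = 0.10,  # require at least 10% of observations per bin
    max_n_prebins   = 30,    # allow more pre-bins before merging
    outlier_process = TRUE,  # enable outlier treatment during preprocessing
    outlier_method  = "iqr", # IQR-based outlier detection
    iqr_k           = 1.5    # standard IQR multiplier
  )
)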

Value

The return value depends on outputall. If outputall = FALSE: a data.table containing the optimal binning gains table (woebin). If outputall = TRUE: a list containing:

data

The original dataset with added WoE columns

woebin

Information about the bins created, including:

  • feature: Name of the feature

  • bin: Bin label or range

  • count: Number of observations in the bin

  • count_distr: Proportion of observations in the bin

  • good: Number of good cases (target = 0) in the bin

  • bad: Number of bad cases (target = 1) in the bin

  • good_rate: Proportion of good cases in the bin

  • bad_rate: Proportion of bad cases in the bin

  • woe: Weight of Evidence for the bin

  • iv: Information Value contribution of the bin

report_best_model

Report on the best tested models, including:

  • feature: Name of the feature

  • method: Best method selected for the feature

  • iv_total: Total Information Value achieved

  • n_bins: Number of bins created

  • runtime: Execution time for binning the feature

report_preprocess

Preprocessing report for each feature, including:

  • feature: Name of the feature

  • type: Data type of the feature

  • missing_count: Number of missing values

  • outlier_count: Number of outliers detected

  • unique_count: Number of unique values

  • mean_before: Mean value before preprocessing

  • mean_after: Mean value after preprocessing

  • sd_before: Standard deviation before preprocessing

  • sd_after: Standard deviation after preprocessing
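
As a small sketch, the bin-level iv column documented above under woebin can be aggregated into a per-feature total Information Value (assuming the default outputall = TRUE so that the woebin table is returned as part of the list):

library(OptimalBinningWoE)
library(data.table)
data(germancredit, package = "scorecard")
dt <- as.data.table(germancredit)

result <- obwoe(dt, target = "creditability", positive = "bad|1")

# Total Information Value per feature, sorted from strongest to weakest
result$woebin[, .(iv_total = sum(iv)), by = feature][order(-iv_total)]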

Details

Categorical Variable Algorithms

Algorithm | Abbreviation | Theoretical Foundation | Key Features
ChiMerge | CM | Statistical Tests | Uses chi-square tests to merge.
Dynamic Programming with Local Constraints | DPLC | Mathematical Programming | Maximizes IV with global constraints.
Fisher's Exact Test Binning | FETB | Statistical Tests | Uses Fisher's exact test for statistical merging.
Greedy Merge Binning | GMB | Iterative Optimization | Iteratively merges bins to optimize.
Information Value Binning | IVB | Information Theory | Dynamic programming for IV optimization.
Joint Entropy-Driven Information | JEDI | Information Theory | Adaptive merging with entropy.
Monotonic Binning Algorithm | MBA | Information Theory | Combines WoE/IV with monotonicity constraints.
Mixed Integer Linear Programming | MILP | Mathematical Programming | Mathematical optimization for binning.
Monotonic Optimal Binning | MOB | Iterative Optimization | Specialized for monotonicity.
Simulated Annealing Binning | SAB | Metaheuristic Optimization | Simulated annealing for global optimization.
Similarity-Based Logistic Partitioning | SBLP | Distance-Based Methods | Similarity measures for optimal binning.
Sliding Window Binning | SWB | Iterative Optimization | Sliding window approach for binning.
User-Defined Technique | UDT | Hybrid Methods | Flexible hybrid approach.
JEDI Multinomial WoE | JEDI_MWOE | Information Theory | Extension of JEDI for multinomial targets.

Numerical Variable Algorithms

Algorithm | Abbreviation | Theoretical Foundation | Key Features
Branch and Bound | BB | Mathematical Programming | Efficient search in solution space.
ChiMerge | CM | Statistical Methods | Chi-square-based merging.
Dynamic Programming with Local Constraints | DPLC | Mathematical Programming | Constrained optimization.
Equal-Width Binning | EWB | Simple Discretization | Equal-width intervals for binning.
Fisher's Exact Test Binning | FETB | Statistical Tests | Fisher's test for statistical merging.
Joint Entropy-Driven Interval | JEDI | Information Theory | Entropy optimization with merging.
K-means Binning | KMB | Clustering | K-means inspired clustering.
Local Density Binning | LDB | Density Estimation | Adapts to local density structure.
Local Polynomial Density Binning | LPDB | Density Estimation | Polynomial density estimation.
Monotonic Binning with Linear Programming | MBLP | Mathematical Programming | Linear programming with monotonicity.
Minimum Description Length Principle | MDLP | Information Theory | MDL criterion with monotonicity.
Monotonic Optimal Binning | MOB | Iterative Optimization | Specialized monotonicity.
Monotonic Risk Binning with LR Pre-binning | MRBLP | Hybrid Methods | Likelihood ratio pre-binning.
Optimal Supervised Learning Partitioning | OSLP | Supervised Learning | Specialized supervised approach.
Unsupervised Binning with Standard Deviation | UBSD | Statistical Methods | Standard deviation-based binning.
Unsupervised Decision Tree | UDT | Decision Trees | Decision tree inspired binning.
Isotonic Regression | IR | Statistical Methods | Pool Adjacent Violators algorithm (PAVA).
Fast MDLP with Monotonicity | FAST_MDLPM | Information Theory | Optimized MDL implementation.
JEDI Multinomial WoE | JEDI_MWOE | Information Theory | Multinomial extension of JEDI.
Sketch-based Binning | SKETCH | Approximate Computing | KLL sketch for efficient computation.

Mathematical Framework

Weight of Evidence (WoE)

The Weight of Evidence measures the predictive power of a bin and is defined as:

$$WoE_i = \ln\left(\frac{P(X_i|Y=1)}{P(X_i|Y=0)}\right)$$

Where \(P(X_i|Y=1)\) is the proportion of positive events in bin i relative to all positive events, and \(P(X_i|Y=0)\) is the proportion of negative events in bin i relative to all negative events.

With Bayesian smoothing applied (used in many implementations):

$$WoE_i = \ln\left(\frac{n_{1i} + \alpha\pi}{n_1 + m\alpha} \cdot \frac{n_0 + m\alpha}{n_{0i} + \alpha(1-\pi)}\right)$$

Where:

  • \(n_{1i}\) is the count of positive cases in bin i

  • \(n_{0i}\) is the count of negative cases in bin i

  • \(n_1\) is the total count of positive cases

  • \(n_0\) is the total count of negative cases

  • \(\pi\) is the overall positive rate

  • \(\alpha\) is the smoothing parameter (typically 0.5)

  • \(m\) is the number of bins
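
The smoothed WoE can be reproduced directly from bin counts. A minimal worked sketch in base R (the counts below are illustrative, not taken from any dataset):

# Illustrative counts of positive (bad) and negative (good) cases per bin
n1i <- c(40, 60, 100)            # positives per bin
n0i <- c(360, 240, 200)          # negatives per bin
n1  <- sum(n1i); n0 <- sum(n0i)  # overall totals
m   <- length(n1i)               # number of bins
pi_hat <- n1 / (n1 + n0)         # overall positive rate
alpha  <- 0.5                    # smoothing parameter (laplace_smoothing)

# Smoothed Weight of Evidence per bin, following the formula above
woe <- log(((n1i + alpha * pi_hat) / (n1 + m * alpha)) *
           ((n0 + m * alpha) / (n0i + alpha * (1 - pi_hat))))
woe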

Information Value (IV)

The Information Value quantifies the predictive power of a variable:

$$IV_i = (P(X_i|Y=1) - P(X_i|Y=0)) \times WoE_i$$

The total Information Value is the sum across all bins:

$$IV_{total} = \sum_{i=1}^{n} IV_i$$
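
The bin-level and total IV follow directly from the event and non-event distributions. A small self-contained sketch (unsmoothed, with illustrative counts):

# Illustrative counts of positives (bad) and negatives (good) per bin
n1i <- c(40, 60, 100)
n0i <- c(360, 240, 200)
dist_pos <- n1i / sum(n1i)              # P(X_i | Y = 1)
dist_neg <- n0i / sum(n0i)              # P(X_i | Y = 0)

woe_i <- log(dist_pos / dist_neg)       # unsmoothed WoE per bin
iv_i  <- (dist_pos - dist_neg) * woe_i  # IV contribution per bin
sum(iv_i)                               # total IV for the variable (about 0.38 here)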

IV can be interpreted as follows:

  • IV < 0.02: Not predictive

  • 0.02 <= IV < 0.1: Weak predictive power

  • 0.1 <= IV < 0.3: Medium predictive power

  • 0.3 <= IV < 0.5: Strong predictive power

  • IV >= 0.5: Suspicious (possible overfitting)

Monotonicity Constraint

Many algorithms enforce monotonicity of WoE values across bins, which means:

$$WoE_1 \leq WoE_2 \leq \ldots \leq WoE_n$$ (increasing)

or

$$WoE_1 \geq WoE_2 \geq \ldots \geq WoE_n$$ (decreasing)
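
A quick way to check whether a fitted WoE pattern satisfies either constraint (a sketch; woe is any numeric vector of bin-level WoE values ordered by bin):

woe <- c(-0.81, 0.00, 0.69)  # example WoE values ordered by bin
# TRUE if the sequence is non-decreasing or non-increasing
all(diff(woe) >= 0) || all(diff(woe) <= 0)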

Method Selection

When method = "auto", the function tests multiple algorithms and selects the one that produces the highest total Information Value while respecting the specified constraints; a manual version of this comparison is sketched after the list below. The selection process considers:

  • Total Information Value (IV)

  • Monotonicity of WoE values

  • Number of bins created

  • Bin frequency distribution

  • Statistical stability
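
The same comparison can be carried out manually by fitting several candidate methods and keeping the one with the highest total IV. A hedged sketch (method names come from the tables above; "duration.in.month" is simply one numeric feature of the germancredit example data):

library(OptimalBinningWoE)
library(data.table)
data(germancredit, package = "scorecard")
dt <- as.data.table(germancredit)

# Fit each candidate method on a single feature and record its total IV
candidate_methods <- c("jedi", "cm", "mob")
iv_by_method <- sapply(candidate_methods, function(m) {
  res <- obwoe(dt,
    target = "creditability", features = "duration.in.month",
    method = m, min_bins = 3, max_bins = 5, positive = "bad|1"
  )
  sum(res$woebin$iv)  # total IV for this feature under method m
})
iv_by_method
names(which.max(iv_by_method))  # method with the highest total IV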

References

  • Beltrami, M., Mach, M., & Dall'Aglio, M. (2021). Monotonic Optimal Binning Algorithm for Credit Risk Modeling. Risks, 9(3), 58.

  • Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons.

  • Thomas, L.C., Edelman, D.B., & Crook, J.N. (2002). Credit Scoring and Its Applications. SIAM.

  • Zeng, G. (2013). Metric Divergence Measures and Information Value in Credit Scoring. Journal of Mathematics, 2013, Article ID 848271, 10 pages.

  • Zeng, Y. (2014). Univariate feature selection and binner. arXiv preprint arXiv:1410.5420.

  • Mironchyk, P., & Tchistiakov, V. (2017). Monotone Optimal Binning Algorithm for Credit Risk Modeling. Working Paper.

  • Kerber, R. (1992). ChiMerge: Discretization of Numeric Attributes. In AAAI'92.

  • Liu, H. & Setiono, R. (1995). Chi2: Feature Selection and Discretization of Numeric Attributes. In TAI'95.

  • Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022-1027.

  • Barlow, R. E., & Brunk, H. D. (1972). The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337), 140-147.

  • Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85, 87-94.

  • Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145-151.

  • Bertsimas, D., & Tsitsiklis, J. N. (1997). Introduction to Linear Optimization. Athena Scientific.

  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.

  • Navas-Palencia, G. (2020). Optimal binning: mathematical programming formulations for binary classification. arXiv preprint arXiv:2001.08025.

Examples

if (FALSE) { # \dontrun{
# Example 1: Using the German Credit Data
library(OptimalBinningWoE)
library(data.table)
library(scorecard)
data(germancredit, package = "scorecard")
dt <- as.data.table(germancredit)

# Process all features with the JEDI method
result <- obwoe(dt,
  target = "creditability", method = "jedi",
  min_bins = 3, max_bins = 5, positive = "bad|1"
)

# View WoE binning information
print(result)

# Process only numeric features, trying the JEDI and CM methods, and get detailed output
numeric_features <- names(dt)[sapply(dt, is.numeric)]
numeric_features <- setdiff(numeric_features, "creditability")

result_detailed <- obwoe(dt,
  target = "creditability", features = numeric_features,
  method = c("jedi", "cm"), preprocess = TRUE, outputall = TRUE,
  min_bins = 3, max_bins = 5, positive = "bad|1"
)

# View WoE-transformed data
head(result_detailed$data)

# View preprocessing report
print(result_detailed$report_preprocess)

# View best model report
print(result_detailed$report_best_model)

# Process only categorical features with the UDT method
categoric_features <- names(dt)[sapply(dt, function(i) !is.numeric(i))]
categoric_features <- setdiff(categoric_features, "creditability")
result_cat <- obwoe(dt,
  target = "creditability", features = categoric_features,
  method = "udt", preprocess = TRUE,
  min_bins = 3, max_bins = 4, positive = "bad|1"
)

# View binning information for categorical features
print(result_cat)

# Example 2: Automatic method selection
result_auto <- obwoe(dt,
  target = "creditability",
  method = "auto", # Tries multiple methods and selects the best
  min_bins = 3, max_bins = 5, positive = "bad|1"
)

# View which methods were selected for each feature
print(result_auto$report_best_model)

# Example 3: Using specialized algorithms
# For numerical features with complex distributions
result_lpdb <- obwoe(dt,
  target = "creditability",
  features = numeric_features[1:3],
  method = "jedi", # Local Polynomial Density Binning
  min_bins = 3, max_bins = 5, positive = "bad|1",
  control = list(polynomial_degree = 3)
)

# For categorical features with many levels
result_jedi <- obwoe(dt,
  target = "creditability",
  features = categoric_features[1:3],
  method = "jedi", # Joint Entropy-Driven Information
  min_bins = 3, max_bins = 5, positive = "bad|1"
)
} # }