
Binning Categorical Variables using Custom Cutpoints
Source:R/ob_binning_cutpoints_cat.R
ob_cutpoints_cat.RdThis function applies user-defined binning to a categorical variable by grouping specified categories into bins and calculating Weight of Evidence (WoE) and Information Value (IV) for each bin.
Arguments
- feature
A character vector or factor representing the categorical predictor variable.
- target
An integer vector containing binary outcome values (0 or 1). Must be the same length as
feature.- cutpoints
A character vector where each element defines a bin by concatenating the original category names with "+" as separator.
Value
A list containing:
woefeatureNumeric vector of WoE values corresponding to each observation in the input
featurewoebinData frame with one row per bin containing:
bin: The bin definition (original categories joined by "+")count: Total number of observations in the bincount_pos: Number of positive outcomes (target=1) in the bincount_neg: Number of negative outcomes (target=0) in the binwoe: Weight of Evidence for the biniv: Information Value contribution of the bin
Details
The function takes a character vector defining how categories should be grouped.
Each element in the cutpoints vector defines one bin by listing the
original categories that should be merged, separated by "+" signs.
For example, if you want to create two bins from categories "A", "B", "C", "D":
Bin 1: "A+B"
Bin 2: "C+D"
Note
Target variable must contain only 0 and 1 values.
Every unique category in
featuremust be included in exactly one bin definition incutpoints.Categories not mentioned in
cutpointswill be assigned to bin 0 (which may lead to unexpected results).
Examples
# Sample data
feature <- c("A", "B", "C", "D", "A", "B", "C", "D")
target <- c(1, 0, 1, 0, 1, 1, 0, 0)
# Define custom bins: (A,B) and (C,D)
cutpoints <- c("A+B", "C+D")
# Apply binning
result <- ob_cutpoints_cat(feature, target, cutpoints)
# View bin statistics
print(result$woebin)
#> bin count count_pos count_neg woe iv
#> 1 A+B 4 3 1 1.098612 0.5493061
#> 2 C+D 4 1 3 -1.098612 0.5493061
# View WoE-transformed feature
print(result$woefeature)
#> [1] 1.098612 1.098612 -1.098612 -1.098612 1.098612 1.098612 -1.098612
#> [8] -1.098612