
Apply Optimal Weight of Evidence (WoE) to a Numerical Feature
Source: R/ob_apply_woe_num.R
Transforms a numerical feature into its corresponding Weight of Evidence (WoE)
values using pre-computed binning results from an optimal binning algorithm
(e.g., ob_numerical_mdlp, ob_numerical_mob).
Usage
ob_apply_woe_num(
  obresults,
  feature,
  include_upper_bound = TRUE,
  missing_values = c(-999)
)
Arguments
- obresults
List output from an optimal binning function for numerical variables. Must contain elements
cutpoints (numeric vector of bin boundaries) and woe (numeric vector of WoE values). The number of WoE values should equal length(cutpoints) + 1.
- feature
Numeric vector of values to be transformed. Automatically coerced to numeric if provided in another type.
- include_upper_bound
Logical flag controlling interval boundary behavior (default: TRUE).
TRUE: Intervals are (lower, upper] (right-closed).
FALSE: Intervals are [lower, upper) (left-closed).
This must match the convention used during binning.
- missing_values
Numeric vector of values to be treated as missing (default: c(-999)). These values are assigned the WoE of the special missing bin if it exists in obresults, or NA otherwise.
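The structural requirements on obresults can be checked before applying WoE. The following is a minimal sketch in base R; check_obresults and mock_bins are hypothetical names introduced here for illustration and are not part of the package. It treats the documented "should equal length(cutpoints) + 1" relationship as a hard requirement, which is an assumption of this sketch.

```r
# Hypothetical helper (not part of the package): checks the shape that
# ob_apply_woe_num() is documented to expect from `obresults`.
check_obresults <- function(obresults) {
  stopifnot(
    is.list(obresults),
    is.numeric(obresults$cutpoints),
    is.numeric(obresults$woe)
  )
  # Cutpoints are described as c_1 < c_2 < ... < c_k, so require strict order
  if (is.unsorted(obresults$cutpoints, strictly = TRUE)) {
    stop("cutpoints must be strictly increasing")
  }
  # k cutpoints delimit k + 1 bins, so k + 1 WoE values are expected
  if (length(obresults$woe) != length(obresults$cutpoints) + 1L) {
    stop("length(woe) should equal length(cutpoints) + 1")
  }
  invisible(TRUE)
}

# Hand-built stand-in for a binning result (illustrative values only)
mock_bins <- list(cutpoints = c(40000, 70000), woe = c(-0.8, 0.1, 0.9))
check_obresults(mock_bins)
```

A check of this kind fails early when a binning result has been subset or edited by hand, rather than producing misaligned WoE values later.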
Value
Numeric vector of WoE values with the same length as feature.
Values outside the range of cutpoints are assigned to the first or
last bin. NA values in feature are propagated to the output
unless explicitly listed in missing_values.
Details
This function is typically used in a two-step workflow:
1. Train binning on training data:
bins <- ob_numerical_mdlp(feature_train, target_train)
2. Apply WoE to new data:
woe_test <- ob_apply_woe_num(bins, feature_test)
Bin Assignment Logic:
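For k cutpoints \(c_1 < c_2 < \cdots < c_k\) and include_upper_bound = TRUE, values are assigned as:
- Bin 1: \(x \le c_1\)
- Bin i: \(c_{i-1} < x \le c_i\) for \(i = 2, \ldots, k\)
- Bin k+1: \(x > c_k\)
With include_upper_bound = FALSE, the intervals are left-closed instead: \(x < c_1\), \(c_{i-1} \le x < c_i\), and \(x \ge c_k\).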
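The assignment rule above can be reproduced with base R's findInterval(). This is a sketch of the logic as documented, not the package's internal implementation; assign_bin and the cutpoint and WoE values are hypothetical.

```r
# Hypothetical helper (not the package's internals): maps values to bin
# indices 1..k+1 given k sorted cutpoints.
assign_bin <- function(x, cutpoints, include_upper_bound = TRUE) {
  # left.open = TRUE  -> intervals (c[i-1], c[i]]  (right-closed)
  # left.open = FALSE -> intervals [c[i-1], c[i])  (left-closed)
  findInterval(x, cutpoints, left.open = include_upper_bound) + 1L
}

cutpoints <- c(40000, 70000)   # illustrative values
woe       <- c(-0.8, 0.1, 0.9) # one WoE per bin, illustrative

x <- c(35000, 40000, 55000, 70000, 90000)
assign_bin(x, cutpoints, include_upper_bound = TRUE)   # 1 1 2 2 3
assign_bin(x, cutpoints, include_upper_bound = FALSE)  # 1 2 2 3 3

# Looking up WoE is then a plain index into the woe vector
woe[assign_bin(x, cutpoints)]
```

Only values that coincide with a cutpoint (40000 and 70000 here) change bins between the two conventions, which is why the flag has to match the one used during binning.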
Handling of Edge Cases:
- Values in missing_values are matched against a bin labeled "NA" or "Missing" in obresults$bin (if available).
- Inf and -Inf are assigned to the last and first bins, respectively.
- Values exactly equal to cutpoints follow the include_upper_bound convention.
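The edge cases above can be illustrated with a small base R sketch. The helper apply_with_missing and its missing_woe argument are hypothetical; how the package actually stores and retrieves the missing-bin WoE inside obresults is not shown here, so the value is passed in directly as an assumption.

```r
# Hypothetical sketch of the documented edge-case behavior; this is not the
# package's implementation. `missing_woe` stands in for the WoE of the
# "NA"/"Missing" bin, or NA when no such bin exists.
apply_with_missing <- function(x, cutpoints, woe,
                               missing_woe = NA_real_,
                               missing_values = c(-999)) {
  x <- as.numeric(x)
  is_missing <- x %in% missing_values
  # Right-closed intervals, i.e. include_upper_bound = TRUE
  out <- woe[findInterval(x, cutpoints, left.open = TRUE) + 1L]
  out[is_missing] <- missing_woe  # sentinel codes get the missing-bin WoE
  out                             # NA inputs stay NA unless listed
}

x <- c(-999, 35000, NA, Inf, -Inf, 70000)
res <- apply_with_missing(x, c(40000, 70000), c(-0.8, 0.1, 0.9),
                          missing_woe = 0.3)
res  # 0.3 -0.8 NA 0.9 -0.8 0.1
```

In this sketch the sentinel -999 receives the missing-bin value, NA is propagated, Inf and -Inf fall into the last and first bins, and 70000 lands in the bin it closes because the intervals are right-closed.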
See also
ob_numerical_mdlp for MDLP binning,
ob_numerical_mob for monotonic binning,
ob_apply_woe_cat for applying WoE to categorical features.
Examples
# \donttest{
# Mock data
train_data <- data.frame(
  income = c(50000, 75000, 30000, 45000, 80000, 60000),
  default = c(0, 0, 1, 1, 0, 0)
)
test_data <- data.frame(
  income = c(55000, 35000, 90000)
)
# Train binning on training set
train_bins <- ob_numerical_mdlp(
  feature = train_data$income,
  target = train_data$default
)
# Apply to test set
test_woe <- ob_apply_woe_num(
  obresults = train_bins,
  feature = test_data$income
)
# Handle custom missing indicators (e.g., -999, -1, -9999)
test_woe <- ob_apply_woe_num(
  obresults = train_bins,
  feature = test_data$income,
  missing_values = c(-999, -1, -9999)
)
# Use left-closed intervals (match scikit-learn convention)
test_woe <- ob_apply_woe_num(
  obresults = train_bins,
  feature = test_data$income,
  include_upper_bound = FALSE
)
# }