Apply Optimal Weight of Evidence (WoE) to a Categorical Feature
OBApplyWoECat.Rd
This function applies optimal Weight of Evidence (WoE) values to an original categorical feature based on the results from an optimal binning algorithm. It assigns each category in the feature to its corresponding optimal bin and maps the associated WoE value.
Arguments
- obresults
A list containing the output from an optimal binning algorithm for categorical variables. It must include at least the following elements:
bin
: Character vector of merged categories for each optimal bin.
woe
: Numeric vector of WoE values for each bin.
id
: Numeric vector of bin IDs representing the optimal order.
- feature
A character vector containing the original categorical feature data to which WoE values will be applied.
- bin_separator
The separator string used in bin to join the categories within each merged bin (default: "%;%").
Value
A data frame with four columns:
feature
: Original feature values.
bin
: Optimal merged bin to which each feature value belongs.
woe
: Optimal WoE value corresponding to each feature value.
idbin
: ID of the bin to which each feature value belongs.
Details
The function processes the bin element of obresults by splitting each merged bin into individual categories using bin_separator. It then creates a mapping from each category to its corresponding bin index, WoE value, and bin ID.
For each value in feature, the function assigns the appropriate bin, WoE value, and bin ID based on the category-to-bin mapping. If a category in feature is not found in any bin, NA is assigned to bin, woe, and idbin.
The function handles missing values (NA) in feature by assigning NA to bin, woe, and idbin for those entries.
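The mapping described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name apply_woe_sketch and the toy input are hypothetical, and the separator is assumed to be matched literally rather than as a regular expression.

```r
# Hypothetical helper illustrating the category-to-bin mapping.
# It is NOT the implementation of OBApplyWoECat.
apply_woe_sketch <- function(obresults, feature, bin_separator = "%;%") {
  # Split each merged bin into its individual categories
  # (assumption: the separator is treated as a literal string).
  categories <- strsplit(obresults$bin, bin_separator, fixed = TRUE)

  # Build a lookup from category name to the index of its bin.
  bin_index <- rep(seq_along(categories), lengths(categories))
  names(bin_index) <- unlist(categories)

  # Look up each feature value; unseen categories and NA both give NA.
  idx <- unname(bin_index[match(feature, names(bin_index))])

  data.frame(
    feature = feature,
    bin = obresults$bin[idx],
    woe = obresults$woe[idx],
    idbin = obresults$id[idx],
    stringsAsFactors = FALSE
  )
}

# Toy input (hypothetical values) with one unseen category and one NA.
toy <- list(bin = c("a;b", "c"), woe = c(-0.2, 0.3), id = c(1, 2))
res <- apply_woe_sketch(toy, c("b", "c", "zzz", NA), bin_separator = ";")
print(res)
```

In this sketch, indexing with NA propagates NA to bin, woe, and idbin, so unmatched categories and missing values need no separate branch.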
Examples
if (FALSE) { # \dontrun{
# Example usage with hypothetical obresults and feature vector
obresults <- list(
  bin = c("business;repairs;car (used);retraining",
          "car (new);furniture/equipment;domestic appliances;education;others",
          "radio/television"),
  woe = c(-0.2000211, 0.2892885, -0.4100628),
  id = c(1, 2, 3)
)
feature <- c("business", "education", "radio/television", "unknown_category")
result <- OBApplyWoECat(obresults, feature, bin_separator = ";")
print(result)
} # }