Apply Optimal Weight of Evidence (WoE) to a Numerical Feature
This function applies optimal Weight of Evidence (WoE) values to an original numerical feature based on the results from an optimal binning algorithm. It assigns each value in the feature to a bin according to the specified cutpoints and interval inclusion rule, and maps the corresponding WoE value to it.
Arguments
- obresults
A list containing the output from an optimal binning algorithm for numerical variables. It must include at least the following elements:
  - cutpoints: A numeric vector of cutpoints used to define the bins.
  - woe: A numeric vector of WoE values corresponding to each bin.
  - id: A numeric vector of bin IDs indicating the optimal order of the bins.
- feature
A numeric vector containing the original feature data to which WoE values will be applied.
- include_upper_bound
A logical value indicating whether the upper bound of each interval is included in the bin (default is TRUE).
Value
A data frame with four columns:
  - feature: Original feature values.
  - bin: Optimal bins represented in interval notation.
  - woe: Optimal WoE values corresponding to each feature value.
  - idbin: ID of the bin to which each feature value belongs.
Details
The function assigns each value in feature to a bin based on the cutpoints and the include_upper_bound parameter. The intervals are defined mathematically as follows:
Let \(C = \{c_1, c_2, \ldots, c_n\}\) be the set of cutpoints, with \(c_1 < c_2 < \ldots < c_n\), so that the \(n\) cutpoints define \(n + 1\) bins.
If include_upper_bound = TRUE:
$$
I_1 = (-\infty, c_1]
$$
$$
I_i = (c_{i-1}, c_i], \quad \text{for } i = 2, ..., n
$$
$$
I_{n+1} = (c_n, +\infty)
$$
If include_upper_bound = FALSE:
$$
I_1 = (-\infty, c_1)
$$
$$
I_i = [c_{i-1}, c_i), \quad \text{for } i = 2, ..., n
$$
$$
I_{n+1} = [c_n, +\infty)
$$
The function uses binary search to assign each value to a bin, so locating the bin for a single value takes \(O(\log n)\) comparisons for \(n\) cutpoints. This keeps the function practical for large datasets.
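The interval rules above can be illustrated with base R's findInterval(), which also locates intervals in a sorted vector. The following is a minimal sketch and not the package's implementation: assign_woe_sketch is a hypothetical helper, it assumes the cutpoints are sorted in increasing order and the woe vector is given in bin order, and it uses the bin position as idbin rather than the id element of obresults.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the
# bin assignment described in Details.
assign_woe_sketch <- function(cutpoints, woe, feature,
                              include_upper_bound = TRUE) {
  # Assumption: n sorted cutpoints define n + 1 bins, one WoE value per bin.
  stopifnot(length(woe) == length(cutpoints) + 1, !is.unsorted(cutpoints))
  # findInterval() returns 0 for values below the first cutpoint, so adding 1
  # yields a bin index in 1..(n + 1).
  # left.open = TRUE  gives intervals (c_{i-1}, c_i];
  # left.open = FALSE gives intervals [c_{i-1}, c_i).
  idbin <- findInterval(feature, cutpoints,
                        left.open = include_upper_bound) + 1L
  data.frame(feature = feature, woe = woe[idbin], idbin = idbin)
}

cutpoints <- c(1.5, 3.0, 4.5)
woe <- c(-0.2, 0.0, 0.2, 0.4)
x <- c(1.0, 1.5, 3.5, 5.0)

closed_right <- assign_woe_sketch(cutpoints, woe, x, include_upper_bound = TRUE)
closed_left  <- assign_woe_sketch(cutpoints, woe, x, include_upper_bound = FALSE)
```

The only value treated differently by the two calls is 1.5, which sits exactly on the first cutpoint: it falls in the first bin when the upper bound is included and in the second bin otherwise.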
Examples
if (FALSE) { # \dontrun{
# Example usage with hypothetical obresults and feature vector
obresults <- list(
  cutpoints = c(1.5, 3.0, 4.5),
  woe = c(-0.2, 0.0, 0.2, 0.4),
  id = c(1, 2, 3, 4) # IDs for each bin
)
feature <- c(1.0, 2.0, 3.5, 5.0)
result <- OBApplyWoENum(obresults, feature, include_upper_bound = TRUE)
print(result)
} # }