Calculates a full gains table by aggregating a raw binned dataframe against a
binary target. Unlike ob_gains_table which expects pre-aggregated counts,
this function takes observation-level data, aggregates it by the specified
group variable (bin, WoE, or ID), and then computes all statistical metrics.
Arguments
- binned_df
A data.frame resulting from a binning transformation (e.g., via obwoe_apply), containing at least the following columns:
feature: Original feature values (optional, for reference).
bin: Character vector of bin labels.
woe: Numeric vector of Weight of Evidence values.
idbin: Numeric vector of bin IDs (required for correct sorting).
- target
A numeric vector of binary outcomes (0 for non-event, 1 for event). Must have one element per row of binned_df. Missing values are not allowed.
- group_var
Character string specifying the aggregation key. Options:
"bin": Group by bin label (default).
"woe": Group by WoE value.
"idbin": Group by bin ID.
Value
A data.frame containing the same extensive set of metrics as
ob_gains_table, aggregated by group_var and sorted by idbin.
Details
Aggregation and Sorting
The function first aggregates the binary target by the specified group_var.
Crucially, it uses the idbin column to sort the resulting groups. This ensures
that cumulative metrics (like KS and Gini) are calculated based on the logical
order of the bins (e.g., low score to high score), not alphabetical order.
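The aggregate-then-sort step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the mock data and the column names count, pos, neg, cum_pos_pct and cum_neg_pct are hypothetical.

```r
# Hypothetical observation-level data (not taken from the package)
binned_df <- data.frame(
  bin   = c("Low", "Mid", "High", "Low", "Mid", "High"),
  idbin = c(1, 2, 3, 1, 2, 3)
)
target <- c(0, 0, 1, 1, 0, 1)

# Aggregate by the grouping key; tapply() returns groups in alphabetical order
key   <- binned_df$bin
count <- tapply(target, key, length)
pos   <- tapply(target, key, sum)
ord   <- tapply(binned_df$idbin, key, min)

agg <- data.frame(
  bin   = names(count),
  idbin = as.vector(ord),
  count = as.vector(count),
  pos   = as.vector(pos)
)

# Restore the logical bin order before computing anything cumulative
agg <- agg[order(agg$idbin), ]
agg$neg <- agg$count - agg$pos
agg$cum_pos_pct <- cumsum(agg$pos) / sum(agg$pos)
agg$cum_neg_pct <- cumsum(agg$neg) / sum(agg$neg)
```

Without the order() call the rows would stay in the order High, Low, Mid, and the cumulative columns would accumulate in that alphabetical sequence instead of from low to high.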
Advanced Metrics
In addition to standard credit scoring metrics, this function computes:
Jensen-Shannon Divergence: A symmetrized and smoothed version of KL divergence, useful for measuring stability between the bin distribution and the population distribution.
F1-Score, Precision, Recall: Computed by treating each bin as a potential classification threshold.
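As a rough illustration of the Jensen-Shannon Divergence entry above, the sketch below compares each bin's event/non-event distribution with the population's, using natural logarithms. That definition is an assumption rather than something the documentation states; it is chosen because it reproduces the js_divergence values printed in the Examples section. The helper names kl and js_bernoulli are hypothetical.

```r
# KL divergence between two discrete distributions;
# terms with zero probability in p contribute 0
kl <- function(p, q) {
  sum(ifelse(p > 0, p * log(p / q), 0))
}

# JS divergence between Bernoulli(p) and Bernoulli(q)
js_bernoulli <- function(p, q) {
  P <- c(p, 1 - p)
  Q <- c(q, 1 - q)
  M <- (P + Q) / 2          # mixture distribution
  0.5 * kl(P, M) + 0.5 * kl(Q, M)
}

pos_rate <- c(0.5, 0.0, 1.0)  # per-bin event rates from the Examples data
overall  <- 0.5               # population event rate (3 events in 6 rows)

js <- vapply(pos_rate, js_bernoulli, numeric(1), q = overall)
round(js, 7)
```

A bin whose event rate equals the population rate gets a divergence of 0, and the value grows as the bin's rate moves away from the population rate in either direction.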
References
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley.
Kullback, S., & Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics.
Examples
# \donttest{
# Mock data representing a binned feature
df_binned <- data.frame(
  feature = c(10, 20, 30, 10, 20, 50),
  bin = c("Low", "Mid", "High", "Low", "Mid", "High"),
  woe = c(-0.5, 0.2, 1.1, -0.5, 0.2, 1.1),
  idbin = c(1, 2, 3, 1, 2, 3)
)
target <- c(0, 0, 1, 1, 0, 1)
# Calculate gains table grouped by bin ID
gt <- ob_gains_table_feature(df_binned, target, group_var = "idbin")
# Inspect key metrics
print(gt[, c("id", "count", "pos_rate", "lift", "js_divergence")])
#>   id count pos_rate lift js_divergence
#> 1  1     2      0.5    1     0.0000000
#> 2  2     2      0.0    0     0.2157616
#> 3  3     2      1.0    2     0.2157616
# }
