
Computes a set of credit scoring and classification metrics from pre-aggregated binning results. The computation is implemented in C++ for speed. Given per-bin counts (a named list or data.frame), the function returns per-bin metrics such as Information Value (IV), Weight of Evidence (WoE), Kolmogorov-Smirnov (KS), Gini, Lift, and several entropy-based divergence measures.

Usage

ob_gains_table(binning_result)

Arguments

binning_result

A named list or data.frame containing the following atomic vectors (all must have the same length):

id

Numeric vector of bin identifiers. Determines the sort order for cumulative metrics (e.g., KS, Recall).

bin

Character vector of bin labels/intervals.

count

Numeric vector of total observations per bin (\(O_i\)).

count_pos

Numeric vector of positive (event) counts per bin (\(E_i\)).

count_neg

Numeric vector of negative (non-event) counts per bin (\(NE_i\)).

Value

A data.frame with the following columns (metrics calculated per bin):

Identifiers

id, bin

Counts & Rates

count, pos, neg, pos_rate (\(\pi_i\)), neg_rate (\(1-\pi_i\)), count_perc (\(O_i / O_{total}\))

Distributions (Shares)

pos_perc (\(D_1(i)\): Share of Bad), neg_perc (\(D_0(i)\): Share of Good)

Cumulative Statistics

cum_pos, cum_neg, cum_pos_perc (\(CDF_1\)), cum_neg_perc (\(CDF_0\)), cum_count_perc

Credit Scoring Metrics

woe, iv, total_iv, ks, lift, odds_pos, odds_ratio

Advanced Metrics

gini_contribution, log_likelihood, kl_divergence, js_divergence

Classification Metrics

precision, recall, f1_score

Details

Mathematical Definitions

Let \(E_i\) and \(NE_i\) be the number of events and non-events in bin \(i\), and \(E_{total}\), \(NE_{total}\) be the population totals.

Weight of Evidence (WoE) & Information Value (IV): $$WoE_i = \ln\left(\frac{E_i / E_{total}}{NE_i / NE_{total}}\right)$$ $$IV_i = \left(\frac{E_i}{E_{total}} - \frac{NE_i}{NE_{total}}\right) \times WoE_i$$
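As a rough illustration, the WoE and IV formulas above can be reproduced in base R. This is a sketch of the formulas only, not the package's C++ implementation; the bin counts are the ones used in the Examples section, and the variable names are local to the sketch.

```r
# Bin counts taken from the Examples section of this page
count_pos <- c(5, 30, 20)    # E_i, events per bin
count_neg <- c(95, 170, 30)  # NE_i, non-events per bin

# Share of all events / non-events that fall in each bin
pos_perc <- count_pos / sum(count_pos)  # E_i / E_total
neg_perc <- count_neg / sum(count_neg)  # NE_i / NE_total

# Per-bin WoE and IV, following the definitions above
woe <- log(pos_perc / neg_perc)
iv  <- (pos_perc - neg_perc) * woe

# Summing the per-bin IV contributions gives the total IV
total_iv <- sum(iv)

print(data.frame(woe = woe, iv = iv))
print(total_iv)
```

The per-bin values should agree with the woe and iv columns printed in the Examples section. The plain formula is undefined for a bin with zero events or zero non-events; how ob_gains_table treats that case is not covered by this sketch.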

Kolmogorov-Smirnov (KS): $$KS_i = \left| \sum_{j=1}^i \frac{E_j}{E_{total}} - \sum_{j=1}^i \frac{NE_j}{NE_{total}} \right|$$
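A minimal base R sketch of the per-bin KS formula, assuming the bins are already sorted by id (the order used for cumulative metrics). The counts are those from the Examples section; the variable names are illustrative.

```r
# Bin counts from the Examples section, already ordered by id
count_pos <- c(5, 30, 20)
count_neg <- c(95, 170, 30)

# Cumulative distributions of events and non-events across bins
cum_pos_perc <- cumsum(count_pos) / sum(count_pos)
cum_neg_perc <- cumsum(count_neg) / sum(count_neg)

# Per-bin KS: absolute gap between the two cumulative distributions
ks <- abs(cum_pos_perc - cum_neg_perc)

print(ks)
print(max(ks))  # largest gap across bins
```

The last bin always has a KS of zero because both cumulative shares reach 1 there. The overall KS statistic is conventionally the maximum of the per-bin values.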

Lift: $$Lift_i = \frac{E_i / (E_i + NE_i)}{E_{total} / (E_{total} + NE_{total})}$$
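The Lift formula can be checked by hand in base R. The sketch below uses the counts from the Examples section and is only an illustration of the formula; the variable names are not part of the package.

```r
# Bin counts from the Examples section
count_pos <- c(5, 30, 20)
count_neg <- c(95, 170, 30)

# Event rate inside each bin and in the whole population
pos_rate    <- count_pos / (count_pos + count_neg)
global_rate <- sum(count_pos) / (sum(count_pos) + sum(count_neg))

# Lift above 1 means the bin holds proportionally more events than average
lift <- pos_rate / global_rate

print(round(lift, 4))
```

Here the "High" bin has an event rate of 0.40 against a population rate of 55/350, so its lift is well above 1, while the "Low" bin falls well below 1.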

Kullback-Leibler Divergence (Bernoulli): Measures the divergence between the bin's event rate \(p_i = E_i / (E_i + NE_i)\) (the pos_rate column, written \(\pi_i\) above) and the global event rate \(p_{global} = E_{total} / (E_{total} + NE_{total})\): $$KL_i = p_i \ln\left(\frac{p_i}{p_{global}}\right) + (1-p_i) \ln\left(\frac{1-p_i}{1-p_{global}}\right)$$
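The Bernoulli KL formula can be sketched in base R as follows. The helper name bernoulli_kl is hypothetical and not part of the package; the counts are those from the Examples section.

```r
# Hypothetical helper: KL divergence between Bernoulli(p) and Bernoulli(q)
bernoulli_kl <- function(p, q) {
  p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))
}

# Bin counts from the Examples section
count_pos <- c(5, 30, 20)
count_neg <- c(95, 170, 30)

# Per-bin event rate and global event rate
p_bin    <- count_pos / (count_pos + count_neg)
p_global <- sum(count_pos) / (sum(count_pos) + sum(count_neg))

kl <- bernoulli_kl(p_bin, p_global)
print(kl)
```

The divergence is zero when a bin's event rate equals the global rate and grows as the two move apart. With this plain formula a bin rate of exactly 0 or 1 yields NaN in R (from 0 * log(0)); the sketch does not guard against that case.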

Examples

# Manually constructed binning result
bin_res <- list(
  id = 1:3,
  bin = c("Low", "Medium", "High"),
  count = c(100, 200, 50),
  count_pos = c(5, 30, 20),
  count_neg = c(95, 170, 30)
)

gt <- ob_gains_table(bin_res)
print(gt[, c("bin", "woe", "iv", "ks")])
#>      bin         woe          iv        ks
#> 1    Low -1.26479681 0.292325919 0.2311248
#> 2 Medium -0.05495888 0.001693648 0.2619414
#> 3   High  1.27417706 0.333759785 0.0000000