Skip to contents

Generates comprehensive summary statistics for optimal binning results, including predictive power classification based on established IV thresholds (Siddiqi, 2006), aggregate metrics, and feature-level diagnostics.

Usage

# S3 method for class 'obwoe'
summary(object, sort_by = "iv", decreasing = TRUE, ...)

Arguments

object

An object of class "obwoe".

sort_by

Character string specifying the column to sort by. Options: "iv" (default), "n_bins", "feature".

decreasing

Logical. Sort in decreasing order? Default is TRUE for IV, FALSE for feature names.

...

Additional arguments (currently ignored).

Value

An S3 object of class "summary.obwoe" containing:

feature_summary

Data frame with per-feature statistics including IV classification (Unpredictive/Weak/Medium/Strong/Suspicious)

aggregate

Named list of aggregate statistics:

n_features

Total features processed

n_successful

Features without errors

n_errors

Features with errors

total_iv_sum

Sum of all feature IVs

mean_iv

Mean IV across features

median_iv

Median IV across features

mean_bins

Mean number of bins

iv_range

Min and max IV values

iv_distribution

Table of IV classification counts

target

Target column name

target_type

Target type (binary/multinomial)

Details

IV Classification Thresholds

Following Siddiqi (2006), features are classified by predictive power:

ClassificationIV Range
Unpredictive< 0.02
Weak0.02 - 0.10
Medium0.10 - 0.30
Strong0.30 - 0.50
Suspicious> 0.50

Features with IV > 0.50 should be examined for data leakage or overfitting, as such high values are rarely observed in practice.

References

Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons. doi:10.1002/9781119201731

See also

obwoe for the main binning function, print.obwoe, plot.obwoe.

Examples

# \donttest{
set.seed(42)
df <- data.frame(
  x1 = rnorm(500), x2 = rnorm(500), x3 = rnorm(500),
  target = rbinom(500, 1, 0.2)
)
result <- obwoe(df, target = "target")
summary(result)
#> Summary: Optimal Binning Weight of Evidence
#> ============================================
#> 
#> Target: target ( binary )
#> 
#> Aggregate Statistics:
#>   Features: 3 total, 3 successful, 0 errors
#>   Total IV: 0.0930
#>   Mean IV: 0.0310 (SD: 0.0423)
#>   Median IV: 0.0109
#>   IV Range: [0.0025, 0.0796]
#>   Mean Bins: 5.0
#> 
#> IV Classification (Siddiqi, 2006):
#>   Unpredictive: 2 features
#>   Weak        : 1 features
#> 
#> Feature Details:
#>  feature      type n_bins    total_iv     iv_class
#>       x3 numerical      5 0.079623601         Weak
#>       x1 numerical      5 0.010895822 Unpredictive
#>       x2 numerical      5 0.002463832 Unpredictive
# }