
Implements a greedy binning algorithm with quantile-based pre-binning and monotonicity enforcement. Important Note: Despite "Optimal Supervised Learning Partitioning" and "LP" in the name, the algorithm uses greedy heuristics without formal Linear Programming or convex optimization. The method is functionally equivalent to ob_numerical_mrblp with minor differences in pre-binning strategy and bin reduction criteria.

Users seeking true optimization-based binning should consider Mixed-Integer Programming (MIP) implementations (e.g., via the ompr or lpSolve packages), though these scale poorly for N > 10,000 observations.

Usage

ob_numerical_oslp(
  feature,
  target,
  min_bins = 3,
  max_bins = 5,
  bin_cutoff = 0.05,
  max_n_prebins = 20,
  convergence_threshold = 1e-06,
  max_iterations = 1000,
  laplace_smoothing = 0.5
)

Arguments

feature

Numeric vector of feature values. Missing values (NA) and infinite values are not permitted and will trigger an error.

target

Integer or numeric vector of binary target values (must contain only 0 and 1). Must have the same length as feature. Unlike other binning methods, OSLP internally uses double for target, allowing implicit conversion from integer.

min_bins

Minimum number of bins (default: 3). Must be at least 2.

max_bins

Maximum number of bins (default: 5). Must be greater than or equal to min_bins.

bin_cutoff

Minimum fraction of total observations per bin (default: 0.05). Must be in (0, 1).

max_n_prebins

Maximum number of pre-bins (default: 20). Must be at least equal to min_bins.

convergence_threshold

Convergence threshold for IV change (default: 1e-6).

max_iterations

Maximum iterations (default: 1000).

laplace_smoothing

Laplace smoothing parameter (default: 0.5). Must be non-negative.

Value

A list containing:

id

Integer bin identifiers (1-based).

bin

Character bin intervals "[lower;upper)".

woe

Numeric WoE values (guaranteed monotonic).

iv

Numeric IV contributions per bin.

count

Integer total observations per bin.

count_pos

Integer positive class counts.

count_neg

Integer negative class counts.

event_rate

Numeric event rates.

cutpoints

Numeric bin boundaries (excluding ±Inf).

total_iv

Total Information Value.

converged

Logical convergence flag.

iterations

Integer iteration count.

Details

Algorithm Overview

OSLP executes in five phases:

Phase 1: Quantile-Based Pre-binning

Unlike equal-frequency methods that ensure balanced bin sizes, OSLP places cutpoints at quantiles of unique feature values:

$$\text{edge}_i = \text{unique\_vals}\left[\left\lfloor p_i \times (n_{\text{unique}} - 1) \right\rfloor\right]$$

where \(p_i = i / \text{max\_n\_prebins}\).

Critique: Because quantiles are taken over unique values rather than observations, heavily tied data (many observations sharing a few distinct values) can produce bins of vastly different sizes, undermining the statistical stability that equal-frequency pre-binning provides.
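A minimal R sketch of the pre-binning rule above; `prebin_edges` is a hypothetical helper (not a package function), and the ±Inf outer edges are an assumption consistent with the cutpoints field described under Value:

# Illustrative sketch of quantile-based pre-binning over unique values;
# `prebin_edges` is a hypothetical helper, not part of the package.
prebin_edges <- function(feature, max_n_prebins = 20) {
  u <- sort(unique(feature))
  p <- seq_len(max_n_prebins - 1) / max_n_prebins
  idx <- floor(p * (length(u) - 1)) + 1  # +1: the formula is 0-based, R is 1-based
  c(-Inf, unique(u[idx]), Inf)           # outer edges assumed to be +/-Inf
}

set.seed(42)
prebin_edges(rnorm(1000), max_n_prebins = 5)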

Phase 2: Rare Bin Merging

Bins with \(n_i / N < \text{bin\_cutoff}\) are merged. The merge direction minimizes IV loss:

$$\Delta \text{IV} = \text{IV}_i + \text{IV}_{i+d} - \text{IV}_{\text{merged}}$$

where \(d \in \{-1, +1\}\) (left or right neighbor).
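The sketch below illustrates how \(\Delta \text{IV}\) can be evaluated for each neighbor; `iv_of` and `merge_loss` are hypothetical helpers, not the package's internal code, and the per-bin smoothing is simplified (the package uses \(k\) = number of bins):

# Hypothetical helpers illustrating the Delta-IV merge criterion.
iv_of <- function(pos, neg, tot_pos, tot_neg, alpha = 0.5, k = 1) {
  dp <- (pos + alpha) / (tot_pos + k * alpha)
  dn <- (neg + alpha) / (tot_neg + k * alpha)
  (dp - dn) * log(dp / dn)
}
merge_loss <- function(count_pos, count_neg, i, d) {
  tp <- sum(count_pos); tn <- sum(count_neg)
  j <- i + d  # d = -1 (left neighbor) or +1 (right neighbor)
  iv_of(count_pos[i], count_neg[i], tp, tn) +
    iv_of(count_pos[j], count_neg[j], tp, tn) -
    iv_of(count_pos[i] + count_pos[j], count_neg[i] + count_neg[j], tp, tn)
}
# Pick the neighbor with the smaller IV loss for rare bin i = 2
cp <- c(120, 8, 60, 15); cn <- c(380, 90, 740, 485)
if (merge_loss(cp, cn, 2, -1) <= merge_loss(cp, cn, 2, +1)) "left" else "right"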

Phase 3: Initial WoE/IV Calculation

Standard WoE with Laplace smoothing:

$$\text{WoE}_i = \ln\left(\frac{n_i^{+} + \alpha}{n^{+} + k\alpha} \bigg/ \frac{n_i^{-} + \alpha}{n^{-} + k\alpha}\right)$$

where \(n_i^{\pm}\) are the class counts in bin \(i\), \(n^{\pm}\) the total class counts, \(\alpha\) the laplace_smoothing parameter, and \(k\) the current number of bins.
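A minimal sketch of this calculation; `woe_iv` is an illustrative name, not an exported function:

# Smoothed WoE/IV per bin; alpha corresponds to laplace_smoothing.
woe_iv <- function(count_pos, count_neg, alpha = 0.5) {
  k  <- length(count_pos)
  dp <- (count_pos + alpha) / (sum(count_pos) + k * alpha)
  dn <- (count_neg + alpha) / (sum(count_neg) + k * alpha)
  woe <- log(dp / dn)
  list(woe = woe, iv = (dp - dn) * woe, total_iv = sum((dp - dn) * woe))
}

woe_iv(count_pos = c(120, 60, 15), count_neg = c(380, 740, 485))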

Phase 4: Monotonicity Enforcement

Direction determined via majority vote (identical to MRBLP):

$$\text{increasing} = \begin{cases} \text{TRUE} & \text{if } \sum_i \mathbb{1}_{\{\text{WoE}_i > \text{WoE}_{i-1}\}} \ge \sum_i \mathbb{1}_{\{\text{WoE}_i < \text{WoE}_{i-1}\}} \\ \text{FALSE} & \text{otherwise} \end{cases}$$

Violations are merged iteratively.
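The majority-vote rule reduces to comparing the signs of successive WoE differences, as in this sketch (`direction_increasing` is an illustrative name):

# Majority vote over adjacent WoE differences
direction_increasing <- function(woe) {
  d <- diff(woe)
  sum(d > 0) >= sum(d < 0)
}
direction_increasing(c(0.80, 0.51, -0.29, -1.27, -1.39))  # FALSE: trend is decreasing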

Phase 5: Bin Count Reduction

If \(k > \text{max\_bins}\), merge bins with the smallest combined IV:

$$\text{merge\_idx} = \arg\min_{i=0}^{k-2} \left( \text{IV}_i + \text{IV}_{i+1} \right)$$

Rationale: Assumes bins with low total IV contribute least to predictive power. However, this ignores the interaction between bins; a low-IV bin may be essential for monotonicity or preventing gaps.
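Translated to R's 1-based indexing, the merge-index rule is a one-liner; the IV vector below is illustrative data:

# Merge the adjacent pair with the smallest combined IV
iv <- c(0.15, 0.02, 0.03, 0.10, 0.07)
merge_idx <- which.min(head(iv, -1) + tail(iv, -1))
merge_idx  # 2: merge bins 2 and 3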

Theoretical Foundations

Despite the name "Optimal Supervised Learning Partitioning", the algorithm lacks:

  • Global optimality guarantees: Greedy merging is myopic

  • Formal loss function: No explicit objective being minimized

  • LP formulation: No constraint matrix, simplex solver, or dual variables

A true optimal partitioning approach would formulate the problem as:

$$\min_{\mathbf{z}, \mathbf{b}} \left\{ -\sum_{i=1}^{k} \text{IV}_i(\mathbf{b}) + \lambda k \right\}$$

subject to:

$$\sum_{j=1}^{k} z_{ij} = 1 \quad \forall i \in \{1, \ldots, N\}$$

$$\text{WoE}_j \le \text{WoE}_{j+1} \quad \forall j$$

$$z_{ij} \in \{0, 1\}, \quad b_j \in \mathbb{R}$$

where \(z_{ij}\) indicates observation \(i\) assigned to bin \(j\), and \(\lambda\) is a complexity penalty. This requires MILP solvers (CPLEX, Gurobi) and is intractable for \(N > 10^4\).
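For illustration only, a skeleton of such a model using the ompr package mentioned above; this is not the package's method, and only the linear assignment constraint is written out, since the IV objective and the WoE monotonicity constraint are nonlinear in \(z\) and would need linearization before any MILP solver could handle them:

# Hypothetical ompr skeleton; NOT this package's method. The true objective
# -sum(IV_j) + lambda * k and the WoE monotonicity constraint are nonlinear
# and omitted here.
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)

N <- 50  # observations (toy size; intractable in practice for N > 1e4)
k <- 3   # bins
model <- MIPModel() |>
  add_variable(z[i, j], i = 1:N, j = 1:k, type = "binary") |>
  # each observation is assigned to exactly one bin
  add_constraint(sum_over(z[i, j], j = 1:k) == 1, i = 1:N) |>
  # placeholder linear objective standing in for the nonlinear IV objective
  set_objective(sum_over(z[i, 1], i = 1:N), sense = "min")

result <- solve_model(model, with_ROI(solver = "glpk"))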

Comparison with Related Methods

Method   Pre-binning              Direction        Merge (max_bins)         Target Type
OSLP     Quantile (unique vals)   Majority vote    Min (IV(i) + IV(i+1))    double
MRBLP    Equal-frequency          Majority vote    Min |IV(i) - IV(i+1)|    int
MOB      Equal-frequency          First two bins   Min IV loss              int
MBLP     Quantile (data)          Correlation      Min IV loss              int

When to Use OSLP

  • Use OSLP: Never. Use MBLP or MOB instead for better pre-binning and merge strategies.

  • Use MBLP: For robust direction detection via correlation.

  • Use MDLP: For information-theoretic stopping criteria.

  • Use True LP: For small datasets (N < 1000) where global optimality is critical and computational cost is acceptable.

References

  • Mironchyk, P., & Tchistiakov, V. (2017). "Monotone optimal binning algorithm for credit risk modeling". Frontiers in Applied Mathematics and Statistics, 3, 2.

  • Zeng, G. (2014). "A Necessary Condition for a Good Binning Algorithm in Credit Scoring". Applied Mathematical Sciences, 8(65), 3229-3242.

  • Fayyad, U. M., & Irani, K. B. (1993). "Multi-Interval Discretization of Continuous-Valued Attributes". IJCAI, pp. 1022-1027.

  • Good, I. J. (1952). "Rational Decisions". Journal of the Royal Statistical Society B, 14(1), 107-114.

  • Siddiqi, N. (2006). Credit Risk Scorecards. Wiley.

See also

ob_numerical_mrblp for nearly identical algorithm with better pre-binning, ob_numerical_mblp for correlation-based direction detection, ob_numerical_mdlp for information-theoretic optimality.

Author

Lopes, J. E.

Examples

# \donttest{
set.seed(123)
n <- 5000
feature <- c(
  rnorm(2000, 600, 50),
  rnorm(2000, 680, 40),
  rnorm(1000, 740, 30)
)
target <- c(
  rbinom(2000, 1, 0.25),
  rbinom(2000, 1, 0.10),
  rbinom(1000, 1, 0.03)
)

result <- ob_numerical_oslp(
  feature = feature,
  target = target,
  min_bins = 3,
  max_bins = 5
)

print(result$woe)
#> [1]  0.8025010  0.5114407 -0.2910587 -1.2743136 -1.3945886
print(result$total_iv)
#> [1] 0.3691969

# Compare with MRBLP (should be nearly identical)
result_mrblp <- ob_numerical_mrblp(
  feature = feature,
  target = target,
  min_bins = 3,
  max_bins = 5
)

data.frame(
  Method = c("OSLP", "MRBLP"),
  Total_IV = c(result$total_iv, result_mrblp$total_iv),
  N_Bins = c(length(result$woe), length(result_mrblp$woe))
)
#>   Method  Total_IV N_Bins
#> 1   OSLP 0.3691969      5
#> 2  MRBLP 0.3847959      5
# }