Optimized Logistic Regression — oblr • OptimalBinningWoE

Fits logistic regression models using an optimized C++ implementation via Rcpp.

Usage

oblr(formula, data, max_iter = 1000, tol = 1e-06)

Arguments

formula: An object of class formula describing the model to be fitted.
data: A data frame or data.table containing the model data.
max_iter: Maximum number of iterations for the optimization algorithm. Default is 1000.
tol: Convergence tolerance for the optimization algorithm. Default is 1e-6.

Value

An object of class oblr containing the results of the logistic regression fit, including:

coefficients: Vector of estimated coefficients.
se: Standard errors of the coefficients.
z_scores: Z-statistics for the coefficients.
p_values: P-values for the coefficients.
loglikelihood: Log-likelihood of the model.
convergence: Convergence indicator.
iterations: Number of iterations performed.
message: Convergence message.
data: List containing the design matrix X, response y, and the function call.

Details

The oblr function is implemented in C++ and called from R through Rcpp. The implementation is designed to be efficient, especially for large or sparse datasets.

The logistic regression model is defined as:

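$$P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}}$$

where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_p$ are the coefficients of the predictors $X_1, \dots, X_p$; all of them are estimated from the data.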
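
As a small worked instance of the formula above, the snippet below evaluates the predicted probability for a single observation. The coefficient and predictor values are hypothetical and chosen only for illustration; plogis is the base R logistic function.

```r
# Hypothetical coefficients (not taken from any fitted model)
beta <- c(intercept = 1, X1 = 0.5, X2 = -0.5)

# One observation: the leading 1 multiplies the intercept, then X1 = 2, X2 = 0
x <- c(1, 2, 0)

# Linear predictor: beta_0 + beta_1 * X1 + beta_2 * X2
eta <- sum(beta * x)

# P(Y = 1 | X) computed directly from the formula and via plogis()
p_manual <- 1 / (1 + exp(-eta))
p_plogis <- plogis(eta)

print(c(eta = eta, p_manual = p_manual, p_plogis = p_plogis))
```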

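The optimization method used is L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno), a quasi-Newton method that approximates the BFGS update from a short history of recent gradients instead of storing a full dense approximation of the Hessian. This keeps memory use low and makes the method particularly effective for optimization problems with many variables.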
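
To illustrate the idea, the sketch below minimizes the negative log-likelihood of a logistic model with base R's optim and method = "L-BFGS-B". This is a stand-in for illustration only, not the package's C++ routine; the simulated data and the helper names neg_loglik and neg_grad are assumptions.

```r
set.seed(123)
n <- 2000
X <- cbind(1, rnorm(n), rnorm(n))   # intercept column plus two predictors
y <- rbinom(n, 1, plogis(drop(X %*% c(1, 0.5, -0.5))))

# Negative log-likelihood of the logistic model (hypothetical helper)
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}

# Its gradient: -X'(y - p), where p are the fitted probabilities
neg_grad <- function(beta) {
  p <- plogis(drop(X %*% beta))
  -drop(crossprod(X, y - p))
}

# L-BFGS-B with no bounds set behaves as a limited-memory BFGS method
fit <- optim(rep(0, ncol(X)), neg_loglik, neg_grad,
             method = "L-BFGS-B", control = list(maxit = 1000))

print(fit$par)          # estimated coefficients
print(fit$convergence)  # 0 indicates successful convergence in optim
```

The estimates should land close to the values used to simulate the data (1, 0.5, -0.5), up to sampling noise.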

The estimation process involves the following steps:

Data preparation: The design matrix X is created using sparse.model.matrix from the Matrix package, which is efficient for sparse data.
Optimization: The C++ function fit_logistic_regression is called to perform the optimization using L-BFGS.
Statistics calculation: Standard errors, z-statistics, and p-values are calculated using the Hessian matrix returned by the optimization function.
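
The statistics step can be sketched in plain R as follows. This is an illustration under assumptions, not the package's internal code: it uses the dense base model.matrix in place of sparse.model.matrix, takes coefficients from glm.fit as a stand-in for the C++ optimizer, and builds the Hessian of the negative log-likelihood analytically as X'WX with W = diag(p(1 - p)).

```r
set.seed(123)
n <- 2000
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
df$Y <- rbinom(n, 1, plogis(1 + 0.5 * df$X1 - 0.5 * df$X2))

# Step 1: design matrix (dense here; the package is described as using a sparse one)
X <- model.matrix(Y ~ X1 + X2, data = df)
y <- df$Y

# Step 2 (stand-in): obtain coefficient estimates from base R's glm.fit
beta <- glm.fit(X, y, family = binomial())$coefficients

# Step 3: Hessian of the negative log-likelihood at the estimate, X' W X
p <- plogis(drop(X %*% beta))
H <- crossprod(X, X * (p * (1 - p)))   # each row of X scaled by its weight

se <- sqrt(diag(solve(H)))             # standard errors from the inverse Hessian
z_scores <- beta / se                  # Wald z-statistics
p_values <- 2 * pnorm(-abs(z_scores))  # two-sided p-values

print(cbind(estimate = beta, se = se, z = z_scores, p_value = p_values))
```

The resulting table mirrors the coefficients, se, z_scores, and p_values components listed under Value.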

Convergence is determined by the relative change in the objective function (log-likelihood) between successive iterations, compared to the specified tolerance.
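
A relative-change test of this kind might look like the sketch below. The exact expression used inside the C++ code is not given here, so the function name has_converged and the form of the denominator are assumptions made for illustration.

```r
# Hypothetical helper: TRUE when the relative change in log-likelihood
# between two successive iterations falls below tol
has_converged <- function(ll_old, ll_new, tol = 1e-6) {
  # tol in the denominator guards against division by zero
  abs(ll_new - ll_old) / (abs(ll_old) + tol) < tol
}

# A change of 1 on a log-likelihood near -5000 is still too large
early <- has_converged(-5000, -4999)

# A change of 0.001 on the same scale is below the default tolerance
late <- has_converged(-4999.001, -4999.000)

print(c(early = early, late = late))
```

Because the test is relative, a smaller tol demands more iterations before the change in log-likelihood is considered negligible.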

Examples

if (FALSE) { # \dontrun{
library(data.table)

# Create example data
set.seed(123)
n <- 10000
X1 <- rnorm(n)
X2 <- rnorm(n)
Y <- rbinom(n, 1, plogis(1 + 0.5 * X1 - 0.5 * X2))
dt <- data.table(Y, X1, X2)

# Fit logistic regression model
model <- oblr(Y ~ X1 + X2, data = dt, max_iter = 1000, tol = 1e-6)

# View results
print(model)
} # }