Skip to contents

Fits logistic regression models using an optimized C++ implementation via Rcpp.

Usage

oblr(formula, data, max_iter = 1000, tol = 1e-06)

Arguments

formula

An object of class formula describing the model to be fitted.

data

A data frame or data.table containing the model data.

max_iter

Maximum number of iterations for the optimization algorithm. Default is 1000.

tol

Convergence tolerance for the optimization algorithm. Default is 1e-6.

Value

An object of class oblr containing the results of the logistic regression fit, including:

coefficients

Vector of estimated coefficients.

se

Standard errors of the coefficients.

z_scores

Z-statistics for the coefficients.

p_values

P-values for the coefficients.

loglikelihood

Log-likelihood of the model.

convergence

Convergence indicator.

iterations

Number of iterations performed.

message

Convergence message.

data

List containing the design matrix X, response y, and the function call.

Details

The oblr function fits a logistic regression model using an optimized C++ implementation via Rcpp. This implementation is designed to be efficient, especially for large or sparse datasets.

The logistic regression model is defined as:

$$P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + ... + \beta_p X_p)}}$$

where \(\beta\) are the coefficients to be estimated.

The optimization method used is L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno), a variant of the BFGS method that uses a limited amount of memory. This method is particularly effective for optimization problems with many variables.

The estimation process involves the following steps:

  1. Data preparation: The design matrix X is created using sparse.model.matrix from the Matrix package, which is efficient for sparse data.

  2. Optimization: The C++ function fit_logistic_regression is called to perform the optimization using L-BFGS.

  3. Statistics calculation: Standard errors, z-statistics, and p-values are calculated using the Hessian matrix returned by the optimization function.

Convergence is determined by the relative change in the objective function (log-likelihood) between successive iterations, compared to the specified tolerance.

Examples

if (FALSE) { # \dontrun{
library(data.table)

# Create example data
set.seed(123)
n <- 10000
X1 <- rnorm(n)
X2 <- rnorm(n)
Y <- rbinom(n, 1, plogis(1 + 0.5 * X1 - 0.5 * X2))
dt <- data.table(Y, X1, X2)

# Fit logistic regression model
model <- oblr(Y ~ X1 + X2, data = dt, max_iter = 1000, tol = 1e-6)

# View results
print(model)
} # }