Validates and transforms raw data into the format required by brs.
The analyst can supply data in several ways:
- Minimal (Mode 1): only the score y. Censoring is inferred automatically: \(y = 0 \to \delta = 1\), \(y = K \to \delta = 2\), \(0 < y < K \to \delta = 3\), \(y \in (0, 1) \to \delta = 0\).
- Classic (Mode 2): y + explicit delta. The analyst declares the censoring type; interval endpoints are computed using the actual y value.
- Interval (Mode 3): left and/or right columns (on the original scale). Censoring is inferred from the NA pattern.
- Full (Mode 4): y, left, and right together. The analyst's own endpoints are rescaled directly to \((0, 1)\).
All covariate columns are preserved unchanged in the output.
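The Mode 1 rule can be illustrated with a minimal base R sketch. The helper name `classify_mode1` is hypothetical and not part of the package. The sketch also assumes that scores already inside \((0, 1)\) are checked before the interval rule, which the description above does not state explicitly.

```r
# Hypothetical helper (not part of the package): mimics the Mode 1 rule
# for a score vector y on a scale with K = ncuts categories.
classify_mode1 <- function(y, ncuts = 100L) {
  delta <- rep(3L, length(y))   # default: interval-censored
  delta[y > 0 & y < 1] <- 0L    # assumption: already on (0, 1), treated as exact
  delta[y == 0] <- 1L           # lower boundary: left-censored
  delta[y == ncuts] <- 2L       # upper boundary: right-censored
  delta
}

classify_mode1(c(0, 3, 5, 7, 10), ncuts = 10)
```

With ncuts = 10 this returns 1, 3, 3, 3, 2, which agrees with the delta column of the Mode 1 example in the Examples section.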
Usage
brs_prep(
  data,
  y = "y",
  delta = "delta",
  left = "left",
  right = "right",
  ncuts = 100L,
  lim = 0.5
)
Arguments
- data
Data frame containing the response variable and covariates.
- y
Character: name of the score column (default "y").
- delta
Character: name of the censoring indicator column (default "delta"). Values must be in {0, 1, 2, 3}.
- left
Character: name of the left-endpoint column (default "left").
- right
Character: name of the right-endpoint column (default "right").
- ncuts
Integer: number of scale categories (default 100).
- lim
Numeric: half-width of the uncertainty region (default 0.5). Used only when constructing intervals from y alone.
Value
A data.frame with the following columns appended or replaced:
- left
Lower endpoint on \((0, 1)\).
- right
Upper endpoint on \((0, 1)\).
- yt
Midpoint approximation on \((0, 1)\).
- y
Original scale value (preserved for reference).
- delta
Censoring indicator: 0 = exact, 1 = left, 2 = right, 3 = interval.
Covariate columns are preserved.
The output carries attributes "is_prepared" (TRUE),
"ncuts" and "lim" so that
brs can detect prepared data and skip the
internal brs_check call.
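The attribute mechanism can be inspected with base R alone. The sketch below builds a stand-in data frame by hand rather than calling brs_prep, and the helper `is_prepared` is hypothetical. It only illustrates how a downstream function could test for the "is_prepared" flag, not how brs actually does it.

```r
# Stand-in for the output of brs_prep(): a data frame tagged with the
# three attributes named above. The values are illustrative only.
prep <- data.frame(left = 0.25, right = 0.35, yt = 0.30, y = 3, delta = 3)
attr(prep, "is_prepared") <- TRUE
attr(prep, "ncuts") <- 10L
attr(prep, "lim") <- 0.5

# Hypothetical check: TRUE only when the flag is present and TRUE.
is_prepared <- function(x) isTRUE(attr(x, "is_prepared"))

is_prepared(prep)               # flag present
is_prepared(data.frame(y = 3))  # flag absent
attr(prep, "ncuts")             # stored scale size
```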
Details
Priority rule: if delta is provided (non-NA),
it takes precedence over all automatic classification rules.
When delta is NA, the function infers the censoring type
from the pattern of left, right, and y:
| left | right | y | delta | Interpretation | Inferred \(\delta\) |
| NA | 5 | NA | NA | Left-censored (below 5) | 1 |
| 20 | NA | NA | NA | Right-censored (above 20) | 2 |
| 30 | 45 | NA | NA | Interval-censored [30, 45] | 3 |
| NA | NA | 50 | NA | Exact observation | 0 |
| NA | NA | 50 | 3 | Analyst says interval | 3 |
| NA | NA | 0 | 1 | Analyst says left-censored | 1 |
| NA | NA | 99 | 2 | Analyst says right-censored | 2 |
When y, left, and right are all present for the
same observation, the analyst's left/right values are
used directly (rescaled by \(K =\) ncuts) and delta
is set to 3 (interval-censored) unless the analyst supplied
delta explicitly.
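The priority rule and the NA-pattern table can be sketched for a single observation as follows. The function `infer_delta` is a hypothetical illustration, not the package's code. It follows the table only, so a row with y alone is classified as exact, and the Mode 1 boundary rules are not applied here.

```r
# Hypothetical helper (not part of the package): infers delta for one
# observation from the NA pattern of left, right and y.
infer_delta <- function(left, right, y, delta = NA) {
  if (!is.na(delta)) return(as.integer(delta))  # analyst-supplied delta wins
  has_l <- !is.na(left)
  has_r <- !is.na(right)
  if (has_l && has_r) return(3L)  # both endpoints: interval-censored
  if (has_r) return(1L)           # only an upper bound: left-censored
  if (has_l) return(2L)           # only a lower bound: right-censored
  if (!is.na(y)) return(0L)       # only y: exact observation
  NA_integer_                     # nothing to classify
}

d3 <- data.frame(
  left  = c(NA, 20, 30, NA),
  right = c(5, NA, 45, NA),
  y     = c(NA, NA, NA, 50)
)
mapply(infer_delta, d3$left, d3$right, d3$y)
```

Applied to the first four rows of the table, the sketch gives 1, 2, 3, 0. When y, left, and right are all present it returns 3 unless delta is supplied, in line with the paragraph above.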
Endpoint formulas for Mode 2 (y + explicit delta):
When the analyst supplies delta explicitly, the endpoint
computation uses the actual y value to produce
observation-specific bounds. This is the same logic used by
brs_check with a user-supplied delta
vector:
| \(\delta\) | Condition | \(l_i\) (left) | \(u_i\) (right) |
| 0 | (any) | \(y / K\) | \(y / K\) |
| 1 | \(y = 0\) | \(\epsilon\) | \(\mathrm{lim} / K\) |
| 1 | \(y \neq 0\) | \(\epsilon\) | \((y + \mathrm{lim}) / K\) |
| 2 | \(y = K\) | \((K - \mathrm{lim}) / K\) | \(1 - \epsilon\) |
| 2 | \(y \neq K\) | \((y - \mathrm{lim}) / K\) | \(1 - \epsilon\) |
| 3 | type "m" | \((y - \mathrm{lim}) / K\) | \((y + \mathrm{lim}) / K\) |
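The table can be transcribed into a short base R sketch for one observation. The function `mode2_endpoints` is hypothetical. It relies on the fact that the boundary rows (\(y = 0\) for \(\delta = 1\), \(y = K\) for \(\delta = 2\)) are special cases of the general rows, it takes \(\epsilon = 10^{-5}\) from the clamping rule, and it does not model the type "m" condition or the final clamping step.

```r
# Hypothetical helper (not part of the package): Mode 2 endpoints for a
# single observation, transcribed from the table above (no clamping).
mode2_endpoints <- function(y, delta, ncuts = 100L, lim = 0.5, eps = 1e-5) {
  K <- ncuts
  switch(as.character(delta),
    "0" = c(left = y / K,         right = y / K),
    "1" = c(left = eps,           right = (y + lim) / K),
    "2" = c(left = (y - lim) / K, right = 1 - eps),
    "3" = c(left = (y - lim) / K, right = (y + lim) / K),
    stop("delta must be one of 0, 1, 2, 3")
  )
}

mode2_endpoints(3, delta = 3, ncuts = 100)  # interval around y = 3
mode2_endpoints(0, delta = 1, ncuts = 10)   # left-censored at the boundary
```

For y = 3 with \(\delta = 3\) and ncuts = 100 the endpoints are 0.025 and 0.035, matching the second row of the Mode 2 example output.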
Consistency warnings: when the analyst supplies delta
values that are unusual for the given y (e.g.,
\(\delta = 1\) but \(y \neq 0\)), the function emits a warning
but proceeds. This is by design for Monte Carlo workflows where
forced delta on non-boundary observations is intentional.
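A check of this kind might look like the sketch below. The function `check_delta` is hypothetical and is not the package's implementation. The \(\delta = 1\) condition comes from the text above, while the symmetric \(\delta = 2\), \(y \neq K\) condition is an assumption added for illustration.

```r
# Hypothetical check (not the package's implementation): warn when an
# explicit delta looks unusual for the observed y, then carry on.
check_delta <- function(y, delta, ncuts = 100L) {
  odd <- (delta == 1 & y != 0) |      # left-censored away from the lower boundary
    (delta == 2 & y != ncuts)         # assumption: right-censored away from K
  odd[is.na(odd)] <- FALSE            # rows with missing values are not flagged
  if (any(odd)) {
    warning(sum(odd), " observation(s) have a delta that is unusual for y")
  }
  invisible(odd)                      # flags are returned; nothing is altered
}

flags <- suppressWarnings(check_delta(y = c(0, 5), delta = c(1, 1)))
flags
```

Returning the flags instead of stopping mirrors the warn-but-proceed behaviour described above, so a simulation loop can continue with forced delta values.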
All endpoints are clamped to \([\epsilon, 1 - \epsilon]\) with \(\epsilon = 10^{-5}\).
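The clamping step amounts to a pmin/pmax pair in base R. The helper name `clamp_unit` is hypothetical.

```r
# Hypothetical helper: clamp rescaled endpoints to [eps, 1 - eps].
clamp_unit <- function(x, eps = 1e-5) pmin(pmax(x, eps), 1 - eps)

clamp_unit(c(0, 0.25, 1))
```

This is why an exact observation at y = 0 appears as 0.00001 rather than 0 in the Mode 2 example output.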
Examples
# --- Mode 1: y only (automatic classification, like brs_check) ---
d1 <- data.frame(y = c(0, 3, 5, 7, 10), x1 = rnorm(5))
brs_prep(d1, ncuts = 10)
#> brs_prep: n = 5 | exact = 0, left = 1, right = 1, interval = 3
#> left right yt y delta x1
#> 1 0.00001 0.05000 0.00001 0 1 -1.27114327
#> 2 0.25000 0.35000 0.30000 3 3 -0.38254183
#> 3 0.45000 0.55000 0.50000 5 3 0.64176604
#> 4 0.65000 0.75000 0.70000 7 3 0.80906191
#> 5 0.95000 0.99999 0.99999 10 2 0.07706487
# --- Mode 2: y + explicit delta ---
d2 <- data.frame(
  y = d1$y,
  delta = c(0, 3, 3, 3, 0), # Force interval-censoring for 3, 5, 7
  x1 = d1$x1
)
brs_prep(d2, ncuts = 100)
#> brs_prep: n = 5 | exact = 2, left = 0, right = 0, interval = 3
#> left right yt y delta x1
#> 1 0.00001 0.00001 1e-05 0 0 -1.27114327
#> 2 0.02500 0.03500 3e-02 3 3 -0.38254183
#> 3 0.04500 0.05500 5e-02 5 3 0.64176604
#> 4 0.06500 0.07500 7e-02 7 3 0.80906191
#> 5 0.10000 0.10000 1e-01 10 0 0.07706487
# --- Mode 3: left/right with NA patterns ---
d3 <- data.frame(
  left = c(NA, 20, 30, NA),
  right = c(5, NA, 45, NA),
  y = c(NA, NA, NA, 50),
  x1 = d1$x1[1:4]
)
brs_prep(d3, ncuts = 100)
#> brs_prep: n = 4 | exact = 1, left = 1, right = 1, interval = 1
#> left right yt y delta x1
#> 1 1e-05 0.05000 0.025 NA 1 -1.2711433
#> 2 2e-01 0.99999 0.600 NA 2 -0.3825418
#> 3 3e-01 0.45000 0.375 NA 3 0.6417660
#> 4 5e-01 0.50000 0.500 50 0 0.8090619
# --- Mode 4: y + left + right (analyst-supplied intervals) ---
d4 <- data.frame(
  y = c(50, 75),
  left = c(48, 73),
  right = c(52, 77),
  x1 = rnorm(2)
)
brs_prep(d4, ncuts = 100)
#> brs_prep: n = 2 | exact = 0, left = 0, right = 0, interval = 2
#> left right yt y delta x1
#> 1 0.48 0.52 0.50 50 3 -0.9877414
#> 2 0.73 0.77 0.75 75 3 -0.6175175
# --- Simulation Example ---
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim <- brs_sim(
  formula = ~ x1 + x2, data = dat,
  beta = c(0.2, -0.5, 0.3), phi = 1 / 5
)
prep <- brs_prep(sim, ncuts = 100)
#> brs_prep: n = 200 | exact = 0, left = 18, right = 21, interval = 161
fit <- brs(y ~ x1 + x2, data = prep)
summary(fit)
#>
#> Call:
#> brs(formula = y ~ x1 + x2, data = prep)
#>
#> Quantile residuals:
#> Min 1Q Median 3Q Max
#> -3.0625 -0.5896 0.2555 0.6723 1.5528
#>
#> Coefficients (mean model with logit link):
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -5.15677 0.08757 -58.886 < 2e-16 ***
#> x1 -0.38266 0.06216 -6.156 7.45e-10 ***
#> x2 0.12275 0.06556 1.872 0.0612 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Phi coefficients (precision model with logit link):
#> Estimate Std. Error z value Pr(>|z|)
#> (phi) -4.8810 0.1352 -36.11 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> ---
#> Log-likelihood: -896.1340 on 4 Df
#> Pseudo R-squared: 0.1097
#> Number of iterations: 29 (BFGS)
#> Censoring: 161 interval | 18 left | 21 right
#>
