Definition of Inverse Set and Introduction of the Estimation Method
The identification of domain sets whose outcomes belong to predefined subsets can address fundamental risk assessment challenges in climatology and medicine. A motivating example involves estimating geographical regions where average difference between summer and winter temperatures exceeds a certain benchmark, which helps policymakers focus on specific areas that are at higher risk for effects of climate change.
Mathematically, the target region corresponding to the inverse image of under an unknown function , can be defined as , with a pre-specified subset of a real line (e.g., ).
A point estimator for the inverse set can be constructed as , where is an empirical estimator of based on observations. To quantify the spatial uncertainty of this estimate, Sommerfeld et al. (2018) introduced the Coverage Probability Excursion (CoPE) sets, defined as: which satisfy: for a pre-specified confidence level (e.g., ).
Existing approaches require restrictive assumptions, including domain density of in , continuity of and near thresholds, and large-sample guarantees, which limit the applicability. Besides, the estimation and coverage depend on setting a fixed threshold level, which is difficult to determine.
Ren et al. (2024) proposed a framework that generalizes the estimation of such inverse sets to dense and non-dense domains with protection against inflated Type I error, and constructs multiple upper, lower or interval confidence regions of over arbitrary chosen thresholds. The coverage probability is achieved non-asymptotically and simultaneously through inverting simultaneous confidence intervals. For instance, suppose we are interested in inverse regions for a single value , the inverse confidence regions are constructed by inverting simultaneous confidence intervals (SCIs). Given SCI bounds and satisfying:
The inner and outer confidence regions (CRs) for the inverse upper
excursion set
are calculated as:
The outer and inner confidence regions for the inverse lower excursion set are calculated as:
The inner and outer CRs for the inverse interval set are calculated as:
By Theorem 1 from Ren et al. (2024), these CRs are valid for all . Therefore we name them as simultaneous confidence regions (SCRs).
Linear Function-on-Scalar Regression (FoSR)
A simple example for function-on-scalar regression model is
where:
- is a functional outcome
- is a scalar covariate
- Each
is a coefficient function
-
is a subject-specific functional random effect. This captures
correlation within subjects over time that is not captured by the mean.
This term is not always included in FoSR models, but it’s generally a
good idea because it gives better inference
- are normally distributed iid errors
Here, the same bases are used for and .
Accounting for Error Correlation
The fitted GAM model above didn’t take the correlation of residuals into account. Here, we combined random effects and FPCA into the GAM model to resolve this.
Estimate the FPCA for GAMM FPCA model
After obtaining residuals from the mean model, FPCA models the residuals using equation , where follows a mean zero Gaussian Process (GP) with covariance function , and are independent random errors.
Assuming that are the eigenfunctions of the covariance operator of , one can express:
The GAMM-FPCA model will be :
Construction of Simultaneous Confidence Bands for Functional Regression Model
We consider simultaneous confidence bands (SCBs) for a target function on a grid . For example, could be or . Given an estimator with pointwise standard error and a normalizing factor , we can define the simultaneous confidence band for as: And we can verify that these bands achieve simultaneous coverage by whenever the critical value satisfies
The is unknown in practice. In SCoRES, we implement two approaches for estimating the for functional parameter functions in FoSR model: the Correlation and Multiplicity-Adjusted (CMA) bands from Crainiceanu et al. (2024) based on parameter simulations, and a multiplier bootstrap procedure from Telschow & Schwartzman (2022). We detail the CMA procedure below:
Correlation and Multiplicity Adjusted (CMA) Confidence Bands Based on Parameter Simulations
Simulate model parameters , where are estimated from a fitted FoSR model.
For each , compute where the division is element-wise and maps parameters to functional effects.
Define where the absolute value is taken element-wise.
Estimate as the quantile of .
Multiplier-t Bootstrap Procedure for Constructing Confidence Bands
The second is the multiplier-t bootstrap procedure for constructing confidence bands. A full introduction to the method can be found in previous work from .
Compute residuals , where , and multipliers with and , where is the fitted mean value from a FoSR model.
Estimate from .
Compute
Repeat steps 1 to 3 many times to approximate the condition law . Take the quantile of to estimate .
At step 2, we allow three choices for the multiplier distribution , each with mean zero and unit variance:
rademacher: with equal probability;gaussian: ;mammen: a two–point distribution with mean and variance from .
Unless stated otherwise, we use the rademacher
multipliers.
At step 3, we consider two alternatives for the pointwise standard errors :
regular,t, where expectations are taken over bootstrap replicates and denotes the perturbed sample at iteration . The absolute value improves numerical stability when subtracting nearly equal quantities.
Unless noted otherwise, we report results using the t
estimator.
Construction of Simultaneous Confidence Bands for Linear/Logistic Regression Model
We next describe the bootstrap algorithm used to construct simultaneous confidence intervals (SCIs) for the mean outcome of regression on a fixed test design matrix. We also use the same procedures for the construction of SCIs for the regression coefficients. For details of this algorithm, please refer to .
Suppose we have training data outcome , design matrix , and a fixed test design matrix . Let denote the fitted regression function. The algorithm proceeds as follows.
First, we estimate on the training data using least squares. Then we compute the estimated mean outcome on the test design matrix , together with its standard deviation .
For bootstrap samples , repeat:
Resample with replacement from the training data.
Fit the model on the resampled data to obtain .
Compute the estimated mean and its pointwise standard deviation .
Calculate the standardized absolute residuals on the test design grid,
Record the maximum value of as .
After bootstrap iterations, take the quantile of the empirical distribution of as the threshold . The SCI on the test design matrix is then given by For logistic regression, the SCI is further transformed back to the data scale using the link function.
Construction of SCB for Spatial Generalized Least Squares Model
We also provide tools in SCoRES for estimating SCB of geographic data, however, their use is currently limited to the spatial generalized least squares model.
Let the spatial domain be sampled on spots with coordinates . At each spot , we observe outcomes with design matrix . We fit a spot-specific generalized least squares (GLS) model
where the variance-covariance matrix within the spot encodes the prespecified correlation structure .
SCoRES allows users to directly input the matrix for each spot , or specify the correlation structure .
Assume we are interested in the linear functional , with as a weight vector for the specific linear combination. The pointwise estimated standard error is
To quantify uncertainty uniformly across spots, we construct a simultaneous confidence band (SCB) for on the grid :
where satisfies
We estimate by multiplier- bootstrap method introduced before.