
Biometrika Current Issue

Published: Tue, 08 Aug 2017 00:00:00 GMT

Last Build Date: Tue, 22 Aug 2017 09:47:26 GMT


Uncertainty quantification under group sparsity


Quantifying the uncertainty in penalized regression under group sparsity is an important open question. We establish, under a high-dimensional scaling, the asymptotic validity of a modified parametric bootstrap method for the group lasso, assuming a Gaussian error model and mild conditions on the design matrix and the true coefficients. Simulation of bootstrap samples provides simultaneous inferences on large groups of coefficients. Through extensive numerical comparisons, we demonstrate that our bootstrap method performs much better than popular competitors, highlighting its practical utility. The theoretical results generalize to other block norm penalization and sub-Gaussian errors, which further broadens the potential applications.
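As an illustration of the general idea (not the paper's modified bootstrap), the sketch below pairs a proximal-gradient group lasso with a naive Gaussian parametric bootstrap: refit the estimator on responses simulated from the fitted model and collect the replicates. The function names, the plug-in residual estimate of the noise level, and the tuning parameter `lam` are all assumptions made for the example.

```python
import numpy as np

def group_soft_threshold(v, t):
    # block soft-thresholding: shrink the whole group toward zero together
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - Xb||^2 / n + lam * sum_g ||b_g||_2."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1/L for the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        for g in groups:                           # groups partition {0,...,p-1}
            b[g] = group_soft_threshold(z[g], step * lam)
    return b

def parametric_bootstrap_group_lasso(X, y, groups, lam, B=200, seed=0):
    """Naive Gaussian parametric bootstrap: simulate y* from the fitted model
    and refit; returns the point estimate and the B bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    b_hat = group_lasso(X, y, groups, lam)
    sigma_hat = np.std(y - X @ b_hat, ddof=1)      # crude plug-in noise scale
    boot = np.empty((B, X.shape[1]))
    for i in range(B):
        y_star = X @ b_hat + rng.normal(0.0, sigma_hat, n)
        boot[i] = group_lasso(X, y_star, groups, lam)
    return b_hat, boot
```

Simultaneous group-wise intervals would then be read off from quantiles of the bootstrap replicates; the paper's modification and its high-dimensional guarantees are beyond this sketch.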

On the Pitman–Yor process with spike and slab base measure


For the most popular discrete nonparametric models, beyond the Dirichlet process, the prior guess at the shape of the data-generating distribution, also known as the base measure, is assumed to be diffuse. Such a specification greatly simplifies the derivation of analytical results, allowing for a straightforward implementation of Bayesian nonparametric inferential procedures. However, in several applied problems the available prior information leads naturally to the incorporation of an atom into the base measure, and then the Dirichlet process is essentially the only tractable choice for the prior. In this paper we fill this gap by considering the Pitman–Yor process with an atom in its base measure. We derive computable expressions for the distribution of the induced random partitions and for the predictive distributions. These findings allow us to devise an effective generalized Pólya urn Gibbs sampler. Applications to density estimation, clustering and curve estimation, with both simulated and real data, serve as an illustration of our results and allow comparisons with existing methodology. In particular, we tackle a functional data analysis problem concerning basal body temperature curves.
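For orientation, the sketch below simulates the standard Pitman–Yor generalized Pólya urn with a diffuse base measure, i.e. the baseline that the paper extends to a spike-and-slab base measure (the atom is not handled here). A customer joins an occupied table with probability proportional to its size minus the discount $d$, or opens a new table with probability proportional to $\theta + dK$.

```python
import numpy as np

def pitman_yor_partition(n, theta, d, rng):
    """Sequential urn sampling of a Pitman-Yor random partition:
    at step i, table j is chosen w.p. (n_j - d)/(theta + i),
    a new table w.p. (theta + d*K)/(theta + i), where K = #tables."""
    counts = []   # table sizes
    labels = []   # table assignment of each customer
    for i in range(n):
        K = len(counts)
        probs = np.array([c - d for c in counts] + [theta + d * K]) / (theta + i)
        probs = probs / probs.sum()   # guard against float round-off
        j = rng.choice(K + 1, p=probs)
        if j == K:
            counts.append(1)          # open a new table
        else:
            counts[j] += 1
        labels.append(j)
    return labels, counts
```

With $d = 0$ this reduces to the Dirichlet-process Chinese restaurant process; the paper's Gibbs sampler additionally tracks whether a table is allocated to the atom or to the diffuse component.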

Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data


Interval-censored multivariate failure time data arise when there are multiple types of failure or there is clustering of study subjects and each failure time is known only to lie in a certain interval. We investigate the effects of possibly time-dependent covariates on multivariate failure times by considering a broad class of semiparametric transformation models with random effects, and we study nonparametric maximum likelihood estimation under general interval-censoring schemes. We show that the proposed estimators for the finite-dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we develop an EM algorithm that converges stably for arbitrary datasets. Finally, we assess the performance of the proposed methods in extensive simulation studies and illustrate their application using data derived from the Atherosclerosis Risk in Communities Study.

Robust reduced-rank regression


In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. The problem is formulated as a regularized multivariate regression with a sparse mean-shift parameterization, which generalizes and unifies some popular robust multivariate methods. An efficient thresholding-based iterative procedure is developed for optimization. We show that the algorithm is guaranteed to converge and that the coordinatewise minimum point produced is statistically accurate under regularity conditions. Our theoretical investigations focus on non-asymptotic robust analysis, demonstrating that joint rank reduction and outlier detection leads to improved prediction accuracy. In particular, we show that redescending ψ-functions can essentially attain the minimax optimal error rate, and in some less challenging problems convex regularization guarantees the same low error rate. The performance of the proposed method is examined through simulation studies and real-data examples.
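A much-simplified alternating sketch in the spirit of the mean-shift formulation: fit a reduced-rank regression to the cleaned responses, then row-wise soft-threshold the residuals to update the sparse outlier matrix $S$. This is an illustrative toy, not the paper's algorithm (no ψ-function machinery, no convergence theory); the threshold `lam` and function names are assumptions.

```python
import numpy as np

def robust_rrr(X, Y, rank, lam, n_iter=50):
    """Alternate (i) a rank-constrained least-squares fit on Y - S with
    (ii) row-wise soft-thresholding of the residuals to update the
    sparse mean-shift outlier matrix S."""
    pinv = np.linalg.pinv(X)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        fitted = X @ (pinv @ (Y - S))                  # OLS fit on cleaned response
        U, s, Vt = np.linalg.svd(fitted, full_matrices=False)
        fit_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        R = Y - fit_r
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # group (row-wise) soft threshold: entire outlying rows survive
        S = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
    B = pinv @ fit_r                                   # reduced-rank coefficients
    return B, S
```

Rows of $S$ with nonzero norm flag the detected outliers; clean rows are shrunk exactly to zero by the group threshold.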

Identification and estimation of causal effects with outcomes truncated by death


It is common in medical studies that the outcome of interest is truncated by death, meaning that a subject has died before the outcome could be measured. In this case, restricted analysis among survivors may be subject to selection bias. Hence, it is of interest to estimate the survivor average causal effect, defined as the average causal effect among the subgroup consisting of subjects who would survive under either exposure. In this paper, we consider the identification and estimation problems of the survivor average causal effect. We propose to use a substitution variable in place of the latent membership in the always-survivor group. The identification conditions required for a substitution variable are conceptually similar to conditions for a conditional instrumental variable, and may apply to both randomized and observational studies. We show that the survivor average causal effect is identifiable with use of such a substitution variable, and propose novel model parameterizations for estimation of the survivor average causal effect under our identification assumptions. Our approaches are illustrated via simulation studies and a data analysis.

Maximum empirical likelihood estimation for abundance in a closed population from capture-recapture data


Capture-recapture experiments are widely used to collect data needed for estimating the abundance of a closed population. To account for heterogeneity in the capture probabilities, Huggins (1989) and Alho (1990) proposed a semiparametric model in which the capture probabilities are modelled parametrically and the distribution of individual characteristics is left unspecified. A conditional likelihood method was then proposed to obtain point estimates and Wald-type confidence intervals for the abundance. Empirical studies show that the small-sample distribution of the maximum conditional likelihood estimator is strongly skewed to the right, which may produce Wald-type confidence intervals with lower limits that are less than the number of captured individuals or even are negative. In this paper, we propose a full empirical likelihood approach based on Huggins and Alho’s model. We show that the null distribution of the empirical likelihood ratio for the abundance is asymptotically chi-squared with one degree of freedom, and that the maximum empirical likelihood estimator achieves semiparametric efficiency. Simulation studies show that the empirical likelihood-based method is superior to the conditional likelihood-based method: its confidence interval has much better coverage, and the maximum empirical likelihood estimator has a smaller mean square error. We analyse three datasets to illustrate the advantages of our empirical likelihood approach.
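To fix ideas, the sketch below implements the conditional-likelihood baseline the paper improves upon, in the simplest homogeneous case: a common per-occasion capture probability $p$, estimated from the capture counts of the individuals ever caught, followed by a Horvitz–Thompson-type abundance estimate $\hat N = n / \hat\pi$ with $\hat\pi = 1 - (1-\hat p)^T$. The empirical-likelihood construction itself is not attempted here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conditional_mle_abundance(k, T):
    """Conditional MLE in the homogeneous-capture case: k holds the capture
    counts (out of T occasions) of the n individuals caught at least once;
    the likelihood is a zero-truncated binomial."""
    k = np.asarray(k, dtype=float)
    n = len(k)
    def neg_loglik(p):
        pi = 1.0 - (1.0 - p) ** T   # P(captured at least once)
        return -(np.sum(k * np.log(p) + (T - k) * np.log(1.0 - p)) - n * np.log(pi))
    p_hat = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded").x
    pi_hat = 1.0 - (1.0 - p_hat) ** T
    return n / pi_hat, p_hat        # Horvitz-Thompson-type abundance estimate
```

The skewness the abstract warns about shows up in the sampling distribution of $\hat N$; the paper's empirical-likelihood interval avoids the resulting Wald-interval pathologies.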

Interleaved lattice-based minimax distance designs


We propose a new method for constructing minimax distance designs, which are useful for computer experiments. To circumvent computational difficulties, we consider designs with an interleaved lattice structure, a newly defined class of lattice that has repeated or alternated layers based on any single dimension. Such designs have boundary adaptation and low-thickness properties. From our numerical results, the proposed designs are by far the best minimax distance designs for moderate or large samples.

Simple, scalable and accurate posterior interval estimation


Standard posterior sampling algorithms, such as Markov chain Monte Carlo procedures, face major challenges in scaling up to massive datasets. We propose a simple and general posterior interval estimation algorithm to rapidly and accurately estimate quantiles of the posterior distributions for one-dimensional functionals. Our algorithm runs Markov chain Monte Carlo in parallel for subsets of the data, and then averages quantiles estimated from each subset. We provide strong theoretical guarantees and show that the credible intervals from our algorithm asymptotically approximate those from the full posterior in the leading parametric order. Our algorithm has a better balance of accuracy and efficiency than its competitors across a variety of simulations and a real-data example.
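The subset-then-average recipe can be sketched in a conjugate toy problem where direct sampling stands in for the per-subset MCMC runs: split the data into $k$ subsets, raise each subset likelihood to the $k$-th power so each subset posterior has roughly the right spread, draw from each, and average the estimated quantiles. The normal-mean model and all names below are assumptions made for the example.

```python
import numpy as np

def averaged_quantile_intervals(data, k, alpha=0.05, n_draws=2000, seed=0):
    """Sketch: credible interval for a normal mean (known unit variance) by
    averaging subset-posterior quantiles. With a flat prior, raising the
    subset likelihood to the k-th power gives posterior N(mean(s), 1/(k*m))."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(np.asarray(data), k)
    qs = []
    for s in subsets:
        m = len(s)
        draws = rng.normal(s.mean(), np.sqrt(1.0 / (k * m)), n_draws)
        qs.append(np.quantile(draws, [alpha / 2, 1 - alpha / 2]))
    return np.mean(qs, axis=0)   # average the subset quantiles endpoint-wise
```

In practice each subset would be handled by an independent MCMC run on its own machine, which is where the scalability comes from; only the quantiles need to be communicated.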

Pseudo-marginal Metropolis–Hastings sampling using averages of unbiased estimators


We consider a pseudo-marginal Metropolis–Hastings kernel ${\mathbb{P}}_m$ that is constructed using an average of $m$ exchangeable random variables, and an analogous kernel ${\mathbb{P}}_s$ that averages $s < m$ of these same random variables.
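A minimal pseudo-marginal sampler illustrating the construction of such a kernel: the intractable likelihood at each point is replaced by the average of $m$ unbiased estimates, and the current estimate is recycled in the acceptance ratio. The toy target and all names are assumptions; the paper's comparison of ${\mathbb{P}}_m$ and ${\mathbb{P}}_s$ is not reproduced.

```python
import numpy as np

def pseudo_marginal_mh(log_prior, unbiased_lik, m, x0, n_steps, prop_sd, rng):
    """Pseudo-marginal MH: accept/reject uses an m-average of unbiased,
    positive likelihood estimates in place of the exact likelihood; the
    estimate at the current state is stored and reused (never refreshed)."""
    x = x0
    lik = np.mean([unbiased_lik(x, rng) for _ in range(m)])
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.normal(0.0, prop_sd)
        lik_new = np.mean([unbiased_lik(x_new, rng) for _ in range(m)])
        log_ratio = (log_prior(x_new) + np.log(lik_new)) \
                    - (log_prior(x) + np.log(lik))
        if np.log(rng.uniform()) < log_ratio:
            x, lik = x_new, lik_new
        chain.append(x)
    return np.array(chain)
```

Because the estimator is unbiased and positive, the chain targets the exact posterior despite the noise; larger $m$ trades extra per-step cost for less sticking.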

Expandable factor analysis


Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor models when the number of factors is unknown. The method relies on a continuous shrinkage prior for efficient maximum a posteriori estimation of a low-rank and sparse loadings matrix. The structure of the prior leads to an estimation algorithm that accommodates uncertainty in the number of factors. We propose an information criterion to select the hyperparameters of the prior. Expandable factor analysis has better false discovery rates and true positive rates than its competitors across diverse simulation settings. We apply the proposed approach to a gene expression study of ageing in mice, demonstrating superior results relative to four competing methods.

Multiple robustness in factorized likelihood models


We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.

Weighted envelope estimation to handle variability in model selection


Envelope methodology can provide substantial efficiency gains in multivariate statistical problems, but in some applications the estimation of the envelope dimension can induce selection volatility that may mitigate those gains. Current envelope methodology does not account for the added variance that can result from this selection. In this article, we circumvent dimension selection volatility through the development of a weighted envelope estimator. Theoretical justification is given for our estimator, and the validity of the residual bootstrap for estimating its asymptotic variance is established. A simulation study and real-data analysis illustrate the utility of our weighted envelope estimator.

Non-strange weird resampling for complex survival data


This paper introduces a new data-dependent multiplier bootstrap for nonparametric analysis of survival data, possibly subject to competing risks. The new procedure includes the general wild bootstrap and the weird bootstrap as special cases. The data may be subject to independent right-censoring and left-truncation. The asymptotic correctness of the proposed resampling procedure is proven under standard assumptions. Simulation results on time-simultaneous inference suggest that the weird bootstrap performs better than the standard normal multiplier approach.

Optimal Bayes classifiers for functional data and density ratios


Bayes classifiers for functional data pose a challenge. One difficulty is that probability density functions do not exist for functional data, so the classical Bayes classifier using density quotients needs to be modified. We propose to use density ratios of projections onto a sequence of eigenfunctions that are common to the groups to be classified. The density ratios are then factorized into density ratios of individual projection scores, reducing the classification problem to obtaining a series of one-dimensional nonparametric density estimates. The proposed classifiers can be viewed as an extension to functional data of some of the earliest nonparametric Bayes classifiers that were based on simple density ratios in the one-dimensional case. By means of the factorization of the density quotients, the curse of dimensionality that would otherwise severely affect Bayes classifiers for functional data can be avoided. We demonstrate that in the case of Gaussian functional data, the proposed functional Bayes classifier reduces to a functional version of the classical quadratic discriminant. A study of the asymptotic behaviour of the proposed classifiers in the large-sample limit shows that under certain conditions the misclassification rate converges to zero, a phenomenon that has been referred to as perfect classification. The proposed classifiers also perform favourably in finite-sample settings, as we demonstrate through comparisons with other functional classifiers in simulations and various data applications, including spectral data, functional magnetic resonance imaging data from attention deficit hyperactivity disorder patients, and yeast gene expression data.
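A stripped-down sketch of the projection-score idea on discretized curves: pooled PCA supplies the common eigenfunctions, each class gets one-dimensional kernel density estimates of its projection scores, and a curve is classified by the sum of log density ratios. The FPCA machinery, score truncation rules, and function names here are simplifying assumptions, not the paper's estimators.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_projection_bayes(X0, X1, n_comp=3):
    """Pooled PCA of discretized curves (rows = curves) gives common
    eigenfunctions; each class gets a 1-D KDE per projection score."""
    pooled = np.vstack([X0, X1])
    mean = pooled.mean(axis=0)
    _, _, Vt = np.linalg.svd(pooled - mean, full_matrices=False)
    phi = Vt[:n_comp]                       # common eigenfunctions (discretized)
    kdes = []
    for j in range(n_comp):
        s0 = (X0 - mean) @ phi[j]           # class-0 projection scores
        s1 = (X1 - mean) @ phi[j]           # class-1 projection scores
        kdes.append((gaussian_kde(s0), gaussian_kde(s1)))
    return mean, phi, kdes

def classify(x, mean, phi, kdes, prior_ratio=1.0):
    """Label 1 iff the factorized density ratio (times the prior ratio)
    favours class 1; the ratio factorizes over projection scores."""
    scores = (x - mean) @ phi.T
    log_ratio = np.log(prior_ratio)
    for j, (k0, k1) in enumerate(kdes):
        log_ratio += np.log(k1(scores[j])[0]) - np.log(k0(scores[j])[0])
    return int(log_ratio > 0)
```

The factorization over scores is exactly what sidesteps the curse of dimensionality: only a sequence of one-dimensional densities is ever estimated.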

Joint sufficient dimension reduction and estimation of conditional and average treatment effects


The estimation of treatment effects based on observational data usually involves multiple confounders, and dimension reduction is often desirable and sometimes inevitable. We first clarify the definition of a central subspace that is relevant for the efficient estimation of average treatment effects. A criterion is then proposed to simultaneously estimate the structural dimension, the basis matrix of the joint central subspace, and the optimal bandwidth for estimating the conditional treatment effects. The method can easily be implemented by forward selection. Semiparametric efficient estimation of average treatment effects can be achieved by averaging the conditional treatment effects with a different data-adaptive bandwidth to ensure optimal undersmoothing. Asymptotic properties of the estimated joint central subspace and the corresponding estimator of average treatment effects are studied. The proposed methods are applied to a nutritional study, where the covariate dimension is reduced from 11 to an effective dimension of one.

Conditional moment models with data missing at random


We consider a general statistical model defined by moment restrictions when data are missing at random. Using inverse probability weighting, we show that such a model is equivalent to a model for the observed variables only, augmented by a moment condition defined by the missingness mechanism. Our framework covers parametric and semiparametric mean regressions and quantile regressions. We allow for missing responses, missing covariates and any combination of them. The equivalence result sheds new light on various aspects of missing data, and provides guidelines for building efficient estimators.
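The basic inverse-probability-weighting step is easy to illustrate for the simplest moment restriction, a population mean with responses missing at random given a covariate. The sketch below assumes the missingness mechanism is known; in the paper's framework it would itself enter the moment conditions. All names are illustrative.

```python
import numpy as np

def ipw_mean(y, r, x, prop_fn):
    """Hajek-type inverse-probability-weighted mean: each observed response
    (r = 1) is weighted by 1 / P(R = 1 | x); unobserved y values are never
    used. prop_fn(x) is the (here: known) missingness model."""
    w = r / prop_fn(x)
    return np.sum(w * np.where(r, y, 0.0)) / np.sum(w)
```

The unweighted complete-case mean is biased whenever observation probability depends on a covariate that also predicts the response; the weights undo exactly that selection.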