Biometrika Current Issue
Published: Thu, 16 Nov 2017 00:00:00 GMT
Last Build Date: Fri, 17 Nov 2017 02:47:38 GMT
Editor of Biometrika
2017-11-16
Professor Anthony C. Davison has announced his wish to retire as Editor of Biometrika on 31 December 2017, when he will have completed 10 years’ service.
On two-stage estimation of structural instrumental variable models
2017-10-26
Summary
Two-stage least squares estimation is popular for structural equation models with unmeasured confounders. In such models, both the outcome and the exposure are assumed to follow linear models conditional on the measured confounders and instrumental variable, which is related to the outcome only via its relation with the exposure. We consider data where both the outcome and the exposure may be incompletely observed, with particular attention to the case where both are censored event times. A general class of two-stage minimum distance estimators is proposed that separately fits linear models for the outcome and exposure and then uses a minimum distance criterion based on the reduced-form model for the outcome to estimate the regression parameters of interest. An optimal minimum distance estimator is identified which may be superior to the usual two-stage least squares estimator with fully observed data. Simulation studies demonstrate that the proposed methods perform well with realistic sample sizes. Their practical utility is illustrated in a study of the comparative effectiveness of colon cancer treatments, where the effect of chemotherapy on censored survival times may be confounded with patient status.
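For orientation, the fully observed baseline mentioned in the abstract can be sketched as follows. This is a minimal simulation of ordinary two-stage least squares with a single instrument and no measured confounders, not the paper's minimum distance estimator for censored data. The data-generating values, the function names `slope`, `two_stage_least_squares` and `simulate`, and the seed are all illustrative assumptions.

```python
import random


def slope(x, y):
    """Least squares slope of y on x, fitted with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, x, y):
    """Stage 1: regress exposure x on instrument z. Stage 2: regress y on fitted x."""
    n = len(z)
    mz = sum(z) / n
    mx = sum(x) / n
    gamma = slope(z, x)
    x_hat = [mx + gamma * (zi - mz) for zi in z]
    return slope(x_hat, y)


def simulate(n=20000, beta=2.0, seed=1):
    """Hypothetical data: u is an unmeasured confounder, z a valid instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        xi = zi + ui + rng.gauss(0, 1)               # exposure depends on z and u
        yi = beta * xi + 3.0 * ui + rng.gauss(0, 1)  # z affects y only through x
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols_estimate = slope(x, y)                        # confounded by u
tsls_estimate = two_stage_least_squares(z, x, y)  # should be near beta = 2
print("OLS:", round(ols_estimate, 2), "2SLS:", round(tsls_estimate, 2))
```

In this setup the naive regression of the outcome on the exposure is biased upwards by the unmeasured confounder, while the two-stage estimate should fall close to the assumed true value of 2.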
Doubly robust nonparametric inference on the average treatment effect
2017-10-16
Summary
Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.
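The consistency property described in the second sentence can be illustrated with an augmented inverse-probability-weighted estimator, one common doubly robust construction. The sketch below plugs in known nuisance functions rather than flexible, data-adaptive estimates, so it does not address the inference problem the paper studies. The simulated model, the function names and the seed are assumptions made purely for illustration.

```python
import random


def aipw_ate(a, y, m1, m0, pi):
    """Augmented inverse-probability-weighted estimate of the average treatment effect.

    a: 0/1 treatment indicators; y: outcomes;
    m1, m0: outcome-model predictions under treatment and control;
    pi: propensity scores P(A = 1 | X).
    """
    total = 0.0
    for ai, yi, m1i, m0i, pii in zip(a, y, m1, m0, pi):
        treated = m1i + ai * (yi - m1i) / pii
        control = m0i + (1 - ai) * (yi - m0i) / (1 - pii)
        total += treated - control
    return total / len(y)


def simulate(n=50000, seed=2):
    """Hypothetical data with one binary confounder x and a true effect of 2."""
    rng = random.Random(seed)
    xs, a, y = [], [], []
    for _ in range(n):
        xi = 1 if rng.random() < 0.5 else 0
        true_pi = 0.3 + 0.4 * xi
        ai = 1 if rng.random() < true_pi else 0
        yi = 1.0 + 2.0 * ai + 3.0 * xi + rng.gauss(0, 1)
        xs.append(xi)
        a.append(ai)
        y.append(yi)
    return xs, a, y


xs, a, y = simulate()
true_pi = [0.3 + 0.4 * xi for xi in xs]
true_m1 = [3.0 + 3.0 * xi for xi in xs]
true_m0 = [1.0 + 3.0 * xi for xi in xs]

# Correct propensity score, deliberately wrong outcome model (all zeros).
est_wrong_outcome = aipw_ate(a, y, [0.0] * len(y), [0.0] * len(y), true_pi)
# Correct outcome model, deliberately wrong propensity score (constant 0.5).
est_wrong_propensity = aipw_ate(a, y, true_m1, true_m0, [0.5] * len(y))
print(round(est_wrong_outcome, 2), round(est_wrong_propensity, 2))
```

Both estimates should land near the assumed true effect of 2, even though each uses one misspecified nuisance function.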
Blocking strategies and stability of particle Gibbs samplers
2017-10-16
Summary
Sampling from the posterior probability distribution of the latent states of a hidden Markov model is nontrivial even in the context of Markov chain Monte Carlo. To address this, Andrieu et al. (2010) proposed a way of using a particle filter to construct a Markov kernel that leaves the posterior distribution invariant. Recent theoretical results have established the uniform ergodicity of this Markov kernel and shown that the mixing rate does not deteriorate provided the number of particles grows at least linearly with the number of latent states. However, this gives rise to a cost per application of the kernel that is quadratic in the number of latent states, which can be prohibitive for long observation sequences. Using blocking strategies, we devise samplers that have a stable mixing rate for a cost per iteration that is linear in the number of latent states and which are easily parallelizable.
Differential network analysis via lasso penalized D-trace loss
2017-10-12
Summary
Biological networks often change under different environmental and genetic conditions. In this paper, we model network change as the difference of two precision matrices and propose a novel loss function called the D-trace loss, which allows us to directly estimate the precision matrix difference without attempting to estimate the precision matrices themselves. Under a new irrepresentability condition, we show that the D-trace loss function with the lasso penalty can yield consistent estimators in high-dimensional settings if the difference network is sparse. A very efficient algorithm is developed based on the alternating direction method of multipliers to minimize the penalized loss function. Simulation studies and a real-data analysis show that the proposed method outperforms other methods.
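To see how a loss can target the precision matrix difference directly, consider the form below, where $\Sigma_1$ and $\Sigma_2$ denote the two covariance matrices, assumed positive definite, and $\Delta$ is a symmetric candidate for the difference. The exact expression is an assumption about the shape of such a loss, not a quotation from the paper; the stationarity check holds for this form regardless.

```latex
L(\Delta) = \tfrac{1}{2}\,\mathrm{tr}(\Sigma_1 \Delta \Sigma_2 \Delta)
            - \mathrm{tr}\{\Delta(\Sigma_1 - \Sigma_2)\},
\qquad
\nabla L(\Delta) = \tfrac{1}{2}(\Sigma_1 \Delta \Sigma_2 + \Sigma_2 \Delta \Sigma_1)
            - (\Sigma_1 - \Sigma_2).

\text{At } \Delta^{*} = \Sigma_2^{-1} - \Sigma_1^{-1}:
\quad
\Sigma_1 \Delta^{*} \Sigma_2 = \Sigma_1 - \Sigma_2,
\quad
\Sigma_2 \Delta^{*} \Sigma_1 = \Sigma_1 - \Sigma_2,
\quad\text{so}\quad
\nabla L(\Delta^{*}) = 0.
```

Under this assumed form the minimizer is the difference of the two precision matrices, and evaluating the loss requires only the covariance matrices, never their inverses.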
Optimal designs for active controlled dose-finding trials with efficacy-toxicity outcomes
2017-10-09
Summary
We derive optimal designs to estimate efficacy and toxicity in active controlled dose-finding trials when the bivariate continuous outcomes are described using nonlinear regression models. We determine upper bounds on the required number of different doses and provide conditions under which the boundary points of the design space are included in the optimal design. We provide an analytical description of minimally supported optimal designs and show that they do not depend on the correlation between the bivariate outcomes.
Partition-based ultrahigh-dimensional variable screening
2017-10-09
Summary
Traditional variable selection methods are compromised by overlooking useful information on covariates with similar functionality or spatial proximity, and by treating each covariate independently. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate, and we propose a data-driven partition screening framework with unavailable or unreliable prior knowledge on covariate grouping and investigate its theoretical properties. We consider two special cases: correlation-guided partitioning and spatial location-guided partitioning. In the absence of a single partition, we propose a theoretically justified strategy for combining statistics from various partitioning methods. The utility of the proposed methods is demonstrated via simulation and analysis of functional neuroimaging data.
A $C_p$ criterion for semiparametric causal inference
2017-10-09
Summary
For marginal structural models, which play an important role in causal inference, we consider a model selection problem within a semiparametric framework using inverse-probability-weighted estimation or doubly robust estimation. In this framework, the modelling target is a potential outcome that may be missing, so there is no classical information criterion. We define a mean squared error for treating the potential outcome and derive an asymptotic unbiased estimator as a $C_{p}$ criterion using an ignorable treatment assignment condition. Simulation shows that the proposed criterion outperforms a conventional one by providing smaller squared errors and higher frequencies of selecting the true model in all the settings considered. Moreover, in a real-data analysis we found a clear difference between the two criteria.
Distribution-free tests of independence in high dimensions
2017-10-03
Summary
We consider the testing of mutual independence among all entries in a $d$-dimensional random vector based on $n$ independent observations. We study two families of distribution-free test statistics, which include Kendall’s tau and Spearman’s rho as important examples. We show that under the null hypothesis the test statistics of these two families converge weakly to Gumbel distributions, and we propose tests that control the Type I error in the high-dimensional setting where $d > n$. We further show that the two tests are rate-optimal in terms of power against sparse alternatives and that they outperform competitors in simulations, especially when $d$ is large.
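The two named examples depend on the data only through ranks, which is what makes them distribution-free. The sketch below computes both for samples without ties and combines the pairwise values by taking the largest absolute statistic. The combining rule is an assumption, suggested by the Gumbel limit, and no normalization or critical value from the paper is reproduced. The function names are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau for samples without ties: concordant minus discordant pairs, scaled."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def ranks(v):
    """Ranks 1..n of the entries of v, assuming no ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho for samples without ties, via squared rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def max_pairwise(columns, stat):
    """Largest absolute pairwise statistic over all pairs of coordinates."""
    d = len(columns)
    return max(
        abs(stat(columns[j], columns[k]))
        for j in range(d)
        for k in range(j + 1, d)
    )


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(kendall_tau(x, y), spearman_rho(x, y))
# A strictly increasing transformation of x leaves both statistics unchanged.
print(kendall_tau([math.exp(v) for v in x], y))
```

Because only the ordering of the observations enters, applying any strictly increasing transformation to a coordinate leaves both statistics unchanged, as the last line illustrates.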
Robust rank estimation for transformation models with random effects
2017-10-03
Summary
Semiparametric transformation models with random effects are useful in analysing recurrent and clustered data. With specified error and random effect distributions, Zeng & Lin (2007a) proved that nonparametric maximum likelihood estimators are semiparametric efficient. In this paper we consider a more general class of transformation models with random effects, under which an unknown monotonic transformation of the response is linearly related to the covariates and the random effects with unspecified error and random effect distributions. This includes many popular models. We propose an estimator based on the maximum rank correlation, which relies on symmetry of the random effect distribution, and establish its consistency and asymptotic normality. A random weighting resampling scheme is employed for inference. The proposed method can be extended to censored and clustered data. Numerical studies demonstrate that the proposed method performs well in practical situations. Application of the method is illustrated with the Framingham cholesterol data.
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees
2017-09-27
Summary
Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the $k$th principal component in Euclidean space: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the $k$-simplex. We establish some basic properties of these objects, in particular showing that they have dimension $k$, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
Estimating network edge probabilities by neighbourhood smoothing
2017-09-15
Summary
The estimation of probabilities of network edges from the observed adjacency matrix has important applications to the prediction of missing links and to network denoising. It is usually addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but this is ill-defined without strong assumptions on the network structure. Here we propose a novel computationally efficient method, based on neighbourhood smoothing, to estimate the expectation of the adjacency matrix directly, without making the structural assumptions that graphon estimation requires. The neighbourhood smoothing method requires little tuning, has a competitive mean squared error rate and outperforms many benchmark methods for link prediction in simulated and real networks.
Semiparametric analysis of complex polygenic gene-environment interactions in case-control studies
2017-09-15
Summary
Many methods have recently been proposed for efficient analysis of case-control studies of gene-environment interactions using a retrospective likelihood framework that exploits the natural assumption of gene-environment independence in the underlying population. However, for polygenic modelling of gene-environment interactions, which is a topic of increasing scientific interest, applications of retrospective methods have been limited due to a requirement in the literature for parametric modelling of the distribution of the genetic factors. We propose a general, computationally simple, semiparametric method for analysis of case-control studies that allows exploitation of the assumption of gene-environment independence without any further parametric modelling assumptions about the marginal distributions of either of the two sets of factors. The method relies on the key observation that an underlying efficient profile likelihood depends on the distribution of genetic factors only through certain expectation terms that can be evaluated empirically. We develop asymptotic inferential theory for the estimator and evaluate its numerical performance via simulation studies. An application of the method is presented.
Contours and dimple for the Gneiting class of space-time correlation functions
2017-09-08
Abstract
We offer a dual view of the dimple problem related to space-time correlation functions in terms of their contours. We find that the dimple property (Kent et al., 2011) in the Gneiting class of correlations is in one-to-one correspondence with nonmonotonicity of the parametric curve describing the associated contour lines. Further, we show that given such a nonmonotonic parametric curve associated with a given level set, all the other parametric curves at smaller levels inherit the nonmonotonicity. We propose a modified Gneiting class of correlations having monotonically decreasing parametric curves and no dimple along the temporal axis.
Bayesian local extremum splines
2017-09-05
Summary
We consider shape-restricted nonparametric regression on a closed set $\mathcal{X} \subset \mathbb{R}$, where it is reasonable to assume that the function has no more than $H$ local extrema interior to $\mathcal{X}$. Following a Bayesian approach we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modelling any continuously differentiable function within the class considered, and we use it to develop methods for testing hypotheses on the shape of the curve. Sampling algorithms are developed, and the method is applied in simulation studies and data examples where the shape of the curve is of interest.
Projection correlation between two random vectors
2017-09-04
Abstract
We propose the use of projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties. It equals zero if and only if the two random vectors are independent, it is not sensitive to the dimensions of the two random vectors, it is invariant with respect to the group of orthogonal transformations, and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. We show that the sample estimate of the projection correlation is $n$-consistent if the two random vectors are independent and root-$n$-consistent otherwise. Monte Carlo simulation studies indicate that the projection correlation has higher power than the distance correlation and the ranks of distances in tests of independence, especially when the dimensions are relatively large or the moment conditions required by the distance correlation are violated.
Median bias reduction of maximum likelihood estimates
2017-09-04
Abstract
For regular parametric problems, we show how median centring of the maximum likelihood estimate can be achieved by a simple modification of the score equation. For a scalar parameter of interest, the estimator is equivariant under interest-respecting reparameterizations and is third-order median unbiased. With a vector parameter of interest, componentwise equivariance and third-order median centring are obtained. Like the implicit method of Firth (1993) for bias reduction, the new method does not require finiteness of the maximum likelihood estimate and is effective in preventing infinite estimates. Simulation results for continuous and discrete models, including binary and beta regression, confirm that the method succeeds in achieving componentwise median centring and in solving the boundary estimate problem, while keeping comparable dispersion and the same approximate distribution as its main competitors.
Dependent generalized functional linear models
2017-09-02
Summary
This paper considers testing for no effect of functional covariates on response variables in multivariate regression. We use generalized estimating equations to determine the underlying parameters and establish their joint asymptotic normality. This is then used to test the significance of the effect of predictors on the vector of response variables. Simulations demonstrate the importance of considering existing correlation structures in the data. To explore the effect of treating genetic data as a function, we perform a simulation study using gene sequencing data and find that the performance of our test is comparable to that of another popular method used in sequencing studies. We present simulations to explore the behaviour of our test under varying sample size, cluster size and dimension of the parameter to be estimated, and an application where we are able to confirm known associations between nicotine dependence and neuronal nicotinic acetylcholine receptor subunit genes.