
Biometrika Current Issue

Published: Tue, 23 Jan 2018 00:00:00 GMT

Last Build Date: Mon, 12 Feb 2018 09:47:46 GMT

Dual regression

Fri, 19 Jan 2018 00:00:00 GMT

Summary
We propose dual regression as an alternative to quantile regression for the global estimation of conditional distribution functions. Dual regression provides the interpretational power of quantile regression while avoiding the need to repair intersecting conditional quantile surfaces. We introduce a mathematical programming characterization of conditional distribution functions which, in its simplest form, is the dual program of a simultaneous estimator for linear location-scale models, and use it to specify and estimate a flexible class of conditional distribution functions. We present asymptotic theory for the corresponding empirical dual regression process.

Scalar-on-image regression via the soft-thresholded Gaussian process

Fri, 19 Jan 2018 00:00:00 GMT

Summary
This work concerns spatial variable selection for scalar-on-image regression. We propose a new class of Bayesian nonparametric models and develop an efficient posterior computational algorithm. The proposed soft-thresholded Gaussian process provides large prior support over the class of piecewise-smooth, sparse, and continuous spatially varying regression coefficient functions. In addition, under mild regularity conditions, the soft-thresholded Gaussian process prior leads to posterior consistency for parameter estimation and variable selection in scalar-on-image regression, even when the number of predictors is larger than the sample size. The proposed method is compared to alternatives via simulation and applied to an electroencephalography study of alcoholism.
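The soft-thresholding operator that gives the prior its name is the standard one: applied pointwise to a Gaussian process draw, it produces coefficient surfaces that are exactly zero on part of the image yet continuous where they are nonzero. A minimal sketch of the operator (the full prior construction and posterior algorithm are in the paper):

```python
import numpy as np

def soft_threshold(g, lam):
    """Soft-thresholding operator: maps values in [-lam, lam] exactly to
    zero and shrinks the rest toward zero by lam, so the result is
    continuous, sparse, and piecewise-smooth wherever g is smooth."""
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)
```

Applying `soft_threshold` to a smooth random field with threshold `lam` zeroes out regions of weak signal while retaining the smooth shape of strong-signal regions, which is the mechanism behind the spatial variable selection.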

On overfitting and post-selection uncertainty assessments

Thu, 04 Jan 2018 00:00:00 GMT

Summary
In a regression context, when the relevant subset of explanatory variables is uncertain, it is common to use a data-driven model selection procedure. Classical linear model theory, applied naively to the selected submodel, may not be valid because it ignores the selected submodel’s dependence on the data. We provide an explanation of this phenomenon, in terms of overfitting, for a class of model selection criteria.

Choosing between methods of combining $p$-values

Thu, 04 Jan 2018 00:00:00 GMT

Summary
Combining $p$-values from independent statistical tests is a popular approach to meta-analysis, particularly when the data underlying the tests are either no longer available or are difficult to combine. Numerous $p$-value combination methods appear in the literature, each with different statistical properties, yet often the final choice used in a meta-analysis can seem arbitrary, as if all effort has been expended in building the models that gave rise to the $p$-values. Birnbaum (1954) showed that any reasonable $p$-value combiner must be optimal against some alternative hypothesis. Starting from this perspective and recasting each method of combining $p$-values as a likelihood ratio test, we present theoretical results for some standard combiners that provide guidance on how a powerful combiner might be chosen in practice.
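Fisher's method is perhaps the most familiar such combiner. As a concrete illustration of recasting a combiner as a statistic with a known null distribution (the choice of Fisher's combiner here is mine, not a recommendation drawn from the paper), a minimal stdlib-only sketch:

```python
import math

def fisher_combine(pvals):
    """Fisher's method: under the global null, T = -2 * sum(log p_i)
    is chi-squared with 2k degrees of freedom, k = number of tests."""
    k = len(pvals)
    t = -2.0 * sum(math.log(p) for p in pvals)
    # The chi-squared survival function with even df 2k has the closed
    # form P(T > t) = exp(-t/2) * sum_{j=0}^{k-1} (t/2)^j / j!
    half = t / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(k))
    return t, tail
```

Other standard combiners (Stouffer, Pearson, minimum-p) differ only in the transform applied to each $p_i$, and the paper's point is that each such choice is implicitly optimal against a different alternative.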

On bias reduction and incidental parameters

Wed, 03 Jan 2018 00:00:00 GMT

Summary
Firth (1993) introduced a method for reducing the bias of the maximum likelihood estimator. Here it is shown that the approach is also effective in reducing the sensitivity of inferential procedures to incidental parameters.
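Firth's correction amounts to maximizing a Jeffreys-prior-penalized log-likelihood. A minimal sketch for logistic regression, assuming direct numerical maximization with `scipy` (the paper's incidental-parameter setting and estimating equations may differ):

```python
import numpy as np
from scipy.optimize import minimize

def firth_logistic(X, y):
    """Logistic regression with Firth's (1993) penalty: maximize
    l(beta) + 0.5 * log det I(beta), where I(beta) = X' W X and
    W = diag(p_i (1 - p_i)).  The penalty keeps estimates finite
    even under separation, where the ordinary MLE diverges."""
    def neg_penalized(beta):
        eta = np.clip(X @ beta, -30.0, 30.0)   # numerical guard
        p = 1.0 / (1.0 + np.exp(-eta))
        loglik = y @ eta - np.sum(np.logaddexp(0.0, eta))  # stable log(1+e^eta)
        w = p * (1.0 - p)
        _, logdet = np.linalg.slogdet(X.T @ (X * w[:, None]))
        return -(loglik + 0.5 * logdet)
    return minimize(neg_penalized, np.zeros(X.shape[1]), method="BFGS").x
```

On completely separated data the unpenalized fit would push the slope to infinity; the penalized optimum stays finite, which is the bias-reduction mechanism at work.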

Shape-constrained partial identification of a population mean under unknown probabilities of sample selection

Tue, 26 Dec 2017 00:00:00 GMT

Summary
Estimating a population mean from a sample obtained with unknown selection probabilities is important in the biomedical and social sciences. Using a ratio estimator, Aronow & Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed values. In this paper, we show how to use auxiliary shape constraints on the population outcome distribution, such as symmetry or log-concavity, to obtain tighter bounds on the population mean. We use this method to estimate the performance of Aymara students, an ethnic minority in the north of Chile, in a national educational standardized test. We implement this method in the R package scbounds.

Robust and consistent variable selection in high-dimensional generalized linear models

Mon, 25 Dec 2017 00:00:00 GMT

Summary
Generalized linear models are popular for modelling a large variety of data. We consider variable selection through penalized methods by focusing on resistance issues in the presence of outlying data and other deviations from assumptions. We highlight the weaknesses of widely used penalized M-estimators, propose a robust penalized quasilikelihood estimator, and show that it enjoys oracle properties in high dimensions and is stable in a neighbourhood of the model. We illustrate its finite-sample performance on simulated and real data.

Approximate Bayesian inference under informative sampling

Mon, 18 Dec 2017 00:00:00 GMT

Summary
Statistical inference with complex survey data is challenging because the sampling design can be informative, and ignoring it can produce misleading results. Current methods of Bayesian inference under complex sampling assume that the sampling design is noninformative for the specified model. In this paper, we propose a Bayesian approach which uses the sampling distribution of a summary statistic to derive the posterior distribution of the parameters of interest. Asymptotic properties of the method are investigated. It is directly applicable to combining information from two independent surveys and to calibration estimation in survey sampling. A simulation study confirms that it can provide valid estimation under informative sampling. We apply it to a measurement error problem using data from the Korean Longitudinal Study of Aging.

On causal estimation using $U$-statistics

Thu, 14 Dec 2017 00:00:00 GMT

Summary
We introduce a general class of causal estimands which extends the familiar notion of average treatment effect. The class is defined by a contrast function, prespecified to quantify the relative favourability of one outcome over another, averaged over the marginal distributions of two potential outcomes. Natural estimators arise in the form of $U$-statistics. We derive both a naive inverse propensity score weighted estimator and a class of locally efficient and doubly robust estimators. The usefulness of our theory is illustrated by two examples, one for causal estimation with ordinal outcomes, and the other for causal tests that are robust with respect to outliers.

A structural Markov property for decomposable graph laws that allows control of clique intersections

Mon, 11 Dec 2017 00:00:00 GMT

Summary
We present a new kind of structural Markov property for probabilistic laws on decomposable graphs, which allows the explicit control of interactions between cliques and so is capable of encoding some interesting structure. We prove the equivalence of this property to an exponential family assumption, and discuss identifiability, modelling, inferential and computational implications.

Kernel-based covariate functional balancing for observational studies

Fri, 08 Dec 2017 00:00:00 GMT

Summary
Covariate balance is often advocated for objective causal inference since it mimics randomization in observational data. Unlike methods that balance specific moments of covariates, our proposal attains uniform approximate balance for covariate functions in a reproducing-kernel Hilbert space. The corresponding infinite-dimensional optimization problem is shown to have a finite-dimensional representation in terms of an eigenvalue optimization problem. Large-sample results are studied, and numerical examples show that the proposed method achieves better balance with smaller sampling variability than existing methods.

Testing for the presence of significant covariates through conditional marginal regression

Fri, 08 Dec 2017 00:00:00 GMT

Summary
Researchers sometimes have a priori information on the relative importance of predictors that can be used to screen out covariates. An important question is whether any of the discarded covariates have predictive power when the most relevant predictors are included in the model. We consider testing whether any discarded covariate is significant conditional on some pre-chosen covariates. We propose a maximum-type test statistic and show that it has a nonstandard asymptotic distribution, giving rise to the conditional adaptive resampling test. To accommodate signals of unknown sparsity, we develop a hybrid test statistic, which is a weighted average of maximum- and sum-type statistics. We prove the consistency of the test procedure under general assumptions and illustrate how it can be used as a stopping rule in forward regression. We show, through simulation, that the proposed method provides adequate control of the familywise error rate with competitive power for both sparse and dense signals, even in high-dimensional cases, and we demonstrate its advantages in cases where the covariates are heavily correlated. We illustrate the application of our method by analysing an expression quantitative trait locus dataset.

Partial likelihood estimation of isotonic proportional hazards models

Tue, 05 Dec 2017 00:00:00 GMT

Summary
We consider the estimation of the semiparametric proportional hazards model with an unspecified baseline hazard function where the effect of a continuous covariate is assumed to be monotone. Previous work on nonparametric maximum likelihood estimation for isotonic proportional hazard regression with right-censored data is computationally intensive, lacks theoretical justification, and may be prohibitive in large samples. In this paper, partial likelihood estimation is studied. An iterative quadratic programming method is considered, which has performed well with likelihoods for isotonic parametric regression models. However, the iterative quadratic programming method for the partial likelihood cannot be implemented using standard pool-adjacent-violators techniques, increasing the computational burden and numerical instability. The iterative convex minorant algorithm which uses pool-adjacent-violators techniques has also been shown to perform well in related parametric likelihood set-ups, but encounters computational difficulties under the proportional hazards model. An alternative pseudo-iterative convex minorant algorithm is proposed which exploits the pool-adjacent-violators techniques, is theoretically justified, and exhibits computational stability. A separate estimator of the baseline hazard function is provided. The algorithms are extended to models with time-dependent covariates. Simulation studies demonstrate that the pseudo-iterative convex minorant algorithm may yield orders-of-magnitude reduction in computing time relative to the iterative quadratic programming method and the iterative convex minorant algorithm, with moderate reductions in the bias and variance of the estimators. Analysis of data from a recent HIV prevention study illustrates the practical utility of the isotonic methodology in estimating nonlinear, monotonic covariate effects.
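The pool-adjacent-violators step that these algorithms rely on is easy to state in isolation. A minimal stdlib-only sketch of weighted PAVA for an isotonic least-squares fit (a generic illustration of the building block, not the paper's partial-likelihood algorithm):

```python
def pava(y, w=None):
    """Pool-adjacent-violators: the nondecreasing sequence closest to y
    in weighted least squares.  Whenever two adjacent blocks violate
    monotonicity, they are merged and replaced by their weighted mean."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    vals, wts, cnts = [], [], []          # per-block mean, weight, size
    for i in range(n):
        vals.append(y[i]); wts.append(w[i]); cnts.append(1)
        # merge backwards while the last two blocks are out of order
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, w2, c2 = vals.pop(), wts.pop(), cnts.pop()
            v1, w1, c1 = vals.pop(), wts.pop(), cnts.pop()
            wt = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / wt)
            wts.append(wt); cnts.append(c1 + c2)
    out = []
    for v, c in zip(vals, cnts):
        out.extend([v] * c)
    return out
```

Each observation is processed once and merges only move backwards, which is why PAVA-based inner loops are so much cheaper than generic quadratic programming.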

Simple least squares estimator for treatment effects using propensity score residuals

Tue, 05 Dec 2017 00:00:00 GMT

Summary
Propensity score matching is widely used to control covariates when analysing the effects of a nonrandomized binary treatment. However, it requires several arbitrary decisions, such as how many matched subjects to use and how to choose them. In this paper a simple least squares estimator is proposed, where the treatment, and possibly the response variable, is replaced by the propensity score residual. The proposed estimator controls covariates semiparametrically if the propensity score function is correctly specified. Furthermore, it is numerically stable and relatively easy to use, compared with alternatives such as matching, regression imputation, weighting, and doubly robust estimators. The proposed estimator also has a simple valid asymptotic variance estimator that works well in small samples. The least squares estimator is extended to multiple treatments and noncontinuously distributed responses. A simulation study demonstrates that it has lower mean squared error than its competitors.
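Under one hedged reading, the core idea is to regress the response on the propensity score residual $D - \hat{e}(X)$. A toy simulation using a known propensity score for clarity (the estimator's full form, including estimation of the propensity score and the variance estimator, is in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))       # propensity score (known here)
d = rng.binomial(1, e)                    # binary treatment
y = 2.0 * d + x + rng.normal(size=n)      # true treatment effect = 2

r = d - e                                 # propensity score residual
tau_hat = np.sum(r * y) / np.sum(r * r)   # least squares slope of y on r
```

The residual `r` is uncorrelated with any function of `x`, so the slope isolates the treatment effect without matching, weighting, or imputation; `tau_hat` should land close to the true value 2.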

A conditional composite likelihood ratio test with boundary constraints

Sat, 25 Nov 2017 00:00:00 GMT

Summary
Composite likelihood has been widely used in applications. The asymptotic distribution of the composite likelihood ratio statistic at the boundary of the parameter space is a complicated mixture of weighted $\chi^2$ distributions. In this paper we propose a conditional test with data-dependent degrees of freedom. We consider a modification of the composite likelihood which satisfies the second-order Bartlett identity. We show that the modified composite likelihood ratio statistic given the number of estimated parameters lying on the boundary converges to a simple $\chi^2$ distribution. This conditional testing procedure is validated through simulation studies.

A robust goodness-of-fit test for generalized autoregressive conditional heteroscedastic models

Wed, 22 Nov 2017 00:00:00 GMT

Summary
The estimation of time series models with heavy-tailed innovations has been widely discussed, but corresponding goodness-of-fit tests have attracted less attention, primarily because the autocorrelation function commonly used in constructing goodness-of-fit tests necessarily imposes certain moment conditions on the innovations. As a bounded random variable has finite moments of all orders, we address the problem by first transforming the residuals with a bounded function. More specifically, we consider the sample autocorrelation function of the transformed absolute residuals of a fitted generalized autoregressive conditional heteroscedastic model. With the corresponding residual empirical distribution function naturally employed as the transformation, a robust goodness-of-fit test is then constructed. The asymptotic distributions of the test statistic under the null hypothesis and local alternatives are derived, and Monte Carlo experiments are conducted to examine finite-sample properties. The proposed test is shown to be more powerful than existing tests when the innovations are heavy-tailed.
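The construction can be sketched as follows: rank-transform the absolute residuals (the empirical distribution function is bounded, so no moment conditions are needed), then form a portmanteau-type statistic from their sample autocorrelations. This is an illustrative sketch of the idea only, not the paper's exact statistic or its null calibration:

```python
import numpy as np

def robust_acf_stat(residuals, max_lag=5):
    """Portmanteau-type statistic built from the sample ACF of the
    ECDF-transformed absolute residuals.  Ranks are bounded, so the
    statistic is insensitive to heavy tails in the innovations."""
    a = np.abs(np.asarray(residuals, dtype=float))
    n = len(a)
    u = np.argsort(np.argsort(a)) / n     # empirical CDF transform (ranks/n)
    u = u - u.mean()                      # centre before autocorrelation
    denom = np.sum(u * u)
    rho = [np.sum(u[:-k] * u[k:]) / denom for k in range(1, max_lag + 1)]
    return n * sum(r * r for r in rho)
```

For an adequate fit the transformed absolute residuals should look serially uncorrelated, so the statistic stays moderate; remaining conditional-heteroscedasticity structure inflates it.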

Two-sample tests of high-dimensional means for compositional data

Fri, 03 Nov 2017 00:00:00 GMT

Summary
Compositional data are ubiquitous in many scientific endeavours. Motivated by microbiome and metagenomic research, we consider a two-sample testing problem for high-dimensional compositional data and formulate a testable hypothesis of compositional equivalence for the means of two latent log basis vectors. We propose a test through the centred log-ratio transformation of the compositions. The asymptotic null distribution of the test statistic is derived and its power against sparse alternatives is investigated. A modified test for paired samples is also considered. Simulations show that the proposed tests can be significantly more powerful than tests that are applied to the raw and log-transformed compositions. The usefulness of our tests is illustrated by applications to gut microbiome composition in obesity and Crohn’s disease.
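The centred log-ratio transform itself is simple; a minimal sketch (the high-dimensional test statistic built on top of it is the paper's contribution):

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of compositions (one per row):
    clr(x)_j = log x_j - mean_k log x_k.  Rows sum to zero and the
    transform is invariant to rescaling each composition."""
    lx = np.log(x)
    return lx - lx.mean(axis=-1, keepdims=True)
```

Because `clr` is scale-invariant, it removes the unit-sum constraint that makes raw compositional means awkward to compare, which is why the two-sample test is formulated on the clr scale.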

A randomization-based perspective on analysis of variance: a test statistic robust to treatment effect heterogeneity

Tue, 31 Oct 2017 00:00:00 GMT

Summary
Fisher randomization tests for Neyman’s null hypothesis of no average treatment effect are considered in a finite-population setting associated with completely randomized experiments involving more than two treatments. The consequences of using the $F$ statistic to conduct such a test are examined, and we argue that under treatment effect heterogeneity, use of the $F$ statistic in the Fisher randomization test can severely inflate the Type I error under Neyman’s null hypothesis. We propose to use an alternative test statistic, derive its asymptotic distributions under Fisher’s and Neyman’s null hypotheses, and demonstrate its advantages through simulations.
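A Monte Carlo Fisher randomization test is straightforward to sketch. The studentized statistic below, which does not pool variances across arms, is in the spirit of the paper's robust alternative to the $F$ statistic, but its exact form here is my assumption:

```python
import numpy as np

def fisher_randomization_pvalue(y, groups, stat, n_draws=2000, seed=0):
    """Monte Carlo Fisher randomization test: re-randomize the group
    labels and compare the observed statistic with its randomization
    distribution (add-one correction keeps the p-value positive)."""
    rng = np.random.default_rng(seed)
    observed = stat(y, groups)
    count = sum(stat(y, rng.permutation(groups)) >= observed
                for _ in range(n_draws))
    return (count + 1) / (n_draws + 1)

def studentized_stat(y, g):
    """Heteroscedasticity-robust contrast: each arm mean is scaled by
    its own estimated variance, unlike the classical pooled F."""
    labels = np.unique(g)
    means = np.array([y[g == a].mean() for a in labels])
    vars_ = np.array([y[g == a].var(ddof=1) / (g == a).sum() for a in labels])
    grand = np.sum(means / vars_) / np.sum(1.0 / vars_)
    return np.sum((means - grand) ** 2 / vars_)
```

With heterogeneous arm variances the pooled $F$ statistic's randomization distribution is miscalibrated under Neyman's null; studentizing each arm separately avoids that failure mode.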

Optimal discrimination designs for semiparametric models

Thu, 26 Oct 2017 00:00:00 GMT

Summary
Much work on optimal discrimination designs assumes that the models of interest are fully specified, apart from unknown parameters. Recent work allows errors in the models to be nonnormally distributed but still requires the specification of the mean structures. Otsu (2008) proposed optimal discriminating designs for semiparametric models by generalizing the Kullback–Leibler optimality criterion proposed by López-Fidalgo et al. (2007). This paper develops a relatively simple strategy for finding an optimal discrimination design. We also formulate equivalence theorems to confirm optimality of a design and derive relations between optimal designs found here for discriminating semiparametric models and those commonly used in optimal discrimination design problems.