publications
preprints
- Preprint: On universal inference in Gaussian mixture models. Hongjian Shi and Mathias Drton. 2024.
Recent work on game-theoretic statistics and safe anytime-valid inference (SAVI) provides new tools for statistical inference without assuming any regularity conditions. In particular, the framework of universal inference proposed by Wasserman, Ramdas, and Balakrishnan (2020) offers new solutions by modifying the likelihood ratio test in a data-splitting scheme. In this paper, we study the performance of the resulting split likelihood ratio test under Gaussian mixture models, which are canonical examples of models in which classical regularity conditions fail to hold. We first establish that under the null hypothesis, the split likelihood ratio statistic is asymptotically normal with increasing mean and variance. Moreover, contrary to the common belief that the flexibility of SAVI and universal methods comes at the price of a significant loss of power, we prove that universal inference surprisingly achieves the same detection rate $(n^{-1}\log\log n)^{1/2}$ as the classical likelihood ratio test.
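A minimal sketch of the split likelihood ratio test, under simplifying assumptions that are not taken from the paper: the null is the single standard Gaussian N(0, 1), the alternative is the symmetric two-component mixture 0.5 N(−μ, 1) + 0.5 N(μ, 1), and μ is estimated on the second half of the data by the method of moments (E[X²] = 1 + μ²). Universal inference allows any estimator on that half. The function names `split_lrt` and `log_mixture_ratio` are hypothetical. The test rejects when the likelihood ratio T, evaluated on the first half, satisfies T ≥ 1/α.

```python
import numpy as np


def log_mixture_ratio(x, mu):
    """log p_mu(x) - log phi(x), where p_mu = 0.5 N(-mu, 1) + 0.5 N(mu, 1) and phi is N(0, 1)."""
    # p_mu(x) / phi(x) = exp(-mu^2 / 2) * cosh(mu * x); log cosh(t) = logaddexp(t, -t) - log 2
    t = mu * x
    return -0.5 * mu**2 + np.logaddexp(t, -t) - np.log(2.0)


def split_lrt(x, alpha=0.05, rng=None):
    """Hypothetical split LRT of H0: N(0, 1) against the symmetric mixture; returns (log T, reject)."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))
    half = len(x) // 2
    d0, d1 = x[idx[:half]], x[idx[half:]]
    # Fit the alternative on D1 only (method of moments: E[X^2] = 1 + mu^2).
    mu_hat = np.sqrt(max(np.mean(d1**2) - 1.0, 0.0))
    # Evaluate the likelihood ratio on D0; the null is simple, so nothing is fitted under H0.
    log_t = float(np.sum(log_mixture_ratio(d0, mu_hat)))
    # Universal inference: reject when T >= 1 / alpha.
    return log_t, bool(log_t >= np.log(1.0 / alpha))
```

Under the null, E[T] = 1 for any μ̂ computed on the independent half, so Markov's inequality gives level at most α without any regularity conditions.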
journal articles
2025
- Distribution-free tests of multivariate independence based on center-outward quadrant, Spearman, Kendall, and van der Waerden statistics. Hongjian Shi, Mathias Drton, Marc Hallin, and Fang Han. Bernoulli, 2025.
Due to the lack of a canonical ordering in ℝ^d for d>1, defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a solution in which multivariate ranks are obtained by transporting data points to a grid that approximates a uniform reference measure (Chernozhukov et al., 2017; Hallin, 2017; Hallin et al., 2021), thereby inducing ranks, signs, and a data-driven ordering of ℝ^d. We take up this new perspective to define and study multivariate analogues of the sign covariance/quadrant statistic, Spearman’s rho, Kendall’s tau, and van der Waerden covariances. The resulting tests of multivariate independence are fully distribution-free, hence uniformly valid irrespective of the actual (absolutely continuous) distribution of the observations. Our results provide the asymptotic distribution theory for these new test statistics, with asymptotic approximations to critical values to be used for testing independence between random vectors, as well as a power analysis of the resulting tests in an extension of the so-called (bivariate) Konijn model. This power analysis includes a multivariate Chernoff–Savage property guaranteeing that, under elliptical generalized Konijn models, the asymptotic relative efficiency of our van der Waerden tests with respect to Wilks’ classical (pseudo-)Gaussian procedure is at least one, with equality only under Gaussian distributions. We similarly provide a lower bound for the asymptotic relative efficiency of our Spearman procedure with respect to Wilks’ test, thus extending the classical result by Hodges and Lehmann on the asymptotic relative efficiency, in univariate location models, of Wilcoxon tests with respect to Student’s t-tests.
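The transport step can be sketched for d = 2 as a linear assignment problem: with squared Euclidean cost, matching the n data points one-to-one to n grid points is the discrete optimal transport problem. Assumptions of this sketch, not taken from the paper: the grid consists of n_R circles of radii j/(n_R+1) crossed with n_S equally spaced directions, with n = n_R n_S and no points at the origin; the function names are hypothetical. The rank of X_i is read off the radius of its grid point and its sign is the direction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def center_outward_grid(n_r, n_s):
    """Hypothetical regular grid: n_r circles of radii j/(n_r+1) times n_s equally spaced directions."""
    radii = np.arange(1, n_r + 1) / (n_r + 1)
    angles = 2 * np.pi * np.arange(n_s) / n_s
    r, a = np.meshgrid(radii, angles, indexing="ij")
    return np.column_stack([(r * np.cos(a)).ravel(), (r * np.sin(a)).ravel()])


def center_outward_ranks_signs(x, n_r, n_s):
    """Empirical center-outward distribution function, ranks and signs of an (n, 2) sample."""
    if x.shape[0] != n_r * n_s:
        raise ValueError("this sketch assumes n = n_r * n_s (no points at the origin)")
    grid = center_outward_grid(n_r, n_s)
    # Discrete optimal transport with squared Euclidean cost = linear assignment problem.
    rows, cols = linear_sum_assignment(cdist(x, grid, "sqeuclidean"))
    f = np.empty_like(grid)
    f[rows] = grid[cols]
    norms = np.linalg.norm(f, axis=1)
    ranks = np.rint((n_r + 1) * norms).astype(int)  # which circle: 1 (innermost) .. n_r
    signs = f / norms[:, None]                       # unit direction vectors
    return f, ranks, signs
```

Whatever the distribution of the data, the scores f always form the same fixed grid, which is why statistics built from them are distribution-free.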
2024
- On Azadkia-Chatterjee’s conditional dependence coefficient. Hongjian Shi, Mathias Drton, and Fang Han. Bernoulli, 2024.
In recent work, Azadkia and Chatterjee (2021) laid out an ingenious approach to defining consistent measures of conditional dependence. Their fully nonparametric approach forms statistics based on ranks and nearest neighbor graphs. The appealing nonparametric consistency of the resulting conditional dependence measure and the associated empirical conditional dependence coefficient has quickly prompted follow-up work that seeks to study its statistical efficiency. In this paper, we take up the framework of conditional randomization tests (CRT) for conditional independence and conduct a power analysis that considers two types of local alternatives, namely, parametric quadratic mean differentiable alternatives and nonparametric Hölder smooth alternatives. Our local power analysis shows that conditional independence tests using the Azadkia–Chatterjee coefficient remain inefficient even when aided with the CRT framework, and serves as motivation to develop variants of the approach; cf. Lin and Han (2022+). As a byproduct, we resolve a conjecture of Azadkia and Chatterjee by proving central limit theorems for the considered conditional dependence coefficients, with explicit formulas for the asymptotic variances.
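For concreteness, a sketch of the empirical coefficient T_n(Y, Z | X) as defined by Azadkia and Chatterjee (2021), assuming continuous data without ties. Here R_i = #{j : Y_j ≤ Y_i}, N(i) is the index of the nearest neighbour of X_i among the X's, M(i) is that of (X_i, Z_i) among the pairs, and T_n = Σ_i (min{R_i, R_{M(i)}} − min{R_i, R_{N(i)}}) / Σ_i (R_i − min{R_i, R_{N(i)}}). The function names are hypothetical, and the CRT calibration studied in the paper is not reproduced.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import rankdata


def nearest_neighbor(points):
    """Index of each point's Euclidean nearest neighbour, excluding the point itself."""
    _, idx = cKDTree(points).query(points, k=2)
    self_first = idx[:, 0] == np.arange(len(points))
    return np.where(self_first, idx[:, 1], idx[:, 0])


def azadkia_chatterjee(y, z, x):
    """T_n(Y, Z | X): dependence of Y on Z given X (continuous data, no ties assumed)."""
    y = np.asarray(y)
    x = np.asarray(x).reshape(len(y), -1)
    z = np.asarray(z).reshape(len(y), -1)
    r = rankdata(y, method="max")                # R_i = #{j : Y_j <= Y_i}
    n_x = nearest_neighbor(x)                    # N(i): neighbour using X only
    m_xz = nearest_neighbor(np.hstack([x, z]))   # M(i): neighbour using (X, Z)
    num = np.sum(np.minimum(r, r[m_xz]) - np.minimum(r, r[n_x]))
    den = np.sum(r - np.minimum(r, r[n_x]))
    return num / den
```

The coefficient is near 0 when Y is conditionally independent of Z given X and near 1 when Y is a function of (X, Z).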
2022
- On the power of Chatterjee’s rank correlation. Hongjian Shi, Mathias Drton, and Fang Han. Biometrika, 2022.
Chatterjee (2021) introduced a simple new rank correlation coefficient that has attracted much attention recently. The coefficient has the unusual appeal that it not only estimates a population quantity first proposed by Dette et al. (2013) that is zero if and only if the underlying pair of random variables is independent, but also is asymptotically normal under independence. This paper compares Chatterjee’s new correlation coefficient with three established rank correlations that also facilitate consistent tests of independence, namely Hoeffding’s D, Blum–Kiefer–Rosenblatt’s R, and Bergsma–Dassios–Yanagimoto’s τ∗. We compare the computational efficiency of these rank correlation coefficients in light of recent advances, and investigate their power against local rotation and mixture alternatives. Our main results show that Chatterjee’s coefficient is unfortunately rate-suboptimal compared to D, R and τ∗. The situation is more subtle for a related earlier estimator of Dette et al. (2013). These results favour D, R and τ∗ over Chatterjee’s new correlation coefficient for the purpose of testing independence.
- On universally consistent and fully distribution-free rank tests of vector independence. Hongjian Shi, Marc Hallin, Mathias Drton, and Fang Han. Ann. Statist., 2022.
Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-n neighborhoods within the class of quadratic mean differentiable alternatives.
- Distribution-free consistent independence tests via center-outward ranks and signs. Hongjian Shi, Mathias Drton, and Fang Han. J. Amer. Statist. Assoc., 2022.
This article investigates the problem of testing independence of two random vectors of general dimensions. For this, we give for the first time a distribution-free consistent test. Our approach combines distance covariance with the center-outward ranks and signs developed by Marc Hallin and collaborators. In technical terms, the proposed test is consistent and distribution-free in the family of multivariate distributions with nonvanishing (Lebesgue) probability densities. Exploiting the (degenerate) U-statistic structure of the distance covariance and the combinatorial nature of Hallin’s center-outward ranks and signs, we are able to derive the limiting null distribution of our test statistic. The resulting asymptotic approximation is accurate already for moderate sample sizes and makes the test implementable without requiring permutation. The limiting distribution is derived via a more general result that gives a new type of combinatorial noncentral limit theorem for double- and multiple-indexed permutation statistics. Supplementary materials for this article are available online.
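A rough sketch of the statistic in d = 2: each sample is mapped to its center-outward scores (an assignment to a regular polar grid, following Hallin's construction) and the squared sample distance covariance of the two score sets is computed by double centring. The polar grid with n = n_R n_S points, the V-statistic form, and the function names are assumptions of this sketch; the paper's exact normalisation and its limiting null distribution are not reproduced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def co_scores(x, n_r, n_s):
    """Center-outward scores of an (n, 2) sample on a hypothetical polar grid, n = n_r * n_s."""
    if x.shape[0] != n_r * n_s:
        raise ValueError("this sketch assumes n = n_r * n_s")
    radii = np.repeat(np.arange(1, n_r + 1) / (n_r + 1), n_s)
    angles = np.tile(2 * np.pi * np.arange(n_s) / n_s, n_r)
    grid = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    # Optimal transport to the grid as an assignment problem (squared Euclidean cost).
    rows, cols = linear_sum_assignment(cdist(x, grid, "sqeuclidean"))
    scores = np.empty_like(grid)
    scores[rows] = grid[cols]
    return scores


def dcov2(a, b):
    """V-statistic estimate of the squared distance covariance via double centring."""
    A, B = cdist(a, a), cdist(b, b)
    A = A - A.mean(axis=0) - A.mean(axis=1)[:, None] + A.mean()
    B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
    return float(np.mean(A * B))


def co_dcov(x, y, n_r, n_s):
    """Distance covariance between the center-outward scores of x and of y."""
    return dcov2(co_scores(x, n_r, n_s), co_scores(y, n_r, n_s))
```

Shifting or positively rescaling a sample only adds row- or column-wise constants to the assignment cost (or rescales it), so the matching, and hence the statistic, is unchanged.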
2020
- High-dimensional consistent independence testing with maxima of rank correlations. Mathias Drton, Fang Han, and Hongjian Shi. Ann. Statist., 2020.
Testing mutual independence for high-dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting nonlinear, nonmonotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence. Built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics, our results cover a variety of rank correlations including Hoeffding’s D, Blum–Kiefer–Rosenblatt’s R and Bergsma–Dassios–Yanagimoto’s τ∗. The proposed tests are distribution-free in the class of multivariate distributions with continuous margins, implementable without the need for permutation, and are shown to be rate-optimal against sparse alternatives under the Gaussian copula model. As a by-product of the study, we reveal an identity between the aforementioned three rank correlation statistics, and hence make a step towards proving a conjecture of Bergsma and Dassios.
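A sketch of the max-type statistic with Hoeffding's D as the pairwise rank correlation. It uses the classical closed form of D in terms of ranks, scaled so that D = 1 under perfect monotone dependence; this scaling may differ from the paper's U-statistic normalisation. Continuous data without ties and n ≥ 5 are assumed, the function names are hypothetical, and the paper's critical values are not reproduced.

```python
import numpy as np
from itertools import combinations
from scipy.stats import rankdata


def hoeffding_d(x, y):
    """Classical Hoeffding's D from ranks (no ties, n >= 5); equals 1 for y = x."""
    n = len(x)
    r, s = rankdata(x), rankdata(y)
    # Q_i = 1 + number of points strictly below and to the left of (x_i, y_i)
    q = 1 + np.sum((x[None, :] < x[:, None]) & (y[None, :] < y[:, None]), axis=1)
    d1 = np.sum((q - 1) * (q - 2))
    d2 = np.sum((r - 1) * (r - 2) * (s - 1) * (s - 2))
    d3 = np.sum((r - 2) * (s - 2) * (q - 1))
    num = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30.0 * num / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


def max_pairwise_d(data):
    """Maximum of Hoeffding's D over all pairs of columns; returns (statistic, pair)."""
    stats = {(j, k): hoeffding_d(data[:, j], data[:, k])
             for j, k in combinations(range(data.shape[1]), 2)}
    pair = max(stats, key=stats.get)
    return stats[pair], pair
```

Unlike Pearson or Spearman correlations, D also picks up nonmonotone dependence such as Y = X² with X symmetric. The quadratic-time computation of Q_i is chosen for brevity.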
Ph.D. thesis
- Ph.D. thesis: Distribution-Free Consistent Tests of Independence via Marginal and Multivariate Ranks. Hongjian Shi. 2021.
Testing independence is a fundamental statistical problem that has received much attention in the literature. In this dissertation, we consider testing independence in two different settings. The first is testing mutual independence of many covariates, and the second is testing independence of two random vectors. For both settings, we propose, for the first time, distribution-free and consistent tests of independence via marginal or multivariate ranks. Moreover, we establish that both tests are optimal in terms of statistical efficiency. We also investigate the power against local alternatives of a simple consistent rank correlation coefficient recently proposed by Chatterjee (2021). Our results show that Chatterjee’s coefficient is unfortunately statistically inefficient.
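For reference, Chatterjee's coefficient in the no-ties case: sort the pairs by x, let r_i be the rank of the y paired with the i-th smallest x, and set ξ_n = 1 − 3 Σ_{i=1}^{n−1} |r_{i+1} − r_i| / (n² − 1). A minimal sketch with a hypothetical function name:

```python
import numpy as np
from scipy.stats import rankdata


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, assuming no ties in x or y."""
    n = len(x)
    order = np.argsort(x)                # rearrange the pairs so that x is increasing
    r = rankdata(np.asarray(y)[order])   # r_i = rank of the y paired with the i-th smallest x
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n**2 - 1)
```

The statistic needs only a sort, so it runs in O(n log n) time; for y = x it equals 1 − 3/(n + 1).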