Abstract

In this paper, we propose a new test statistic for testing the equality of mean vectors from two multivariate normal populations when the covariance matrices are unknown and unequal in high–dimensional data. The test is built on the idea of retaining as much information from the sample covariance matrices as possible. The proposed test is invariant under scalar transformations and location shifts. We show that the asymptotic distribution of the proposed statistic is standard normal as the number of variables approaches infinity. We also compare the performance of the proposed test with three existing tests in a simulation study. The simulation results show that the attained significance level of the proposed test is satisfactorily close to the nominal significance level. The attained power of the proposed test exceeds that of the competing tests for the covariance structures considered, which can be arranged into a block diagonal form. The attained power increases as the dimension increases for a given sample size, or vice versa, and as the level of dependence among the random variables within each sample increases. Finally, the proposed test is illustrated with an analysis of DNA microarray data.

Highlights

  • Data collection technology is rapidly evolving, and its evolution is pushing statistical methods in two directions

  • Because the proposed test statistic T is theoretically based on the approximation to the distribution of T² given by Krishnamoorthy and Yu (2004), it requires only that the block sizes satisfy q_k ≤ v_k − 6 for all k = 1, 2, ..., m. Krishnamoorthy and Yu, by contrast, recommend their approximation on the grounds that its attained significance levels are very close to the nominal level provided p ≤ min(n_1 − 1, n_2 − 1)/5 when the sample sizes are unequal, a condition that relaxes somewhat to p ≤ n/4 when the sample sizes are equal (n_1 = n_2 = n)

  • In each row of each table, the attained significance level closest to the nominal significance level of 0.05 is shown in bold. The last row of each table gives the Average Absolute Discrepancy (AAD) between the nominal significance level and the estimated attained significance levels over the 10 conditions, computed as AAD = (1/10) ∑ |α̂ − 0.05| (Yanagihara and Yuan, 2005); a smaller AAD value indicates that a test performs better overall at maintaining the nominal significance level across the 10 situations
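
As a small illustration of the AAD summary described in the last highlight, the following sketch averages the absolute gaps between a set of attained significance levels and the nominal level. The function name and the ten input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def average_absolute_discrepancy(attained_levels, nominal=0.05):
    """Mean absolute gap between attained significance levels and the nominal level."""
    if not attained_levels:
        raise ValueError("at least one attained significance level is required")
    return sum(abs(a - nominal) for a in attained_levels) / len(attained_levels)


# Hypothetical attained levels for 10 simulation conditions (not taken from the paper).
example_levels = [0.048, 0.052, 0.055, 0.045, 0.050,
                  0.053, 0.047, 0.051, 0.049, 0.056]

aad = average_absolute_discrepancy(example_levels)
print(f"AAD = {aad:.4f}")
```

A test whose attained levels stay closer to 0.05 across all conditions yields a smaller value of this summary.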


Summary

Introduction

Data collection technology is rapidly evolving, and its evolution is pushing statistical methods in two directions. High–dimensional data appear in various fields: in financial studies, online data from markets around the world accumulate on a giga–octet scale every day, and in genetic experiments, gene expression data are collected with DNA microarray technology (Yao et al., 2015). For such high–dimensional data, classical multivariate statistical methods are often not applicable because they rely on the inverse of the sample covariance matrix, which does not exist. Consider one population in which the number of variables exceeds the sample size minus one, p > n_i − 1. An example is data collected with DNA microarray technology, where hundreds or thousands of gene expression levels are measured on relatively few subjects (Zhou et al., 2017). In this setting the sample covariance matrix S_i loses full rank and is singular, so S_i has no inverse (Chongcharoen, 2011).
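
The rank deficiency described above can be checked numerically. The sketch below is only an illustration under assumed settings (n = 5 simulated observations on p = 8 standard normal variables, a fixed seed, and helper functions written for this example); it is not part of the paper's method. It builds the sample covariance matrix and computes its rank by Gaussian elimination.

```python
import random


def sample_covariance(rows):
    """Unbiased sample covariance matrix of n observations (rows) on p variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def matrix_rank(matrix, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


random.seed(0)  # assumed seed, only for reproducibility of the illustration
n, p = 5, 8     # fewer observations than variables: p > n - 1
data = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

S = sample_covariance(data)
rank_S = matrix_rank(S)
print(f"p = {p}, n - 1 = {n - 1}, rank(S) = {rank_S}")
```

Centering removes one degree of freedom, so the rank of S is at most n − 1; whenever p > n − 1 the p × p matrix S cannot have full rank and therefore has no inverse.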

A Proposed Test Statistic and Its Asymptotic Distribution
Simulation Results
A Real Data Example
Conclusion