Computes Ball Covariance and Ball Correlation statistics, which are generic dependence measures in Banach spaces.

bcor(x, y, distance = FALSE, weight = FALSE)

bcov(x, y, distance = FALSE, weight = FALSE)

Arguments

x

a numeric vector, matrix, data.frame, or a list containing at least two numeric vectors, matrices, or data.frames.

y

a numeric vector, matrix, or data.frame.

distance

if distance = TRUE, the elements of x and y are considered as distance matrices.

weight

a logical value or character string that selects the weight form of the Ball Covariance statistic. If a character string, it must be one of "constant", "probability", or "chisquare"; any unambiguous substring can be given. If a logical value, weight = TRUE is equivalent to weight = "probability" and weight = FALSE is equivalent to weight = "constant". Default: weight = FALSE.
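The mapping described for weight can be sketched in base R as follows. This is only an illustration: resolve_weight is a hypothetical helper, and using match.arg for the substring matching is an assumption, not necessarily how the package resolves the argument.

```r
# Hypothetical helper (not part of the Ball package): maps a logical or
# character `weight` value onto one of the three documented weight forms.
resolve_weight <- function(weight = FALSE) {
  choices <- c("constant", "probability", "chisquare")
  if (is.logical(weight)) {
    # TRUE -> "probability", FALSE -> "constant", as stated above
    return(if (isTRUE(weight)) "probability" else "constant")
  }
  # match.arg() accepts any unambiguous prefix and errors otherwise
  match.arg(weight, choices)
}

resolve_weight(TRUE)
resolve_weight("prob")
resolve_weight("chisq")
```

Under this sketch an ambiguous string such as "c" raises an error, because it is a prefix of both "constant" and "chisquare".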

Value

bcor

Ball Correlation statistic.

bcov

Ball Covariance statistic.

Details

The sample sizes of the two variables must agree, and the samples must not contain missing or infinite values. If distance = TRUE, x and y can each be a dist object or a symmetric numeric matrix recording the distances between samples; otherwise, these arguments are treated as data.
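The input requirements above can be checked before calling bcov or bcor. The sketch below uses only base R; check_ball_input is a hypothetical helper name, and the Euclidean distance from dist() is an assumption chosen for illustration.

```r
# Hypothetical pre-flight check (not part of the Ball package): verifies that
# x and y have equal sample sizes and contain only finite values.
check_ball_input <- function(x, y) {
  x <- as.matrix(x)  # rows are observations
  y <- as.matrix(y)
  if (nrow(x) != nrow(y)) {
    stop("x and y must have the same number of observations")
  }
  # is.finite() is FALSE for NA, NaN, Inf and -Inf
  if (!all(is.finite(x)) || !all(is.finite(y))) {
    stop("x and y must not contain missing or infinite values")
  }
  invisible(TRUE)
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)  # 20 observations in 2 dimensions
y <- rnorm(20)                    # 20 univariate observations
check_ball_input(x, y)

# Distance form of the same data (Euclidean distance assumed)
dx <- dist(x)
dy <- dist(y)
# With the Ball package attached, these could then be passed as:
# bcov(dx, dy, distance = TRUE)
```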

bcov and bcor compute Ball Covariance and Ball Correlation statistics.

The Ball Covariance statistic is a generic dependence measure in Banach spaces. It enjoys the following properties:

  • It is nonnegative and it is equal to zero if and only if variables are unassociated;

  • It is highly robust;

  • It is distribution-free and model-free;

  • The HHG statistic is a special case of the Ball Covariance statistic.

The Ball Correlation statistic, a normalized version of the Ball Covariance statistic, generalizes Pearson correlation in two fundamental ways:

  • It is well-defined for random variables of arbitrary dimension in Banach spaces;

  • BCor equal to zero implies that the random variables are unassociated.

The Ball Covariance and Ball Correlation statistics between two random variables are defined as follows. Suppose we are given pairs of independent observations \(\{(x_1, y_1),...,(x_n,y_n)\}\), where \(x_i\) and \(y_i\) can be of any dimension and the dimensionality of \(x_i\) and \(y_i\) need not be the same. Then the sample version of Ball Covariance is defined as: $$\mathbf{BCov}_{\omega, n}^{2}(X, Y)=\frac{1}{n^{2}}\sum_{i,j=1}^{n}{(\Delta_{ij,n}^{X,Y}-\Delta_{ij,n}^{X}\Delta_{ij,n}^{Y})^{2}} $$

where: $$ \Delta_{ij,n}^{X,Y}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{X} \delta_{ij,k}^{Y}}, \quad \Delta_{ij,n}^{X}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{X}}, \quad \Delta_{ij,n}^{Y}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{Y}} $$ $$\delta_{ij,k}^{X} = I(x_{k} \in \bar{B}(x_{i}, \rho(x_{i}, x_{j}))), \quad \delta_{ij,k}^{Y} = I(y_{k} \in \bar{B}(y_{i}, \rho(y_{i}, y_{j})))$$

Here \(\bar{B}(x_{i}, \rho(x_{i}, x_{j}))\) is the closed ball with center \(x_{i}\) and radius \(\rho(x_{i}, x_{j})\). \( \mathbf{BCov}_{\omega,n}^2(X,X) \) and \( \mathbf{BCov}_{\omega,n}^2(Y,Y) \) are defined similarly.

The Ball Correlation statistic is then defined as: $$\mathbf{BCor}_{\omega,n}^2(X,Y)= \mathbf{BCov}_{\omega,n}^2(X,Y)/\sqrt{\mathbf{BCov}_{\omega,n}^2(X,X)\mathbf{BCov}_{\omega,n}^2(Y,Y)} $$
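The sample definitions above can be transcribed directly into base R. The following is a minimal sketch rather than the package's implementation: ball_cov_sq and ball_cor_sq are hypothetical names, the metric \(\rho\) is assumed to be Euclidean (via dist()), no weight term is applied, and the O(n^3) loops are written for readability, so the values need not match the output of bcov and bcor exactly.

```r
# Hypothetical helpers (not part of the Ball package).
# Squared sample Ball Covariance, transcribed from the displayed formula.
ball_cov_sq <- function(x, y) {
  dx <- as.matrix(dist(x))  # rho(x_i, x_j); Euclidean distance assumed
  dy <- as.matrix(dist(y))
  n <- nrow(dx)
  stopifnot(nrow(dy) == n)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # delta_{ij,k}: is observation k inside the closed ball centered at
      # observation i with radius rho(i, j)?
      in_x <- dx[i, ] <= dx[i, j]
      in_y <- dy[i, ] <= dy[i, j]
      delta_xy <- mean(in_x & in_y)  # Delta_{ij,n}^{X,Y}
      delta_x <- mean(in_x)          # Delta_{ij,n}^{X}
      delta_y <- mean(in_y)          # Delta_{ij,n}^{Y}
      total <- total + (delta_xy - delta_x * delta_y)^2
    }
  }
  total / n^2
}

# Squared sample Ball Correlation: BCov^2(X, Y) normalized by the
# geometric mean of BCov^2(X, X) and BCov^2(Y, Y).
ball_cor_sq <- function(x, y) {
  ball_cov_sq(x, y) / sqrt(ball_cov_sq(x, x) * ball_cov_sq(y, y))
}

x <- 1:10
ball_cov_sq(x, x)
ball_cor_sq(x, 2 * x)
```

Because the indicators depend only on the ordering of distances from each center, rescaling y by a positive constant leaves the statistic unchanged in this sketch.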

\(\mathbf{BCov}_{\omega,n}\) can be extended to measure the mutual dependence among \(K\) random variables: $$\frac{1}{n^{2}}\sum_{i,j=1}^{n}{\left[ (\Delta_{ij,n}^{X_{1}, ..., X_{K}}-\prod_{k=1}^{K}\Delta_{ij,n}^{X_{k}})^{2}\prod_{k=1}^{K}{\hat{\omega}_{k}(X_{ki},X_{kj})} \right]}$$ where \(X_{k}\) \((k=1,\ldots,K)\) are random variables and \(X_{ki}\) is the \(i\)-th observation of \(X_{k}\).
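The \(K\)-variable statistic can be sketched in the same way. In the base R example below, ball_cov_sq_k is a hypothetical name, every weight \(\hat{\omega}_{k}\) is assumed to equal 1 (an assumption made for illustration), and the distance is again assumed to be Euclidean, so this is not a substitute for the package routine.

```r
# Hypothetical helper (not part of the Ball package): K-variable statistic
# with every weight omega_k fixed at 1 (assumption).
ball_cov_sq_k <- function(vars) {
  # One n x n distance matrix per variable (Euclidean distance assumed)
  dists <- lapply(vars, function(v) as.matrix(dist(v)))
  n <- nrow(dists[[1]])
  stopifnot(all(vapply(dists, nrow, integer(1)) == n))
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # n x K logical matrix: column k flags the observations lying in the
      # closed ball of X_k centered at observation i with radius rho(i, j)
      inside <- vapply(dists, function(d) d[i, ] <= d[i, j], logical(n))
      joint <- mean(rowSums(inside) == ncol(inside))  # Delta^{X_1,...,X_K}
      marginal <- prod(colMeans(inside))              # prod_k Delta^{X_k}
      total <- total + (joint - marginal)^2
    }
  }
  total / n^2
}

set.seed(1)
x1 <- rnorm(30)
x2 <- rnorm(30)
x3 <- x1 + x2  # depends on the other two variables
ball_cov_sq_k(list(x1, x2, x3))
```

With \(K = 2\) and unit weights this reduces to the two-variable sample formula.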

See bcov.test for a test of independence based on the Ball Covariance statistic.

References

Wenliang Pan, Xueqin Wang, Heping Zhang, Hongtu Zhu & Jin Zhu (2019) Ball Covariance: A Generic Measure of Dependence in Banach Space, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1543600

Wenliang Pan, Xueqin Wang, Weinan Xiao & Hongtu Zhu (2018) A Generic Sure Independence Screening Procedure, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1462709

Jin Zhu, Wenliang Pan, Wei Zheng & Xueqin Wang (2021) Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces, Journal of Statistical Software, 97(6), DOI: 10.18637/jss.v097.i06

See also

bcov.test

Examples

############# Ball Correlation #############
num <- 50
x <- 1:num
y <- 1:num
bcor(x, y)
#> bcor.constant 
#>             1 
bcor(x, y, weight = "prob")
#> bcor.probability 
#>                1 
bcor(x, y, weight = "chisq")
#> bcor.chisquare 
#>              1 
############# Ball Covariance #############
num <- 50
x <- rnorm(num)
y <- rnorm(num)
bcov(x, y)
#> bcov.constant 
#>  0.0005716833 
bcov(x, y, weight = "prob")
#> bcov.probability 
#>       0.02801657 
bcov(x, y, weight = "chisq")
#> bcov.chisquare 
#>     0.05282778