---
title: "MultiStatM: overview"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Overview}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(MultiStatM)
```
## Background
The package `MultiStatM` provides general formulae for set partitions,
multivariate moments and cumulants, and vector Hermite polynomials. It
also provides theoretical formulae for some important symmetric and
asymmetric multivariate distributions, as well as estimation functions
for multivariate moments and cumulants and the associated measures of
multivariate skewness and kurtosis.
The formulae implemented in the package can be found in the book
"Multivariate Statistical Methods - Going Beyond the Linear" (Springer,
2021) by Gy. Terdik and are fully general. For example, given any list
of (numerical) multivariate moments up to order $k$, the conversion
formula from multivariate moments to multivariate cumulants provides all
multivariate cumulants up to order $k$. This differs substantially from
the approach of the package `kStatistics` (Di Nardo and Guarino, 2022),
which calculates one by one the individual cumulants of order $r$ that
make up the entries of our cumulant vectors.
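To make the idea of converting a whole list of moments at once concrete, here is a minimal base R sketch restricted to the univariate case. It uses the standard recursion $\kappa_n = m_n - \sum_{j=1}^{n-1}\binom{n-1}{j-1}\kappa_j m_{n-j}$. The helper name `mom2cum_uni` is hypothetical and is not part of `MultiStatM`, whose own conversion functions cover the general multivariate case.

```r
# Hypothetical helper (not a MultiStatM function): converts raw univariate
# moments m[1], ..., m[k] into the cumulants kappa[1], ..., kappa[k].
mom2cum_uni <- function(m) {
  kappa <- numeric(length(m))
  for (n in seq_along(m)) {
    s <- 0
    if (n > 1) {
      for (j in 1:(n - 1)) {
        s <- s + choose(n - 1, j - 1) * kappa[j] * m[n - j]
      }
    }
    kappa[n] <- m[n] - s
  }
  kappa
}

# Raw moments of a Poisson(1) variable up to order 4
m_pois <- c(1, 2, 5, 15)
kappa_pois <- mom2cum_uni(m_pois)   # all cumulants of Poisson(1) equal 1

# Raw moments of a standard normal variable up to order 4
kappa_norm <- mom2cum_uni(c(0, 1, 0, 3))   # 0 1 0 0
```

A single call returns every cumulant up to the order of the highest moment supplied, which is the behaviour described above for the multivariate conversion.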
The packages `MaxSkew` and `MultiSkew` (Franceschini and Loperfido
(2017a,b)), designed for detecting, measuring and removing multivariate
skewness, compute the third multivariate cumulant of the raw, centered
or standardized data and the main measures of multivariate skewness,
together with their bootstrap distributions, and provide orthogonal data
projections with maximal skewness.
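As an illustration of what a third multivariate cumulant of standardized data looks like when stored as a vector, the base R sketch below averages the triple Kronecker products of the standardized observations and checks that the squared norm of the result agrees with Mardia's (1970) skewness index. The simulated data and all variable names are illustrative assumptions; neither `MaxSkew`, `MultiSkew` nor `MultiStatM` is called.

```r
set.seed(1)
X <- matrix(rexp(200 * 2), ncol = 2)      # illustrative skewed sample, d = 2
n <- nrow(X)

# Standardize: center, then multiply by the inverse symmetric square root
# of the covariance matrix (divisor n)
Xc <- sweep(X, 2, colMeans(X))
S <- crossprod(Xc) / n
e <- eigen(S, symmetric = TRUE)
S_inv_half <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
Z <- Xc %*% S_inv_half

# Third cumulant vector of the standardized data: the average of z (x) z (x) z.
# For centered data the third cumulant equals the third moment.
kappa3 <- rowMeans(apply(Z, 1, function(z) kronecker(kronecker(z, z), z)))

# Squared norm of the cumulant vector versus the double-sum form of
# Mardia's skewness, (1 / n^2) * sum_{i,j} (z_i' z_j)^3
b1_vec <- sum(kappa3^2)
b1_mardia <- mean((Z %*% t(Z))^3)
```

The two quantities coincide because $(z_i^{\top}z_j)^3 = (z_i^{\otimes 3})^{\top} z_j^{\otimes 3}$, so keeping the cumulant as a vector of length $d^3$ loses no information relative to the scalar index.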
The package `matrixcalc` (Novomestky (2021)) provides the commutation,
elimination and duplication matrices for Cartesian tensor products of
two vectors, which are particular cases of those provided in the package
`MultiStatM`.
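For readers unfamiliar with the two-vector case, the following base R sketch builds a commutation matrix directly from its definition $K\operatorname{vec}(A)=\operatorname{vec}(A^{\top})$ and checks that it swaps the factors of a Kronecker product. The function name `commutation_matrix` is a hypothetical helper written for this illustration; it is neither the `matrixcalc` nor the `MultiStatM` implementation.

```r
# Hypothetical helper: the mn x mn matrix K with K %*% vec(A) = vec(t(A))
# for every m x n matrix A.
commutation_matrix <- function(m, n) {
  K <- matrix(0, m * n, m * n)
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      # A[i, j] sits at position (j - 1) * m + i in vec(A)
      # and at position (i - 1) * n + j in vec(t(A))
      K[(i - 1) * n + j, (j - 1) * m + i] <- 1
    }
  }
  K
}

a <- c(1, 2)        # vector of length m = 2
b <- c(3, 4, 5)     # vector of length n = 3
K <- commutation_matrix(2, 3)

# K maps b (x) a to a (x) b
swapped <- as.vector(K %*% kronecker(b, a))

A <- matrix(1:6, nrow = 2)
vec_tA <- as.vector(K %*% as.vector(A))   # equals as.vector(t(A))
```

The matrix is a permutation matrix, so it only reorders entries; this is the sense in which the non-commutative tensor product is still permutation equivalent.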
The package `sn` (Azzalini (2022)) deals with the skew-normal and the
skew-t distributions: it provides statistical methods for data fitting
and model diagnostics, in the univariate and the multivariate case, as
well as random number generators for multivariate skew distributions.
The package `MultiStatM`, in addition, implements complete formulae for
theoretical multivariate moments and cumulants of any order.
The package `moments` (Komsta and Novomestky (2022)) provides functions
to calculate moments, cumulants, Pearson's kurtosis, Geary's kurtosis
and skewness, together with related tests, from univariate data.
A careful study of the cumulants is a necessary and typical part of
nonlinear statistics. Such a study of cumulants for multivariate
distributions is made complicated by the index notations. One solution
to this problem is the usage of tensor analysis. In this package (and
the connected book) we offer an alternate method, which we believe is
simpler to follow. The higher-order cumulants with the same degree for a
multivariate vector can be collected together and kept as a vector. To
be able to do so, we introduce a particular differential operator on a
multivariate function, called the T-derivative, and use it to obtain
cumulants and provide results which are somewhat analogous to well-known
results in the univariate case.
More specifically, with the symbol $\otimes$ denoting the Cartesian
tensor product, consider the operator
$D_{\boldsymbol{\lambda}}^{\otimes}$, which we refer to as the
$\operatorname{T}$-derivative; see Jammalamadaka et al. (2006) for
details. For any function $\boldsymbol{\phi}(\boldsymbol{\lambda})$,
the $\operatorname{T}$-derivative is defined as
\begin{equation}\label{Tderiv}
D_{\boldsymbol{\lambda}}^{\otimes}\boldsymbol{\phi}(\boldsymbol{\lambda})
=\operatorname{vec}\left(\left(\frac{\partial\boldsymbol{\phi}(\boldsymbol{\lambda})}{\partial\boldsymbol{\lambda}^{\top}}\right)^{\top}\right)
=\boldsymbol{\phi}(\boldsymbol{\lambda})\otimes\frac{\partial}{\partial\boldsymbol{\lambda}}.
\end{equation}
If $\boldsymbol{\phi}$ is $k$-times differentiable, its $k$-th
$\operatorname{T}$-derivative is defined recursively as
$D_{\boldsymbol{\lambda}}^{\otimes k}\boldsymbol{\phi}(\boldsymbol{\lambda})=D_{\boldsymbol{\lambda}}^{\otimes}\left( D_{\boldsymbol{\lambda}}^{\otimes k-1}\boldsymbol{\phi}(\boldsymbol{\lambda})\right)$.
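The definition can be checked numerically. The base R sketch below approximates the T-derivative by central finite differences and applies it twice to a quadratic test function $\psi(\boldsymbol{\lambda})=\boldsymbol{\mu}^{\top}\boldsymbol{\lambda}+\tfrac12\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}\boldsymbol{\lambda}$, for which the first T-derivative at the origin is $\boldsymbol{\mu}$ and the second is $\operatorname{vec}\boldsymbol{\Sigma}$. The helper `t_deriv`, the step size and the values of `mu` and `Sigma` are assumptions made for this illustration and are not part of `MultiStatM`.

```r
# Hypothetical helper: numerical T-derivative of phi at lambda, i.e. the vec
# of the transposed Jacobian, approximated by central differences.
t_deriv <- function(phi, lambda, h = 1e-4) {
  d <- length(lambda)
  # Column j holds the partial derivatives of phi with respect to lambda_j
  J <- sapply(seq_len(d), function(j) {
    e <- replace(numeric(d), j, h)
    (phi(lambda + e) - phi(lambda - e)) / (2 * h)
  })
  J <- matrix(J, ncol = d)   # keeps a 1 x d shape when phi is scalar valued
  as.vector(t(J))            # vec of the transposed Jacobian
}

mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
psi <- function(l) sum(mu * l) + 0.5 * sum(l * drop(Sigma %*% l))

# First T-derivative at the origin: the gradient, here mu
d1 <- t_deriv(psi, c(0, 0))

# Second T-derivative: apply the operator to the first one, giving vec(Sigma)
d2 <- t_deriv(function(l) t_deriv(psi, l), c(0, 0))
```

Each application multiplies the length of the result by the dimension $d$, so the $k$-th T-derivative of a scalar function is a vector of length $d^k$.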
In the following we demonstrate the use of this technique through the
characterization of several multivariate distributions via their
cumulants and by extending the discussion to statistical inference for
multivariate skewness and kurtosis.
We note that Kollo (2008) provides formulae for cumulants in terms of
matrices; however, retaining a matrix structure for all higher-order
cumulants leads to high-dimensional matrices with special symmetric
structures which are quite hard to follow notationally and
computationally. McCullagh (2018) provides quite an elegant approach
using tensor methods; however, tensor methods are not very well known
and are computationally not so simple.
The method discussed here is based on relatively simple calculus.
Although the tensor product of Euclidean vectors is not commutative, it
has the advantage of permutation equivalence and allows one to obtain
general results for cumulants and moments of any order. The resulting
general formulae are suitable for algorithmic implementation in
software.
Methods based on a matrix approach do not provide this type of result;
see also Ould-Baba et al. (2015), which goes as far as the sixth-order
moment matrices, whereas there is no such limitation in our derivations
and our results. For further discussion, see also Kolda and Bader (2009)
and Qi (2006).
In `MultiStatM` 2.0.0 many functions of the previous version 1.2.1 have been renamed or grouped for clarity; in particular, similar functions that produce the same kind of output for the univariate and the multivariate case have been merged.
The table below provides a complete transition plan from `MultiStatM` 1.2.1 to `MultiStatM` 2.0.0. Functions of version 1.2.1 which are within the same row have been grouped into a single function. For example, the functions `conv_Cum2Mom` and
`conv_Cum2MomMulti`, which provided the cumulants-to-moments conversion in the univariate and multivariate cases respectively, have been joined in the function `Cum2Mom`, which now has an option `Type=c("Univariate","Multivariate")`.
+------------------------+-----------------------------+---------------+
| **MultiStatM 1.2.1**   | **MultiStatM 2.0.0**        | **Family**    |
+========================+=============================+===============+
|conv_Cum2Mom            | Cum2Mom,                    | Moments and   |
|conv_Cum2MomMulti       | Type=Univariate/Multivariate| cumulants     |
+------------------------+-----------------------------+---------------+
|conv_Mom2Cum            | Mom2Cum,                    | Moments and   |
|conv_Mom2CumMulti       | Type=Univariate/Multivariate| cumulants     |
+------------------------+-----------------------------+---------------+
|conv_Stand_Multi        | MVStandardize               |               |
+------------------------+-----------------------------+---------------+
|distr_CFUSN_MomCum_Th   | MomCumCFUSN                 | Moments and   |
|                        |                             | cumulants     |
+------------------------+-----------------------------+---------------+
|distr_SkewNorm_MomCum_Th| MomCumSkewNorm              | Moments and   |
|                        |                             | cumulants     |
+------------------------+-----------------------------+---------------+
|distr_Uni_MomCum_Th     | MomCumUniS                  | Moments and   |
|                        |                             | cumulants     |
+------------------------+-----------------------------+---------------+
|distr_ZabsM_MomCum_Th   | MomCumZabs,                 | Moments and   |
|distr_Zabs_MomCum_Th    | Type=Univariate/Multivariate| cumulants     |
+------------------------+-----------------------------+---------------+
|distr_SkewNorm_EVSK_Th  | EVSKSkewNorm                | Moments and   |
|                        |                             | cumulants     |
+------------------------+-----------------------------+---------------+
|distr_Uni_EVSK_Th       | EVSKUniS                    | Moments and   |
|distr_UniAbs_EVSK_Th    | Type=Standard/Modulus       | cumulants     |
+------------------------+-----------------------------+---------------+
|distr_CFUSN_Rand        | rCFUSN                      | Random        |
|                        |                             | Generation    |
+------------------------+-----------------------------+---------------+
|distr_CFUSSD_Rand       | rCFUSSD                     | Random        |
|                        |                             | Generation    |
+------------------------+-----------------------------+---------------+
|distr_SkewNorm_Rand     | rSkewNorm                   | Random        |
|                        |                             | Generation    |
+------------------------+-----------------------------+---------------+
|distr_Uni_Rand          | rUniS                       | Random        |
|                        |                             | Generation    |
+------------------------+-----------------------------+---------------+
|Esti_Kurt_Mardia        | SampleKurt                  | Estimation    |
|Esti_Kurt_MRSz          | Type=Mardia/MRSz/Total      |               |
|Esti_Kurt_Total         |                             |               |
+------------------------+-----------------------------+---------------+
|Esti_Skew_Mardia        | SampleSkew                  | Estimation    |
|Esti_Skew_MRSz          | Type=Mardia/MRSz            |               |
+------------------------+-----------------------------+---------------+
|Esti_Kurt_Variance_Th   | VarianceKurt                | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_Skew_Variance_Th   | VarianceSkew                | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_EVSK               | SampleEVSK                  | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_Hermite_Poly_HN_M  | SampleHermiteN              | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_Gram_Charlier      | SampleGC                    | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_MMom_MCum          | SampleMomCum                | Estimation    |
+------------------------+-----------------------------+---------------+
|Esti_Variance_Skew_Kurt | SampleVarianceSkewKurt      |               |
+------------------------+-----------------------------+---------------+
|Hermite_Coeff           | HermiteCoeff                | Hermite       |
|Hermite_CoeffMulti      | Type=Univariate/Multivariate| Polynomials   |
+------------------------+-----------------------------+---------------+
|Hermite_Poly_HN         | HermiteN                    | Hermite       |
|Hermite_Poly_HN_Multi   | Type=Univariate/Multivariate| Polynomials   |
+------------------------+-----------------------------+---------------+
|Hermite_Poly_NH_Inv     | HermiteN2X                  | Hermite       |
|Hermite_Poly_NH_Multi_In| Type=Univariate/Multivariate| Polynomials   |
+------------------------+-----------------------------+---------------+
|Hermite_Nth             | Eliminated: use HermiteN    |               |
+------------------------+-----------------------------+---------------+
|Hermite_N_Cov_X1_X2     | HermiteCov12                | Hermite       |
|                        |                             | Polynomials   |
+------------------------+-----------------------------+---------------+
|indx_Commutator_Kmn     | CommutatorIndx              | Commutators   |
|indx_Commutator_Kperm   | Type=Kmn/Kperm/Mixing/Moment|               |
|indx_Commutator_Mixing  |                             |               |
|indx_Commutator_Moment  |                             |               |
+------------------------+-----------------------------+---------------+
|indx_Elimination        | EliminIndx                  | Commutators   |
+------------------------+-----------------------------+---------------+
|indx_Qplication         | QplicIndx                   | Commutators   |
+------------------------+-----------------------------+---------------+
|indx_Symmetry           | SymIndx                     | Commutators   |
+------------------------+-----------------------------+---------------+
|indx_UnivMomCum         | UnivMomCum                  | Commutators   |
+------------------------+-----------------------------+---------------+
|matr_Commutator_Kmn     | CommutatorMatr              | Commutators   |
|matr_Commutator_Kperm   | Type=Kmn/Kperm/Mixing/Moment|               |
|matr_Commutator_Mixing  |                             |               |
|matr_Commutator_Moment  |                             |               |
+------------------------+-----------------------------+---------------+
|matr_Elimination        | EliminMatr                  | Commutators   |
+------------------------+-----------------------------+---------------+
|matr_Qplication         | QplicMatr                   | Commutators   |
+------------------------+-----------------------------+---------------+
|matr_Symmetry           | SymMatr                     | Commutators   |
+------------------------+-----------------------------+---------------+
|Partition_2Perm         | Partitions                  | Partitions    |
|Partition_Diagrams      | Type=2Perm/Diagram/Indecomp |               |
|ClosedNoLoops           | /Pairs                      |               |
|Partition_Indecomposable|                             |               |
|Partition_Pairs         |                             |               |
+------------------------+-----------------------------+---------------+
|Permutation_Inverse     | PermutationInv              | Partitions    |
+------------------------+-----------------------------+---------------+
|Partition_Type_All      | PartitionTypeAll            | Partitions    |
+------------------------+-----------------------------+---------------+
## Set Partitions
`MultiStatM` provides several functions dealing with set partitions. Such functions provide some basic tools used to build the multivariate formulae for moments and cumulants in the following sections.
Generally a set of $N$ elements can be split into a set of disjoint subsets, i.e. it can be partitioned. The set of $N$ elements will correspond to the set $1 : N = \{1, 2, \dots ,N\}$. If ${\cal{K}} = \{b_1, b_2, \dots , b_r \}$ where each $b_j \subset 1 : N$, then ${\cal{K}}$ is a partition provided
$\cup b_j = 1 : N$, each $b_j$ is non-empty, and the blocks are pairwise disjoint, i.e. $b_j \cap b_i = \emptyset$ (the empty set)
whenever $j \neq i$. The subsets $b_j$, $j = 1, 2, \dots, r$ are called the blocks of $\cal{K}$. We
will call $r$ (the number of blocks in the partition $\cal{K}$) the size of $\cal{K}$ and denote it by $|{\cal{K}}| = r$; a partition with size $r$ will be denoted by ${\cal{K}}_{\{r\}}$. Let us denote the set
of all partitions of the numbers $1 : N$ by ${\cal{P}}_N$.
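The number of partitions of $1:N$ with a given size $r$ is the Stirling number of the second kind $S(N,r)$, which satisfies the recurrence $S(N,r)=r\,S(N-1,r)+S(N-1,r-1)$. The base R sketch below tabulates these counts; `stirling2` is a hypothetical helper written for this illustration and is not a `MultiStatM` function.

```r
# Hypothetical helper: Stirling numbers of the second kind S(N, r), r = 1..N,
# i.e. the number of partitions of 1:N into exactly r blocks.
stirling2 <- function(N) {
  S <- matrix(0, N, N)
  S[1, 1] <- 1
  if (N > 1) {
    for (n in 2:N) {
      for (r in 1:n) {
        # Element n either joins one of the r existing blocks
        # or forms a new block on its own
        S[n, r] <- r * S[n - 1, r] + (if (r > 1) S[n - 1, r - 1] else 0)
      }
    }
  }
  S[N, ]
}

counts4 <- stirling2(4)   # 1 7 6 1
bell4 <- sum(counts4)     # 15 partitions of 1:4 in total
```

Summing over all sizes gives the total number of elements of ${\cal{P}}_N$, the Bell number.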
Consider next a partition ${\mathcal{K}}_{\{r\}}=\{b_{1},b_{2},\dots,b_{r}\}\in {\mathcal{P}}_{N}$, with size $r$. Denote by $k_{j}$ the cardinality
of the block $b_{j}$ in the partition ${\mathcal{K}}_{\{r\}}$, i.e. $k_{j}=|b_{j}|$.
The type of a partition ${\mathcal{K}}_{\{r\}}$ is $l=[l_{1},\dots ,l_{N}]$
if ${\mathcal{K}}_{\{r\}}$ contains exactly $l_{j}$ blocks with cardinality $j$. The type $l$ always has length $N$. A partition with size $r$ and
type $l$ will be denoted by ${\mathcal{K}}_{\{r|l\}}$. It is clear that
$l_{j}\geq 0$, $\sum_{j}jl_{j}=N$ and $\sum_{j}l_{j}=r$. Naturally, some
$l_{j}$'s are zero. A block can be represented as a row vector of length $N$ with entries $0$ and $1$, where the positions of the $1$'s correspond to the elements of the block. A partition matrix collects the row vectors of its blocks; it is an $r\times N$ matrix whose column sums are all $1$.
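As a small worked example of these definitions, the base R sketch below builds the partition matrix of the partition $\{\{1,3\},\{2\},\{4\}\}$ of $1:4$ and reads off its size and type. The partition and the variable names are chosen for illustration only; no `MultiStatM` function is used.

```r
N <- 4
blocks <- list(c(1, 3), 2, 4)   # the partition {{1, 3}, {2}, {4}} of 1:4

# Partition matrix: one 0/1 row per block, r x N
P <- t(sapply(blocks, function(b) as.integer(seq_len(N) %in% b)))

r <- nrow(P)                # size of the partition
k <- rowSums(P)             # block cardinalities k_j
l <- tabulate(k, nbins = N) # type: l[j] = number of blocks with cardinality j
```

Here `P` has rows `1 0 1 0`, `0 1 0 0` and `0 0 0 1`, the size is $r=3$ and the type is $l=[2,1,0,0]$, so that $\sum_j j\,l_j = 4 = N$ and $\sum_j l_j = 3 = r$.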
The basic function is `PartitionTypeAll` which provides complete information on the partition of a set of `N` elements, namely:
- `S_N_r`: a vector with the number of partitions of size `r=1`, `r=2`, etc. (Stirling numbers of the second kind); `S_N_r[r]` denotes the number of partition matrices
of size `r`.
- `Part.class`: the list of all possible partitions given as partition matrices. This list is enumerated according to `S_N_r[r]`, $r=1,2,\ldots N$, such that the partition matrices with
size `R` are listed from $\sum_{r,
## References
Azzalini, A., Dalla Valle, A. (1996). The multivariate skew-normal
distribution. Biometrika, 83(4), 715--726.
Chacón, J. E., & Duong, T. (2015). Efficient recursive algorithms for
functionals based on higher order derivatives of the multivariate
Gaussian density. Statistics and Computing, 25(5), 959--974.
Di Nardo, E. and Guarino, G. (2022). kStatistics: Unbiased Estimators
for Cumulant Products and Faa Di Bruno's Formula. R package version
2.1.1.
Franceschini, C. and Loperfido, N. (2017a). MultiSkew: Measures, Tests
and Removes Multivariate Skewness. R package version 1.1.1.
Franceschini, C. and Loperfido, N. (2017b). MaxSkew: Orthogonal Data
Projections with Maximal Skewness. R package version 1.1.
Holmquist, B. (1996). The d-variate vector Hermite polynomial of order
k. Linear Algebra and Its Applications, 237/238, 155--190.
Jammalamadaka, S. R., Subba Rao, T. and Terdik, Gy. (2006). Higher order
cumulants of random vectors and applications to statistical inference
and time series. Sankhya A, 68, 326--356.
Jammalamadaka, S. R., Taufer, E. & Terdik, Gy. H. (2021a). On
multivariate skewness and kurtosis. Sankhya A, 83(2), 607--644.
Jammalamadaka, S. R., Taufer, E. & Terdik, Gy. H. (2021b). Asymptotic
theory for statistics based on cumulant vectors with applications.
Scandinavian Journal of Statistics, 48(2), 708--728.
Jammalamadaka, S. R., Taufer, E. & Terdik, Gy. (2021c). Cumulants of
multivariate symmetric and skew symmetric distributions. Symmetry, 13,
1383.
Kolda, T.G.; Bader, B.W. (2009). Tensor decompositions and applications.
SIAM Rev. 51, 455--500.
Kollo, T. (2008). Multivariate skewness and kurtosis measures with an
application in ICA. Journal of Multivariate Analysis 99(10), 2328--2338.
Komsta, L. and Novomestky F. (2022). moments: Moments, Cumulants,
Skewness, Kurtosis and Related Tests. R package version 0.14.1.
Novomestky, F. (2021). matrixcalc: Collection of Functions for Matrix
Calculations. R package version 1.0-5.
Qi, L. (2006). Rank and eigenvalues of a supersymmetric tensor, the
multivariate homogeneous polynomial and the algebraic hypersurface it
defines. J. Symb. Comput. 41, 1309--1327.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis
with applications. Biometrika, 57, 519--530.
McCullagh, P. (2018). Tensor methods in statistics. Chapman and
Hall/CRC.
Móri, T. F., Rohatgi, V. K., & Székely, G. J. (1994). On multivariate
skewness and kurtosis. Theory of Probability & Its Applications, 38(3),
547--551.
Ould-Baba, H.; Robin, V.; Antoni, J. (2015). Concise formulae for the
cumulant matrices of a random vector. Linear Algebra Appl. 485,
392--416.
Terdik, Gy. (2021). Multivariate statistical methods - going beyond the
linear. Springer.