In case of compositional data these computations need to be done in orthonormal coordinates, preferably either in balances or pivot coordinate systems.
The latter are closely related to clr coefficients, which are historically preferred for principal component analysis. When the effect of outliers in any given orthonormal coordinate representation of the compositions needs to be suppressed, a robust covariance estimate can be used to obtain robust loadings and scores. Loadings and scores of the first two principal components are often visualized together in a planar graph called a biplot, which has a specific interpretation in the case of clr coefficients.
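As a minimal sketch of the classical (non-robust) case, principal components of compositions can be obtained from a singular value decomposition of column-centered clr coefficients. The function names and the simulated data are illustrative assumptions, not taken from the text; a robust variant would replace the classical centering and covariance with robust estimates computed in orthonormal coordinates.

```python
import numpy as np

def clr(X):
    """Centered logratio coefficients of the rows of X (all parts strictly positive)."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def clr_pca(X):
    """Classical PCA of compositions via SVD of column-centered clr coefficients."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)                 # center each clr coefficient
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U * s                          # coordinates of the observations
    loadings = Vt.T                         # one loading vector per column
    explained = s**2 / np.sum(s**2)         # share of variance per component
    return scores, loadings, explained

# Hypothetical data: 50 observations of a 4-part composition (simulated).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(50, 4))
scores, loadings, explained = clr_pca(X)
```

The first two columns of `scores` and `loadings` are the quantities a biplot would display. Because clr coefficients sum to zero in each row, the last component carries no variance.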
The goal of correlation analysis is to quantify the strength of the relationship between a pair of variables or between groups of variables.
In the case of compositional data, it can be particularly misleading to compute correlation coefficients for the original data: because compositions are scale invariant, any correlation value could be obtained, depending on which representation of the compositional data is chosen within the respective equivalence classes. A proper coordinate representation of compositions is therefore again essential. Balance coordinates are a natural default; for them, either the standard Pearson correlation coefficient, as a measure of the strength of the linear association between two balances, or robust correlations can be computed.
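The effect can be illustrated with a small sketch, assuming simulated four-part data and two balances chosen purely for illustration. Rescaling each observation by an arbitrary positive factor leaves the composition unchanged, yet it changes the correlation between raw parts, while the correlation between balances stays the same.

```python
import math
import random

def balance(x, num, den):
    """Balance between two groups of parts, given as lists of indices."""
    r, s = len(num), len(den)
    log_gm_num = sum(math.log(x[i]) for i in num) / r
    log_gm_den = sum(math.log(x[i]) for i in den) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

def pearson(u, v):
    """Standard Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

random.seed(1)
# Hypothetical data: 100 observations with 4 positive parts (simulated).
data = [[random.lognormvariate(0.0, 1.0) for _ in range(4)] for _ in range(100)]
# The same compositions, each row multiplied by an arbitrary positive factor.
factors = [random.uniform(0.1, 10.0) for _ in data]
rescaled = [[c * p for p in row] for row, c in zip(data, factors)]

def balance_correlation(rows):
    b1 = [balance(x, [0, 1], [2, 3]) for x in rows]   # parts 1,2 against parts 3,4
    b2 = [balance(x, [0], [1]) for x in rows]         # part 1 against part 2
    return pearson(b1, b2)

def raw_correlation(rows):
    return pearson([x[0] for x in rows], [x[1] for x in rows])

r_bal, r_bal_rescaled = balance_correlation(data), balance_correlation(rescaled)
r_raw, r_raw_rescaled = raw_correlation(data), raw_correlation(rescaled)
```

Here `r_bal` and `r_bal_rescaled` agree up to rounding error, whereas `r_raw` and `r_raw_rescaled` differ.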
If an interpretation of the correlations in terms of a pair of parts is required, symmetric pivot coordinates, which capture the dominance of these parts within the given composition, are recommended. Correlation analysis between two coordinates can also be extended to correlations between one coordinate and a set of coordinates, or to group correlations summarizing the relationships between balance representations of groups of compositional parts.
In discriminant analysis it is assumed that the so-called training data belong to known groups. The goal is to find classification rules that allow new test data to be assigned to one of the groups. Both linear and quadratic discriminant analysis (LDA and QDA) use information on prior class probabilities and rely heavily on the assumption of normality in ilr coordinates (a normal distribution on the simplex) to represent the group distributions. While QDA assumes individual group covariance matrices, LDA uses a joint covariance matrix. These methods yield classification rules that assign a new test observation to one of the groups while taking the prior information on class membership into account.
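The LDA rule described above can be sketched as follows, assuming pivot coordinates as the ilr representation, priors estimated from group frequencies, and simulated two-group data. All function names are illustrative and not from the text, and the estimates are classical rather than robust.

```python
import numpy as np

def pivot_coordinates(X):
    """Pivot (ilr) coordinates of the rows of X; all parts must be positive."""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    logX = np.log(X)
    Z = np.empty((X.shape[0], D - 1))
    for i in range(D - 1):
        k = D - i - 1                        # number of parts after part i
        Z[:, i] = np.sqrt(k / (k + 1)) * (logX[:, i] - logX[:, i + 1:].mean(axis=1))
    return Z

def lda_fit(Z, y):
    """Group means, priors and the inverse of the joint (pooled) covariance."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    S = sum((Z[y == c] - m).T @ (Z[y == c] - m) for c, m in zip(classes, means))
    S = S / (Z.shape[0] - len(classes))
    return classes, means, priors, np.linalg.inv(S)

def lda_predict(Z, model):
    """Assign each row to the group with the largest linear discriminant score."""
    classes, means, priors, S_inv = model
    quad = 0.5 * np.sum(means @ S_inv * means, axis=1)
    scores = Z @ S_inv @ means.T - quad + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical training data: two groups of 3-part compositions (simulated).
rng = np.random.default_rng(0)
g0 = np.exp(rng.normal([0.0, 0.0, 0.0], 0.3, size=(40, 3)))
g1 = np.exp(rng.normal([1.5, 0.0, -1.5], 0.3, size=(40, 3)))
X = np.vstack([g0, g1])
y = np.repeat([0, 1], 40)

model = lda_fit(pivot_coordinates(X), y)
pred = lda_predict(pivot_coordinates(X), model)
```

Replacing the pooled covariance with one covariance matrix per group, and adding the corresponding log-determinant term to the score, would give the QDA rule.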
The Fisher discriminant rule pursues the same goal, but without assuming any underlying distribution of the samples in the groups. The idea is to search for projection directions that maximize the separation of the group means relative to the spread of the projected data. As a consequence, discriminant scores can also be derived and used to visualize the information relevant for group separation.
All described procedures are invariant with respect to the choice of the orthonormal coordinates, and this also holds for the robust counterparts of the covariance-based methods if an affine equivariant location and covariance estimator, such as the MCD estimator, is used.
Regression analysis is used to model the relationship between a response variable and one or more explanatory variables (covariates). In the compositional case, the proper choice of logratio coordinates matters, both for the interpretation of the regression parameters and because of the properties of the regression models.
Again, orthonormal coordinates, particularly in their pivot version, are preferable. Moreover, in the case of regression with a compositional response and real covariates, ilr coordinates make it possible to decompose the multivariate regression model into separate multiple regressions. The coordinate representation of compositions is also essential for statistical inference such as hypothesis testing, which is frequently of interest in the regression context. This chapter covers all basic regression cases: the regression with compositional response and real covariates just mentioned, the case of a real response and compositional explanatory variables, regression between two compositions, and finally regression between the parts within one composition.
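The decomposition into separate regressions can be checked numerically. The sketch below assumes ordinary least squares, pivot coordinates as the ilr representation, and simulated data with a single real covariate `t`; all names are illustrative. It fits the multivariate model once and then fits each coordinate separately.

```python
import numpy as np

def pivot_coordinates(X):
    """Pivot (ilr) coordinates of the rows of X; all parts must be positive."""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    logX = np.log(X)
    Z = np.empty((X.shape[0], D - 1))
    for i in range(D - 1):
        k = D - i - 1                        # number of parts after part i
        Z[:, i] = np.sqrt(k / (k + 1)) * (logX[:, i] - logX[:, i + 1:].mean(axis=1))
    return Z

# Hypothetical data: a 3-part compositional response driven by one covariate.
rng = np.random.default_rng(0)
n = 60
t = rng.uniform(0.0, 1.0, size=n)
log_parts = np.column_stack([1.0 + 2.0 * t, 0.5 * t, np.zeros(n)])
log_parts += rng.normal(scale=0.05, size=(n, 3))      # simulated noise
Y = np.exp(log_parts)
Y = Y / Y.sum(axis=1, keepdims=True)                  # rows closed to sum 1

Z = pivot_coordinates(Y)                              # response in coordinates
A = np.column_stack([np.ones(n), t])                  # intercept and covariate

# Multivariate least squares fit for all coordinates at once ...
B_joint, *_ = np.linalg.lstsq(A, Z, rcond=None)
# ... and one multiple regression per coordinate.
B_separate = np.column_stack(
    [np.linalg.lstsq(A, Z[:, j], rcond=None)[0] for j in range(Z.shape[1])]
)
```

Both fits give the same coefficient matrix, so each coordinate can be modeled and tested with the usual tools for univariate responses.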
A further important task is considered: the selection of relevant covariates by forward and backward selection. Robustness issues are also of particular importance in the regression context, since outliers in the response or in the covariates have only a limited effect on robust regression estimates.
With increasing dimensionality of compositional data, much more care needs to be devoted to a reasonable coordinate representation and to the selection of methods for statistical processing.
In principle, all methods that are popular in the context of high-dimensional data, like principal component analysis and partial least squares regression, can also be used for compositional data with far more parts than observations. On the other hand, while pivot coordinates are still useful in terms of interpretation also in the high-dimensional context, this is not so clear for other types of balances: defining an interpretable sequential binary partition for compositions with hundreds or thousands of parts, where many of them may just be related to noise, is nearly impossible.
Accordingly, it is meaningful here to consider even the elemental information contained in pairwise logratios in order to build a suitable method for marker identification or for the detection of cell-wise outliers. The latter can be used to reveal which observations deviate from the majority, in order to identify possible measurement errors or other artifacts. Moreover, these methods may make it possible to identify parts or groups of parts that behave differently in all or in subsets of the observations.
Contingency and probability tables are well described in the literature, but the compositional nature of such tables is often not considered.
We discuss compositional tables, a generalization of contingency tables that also allows continuous values in the cells under the requirement of scale invariance. Compositional tables carry relative information about the relationships within and between the row and column categories of the variables (factors). Assuming the Aitchison geometry makes it possible to decompose a compositional table orthogonally into independent and interactive parts. The independence table is formed by a product of row and column geometric marginals and can be considered a relevant alternative to the independence case in a probability table.
Consequently, the interaction table captures relative information about the relationships between factors.
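A minimal sketch of this decomposition, assuming a small table with strictly positive cells and closure of each table to a total of 1: the independence table is built from the geometric means of rows and columns, and the interaction table is what remains after dividing it out cell by cell. The function names and the example table are illustrative assumptions.

```python
import numpy as np

def closure(T):
    """Rescale a table so that all cells sum to 1."""
    return T / T.sum()

def decompose_table(T):
    """Split a positive table into independence and interaction tables."""
    T = np.asarray(T, dtype=float)
    logT = np.log(T)
    row_log_gm = logT.mean(axis=1, keepdims=True)   # log geometric mean per row
    col_log_gm = logT.mean(axis=0, keepdims=True)   # log geometric mean per column
    independent = closure(np.exp(row_log_gm + col_log_gm))
    interaction = closure(T / independent)          # remainder, cell by cell
    return independent, interaction

def clr_table(T):
    """clr coefficients of a table treated as one composition of all cells."""
    logT = np.log(T)
    return logT - logT.mean()

# Hypothetical 2 x 3 compositional table.
T = np.array([[10.0, 20.0, 5.0],
              [4.0, 1.0, 8.0]])
independent, interaction = decompose_table(T)

# Aitchison inner product of the two tables, computed from clr coefficients.
inner = np.sum(clr_table(independent) * clr_table(interaction))
```

The value of `inner` is zero up to rounding error, which reflects the orthogonality of the two tables, and multiplying them cell by cell and closing the result recovers the original table.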
It turns out that sequential binary partitioning is in general not appropriate for a coordinate representation of compositional tables, as it does not respect their two-factor nature. The general case of compositional tables reveals that balance coordinates are recommended only for the representation of the independence table.
The following vector space structure, called the Aitchison geometry or the Aitchison simplex, is defined by three operations: perturbation, powering, and an inner product. These operations alone suffice to show that the Aitchison simplex forms a Euclidean vector space. Since the Aitchison simplex forms a finite-dimensional Hilbert space, it is possible to construct orthonormal bases in the simplex.
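For reference, the operations of the Aitchison geometry are usually written as below. The notation is an assumption, since the formulas are not present in the text: $x, y$ are $D$-part compositions, $\alpha$ is a real number, and $\mathcal{C}[\cdot]$ denotes closure, that is, division of every part by the sum of all parts.

```latex
\begin{align}
  x \oplus y &= \mathcal{C}\left[x_1 y_1, \dots, x_D y_D\right]
    && \text{(perturbation)} \\
  \alpha \odot x &= \mathcal{C}\left[x_1^{\alpha}, \dots, x_D^{\alpha}\right]
    && \text{(powering)} \\
  \langle x, y \rangle_a &= \frac{1}{2D} \sum_{i=1}^{D} \sum_{j=1}^{D}
    \ln\frac{x_i}{x_j} \, \ln\frac{y_i}{y_j}
    && \text{(inner product)}
\end{align}
```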
Every composition can be decomposed as follows.
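In the usual notation, which is assumed here because the formula is not present in the text, the decomposition with respect to an orthonormal basis $e_1, \dots, e_{D-1}$ of the simplex reads as follows, where $\oplus$ is perturbation, $\odot$ is powering, and $\langle \cdot, \cdot \rangle_a$ is the Aitchison inner product.

```latex
\begin{equation}
  x = \bigoplus_{i=1}^{D-1} x_i^{*} \odot e_i ,
  \qquad
  x_i^{*} = \langle x, e_i \rangle_a .
\end{equation}
```

The real numbers $x_i^{*}$ are the orthonormal (ilr) coordinates of $x$ with respect to the chosen basis.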
Because compositional data obey specific geometrical properties, the use of standard statistical methods for their analysis inevitably leads to biased results.
There are three well-characterized isomorphisms that transform from the Aitchison simplex to real space. All of these transforms are linear, as described below. The additive logratio (alr) transform is given by alr(x) = [log(x_1/x_D), ..., log(x_{D-1}/x_D)]. The choice of denominator component is arbitrary, and could be any specified component. This transform is commonly used in chemistry with measurements such as pH.
In addition, this is the transform most commonly used for multinomial logistic regression. The alr transform is not an isometry, meaning that distances on transformed values will not be equivalent to distances on the original compositions in the simplex.
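The lack of isometry can be seen in a short sketch. It assumes the usual definition of the alr transform (the log of each part divided by a chosen denominator part) and measures distances in the simplex by the Aitchison distance, computed as the Euclidean distance between clr coefficients. The two example compositions are arbitrary.

```python
import math

def alr(x, denom=-1):
    """Additive logratio transform with the part at index `denom` as denominator."""
    idx = denom % len(x)
    return [math.log(p / x[idx]) for i, p in enumerate(x) if i != idx]

def clr(x):
    """Centered logratio coefficients of a single composition."""
    logs = [math.log(p) for p in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Distance in the simplex: Euclidean distance between clr coefficients."""
    return math.dist(clr(x), clr(y))

# Two arbitrary 3-part compositions.
x = [0.2, 0.3, 0.5]
y = [0.6, 0.3, 0.1]

d_simplex = aitchison_distance(x, y)      # distance in the Aitchison geometry
d_alr = math.dist(alr(x), alr(y))         # Euclidean distance after alr
```

The two distances differ for these compositions, and `d_alr` would change again if a different denominator part were chosen, whereas `d_simplex` does not depend on any such choice.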