Most of the previous articles have focused on the application of statistical methods, especially the notions of probability and distributions. However, many chemometricians also like to think about data as points in multidimensional space, and to truly understand multivariate concepts, we also need to appreciate this way of thinking. A point can be represented by a set of numbers, usually arranged in a column but sometimes in a row. Some examples are in Figure 1. These numbers can be anything and do not, for example, have to represent probabilities. For our purposes, they might represent the concentrations of different trace elements in an ancient obsidian arrowhead or the concentrations of ingredients in a pharmaceutical preparation. They can be visualized as points in multidimensional space, where the numbers are the coordinates in this space. When the numbers are arranged as a column, the number of rows is the dimensionality of the space. As an example, a pair of numbers can be used to represent a point in a two-dimensional space, that is, in a plane, as illustrated in Figure 2. Our human visual system is limited to perceiving a maximum of three dimensions, so if there are more than three numbers, the space is an imagined one.

Many people have come across the concept of a vector in physics and mechanics textbooks, where it is a quantity with both a direction and a length. Most statisticians and mathematicians define a vector differently and use it to represent a point. These are in fact two different but related definitions. A good article can be found online [1]. The statistical definition of a vector is also called a position vector, which can be represented by a line from the origin to the point of interest. A position vector is illustrated in Figure 2.
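As a small illustration of a point held as a column or a row of numbers, the following sketch uses NumPy. The two values are made up for illustration (they are not taken from the figures) and could stand for the concentrations of two trace elements in one sample.

```python
import numpy as np

# Hypothetical measurements for one sample (illustrative values only)
point = np.array([3.0, 4.0])

# The same point written as a column and as a row
column = point.reshape(-1, 1)  # 2 rows, 1 column
row = point.reshape(1, -1)     # 1 row, 2 columns

# For a column, the number of rows is the dimensionality of the space
dimensionality = column.shape[0]

# Viewed as a position vector, the point is joined to the origin by a line;
# the length of that line is the Euclidean norm
length = np.linalg.norm(point)

print(dimensionality, length)
```

The numbers are the same in both layouts; only the arrangement differs.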
In most of chemometrics, we use the term vector to refer to a position vector, and this is the definition we will use in the succeeding text. Unlike physicists, statisticians often use the concepts of points and vectors interchangeably. Linear algebra is a vast area, first formally described in a recognizably modern form by Hermann Grassmann in 1844 [2]. For enthusiasts, an excellent text by Jim Hefferon [3], which is available online, goes into significant detail. A major application of linear algebra is to vector operations. There are many ways of introducing the concept of linear independence: some are geometric, some involve simultaneous equations and some involve matrix algebra. Whichever route is taken, understanding whether vectors are linearly independent is crucial in chemometrics. Each vector might represent, for example, the concentrations of compounds in a series of (chemical) samples or the spectral peak heights in a number of spectra. Linear independence differs from the statistical concept of independence, and we will see how it relates to other common properties of vectors in the next article.
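One of the matrix algebra routes to linear independence can be sketched as follows: place the vectors side by side as the columns of a matrix and compare the rank of that matrix with the number of vectors. This is only a minimal illustration. The helper name `are_linearly_independent` and the example values are assumptions made here, not taken from the article.

```python
import numpy as np

def are_linearly_independent(vectors):
    """Return True if the given 1-D vectors are linearly independent.

    Hypothetical helper: stacks the vectors as columns and checks whether
    the numerical rank of the matrix equals the number of vectors.
    """
    matrix = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(matrix) == len(vectors))

# Illustrative vectors, e.g. peak heights at three wavelengths
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # exactly 2 * a, so dependent on a
c = np.array([0.0, 1.0, 0.0])  # not a multiple of a

print(are_linearly_independent([a, b]))
print(are_linearly_independent([a, c]))
```

With real, noisy measurements the rank is decided up to a numerical tolerance, so near-dependence may deserve a closer look than a simple yes or no.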