Measure of Location, Spread and Correlation for Multivariate Data
Much of the information contained in a data set can be assessed by calculating certain summary numbers, known as descriptive statistics. Examples are the arithmetic mean (a measure of location) and the average of the squared distances of the observations from the mean (a measure of spread or variation).
We shall rely most heavily on descriptive statistics that measure location, variation, and linear association.
Measure of Location
The arithmetic average of the n measurements $(x_{11}, x_{21}, \cdots ,x_{n1})$ on the first variable (defined in Multivariate Analysis: An Introduction) is
Sample Mean = $\bar{x}_{1}=\frac{1}{n} \sum _{j=1}^{n}x_{j1}$, where the index $j =1, 2,\cdots , n$ runs over the measurements.
The sample mean of the $n$ measurements on each of the p variables (there will be p sample means) is
$\bar{x}_{k} =\frac{1}{n} \sum _{j=1}^{n}x_{jk} \mbox{ where } k = 1, 2, \cdots , p$
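The formula for $\bar{x}_{k}$ can be sketched in a few lines of Python. The data values and the function name `sample_means` below are illustrative assumptions, not part of the original text; rows are the n items and columns are the p variables.

```python
# Hypothetical data: n = 4 items (rows), p = 2 variables (columns).
data = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
]


def sample_means(rows):
    """Return the p sample means, one per variable (column)."""
    n = len(rows)          # number of measurements
    p = len(rows[0])       # number of variables
    return [sum(row[k] for row in rows) / n for k in range(p)]


means = sample_means(data)
print(means)  # [50.0, 4.0]
```

Each entry of `means` corresponds to one $\bar{x}_{k}$, so a data set with p variables yields a list of length p.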
Measure of Spread
The measure of spread (sample variance) for the n measurements on the first variable is
$s_{1}^{2} =\frac{1}{n} \sum _{j=1}^{n}(x_{j1} -\bar{x}_{1} )^{2} $ where $\bar{x}_{1} $ is the sample mean of the $x_{j1}$’s.
The sample variance for the n measurements on each of the p variables is
$s_{k}^{2} =\frac{1}{n} \sum _{j=1}^{n}(x_{jk} -\bar{x}_{k} )^{2} \mbox{ where } k=1,2,\cdots ,p$ and the sum runs over the measurements $j=1,2,\cdots ,n$.
The sample variance $s_{k}^{2}$ is also written $s_{kk}$, i.e.
$s_{k}^{2} =s_{kk} =\frac{1}{n} \sum _{j=1}^{n}(x_{jk} -\bar{x}_{k} )^{2} \mbox{ where } k=1,2,\cdots ,p$
The square root of the sample variance, $\sqrt{s_{kk}}$, is the sample standard deviation.
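As a minimal sketch of the variance and standard deviation formulas above, the following Python computes $s_{kk}$ and $\sqrt{s_{kk}}$ for every variable. The data and the function name `sample_variances` are assumptions made for illustration. The divisor is n, to match the formulas in this section; many software packages divide by n - 1 by default.

```python
import math

# Hypothetical data: n = 4 items (rows), p = 2 variables (columns).
data = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
]


def sample_variances(rows):
    """Return the p sample variances s_kk, using divisor n."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(p)]
    # Average squared deviation from the mean, one value per variable.
    return [
        sum((row[k] - means[k]) ** 2 for row in rows) / n
        for k in range(p)
    ]


variances = sample_variances(data)
std_devs = [math.sqrt(v) for v in variances]
print(variances)  # [34.0, 0.5]
```

The standard deviations in `std_devs` are on the same scale as the original measurements, which is why they are often reported alongside the means.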
Sample Covariance
Consider n pairs of measurements on Variable 1 and Variable 2:
\[\left[\begin{array}{c} {x_{11} } \\ {x_{12} } \end{array}\right],\left[\begin{array}{c} {x_{21} } \\ {x_{22} } \end{array}\right],\cdots ,\left[\begin{array}{c} {x_{n1} } \\ {x_{n2} } \end{array}\right]\]
That is, $x_{j1}$ and $x_{j2}$ are observed on the jth experimental item $(j=1,2,\cdots ,n)$. A measure of linear association between the measurements of $V_1$ and $V_2$ is provided by the sample covariance
\[s_{12} =\frac{1}{n} \sum _{j=1}^{n}(x_{j1} -\bar{x}_{1} )(x_{j2} -\bar{x}_{2} )\]
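(the average of the products of the deviations from their respective means). In general, for the ith and kth variables,
$s_{ik} =\frac{1}{n} \sum _{j=1}^{n}(x_{ji} -\bar{x}_{i} )(x_{jk} -\bar{x}_{k} )$, where $i=1,2,\cdots ,p$ and $k=1,2,\cdots ,p$.
It measures the linear association between the ith and kth variables. When $i=k$ the sample covariance reduces to the sample variance $s_{kk}$, and $s_{ik}=s_{ki}$ for all $i$ and $k$.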
Variance is the most commonly used measure of dispersion (variation) in the data: the larger the variance, the greater the amount of variation, and in that sense information, available in the data.
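The sample covariances $s_{ik}$ for all pairs of variables can be collected in a p by p table. The sketch below assumes illustrative data and a hypothetical helper named `sample_covariance_matrix`; it uses divisor n, as in the formulas above.

```python
# Hypothetical data: n = 4 items (rows), p = 2 variables (columns).
data = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
]


def sample_covariance_matrix(rows):
    """Return the p x p nested list of sample covariances s_ik (divisor n)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(p)]
    # Entry [i][k] is the average product of deviations for variables i and k.
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in rows) / n
            for k in range(p)
        ]
        for i in range(p)
    ]


cov = sample_covariance_matrix(data)
print(cov)  # [[34.0, -1.5], [-1.5, 0.5]]
```

The diagonal entries are the sample variances $s_{kk}$, and the off-diagonal entries are equal because $s_{ik}=s_{ki}$. The negative value of $s_{12}$ here reflects that, in this made-up data, large values of the first variable tend to occur with small values of the second.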
Sample Correlation Coefficient
The sample correlation coefficient for the ith and kth variable is
\[r_{ik} =\frac{s_{ik} }{\sqrt{s_{ii} } \sqrt{s_{kk} } } =\frac{\sum _{j=1}^{n}(x_{ji} -\bar{x}_{i} )(x_{jk} -\bar{x}_{k} ) }{\sqrt{\sum _{j=1}^{n}(x_{ji} -\bar{x}_{i} )^{2} } \sqrt{\sum _{j=1}^{n}(x_{jk} -\bar{x}_{k} )^{2} } } \]
$\mbox{ where } i=1,2,\cdots ,p \mbox{ and } k=1,2,\cdots ,p$
Note that $r_{ik} =r_{ki} $ for all $i$ and $k$, and $r$ lies between -1 and +1. $r$ measures the strength of the linear association: $r=0$ indicates a lack of linear association between the components, and the sign of $r$ indicates the direction of the association.
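A minimal Python sketch of the correlation formula above follows. The data and the function name `sample_correlation` are assumptions for illustration only. Because the factor 1/n cancels between numerator and denominator, the code works directly with sums of deviations.

```python
import math

# Hypothetical data: n = 4 items (rows), p = 2 variables (columns).
data = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
]


def sample_correlation(rows, i, k):
    """Return r_ik for variables i and k (0-based column indices)."""
    n = len(rows)
    mean_i = sum(row[i] for row in rows) / n
    mean_k = sum(row[k] for row in rows) / n
    # Sums of cross products and squared deviations; the 1/n factors cancel.
    s_ik = sum((row[i] - mean_i) * (row[k] - mean_k) for row in rows)
    s_ii = sum((row[i] - mean_i) ** 2 for row in rows)
    s_kk = sum((row[k] - mean_k) ** 2 for row in rows)
    return s_ik / (math.sqrt(s_ii) * math.sqrt(s_kk))


r12 = sample_correlation(data, 0, 1)
print(round(r12, 4))  # -0.3638
```

The result is symmetric in its two indices and is unchanged if either variable is rescaled by a positive constant, which makes $r$ easier to compare across pairs of variables than the covariance. The sketch assumes neither variable is constant, since a zero variance would make the denominator zero.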