Friday, December 26, 2008

Analyse the Correlation Data

Introduction

Remember that correlation research looks for associations between variables. It asks questions such as "Are A and B related?". Your analysis will describe the strength of that relationship, and whether you could expect a relationship as strong as the one you have found if there were no relationship between the variables.

Correlation does not show causation, although it is sometimes used as evidence for causation. In that case it is one part of a logical argument that combines related experimental data (for example, from animal studies) with basic principles to build a strong theoretical case for causation. Smoking causing lung cancer is a classic example. Experimental studies on humans were impossible (because it would be unethical to knowingly give someone cancer), so the argument had to be built from correlation studies (showing a consistent pattern across a large number of studies), animal studies in which cancer was induced more often in animals exposed to smoke than in unexposed animals, and basic chemical studies and principles of pathophysiology.

Correlation analysis of metric data

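Metric, ordinal and nominal data can all be analysed for correlation. However, as the process is similar, only the analysis of metric data is presented here. Note also that only linear relationships are considered: the correlation coefficient measures how closely the data follow a straight line, so a strong but curved relationship can still produce a weak correlation.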

Relationship between two variables

Sometimes there are only two variables of interest. In this case think of one variable as "x" and the other as "y". The first step in the analysis is to look at the descriptive data analysis you have already performed, comparing the means and standard deviations of the two variables.
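As a minimal sketch of this first step, Python's standard `statistics` module can produce the means and standard deviations to compare. The paired values and the helper name `describe` are made up purely for illustration; the standard deviation shown is the sample version (dividing by n - 1).

```python
import statistics

# Hypothetical paired measurements for the same five subjects.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

def describe(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

x_mean, x_sd = describe(x)
y_mean, y_sd = describe(y)
print(f"x: mean = {x_mean:.2f}, SD = {x_sd:.2f}")
print(f"y: mean = {y_mean:.2f}, SD = {y_sd:.2f}")
```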

Plotting an x,y scattergram gives a clear graphic picture of whether there appears to be a relationship between the variables. Next, run a correlation analysis of the two data sets and view the results in either scattergram or table form.

Count ($n$) | Covariance | Correlation ($r$) | R-squared ($r^2$)
(number) | (number) | (number) | (number)
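The four numbers in such a results table can be computed directly. The sketch below is one way to do it in plain Python, assuming sample (n - 1) formulas; the function name `correlation_summary` and the data are hypothetical.

```python
import math

def correlation_summary(x, y):
    """Return count, sample covariance, Pearson r and R-squared for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance: summed cross-products of deviations divided by n - 1.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations with the same n - 1 denominator.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    r = cov / (sd_x * sd_y)  # Pearson correlation: covariance rescaled to -1..1
    return {"count": n, "covariance": cov, "correlation": r, "r_squared": r * r}

# Hypothetical paired data for five subjects.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
summary = correlation_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

On Python 3.10 or later, `statistics.covariance(x, y)` and `statistics.correlation(x, y)` compute the same covariance and Pearson correlation without a hand-written helper.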

Finally, calculate whether you could expect a relationship as strong as the one in the sample data if there were no relationship between the two variables in the population. Even a strong correlation in a sample can occur by chance when no relationship exists in the population. Use the following simple formula for this:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

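This involves calculating a test statistic and comparing it with a reference curve to establish statistical significance. Strictly, $t$ follows a t distribution with $n - 2$ degrees of freedom, which approaches the standard normal curve (mean 0, SD 1) as the sample grows. The test statistic becomes larger as the sample becomes larger and as the correlation becomes stronger. With more subjects, a correlation of a given size is less likely to have arisen by chance.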
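A minimal sketch of this calculation follows. It computes $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and then, as a simplifying assumption, reads a two-sided p-value off the standard normal curve instead of the exact t distribution, so the p-value is only a reasonable approximation for fairly large samples. The function name `correlation_t_test` and the example values are hypothetical.

```python
import math

def correlation_t_test(r, n):
    """Return t = r*sqrt(n - 2)/sqrt(1 - r**2) and an approximate two-sided p-value.

    The p-value treats t as if it came from the standard normal curve
    (mean 0, SD 1). Strictly, t follows a t distribution with n - 2 degrees
    of freedom, so this approximation is only reasonable for large samples.
    """
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    # Two-sided normal tail area: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_approx

# The same correlation, r = 0.3, in a small and a larger hypothetical sample.
t_small, p_small = correlation_t_test(0.3, 20)
t_large, p_large = correlation_t_test(0.3, 100)
print(f"n = 20:  t = {t_small:.2f}, approximate p = {p_small:.3f}")
print(f"n = 100: t = {t_large:.2f}, approximate p = {p_large:.4f}")
```

The same correlation yields a much larger $t$ (and a smaller p-value) in the larger sample, which is the effect of sample size described above. If SciPy is available, `scipy.stats.pearsonr(x, y)` returns $r$ together with a p-value that does not rely on this large-sample approximation.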

Relationship between more than two variables

Often there is interest in the relationships among a larger number of variables. To do this you follow the same path of analysis as for two variables, except that you think of the variables as x1, x2, x3, ... Note that with more than two variables there is more than one relationship: with $k$ variables there are $(k^2 - k)/2$ relationships. You use a correlation matrix to analyse, and report, the strength of the various relationships. After looking at the strength of the relationships, you also need to look at the chance of getting a relationship that strong if there were no relationship in the world. Also remember that if you look at the relationships among seven or more variables (21 or more relationships), you can expect about one of them to reach statistical significance (using an alpha of 0.05) even if none of the relationships exists in the real world.
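The sketch below builds a correlation matrix for several variables and shows the arithmetic behind the warning about chance findings: with $k$ variables there are $(k^2 - k)/2$ pairs, and if no real relationships exist, testing each pair at alpha = 0.05 gives on average pairs × 0.05 "significant" results. The helper names and the data are hypothetical, and the expected count is an average, not a guarantee.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sxy = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sxx = sum((p - ma) ** 2 for p in a)
    syy = sum((q - mb) ** 2 for q in b)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a k x k matrix of pairwise correlations, with 1.0 on the diagonal."""
    k = len(columns)
    matrix = [[1.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        r = pearson_r(columns[i], columns[j])
        matrix[i][j] = matrix[j][i] = r  # the matrix is symmetric
    return matrix

def number_of_relationships(k):
    """Distinct pairs among k variables: (k**2 - k) / 2."""
    return (k * k - k) // 2

# Hypothetical data: three variables measured on the same five subjects.
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]
x3 = [10.0, 8.0, 6.0, 4.0, 2.0]
matrix = correlation_matrix([x1, x2, x3])
for row in matrix:
    print("  ".join(f"{value:6.2f}" for value in row))

# Seven variables tested at alpha = 0.05 when no real relationships exist.
alpha = 0.05
pairs = number_of_relationships(7)
expected_false_positives = pairs * alpha
print(f"{pairs} relationships, about {expected_false_positives:.2f} significant by chance")
```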

Advanced Analysis

If variables are strongly related, it is common to do further analysis to describe the relationship more fully. This is done using the coefficient of determination and a regression equation. Remember from the discussion on choosing a method that the coefficient of determination is the square of the correlation, $r^2$. We also discussed that this value represents the proportion of the variation in one variable that can be predicted from the other variable. Thus two variables that are moderately related, with a correlation of, say, $r = 0.7$, would have a coefficient of determination of $r^2 = 0.49$. That is, if the value of one variable is known, it can predict nearly half of the variability in the other variable. This is often useful: in the health professions, knowing about relationships and making clinical decisions based on them is something a clinician does informally every day.
