May 24, 2004
A Tool, A Puzzle, A Diversion - Part 1
M.J. McKeown, MD, FACOG, FACS
In childhood we learn lessons from our experiences and use them to guide our lives. During childhood and youth we also receive guidance from our elders. Much of this advice comes from adults, and at first we all wonder whether "no" is our name. As we get older we learn, from personal experience and from our peers, that the no we heard as we were about to do something could have had motives other than caring for and teaching us. We learn that advice can have more complex motives, and that some of it is designed only to help the one giving it. To be fair, however, some of the advice was aimed at our best interests, but the reasons behind it were too complicated and technical for us to understand at the time. It is the meaning behind the advice and information we receive that we seek, and it is these meanings that we learn to apply to the decisions of our own lives. As we grow and learn, we gain the ability to evaluate more complex advice and information. The thread that runs through all of it is: what does it mean, and how do I use the information to guide my life?
The adult world has developed rules for evaluating, and finding the meaning and truth in, the immense amounts of data that must be analyzed to guide individuals, groups of individuals, states, nations and the world as a whole. These rules are labeled Statistical Analysis. This system has grown into more and more complex mathematics, with increasingly complex rules to guide its use. Whole fields of mathematics have been developed to answer the question we ask from childhood until death: why? However, the answer to that question needs to be trusted for us to use it.
This requires us to find a trusted meaning behind the answers. At times there seems to be more than one true and correct answer. This is where Statistical Analysis becomes necessary, real and trusted. It means that the analysis system and its rules must produce an answer that is as near to absolute truth as possible, and one that we can trust to guide us in our life decisions.
- It all comes down to the meaning and truth behind the advice we have sought since hearing those early Nos. Lewis Carroll perhaps gave us the necessary dialogue in the exchange between Alice and Humpty Dumpty in Through the Looking-Glass.
- "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean – neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master – that's all."
- Lewis Carroll was advising us to look behind information: to see where it came from, how and by whom it was produced, and whether it was manipulated to produce a desired result.
Statistical analysis is an immense subject. It includes a great many specialized mathematical procedures and a large vocabulary of specialized terms. When one reads an article whose authors have used statistical mathematics, the authors will usually mention the testing procedures they used. Terms such as mean, mode, analysis of variance, chi-square test, Student's t-test, etc., may be liberally sprinkled through the methods and analysis sections of a research paper. This discussion will not analyze the reasons for or against the use of any particular statistical test. It will take a broader view and develop an approach to the data and its analysis that allows a quick evaluation of a paper's statistical mathematics. The discussion will be oriented to biologic and biomedical investigations and to how to determine the usefulness and possible risks of a study's findings in a specific clinical situation.
There will be informational and instructional appendices after the main discussion. These will include an excellent series of links from the British Medical Journal that discuss many of the most important principles of statistics in a medical context. They will also include a section that briefly outlines many of the major concepts in basic statistics. The appendices will end with abstracts from several actual published studies. These abstracts will be highlighted and commented on as examples of studies one might evaluate when researching a medical question.
The wisdom of ancient philosophers is applicable today. Marcus Aurelius advised looking first to the purpose in a process. This wisdom, joined with the remark of Humpty Dumpty, "which is to be master – that's all," becomes the first tool for evaluating a scientific study:
- what is the stated purpose of the study?
- what might the purpose of those doing the study be?
Studies announce their purpose in the early paragraphs. The general format of most scientific studies is: Introduction and Purpose, Materials and Methods, Analysis and Discussion, and Conclusions and Recommendations.
The purpose of a study is what one needs to evaluate first, to see whether its conclusions and results may be applicable to one's personal problem. If the stated purpose of the study is to investigate the medical condition you are researching, it is worth reading further. For example, if you are looking for information on the diagnosis of breast cancer and the study says its purpose is to investigate the accuracy of a particular diagnostic technique for breast cancer, then it is worth reading on and trying to evaluate the validity of the study.
The validity of the study and its conclusions is a more difficult problem to evaluate, and this is where statistical analysis is needed. It is necessary to read the publication's sections on Materials and Methods, Analysis, and Conclusions to evaluate the validity of the study. At times it is not easy to find enough information to make this evaluation.
The applicability of the study's conclusions to one's particular problem is another decision to make. In general, if the problem you are researching is the same as the one in the study being evaluated, the conclusions may be applicable. However, you must be sure that your particular circumstances are equivalent to those of the persons in the study.
The motives of the investigators doing the study are much more difficult to determine. However, if one is reading a study on the efficacy of a new drug or treatment and the study is funded by the manufacturer, then it is possible that some bias may creep in to favor the use of the manufacturer's product.
Another common hidden purpose is the publication of incomplete results to support the need for further study, which, of course, will be done by the same investigators who published the study saying more study is needed.
It was stated in the introduction that this is not a detailed discussion on how to use statistics, nor a textbook presentation of statistics.
However, certain concepts in statistics will now be discussed in enough detail to begin your understanding of them and of how to evaluate their use.
The first of these is the power of a statistic. This is most easily translated as: did the study have a large enough sample of what was being studied for its conclusions to be strong enough to be believed? A study will always have a stated purpose. It will be looking to demonstrate some observable effect of some manipulation of the study population. To evaluate the power of the conclusions reached in the study, the first basic thing to look for is whether the number of results is big enough to carry the meaning the study says it has. This is an extensive subject, and most published studies do not discuss it. The sample size needed relates directly to the smallest worthwhile change that can be observed and analyzed: the smaller the change, the larger the sample. For example, if one were studying the amount of change in a subject after some treatment and the possible change was very small, then perhaps hundreds or thousands of subjects would need to be studied. Most studies will not discuss this and simply expect readers to trust that the analysis was done correctly. A very long discussion could be given here on the details of evaluating power. However, the most basic thing to look for is whether the number of things studied is large enough. If an event is thought to have only a small observable effect, then the numbers studied must be large to give the observed results adequate meaning. Most studies of biologic effects require large numbers to have meaning. If the study being evaluated has fewer than one hundred events evaluated, it is generally not large enough. A good statistician can mathematically evaluate the data and likely find some meaning in a study of any size. However, it is unlikely the published study will discuss this in enough detail to allow the reader to evaluate the true validity of the results.
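To see why a small expected effect demands a large study, here is a minimal Python sketch of the common normal-approximation formula for the number of subjects needed in each of two groups to detect a given difference in means. It is an illustration under stated assumptions, not the method of any particular study: the function name is hypothetical, and the between-subject standard deviation of 10, the 5% two-sided significance level and the 80% power are illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group_means(sd: float, smallest_change: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects needed in EACH of two groups to detect
    a difference of `smallest_change` between group means, using a two-sided
    test and the normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / smallest_change**2
    (hypothetical helper for illustration)
    """
    std_normal = NormalDist()                       # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)             # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / smallest_change ** 2
    return math.ceil(n)                             # round up to whole subjects


# Assumed example: the measurement varies between subjects with SD 10.
for change in (5.0, 2.0, 1.0):
    print(f"smallest worthwhile change {change}: "
          f"{n_per_group_means(sd=10.0, smallest_change=change)} per group")
```

With these assumed numbers, a change of 5 needs about 63 subjects per group, a change of 2 about 393, and a change of 1 about 1,570. Because the required number grows with the square of one over the smallest change, halving the change to be detected roughly quadruples the study size.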
The Notes on Statistics published on the internet by the British Medical Journal (excerpts and links appended) offer a good general rule: if a study says it uses statistical methods that are not found in a common book on statistics, then distrust it.
Rule number one: Are the numbers studied large enough to likely be meaningful?
The second basic area of evaluation is the reasonableness of the study. Many studies evaluate events so detailed that they apply only to a very specific area of the subject and simply cannot be evaluated easily without very special knowledge. A study about the specific molecular biologic effects of a process is unlikely to be easy to evaluate unless the reader is very familiar with the details of the area being reported on. It may be wise to skip an evaluation of such studies unless they apply in a very general way to the question being researched. However, there are some simple concepts that allow a decision on the reasonableness of a study. The hypothesis to be tested must be adequately defined; if it is not, it is not testable. What is being studied should be described in enough detail that one can judge whether the events most likely to interfere with the validity of any conclusions have been evaluated. The statistical analyses used must then deal with any possible confounding events. A common mistake is to pose a hypothesis whose testing is circular, where the event being studied is itself used as the explanation for its own occurrence. An example of such circular reasoning: "I am not lying. Since I am not lying, I must be telling the truth."
Rule number two: Does it seem reasonable that the study is designed such that it is able to answer the questions asked?
The third basic area of evaluation stems from a consideration of the first two rules. It relates to the ability of the study to really demonstrate a difference. The concept of noise is useful in evaluating this area of the study. If the study is evaluating a single measurement, then how far that measurement is likely to differ from the large universe of all such measurements determines how clearly it will stand out from the other events being studied. If the event being studied is only a little different from the average of the universe of such events, it will be hard to see and to evaluate with statistical validity. For example, if a study is attempting to see whether a particular treatment affects the long-term survival of patients with a certain disease, and 70% of people with the disease survive without treatment, then any benefit must show up as a modest gain on an already high survival rate. The study will therefore have to evaluate a large number of people with the disease to see whether the treatment had any effect.
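The survival example can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a hedged sketch: the 70% untreated survival rate comes from the text, but the treated survival rates of 80% and 75%, the 5% significance level, the 80% power and the function name are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p_control: float, p_treated: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed in EACH group to detect a difference between
    two proportions (e.g. survival rates) with a two-sided test, using the usual
    normal-approximation formula (hypothetical helper for illustration)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    p_bar = (p_control + p_treated) / 2             # pooled proportion if there is no effect
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p_control * (1 - p_control)
                                       + p_treated * (1 - p_treated))) ** 2
    return math.ceil(numerator / (p_control - p_treated) ** 2)


# 70% survive without treatment (from the text); treated rates are assumptions.
for treated in (0.80, 0.75):
    print(f"70% -> {treated:.0%}: {n_per_group_proportions(0.70, treated)} per group")
```

Under these assumptions, detecting an improvement from 70% to 80% survival needs about 294 patients per group, while detecting an improvement from 70% to 75% needs about 1,250 per group.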
If the study is evaluating the change between two measurements then the concept of the typical error must be considered. This typical error is the statistical measure of the noise in the events being studied that might affect the measurement.
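One common way to estimate the typical error, assuming the same subjects can be measured twice under identical conditions with no real change in between (a test-retest study), is to take the standard deviation of each subject's difference between the two measurements and divide it by the square root of 2. The sketch below shows that calculation; the data and the function name are hypothetical.

```python
import math
from statistics import stdev


def typical_error(trial_1: list[float], trial_2: list[float]) -> float:
    """Typical error of measurement from a test-retest study (hypothetical helper):
    the standard deviation of each subject's difference between two trials,
    divided by sqrt(2) because each difference contains the error of two measurements."""
    if len(trial_1) != len(trial_2) or len(trial_1) < 2:
        raise ValueError("need paired measurements on at least two subjects")
    differences = [b - a for a, b in zip(trial_1, trial_2)]
    return stdev(differences) / math.sqrt(2)


# Invented data: the same five subjects measured twice, with no treatment in between.
first = [10.0, 12.0, 14.0, 11.0, 13.0]
second = [10.5, 11.5, 14.5, 11.5, 12.5]
print(round(typical_error(first, second), 3))
```

Here the typical error comes out at about 0.39 units, so real changes much smaller than that would be hard to distinguish from measurement noise.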
Noise in the statistical sense is similar to the noise of all the voices in a crowd when one is trying to listen to just one voice. In biologic studies there are two main sources of noise. First, no two biologic subjects are identical; each differs from another in many ways. To make the study as valid as possible, the subjects studied must be as similar as possible and vary only in the parameter studied. For humans, this means age, sex, body weight and any other parameters not being studied must be as near the same as possible. Second, there is noise caused by inaccuracies in the measurement techniques used. If an instrument can measure a value only to two decimal places, then any change in the third or later decimal place will be lost in the noise of the instrument's inaccuracy.
For medical clinical studies, this is the ability of the smallest clinically important change (the signal being analyzed) to stand out enough from the noise. For example, suppose the results of a particular test vary from 10.2 to 18.3 in untreated patients, and a study is trying to see whether treatment can bring them into a normal range of 10.2 to 16.0. The effect being looked for is then the change from 18.3 to 16.0, a difference of 2.3. If the allowable error of the test is plus or minus 1.8, much of any real change may not be seen: a true value of 16.0 could be reported as 17.8, only 0.5 below the untreated 18.3, and be read as no change. Some points to remember in evaluating the ability of the study to find a meaningful difference:
- it is good if the noise is less than the smallest important change.
- if the noise is larger than the smallest important change, find a new test.
- if the noise is near the smallest important change, very careful analysis is needed.
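The three rules of thumb above can be written as a small decision function, using the numbers from the example (a change from 18.3 to 16.0 sought with a test error of plus or minus 1.8). The text does not define how close counts as "near", so the 25% tolerance here is an arbitrary assumption, as is the function name.

```python
def judge_noise(noise: float, smallest_change: float,
                near_fraction: float = 0.25) -> str:
    """Apply the three rules of thumb (hypothetical helper). `near_fraction` is an
    assumed tolerance: noise within 25% of the smallest important change is 'near'."""
    if abs(noise - smallest_change) <= near_fraction * smallest_change:
        return "near: needs very careful analysis"
    if noise < smallest_change:
        return "good: noise is below the smallest important change"
    return "poor: noise exceeds the smallest important change; find a new test"


# Example from the text: the target change is 18.3 -> 16.0 and the test error is +/-1.8.
smallest_change = 18.3 - 16.0      # about 2.3
print(judge_noise(noise=1.8, smallest_change=smallest_change))
```

With the example's numbers the test error (1.8) is about 78% of the change being sought (2.3), so the function places it in the careful-analysis zone.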
If a study is looking for a clinically important change in relation to a reference value, then the greatest likely validity occurs when the change from that reference value is much greater than the typical error of the measurement. A clinical study has two major sources contributing to this typical error: the variation from subject to subject and the variation in the technical details of the analysis method. First, it is difficult to make all the subjects of a study so similar that they are almost identical; the unresolved differences between subjects must be discussed and analyzed in a good statistical analysis of the problem being considered. Second, the technical methods of obtaining the samples, or of analyzing them, can have errors. Once again, in a good statistical analysis, the study will discuss these errors and have methods to reduce their possible effects.
Most published studies do not report these details in a way that lets the reader evaluate how well the study deals with these possible problems. However, the reader must attempt to make this evaluation, and it may be necessary to read several studies to get enough information to do so.
Rule number three: Is it reasonable to believe the effect being studied stands out enough to make the conclusions valid?
The last major concept one needs to evaluate is whether this study applies to me and my problem. This is the most critical question when deciding whether a particular test or proposed procedure applies to one's own condition. It is a very hard question to answer from published studies or published discussions of the problem. However, some general guidelines can be found in careful application of the first three rules.
The predictive value of a test in clinical use depends critically on the prevalence of the abnormality in the patients being tested. Prevalence is the proportion of the population being tested that actually has the problem. This prevalence may very well differ from the prevalence in a published study evaluating the usefulness of the test. This leads to a discussion of the concepts of sensitivity and specificity.
- Sensitivity: the proportion of people who have the condition whom the test correctly identifies as positive (true positives)
- Specificity: the proportion of people who do not have the condition whom the test correctly identifies as negative (true negatives)
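Both definitions come from a two-by-two table that compares the test result with the true condition. The counts below are invented purely to show the arithmetic; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a test against the true condition (invented numbers)."""
    true_pos: int   # has condition, test positive
    false_neg: int  # has condition, test negative
    false_pos: int  # no condition, test positive
    true_neg: int   # no condition, test negative

    def sensitivity(self) -> float:
        # Of those who really have the condition, the fraction the test catches.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Of those who really do not have the condition, the fraction the test clears.
        return self.true_neg / (self.true_neg + self.false_pos)


table = TwoByTwo(true_pos=90, false_neg=10, false_pos=50, true_neg=850)
print(f"sensitivity {table.sensitivity():.2f}, specificity {table.specificity():.2f}")
```

With these counts the test catches 90 of the 100 people who have the condition (sensitivity 0.90) and correctly clears 850 of the 900 who do not (specificity about 0.94).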
Sensitivity describes how likely a test is to detect the problem when it actually exists. To relate this back to the concept of noise, a very sensitive test can find the proverbial needle in a haystack.
However, pushing a test toward ever greater sensitivity can cost specificity. A specific test does not produce large numbers of false-positive results. In the needle-in-a-haystack problem, a test with poor specificity would find lots of nuts and bolts and count them as needles.
However, it is important to know that the sensitivity and specificity of a test do not, by themselves, give the probability that a particular result is a correct diagnosis. The positive predictive value is the proportion of patients with a positive result who actually have the condition. The negative predictive value is the proportion of people with a negative result who actually do not have it. In other words, these values tell how often the test is correct when it says a person has, or does not have, a disease. Both depend on the prevalence of the condition in the people being tested.
It is very important to remember that the predictive value observed in one study does not necessarily apply universally.
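The dependence of predictive value on prevalence follows from Bayes' theorem. The sketch below assumes a hypothetical test that is 90% sensitive and 95% specific and applies it to populations with 30%, 10% and 1% prevalence; the figures and the function name are illustrative, not taken from any study.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Positive and negative predictive values from Bayes' theorem
    (hypothetical helper for illustration)."""
    true_pos = sensitivity * prevalence               # fraction who are sick and test positive
    false_pos = (1 - specificity) * (1 - prevalence)  # fraction who are well but test positive
    true_neg = specificity * (1 - prevalence)         # fraction who are well and test negative
    false_neg = (1 - sensitivity) * prevalence        # fraction who are sick but test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test: 90% sensitive, 95% specific, used in populations with different prevalence.
for prevalence in (0.30, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
```

Under these assumptions the positive predictive value falls from about 89% at 30% prevalence to about 67% at 10% and about 15% at 1%, while the negative predictive value rises toward 100%. The same test can therefore perform very differently in a general screening population than in the study population where it was evaluated.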
In many instances an evaluation of the applicability of a particular study to an individual requires a thorough knowledge of the subject being evaluated. It is unlikely that a person without adequate background knowledge of heart disease will be able to evaluate a study that discusses details of platelet aggregation. It is unlikely that a person without adequate background knowledge will be able to evaluate a study that discusses positron effects on a particular atom.
To arrive at rule number four, which comes from an evaluation of sensitivity, specificity, predictive value and applicability, the concepts of generalization and extrapolation need to be considered. It may be very difficult to evaluate these concepts in any one particular case if the person doing the evaluation does not have the basic knowledge mentioned above. However, certain things can be looked for: the characteristics of the individuals studied, the stage of the disease studied, and the route of the treatment studied. These help decide whether the individual in question may be so different from the individuals in the study that the results cannot be used in any treatment decision. The person doing the research needs to ask three questions to evaluate these factors. First, am I the same type of person covered in this study? Second, is my disease at the same stage as that of the persons in the study? Third, are the route and type of treatment given to the persons in the study the same as those being considered for me? These questions lead to rule number four.
Rule number four: Does this study apply to me?
This rule is perhaps the most important one to a particular individual. It is also the one most likely to require consultation with someone knowledgeable in the area being evaluated. In the realm of healthcare, the skilled, experienced clinician will be better able to evaluate the true validity and applicability of the concept in question. The person searching for information regarding a medical condition will get the best answers by using research skills to find studies that seem to apply to them and then taking those results to a skilled practitioner for final interpretation. In this era of access to an immense knowledge base, the skilled and experienced person is still valuable for these interpretations.
Statistics are often used to make a point that is not valid. It is always important to evaluate the purpose and motive behind any study results or recommendation.
Living in these times of information immensity exposes one to the pointed messages of the advertising industry. It is not unusual now to see a particular healthcare treatment or medication advertised with apparent professional validity and applicability and given the final cachet of a large company's reputation. It is also not unusual for studies favorable to a particular treatment or medication to be given to healthcare professionals by those involved in marketing it. In both instances, those who produced these information products are likely to present a somewhat manipulated study. In the case of advertising, a careful observer will find that no information is ever presented that allows a valid statistical analysis of the concepts being sold. In the case of studies handed to a busy healthcare practitioner, the studies are likely funded by the manufacturer of the product or technique and, if studied carefully, do not have the power of a good statistical analysis to support it.
The healthcare industry is an immense and profitable business. The true advances in the sciences of healthcare are incredible. However, when evaluating the validity and applicability of any particular concept, one must always be aware that some person or company profits from its use. It is simply a fact that developing advances in science costs money and that these costs must somehow be recovered. However, it is easy for those funding the work to manipulate the results of a study to support the concept. This may be very difficult to discern, even for a professional skilled in the area of the investigation. This is not to say that a supportive statement for a particular medication or treatment is an outright lie. However, the underlying statistics may be manipulated, glossed over or not mentioned at all.
This leads to a comprehensive general rule.
Rule number five: Can I believe in the conclusions presented?
This is the most difficult question to answer, and it will not be discussed at length here. Once this entire presentation has been studied and a significant degree of statistical knowledge has been learned, the reader should at least be able to pick out the information most likely to apply to their personal question. They can then take that information to a skilled professional for review and commentary.
It is important to remember that skepticism is good!
The basic text of this discussion will conclude with a short anecdote I used with patients when discussing any particular medication, test or procedure.
- This is as follows:
- If one has a hundred doctors in a room and asks them a particular question then one should be aware that:
If all of the doctors agree then they are most likely 80% correct.
(remember this is as good as it gets!)
If two well-defended opinions are found, each with adequate basic science and clinical research support and apparently valid and applicable results, then each can be only 50% correct.
If three or more well-defended opinions are found, then no one really knows, and in any particular case one just makes a choice.
Now, that anecdote is a flagrant generalization, but it rests on 40 years of clinical experience. It is very, very unusual to find a medication, test or treatment so specific that it is the only recognized and accepted one. Those 40 years have also brought the experience of treating conditions not previously known to exist, and of stopping a treatment that one was told was the treatment when it was first introduced. This leads directly to the concept that 80% correctness at any given moment is as good as it gets.
There is also the large area of alternative treatments, which covers well-studied and defensible herbal therapies and well-studied practices such as acupuncture. These alternative treatments are only beginning to be studied in a well-designed scientific fashion. If one wishes to evaluate a given alternative treatment, the same basic statistical principles apply to establish its validity and applicability. If well-designed studies cannot be found for a given treatment or technique, then using it is really just a matter of faith, and of finding, one hopes, a skilled and honest practitioner of the treatment or technique to advise one.