C. Reliability and Validity
To be sound, assessments must be free of bias and distortion.
Reliability and validity are the two concepts used to define and
measure that bias and distortion.
Reliability refers to the extent to which assessments
are consistent. Just as we enjoy having reliable cars (cars that
start every time we need them), we strive to have reliable, consistent
instruments to measure student achievement. Another way to think
of reliability is to imagine a kitchen scale. If you weigh five
pounds of potatoes in the morning, and the scale is reliable,
the same scale should register five pounds for the potatoes an
hour later (unless, of course, you peeled and cooked them). Likewise,
instruments such as classroom tests
and national standardized exams should be reliable – it should
not make any difference whether a student takes the assessment
in the morning or afternoon; one day or the next.
Another measure of reliability is the internal consistency of the
items. For example, if you create a quiz to measure students’
ability to solve quadratic equations, you should be able to assume
that if a student gets an item correct, he or she will also get
other, similar items correct. The following table outlines
three common reliability measures.
Type of Reliability | How to Measure
Stability or Test-Retest | Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2.
Alternate Form | Create two forms of the same test (vary the items slightly). Reliability is stated as correlation between scores of Test 1 and Test 2.
Internal Consistency (Alpha, α) | Compare one half of the test to the other half. Or, use methods such as Kuder-Richardson Formula 20 (KR20) or Cronbach's Alpha.
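For readers who want to see what "stated as the correlation" means in practice, the following is a minimal sketch of a test-retest calculation. The scores are made up for illustration, and `pearson_r`, `time_1`, and `time_2` are hypothetical names, not part of any particular gradebook or testing tool. The same function would apply to alternate-form reliability by passing the scores from Test 1 and Test 2 instead.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five students on the same quiz, given two weeks apart.
time_1 = [70, 82, 90, 65, 88]
time_2 = [72, 80, 93, 63, 85]

test_retest = pearson_r(time_1, time_2)
print(f"Test-retest reliability: {test_retest:.2f}")
```

With these invented scores the students keep nearly the same rank order at both sittings, so the coefficient comes out close to 1.0.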
The values for reliability coefficients range from
0 to 1.0. A coefficient of 0 means no reliability and 1.0 means
perfect reliability. Since all tests have some error, reliability
coefficients never reach 1.0. Generally, if the reliability of a
standardized test is above .80, it is said to have very good reliability;
if it is below .50, it would not be considered a very reliable test.
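As an illustration of internal consistency, here is a rough sketch of Cronbach's Alpha computed on a small, invented set of quiz results, followed by the rule of thumb stated above. The data, the function names `cronbach_alpha` and `describe`, and the wording returned for coefficients between .50 and .80 are assumptions made for this example; the text gives no label for that middle range. Population variances are used, so for right/wrong items the result matches KR20.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """item_scores: one row per student, one column per test item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    columns = list(zip(*item_scores))  # regroup scores item by item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when every student has the same total")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def describe(coefficient):
    """Apply the thresholds from the text (above .80, below .50)."""
    if coefficient > 0.80:
        return "very good"
    if coefficient < 0.50:
        return "not very reliable"
    return "between the two thresholds"  # assumed label, not from the text


# Hypothetical results: 6 students x 4 items, 1 = correct, 0 = incorrect.
quiz = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(quiz)
print(f"alpha = {alpha:.2f} ({describe(alpha)})")
```

For this invented data set alpha works out to about .83, which falls in the "very good" range described above.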
Validity refers to the accuracy of an assessment -- whether
or not it measures what it is supposed to measure. Even if a test
is reliable, it may not provide a valid measure. Let’s
imagine a bathroom scale that consistently tells you that you weigh
130 pounds. The reliability (consistency) of this scale is very
good, but it is not accurate (valid) because you actually weigh
145 pounds (perhaps you re-set the scale in a weak moment)! Since
teachers, parents, and school districts make decisions about students
based on assessments (such as grades, promotions, and graduation),
the validity of the inferences drawn from the assessments is essential -- even
more crucial than the reliability. Also, if a test is valid, it
is almost always reliable.
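The bathroom scale example can be put into numbers. The short sketch below treats the spread of repeated readings as a stand-in for reliability and the gap between the average reading and the true weight as a stand-in for validity. The variable names and the five repeated readings are hypothetical; only the 130 and 145 pound figures come from the example above.

```python
from statistics import mean, pstdev

true_weight = 145                      # pounds, from the example in the text
readings = [130, 130, 130, 130, 130]   # hypothetical repeated readings

spread = pstdev(readings)              # how much the readings disagree with each other
bias = mean(readings) - true_weight    # how far the readings sit from the truth

print(f"spread = {spread} lb, bias = {bias} lb")
```

A spread of zero shows the scale is perfectly consistent, while a bias of 15 pounds below the true weight shows that it is consistently wrong: reliable, but not valid.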
There are three ways in which validity can be measured. In order
to have confidence that a test is valid (and therefore the inferences
we make based on the test scores are valid), all three kinds of
validity evidence should be considered.
Type of Validity | Definition | Example/Non-Example
Content | The extent to which the content of the test matches the instructional objectives. | A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives -- it has very low content validity.
Criterion | The extent to which scores on the test are in agreement with (concurrent validity) or predict (predictive validity) an external criterion. | If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they would have high concurrent validity.
Construct | The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. | If you can correctly hypothesize that ESOL students will perform differently on a reading test than English-speaking students (because of theory), the assessment may have construct validity.
So, does all this talk about validity and reliability mean you
need to conduct statistical analyses on your classroom quizzes?
No, it doesn't. (Although you may, on occasion, want to ask
one of your peers to verify the content validity of your major
assessments.) However, you should be aware of the basic tenets of
validity and reliability as you construct your classroom assessments,
and you should be able to help parents interpret scores for the
standardized exams.