Chapter Four - Issues in Data Collection

=== Collecting Data ===

The purpose of this chapter is to focus on the various facets of collecting data. Tests, measures, questionnaires, interview protocols, and both obtrusive and unobtrusive observational methods will be addressed, along with the validity and reliability of these approaches. In addition, the relationship between the data collection method and the research method will be noted, since not all data collection methods are appropriate for all research designs. Finally, issues related to the data collectors, types of collection methods, and the recording of data will also be discussed.

I. Quantitative Data
Quantitative data is usually collected via tests and measures (or surveys) administered using different methods. Quantitative designs are based on numerical data and generally seek to test a researcher’s hypothesis about the effect of an intervention, the relationship between variables, or differences between individuals or groups. Such designs rest on numerical variables, which can be expressed in forms such as percentages or fractions. The dependent variable(s) of a study is (are) usually continuous (numerical).

**Reliability**
Reliability concerns the "ability of an instrument to measure the construct consistently" (Beebe, ppts.). There are a few ways that researchers can determine whether their instruments' scores are reliable. One way is to give the same instrument/survey twice, with two to six weeks in between; participants should give consistent answers, and if they do not, the scores from that instrument are not reliable. This method takes time. Instead of giving the same instrument twice, researchers could give two similar forms of the instrument in the same session. These forms should measure the same thing, have the same number of questions presented in the same format, and be graded in the same way. For example, if a researcher wants to find out the attitudes students have towards math and/or reading (two projects being done in this class), the researcher would give the students one survey form and then the alternate form. If the instruments' scores are reliable, the students should express the same type of attitude on each form. Finally, researchers could compare the first part of the form to the second part of the form (internal consistency) to see if the answers on the form remain consistent/reliable. All three of these methods should yield a reliability coefficient of .80 or higher (Beebe, ppts.). This increases the likelihood of the data also being valid.
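
The .80 benchmark above is simply a correlation threshold, so a test-retest check can be sketched in a few lines. This is a minimal illustration with made-up scores; the eight-student sample, the variable names, and the numbers are all hypothetical.

```python
# Illustrative sketch: estimating test-retest reliability as the Pearson
# correlation between two administrations of the same instrument.
# The scores below are invented example data.

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

first_administration  = [12, 15, 9, 20, 17, 11, 14, 18]
second_administration = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(first_administration, second_administration)
print(round(r, 2))  # a coefficient of .80 or higher suggests reliable scores
```

Here the two administrations track each other closely, so the coefficient clears the .80 rule of thumb; scores that shifted unpredictably between sessions would drag it down.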

Reliability is such an important component of research that it should be treated seriously. Reliability relates to the level of measurement error (Beebe ppts). If the data collected are not reliable, or carry a high level of measurement error, this will have adverse effects on your ability to use your results. The higher the reliability, the lower the measurement error; the lower the reliability, the higher the measurement error. If you measured a statue in meters and then measured that same statue in centimeters, you should come up with equivalent measurements; this would mean your measurements are reliable. Unfortunately, most questions being investigated cannot be measured with a ruler or meter stick, so there are other ways to ensure a higher level of reliability. You can re-test the participants with the same form at a later date, which may take a considerable amount of time depending on your sample size. You could also give a similar questionnaire and look for similar results. Finally, you can include questions within the questionnaire that relate to each other, a property called internal consistency (Beebe ppts). For example, if you were taking a personality test and the questionnaire asked you four questions about being an optimistic person, the answers to those questions should be the same or very similar in order for the data to be reliable.
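
The internal-consistency idea in the optimism example can be quantified with Cronbach's alpha, a common index of how strongly a set of related items hang together. This is an illustrative sketch with invented responses, not a procedure prescribed by the chapter; the four "optimism" items and the five respondents are hypothetical.

```python
# Illustrative sketch: Cronbach's alpha for internal consistency,
# computed from four made-up optimism items scored 1-5
# (each row is one respondent's answers).

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(item_scores):
    """item_scores: list of rows, one row of item responses per respondent."""
    k = len(item_scores[0])                 # number of items
    items = list(zip(*item_scores))         # transpose: one column per item
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(responses), 2))
```

Because each respondent answers the four items similarly, alpha comes out well above .80; a respondent answering the items inconsistently would lower it.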

Furthermore, it is beneficial to fully disclose your methods so that no questions are left hanging. If you retested or ran an internal consistency check, it should be noted. Likewise, if you did not conduct a reliability test, that should be noted, along with any possible limitations.

**Content Validity**

According to Suter (2006), "Content validity is the extent to which a test reflects the domain of content that it presumably samples" (p. 250). More specifically, it consists of face validity, sampling validity, and item validity. Face validity corresponds to the format of the study with regard to construction and audience; it is one of the least reliable forms of content validity, since a test is considered to have face validity merely if it appears as though it will measure what it is intended to measure. Item validity depicts the items' relevance to the construct. Sampling validity describes the extent to which the items of a study adequately represent the construct.

Content validity is expert-based, and involves no statistical evidence.

Content validity is of greatest concern for researchers who study achievement. For example, in developing the Stanford Achievement Test, educators wanted to assess what students knew and what they were able to do. At each step, specialists and content-area experts reviewed test blueprints specifying the breadth and depth of the tested objectives. This helped to strengthen the test and assure that it contained representative and balanced coverage. If the test had not been carefully constructed, its content validity would have been lowered. "The content validity of tests is important because without it, one would not know whether low achievement test scores were the result of learning deficits or learning-testing mismatches" (Suter, 250).

**Discriminant Validity**

Discriminant validity is one way to establish criterion validity. It helps to determine differences between two individuals or groups by comparing two constructs that should be distinct; the measures of the study help to further “discriminate” between them.

Further, discriminant validity is the degree to which a test’s results differ from the results of another test that is not designed to assess the same material. In order to establish discriminant validity, we must confirm that the measures we intended to be unconnected really do not have any relation. If an unwanted relationship occurs, the validity of our study will be reduced. The relationship between the measures from different tests should be very low. For example, for discriminant validity to hold, scores on a test designed to assess aggressiveness should not track scores from tests designed to assess intelligence (example provided by alleydog.com, a psychology glossary).

**Predictive Validity**

Predictive validity is "the extent to which test scores accurately predict an outcome" (Suter, 2006). Predictive validity is established when scores on a test accurately predict a future criterion. Most often, psychological and educational tests are intended to measure and predict a future outcome (Suter, 2006). Various test scores, such as those from the SAT and other standardized tests, can predict to what extent a person will succeed academically. Predictive validity can also be used in identifying high-risk students who may become bullies, may be at risk of dropping out of school, or may develop other academic deficiencies. This type of validity can ultimately be useful in setting up appropriate interventions to decrease the chances of negative outcomes in children, adolescents, and adults. My research on school retention policies can incorporate predictive validity. "Predicting which one of several methods of instruction is linked to the greatest probability of success is another application of predictive validity" (Suter, 2006). Thus, when researching what intervention practices schools utilize within their retention policies, I may be able to predict the practices that are most effective as identified by students who have been retained.
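
As a rough illustration of what "scores predicting a future criterion" looks like numerically, one can fit a least-squares line from an earlier test score to a later outcome and use it to predict new cases. The practice scores and outcomes below are invented for the example.

```python
# Illustrative sketch: predictive validity as how well earlier test scores
# predict a later outcome, via an ordinary least-squares line.
# All (practice score, later score) pairs are hypothetical.

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

practice = [400, 450, 500, 550, 600, 650]   # earlier (predictor) scores
outcome  = [420, 470, 510, 560, 590, 660]   # later (criterion) scores

a, b = fit_line(practice, outcome)
predicted = a * 520 + b      # predicted later score for a practice score of 520
print(round(predicted))
```

The tighter the scatter of real data around such a line, the stronger the claim of predictive validity; weakly related scores would make the predictions nearly useless.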

**Construct Validity: Validation as a Process**

Construct validity refers to whether or not a study’s system of measurement accurately measures what it claims to measure. The construct of the study and the specific measuring device must agree with one another in order for there to be construct validity. For example, if researchers were focusing their study on the play skills of preschoolers with special needs, they would need to make certain that their measurement tool was accurately measuring appropriate "play" throughout the study sample.

**Consequential Validity**

Consequential validity looks at the possible negative effects of an instrument on those who administer it and those who take it, and asks what impact those effects have on the validity of the data collected. You are looking at the effects of administering the instrument, as with standardized tests: if everyone is teaching to the test, how valid will the test be? The effect on participants comes from the method and traits used in the test. Is the test all one format, such as multiple choice or fill in the blank? How is the test administered? Using one method, such as a pencil-and-paper test or observation, might change the outcome. Another aspect to examine is the appropriateness of the test: does it make accommodations for differences in language and abilities? With this type of validity you are looking for the possible negative effects of what is being done (Beebe, PowerPoint 2.10).

II. Qualitative Data
Qualitative data is collected through interviews, field notes, observational records, case studies, archival data, and media (PowerPoint 2.7). Qualitative data attempts to tell a story about the setting in which the data was collected, the characters who provide the information that is used as data, and a plot which describes the social interaction of the characters. Most of the data gathered in qualitative studies comes from observations and interviews. In general, qualitative research designs using grounded theory involve: a focus or topic that is descriptive of some process or interaction; a design that emerges as the data is collected and analyzed; purposive sampling and generally small samples (in comparison to most quantitative designs); the participant’s context; and a thick description or narrative of the results (i.e. “the story”) regarding the process. This design involves categorical variables, which may include gender or race; the independent variable(s) of a study is (are) usually categorical. "Qualitative data in education are often more complex than quantitative data. The proper analysis of such data is usually challenging and time consuming and often requires creative talents." (Suter, 44).

A. Interviews
Used in both qualitative and quantitative studies, interviews, when conducted properly, can be a useful tool for the educational researcher. Key informant, survey, and focus group interviews are all common styles of interview. A key informant interview can be conducted either with an official close to the subject at hand who shares the perspective of your hypothesis, or with an official who approaches the subject matter from a different perspective. When conducting a survey interview, a follow-up is a good way to learn whether your participants understood the questions and to invite further thought. Focus groups are also productive because they allow for input from individuals as well as from the group (Beebe PowerPoints). These interviews may take place via telephone, mail, internet, or face-to-face. Whatever interview method you choose, you should be aware of the issues and biases associated with gathering this qualitative data. Issues such as response rate, incomplete or missing responses, and the environment of the interview setting may detract from the accuracy of the collected data.

Also called “vocal questionnaires”, interviews should be well planned, with thoughtful and unbiased questions. The questions should also be open-ended, so that one can extract as much information from the interviewee as possible. A slight change of wording and/or nonverbal cues can affect the question-and-answer session, which is something the interviewer must be aware of (Suter, 2006).

B. Observations
When conducting observations, it is important to define what is going to be observed to ensure accuracy and proper judgment in data collection. It is also effective to have adequate baselines that identify what the standard behavior is at the time of the observation. Consider, too, that the presence of an observer may influence the behavior of the group being observed. Referring to a videotape of a previous observation may be beneficial in providing clarification and bringing overlooked instances to light (PowerPoint 2.7). It is also very important to compare what was set out to be observed with what was actually observed during the study. The observation methods must be in alignment for the information to be deemed viable.

A. Subject Characteristics
Suter refers to the //Hawthorne effect// as a troubling influence on research participant behavior. The effect is a "bias that influences research participants' behavior stemming from a treatment's unintended effects related to special attention, novelty, or similar treatment" (p. 169). The researcher cannot be sure whether a change in behavior is the result of the treatment effect or the workings of the Hawthorne effect. Suter uses the example of a research study using computers to teach science. The treatment classroom is filled with computers, people come in to observe the students' interactions, and a local news crew arrives for a story. The attention paid to the classroom is truly special, and after two weeks achievement levels are up (compared to a "chalk and talk" method). The data, however, are suspect: did the use of computers influence student achievement, or was it the novelty of being in the spotlight that increased it? A control procedure to reduce the Hawthorne effect is based on the medical //placebo effect//: a control condition that preserves the illusion of participants' receiving treatment (p. 160). In this case, the "chalk and talk" control class would receive computers and attention, but would not use the computers to increase science achievement until the treatment class finished with the research study.

B. Data Collector Characteristics
Suter refers to the //expectancy effect// as the "granddaddy of all troubling effects in educational research" (p. 166). This effect describes the tendency of researchers to bring about the finding they are expecting. "Experimenter expectancy is a serious problem because in its absence the same findings may not occur." (Suter, 166). A control for the expectancy effect is called //blinding//. This control procedure reduces data collector bias by ensuring the data collectors (or, in the case of the Hawthorne effect, the research participants) do not have information that distorts perceptions or influences behavior. Data collectors are kept "in the dark" with regard to information such as which group a particular subject has been assigned to (e.g. treatment or control). The person who is responsible for collecting the data can have a profound effect on the validity of an instrument. People are biased, and it can be difficult to keep those biases out of the study. Identical responses to the same question made by two people can be interpreted very differently. Language, age, gender, and cultural differences can also come into play. If an older person is collecting data from a written questionnaire answered by a nineteen-year-old male, the comment “I think skydiving is sick” may be interpreted to mean that the male does not enjoy the activity. The older person may not be aware that the term “sick” is often used by young people to mean ‘great’ or ‘amazing’. Another problem may be the way the test results are scored: a data collector may unconsciously score tests to support his or her theory.

C. Additional Concerns
**Extraneous events:**

Extraneous events are "outside influences that occur between a pre-test and post-test in addition to the treatment" (Suter, 172). Suter gives an example on p. 172. The treatment in this example is a workshop designed to help students score better on the SAT exam. The students took the test, signed up for the workshop, then took the test again. The scores went up, and the workshop designers would like to believe the increase was the result of the workshop. However, extraneous events may have been at work: students could have studied more, gotten outside help besides the workshop, or been sick the first time they took the test, and the practice and experience gained from taking the test the first time could have influenced the results of the second exam. For most research that occurs over a long period of time, extraneous events could be a factor in the results. Researchers need to be aware of possibly interfering extraneous events as they draw their conclusions.

**Instrumentation:**

Instrumentation is a bias stemming from the process of measurement in the research setting (Suter, 2006). This threat to internal validity includes changes in the measuring device or measuring procedures between a pretest and post-test, and more generally the process of gathering data with measuring tools such as tests or surveys (Suter, 173). The problem could lie in the measuring device itself if it is not the same from test to test (think of your scale at home: sometimes you get on and weigh **//X//** pounds, step off for a second, get back on, and you're //Y// pounds), or in how subjects interpret items of the test (example from Suter: “I like gay parties” would be interpreted differently today than 80 years ago). Another type is //testing//: taking one test can influence a person on the next test. An example would be taking an SAT pretest and then the real SAT; is the score on the real SAT related to the practice pretest? One more type is //pretest sensitization//, in which a treatment only works because the pretest had such an effect on the subjects. "Sometimes the experience of taking a pretest creates an effect by itself, one that might magnify (or wash out) the treatment effect." (Suter, 173).

"Just as the strength of a chain is measured by its weakest link, the value of a research study is often compromised by a weak step in the research process" (Suter, 230). Instrumentation is one of these weak steps. "Meaningful research questions, even with strong sampling designs, can be rendered pointless if the researchers' measures, or instruments, are not sound." (Suter, 230).

**Mortality:**

Another bias that threatens research is mortality. Mortality, also known as attrition or loss of subjects, occurs when research subjects drop out of the study between the pretest and post-test. Two major reasons for mortality in research are relocation and sickness, but there are many others. Having a small number of participants leave the study for personal reasons generally does not cause a major problem. Major problems result instead from systematic loss of subjects, which occurs when people drop out of a study because of some influence of the treatment. An example could be participants in a study of a new cancer drug dropping out because of side effects from the drug. Another example, given by Suter on page 174, involves a workshop for the SAT: mortality would be a problem if the lowest-scoring 20% dropped out of the workshop for reasons such as embarrassment or feelings of hopelessness. The difference between pretest and post-test scores would then be inflated even if the workshop had no effect, simply because the lowest 20% were no longer present.
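
The SAT-workshop scenario is easy to verify with a little arithmetic. The sketch below uses invented scores for ten students and assumes a workshop with zero effect on anyone's score; the group mean still rises once the lowest-scoring 20% drop out.

```python
# Illustrative arithmetic for the mortality threat: dropping the
# lowest-scoring 20% raises the group mean even when the treatment
# changes no one's score. All scores are invented.

pretest = [300, 350, 420, 480, 500, 520, 560, 590, 610, 650]
pretest_mean = sum(pretest) / len(pretest)

# Zero treatment effect: every remaining student posts the same score on
# the post-test, but the bottom 2 of 10 students have dropped out.
survivors = sorted(pretest)[2:]
posttest_mean = sum(survivors) / len(survivors)

# The post-test mean is higher purely because of attrition.
print(pretest_mean, posttest_mean)
```

This is why a researcher comparing group means across waves must check who is still in the sample at each wave, not just the averages.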

**Regression:**

Regression is a very tricky statistical threat that occurs during retesting. It only occurs when participants who scored extremely high or low are retested. Typically, when these subjects are retested, their scores shift closer to the mean, but only in the absence of any other influence. The shift is relatively small, but it is a reliable occurrence. This effect is partially explained by large measurement errors: for example, if a subject guessed poorly on some questions the first time they took the test, this would have caused them to score lower, and the same bad luck is less likely to recur during the retesting. A problem arises when a group is selected **because** of their extreme scores (such as low calculus grades), and they are then given a treatment (a refresher calculus course) before they are retested. Their scores can increase based on how effective the treatment was, but some of the increase can be due to regression, since the scores will naturally move closer to the mean. It is difficult to discover the true cause of the score change without using a randomized control group.
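
Regression toward the mean can be demonstrated with a small simulation, under the assumption that each observed score is a fixed true ability plus independent random error. All numbers here are invented, and no treatment is applied at all: the low scorers improve on retest anyway.

```python
# Illustrative simulation of regression to the mean: true ability is
# constant, each test adds fresh independent noise, yet students selected
# for extreme low first-test scores improve on the retest with no
# treatment whatsoever. All parameters are hypothetical.
import random

random.seed(1)
N = 10_000
true_ability = [random.gauss(500, 50) for _ in range(N)]
test1 = [t + random.gauss(0, 40) for t in true_ability]  # ability + error
test2 = [t + random.gauss(0, 40) for t in true_ability]  # same ability, new error

# Select the students with the lowest first-test scores (bottom 10%).
cutoff = sorted(test1)[N // 10]
selected = [i for i in range(N) if test1[i] < cutoff]

mean1 = sum(test1[i] for i in selected) / len(selected)
mean2 = sum(test2[i] for i in selected) / len(selected)
print(round(mean1), round(mean2))  # retest mean drifts back toward 500
```

A randomized control group separates this drift from a real treatment effect: both groups regress equally, so any extra gain in the treated group can be credited to the treatment.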

**Selection:**

Selection occurs when the control and experimental groups are chosen in a manner that does not reasonably assure their equivalency (i.e. apples to oranges) (Suter, 175). Simply stated, if the control and experimental groups are not comparable before the study, the information gathered will not be credible, because you will not know whether the difference you observed was really caused by the treatment or by the pretreatment difference (Suter, 175). Selection is a threat to internal validity that can occur when nonrandom procedures are used. For example, if a researcher takes the first ten students who volunteer as the experimental group and the next ten as the control group, how might this selection affect the outcomes of the study? The participants in the groups may differ in some way, so they will respond in different ways to the independent variable. Selection as a source of threat in research can be controlled by using a randomized control group.
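
The randomized control group mentioned above amounts to shuffling the whole pool of volunteers before splitting it, instead of taking the first ten. A minimal sketch, with placeholder participant names:

```python
# Illustrative sketch of controlling the selection threat by random
# assignment: shuffle the pool so every participant is equally likely
# to land in either group. Names are placeholders.
import random

participants = [f"student_{i}" for i in range(20)]

random.seed(42)               # fixed seed only so the example is repeatable
random.shuffle(participants)  # every ordering is equally likely

experimental = participants[:10]  # first half of the shuffled pool
control = participants[10:]       # second half

print(len(experimental), len(control))
```

With random assignment, any pre-existing differences among volunteers are spread across both groups on average, so an observed post-test difference can more credibly be attributed to the treatment.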