Chapter 5 Validity and Reliability

5.1 Define validity and reliability

Reliability and validity are fundamental to critiquing psychological research and to developing your own high-quality research. There are different types of validity and reliability that are relevant to us, which sometimes confuses people. Because of this, introductory textbooks often present convoluted definitions of these concepts. Fortunately, the real definitions are simple:

Reliability means consistency. Something is reliable if it is consistent. The more consistency, the more reliability.

Validity means truth. Something is valid if it is true. Truth is either-or; there is no such thing as “more true” or “less true.”

In other words, good psychological science requires certain types of consistency, and it requires some of the claims we make to be true. Next, we will look at the specific kinds of reliability and validity that are important for scientists.

5.2 Types of consistency = Types of Reliability

Here are arguably the three most important types of reliability:

| Type of Reliability | Situation | Definition | How to assess |
| --- | --- | --- | --- |
| Test-retest | You administer a measure to a participant, then wait some period of time, and give them the test again. The participant’s true score on the measure has not changed (e.g., IQ, personality). | The extent to which a measure is consistent across different administrations | Look for a correlation between the two administrations |
| Interrater | A measure involves two or more raters who record subjective observations (e.g., counting the number of times a participant has a tic, counting the number of times a married couple shows affection) | The extent to which two observers are consistent in their ratings | Look for a correlation between the two raters |
| Internal consistency | You are measuring a construct using several items (e.g., five items all rating your enjoyment of a course) | The extent to which items on a measure are consistent with each other; expected if the items measure the same construct | Cronbach’s alpha (.7 is acceptable, .8 is good, and .9 is excellent) |
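
As a rough illustration of the "How to assess" column, the sketch below computes a test-retest correlation and Cronbach's alpha in plain Python. All scores are invented for the example, and the helper names `pearson_r` and `cronbach_alpha` are our own labels rather than functions from any particular statistics package. The alpha helper assumes the common formula based on item variances and the variance of the total score.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with participants in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per participant
    sum_item_variances = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Made-up scores for five participants (illustration only).
time_1 = [100, 110, 95, 120, 105]  # first administration
time_2 = [102, 108, 97, 118, 106]  # second administration
test_retest_r = pearson_r(time_1, time_2)

# Three made-up course-enjoyment items rated 1-5 by the same five participants.
enjoyment_items = [
    [4, 5, 2, 3, 5],
    [4, 4, 2, 3, 5],
    [5, 5, 1, 3, 4],
]
alpha = cronbach_alpha(enjoyment_items)

print(f"test-retest r = {test_retest_r:.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```

Both values come out above .9 for these invented scores. The same `pearson_r` helper could be applied to two raters' counts to gauge interrater reliability.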

5.3 Validity is a property of inferences

Validity is a specific kind of truth. Validity is the truth of an inference, or a claim. In other words, validity is a property of inferences. An inference (a claim) is valid if it is true.

For example, I could claim that the earth is round. Hopefully, it is a claim that you accept as being true. If you agree, then you could label my claim as valid.

Validity in research is frequently misunderstood, which leads to bizarre and confusing definitions of validity. There is no such thing as “a valid study.” Only claims about the study are valid or not. There is also no such thing as “a valid researcher.” A researcher can make claims. Only the researcher’s claims are valid or not. There is also no such thing as “more valid” or “increasing validity.” Validity is truth of a claim. Either a claim is true, or it is not.

For better or for worse, we usually don’t know with 100% certainty if a claim is true or false (if we did, we wouldn’t need the research). Therefore, research methods get very interesting when we listen to other researchers’ claims and then debate whether we agree with them. When we do this, we are evaluating the validity of claims made about the study. Next, let’s look at different types of claims (inferences) that are made in research.

5.4 Types of inferences in a study = Types of validity

Here are some of the most important types of validity.

| Type of Validity | Type of Claim | Definition | Example claim |
| --- | --- | --- | --- |
| Construct validity | The study operations represent the constructs of interest | The truth of claims that study operations match study constructs | “The Stanford-Binet was used to measure IQ” |
| Internal validity | The study IV caused a change in the study DV | The truth of claims that the IV causes changes in the DV | “The control group reported lower levels of stress than the experimental group, suggesting that the manipulation raised stress.” |
| External validity | The study results apply to situation X | The truth of claims that the findings will apply as participants/units/variables/settings change. | “Although data were collected from college students, a similar effect would be expected in working adults.” |
| Statistical conclusion validity | The statistical analysis was significant or not significant | The truth of claims about the size and direction of the relationship between the IV and the DV. Or, that the statistical results are correct. | “p < .05, indicating a significant difference” |

Finally, you might encounter these other types of validity, but they are less clearly defined and evaluated:

  • Content validity: The truth of claims that a measure adequately samples (includes the important elements of) the domain of interest. For example, if IQ includes both verbal and math ability, an IQ test would need to have both verbal and math items.
  • Face validity: The truth of claims that a study operation “seems like” the construct. For example, a study about distractions from mobile devices might not support claims of “seeming real” if the phone in the study is a paper mockup.
  • Criterion validity: The truth of claims that a measure can predict or correlate with some outcome of interest. A personality test as part of a job application would have criterion validity if it predicted applicants’ success in the job.

5.5 Threats to validity

Threats to validity are specific reasons why an inference about a study might be wrong. They can help us anticipate problems in the design of our own research, and the best way to address them is to change the design of the research. Understanding threats to validity also helps us critique research done by others.

There are many threats to validity. In this course, we will focus on the most common ones.

5.5.1 Construct validity: When operations don’t match constructs

All threats to construct validity occur when the study operation does not match the construct of interest. Researchers usually clearly state the constructs that apply to their study in the introduction. They then make claims in the methods section that their study operations represent the constructs of interest.

Threats to construct validity are explanations about why a particular study operation and its intended construct do not match. It could be that the measure is too general (using an IQ test to measure reading ability), or too specific (using a reading test to measure IQ). It could be the wrong construct (using an IQ test to measure happiness). It could be two or more constructs combined as one (a task performance construct measured with both speed and accuracy). This last example, where a study operation includes two or more constructs, is called construct confounding.

5.5.2 Internal validity: GAGES

All threats to internal validity are confounding variables. A confounding variable is a “third variable” that can cause a simultaneous change in the IV and the DV. We looked at the effect of confounding variables when we talked about causality. Experiments provide strong protection from threats to internal validity because of random assignment. In quasi-experimental and non-experimental designs, threats to internal validity are much more likely.

The most common problematic third variables can be remembered as GAGES (Pelham & Blanton, 2019): Geography, age, gender, ethnicity, socioeconomic status.
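
To make the role of random assignment concrete, here is a small simulation sketch. Everything in it is hypothetical: a standardized "age" variable stands in for the third variable, the DV depends only on age, and the treatment has no effect at all. The only thing that changes between the two comparisons is how participants end up in the treatment group.

```python
import random
from statistics import mean


def group_difference(in_treatment, outcome):
    """Mean DV of the treatment group minus mean DV of the comparison group."""
    treated = [y for g, y in zip(in_treatment, outcome) if g]
    comparison = [y for g, y in zip(in_treatment, outcome) if not g]
    return mean(treated) - mean(comparison)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical standardized third variable (e.g., age) and a DV that
# depends on it. The treatment itself has NO effect on the DV.
age = [rng.gauss(0, 1) for _ in range(n)]
dv = [a + rng.gauss(0, 1) for a in age]

# Non-experimental design: older participants tend to select the treatment.
self_selected = [a + rng.gauss(0, 1) > 0 for a in age]

# Experiment: a coin flip decides who gets the treatment.
randomly_assigned = [rng.random() < 0.5 for _ in range(n)]

confounded_diff = group_difference(self_selected, dv)
randomized_diff = group_difference(randomly_assigned, dv)

print(f"self-selected groups:     difference = {confounded_diff:.2f}")
print(f"randomly assigned groups: difference = {randomized_diff:.2f}")
```

Under these assumptions, the self-selected groups differ noticeably on the DV even though the treatment does nothing, while the randomly assigned groups differ only by chance. A claim that the IV caused the difference would be false in the first case.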

5.5.3 External validity: OOPS

Every threat to external validity is an interaction effect. An interaction effect means “it depends.” When a claim about how the study will apply to a new population or a new situation is false, it is false because the study has a different effect after the change.

The most common study variations that may affect study results can be remembered as OOPS! (Pelham & Blanton, 2019): Operations (changing the study operations), occasions (changing the time), populations (changing the people), and situations (changing the environment).

5.5.4 Statistical Conclusion Validity

All threats to statistical conclusion validity increase the odds of being wrong in your statistical conclusion. You may remember that these have names: Type I and Type II error. Put another way, a threat to statistical conclusion validity increases the chance of either a Type I or Type II error.

Low statistical power (the sample size is too small, or the treatment is weak) can increase the chance of a Type II error; you might fail to reject the null hypothesis when it is actually false.
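
The link between sample size and Type II error can be sketched with a simulation. The code below is a simplified illustration, not a recommended analysis: it assumes a true effect of half a standard deviation, a known standard deviation of 1, and a two-sided z-test in place of the t-test a real study would likely use. The function names and the numbers are our own choices.

```python
import random
from statistics import NormalDist, mean


def rejects_null(group_a, group_b, alpha=0.05):
    """Two-sided z-test for a mean difference, assuming a known SD of 1."""
    n = len(group_a)  # equal group sizes assumed
    z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def estimated_power(n_per_group, true_effect, rng, n_studies=2000):
    """Share of simulated studies that reject the null when the effect is real."""
    hits = 0
    for _ in range(n_studies):
        treatment = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        hits += rejects_null(treatment, control)
    return hits / n_studies


rng = random.Random(1)  # fixed seed so the sketch is repeatable
power_small = estimated_power(n_per_group=10, true_effect=0.5, rng=rng)
power_large = estimated_power(n_per_group=100, true_effect=0.5, rng=rng)

print(f"n = 10 per group:  power = {power_small:.2f}, Type II rate = {1 - power_small:.2f}")
print(f"n = 100 per group: power = {power_large:.2f}, Type II rate = {1 - power_large:.2f}")
```

With the same real effect, the small-sample studies miss it most of the time, while the large-sample studies usually detect it.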

Fishing is running analyses over and over again until you find one that is significant, then ignoring all the non-significant results and just reporting the significant one. Fishing greatly increases the chance of a Type I error; if you do statistics this way, you’ll probably get significant findings that are spurious.
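
A short simulation can show why fishing inflates Type I error. In this sketch, the null hypothesis is true in every analysis, each analysis uses fresh independent data, and a study counts as a "success" if at least one analysis is significant at .05. The independence assumption, the z-test with a known standard deviation of 1, and the choice of 20 analyses are simplifications made for illustration.

```python
import random
from statistics import NormalDist, mean

CRITICAL_Z = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = .05


def null_test_is_significant(rng, n_per_group=20):
    """One z-test (known SD of 1) on two groups that truly do not differ."""
    group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    z = (mean(group_a) - mean(group_b)) / (2 / n_per_group) ** 0.5
    return abs(z) > CRITICAL_Z


def fishing_success_rate(n_analyses, rng, n_studies=500):
    """Share of studies with at least one significant result among n_analyses."""
    found = 0
    for _ in range(n_studies):
        found += any(null_test_is_significant(rng) for _ in range(n_analyses))
    return found / n_studies


rng = random.Random(7)  # fixed seed so the sketch is repeatable
one_analysis = fishing_success_rate(1, rng)
twenty_analyses = fishing_success_rate(20, rng)
expected_twenty = 1 - 0.95 ** 20  # chance of at least one false positive

print(f"1 analysis per study:   {one_analysis:.2f}")
print(f"20 analyses per study:  {twenty_analyses:.2f} (theory: {expected_twenty:.2f})")
```

With one analysis per study, the false-positive rate stays near the nominal .05. With 20 independent analyses, the chance of at least one spurious significant result is about .64 in theory, and the simulated rate should land close to that.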

5.5.5 Scientific Publishing

For both reading and producing science, knowing how publications work can be helpful.

Peer review means that the work was evaluated by professionals working in the same area. This is also called refereeing (i.e., a refereed article). Usually the reviewers are anonymous.

Scientific publications range from more informal to more formal publications. More informal works allow researchers to get results out to the public faster, but they have less stringent review (or maybe no peer review). More formal publications take longer to get published, but they receive the most thorough review and carry the highest prestige because they have been scrutinized by the peer review process.

The most informal works are conference presentations. These can be presented in short talks or on a scientific poster. Poster sessions work like a science fair; researchers stand in front of their posters and answer questions from audience members about their work. Conference presentations usually do not have a paper attached to them. Authors might only write an abstract, and the reviewers accept or reject the presentation after only reading the abstract. On the SJSU campus, the Spartan Psychological Association Research Conference (SPARC) is a great first scientific conference. It is held toward the end of April each year.

Some conference presentations have a proceedings paper. The proceedings paper is a complete article that describes the work. Proceedings papers usually are peer reviewed, but the review process is much faster than for a journal article.

A journal article (often called an “article” or a “pub”) is the most formal presentation of research results. It has to pass a peer review process led by an editor of the journal. Journal articles can be rejected totally, rejected with an invitation to fix the problems and resubmit (called a “revise and resubmit”), or accepted. Journal articles can describe new research studies, replicate a past research study, summarize many past studies (called a meta-analysis or a literature review article), discuss new ideas without presenting new data (called a theoretical article), or discuss how to properly use research methods (called a methodological article). If you go on to complete a graduate degree, the article written to describe your final graduate project is called a thesis or dissertation.

Researchers also publish results in chapters in edited books, or by writing a book on the topic. Researchers can use any other medium, as well, including a blog post or a magazine article. These types of works are not usually peer reviewed, so their claims have not undergone much, if any, review.

5.6 Introducing APA Style Manuscripts

For any type of scientific publishing related to psychology, you will find APA style to be an expectation. APA style is a set of guidelines developed by the American Psychological Association with the goal of “clear, concise, and organized” writing.

We will learn a bit about APA style throughout the course; it covers everything from citations to grammar and mechanics. For now, we will introduce a general outline that APA suggests for a research report:

5.7 Introduction

The introduction is often described as a funnel. The top part of the funnel is more general, and the bottom is more specific and focused. First, the paper starts broadly by introducing an area of interest or a theory. Then, the researcher describes the research problem. The paper gets more specific as the researcher identifies a specific question of interest, which leads into the most specific point of the intro—the hypotheses. In a good introduction, only constructs are discussed. Study operations are discussed in the method section.

5.8 Method

The method section is where the study is operationalized. The method section has sub-sections that describe participants, materials, conditions, and measures.

5.9 Results & Conclusions

In the results section, statistical analyses are presented to test the study’s hypotheses.

In the conclusion section, the researcher ties the results back to the research question and hypotheses developed in the introduction. At this point, the paper becomes more general again, because it returns to the study constructs.