
Evaluating Investigations

Not every study is a good study. Here is how to judge whether an investigation actually supports what it claims, and how to spot its flaws.

A big part of the CAEC Science test is not about remembering facts; it is about judging investigations. You will be given a short description of an experiment or study and asked questions like: Was this a fair test? Was the sample big enough? Does the conclusion actually follow from the data?

Good news: you do not need to be a scientist to answer these. You just need a checklist and the habit of asking the right questions. Let's build that checklist together, then use it to critique a realistic study.

Four questions that evaluate any study

Whenever a question asks you to judge an investigation, run it through these four checks. Each one targets a different way a study can go wrong.

  • Fair test (validity)? Did they change only one thing at a time and keep everything else the same? If two things changed at once, you cannot tell which one caused the result.
  • Big enough sample? A result from 3 people or 2 plants could easily be luck. Larger samples make a pattern more trustworthy.
  • Repeated (reliability)? Was the measurement done more than once? Repeats show whether the result is consistent or just a one-off fluke.
  • Free of bias, and credible? Who ran it, who paid for it, and did anything push the result in a particular direction? A study funded by a company selling the product deserves extra scrutiny.
One more check at the end: does the conclusion match the data? A study can be perfectly run and still overreach, claiming far more than the numbers actually show.

Three words the test loves: valid, reliable, credible

These sound similar but mean different things. Keeping them straight makes the questions much easier.

Term | What it asks | Quick test
Valid | Did it actually measure the thing it claims to? | Was it a fair test, with only one variable changed?
Reliable | Would you get the same result if you did it again? | Were measurements repeated and consistent?
Credible | Can you trust the source and the way it was reported? | Any bias, conflict of interest, or missing detail?

Worked example: critiquing a plant-fertilizer study

Here is the kind of scenario the test gives you. Read it, then we will run it through the four checks.

A company that sells "GrowFast" fertilizer ran a test. They gave GrowFast to one tomato plant on a sunny windowsill and gave no fertilizer to one tomato plant in a dim corner. After three weeks, the GrowFast plant was taller. The company concluded: "GrowFast makes tomato plants grow taller, so every gardener should buy it."

At a glance it sounds convincing: one plant grew taller! But run the checklist and it falls apart fast.

  • Fair test? No. Two things differed between the plants: the fertilizer and the amount of sunlight. The sunny plant might have grown taller from light alone. We cannot credit GrowFast.
  • Big enough sample? No. One plant per group. A single plant could be tall for dozens of reasons. You would want many plants in each group.
  • Repeated? No. The test ran once. There is no way to know if the result would happen again.
  • Free of bias? No. The fertilizer company ran its own test and is selling the product. That is a clear conflict of interest.
Incorrect

"GrowFast makes tomato plants grow taller, so every gardener should buy it." The data cannot support this: sunlight was not controlled, the sample was one plant, the test was never repeated, and the source is biased.

Correct

"This test does not show whether GrowFast works. To find out, give many plants the same light and water, change only the fertilizer, and repeat the test."

The key move: the conclusion did not match the data. Even if the plant really was taller, the study's design makes it impossible to say why. Spotting that gap is exactly the skill being tested.
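To make the fair-test check concrete, here is a minimal Python sketch, assuming we describe each group's setup as a dictionary of conditions. The condition names (`fertilizer`, `light`, `duration_weeks`) and the helpers `differing_conditions` and `is_fair_test` are hypothetical labels invented for this illustration; the values come straight from the GrowFast scenario. It covers only the fair-test question, not sample size, repeats, or bias.

```python
# Hypothetical sketch: a fair test should differ in exactly one condition,
# the one being tested. Condition names are illustrative labels only.

def differing_conditions(group_a: dict, group_b: dict) -> list:
    """Return, in alphabetical order, every condition that is not the same in both groups."""
    names = sorted(set(group_a) | set(group_b))
    return [name for name in names if group_a.get(name) != group_b.get(name)]


def is_fair_test(group_a: dict, group_b: dict, tested: str) -> bool:
    """True only if the tested condition is the single thing that differs."""
    return differing_conditions(group_a, group_b) == [tested]


# The original company test, restated from the scenario.
growfast_plant = {
    "plant": "tomato",
    "fertilizer": "GrowFast",
    "light": "sunny windowsill",
    "duration_weeks": 3,
}
control_plant = {
    "plant": "tomato",
    "fertilizer": "none",
    "light": "dim corner",
    "duration_weeks": 3,
}

# A corrected control: copy every GrowFast condition, change only the fertilizer.
fixed_control = {**growfast_plant, "fertilizer": "none"}

for label, control in [("original", control_plant), ("corrected", fixed_control)]:
    changed = differing_conditions(growfast_plant, control)
    verdict = "fair" if is_fair_test(growfast_plant, control, "fertilizer") else "NOT fair"
    print(f"{label} test: differs in {changed} -> {verdict}")
```

The original setup differs in both fertilizer and light, so it fails the check. The corrected control copies the GrowFast plant's conditions and changes only the fertilizer, which is the design the correct conclusion above calls for.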

When a result is too small to trust

Sample size questions are common. Imagine a corrected version of the study: same light, same water, only the fertilizer changes, with eight plants per group. Here are the average heights after three weeks.

Chart: average height after three weeks, eight plants per group. No fertilizer: 22 cm. GrowFast: 24 cm.

Even with a fairer design, the difference is small, about 2 cm. Now the right question becomes: is a 2 cm gap a real effect, or just normal plant-to-plant variation? That is where repeating the study and looking at the spread of results matters. A small difference from a single run is weak evidence, no matter how careful the setup.
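The lesson gives only the two averages, not how much individual plants varied, so this Python sketch has to assume a spread. It assumes plant heights vary with a standard deviation of about 3 cm and that GrowFast has no real effect at all, then simulates many studies to estimate how often chance alone produces a gap of 2 cm or more between the group averages. The 3 cm spread, the constant names, and the function `chance_of_gap` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical assumptions, NOT data from the lesson:
#   - individual plant heights vary with a standard deviation of 3 cm
#   - GrowFast has no real effect, so both groups come from the same
#     distribution, centred on the 22 cm no-fertilizer average
MEAN_HEIGHT_CM = 22.0
SPREAD_CM = 3.0
OBSERVED_GAP_CM = 2.0  # 24 cm - 22 cm, the gap between the two averages


def chance_of_gap(plants_per_group: int, trials: int = 10_000, seed: int = 1) -> float:
    """Estimate the fraction of simulated studies in which two groups with
    NO real difference still have averages at least OBSERVED_GAP_CM apart."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        group_a = [rng.gauss(MEAN_HEIGHT_CM, SPREAD_CM) for _ in range(plants_per_group)]
        group_b = [rng.gauss(MEAN_HEIGHT_CM, SPREAD_CM) for _ in range(plants_per_group)]
        gap = abs(sum(group_a) / plants_per_group - sum(group_b) / plants_per_group)
        if gap >= OBSERVED_GAP_CM:
            hits += 1
    return hits / trials


for n in (1, 8, 50):
    print(f"{n:>2} plants per group: a 2 cm gap or bigger appears by chance "
          f"in {chance_of_gap(n):.1%} of simulated studies")
```

Under these assumptions, a 2 cm gap appears by luck most of the time with one plant per group, still fairly often with eight, and almost never with fifty. A different real-world spread would change the percentages, but not the pattern: the smaller the sample, the easier it is for chance alone to produce a difference.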

Tips for evaluation questions

  • Hunt for the second variable. The most common flaw is changing two things at once. Ask: "What else was different between the groups besides the one thing they tested?"
  • Count the sample. If you see "one person," "a friend," or a tiny number, sample size is probably the issue the question wants.
  • Check who benefits. If the people running the study gain from a particular result, flag bias and credibility.
  • Compare the claim to the numbers. Read the conclusion last and ask whether the data really shows that much. Words like "proves," "always," or "everyone" are often overreach.

Your turn: practice problems

Read each short scenario and decide what is wrong with it and how you would fix it. Try each one before checking the answers.

  1. A student tests whether music helps studying. She studies one night with music and gets 80% on a quiz, then studies a different chapter another night in silence and gets 70%. She concludes music improves test scores. What is the biggest flaw?
  2. A juice maker reports that "90% of people preferred our juice", but the survey asked 10 of the company's own employees. Name two problems.
  3. A weather app claims its new model is more accurate because it correctly predicted rain yesterday. Why is this weak evidence, and what would make it stronger?
Answers
  • 1. It was not a fair test. Two things changed: the presence of music and the chapter being studied (one may simply be easier). To fix it, keep the material the same and change only the music, ideally with many students.
  • 2. First, bias and credibility: the company surveyed its own employees, who are not neutral. Second, the sample is far too small (only 10 people) to claim anything about people in general. Fix: survey a large, independent group.
  • 3. One correct prediction is a sample of one and could easily be luck; the conclusion does not match such thin data. To strengthen it, test the model over many days and compare its accuracy against the old model across a large number of predictions.

Why this matters for the CAEC

The CAEC Science test is 35 questions in 90 minutes, and a calculator is allowed. It is a test of skills and scientific inquiry: it does not ask you to memorize biology, chemistry, or physics facts. Instead, it hands you investigations and asks you to judge them. Evaluating fairness, sample size, repeats, bias, and whether a conclusion fits the data is exactly the kind of thinking those questions reward.


Disclaimer

This article is a general study lesson. CAEC Ready is an independent study resource and is not affiliated with or endorsed by any government, ministry of education, or official CAEC testing provider.