Types of Studies

Formulating a Scientific Study, Part 2

Oct 04, 2024

Welcome to Part 2 of Formulating a Scientific Study! Today we’re going over the different types of studies and how they relate to our example from Part 1: working out whether the amount of hot chocolate you drink determines the amount of snow you have to shovel.

Ablation Testing is most commonly seen in machine learning studies. Components of the system are removed one at a time, and the performance of the full system is compared to the performance of the system with that one component missing. This is useful for determining which components matter most to the system, as well as how much degradation the system can handle while still performing effectively. This can also be thought of in terms of the dependent/independent variables discussed in Part 1 - ablation testing can help tease out which parts of the system depend on others.
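
To make the "remove one piece at a time" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the component names, their scores, and the `toy_evaluate` function stand in for whatever real system and performance metric a study would actually use.

```python
def ablation_study(components, evaluate):
    """Score the full system, then re-score it with each component removed in turn."""
    full = frozenset(components)
    baseline = evaluate(full)
    # For each component, record how much the score drops when it is missing.
    drops = {name: baseline - evaluate(full - {name}) for name in components}
    return baseline, drops


# Hypothetical toy system: each enabled component adds a fixed amount to the score.
CONTRIBUTIONS = {"component_a": 0.25, "component_b": 0.5, "component_c": 0.0}


def toy_evaluate(enabled):
    """Stand-in for a real performance metric (assumption, not a real benchmark)."""
    return 0.25 + sum(CONTRIBUTIONS[name] for name in enabled)


baseline, drops = ablation_study(list(CONTRIBUTIONS), toy_evaluate)
print("full system score:", baseline)
for name, drop in drops.items():
    print(f"without {name}: score drops by {drop}")
```

In this toy setup, removing `component_b` hurts the most and removing `component_c` changes nothing, which is the kind of ranking an ablation test is meant to surface.
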
Case Reports are commonly found in the health sciences, and are exactly what they sound like - a paper that covers an individual case in extensive detail. These aren’t great for making sweeping generalizations about topics, since they are by definition only talking about a single instance. However, they are often where new research is first presented to the world and can be used as justification to fund further, more detailed studies. This is self-explanatory in terms of our hot chocolate/shoveling example from Part 1 - we’d simply follow a single person over the course of a set period of time.
Cohort studies, a common type of longitudinal study, follow one group over a long period of time. These are useful because they are relatively cheap and, given enough subjects and time, the results can be standardized and the effect of confounding variables can be limited. One reason is the law of large numbers, which states that the average of enough independent random samples will converge on the true (expected) value. So, if we have enough participants in our study, we can theoretically ignore all the other things that make them different and assume that the shoveling average for each group (hot chocolate drinkers vs non-hot chocolate drinkers) will converge on the real values. Since we often don’t have enough participants for that to be the case, there are other statistical analyses that can limit the effect of confounding variables. If we know that snowfall is a confounding variable, we can include it in the statistical model. However, these approaches only work if we can determine which variables actually are confounding, and that can be difficult. In addition, if the study involves people, it can be difficult to keep them motivated to participate for the whole period. Unlike in randomized control trials, the researchers can’t control what happens to the participants, including which group they end up in, so the groups can be very uneven in size (a class imbalance). A famous example of a cohort study is the Framingham Heart Study.
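
The law of large numbers is easy to see in a small simulation. The sketch below is only an illustration: it assumes each participant’s weekly shoveling time is an independent random draw between 0 and 4 hours (so the true average is 2 hours), which is a made-up model, not real data.

```python
import random


def sample_mean(n, rng):
    """Average of n independent draws of hypothetical weekly shoveling hours."""
    return sum(rng.uniform(0.0, 4.0) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_MEAN = 2.0  # expected value of a uniform draw between 0 and 4

# How far the sample average lands from the true average at each sample size.
errors = {n: abs(sample_mean(n, rng) - TRUE_MEAN) for n in (10, 1_000, 100_000)}
for n, err in errors.items():
    print(f"n={n:>7}: sample mean is off by {err:.4f}")
```

Any single run can be lucky or unlucky at small sample sizes, but with the largest sample the average should sit very close to 2 hours.
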
Randomized Control Trials (more formally, randomized controlled trials) are studies in which the participants are randomly assigned to the control group or the experimental group. These are good because, given enough participants, the randomization balances out differences between the groups on average, including confounding variables we don’t know about. It is also easy to statistically analyze these results. Double-blind studies are randomized control trials in which neither the participants nor the researchers know who is in which group. Triple-blind studies add another layer of blindness - the person doing the statistical analysis doesn’t know who is in which group either. These are often considered the gold standard of trials, especially for drug development, because they allow researchers to control as many variables as possible and they can be analyzed easily with basic statistics. However, they are very expensive and time consuming, especially when large numbers of participants are required, and can suffer from volunteer bias - how do we know that our drug actually works on everyone, and not just on the type of people who are adventurous enough to volunteer for the study and have the tenacity and time/money/general life privilege to complete it? If we wanted a randomized control trial for our hot chocolate/shovel example, we’d randomly assign people to buy hot chocolate or a different drink every time they went out, then measure how much snow they have to shovel. This example makes it obvious why randomized control trials are considered the best - we would never reach our absurd conclusion if this were the first type of study we tried.
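
Here is a rough sketch of what random assignment looks like for the hot chocolate example. The participant list and the "lives in a snowy region" flag are invented for illustration; the point is only that a shuffle, rather than the participants’ own choices, decides who ends up in which group.

```python
import random


def randomize(participants, rng):
    """Shuffle a copy of the participant list and split it into two equal-sized groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def snowy_share(group):
    """Fraction of a group that lives in a snowy region (the hypothetical confounder)."""
    return sum(snowy for _, snowy in group) / len(group)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical participants: (id, lives_in_snowy_region); half of them are snowy.
participants = [(i, i % 2 == 0) for i in range(1000)]
hot_chocolate, other_drink = randomize(participants, rng)

print("hot chocolate group, snowy share:", snowy_share(hot_chocolate))
print("other drink group, snowy share:  ", snowy_share(other_drink))
```

Because the split ignores where people live, both groups should end up with roughly the same share of snowy-region participants, so snowfall can’t masquerade as an effect of the drink.
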
Meta-Analyses are exactly what they sound like - a way to look at all the studies relevant to a particular topic and come to an overall conclusion. The same logic as the law of large numbers comes into play here - the more independent studies that come to the same conclusion, the more confident we can be that the conclusion is correct.
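
One common way to combine studies (an assumption on my part, since there are several methods) is an inverse-variance weighted average, where more precise studies count for more. The sketch below uses three made-up studies, each reduced to an effect estimate and a standard error.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance weighted average of (effect, standard_error) pairs."""
    # A smaller standard error means a more precise study, so it gets a bigger weight.
    weights = [1.0 / se**2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical (effect estimate, standard error) pairs, not real results.
studies = [(0.5, 0.5), (1.0, 0.25), (0.75, 0.5)]
pooled, pooled_se = pool_fixed_effect(studies)
print(f"pooled effect: {pooled:.3f}, pooled standard error: {pooled_se:.3f}")
```

The pooled standard error comes out smaller than that of any single study, which is the numerical version of "more studies agreeing makes us more confident".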

Science for the Unscientific
