Statistical Significance: Sample Size(s) & Statistical Power
To understand the world around us, researchers formally use the scientific method as a way to separate suspected truths from falsehoods. Cognitive Neuroscience aims to understand how genetic, neurological, and behavioral systems support an organism’s ability to sense, interact, navigate and think about the world around them.
This means cognitive neuroscience designs experiments and collects data at all levels of analysis. Research programs worldwide seeking to further our understanding of the natural world are regularly testing assumptions, or hypotheses, in a well-planned series of smaller experiments. These experiments tend to probe specific factors that may or may not influence an outcome while minimizing the influence of extraneous factors such as environment, sexual orientation, race or socioeconomic status.
Scenario One: A Dopamine Release Study
In Cognitive Neuroscience, dopamine is generally considered a “feel-good” compound. Its release in the nucleus accumbens (NAc) is triggered by behaviors or stimuli that we find rewarding or that motivate us to act. These can include:
- Eating a good meal
- Time with loved ones
Let’s say we would like to find out whether peak dopamine levels in the NAc occur before, during, or after exposure to a desired or familiar visual stimulus. We can use an EEG experimental design adapted from Amatya Johanna Mackintosh’s study. We can hypothesize that dopamine release begins during exposure to the familiar or desired visual stimulus and peaks slightly after it.
Now, most critically, where do we get test subjects?
In experimental situations, “population” refers to the larger, total collective group being studied. It’s impractical and unlikely that your lab could devise a technique to recruit and collect dopamine release data on hundreds of thousands or millions of people.
Therefore, we will attempt to gather data from a smaller, representative group or sample to understand the population. To do that, we’ll need to answer two main questions.
- How many individuals need to be included in our sample?
- How does this relate to the practical significance and statistical power?
Let’s break it down below.
Statistical Power and True Effect
Statistical power is defined as the probability that a test detects a statistically significant difference when such a difference truly exists. A difference that truly exists is referred to as a true effect.
The true effect is the cornerstone of experimental design. Cohen’s 1988 work, influential for its contributions to the scientific method, reasoned that a study should be designed to have an 80% probability of detecting a true effect. This 80% represents a high-power (HP) test design, while any value nearing 20% is a low-power (LP) test design.
Cohen suggested that studies should have no more than a 20% probability of making a Type II error, known as a false negative. A Type II error is a missed discovery: it occurs when a researcher reports no significant effect even though a difference truly exists.
Why does Statistical Power Matter?
Think of this scenario. If a true effect exists in each of 100 different studies and every study has 80% power, statistical tests are expected to detect the true effect in about 80 of the 100. However, if the studies have a statistical power of only 20%, they are expected to discover only about 20 of the 100 genuine non-null effects.
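The arithmetic in this scenario can be sketched in a few lines of Python. This is a minimal illustration, not part of any study design: the function names `expected_detections` and `simulate_detections` and the random seed are assumptions made for the example.

```python
import random

def expected_detections(n_true_effects, power):
    """Expected number of true effects detected, given each study's power."""
    return n_true_effects * power

def simulate_detections(n_true_effects, power, seed=0):
    """Simulate one batch of studies; each detects its effect with probability `power`."""
    rng = random.Random(seed)
    return sum(rng.random() < power for _ in range(n_true_effects))

print(expected_detections(100, 0.80))  # long-run average at 80% power
print(expected_detections(100, 0.20))  # long-run average at 20% power
print(simulate_detections(100, 0.80))  # one simulated batch; varies with the seed
```

The simulated count will rarely equal the expectation exactly, which is a reminder that power describes a long-run rate and not a guarantee for any single batch of studies.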
Statistical Power Shortcomings in Neuroscience Research
Unsurprisingly, because of the resource-intensive nature of neuroscience research, statistical power in this field is low: the median has been estimated at about 21%, with estimates spanning a wide range of roughly 8% to 31%. Low statistical power in neuroscience research:
- Casts doubt on the replicability of findings.
- Leads to exaggerated effect size estimates.
- Reduces the likelihood that a statistically significant result reflects a true effect.
As such, neuroscience research currently faces a statistical power problem, because these values are far below Cohen’s recommended 80% threshold.
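To see why low power makes significant results less trustworthy, it can help to compute the positive predictive value (PPV): the probability that a statistically significant finding reflects a true effect. The sketch below uses the standard formula PPV = (power × R) / (power × R + alpha), where R is the pre-study odds that a tested effect is real. The value R = 0.25 (one real effect for every four null effects tested) is an arbitrary assumption chosen for illustration, and the function name is hypothetical.

```python
def positive_predictive_value(power, alpha, prior_odds):
    """PPV = (power * R) / (power * R + alpha), with R the pre-study odds of a real effect."""
    true_positive_rate = power * prior_odds
    return true_positive_rate / (true_positive_rate + alpha)

# Compare Cohen's recommended power with the ~21% median cited for neuroscience.
for power in (0.80, 0.21):
    ppv = positive_predictive_value(power, alpha=0.05, prior_odds=0.25)
    print(f"power={power:.2f} -> PPV={ppv:.2f}")
```

Under these assumed inputs, the PPV falls from 0.80 at 80% power to about 0.51 at 21% power, so roughly half of the significant findings would be false positives. Different assumptions about R would change the numbers but not the direction of the effect.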
Establishing a Representative Sample Group
Scenario One’s goal: Avoid sampling errors and type I and II errors in our test with inclusive and large sampling.
How many human brain scans need to be included in our sample set if we want the experiment to be practically significant? Practical significance refers to whether or not results from an experiment apply to the real world.
An experiment’s ability to detect effects (its statistical power) is related to sample size. Continuing with Scenario One’s parameters, the goal is still to collect enough data that we can statistically evaluate whether there is a true effect in the timing of dopamine release after showing emotionally charged visual stimuli. We also need to establish criteria for inclusion in the sample that minimize the potential for sampling error.
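As a rough illustration of how sample size follows from the desired power, the sketch below uses the common normal approximation for a two-sided comparison of two group means: n per group ≈ 2 × ((z for 1 − alpha/2 + z for power) / d)², where d is the standardized effect size. The two-group design, the function name, and the effect sizes tried (Cohen’s conventional small, medium, and large values) are assumptions for the example; Scenario One’s real effect size is unknown, and an exact t-test calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Smaller assumed effects require far more participants for the same power.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```

The required sample grows with the inverse square of the effect size, which is why studies of subtle effects need far more participants than studies of large ones.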
How to Avoid Sampling Errors
Two terms are important to understand before moving forward.
- Sampling error: When sampling, there is always a chance that the selected individuals’ collected data will not represent the population.
- Statistical Significance: Statistical significance means that an effect as large as the one we observed would be unlikely to arise by chance alone if no true effect existed. In most biomedical sciences, a result is considered statistically significant when its p-value falls below a significance level of .05. Essentially, this means that if there were no true effect, a result at least this extreme would be expected less than 5% of the time.
Consider what happens when the data shows a relationship (e.g., between visual stimuli and dopamine release). If there is actually no true effect, there is a 5% probability that such a result arises from chance, unrelated to the variable (visual stimuli). This would be a Type I error, or false positive. Alternatively, our collected data could show no relationship between dopamine release and visual stimuli when, in fact, there is a true effect: a false negative, or Type II error. The probability of a Type II error is not fixed at 5%; it equals one minus the statistical power, so a study with 80% power has a 20% chance of missing a true effect.
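Both error rates can be estimated by simulation. The sketch below is a simplified model and not the dopamine experiment itself: it assumes two groups drawn from normal distributions with a standard deviation of 1, a two-sided z-test, 63 participants per group, and a true effect of 0.5 standard deviations when an effect exists. All of these values, and the function name, are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_effect, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Fraction of simulated experiments whose two-sided z-test gives p < alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = math.sqrt(2 / n_per_group)  # SE of a mean difference, unit variance
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / std_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        rejections += p_value < alpha
    return rejections / n_sims

# No true effect: every rejection is a false positive (Type I error).
type_1_rate = rejection_rate(true_effect=0.0, n_per_group=63)
# True effect present: every non-rejection is a false negative (Type II error).
power = rejection_rate(true_effect=0.5, n_per_group=63)
print(f"Type I error rate  ~ {type_1_rate:.3f}")
print(f"Type II error rate ~ {1 - power:.3f}")
```

With these assumed settings the Type I rate should land near the 5% significance level, while the Type II rate should land near 20%, because the sample size was chosen to give roughly 80% power.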
Carefully establishing inclusion criteria can be more impactful than simply adding participants, because there is a point of diminishing returns beyond a certain sample size.
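The diminishing returns can be made concrete by computing approximate power at increasing sample sizes. This is a sketch under stated assumptions: a two-sided, two-group z-test with unit variance and an assumed true effect of 0.5 standard deviations. The function name and the sample sizes tried are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (unit variance assumed)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)  # expected z under the true effect
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

previous = None
for n in (20, 40, 80, 160, 320):
    current = power_two_sample(0.5, n)
    gain = "" if previous is None else f" (gain {current - previous:+.3f})"
    print(f"n={n:>3} per group: power={current:.3f}{gain}")
    previous = current
```

Once power is already close to 1, doubling the sample adds almost nothing, so at that point effort is usually better spent on making the sample representative than on making it larger.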
We are hoping to collect data representing all humans, and we want our conclusions to be both practically significant and statistically significant. To design our sample set successfully, a sampling error, type I error (false positive), or type II error (false negative) must be accounted for and avoided.
Our experiment is testing the following hypotheses:
- Null hypothesis – There is no relationship between the timing of dopamine release in the NAc and emotionally valent visual stimuli.
- Alternative hypothesis – There IS a relationship between the timing of dopamine release in the NAc and emotionally valent visual stimuli, and peak dopamine release occurs after seeing the visual stimuli.
When the data is not statistically significant:
- We fail to reject the null hypothesis, so our hypothesis is not supported.
- No true effect or difference has been detected.
- Any observed effects could plausibly have resulted from chance.
Understanding the Population?
Practical limitations in experimental design.
In neuroscience research, formal inclusion criteria typically attempt to randomize and/or equalize the likelihood of inclusion across the population to avoid sampling errors. We need to avoid selecting individuals just because they are the closest or most accessible to collect data from, as this is a recipe for sampling error.
The best approach to sample set generation is to use inclusion criteria that randomize and equalize the likelihood of selection across the entire population. For example, using census data, we could obtain contact information for 50 randomly selected individuals in each county of Ohio. This would reduce selection bias because names would be chosen at random from every geographic area.
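The county-by-county selection described above could be sketched as follows. The registry, county names, and resident IDs are entirely synthetic stand-ins for real census contact data, and `sample_per_stratum` is a hypothetical helper written for this example.

```python
import random

def sample_per_stratum(registry, n_per_stratum, seed=None):
    """Draw a simple random sample of equal size from every stratum (e.g. county).

    `registry` maps a stratum name to a list of candidate IDs.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, candidates in registry.items():
        if len(candidates) < n_per_stratum:
            raise ValueError(f"{stratum} has fewer than {n_per_stratum} candidates")
        # random.sample picks without replacement, so no one is chosen twice.
        selected[stratum] = rng.sample(candidates, n_per_stratum)
    return selected

# Hypothetical registry: three made-up counties with synthetic resident IDs.
registry = {
    "County A": [f"A-{i}" for i in range(1200)],
    "County B": [f"B-{i}" for i in range(300)],
    "County C": [f"C-{i}" for i in range(5000)],
}
sample = sample_per_stratum(registry, n_per_stratum=50, seed=42)
print({county: len(ids) for county, ids in sample.items()})
```

One design consideration: drawing the same number from every county gives residents of small counties a higher chance of selection than residents of large ones. If equal selection probability per person is the goal, the per-county sample size could instead be made proportional to county population.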
Establishing the experimental design, increasing sample size, and fully realizing an unbiased, randomized, and equally applied set of inclusion criteria can quickly run up against practical limitations. This is an issue for scientific research at all levels, from academic exercises to full-fledged research universities. Usually, budgetary and timeline limitations are the first to force compromise. Collectively, these issues around statistical significance are active areas of research.
What is the True Effect Size?
Due to the low statistical power of neuroscience research, we tend to overestimate the true effect size, which contributes to the low reproducibility of many studies. Furthermore, the inherent complexity of neuroscience research makes adequate statistical power critical.
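The overestimation can be demonstrated with a small simulation. The sketch below assumes a true effect of 0.2 standard deviations, only 20 participants per group, unit-variance normal data, and a two-sided z-test; these values and the function name are illustrative assumptions, not figures from any real study. It keeps only the simulated studies that reach significance and compares their average reported effect with the true one.

```python
import math
import random
from statistics import NormalDist, fmean

def significant_effect_estimates(true_effect, n_per_group, alpha=0.05,
                                 n_sims=5000, seed=7):
    """Return effect estimates from simulated studies that reached p < alpha."""
    rng = random.Random(seed)
    std_error = math.sqrt(2 / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = []
    for _ in range(n_sims):
        control = fmean([rng.gauss(0.0, 1.0) for _ in range(n_per_group)])
        treated = fmean([rng.gauss(true_effect, 1.0) for _ in range(n_per_group)])
        estimate = treated - control
        # Only estimates far enough from zero pass the significance filter.
        if abs(estimate) / std_error > z_crit:
            significant.append(estimate)
    return significant

true_effect = 0.2
estimates = significant_effect_estimates(true_effect, n_per_group=20)
mean_magnitude = fmean([abs(e) for e in estimates])
print(f"share of studies reaching significance: {len(estimates) / 5000:.3f}")
print(f"true effect: {true_effect}, mean significant estimate: {mean_magnitude:.2f}")
```

Because a small study can only reach significance when its estimate happens to be large, the significant results systematically exaggerate the true effect. A replication that expects the published effect size will then be underpowered as well.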
One method the field can adopt is to increase the power of a study by increasing the sample size. This increases the probability of detecting a true effect. Choosing an appropriate sample size is vital to designing research that:
- Makes practical discoveries.
- Advances our understanding of the countless processes in the brain.
- Develops effective therapies.
Overcoming Challenges in Contemporary Neuroscience Research: The EmotivLAB Platform
Neuroscience research’s experimental designs should push to establish larger sample group sizes and better inclusion criteria in order to achieve reliable statistical significance. With a crowdsourcing-enabled platform like EmotivLAB, researchers gain access to potentially far more diverse and representative participants, improving sample size and the inclusion of all demographics with minimal additional logistical effort for research groups.
Modern neuroscience research can be vulnerable to sampling errors because limited resources are available to recruit a diverse group for the experimental sample set. The “WEIRD group” concept encapsulates the issue. Most university research is done on a shoestring budget with experimental subjects who are, generally speaking, Western, Educated, and from Industrialized, Rich, and Democratic countries. However, remote data collection equipment, like EmotivLABs’ EEG platform, allows researchers to reach beyond the college campus to recruit sample groups that better reflect the population.
EmotivLABs’ platform and remote EEG equipment do more than help researchers expand the diversity of individuals included in experimental sample groups. They also mitigate the issues regarding overall sample size and geographic reach into target populations.
The EmotivLABs platform frees researchers from the current constraints and instead allows them to focus their energy on designing experiments and analyzing the results. Our platform matches the experiment with the most suitable individuals in the subject pool. There is no need to spend time recruiting participants, coordinating and scheduling them, and performing in-lab data collection. All that is required is that the desired demographic be specified in the online platform, and EmotivLABs will make the experiment available to contributors who best conform to the desired parameters. Participants can undertake the experiments in their own homes, using their own equipment. Their familiarity with the headset removes the need for researchers to provide instruction about its use.
Beyond that, the EmotivLAB platform provides automated EEG recording data quality control and assessment. Large amounts of low-quality data do not help overcome sampling or statistical errors in experimental designs. Having access to more high-quality data, however, does provide a solution to help avoid errors in:
- Statistical significance
Want to Learn More About What the EmotivLABs Platform Could Do for Your Research?
EmotivLABS enables you to build your experiment, deploy your experiment safely and securely, recruit from a global panel of verified participants, and collect high-quality EEG data, all from one platform. Click here to learn more or request a demo.