In my course on quantifying user experience, I used statistical methods to analyze data from an existing user study that the instructor had previously conducted.
The goal was to determine how multiple variables affected user task performance.
All participants conducted four search tasks on everyday topics. Each participant performed an easy and difficult task with Interface A and an easy and difficult task with Interface B.
I started the project by compiling an initial analysis without hypothesis testing to get a better understanding of the raw data at hand. I looked at the interface and memory span effects on average task duration (s) and the interface and memory span effects on fixation duration (s). Task duration represents the length of time a user spends on a task while fixation duration represents the length of time a user rests their eyes on a particular section of the monitor.
The top two graphs suggest that Interface B is easier to use than Interface A, with users spending less time completing tasks on Interface B. The bottom graph shows that users with a higher memory span complete tasks faster than users with a lower memory span.
The top two graphs again suggest that Interface B is easier to use than Interface A, with users spending less time fixated on the screen. Interestingly, the bottom graph shows that users with a higher memory span spend longer focusing on an area of the screen.
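The group averages behind graphs like these can be computed in a few lines. The following is only a rough sketch: the record layout, the `group_means` helper, and every number in it are placeholders I made up for illustration, not values from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (interface, memory_span_group, task_duration_s).
# These numbers are placeholders, not values from the study.
records = [
    ("A", "low", 120.0),
    ("A", "high", 100.0),
    ("B", "low", 90.0),
    ("B", "high", 80.0),
    ("A", "low", 130.0),
    ("B", "high", 70.0),
]


def group_means(rows, key_index):
    """Average the last field of each row, grouped by the field at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[-1])
    return {key: mean(values) for key, values in groups.items()}


by_interface = group_means(records, 0)    # average duration per interface
by_memory_span = group_means(records, 1)  # average duration per memory span group
print(by_interface)
print(by_memory_span)
```

The same helper would work for fixation duration by swapping the last field of each record.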
The initial analysis gave me a good idea of what the data looked like, but the next step was to test for statistical significance. I used paired t-tests at a 95% confidence level (alpha = 0.05) to compare three different parts of the study and determine whether the differences were statistically significant:
Based on the data collected thus far, users overall appear to perform better with Interface B than with Interface A. The t-tests allow us to conclude that, for the easy tasks specifically, users perform better with Interface B, but we cannot draw the same conclusion for the difficult tasks because those results were not statistically significant.
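To make the paired comparison concrete, here is a minimal sketch of a paired t-test using only the Python standard library. It is not the exact procedure or data from the course: the durations are invented placeholders for eight hypothetical participants, and `paired_t_statistic` is a helper name I chose for illustration. Instead of computing a p-value, it compares the t statistic against the tabulated two-tailed critical value for 7 degrees of freedom at alpha = 0.05.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    # Work on the per-participant differences, since each person used both interfaces.
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Placeholder task durations (s) for the same 8 hypothetical participants.
duration_a = [112.0, 98.0, 130.0, 105.0, 121.0, 99.0, 140.0, 117.0]
duration_b = [101.0, 95.0, 118.0, 99.0, 110.0, 97.0, 126.0, 108.0]

t_stat, df = paired_t_statistic(duration_a, duration_b)

# Two-tailed critical value of t for df = 7 at alpha = 0.05 (from a t table).
T_CRITICAL_DF7 = 2.365
significant = abs(t_stat) > T_CRITICAL_DF7
print(f"t({df}) = {t_stat:.2f}, significant at 0.05: {significant}")
```

In practice, a library routine such as `scipy.stats.ttest_rel` reports the p-value directly, which avoids looking up critical values by hand.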
Turning to a new variable, fixation count, I tested whether task duration and fixation count were correlated. Fixation count represents the number of times a user's eyes focus on a certain area of the screen.
There was a very strong, statistically significant positive correlation between task duration and fixation count. In other words, the longer users spend completing a task, the more times their eyes fixate on areas of the screen (and vice versa).
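A correlation like this can be sketched as follows. This assumes a Pearson correlation, which is my assumption since the write-up does not name the coefficient, and the paired values are made-up placeholders rather than study data. The significance check converts r to a t statistic with n - 2 degrees of freedom and compares it to the tabulated critical value.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Placeholder observations: one (task duration, fixation count) pair per task.
task_duration_s = [60.0, 75.0, 90.0, 110.0, 130.0, 150.0]
fixation_count = [140, 170, 215, 250, 310, 340]

r = pearson_r(task_duration_s, fixation_count)
n = len(task_duration_s)

# Significance test for r: t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2.
t_stat = r * sqrt((n - 2) / (1 - r ** 2))
T_CRITICAL_DF4 = 2.776  # two-tailed, alpha = 0.05, df = 4 (from a t table)
print(f"r = {r:.3f}, significant at 0.05: {abs(t_stat) > T_CRITICAL_DF4}")
```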
Lastly, I conducted a two-way ANOVA to examine the effects of memory span and interface on task duration and fixation duration. The descriptive statistics had given me an idea of what to expect; the ANOVA would tell me whether the patterns they suggested were statistically supported.
The website interface was the only variable with a statistically significant effect on both task duration and fixation duration. Neither memory span nor the interaction between memory span and interface had a statistically significant effect on task duration or fixation duration. Based on these results, we can infer that participants took less time completing tasks on Interface B than on Interface A, and that they spent less time focusing on parts of the screen with Interface B than with Interface A.
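For readers curious about the mechanics, below is a minimal sketch of a balanced two-way ANOVA in plain Python. It rests on several assumptions: the data are invented placeholders chosen only to show an outcome of the same shape (they are not the study's results), the `two_way_anova` helper is my own illustration, and every observation is treated as independent, which ignores the fact that each participant in the study used both interfaces. Significance is judged against the tabulated critical value F(1, 12) = 4.75 at alpha = 0.05.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells maps (level_a, level_b) -> list of observations, with the same
    number of observations in every cell. Returns a dict of
    (F, df_effect, df_error) for both main effects and the interaction.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if len(cells) != len(levels_a) * len(levels_b):
        raise ValueError("every combination of levels needs a cell")
    if n < 2 or any(len(obs) != n for obs in cells.values()):
        raise ValueError("design must be balanced with at least 2 values per cell")

    grand = mean(x for obs in cells.values() for x in obs)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_cell = {key: mean(obs) for key, obs in cells.items()}

    # Sums of squares for each source of variation.
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_error = sum((x - mean_cell[key]) ** 2 for key, obs in cells.items() for x in obs)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "factor_a": (ss_a / df_a / ms_error, df_a, df_error),
        "factor_b": (ss_b / df_b / ms_error, df_b, df_error),
        "interaction": (ss_ab / df_ab / ms_error, df_ab, df_error),
    }


# Placeholder task durations (s): factor A = interface, factor B = memory span.
durations = {
    ("A", "low"): [120.0, 130.0, 125.0, 135.0],
    ("A", "high"): [118.0, 128.0, 122.0, 132.0],
    ("B", "low"): [95.0, 105.0, 100.0, 110.0],
    ("B", "high"): [92.0, 102.0, 98.0, 108.0],
}

results = two_way_anova(durations)
F_CRITICAL_1_12 = 4.75  # alpha = 0.05, df = (1, 12), from an F table
for effect, (f_value, df_effect, df_error) in results.items():
    verdict = "significant" if f_value > F_CRITICAL_1_12 else "not significant"
    print(f"{effect}: F({df_effect}, {df_error}) = {f_value:.2f} ({verdict})")
```

A statistics package would normally report exact p-values and could model the repeated measurements per participant, which this simplified version does not attempt.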
Completing this project reinforced for me the importance of statistical analysis when trying to infer information from data. The initial analysis told me one story, but the statistical analysis showed me which parts of that story were reliable.
Because the ANOVA showed that the effect of memory span on both fixation duration and task duration was not statistically significant, I cannot rely on the memory span patterns from the initial analysis. Because the ANOVA showed that the effect of interface on both fixation duration and task duration was statistically significant, I can rely on the interface patterns from the initial analysis.
Building on that story, the t-test comparisons found a significant difference between the interfaces only for the easy tasks, not for the difficult tasks.
Putting it all together, we can reliably infer only that the significant difference between Interface A and Interface B comes mainly from the easy tasks. Therefore, users spend less time on easy tasks when using Interface B. We can also reliably infer that the interface has an effect on fixation duration: users spend less time focusing on parts of the screen when using Interface B.
Overall, as a user experience researcher I would recommend the use of Interface B over Interface A.