AI: Actual Intelligence!

  • Blog
  • by Dr. Tim Holmes
  • 6 min

Another day, another article reporting further advances in the predictive accuracy of an AI model lands in my inbox. This one, called Centaur, created by researchers at Helmholtz Munich, claims to mimic human thinking and predict decision making with “striking accuracy”¹.

This post isn’t a review of that model, which is indeed impressive when you consider the training data the algorithm uses: a dataset called “Psych-101” which includes more than 10 million decisions collected from 60,092 participants in 160 behavioral experiments.

Sounds great, right? But this is precisely where I invite you to pause and reflect for a moment. Anyone who has ever read a psychology research study will know that the majority of participants in any academic study are psychology students; it’s one of the standard criticisms you’re taught to mention when critiquing papers as part of a psychology degree.

Why does this matter? 

Simply put, and with the best will in the world, psychology students are not that representative of the public at large². This is one of the reasons many lab-based studies struggle with replication even in different labs, let alone in the real world: the results of many psychology studies are directly tied to the participants and the precise conditions used for the studies in question.

Using eye tracking in psychology studies helps researchers understand individuals' preferences, biases and behaviors.

The world of attention research, whether it be for media, shopper, wayfinding, human performance or application/interface optimization, has, inevitably, seen the same rise in popularity of AI predictive models as almost every other facet of life in the three years since ChatGPT was unleashed upon the world. In fact, predictive models of attention have been around much longer than that; in particular, Visual Salience models, such as that first proposed by Itti & Koch³, have been growing in popularity and number since 2001. Interestingly, these were never actually intended to predict eye movements, a point I’ll come back to later, but they were, and still are, very much predictions of the likely allocation of attention, most usually visualized in a spatial map that looks very much like an eye tracking heat-map.
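If you want a feel for what these models produce, here is a minimal sketch using OpenCV’s spectral-residual saliency detector as a convenient stand-in for the broader family of salience models (it is not the Itti & Koch implementation, and “ad.png” is a placeholder filename):

```python
# A rough sketch of generating a bottom-up saliency map for a design,
# using OpenCV's spectral-residual detector as a stand-in for salience
# models generally (requires the opencv-contrib-python package).
import cv2

image = cv2.imread("ad.png")  # placeholder: any design image

saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
success, saliency_map = saliency.computeSaliency(image)

if success:
    # Bright regions are those predicted to pull involuntary attention,
    # much like the hot spots in an eye tracking heat-map.
    cv2.imwrite("ad_saliency.png", (saliency_map * 255).astype("uint8"))
```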

I’ll be totally honest, my experience with many of these early models had left me somewhat unimpressed, since they tended to do a great job of mapping involuntary attention from the first 1–2 seconds of viewing, but failed miserably at mapping voluntary attention, or its associated eye movements, which, as we have known for decades, is primarily task-driven rather than stimulus-driven³. More recent developments, including a broadening of data types to include EEG, which can provide deeper insights into attention than most simple eye tracking studies, have resulted in significantly improved levels of sophistication thanks to AI algorithms trained on massive datasets using a range of biometric and behavioral markers.

So, after that rather lengthy preamble, let’s get to the main question of this post...

Predictive attention models — what are they good for, what are their weaknesses and what opportunities do they present for the savvy behavioral researcher? 

The Good 

Firstly, I’m not going to single out specific models or algorithms.  As part of my research for this post, I played with many of the leading algos and spoke with representatives from several key commercial players, so feel free to reach out to me directly if you want my personal opinions.  Instead, I am going to discuss what they are generally good for. 

As with all artificial intelligence, the algorithms are only as good as the data they are trained on. If there are biases in the data then these will almost certainly be reflected in the outcomes from the algorithm, so it’s certainly worth asking any provider about their training data before you invest.

That said, these algorithms are generally very good at predicting “average” attention, in other words, attention in the absence of any specific participant segmentation or any specific task. As such they are great tools for early-stage testing of designs. I genuinely believe every designer should be testing concepts with these before ever presenting options to a client. Basically, they level the playing field, meaning you can test both your own, and other people’s, designs against “common knowledge”, by which I mean the knowledge of how attention works based on years and years of neuroscience research. I’ve never yet seen a prediction from one of these algorithms that a vision science/attention expert could not have identified simply from their knowledge of the research, but people like me don’t come cheap and aren’t always available, meaning AI provides the chance for a start-up to pre-test in the same way a large FMCG company might.

But herein lies a problem… 

AI models cannot predict if a customer will choose brand A or B. Wearable eye trackers help you see the choice process.

The Bad 

…well, actually, several problems. 

  • Current AI models base their entire knowledge on what’s already known, meaning they are extremely unlikely to provide any earth-shattering insights into your designs. This sort of insight usually only comes from innovative research paradigms, with the right participants and, almost certainly, some qualitative component as well as an objective methodology like eye tracking, or even EEG.

  • The “what’s already known” might actually be completely irrelevant to your particular needs. Remember, predictive models of attention are trained on participant data, and so in order to be relevant to you, those participants need to match your user base or target audience. This can be hard to achieve if, for example, you are launching a product rebrand and your question is “how do I grow my customer base without alienating any of my existing customers?” Questions like this require testing against specific groups of participants, typically called cells in market research, and comparing the results. Currently this is something AI models just can’t do; in fact, you would probably require a locally trained version of those models based on your customer base.

  • As I mentioned in the introduction, predictive attention algorithms have their roots in Visual Salience models, which were very good at predicting involuntary attention – that is, the kind of attention which is automatically pulled by things that stand out or are conspicuous because they are unexpected. So the influence of brightness, color, motion, and even loudness and pitch for audio stimuli, was typically well predicted by these models³. Unfortunately, many of them rather brushed over the role of voluntary attention, which takes slightly longer to kick in, typically dominates the allocation of attention long-term (by which I mean after around 2 seconds), and is guided by high-level cognitive processes such as task goals, intent, reward, preference and desire. These, of course, are highly contextual in terms of your research questions and your participants, and can vary over time and repeated exposure, meaning a simple prediction with no task knowledge, based on a generalized set of participants, really isn’t going to tell you that much at all!

  • Lastly, but actually quite importantly, as I mentioned before, predictive attention models were never intended to predict eye movements, and to be honest, they still don’t today. This means that if it’s the actual eye movements you need (e.g. the sequence of fixations, their durations, returns to areas of interest, or regressions in the case of text processing), these models aren’t going to help; for that you are going to have to use an eye tracker. For many of you this might not be an issue because it is the higher-order concept of attention that you are interested in, but the whole reason that eye tracking and attention are so often used in the same sentence is that measures like those I have just mentioned are often essential to understanding where problems like ambiguity or confusion lie in a design. In human performance studies, it is often the automatic, non-conscious eye movements that represent the difference between a novice and an expert.
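For anyone unfamiliar with these measures, here is a minimal sketch, with made-up fixation data and hypothetical area-of-interest (AOI) names, of the kind of metrics an eye tracker yields and a predictive map doesn’t:

```python
# A rough illustration (hypothetical data) of basic eye-movement measures:
# fixation counts, dwell times, and returns (revisits) per AOI.
from collections import defaultdict

# Each fixation: (AOI it landed in, duration in ms), in viewing order.
fixations = [("logo", 180), ("price", 240), ("claim", 210),
             ("price", 320), ("logo", 150), ("price", 280)]

stats = defaultdict(lambda: {"count": 0, "dwell_ms": 0, "returns": 0})
previous = None
for aoi, duration in fixations:
    stats[aoi]["count"] += 1
    stats[aoi]["dwell_ms"] += duration
    # A return is a fixation on an AOI the viewer had already left.
    if stats[aoi]["count"] > 1 and previous != aoi:
        stats[aoi]["returns"] += 1
    previous = aoi

for aoi, s in stats.items():
    print(aoi, s)  # e.g. price: 3 fixations, 840 ms dwell, 2 returns
```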

Predictive AI models cannot access what an expert vs. a novice sees. Eye tracking can accurately show you the differences.

The Opportunity 

You might now be asking yourself “are these predictive attention models worth bothering with?” As I mentioned at the start, when I started researching these models a few years ago my answer would’ve been “no”, but this is technology we’re talking about, and it never stands still. Some of these algorithms have improved almost beyond recognition in the past five years, but the limitations I mentioned above still frequently apply. In a new book co-authored with Roger Jackson of Shopper Intelligence, titled The Nursery Rhyme Conundrum⁴, we discuss the authority being given to AI, and I firmly believe we need to continue to see it as just that, a tool, and not a complete answer. So how can you leverage the best these algorithms have to offer to give yourself a competitive edge?

  1. The algorithms are out there. There’s no escaping that, which means any competitor can use them to learn about your products. Why give them that advantage if you’re not going to use it yourself? See how your designs stack up against the competition from a “general attention” perspective; the insights from this could be critical, especially given the dynamism of the marketplace. EVERY TIME a competitor changes a product design you should be re-running this comparison, because when it comes to attention, success is highly contextual.

  2. Don’t stop there. One of the best ways to appeal to your customers is to show how well you KNOW your customers. Testing based on general participants, and especially models trained on psychology students, will never give you the level of insight you will get from testing attention with your unique customer base.  For example, we know that attention distribution can vary with age, gender and nationality, so without this level of segmentation you’re already in trouble.  The only way to show you know your customers is to test on your customers. 

  3. Be the best. If you are a luxury brand, it’s just not enough to be generic. You need to be different. One look at the endless stream of LinkedIn posts showing suggested redesigns of ads, interfaces and packages based on AI algorithms will confirm that they look EXACTLY like they were generated by an AI, and not crafted by a designer. This is precisely what a luxury brand needs to avoid, unless of course it’s trying to be ironic. More and more we see the suggestion that AI can even stand in for participants in a study, but these “participants” will certainly not represent high-end customers.

  4. The unexpected matters. Everyone likes a good eye tracking heat-map, and it’s no coincidence that predictive attention algorithms produce a similar-looking output. I have long been a critic of their misuse in eye tracking, so I’m clearly going to highlight a key issue they have here. Outliers in eye tracking studies, which are typically excluded from heat-maps because they can skew the representation, are frequently some of the most informative data points when it comes to non-obvious insights. A classic example, which I encountered early in my PhD days, was what appeared to be a complete breakdown of my “unconscious preference detection” algorithm, which was supposed to provide an optimal design as output. When I tested it on my partner, it gave me two equally likely designs: one was mostly red, the other mostly green. He is, of course, red/green colorblind, and so the two designs were actually the same single result, revealing a whole new potential for my algorithm! Without the actual eye tracking results you will never know outliers exist, and more importantly, why they were behaving the way they were. Relying entirely on a predictive attention algorithm based on the general population removes any potential for learning from outlier behavior; this is especially relevant for UX and human performance research.
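To make that concrete, here is a minimal sketch, using entirely made-up fixation coordinates and a simplified exclusion rule, of how a fixation heat-map is typically built and how routine outlier exclusion can silently discard exactly the fixation that carries the insight:

```python
# A rough illustration (hypothetical data) of duration-weighted heat-map
# construction and a common outlier-exclusion step.
import numpy as np

def fixation_heatmap(fixations, durations, shape=(600, 800), sigma=40.0):
    """Accumulate a duration-weighted Gaussian at each fixation point."""
    heat = np.zeros(shape)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    for (x, y), d in zip(fixations, durations):
        heat += d * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()

# Made-up viewing data: most fixations land on the product, but one long
# dwell sits on the small print in the corner.
fix = np.array([[400, 300], [410, 310], [395, 290], [60, 560]])
dur = np.array([250, 300, 220, 900])  # durations in milliseconds

# A typical (simplified) exclusion rule: drop fixations more than
# 2 standard deviations from the centroid of all fixations.
dist = np.linalg.norm(fix - fix.mean(axis=0), axis=1)
keep = dist < 2 * dist.std()
print("excluded:", fix[~keep])  # the long, potentially informative dwell

heat_all = fixation_heatmap(fix, dur)                   # the whole story
heat_filtered = fixation_heatmap(fix[keep], dur[keep])  # outlier erased
```

Real studies use more principled criteria, but the effect is the same: once the outlier is filtered out, the map can no longer tell you it was ever there.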

Quite simply, nothing, right now, can replace testing real designs with real people performing real tasks.  This is not artificial intelligence, it’s actual intelligence. 

References 

  1. Binz, M., Akata, E., Bethge, M., et al. (2025). A foundation model to predict and capture human cognition. Nature.

  2. Hanel, P. H., & Vione, K. C. (2016). Do student samples provide an accurate estimate of the general public? PLoS ONE, 11(12), e0168354.

  3. Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

  4. Jackson, R., & Holmes, T. (2025). The Nursery Rhyme Conundrum. Pantheon Publishers.

Written by

  • Dr. Tim Holmes

    Independent Neuroscientist, Researcher and Educator

    Dr. Tim Holmes is a visual neuroscientist who researches the role that environment and design play in decision making and behavior. He is recognized as a leading authority on eye tracking and visual attention and has worked with top brands, retailers, architects, content creators, and sports teams to educate on, and develop, behavioral interventions. Tim also works with many academic institutions and is an award-winning educator and public speaker on the application of neuroscience to behavioral influence.
