Faulty memories have served us just fine through evolutionary time

Let’s talk about our memory experiment. There were many features of memory that were of interest when I designed this experiment and I will touch on all of them. But this is a long, actually very long post. So, I will start with the points that I think are the most interesting. The most interesting of all is that, when push comes to shove, we preferentially remember the gestalt over the details. I would argue that the gestalt-over-details approach has worked for us mammals through evolutionary time.

To remind you of the task, I embedded the original video, in which viewers are asked to remember five words, above. The five words were on the screen for about 13 seconds (at the end of the video). If you want to cut to the chase and simply see the slide with the five target words, here it is:

This slide was presented for approximately 13 seconds. Viewers were asked to remember the words without writing them down or recording them.

Respondents correctly recalled about half of the words  

Most respondents (76%) remembered that there were five words to recall, but most (71%) listed fewer than five. In fact, the number of words listed averaged less than three (2.8). Overall, the average number of words correctly recalled was 2.2 (standard deviation 1.2; note that because of the large number of respondents, the standard error is tiny in this and all measures examined, and is therefore not useful as a descriptive statistic).
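To put a number on how small that standard error is, here is a minimal sketch. It assumes the total respondent count of 4149 reported further down applies to this measure, and it uses the usual formula (standard deviation divided by the square root of the sample size). The variable names are mine.

```python
import math

sd = 1.2   # standard deviation of correctly recalled words (from the post)
n = 4149   # total respondents (from the post); assumed to apply to this measure

# Standard error of the mean: SD divided by the square root of the sample size.
standard_error = sd / math.sqrt(n)
print(f"standard error = {standard_error:.3f} words")
```

Under these assumptions the standard error comes out to roughly 0.02 words, which is why the standard deviation is the more informative number to show.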

The number of correctly remembered words as a function of the number of words attempted. The bars show the standard deviation for each group.

Of the words listed, most (73%) were correct.

The proportion of answers that were correct rose with the number of attempted words.

Many of you are surely wondering whether misspelled answers were counted. We tried to account for spelling so that misspelled, but clearly correct answers (e.g. persimone instead of persimmon) were counted as correct. Of course, I am sure that we did not catch all of these misspellings. So the true proportion of correctly remembered words is likely somewhat higher than calculated.
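For the curious, one way to catch near-miss spellings automatically is fuzzy string matching. The sketch below is only an illustration and is not how the answers in this experiment were actually scored. The function name `match_target` and the 0.8 similarity cutoff are my own assumptions; `difflib.get_close_matches` is part of the Python standard library.

```python
import difflib

# The five target words from the slide.
TARGET_WORDS = ["canoe", "fifty-five", "hedgehog", "redwood", "persimmon"]

def match_target(answer, cutoff=0.8):
    """Return the target word that `answer` most plausibly misspells, or None.

    `cutoff` is an assumed similarity threshold (0 to 1), not a value
    taken from the experiment.
    """
    candidates = difflib.get_close_matches(
        answer.strip().lower(), TARGET_WORDS, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None

# A misspelling, a sound-alike, and a same-category wrong answer.
for answer in ["persimone", "canopy", "kayak"]:
    print(answer, "->", match_target(answer))
```

With this cutoff, persimone is accepted as persimmon, while canopy and kayak are rejected. A lower cutoff would catch more misspellings but would start to accept sound-alikes such as canopy, so the threshold is a judgment call.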

Remembering the gestalt over the specifics

The incorrectly “remembered” words were not random guesses. Most of the incorrect answers shared membership in a group that also contained one of the words presented. A less common type of incorrect answer was a word that sounded similar to one of the target words, for example canopy instead of canoe, or parmesan for persimmon. In both cases, the wrong answer is clearly related to a target word, either by category or by sound.

A very few respondents, perhaps for amusement, listed words completely unrelated, either categorically or phonetically, to the five target words. For example, one person listed spinal cord, forebrain, brainstem, and so on, whereas another listed brain, neuron, synapses, vision and eletric [sic]. All answers that I spotted in this category (I spotted 3 but there are probably several more; such random answers are very hard to search for) included words that were highly related to the course.

While most incorrect answers were part of the correct group, they were not the correct member of that group. For example, kayak, porcupine and pomegranate were popular wrong answers that were clearly related to canoe, hedgehog and persimmon. Far more respondents remembered that there was a number than recalled what that number was. For example, one wrote, “27 (there was a number for sure)”. We can infer that these incorrect answers were primed by the original words. This means that although the answers were not completely faithful to the facts, they were highly influenced by the original words. Put another way, no one guessed protest or discus or licorice although these words are about as frequently used as canoe or redwood or persimmon (see more below in Which words were remembered?).

Speaking of priming, the number of words recalled as well as the proportion of correct responses would be greater if the words could be chosen from a list of potential words. This holds even if there are distractors in the list. As anyone who has tried to learn a language as an adult knows, coming up with a word is far more difficult than recognizing a word.

What does our penchant for remembering categories over specific members tell us about memory? Basically, we are far better at remembering meanings or associations than exact words. I am sure that you easily recall conversations that you had that were important to you. And I bet that you can have a strong memory of how you felt. Yet, I imagine that you would have a hard time reconstructing all the specific words used. And that if you did attempt to recall the conversation verbatim, it would be correct only in its gist and would be far from accurate in the particulars of word choice and arrangement.

Just in case you think you’re immune to a faulty memory, let’s take an example from pop culture. The most famous line in one of the greatest movies of all time, Casablanca, is “Play it again, Sam,” usually attributed to Rick, played by Humphrey Bogart. All well and good except that this precise line is not actually in the film. The closest is “Play it, Sam,” which is spoken by Ilsa rather than Rick. The word again is never said in close proximity to any of the instances when Sam is asked to play As Time Goes By. My guess is that we remember the again because Rick is asking Sam to play the song for an additional time. Thus, the spirit of the remembered quote is correct even if the precise words are not.

On a personal note, I am currently in Paris teaching at the UChicago Paris Center (6 Rue Thomas Mann). I am living in a UChicago-owned apartment a block away. I taught in the same setup in January 2013 and occupied the exact same apartment for 3 weeks. Now, two years later, I still have a vivid memory of when the washing machine locked up on me, capturing my clothes and sending me into a panic. When I arrived a few days ago, I did not remember the correct order of the buttons to push on the washer (=the details). But my memory of my previous bad experience (=the gestalt) was enough to ensure that I remembered to ask the Center director for instructions. This does not surprise me. What did surprise me is that I had 1) no recall of how to use the range burners; and 2) no recall that the range was tricky. Note that I used the range every day to cook at least breakfast and lunch. Even with introspection, I cannot recall learning the rigid sequence of 4 buttons to push, but I am guessing that after not being able to cook on my first night here in 2015, I will remember the range’s trickiness in 2 years’ time. Remembering the gestalt of emotionally laden events far more reliably than remembering mundane details is a great evolutionary strategy.

Faulty, not false, memory

Our relatively stronger ability to recall categories over specific words has been consistently demonstrated using the Deese-Roediger-McDermott paradigm. In this paradigm, subjects see a list of words such as bed, pillow, dream and so on. They are then asked to either produce or choose from a menu all the words that they previously viewed. People tend to recall sleep as one of the words although it was not in the original list. Subjects even feel very confident that the “lure” word, in this case sleep, was listed.

The confident recall of words that did not occur has been referred to as false memory. While technically true, I find this perspective overly lexical and underwhelmingly biological. What I mean by that is that recalling bed, pillow, dream vs recalling bed, pillow, sleep has zero implications for survival and reproduction. This is particularly true when you’re performing this task in a psychological laboratory. The semantic choice is essentially optional, without biological repercussions that evolution could ever have acted upon. So, if remembering word-for-word text is the goal, then humans are not the way to go. Instead I suggest a tape recorder (or rather an audio recording; tape recorder shows my age). A better term for lexical mistakes of either omission (forgetting a target word) or commission (recalling a word that was not one of the target words) is faulty memory.

Others with far more expertise than I in the field of memory agree that inventing a word in a recall test within the context of a controlled experiment is very different from a false memory. As Kathy Pezdek and Shirley Lamwell explain in their highly accessible article, the term false memory was introduced in the early 1990s to refer to a biographical event or episode that, although made of whole cloth, so to speak, is recalled with deep confidence and certainty. They go on to write:

“Using the term ‘false memory’ to describe all memory flaws would be like using the term ‘elements’ to describe all types of matter. Yes, all elements are matter, but all chemical principles that describe elements do not apply to all of the other subtypes of matter (i.e., compounds, mixtures, etc.), and to assume so would misinform the development of the field.”

Wikipedia sums up the objections of Drs Pezdek and Lamwell particularly well: “…it is inappropriate to compare the recognition of a word with the implantation of a memory for an entire childhood event.”

Which words were remembered?

I discussed this experiment with a colleague who, unlike me, actually studies memory. He remarked upon how idiosyncratic the chosen words were; not the typical dog-cat-table-room words. After my native English-speaking colleague repeatedly asked me what a persimmon was, I had this sinking feeling that my choice of words may have been ill-advised. However, the die was cast: the video was shot and there was no going back.

While it was not possible to change the chosen words, I thought it might be possible to figure out some measure of their usage frequency. Luckily, it only took me a few google-clicks to get to Wiktionary:Frequency lists, which lists the most common words in a dizzying number of languages. For English, there are lists, in sets of 10,000, that have been garnered from all books in Project Gutenberg. Because the frequency listed in Wiktionary depends on words in books that are in the public domain, the books and words are weighted toward older classics. Thus the frequency of word usage in Project Gutenberg books does not perfectly reflect modern parlance, either written or spoken.

So here are our words with their ranks in the English frequency list, from the most frequent to the least frequent:

  • Canoe: 3,716 (sandwiched between renewed and protest)
  • Fifty-five: 13,536 (of note, five comes in at 416, fifty at 962; and another double-digit multiple of 5, forty-five, comes in at 8,831)
  • Hedgehog: 17,921 (sandwiched between citation and transmutation)
  • Redwood: 21,504 (sandwiched between discus and eavesdropper)
  • Persimmon: 24,965 (sandwiched between licorice and cubicle)

Thus, according to Wiktionary, the frequency of the 5 words is ordered as follows:

canoe >> fifty-five > hedgehog > redwood > persimmon

where >> indicates roughly twice as great a difference as >. Not surprisingly, we remember words in order of their frequency, so that canoe was correctly recalled by 53% of respondents whereas persimmon was remembered by only 36% of respondents. In fact, the relationship between word frequency and recall rate is strong (r = 0.82, r² = 0.67). Yet, there are two deviations from the expected relationship. Hedgehog is remembered slightly better than would be expected (48.0% vs 45.6%) and fifty-five is remembered slightly worse than would be expected (39.5% vs 42.5%). The slightly poorer memory for the number fifty-five than expected by word frequency alone is likely due to the lack of an obvious association for 55. In contrast, the word hedgehog can refer to an animal, a video game, or an important signaling molecule so that the modern frequency of hedgehog usage is likely far greater than what is suggested by the books included in Project Gutenberg.
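As a quick check on these figures, the sketch below squares the reported correlation and computes the two deviations as observed minus expected recall. It uses only numbers quoted in this post. I am assuming the "expected" values come from a straight-line fit of recall against word frequency, and the variable names are mine.

```python
# Reported correlation between word frequency and recall rate.
r = 0.82
r_squared = r ** 2  # share of the variance in recall accounted for by frequency

# Observed vs expected recall rates (percent of respondents) quoted in the post.
observed = {"hedgehog": 48.0, "fifty-five": 39.5}
expected = {"hedgehog": 45.6, "fifty-five": 42.5}

# Positive residual: remembered better than word frequency alone would predict.
residuals = {word: round(observed[word] - expected[word], 1) for word in observed}

print(f"r squared = {r_squared:.2f}")
for word, residual in residuals.items():
    print(f"{word}: {residual:+.1f} percentage points")
```

Both deviations are only a few percentage points, so word frequency accounts for most, though not all, of the differences in recall.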

The influence of English proficiency on word recall

Considering that I expected the majority of students, and therefore respondents, to come from outside of North America, I was very worried that my highly idiosyncratic choice of words was seriously ill-advised. True to expectations, 57% of the MOOC students did not come from countries where English is either the primary language (USA, Australia) or a primary language (UK, Canada). The proportion of students who responded to the memory survey and were native English speakers was 47%, very close to the 43% of all NeuroMOOCers from the USA, Australia, UK, and Canada.

However, the influence of English fluency on word recall was not overwhelming. An influence was present, no doubt, but it was only dramatic between those with the best and worst English skills, between those with a basic knowledge and native speakers:

The proportion of respondents with a basic knowledge of English and of native speakers who correctly remembered 0 to 5 words.

Moreover, nearly as large a proportion of respondents with a basic knowledge of English remembered all 5 words as did respondents who were native English speakers.

The proportion of respondents who remembered all 5 words correctly, according to their fluency in English.

The ability of non-native English speakers to do as well as they did is remarkable to me. Presently I am in Paris, France. It is incredibly difficult for me to summon up French words, despite taking French through high school and having French-speaking relatives. One respondent reported her process thus, “My native language is French. Persimmon is a fruit I discovered just a few weeks before the test and it’s called that way in French (or “kaki”). Hegdehog is the second word of the list, I think : I had to search for its meaning in an English dictionary at the time I was confronted to the test. I remembered the meaning, but not the word itself (I had to search it back in the dictionary).” It would appear then that this individual used a general recall with French words and then reverse translated the word back into English.

English language proficiency did not alter the proportion of respondents who correctly recalled different words. With one exception, there was roughly a 20 percentage point drop between the proportions of respondents that recalled canoe and persimmon, the most and least often recalled words, respectively (basic, 19%; intermediate, 21%; proficient, 21%; fluent, 18%). The exception was native speakers, among whom there was less of a drop-off between the rates of recall for canoe and persimmon (16%). Additionally, a smaller proportion of native speakers actually remembered fifty-five than remembered persimmon (43% vs 44%). In sum, it would appear that correct recall is more dependent on word frequency among non-native English speakers than among those for whom English is their mother tongue.

Interval of time between memory formation and recall and age of respondent

Surveys were sent out at three time points: after the 1st, 4th and 8th weeks. There was a small drop-off in recall between the 1st and 4th weeks but not an appreciable difference between the 4th and 8th weeks. This pattern held across all words as is clear in the following graph:

This shows the proportion of correct recall across time and for each word.

One might expect that with age, recall would become increasingly susceptible to the ravages of time. However, there was no difference in how respondents in different decades recalled words across time. In all age groups, recall was best after week 1 and slightly worse afterwards. The only exception to this was in the over-80 crowd. When asked after week 8, respondents who were over 80 recalled only one word correctly on average; this was less than the average of 2.0-2.2 words recalled by all other age groups at this time point. However, I would take these data with a cow lick of salt as only 5 respondents fell into the over-80 age group queried after week 8.


Mnemonics were helpful. People who used mnemonics attempted more words and correctly recalled more words. This is not surprising. What is a bit surprising to me is that respondents correctly reported whether the mnemonics helped them or not. Among those who reported that they used a mnemonic and that it was helpful, 3.7 words, on average, were recalled correctly. In contrast, an average of only one word (yup, 1.0) was recalled correctly by respondents who reported that they used a mnemonic and that it was not helpful. This latter average is even lower than the average for those who did not use a mnemonic at all (1.7 words). This is surprising to me because in general, people are inaccurate when they attempt to report their internal motivations.

More words were attempted and more words were correct when people used mnemonics.

The mnemonics mentioned in the comment box were amusing. There were many variations of “fifty-five hedgehogs sitting in a redwood canoe eating persimmons.” There were also many people who reported using images that they did not describe. Here is a selection of the verbalized mnemonics (with corrected spelling):

fifty-five people in a canoe looking for a hedgehog [missing the persimmon]

I pictured a number (the number that was mentioned) of redwood canoes going down stream being piloted by the animals that were mentioned – not sure if it was Koala or hedgehog but a small, not everyday animal, one that could be held

a 55-year old hedgehog making a canoe out of redwood, and filling it with pomegranates. Hope that wasn’t cheating (I didn’t write it down, *ever*).  [No, not at all] If anything is wrong, it’s that pomegranate *might* be persimmon [bingo]

while in my canoe I saw a hedgehog eating a persimmon under a redwood tree near route 55

I pictured a hedgehog paddling a canoe past a redwood tree while holding a persimmon but if I discover I should have pictured an alligator riding a bicycle past a telephone pole with an ice cream cone I wouldn’t be totally surprised

CHP RF  (California Highway Patrol Request For – plan, quote etc ..) helped with the first letters and the order of the words

I used a poem to remember the words – the owl and the pussy cat poem and substituted the words i.e. hedgehog for owl, persimmon for pussy cat, canoe made of redwood for a beautiful pea green boat and fifty five pound note for five pound note

a hedgehog in a boat by the lake eating a persimmon. The canoe is redwood with 55 embedded. The 55 is what I am now unsure of

a hedgehog riding in a canoe race through the redwoods, eating a persimmon, with the race number “55” on its back

a hedgehog in a red canoe gathering falling persimmon from a redwood tree it had just collided with while traveling 55 mph

January I went to Canoe Museum in Peterborough ON Canada, I am 55 yrs old, persimmons are delicious – one of my favorite fruits, I’ve never seen a hedgehog, red woods are beautiful


The demographic characteristics of the 4149 respondents are far more varied than is the case in traditional, in-place experiments that typically tap the college student population. The distribution of the respondents’ ages is shown here:

A histogram showing the distribution of ages for all respondents.

Clearly the over-80 crowd was under-sampled (11 respondents) but the rest of the age distribution is well represented with a minimum of more than 200 respondents in each bin. The gender representation was skewed toward females (62%), to a similar extent as the overall enrollment for Understanding the Brain. English proficiency was in general high, with almost half of the respondents being native speakers (47%) but there were students with only a basic knowledge of the language (2%). The remainder reported that they were intermediate (9%), proficient (16%), or fluent (3%).

Neurological conditions

Neurological conditions were reported by about 9% of the respondents. Conditions were varied. Most of the reported conditions would not be expected to alter memory; e.g. peripheral neuropathy, foot drop, lumbar stenosis, Parkinson’s disease, multiple sclerosis, autism, PTSD, depression. And indeed, respondents who reported a neurological condition did not perform any worse on the word recall test (they were in fact slightly better than those without a neurological condition, 2.3 vs 2.2 words recalled correctly).

Selective attention

I put the words in 5 different colors in order to test how people recall items that they are not specifically asked to attend to. Spoiler alert: If you want to try this test before I tell the spoiler, don’t read on; instead click on this video.

Okay, you’ve been warned. In the most famous selective attention test, a crowd of people wearing either dark or white t-shirts is milling about and passing 2 basketballs around. You are asked to count the number of passes from white-t-shirted people. Concentrating on this (very difficult) task, most people don’t see the person in a gorilla suit walking through. By the time that I watched it for the first time, I already knew about the gorilla and I did indeed see the gorilla. But, amazingly, even as attuned as I was to the point of the experiment, I totally missed that the gorilla beats his chest. Amazing!!! That is what I call a robust finding.

When I colored the words, I was basically trying to do a baby version of the invisible gorilla experiment. It was not ideal but it basically proved the point. Regardless of color blind status, people guessed that there were 3 different colors. Many of the comments were that people had not noticed the colors. A small minority not only noticed the colors but remembered which words were in which colors (mostly correctly).


  1. I do wonder about the recall process relative to the brain and evolution GIVEN that hypnosis has shown that complete and accurately detailed memories are stored. It would seem in view of the hypnosis experiments that there is some evolutionary aspect related to behavior as to how memories are recalled.
    This is a MOST interesting observation: “The gender representation was skewed toward females (62%), to a similar extent as the overall enrollment for Understanding the Brain.”; I wonder whether that has some sociological aspect to it. It’s a statistic that certainly lends itself to stereotyping. 🙂
    FWIW: “The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors. ”

    Might have something to do with the results relative to the test.


  2. As one of the participants in the experiment, I thank you for sharing all these results. Very interesting! 🙂

    ps — I hope you have plenty of opportunity to enjoy Paris, when you’re not wrestling with appliances! 🙂


  3. I don’t think I could have remembered the words in the short time that they were shown in the video. I’m surprised at the number of people who remembered all the words.
    It’s interesting to me that by trying to remember all the words, I missed the opportunity to create a simpler mnemonic with fewer words. If I and those who actually participated had chosen to only remember 3 words, I suspect that the average words recalled would have been higher. Perhaps much higher because making it easier involves less stress. Anxiety about getting it right probably interferes with learning and memory. After three words were easily rooted in a mnemonic, participants could choose to add the other 2 words if they remembered them.


