4  Experiment 4

Online Processing


4.1 Motivation

One of the most common objections to singular they is that both generic and specific forms are too ambiguous (Hekanaho, 2020). However, it is unclear whether this opinion arises from actual difficulties in coreference resolution, or whether it is more a product of attitudes about language and gender. According to the processing fluency account (Alter & Oppenheimer, 2009), processing difficulty and language attitudes are connected both directly and indirectly. First, processing fluency can be a cue to language attitudes: if listeners attribute their difficulty understanding a speaker to that speaker being unable or unwilling to communicate clearly, experiencing less processing fluency will lead them to evaluate the speaker more negatively. Second, harder processing tends to elicit more negative affect, which may bias listeners’ language attitudes (Dragojevic, 2020). One line of experiments has connected the processing fluency account specifically to perceptions of nonnative-accented speech. Participants listened to audio recordings of fictional stories. Their task was ostensibly to remember enough of each story to complete a fill-in-the-blanks memory task, but the dependent measures were how much processing fluency they experienced (e.g., ratings of the speech as clear and easy to understand), how positively they felt about the speaker (1–100 scale), and status (e.g., intelligence, competence) and solidarity (e.g., friendliness, niceness) judgments about the speaker. The experiments manipulated how easy or hard the audio was to understand, independent of the speaker’s accent. Adding background white noise decreased listeners’ fluency ratings when the speaker used a Punjabi accent, more so than when the same speaker used a Standard American English accent. The lower processing fluency then resulted in more negative affect and lower status attributions to the speaker. 
When a Mandarin-accented speaker was accompanied by subtitles, or when participants had read a transcript of the story first, participants reported higher processing fluency, which resulted in more positive feelings about and higher status attributions to the speaker. In both sets of experiments, the effects of making the listening conditions easier or harder on status attributions were mediated both by processing fluency alone and sequentially by fluency and then affect (Dragojevic et al., 2017; Dragojevic, 2020; Dragojevic & Giles, 2016).

The processing fluency account would predict that dislike of singular they is caused, at least in part, by lower-level processing difficulty. Multiple factors could cause listeners to experience lower processing fluency for singular they compared to other pronouns: it has a larger set of possible antecedents, which may make it more ambiguous; it may elicit a number or gender agreement violation; it may be newly learned; and it is less frequent overall, even for speakers familiar with it. The processing fluency account also predicts that making singular they easier to understand would reduce people’s negative reactions to it, and therefore to the people who use it.

However, the actual amount of processing difficulty for singular they, particularly for definite, specific, gender-specified forms, is unclear. Only a few studies to date have investigated online comprehension of they coreferring with proper names. These are described in more detail in Section 0.4.4, but to review briefly, people are slower to identify the referent for they than for he and she, as measured through a maze task while reading (Shenkar et al., 2023) and a mouse-tracking task while listening (Arnold et al., 2023). In two ERP experiments, a P600 effect was observed for they coreferring with proper names (gender-specified), but not with specific gender-unspecified referents (e.g., the participant) (Chen et al., 2023; Prasad & Morris, 2020). Since the P600 indexes detecting a syntactic error or having difficulty comprehending a sentence’s syntactic structure (Hagoort et al., 1993; Kaan et al., 2000; Osterhout et al., 1994; Osterhout & Holcomb, 1992), Prasad & Morris interpret their results to indicate that they coreferring with proper names still causes a gender agreement error, even though participants in their experiment all had significant experience using they/them pronouns and considered it grammatical in offline acceptability judgments.

However, results like those of Prasad & Morris (2020) do not necessarily imply that they with specific gender-specified antecedents is still ungrammatical for these participants. Even in LGBTQ+ communities, they coreferring with a name is still relatively infrequent overall and would not be expected in many contexts. Stimuli in sentence processing experiments are typically unrelated, with each sentence using a different name and referring to a new character. When singular they corefers with a new referent in each trial, it is unclear whether they is consistently perceived as syntactically anomalous, or whether it is initially unexpected but could be processed smoothly once listeners anticipate that it will corefer with a particular referent. Experiment 4 tests processing in the context of repeated reference, where listeners can come to expect singular they to corefer with certain characters. This context potentially makes processing somewhat easier, and it more closely resembles the real-world contexts in which we hear pronouns referring to people.

Additionally, the majority of processing studies of singular they, particularly of generic indefinite they, have used self-paced reading and eyetracking-while-reading measures. Experiment 4 is one of the first studies of singular they to use the visual world paradigm, which measures eye movements while participants listen to sentences describing a visual scene. Gaze at pictured characters provides a measure of online processing as the sentence unfolds, since listeners automatically look at what they think is being talked about (Allopenna et al., 1998; Sedivy et al., 1999; Spivey et al., 2002; Tanenhaus et al., 1995, 2000). The visual world paradigm has advantages over other tasks, as it provides detailed time-course information about which alternative interpretations are being considered, in addition to when processing difficulties occur.

Experiment 4 investigates how much processing difficulty singular they causes compared to he and she, whether the processing of singular they follows the same patterns as he and she, and whether processing measures correspond with offline judgments. The design is based on a prior line of work investigating ambiguous pronoun resolution. In Arnold et al. (2000, 2007), participants looked at illustrated scenes of cartoon characters and listened to stories about them:

1 Donald is bringing some mail to Mickey/Minnie, while a violent storm is beginning
2 He’s/She’s carrying an
3 umbrella and it looks like they’re both going to need it.

Part 1 introduced 2 named characters (Donald, Mickey/Minnie), using a verb (bringing) that allows for a subsequent pronoun to refer to either of the characters individually (Garnham, 2001; Gordon et al., 1993; Sanford & Garrod, 1981). While he or she (part 2) is more likely to refer to the character mentioned first in the prior sentence (Donald) (Arnold et al., 2000, 2007; Gernsbacher, 1989; Kaiser & Trueswell, 2011), it can also refer to the character mentioned second (Mickey/Minnie). In other words, the character mentioned first is more accessible (Ariel, 2006). This structure makes it possible for the referent of he or she—called the target character in visual world experiments—to remain ambiguous until the next phrase (part 3) can be compared to the illustration. In this example, either Donald or Mickey/Minnie is carrying an umbrella (Figure 4.1A). This allows for enough time to observe processing of the pronoun (is carrying an), but without creating a discourse context too different from actual language use.

The stories in Arnold et al. (2000) manipulated 2 factors: the ambiguity of the pronoun (target and competitor characters using the same vs. different pronouns) and the accessibility of the referent (target mentioned first vs. second). The results showed that listeners rapidly use both gender and accessibility cues to identify which character the pronoun referred to (Figure 4.1B). When the referent was disambiguated by gender (top right in Figure 4.1A), by accessibility because the pronoun referred to the character mentioned first (bottom left), or by both (top left), participants began looking at the target character approximately 200ms after pronoun onset. This is about as quickly as effects in the visual world paradigm can be observed (Hallett, 1986; Tanenhaus et al., 1995). When neither gender nor accessibility cues disambiguated the referent (bottom right), participants looked at the target and competitor characters almost equally. For the purposes of the present experiment, these results provide a validated stimulus design and a baseline for how we expect he and she to be processed.

Figure 4.1: Arnold et al. (2000). [A] Recreation of design, showing the pronoun ambiguity and order of mention conditions. The original materials were illustrated using the Disney characters. [B] Results, with 0 indicating pronoun onset and horizontal lines indicating the verb (carrying).

A later set of studies used a similar design to examine how listeners process pronouns that are acoustically ambiguous between he and she (Brown-Schmidt & Toscano, 2017; Falandays et al., 2020). These studies used stories similar to those of Arnold et al. (2000, 2007), but different images. Instead of the 2 characters being drawn to match the scene, characters were pictured in colored shapes. The target character was disambiguated by describing their location, e.g., he’s standing on a blue square instead of he’s carrying an umbrella. This allows a larger range of stimuli to be created, and the prior results demonstrate that listeners can process he and she smoothly in these types of stories. Critically, while the descriptions may seem odd and somewhat discontinuous, the discourse structure matches how speakers introduce new referents and when they tend to use pronouns instead of names.

The current experiment uses manipulations similar to those of Arnold et al. (2000, 2007) and the same task as Brown-Schmidt & Toscano (2017). One potential issue with this design is that they can be ambiguous between a singular and a plural interpretation, even if participants learn that they is always singular in the context of the experiment. An alternative is to use stimuli that rule out a plural interpretation of they. Reflexive pronouns (himself, herself, themself) can syntactically constrain a singular interpretation (e.g., Runner et al., 2006; Sturt, 2003), but introduce a potential confound, since speakers vary in whether they prefer themself or themselves for singular referents (Ahn & Conrod, 2022). Another option is to use stimuli that semantically rule out a plural interpretation. Returning to some of the examples in the first chapter (Section 0.2.3), they’re worrying (Example 6) can be singular or plural, since people can worry together, but their free leg (Example 7) can only be singular, since a body part belongs to only one person. However, it is difficult to create stimuli that rule out a plural interpretation while still including a long enough period in which the pronoun is ambiguous between two possible referents. Results from such stimuli would be difficult to interpret, because it would be unclear whether processing costs were due to singular they itself or to a story structure that does not match when speakers use pronouns instead of names or other referring expressions. Moreover, because fully ruling out a plural interpretation is difficult, the majority of instances of singular they in actual language use do contain some degree of ambiguity between singular and plural interpretations. Results from stimuli that reflect only the narrow set of contexts in which no plural interpretation is possible would therefore be less relevant.

4.2 Methods

The design and analysis plan were preregistered on the Open Science Framework. Sources and attributions for the images are included with the materials; the edited images and audio stimuli are available upon request. The de-identified data and analysis code are available at this dissertation’s GitHub repository.

4.2.1 Participants

Thirty participants completed the study for partial course credit or for pay; their demographic information is shown in Table 4.1. Participants were required to be fluent English speakers (but not necessarily native or monolingual) and to have normal or corrected-to-normal vision and hearing; most were Vanderbilt undergraduate students. An additional 2 participants completed the experiment but were excluded because too few trials had usable eyetracking data. The experiment lasted approximately 45 minutes.

4.2.2 Materials

Characters

Participants learned about 6 characters, each associated with a name and an image: 2 who used he/him, 2 who used she/her, and 2 who used they/them (Figure 4.2A). The 6 character names and 6 character images were the same as in Experiment 3 (Drucker, 2019). Recall that all names were gender neutral since counterbalancing gender associations of the names within lists was not feasible. Participants were randomly assigned to 1 of 6 lists, in order to counterbalance the images and names associated with characters who use they/them. Across lists, 3 images appeared twice with he/him and once with they/them, and 3 images appeared twice with she/her and once with they/them; each name appeared twice with each pronoun. Critically, across lists they/them appeared once with each image and once with each name, in order to avoid confounding interpretations about what aspects of a person’s name or appearance may make it easier for someone to learn that they use they/them pronouns.

Figure 4.2: Experiment 4: Stimuli. [A] Example set of characters. [B] Example trial screen and story, with grey boxes indicating information not shown to participants.

Stories

The stories and visual scenes were based on Arnold et al. (2000, 2007) and Brown-Schmidt & Toscano (2017). During each trial, the 6 characters were arranged in a 3 × 2 grid, each shown inside a colored shape (red, yellow, green, or blue; triangle or square) (Figure 4.2B). Participants listened to stories in the following frame:

1 Jaime is painting a portrait of Sam, as some paint is spilling on the floor.
2 He is/she is/they are
3 standing in a blue triangle
4 and the painting looks amazing

Each story began with a sentence that named two characters, with an additional phrase to allow time for participants to identify them (part 1). The two named characters—the target and competitor—always used different pronouns (e.g., Jaime: they/them, Sam: he/him). This created 3 Pronoun Pair conditions: they/them targets with he/him or she/her competitors [They|HeShe], he/him or she/her targets with they/them competitors [HeShe|They], and he/him or she/her targets with he/him or she/her competitors [HeShe|SheHe]. Next, a pronoun (he, she, or they) referred to one of the named characters (part 2). This created 2 Order of Mention conditions, where the pronoun refers to the character mentioned first in the preceding sentence or to the character mentioned second. Figure 4.2B shows an example of the first-mention condition, where the pronoun (they) refers to the first named character (Jaime). The second-mention story matching this scene would have he is standing in a blue triangle, where he refers to Sam.

At this point in the story, participants could identify which of the named characters was the target if they knew the characters’ pronouns and were using that information in their language comprehension (e.g., Jaime uses they/them and Sam uses he/him, meaning that they refers to Jaime). The stories then described the location of the target character (part 3). Because the target and competitor characters were always pictured with the same color, the target was not fully disambiguated until the shape word, an average of 1364ms after the pronoun onset. After the shape word (part 3), listeners could identify the target character without taking the pronoun into consideration. The story concluded with a final phrase, which did not include another pronoun referring to the character(s) (part 4). After listening to each story, participants were asked to decide whether it matched the scene (e.g., whether Jaime was standing in a blue triangle).

There were a total of 60 story frames (parts 1 and 4). Within lists, each story appeared once in the first-mention condition and once in the second-mention condition. Across lists, each story appeared twice with each pronoun for counterbalancing, but always with the same pair of names, to make recording the stimuli feasible. There were a total of 24 pronoun + color + shape combinations (parts 2 and 3). These clips were recorded as full sentences (not spliced together), and each trial randomly selected 1 of 3 versions, in order to prevent participants from learning additional cues from a particular recording. The audio was recorded by the first author, a white native English speaker from the northeast U.S. with a feminine voice.
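The 24 clips follow from crossing 3 pronouns with 4 colors and 2 shapes (3 × 4 × 2 = 24). The Python sketch below enumerates them and draws 1 of the 3 recorded versions for a trial; the file-naming scheme and the `pick_recording` helper are hypothetical illustrations, not the experiment's actual materials or scripts.

```python
import itertools
import random

# Levels taken from the stimulus description; clip wording follows the example story.
PRONOUNS = ["he is", "she is", "they are"]
COLORS = ["red", "yellow", "green", "blue"]
SHAPES = ["triangle", "square"]
N_VERSIONS = 3  # each clip was recorded as 3 full-sentence versions

# Every pronoun + color + shape combination (parts 2 and 3 of a story).
LOCATION_CLIPS = [
    f"{pronoun} standing in a {color} {shape}"
    for pronoun, color, shape in itertools.product(PRONOUNS, COLORS, SHAPES)
]


def pick_recording(clip: str, rng: random.Random) -> str:
    """Return a hypothetical file name for one randomly chosen recording of a clip."""
    version = rng.randrange(N_VERSIONS) + 1  # 1, 2, or 3
    return f"{clip.replace(' ', '_')}_v{version}.wav"
```

Here `len(LOCATION_CLIPS)` is 24, and each call to `pick_recording` draws a fresh version, so no single recording is tied to a given trial.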

4.2.3 Procedure

Character Learning

To learn about the characters, participants first saw each character’s image, accompanied by their name (e.g., This is Jaime) and a fact about them (e.g., They like to play the piano, They work as an engineer). Each character was shown twice, so that participants saw two examples of each character’s pronouns. However, pronouns were never directly stated (e.g., This is Jaime, who uses they/them pronouns), and the use of singular they was not explained to participants. Participants were then tested on the names and images of the characters: they were shown all 6 images and asked to click on the named character. If the answer was correct, the image and name of the character were displayed, along with another example of their pronouns (e.g., Correct, they’re Jaime). If the answer was incorrect, the image of and information about the incorrectly chosen character was shown (e.g., Incorrect, he’s Sam), followed by the image of and information about the correct character (e.g., They’re Jaime). To continue, participants were required to get all 6 names correct in the same block. When listening to the stories, participants should therefore have been able to identify the images of the 2 named characters, and had seen at least 3 examples of each character’s pronouns.
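The learning criterion amounts to a loop over test blocks that ends only when a block contains no errors. A minimal sketch, in which `respond` is a hypothetical stand-in for the participant's click, the feedback screens are reduced to a comment, and `max_blocks` is an added safeguard not described in the procedure:

```python
from typing import Callable, Sequence


def run_name_test(
    names: Sequence[str],
    respond: Callable[[str, int], str],
    max_blocks: int = 50,
) -> int:
    """Repeat test blocks until every name is answered correctly within one block.

    `respond(name, block)` is a hypothetical stand-in for the participant's click:
    it returns the name of the character they chose. Returns the number of blocks.
    """
    for block in range(1, max_blocks + 1):
        errors = 0
        for name in names:
            chosen = respond(name, block)
            if chosen != name:
                errors += 1  # the experiment showed corrective feedback here
        if errors == 0:  # criterion: all names correct in the same block
            return block
    raise RuntimeError("learning criterion not reached within max_blocks")
```

Because the criterion is all-correct within a single block, one error anywhere forces another full block, which is why the number of test rounds reported below is a reasonable index of learning difficulty.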

Eyetracking

During each trial, the images were displayed for 1 second before the audio began playing. After the story finished, the images remained on the screen, and the text Did the story match the picture? was displayed at the bottom. Participants clicked YES or NO in the corner of the screen to advance to the next trial. Eye movements were recorded with an EyeLink 1000 desktop-mounted eyetracker recording monocularly at 1000 Hz, with drift correction after every fifth trial. The trial order was randomly generated for each participant, the locations of the 6 images were randomly generated for each trial, and the colors and shapes were counterbalanced.

Participants first completed 6 practice trials, which introduced the task and instructed them to judge whether the story matched the scene based on the colored-shape sentence, since the action described at the beginning (e.g., painting a portrait) was not pictured. These practice trials used a name instead of a pronoun (e.g., Jaime is standing in a blue triangle). Four trials matched the scene, and 2 trials mismatched by referring to a color not pictured. After each practice trial, participants received feedback on whether their match judgment was correct.

Participants then completed 96 critical and 18 filler trials, mixed in a randomized order. These varied according to 2 within-subjects factors: Pronoun Pair [They|HeShe; HeShe|They; HeShe|SheHe] and Order of Mention [target mentioned first; second]. The target and competitor characters were evenly distributed, yielding a total of 32 critical trials for each pronoun. Filler trials were included to ensure that participants treated no as an option in the match judgment question, even if they considered singular they acceptable and knew which characters used they/them. Ten of the filler trials were unambiguously the wrong description, referring to a color that was not pictured on the screen (e.g., for Figure 4.2B, they are standing in a red square). The other 8 filler trials used the pronoun that neither named character used, making the story incorrect for both the target and the competitor character (e.g., for Figure 4.2B, she is standing in a blue triangle). The he/him and she/her characters were each called they twice, and the they/them characters were each called he once and she once. No filler trials used he instead of she or she instead of he. Note that throughout the experiment, they was always singular, never plural. After completing all 120 trials (6 practice, 96 critical, and 18 filler), participants were tested on the names of the characters, following the same procedure as before, but without feedback.
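The trial composition and randomization described above can be sketched as a schedule builder. This is a hypothetical illustration, not the experiment's script: the 16 trials per Pronoun Pair × Order cell follow from 96 / 6, the grid is assumed to have 2 rows and 3 columns, and story assignment and color/shape counterbalancing are omitted.

```python
import random
from collections import Counter

PRONOUN_PAIRS = ["They|HeShe", "HeShe|They", "HeShe|SheHe"]
ORDERS = ["first", "second"]
# Assumed layout: 2 rows x 3 columns for the 3 x 2 grid.
GRID_CELLS = [(row, col) for row in range(2) for col in range(3)]


def build_schedule(rng, n_per_cell=16, n_wrong_description=10, n_wrong_pronoun=8):
    """Shuffle critical and filler trials and randomize image positions per trial."""
    trials = [
        {"type": "critical", "pair": pair, "order": order}
        for pair in PRONOUN_PAIRS
        for order in ORDERS
        for _ in range(n_per_cell)  # 3 pairs x 2 orders x 16 = 96 critical trials
    ]
    trials += [{"type": "wrong_description"} for _ in range(n_wrong_description)]
    trials += [{"type": "wrong_pronoun"} for _ in range(n_wrong_pronoun)]
    rng.shuffle(trials)  # a new random order for each participant
    for i, trial in enumerate(trials, start=1):
        positions = GRID_CELLS[:]
        rng.shuffle(positions)  # image locations re-randomized on every trial
        trial["positions"] = positions  # positions[k] is the cell for character k
        trial["drift_correct_after"] = i % 5 == 0  # after every fifth trial
    return trials


def cell_counts(trials):
    """Count critical trials in each Pronoun Pair x Order cell."""
    return Counter((t["pair"], t["order"]) for t in trials if t["type"] == "critical")
```

Calling `build_schedule(random.Random(seed))` once per participant gives 114 trials with 16 per design cell; the 6 practice trials would precede this list.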

Survey

Finally, participants completed the same singular they naturalness ratings, familiarity with using they/them pronouns, gender binary and gender essentialism beliefs (survey), and demographics questions as in Experiment 3. All demographic questions included the option to not respond. Figure 4.3 shows an overview of the full procedure.

Figure 4.3: Experiment 4: Procedure.

4.3 Predictions

The first question concerns whether listeners can accurately comprehend they as singular, then combine this with knowledge about the character’s pronouns to identify who is being described in the story. If so, participants will preferentially look at the target character after the pronoun and before the disambiguating shape word. While it is theoretically possible that singular they would show no processing costs compared to he or she, prior results indicate this is currently unlikely (Arnold et al., 2023; Chen et al., 2023; Prasad & Morris, 2020; Sanford & Filik, 2007; Shenkar et al., 2023). Instead, listeners may identify the referent for singular they before the disambiguation, but more slowly than they do for he and she. This result would resemble those observed in young children (Arnold et al., 2007; Song & Fisher, 2005) and in adult second language learners (Cunnings et al., 2017; Grüter et al., 2012; Speyer & Schleef, 2019), who can use gender and order-of-mention cues from pronouns to identify the referent, but do so more slowly than fluent adults. Alternatively, listeners may not preferentially look at the target before the disambiguating shape word, a result that could arise in two ways that the current experiment cannot distinguish. One possibility is that listeners attempt to use singular they to identify the target character, but do not succeed because of ambiguity. Another possibility is that listeners recognize the potential ambiguity in they and strategically wait for more information before committing to an interpretation.

A secondary question concerns the competitor character, who is named at the beginning of the story but whose pronouns are never used. Stories using he and she can have a competitor who uses she/her or he/him (never the same as the target character), or a competitor who uses they/them. A difference between these two conditions could be predicted in either direction: If trials where the competitor character uses they/them are slower than trials where the competitor character uses he/him or she/her, this could indicate that some aspect of the they/them characters—the pronoun activated alongside the character or the character themself—is causing greater competition (making it a stronger possibility) than the he/him and she/her characters. If, on the other hand, trials where the competitor character uses they/them are faster, this could indicate that listeners are treating the more ambiguous character as less likely to be referred to, either in general or with a pronoun.

With regard to order of mention, we expect to replicate prior results for he and she, where participants are more likely to look at target characters who were named first than at target characters who were named second, because although the pronoun can refer to either, it is more likely to refer to the person mentioned first (Arnold et al., 2000, 2007; Brown-Schmidt & Toscano, 2017). If we also observe an order of mention effect for singular they, whether equal to or reduced compared to that for he and she, it would suggest that singular they is being integrated into listeners’ standard discourse processing mechanisms.

4.4 Results

4.4.1 Participant Backgrounds

To contextualize the findings, I first discuss the results of the survey. Most participants were in the typical undergraduate age range (M = 20.13, SD = 2.61) and described themselves as native English speakers (N = 27). Nineteen were women, 11 were men, and none identified as transgender and/or as a gender different from their sex assigned at birth (Table 4.1). All participants were at least somewhat familiar with singular they before the experiment: 5 had heard about people using they/them pronouns but had not met anyone who does, 21 had met but were not close to anyone who uses they/them, and 9 were close to someone who uses they/them, but none used they/them themselves (Figure 4.4B). When rating the naturalness of singular they coreferring with different types of referents (Figure 4.4A), acceptance of indefinite forms was generally high (M = 5.17, SD = 1.76). Surprisingly, ratings for proper names (M = 4.98, SD = 1.66) were not significantly lower than ratings for indefinites (β = 0.19, t = 0.36, p = .73) (Table A.20). For the gender beliefs measure (Nagoshi et al., 2008), responses were again scaled to 0–6 and summed, so that a score of 0 indicated the lowest endorsement of the gender binary and gender essentialism, and a score of 54 indicated the highest (Figure 4.4C). Participant totals spanned nearly the entire range but were strongly skewed towards the lower end, with the mean response favorable towards trans and gender-nonconforming people (range = 1–53, M = 13.93, SD = 12.56) (see Table A.19 for item text and means).
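The 0–54 range implies 9 items, each rescaled to 0–6 (9 × 6 = 54). A minimal scoring sketch follows; the raw 1–7 response scale and the assumption that items are already keyed so that higher values mean stronger endorsement are assumptions, not details given here.

```python
from typing import Sequence

N_ITEMS = 9  # 9 items x 6 points = 54, the maximum reported score
RAW_MIN, RAW_MAX = 1, 7  # assumed 7-point response scale


def belief_score(responses: Sequence[int]) -> int:
    """Rescale each response to 0-6 and sum: 0 = lowest endorsement, 54 = highest.

    Assumes responses are already keyed so that higher = more endorsement of the
    gender binary and gender essentialism.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses")
    for r in responses:
        if not RAW_MIN <= r <= RAW_MAX:
            raise ValueError(f"response {r} outside {RAW_MIN}-{RAW_MAX}")
    return sum(r - RAW_MIN for r in responses)
```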

Table 4.1: Experiment 4: Participant demographics. Categories with higher totals allowed participants to select as many options as applied. All questions included the option to not respond.

Experiment 4: Participant Demographics

Age (total = 30)
18-24: 26
25-34: 4
35-44: 0
45-54: 0
55-64: 0
65-74: 0
75+: 0
Prefer not to answer / Missing data: 0

Gender (total = 30)
Female: 19
Male: 11
Prefer not to answer / Missing data: 0

Transgender & Gender-Diverse (total = 47)
I consider myself cisgender: 17
I consider myself transgender: 0
I don't consider myself cisgender or transgender: 0
My gender is the same as what was written on my original birth certificate: 30
My gender is different than what was written on my original birth certificate: 0
Prefer not to answer / Missing data: 0

Sexuality (total = 34)
Asexual: 1
Bisexual/Pansexual: 3
Gay/Lesbian: 1
Heterosexual/Straight: 27
Queer: 0
Questioning: 2
I use a different term: 0
Prefer not to answer / Missing data: 0

English Experience (total = 30)
Native (learned from birth): 27
Fully competent in speaking, listening, reading, and writing, but not native: 3
Prefer not to answer / Missing data: 0

Race/Ethnicity (total = 34)
American Indian or Alaska Native: 0
Asian: 10
Black, African American, or African: 1
Hispanic, Latino, or Spanish: 6
Middle Eastern or North African: 1
Native Hawaiian or Pacific Islander: 0
White: 15
I use a different term: 0
Prefer not to answer / Missing data: 0
Prefer not to answer: 1

Total Participants: 30

Figure 4.4: Experiment 4: Prior Familiarity and Attitudes Survey. [A] Naturalness ratings on a 7-point Likert scale (1 = very unnatural, 7 = very natural) for singular they coreferring with indefinite referents and with proper names. [B] Experience with using they/them pronouns. [C] Gender binary and essentialism beliefs, with higher scores indicating higher endorsement and thus more negative attitudes towards transgender and gender non-conforming people (Nagoshi et al., 2008). The mean response is indicated by the black line.

4.4.2 Offline Measures

Character Learning

Participants were generally able to learn the name-image pairs within 2–3 test rounds (M = 2.33, SD = 1.09). Across all pretest rounds, accuracies for they/them characters (M = 0.81, SD = 0.40) and she/her characters (M = 0.83, SD = 0.38) were slightly lower than accuracy for he/him characters (M = 0.90, SD = 0.30). Participants remembered the names of the characters throughout the study: most (N = 27) got all 6 correct in the post-test, and no participants were excluded for getting 4 or fewer correct.

Match Judgments

When asked whether the description they heard matched the scene (Figure 4.5, left), participants correctly judged the majority of test trials to be matching. Match rates for singular they trials (M = 0.91) were not significantly lower than match rates for he and she trials (HeShe|They: M = 0.91; HeShe|SheHe: M = 0.94) (Table A.21). For the wrong description trials, which referred to a color that was not pictured, match rates were correctly at floor for all pronouns. For the wrong pronoun trials, which used the pronoun that neither of the two named characters used, responses varied across participants. However, participants were not less likely to indicate a mismatch when they/them characters were referred to with he or she (M = 0.44) than when he/him or she/her characters were referred to with they (M = 0.43) (Table A.22).

Reaction times were calculated from the display of the match question until the participant’s click, with responses more than 3 SD from the mean of each trial type excluded as outliers (Figure 4.5, right). Similar to the accuracy data, reaction times were shortest for wrong description trials (M = 2917ms, SD = 2360ms), somewhat longer for test trials (M = 3544ms, SD = 3369ms), and longest and most variable for wrong pronoun trials (M = 4995ms, SD = 3996ms). A generalized linear mixed-effects model with an inverse Gaussian distribution and an identity link was fit to test whether Pronoun Pair affected reaction times in the test trials (Table A.23). This accounts for the non-normal distribution of reaction time data, but in contrast to applying a non-linear transformation (e.g., log), it maintains the theoretical assumption that experimental manipulations affect the total amount of time to make a decision (Lo & Andrews, 2015). The maximal model that converged included by-participant and by-item intercepts and slopes for Pronoun Pair (Bates et al., 2015; R Core Team, 2023; Voeten, 2023). Participants were slower to make match judgments for stories using singular they than for stories using he and she (β = -506.67, t = -12.40, p < .001). The pronoun of the competitor character did not affect reaction times (β = 32.96, t = 0.75, p = .45).
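To make the reaction-time steps concrete, here is a minimal Python sketch of the 3 SD trimming within each trial type (using the sample SD, an assumption) and of an inverse Gaussian GLM with identity link fit by iteratively reweighted least squares. It is fixed-effects only: the reported model was a mixed model with by-participant and by-item random intercepts and slopes, fit in R, which this sketch does not reproduce. The function names are hypothetical.

```python
import statistics

import numpy as np


def trim_outliers(rts_by_type, n_sd=3.0):
    """Drop RTs more than n_sd sample SDs from the mean of their own trial type."""
    trimmed = {}
    for trial_type, rts in rts_by_type.items():
        center = statistics.mean(rts)
        spread = statistics.stdev(rts)  # sample SD (assumption)
        trimmed[trial_type] = [rt for rt in rts if abs(rt - center) <= n_sd * spread]
    return trimmed


def fit_inverse_gaussian_identity(X, y, max_iter=100, tol=1e-10):
    """Fixed-effects inverse Gaussian GLM with identity link, fit by IRLS.

    Under the identity link the working response is y itself and the IRLS
    weight is 1 / V(mu) = 1 / mu**3, so each step is weighted least squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(max_iter):
        mu = X @ beta
        if np.any(mu <= 0):
            raise ValueError("identity link produced a non-positive mean RT")
        w = 1.0 / mu**3
        XtW = X.T * w  # equivalent to X.T @ np.diag(w)
        new_beta = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(new_beta - beta)) < tol * (1.0 + np.max(np.abs(beta)))
        beta = new_beta
        if converged:
            break
    return beta
```

Because the link is the identity, the coefficients stay on the millisecond scale, so a Pronoun Pair coefficient reads directly as a difference in decision time rather than a ratio, which is the property that motivates this choice over a log transformation.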

Figure 4.5: Experiment 4: By-participant mean proportions of stories judged to match the picture (left) and reaction times (right). Lines connect by-participant means across Pronoun Pair conditions; violins indicate the distribution of by-participant means; point ranges indicate condition means and 95% CIs calculated over the by-participant means. The correct answers are match (=1) for test trials and mismatch (=0) for wrong pronoun and wrong description trials. For the wrong pronoun trials, he/him or she/her for a they/them character corresponds to the HeShe|SheHe condition; they/them for a he/him or she/her character corresponds to the They|HeShe condition; and there were no wrong pronoun trials for the HeShe|They condition.
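The point ranges in Figure 4.5 are computed over by-participant means rather than over raw trials. A minimal sketch of that two-step aggregation follows; the caption does not say how the 95% CIs were computed, so this sketch assumes a normal approximation (a t-based interval would be slightly wider with 30 participants), and the function names are hypothetical.

```python
import statistics
from collections import defaultdict


def by_participant_means(rows):
    """rows: iterable of (participant, condition, value) -> {(participant, condition): mean}."""
    groups = defaultdict(list)
    for participant, condition, value in rows:
        groups[(participant, condition)].append(value)
    return {key: statistics.mean(values) for key, values in groups.items()}


def condition_summary(rows, level=0.95):
    """Mean and normal-approximation CI per condition, over by-participant means."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    per_condition = defaultdict(list)
    for (_participant, condition), mean in by_participant_means(rows).items():
        per_condition[condition].append(mean)
    summary = {}
    for condition, means in per_condition.items():
        center = statistics.mean(means)
        half_width = z * statistics.stdev(means) / len(means) ** 0.5
        summary[condition] = (center, center - half_width, center + half_width)
    return summary
```

Averaging within participants first gives each participant equal weight in the condition estimate, regardless of how many trials survived exclusion.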

4.4.3 Online Processing

Figure 4.6 shows fixations to the target, competitor, distractor, and no characters, starting 500ms before the onset of the pronoun and continuing for 2500ms (e.g., …spilling on the floor. They are standing in a blue triangle, and the painting looks amazing). The He|She and She|He trials (first row) generally resemble prior results (Arnold et al., 2000, 2007; Brown-Schmidt & Toscano, 2017): participants start to look at the target more than the competitor rapidly after the onset of the pronoun, and more so when the target character was mentioned first. Unexpectedly, the order of mention effect, where listeners look more at the character named first in the story than at the character named second, is clear only in He|She trials, not in She|He trials. The He|They and She|They trials (second row) show the expected order effect, but participants are less likely to be looking at the target than in the He|She and She|He trials. The They|He and They|She trials (third row) still show participants looking at the target more than the competitor before the onset of the shape word, but less than in the other two conditions, and no order effect is apparent. Fixations during the beginning of the story (Figure A.10) show that participants looked at the target and competitor after each was named, and the time course did not differ by the character’s pronouns. This confirms that participants knew the names of the characters and had identified the two possible referents before the start of the critical time window.

Figure 4.6: Experiment 4: Eyetracking: Full Window. Proportions of looks to the target, competitor, distractor (average of 4), and no characters, split by target pronoun, competitor pronoun, and order of mention conditions. The gray box indicates the analysis region, starting 200ms after pronoun onset and ending at 1210ms, just before the earliest shape word onset across stimuli.

The primary analysis window, shown in gray in Figure 4.6, began 200ms after the onset of the pronoun, the estimated time it takes to plan and execute a saccade in response to an auditory stimulus (Hallett, 1986). It continued until 1210ms after pronoun onset, just before the earliest shape word onset across all of the stimuli (range = 1216–1790ms, M = 1364ms, SD = 103ms). Results were analyzed with dynamic generalized linear mixed-effects models, predicting whether participants looked at the target character (=1) or not (=0) at each time point (Brown-Schmidt et al., 2020; Cho et al., 2018). Observations were down-sampled to 10ms bins: bins that included more than 5ms of a fixation on or a saccade to the target (McMurray et al., 2009, 2019) were coded as 1, bins that included less than 5ms were coded as 0, and bins that included exactly 5ms were coded as 1 if they followed a bin coded as 1 and as 0 otherwise. Beyond this binning, the data were not aggregated across trials or participants. The model included a fixed effect of Trend (timestep during the trial, mean-centered) to capture linear changes in the level of target fixations across the trial. To account for autocorrelation between time points, the model included an AR(1) term, which encodes whether the participant was looking at the target in the preceding timestep. To compute AR(1) at the start of the analysis window, the 180ms and 190ms timesteps were included in the data; the first timestep, for which AR(1) was undefined, was then excluded prior to estimation, resulting in 103 data points for each trial.
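The binning and AR(1) steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis code used for the experiment: the input format (a list of hypothetical (start_ms, end_ms) intervals of target fixations or saccades, relative to pronoun onset), the convention that each 10ms bin is labeled by its start time, the choice to begin coding at -500ms with the preceding bin treated as 0, and the helper names code_bins and build_rows are all assumptions. Trend is expressed here as a mean-centered timestep index, which is also an assumption about its scale.

```python
from typing import Dict, List, Tuple

BIN_MS = 10  # down-sampled bin width (ms)
Interval = Tuple[int, int]  # hypothetical: (start_ms, end_ms) of a target look, relative to pronoun onset


def overlap_ms(interval: Interval, bin_start: int) -> int:
    """Milliseconds of the interval that fall inside [bin_start, bin_start + BIN_MS)."""
    start, end = interval
    return max(0, min(end, bin_start + BIN_MS) - max(start, bin_start))


def code_bins(target_looks: List[Interval], first_bin: int, last_bin: int) -> Dict[int, int]:
    """Code each bin as 1 (looking at target) or 0, keyed by bin start time."""
    codes: Dict[int, int] = {}
    prev = 0  # assumption: the bin before first_bin counts as not-target
    for bin_start in range(first_bin, last_bin + 1, BIN_MS):
        ms = sum(overlap_ms(iv, bin_start) for iv in target_looks)
        if ms > 5:
            code = 1
        elif ms < 5:
            code = 0
        else:
            code = prev  # exactly 5ms: carry over the previous bin's code
        codes[bin_start] = code
        prev = code
    return codes


def build_rows(target_looks: List[Interval], window_start: int = 200,
               window_end: int = 1210) -> List[dict]:
    """One row per timestep with the outcome, its AR(1) lag, and a centered Trend."""
    # Code from -500ms (start of the plotted region) so the 5ms tie rule has real history.
    codes = code_bins(target_looks, -500, window_end)
    # Include two lead-in timesteps (180ms and 190ms) before the window start.
    times = list(range(window_start - 2 * BIN_MS, window_end + 1, BIN_MS))
    # Pairing each timestep with its predecessor drops the first one (180ms),
    # whose AR(1) value would be undefined.
    rows = [{"time": t, "target": codes[t], "ar1": codes[prev_t]}
            for prev_t, t in zip(times, times[1:])]
    center = (len(rows) - 1) / 2
    for i, row in enumerate(rows):
        row["trend"] = i - center  # mean-centered timestep index
    return rows
```

For example, `build_rows([(100, 305), (600, 1300)])` returns 103 rows running from 190ms to 1210ms; the 300ms bin contains exactly 5ms of target looking and inherits the 1 from the bin before it.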

The fixed effect of Pronoun Pair was coded with orthogonal Helmert contrasts. The first contrast compared trials with they target characters to trials with he or she target characters (They|HeShe vs HeShe|They + HeShe|SheHe), and the second contrast compared trials with he or she targets and they competitors to trials with he or she targets and she or he competitors (HeShe|They vs HeShe|SheHe). The fixed effect of Order was effect coded and mean-centered, comparing trials where the target character was mentioned second to trials where it was mentioned first. Figure 4.7 shows the proportion of looks to the target and competitor characters during the analysis window in each of the 3 Pronoun Pair and 2 Order of Mention conditions.
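As a check on the coding scheme, the contrasts can be written out as exact fractions. This Python sketch assumes balanced conditions, and the dictionary names are illustrative rather than taken from the original analysis scripts. It verifies that each contrast is centered, that the two Pronoun Pair contrasts are orthogonal, and that the groups being compared are exactly one unit apart, so each fixed-effect estimate can be read as a difference in log-odds (Table 4.2 reports the first contrast's codes rounded to -.66 and +.33).

```python
from fractions import Fraction as F

# Helmert codes (contrast 1, contrast 2) for each Pronoun Pair condition.
PRONOUN_PAIR = {
    "They|HeShe":  (F(-2, 3), F(0)),
    "HeShe|They":  (F(1, 3),  F(-1, 2)),
    "HeShe|SheHe": (F(1, 3),  F(1, 2)),
}
# Effect-coded, mean-centered Order.
ORDER = {"second": F(-1, 2), "first": F(1, 2)}

c1 = [codes[0] for codes in PRONOUN_PAIR.values()]
c2 = [codes[1] for codes in PRONOUN_PAIR.values()]

# Each contrast is centered: its codes sum to zero across (balanced) conditions.
print(sum(c1), sum(c2), sum(ORDER.values()))
# The two Pronoun Pair contrasts are orthogonal: their dot product is zero.
print(sum(a * b for a, b in zip(c1, c2)))
# Compared groups differ by exactly 1 unit, so a coefficient is a log-odds difference.
print(c1[1] - c1[0], c2[2] - c2[1])
```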

Figure 4.7: Experiment 4: Eyetracking: Analysis Window. Proportion of looks to the target, competitor, distractor (average of 4), and no characters in each of the 3 Pronoun Pair and 2 Order of Mention conditions. The window starts 200ms after pronoun onset and ends at 1210ms, just before the earliest shape word onset across stimuli.

The maximal random effects structure (Baayen et al., 2008; Barr et al., 2013) included by-participant and by-item slopes for Pronoun Pair, Order, AR(1), Trend (time point during trial), and Trial Number (time point during experiment), with items defined as the 60 story frames that named the 2 characters. The lme4 and buildmer packages in R were used to identify the most complex random effects structure that would converge (Bates et al., 2015; R Core Team, 2023; Voeten, 2023). The final structure included by-participant and by-item random intercepts, by-participant slopes for AR(1), Order, and Trial Number, and by-item slopes for Order and Trend (Table 4.2).

The AR(1) effect was significant (β = 9.97, z = 91.37, p < .001), reflecting the fact that participants were more likely to be looking at the target during the current timestep if they had been looking at the target during the previous timestep. Trend was not significant (β = 0.05, z = 1.37, p = .17), indicating that the overall level of target fixations did not significantly increase or decrease in a linear fashion over the course of the trial.
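To give a sense of scale for the AR(1) estimate, the fixed effects in Table 4.2 can be converted from log-odds to probabilities. This Python sketch is a back-of-the-envelope illustration rather than a model prediction: it uses only the intercept and AR(1) coefficients, sets all random effects to zero, and holds the centered predictors (Pronoun Pair, Order, Trend) at zero. Because AR(1) is coded 0/1, the intercept corresponds to a bin whose preceding bin was not on the target.

```python
import math


def inv_logit(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


INTERCEPT = -4.715  # Table 4.2, log-odds
AR1 = 9.973         # Table 4.2, log-odds

# Probability of looking at the target when the previous bin was NOT on the target.
p_after_0 = inv_logit(INTERCEPT)
# Probability of looking at the target when the previous bin WAS on the target.
p_after_1 = inv_logit(INTERCEPT + AR1)
# Multiplicative change in the odds associated with AR(1) = 1.
odds_ratio = math.exp(AR1)

print(f"{p_after_0:.4f} {p_after_1:.4f} {odds_ratio:.0f}")
```

Under these simplifications, the two probabilities come out at roughly .009 and .995, illustrating the strong bin-to-bin persistence of fixations that the AR(1) term captures.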

Both contrasts for Pronoun Pair were significant: Participants were more likely to look at the target character after the onset of he and she than after the onset of they, across Order conditions (β = 0.28, z = 6.00, p < .001). After the onset of he or she, participants were more likely to look at the target character if the competitor character used he/him or she/her than if the competitor character used they/them (β = 0.28, z = 5.16, p < .001). Visual inspection of the data shows that in stories using he and she, looks to the target diverge from looks to the competitor and reach a proportion of 0.5 in the first quarter of the analysis window. In stories using they, looks to the target diverge from looks to the competitor in the first quarter of the analysis window, but do not reach 0.5 until the last quarter (Figure 4.7).
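Because each Pronoun Pair contrast places the groups it compares one unit apart, its estimate can be read directly as a log-odds difference or, after exponentiation, as an odds ratio. The following Python sketch reconstructs the implied condition-level offsets from the intercept using the fixed-effect estimates in Table 4.2, with Order and Trend at their centered value of zero (so the interaction terms drop out) and random effects ignored. It then checks that the contrasts recover the reported coefficients. The variable names are illustrative.

```python
import math

B1 = 0.282  # They|HeShe vs HeShe|They + HeShe|SheHe (Table 4.2)
B2 = 0.277  # HeShe|They vs HeShe|SheHe (Table 4.2)

# Helmert codes (contrast 1, contrast 2) for each Pronoun Pair condition.
CODES = {
    "They|HeShe": (-2 / 3, 0.0),
    "HeShe|They": (1 / 3, -0.5),
    "HeShe|SheHe": (1 / 3, 0.5),
}

# Offset of each condition's log-odds from the intercept (Order and Trend held at 0).
offset = {cond: B1 * c1 + B2 * c2 for cond, (c1, c2) in CODES.items()}

he_she_mean = (offset["HeShe|They"] + offset["HeShe|SheHe"]) / 2
diff_1 = he_she_mean - offset["They|HeShe"]            # recovers B1
diff_2 = offset["HeShe|SheHe"] - offset["HeShe|They"]  # recovers B2

for cond, value in offset.items():
    print(f"{cond:12s} {value:+.3f}")
print(f"odds ratios: {math.exp(diff_1):.3f} {math.exp(diff_2):.3f}")
```

On this reading of the fixed effects, the odds of fixating the target are about 1.33 times higher for he or she targets than for they targets, and about 1.32 times higher for he or she targets with a he/him or she/her competitor than with a they/them competitor.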

In the primary model, the main effect of Order was not significant, indicating that listeners were not more likely to look at target characters mentioned first than at target characters mentioned second (β = 0.08, z = 1.06, p = .29). Neither interaction of Order with Pronoun Pair was significant (They|HeShe vs HeShe|They + HeShe|SheHe: β = 0.14, z = 1.52, p = .13; HeShe|They vs HeShe|SheHe: β = -0.18, z = -1.65, p = .10). The absence of an order effect was surprising, given that it has been a robust result in prior research using a similar paradigm (Arnold et al., 2000, 2007; Brown-Schmidt & Toscano, 2017). Post-hoc models examined the order of mention effect in each Pronoun Pair condition separately (Table A.24, Table A.25, Table A.26). These analyses revealed a significant effect of Order in the HeShe|They condition (β = 0.22, z = 2.20, p < .05), but not in the HeShe|SheHe (β = 0.05, z = 0.47, p = .64) or They|HeShe conditions (β = 0.00, z = -0.03, p = .97).

Because Figure 4.6 suggested differences between the He|She and She|He trials, an exploratory analysis tested the effect of Target Pronoun. As in previous experiments, Target Pronoun was coded with orthogonal Helmert contrasts: the first contrast compared they/them targets to he/him + she/her targets, and the second compared she/her targets to he/him targets. The comparison between he and she trials was not significant (β = 0.04, z = 0.73, p = .47; Table A.27). Additional exploratory analyses tested whether Trend and AR(1) interacted with Pronoun Pair and Order and found no significant effects beyond those reported in the main model (Table A.28, Table A.29).

Table 4.2: Experiment 4: Model results for the effects of Pronoun Pair, Order, AR(1), and Trend on the likelihood of looking at the target character (=1) or not (=0).
Experiment 4: Looks to the Target Character
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.715 0.084 -56.037 <0.001
AR(1) (0, 1) 9.973 0.109 91.367 <0.001
Pronoun Pair: They|HeShe (-.66) vs HeShe|They (+.33) + HeShe|SheHe (+.33) 0.282 0.047 6.004 <0.001
Pronoun Pair: HeShe|They (-.5) vs HeShe|SheHe (+.5) 0.277 0.054 5.158 <0.001
Order: Target Mentioned Second (-.5) vs First (+.5) 0.084 0.079 1.059 0.290
Trend (mean-centered) 0.052 0.038 1.370 0.171
Pronoun Pair (They|HeShe vs HeShe|They + HeShe|SheHe) * Order 0.143 0.094 1.521 0.128
Pronoun Pair (HeShe|They vs HeShe|SheHe) * Order -0.177 0.107 -1.651 0.099
Random Effects
τ00 Story 0.007
τ00 Participant 0.196
τ11 Order | Story 0.024
τ11 Trend | Story 0.005
τ11 AR(1) | Participant 0.271
τ11 Order | Participant 0.112
τ11 Trial Number | Participant 0.025
ρ01 Order | Story 0.964
ρ01 Trend | Story 0.214
ρ01 AR(1) | Participant -0.299
ρ01 Order | Participant 0.091
ρ01 Trial Number | Participant 0.527
N Participant 30
N Story 60
Observations 296022
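The z and p columns in Table 4.2 are Wald statistics: z is the estimate divided by its standard error, and p is the two-sided tail probability of a standard normal distribution. This Python sketch recomputes p from the reported z values for the nonsignificant rows as a consistency check. Small discrepancies between estimate/SE and the reported z reflect rounding of the displayed estimates and standard errors. The function name wald_p is illustrative.

```python
import math


def wald_p(z: float) -> float:
    """Two-sided p-value for a Wald z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# (label, estimate, SE, reported z, reported p) from Table 4.2
ROWS = [
    ("Order", 0.084, 0.079, 1.059, 0.290),
    ("Trend", 0.052, 0.038, 1.370, 0.171),
    ("Pronoun Pair 1 * Order", 0.143, 0.094, 1.521, 0.128),
    ("Pronoun Pair 2 * Order", -0.177, 0.107, -1.651, 0.099),
]

for label, est, se, z, p in ROWS:
    print(f"{label:24s} est/SE={est / se:+.2f}  z={z:+.3f}  "
          f"p={wald_p(z):.3f} (reported {p:.3f})")
```

The same function gives p ≈ .028 for the post-hoc Order effect in the HeShe|They condition (z = 2.20), consistent with the reported p < .05.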

4.5 Discussion

Experiment 4 tested online processing of singular they coreferring with proper names, in a context where listeners could come to anticipate it for certain referents. Participants learned about two he/him, two she/her, and two they/them characters and passed a pretest to ensure that they had learned the mappings between the names and images. In the eyetracking trials, each of the six characters was pictured in a colored shape. Each story named two characters, always a pair with different pronouns, and used a verb that allows an upcoming pronoun to refer to either named character (e.g., Jaime is painting a portrait of Sam, as some paint is spilling on the floor). The pronoun phrase described the target character’s location (e.g., They are standing in a blue triangle), leaving about 1200ms between the onset of the pronoun and the onset of the disambiguating shape word in which to measure how listeners used the pronoun to identify the target character. Trials varied on two within-participant factors: the pronouns of the two named characters (HeShe|SheHe, HeShe|They, They|HeShe) and whether the pronoun referred to the character mentioned first or second.

Visual inspection of the data suggests that listeners do comprehend they as singular and use this information to identify which character is being referred to, though to a lesser degree than with he and she. Looks to the target diverged from looks to the competitor at about 500ms after the onset of they, compared to about 200ms after the onset of he or she. This is similar to the patterns seen in young children (Arnold et al., 2007; Song & Fisher, 2005) and in adult second language learners (Cunnings et al., 2017; Grüter et al., 2012; Speyer & Schleef, 2019). Singular they, however, did not show an order of mention effect, suggesting that listeners may rely on different cues to resolve singular they (which must also be distinguished from plural they) than to resolve he or she when either could refer to more than one character (e.g., if both of the named characters used he/him). After each trial, participants judged whether the story matched the scene; they were no less likely to judge stories using singular they as matching, but they responded more slowly than to stories using he and she.

Even though the competitor character’s pronouns were never used in a given story, they did affect processing. Within he and she trials, listeners were less likely to be looking at the target character when the competitor character used they/them than when the competitor character used she/her or he/him. One possible explanation is that participants paid more attention overall to the they/them characters, either because those characters were atypical or more difficult, or because participants inferred that they were the focus of the study. However, fixations during the 1000ms preview before the audio started (Figure A.11) and during the first phrase, when the two characters were named (Figure A.10), rule out this explanation: participants were not more likely to look at the they/them characters than at the he/him and she/her characters in either window.

One unexpected result is that the he and she trials did not fully replicate the order of mention effect, which has been robust in previous experiments (Arnold et al., 2000, 2007; Brown-Schmidt & Toscano, 2017; Falandays et al., 2020). While there was a clear order of mention effect in the HeShe|They trials, the effect of Order was not significant in the HeShe|SheHe trials. Breaking the data down further shows that the expected order pattern appeared in He|She trials but not in She|He trials. Data collection for a replication is in progress to determine whether this pattern is consistent. A larger data set will also make it possible to conduct a dynamic tree-based item response analysis (Cho et al., 2020), which can test additional hypotheses by distinguishing looks to the competitor character from looks to the distractor characters.