Appendix A — Supplementary Analyses

A.1 Experiment 1

A.1.1 Experiment 1A: Additional Results

In addition to whether the characters used he/him, she/her, or they/them, participants were also asked about the characters’ jobs and pets (Figure A.1). Accuracy in matching the 12 jobs to the 12 characters was lower (M = 0.21, SD = 0.41) than accuracy for the 3 pets (M = 0.41, SD = 0.49) and the 3 pronouns (M = 0.66, SD = 0.47), but was not at floor. Neither job nor pet accuracy varied based on the character’s pronouns.

The pet question (cat, dog, or fish) was designed as a comparison to the pronoun question, and the two were compared in a model including the Character’s Pronouns (contrast coded as in the main analyses) and Question Type (pronoun vs pet, mean-center effects coded) as fixed effects. The most complex model that converged included by-participant random slopes for Question Type (Table A.1). Averaging across the three character pronouns, participants were significantly more accurate for pronoun questions than pet questions (β = 1.17, z = 11.19, p < .001). The interaction between Character Pronoun (they/them vs he/him + she/her) and Question Type was significant (β = 1.45, z = 7.55, p < .001), reflecting that the character’s pronouns affected accuracy for the pronoun question, but not the pet question. Probing this interaction indicated that for they/them characters, there was no significant difference in accuracy between pronouns and pets (β = 0.21, z = 1.35, p = .18), but that for he/him + she/her characters, pronoun accuracy was higher than pet accuracy (β = 1.65, z = 12.97, p < .001).

Figure A.1: Experiment 1A: By-participant mean accuracy in the multiple-choice memory task for each character’s pronouns, pet, and job, with colors indicating the character’s pronouns. Error bars indicate 95% CIs calculated over the by-participant means.
Table A.1: Experiment 1A: Model results for the effects of Character Pronoun and Question Type (character’s pronoun or pet) on Memory Accuracy. Character Pronoun is contrast-coded as in the main analysis.
Experiment 1A: Memory for Pronouns vs Pets
  Memory Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.199 0.073 2.731 0.006
Question Type: Pet (-.5) vs Pronoun (+.5) 1.168 0.104 11.189 <0.001
Character Pronoun: They (-.66) vs He (+.33) + She (+.33) 0.884 0.096 9.216 <0.001
Character Pronoun: He (-.5) vs She (+.5) 0.010 0.113 0.088 0.930
Question Type * Character Pronoun (They vs He+She) 1.449 0.192 7.548 <0.001
Question Type * Character Pronoun (He vs She) 0.128 0.226 0.566 0.571
Random Effects
τ00 Participant 0.323
τ11 Question Type | Participant 0.243
ρ01 Participant 0.231
N Participant 102
Observations 2448
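
The follow-up estimates reported for the Question Type by Character Pronoun interaction can be approximated directly from the coefficients in Table A.1. The Python sketch below is only an illustration, not the analysis code behind the reported models; the function name `simple_effect` is hypothetical, and the contrast codes (-.66 for they/them, +.33 for he/him and she/her) are copied from the table. If the reported follow-ups came from separately fitted models, small rounding differences are to be expected.

```python
def simple_effect(main_effect: float, interaction: float, contrast_code: float) -> float:
    """Slope of Question Type at one level of the pronoun contrast.

    With a contrast-coded moderator, the slope at a given level is the
    main effect plus the interaction multiplied by that level's code.
    """
    return main_effect + interaction * contrast_code


# Coefficients copied from Table A.1 (log-odds scale).
QUESTION_TYPE = 1.168  # Pet (-.5) vs Pronoun (+.5)
INTERACTION = 1.449    # Question Type * Character Pronoun (They vs He+She)

they = simple_effect(QUESTION_TYPE, INTERACTION, -0.66)
he_she = simple_effect(QUESTION_TYPE, INTERACTION, 0.33)

print(f"they/them: {they:.2f}")           # they/them: 0.21
print(f"he/him + she/her: {he_she:.2f}")  # he/him + she/her: 1.65
```

Both values agree with the follow-up estimates given in the text (β = 0.21 and β = 1.65).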

The memory and production tasks were compared directly by creating a model predicting accuracy in both tasks, with Task as a mean-center effects coded fixed effect (Table A.2). The main effect of Task was not significant (β = 0.11, z = 1.04, p = .30), but the interaction between Pronoun (They vs He + She) and Task was significant (β = 1.38, z = 6.36, p < .001). Probing this interaction indicated that memory was more accurate than production for they/them characters (β = -0.80, z = -4.77, p < .001). Conversely, memory was less accurate than production for he/him and she/her characters (β = 0.56, z = 4.19, p < .001).

Table A.2: Experiment 1A: Model results for the effects of Pronoun and Task (memory vs production) on Accuracy.
Experiment 1A: Comparing Memory and Production Accuracy
  Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.891 0.092 9.723 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 2.469 0.204 12.104 <0.001
Pronoun: He (-.5) vs She (+.5) 0.155 0.172 0.900 0.368
Task: Memory (-.5) vs Production (+.5) 0.110 0.105 1.041 0.298
Pronoun (They vs He+She) * Task 1.380 0.217 6.357 <0.001
Pronoun (He vs She) * Task 0.152 0.268 0.568 0.570
Random Effects
τ00 Participant 0.468
τ11 Pronoun (They vs He + She) | Participant 2.737
τ11 Pronoun (He vs She) | Participant 0.356
ρ01 Pronoun (They vs He + She) | Participant 0.091
ρ01 Pronoun (He vs She) | Participant 0.098
N Participant 102
Observations 2448

A.1.2 Experiment 1B: Additional Results

Table A.3: Experiment 1B: Model results for the effect of Pronoun on Memory Accuracy, when the memory task was completed after the production task.
Experiment 1B: Memory
  Memory Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.949 0.095 10.028 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 1.552 0.143 10.882 <0.001
Pronoun: He (-.5) vs She (+.5) 0.096 0.179 0.540 0.589
Random Effects
τ00 Participant 0.380
N Participant 101
Observations 1212
Table A.4: Experiment 1B: Model results for the effect of Pronoun on Production Accuracy, when the memory task was completed after the production task.
Experiment 1B: Production
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 1.108 0.123 8.974 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 2.476 0.167 14.788 <0.001
Pronoun: He (-.5) vs She (+.5) 0.002 0.224 0.009 0.993
Random Effects
τ00 Participant 0.779
τ00 Name 0.008
N Participant 101
N Name 12
Observations 1212
Table A.5: Experiment 1B: Model results for the effects of Pronoun and Memory Accuracy on Production Accuracy, when the memory task was completed after the production task.
Experiment 1B: Memory Predicting Production
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.767 0.080 9.538 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 1.991 0.161 12.349 <0.001
Pronoun: He (-.5) vs She (+.5) -0.040 0.209 -0.194 0.846
Memory Accuracy: Wrong (-.5) vs Right (+.5) 1.491 0.161 9.274 <0.001
Pronoun (They vs He + She) * Memory Accuracy -1.156 0.322 -3.584 <0.001
Pronoun (He vs She) * Memory Accuracy 0.240 0.418 0.575 0.565
Observations 1212

Figure A.2: Experiment 1B: By-participant mean accuracy in the multiple-choice memory task for each character’s pronouns, pet, and job, with colors indicating the character’s pronouns. Error bars indicate 95% CIs calculated over the by-participant means.
Table A.6: Experiment 1B: Model results for the effects of Character Pronoun and Question Type (character’s pronoun or pet) on Memory Accuracy, when the memory task was completed after the production task.
Experiment 1B: Memory for Pronouns vs Pets
  Memory Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.324 0.072 4.505 <0.001
Question Type: Pet (-.5) vs Pronoun (+.5) 1.255 0.103 12.217 <0.001
Character Pronoun: They (-.66) vs He (+.33) + She (+.33) 0.818 0.096 8.532 <0.001
Character Pronoun: He (-.5) vs She (+.5) 0.075 0.124 0.600 0.549
Question Type * Character Pronoun (They vs He+She) 1.474 0.192 7.688 <0.001
Question Type * Character Pronoun (He vs She) 0.022 0.231 0.095 0.924
Random Effects
τ00 Participant 0.256
τ00 Name 0.005
τ11 Question Type | Participant 0.177
ρ01 Participant 0.416
N Participant 101
N Name 12
Observations 2424

A.1.3 Comparing Experiments 1A & 1B

Figure A.3: Experiments 1A & 1B. Memory accuracy, distribution of memory responses, production accuracy, and distribution of production responses, comparing between task orders. Points indicate by-participant means, and error bars indicate 95% CIs calculated over the by-participant means.

Figure A.4: Experiments 1A & 1B. Distribution of combined memory and production accuracy, then production accuracy split by memory accuracy, comparing between task orders. Error bars indicate 95% CIs calculated over trials.
Table A.7: Experiments 1A & 1B: Model results for the effects of Pronoun and Task Order on Memory Accuracy.
Comparing Experiments 1A (Memory First) & 1B (Production First): Memory
  Memory Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.866 0.070 12.310 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 1.582 0.101 15.715 <0.001
Pronoun: He (-.5) vs She (+.5) 0.088 0.130 0.674 0.500
Order: Memory First (-.5) vs Production First (+.5) 0.177 0.132 1.341 0.180
Pronoun (They vs He+She) * Order -0.043 0.196 -0.217 0.828
Pronoun (He vs She) * Order 0.023 0.247 0.094 0.925
Random Effects
τ00 Participant 0.406
τ00 Name 0.005
N Participant 203
N Name 12
Observations 2436
Table A.8: Experiments 1A & 1B: Model results for the effects of Pronoun and Task Order on Production Accuracy.
Comparing Experiments 1A (Memory First) & 1B (Production First): Production
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.905 0.052 17.425 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 2.370 0.102 23.194 <0.001
Pronoun: He (-.5) vs She (+.5) 0.112 0.137 0.819 0.413
Order: Memory First (-.5) vs Production First (+.5) 0.134 0.104 1.291 0.197
Pronoun (They vs He+She) * Order -0.453 0.204 -2.215 0.027
Pronoun (He vs She) * Order -0.187 0.274 -0.683 0.495
Observations 2436
Table A.9: Experiments 1A & 1B: Model results for the effects of Pronoun, Memory Accuracy, and Task Order on Production Accuracy.
Comparing Experiments 1A (Memory First) & 1B (Production First): Memory Predicting Production
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.729 0.056 12.928 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 2.208 0.113 19.506 <0.001
Pronoun: He (-.5) vs She (+.5) 0.078 0.146 0.536 0.592
Memory Accuracy: Wrong (-.5) vs Right (+.5) 1.354 0.113 12.005 <0.001
Order: Memory First (-.5) vs Production First (+.5) 0.075 0.113 0.667 0.505
Pronoun (They vs He + She) * Memory Accuracy -0.932 0.226 -4.115 <0.001
Pronoun (He vs She) * Memory Accuracy 0.121 0.293 0.413 0.680
Pronoun (They vs He+She) * Order -0.435 0.226 -1.923 0.055
Pronoun (He vs She) * Order -0.238 0.293 -0.813 0.416
Memory Accuracy * Order 0.274 0.226 1.214 0.225
Pronoun (They vs He+She) * Memory Accuracy * Order -0.448 0.453 -0.989 0.323
Pronoun (He vs She) * Memory Accuracy * Order 0.239 0.585 0.408 0.683
Observations 2436
Table A.10: Experiments 1A & 1B: Model results for the effects of Pronoun and Task Order on the difference between memory accuracy and production accuracy for each participant.
Comparing Experiments 1A (Memory First) & 1B (Production First): By-Participant Differences Between Memory & Production
  Difference Score
Predictors Estimates SE z p
(Intercept) -0.002 0.012 -0.206 0.837
Order: Memory First (-.5) vs Production First (+.5) 0.002 0.024 0.068 0.946
Pronoun: They (-.66) vs He (+.33) + She (+.33) -0.181 0.026 -7.079 <0.001
Pronoun: He (-.5) vs She (+.5) -0.001 0.029 -0.040 0.968
Pronoun (They vs He+She) * Order 0.079 0.051 1.548 0.122
Pronoun (He vs She) * Order 0.027 0.058 0.464 0.643
Observations 609
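
The difference scores modeled in Table A.10 can be thought of as one value per participant per pronoun, which is consistent with the 609 observations reported (203 participants × 3 pronouns). The Python sketch below shows one way such scores could be computed from trial-level data. It is an illustration under stated assumptions: the field names (`participant`, `pronoun`, `task`, `correct`) are hypothetical, and the direction of the difference (memory minus production) is assumed rather than stated in the table.

```python
from collections import defaultdict


def difference_scores(trials):
    """Return {(participant, pronoun): memory accuracy - production accuracy}.

    ASSUMPTION: the difference is memory minus production; the source table
    does not state the direction explicitly.
    """
    # (participant, pronoun, task) -> [number correct, number of trials]
    cells = defaultdict(lambda: [0, 0])
    for t in trials:
        cell = cells[(t["participant"], t["pronoun"], t["task"])]
        cell[0] += t["correct"]
        cell[1] += 1

    scores = {}
    for (participant, pronoun, task), (n_correct, n_trials) in cells.items():
        if task != "memory":
            continue
        production = cells.get((participant, pronoun, "production"))
        if production is None:
            continue  # no production trials for this participant and pronoun
        scores[(participant, pronoun)] = n_correct / n_trials - production[0] / production[1]
    return scores


# Toy data (hypothetical), one participant and two pronouns.
toy_trials = [
    {"participant": "p1", "pronoun": "they", "task": "memory", "correct": 1},
    {"participant": "p1", "pronoun": "they", "task": "memory", "correct": 0},
    {"participant": "p1", "pronoun": "they", "task": "production", "correct": 0},
    {"participant": "p1", "pronoun": "they", "task": "production", "correct": 0},
    {"participant": "p1", "pronoun": "he", "task": "memory", "correct": 1},
    {"participant": "p1", "pronoun": "he", "task": "production", "correct": 1},
]
scores = difference_scores(toy_trials)
print(scores)  # {('p1', 'they'): 0.5, ('p1', 'he'): 0.0}
```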

Figure A.5: Experiments 1A & 1B: Correlations between by-participant random slopes for the effect of Pronoun in each half of the data, for the memory and production tasks.

A.2 Experiment 2

A.2.1 Pet & Job Questions

As in Experiments 1A & 1B, memory for the characters’ 12 jobs was analyzed in order to make sure the task did not show floor effects, and memory for the characters’ 3 pets was analyzed as a less marked comparison to pronouns (Figure A.6). Averaged across conditions, accuracy for jobs (M = 0.37) was numerically higher than in Experiments 1A (M = 0.21) and 1B (M = 0.29). Accuracy for pets was also higher in Experiment 2 (M = 0.54) than in Experiments 1A (M = 0.41) and 1B (M = 0.43). Job and pet accuracy did not vary based on the characters’ pronouns or the PSA and Biography conditions.

Pronoun and pet questions were compared in a model including the Character’s Pronouns (contrast coded as in the main analyses), the Question Type (mean-center effects coded), and the PSA and Biography conditions as fixed effects. The initial model included interactions between Character Pronoun, PSA, and Biography as in the main analyses; the interaction between Character Pronoun and Question Type; by-participant and by-item intercepts; and by-participant and by-item slopes for Character Pronoun and Question Type. In addition to the subset of interactions between fixed effects listed above, the most complex model that converged included by-participant intercepts, by-item intercepts, and by-item slopes for Question Type (Table A.11). Participants were significantly more accurate for pronoun questions than pet questions (β = 1.03, z = 15.59, p < .001), and the interaction between Character Pronoun (they/them vs he/him + she/her) and Question Type was significant (β = 1.50, z = 12.84, p < .001). Probing this interaction indicated that there was no significant difference in accuracy between pronouns and pets for they/them characters (β = 0.05, z = 0.48, p = .63), only for he/him + she/her characters (β = 1.53, z = 18.57, p < .001). This resembles the pattern of results in Experiment 1.

Figure A.6: Experiment 2: Mean accuracy in the multiple-choice memory task for pronouns, pets, and jobs, with colors indicating the character’s pronouns. By-participant means are shown as points; error bars indicate 95% CIs calculated over the by-participant means.
Table A.11: Experiment 2: Model results for the effects of Character Pronoun, PSA, Biography, and Question Type (pronoun vs pet) on Memory Accuracy.
Experiment 2: Memory for Pronouns vs Pets
  Memory Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.838 0.089 9.400 <0.001
Question Type: Pet (-.5) vs Pronoun (+.5) 1.034 0.066 15.592 <0.001
Character Pronoun: They (-.66) vs He (+.33) + She (+.33) 0.752 0.058 12.915 <0.001
Character Pronoun: He (-.5) vs She (+.5) 0.070 0.087 0.804 0.421
PSA: Unrelated (-.5) vs PSA: Gendered Language (+.5) 0.142 0.165 0.861 0.389
Biography: He/She (-.5) vs Biography: They (+.5) -0.123 0.165 -0.747 0.455
Question Type * Character Pronoun (They vs He+She) 1.495 0.116 12.841 <0.001
Question Type * Character Pronoun (He vs She) 0.189 0.155 1.220 0.222
Character Pronoun (They vs He+She) * PSA -0.247 0.116 -2.135 0.033
Character Pronoun (He vs She) * PSA -0.026 0.138 -0.188 0.851
PSA * Biography 0.166 0.330 0.503 0.615
Character Pronoun (They vs He+She) * Biography -0.011 0.116 -0.095 0.925
Character Pronoun (He vs She) * Biography -0.008 0.138 -0.057 0.954
Pronoun (They vs He+She) * PSA * Biography 0.084 0.231 0.363 0.717
Pronoun (He vs She) * PSA * Biography 0.357 0.276 1.293 0.196
Random Effects
τ00 Participant 1.883
τ00 Name 0.011
τ11 Question Type | Name 0.015
ρ01 Name 0.099
N Name 12
N Participant 320
Observations 7680

A.2.2 Additional Figures

Figure A.7: Experiment 2: Distribution of memory and production responses.

A.2.3 Additional Model Results

Table A.12: Experiment 2: Model results for the effects of Pronoun, PSA, Biography, and Memory Accuracy on Production Accuracy.
Experiment 2: Memory Predicting Production
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 0.522 0.055 9.542 <0.001
Pronoun: They (-.66) vs He (+.33) + She (+.33) 3.130 0.110 28.527 <0.001
Pronoun: He (-.5) vs She (+.5) -0.342 0.133 -2.560 0.010
Memory Accuracy: Wrong (-.5) vs Right (+.5) 0.830 0.104 8.002 <0.001
PSA: Unrelated (-.5) vs PSA: Gendered Language (+.5) 0.075 0.104 0.719 0.472
Biography: He/She (-.5) vs Biography: They (+.5) -0.017 0.104 -0.163 0.871
Pronoun (They vs He + She) * PSA -1.896 0.219 -8.677 <0.001
Pronoun (He vs She) * PSA 0.120 0.258 0.464 0.643
Pronoun (They vs He + She) * Memory Accuracy -0.402 0.219 -1.838 0.066
Pronoun (He vs She) * Memory Accuracy 0.255 0.259 0.987 0.324
PSA * Memory Accuracy 0.240 0.207 1.156 0.248
PSA * Biography 0.377 0.207 1.820 0.069
Biography * Memory Accuracy -0.114 0.207 -0.551 0.582
Pronoun (They vs He + She) * Biography -0.304 0.219 -1.391 0.164
Pronoun (He vs She) * Biography 0.174 0.258 0.675 0.500
Pronoun (They vs He + She) * PSA * Memory Accuracy -0.237 0.437 -0.542 0.588
Pronoun (He vs She) * PSA * Memory Accuracy 0.480 0.516 0.931 0.352
PSA * Biography * Memory Accuracy 0.425 0.415 1.025 0.305
Pronoun (They vs He + She) * PSA * Biography 1.198 0.437 2.740 0.006
Pronoun (He vs She) * PSA * Biography 0.446 0.516 0.863 0.388
Pronoun (They vs He + She) * Biography * Memory Accuracy 0.830 0.437 1.899 0.058
Pronoun (He vs She) * Biography * Memory Accuracy -0.390 0.516 -0.756 0.450
Pronoun (They vs He + She) * PSA * Biography * Memory Accuracy -1.527 0.876 -1.744 0.081
Pronoun (He vs She) * PSA * Biography * Memory Accuracy -1.203 1.033 -1.164 0.244
Random Effects
τ00 Name 0.004
N Name 12
Observations 3840
Table A.13: Experiment 2: Model results for the effects of PSA and Biography on whether each participant produced singular they at least once. Participants were coded with a 1 if they produced singular they at least once in the written sentence completion task, regardless of accuracy, and with a 0 if they did not.
Experiment 2: Production of Singular They
  Produce They/Them
Predictors Log-Odds SE z p
(Intercept) -0.362 0.117 -3.086 0.002
PSA: Unrelated (-.5) vs PSA: Gendered Language (+.5) 0.927 0.235 3.946 <0.001
Biography: He/She (-.5) vs Biography: They (+.5) 0.006 0.235 0.025 0.980
PSA * Biography -0.816 0.470 -1.737 0.082
Observations 320
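
The by-participant outcome in Table A.13 collapses each participant's written responses into a single 0/1 value. The Python sketch below shows that coding step on toy data. The input format, a list of (participant, pronoun produced) pairs, and the function name are assumptions made for this example.

```python
def produced_they_at_least_once(responses):
    """Map each participant to 1 if any response used they/them, else 0.

    Accuracy is deliberately ignored: a they/them response counts whether or
    not it matched the character's pronouns.
    """
    coded = {}
    for participant, pronoun_produced in responses:
        used_they = pronoun_produced == "they/them"
        coded[participant] = int(coded.get(participant, 0) or used_they)
    return coded


# Toy responses (hypothetical): p1 uses they/them once, p2 never does.
toy_responses = [
    ("p1", "he/him"),
    ("p1", "they/them"),
    ("p2", "she/her"),
    ("p2", "he/him"),
]
coded = produced_they_at_least_once(toy_responses)
print(coded)  # {'p1': 1, 'p2': 0}
```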

A.3 Experiment 3

A.3.1 Norming Study

Table A.14: Image Norming: Results. Counts and proportions of they/them, he/him, she/her, and no pronoun responses for each image in the norming study.
Norming Study: Pronouns Produced
Image Code they/them he/him she/her none
Images Included
gs04 19 0 6 5
gs06 20 0 5 5
gs12 24 0 4 2
gs09 24 4 0 2
gs08 22 5 0 3
gs11 21 7 0 2
Images Not Included
gs03 20 0 7 3
gs05 20 0 6 4
gs02 19 1 6 4
gs10 23 1 3 3
gs07 23 1 2 4
gs01 20 6 1 3
Totals
Proportion 0.71 0.07 0.11 0.11
Count 255 25 40 40
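
The Totals rows of Table A.14 follow from the per-image counts. As a quick arithmetic check, the Python sketch below sums each response column and recomputes the proportions. The counts are copied from the table; the variable names are chosen for this example only.

```python
LABELS = ("they/them", "he/him", "she/her", "none")

# Per-image counts copied from Table A.14, in the order given by LABELS.
counts_by_image = {
    "gs04": (19, 0, 6, 5), "gs06": (20, 0, 5, 5), "gs12": (24, 0, 4, 2),
    "gs09": (24, 4, 0, 2), "gs08": (22, 5, 0, 3), "gs11": (21, 7, 0, 2),
    "gs03": (20, 0, 7, 3), "gs05": (20, 0, 6, 4), "gs02": (19, 1, 6, 4),
    "gs10": (23, 1, 3, 3), "gs07": (23, 1, 2, 4), "gs01": (20, 6, 1, 3),
}

# Column totals across all 12 images.
totals = [sum(row[i] for row in counts_by_image.values()) for i in range(len(LABELS))]
grand_total = sum(totals)
proportions = [round(t / grand_total, 2) for t in totals]

# Every image has the same number of responses.
responses_per_image = {sum(row) for row in counts_by_image.values()}

print(totals)               # [255, 25, 40, 40]
print(grand_total)          # 360
print(proportions)          # [0.71, 0.07, 0.11, 0.11]
print(responses_per_image)  # {30}
```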


A.3.2 Additional Survey Results

Table A.15: Experiment 3: Additional Demographics. Race/ethnicity has a higher total, as participants could select more than one option.
Experiment 3: Additional Participant Demographics
English Experience 162
  Native (learned from birth) 151
  Fully competent in speaking, listening, reading, and writing, but not native 8
  Prefer not to answer / Missing data 3
Education 162
  Less than high school 2
  High school graduate 23
  Some college 24
  2-year degree 16
  4-year degree 63
  Professional degree 23
  Doctorate 7
  Prefer not to answer / Missing data 4
Race/Ethnicity 176
  American Indian or Alaska Native 4
  Asian 9
  Black, African American, or African 21
  Hispanic, Latino, or Spanish 19
  Middle Eastern or North African 2
  Native Hawaiian or Pacific Islander 0
  White 112
  I use a different term 2
  Prefer not to answer / Missing data 7
Total Participants 162


The difference in naturalness ratings between singular they coreferring with generic and quantified referents [Indefinite] and with masculine, feminine, and gender-neutral names [Proper Name] was tested using a linear mixed-effects model (Table A.16), with Referent Type mean-center effects coded. The model also included by-item intercepts and by-participant intercepts and slopes for Referent Type. Ratings were centered such that a score of 0 indicated the center of the Likert scale, so the significant intercept means that both types of sentences were rated as more natural than unnatural (β = 1.07, t = 10.23, p < .001). The significant effect of Referent Type indicated that singular they was rated as more natural with indefinites than with proper names (β = 0.98, t = 4.89, p < .001).
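
Centering ratings on the scale midpoint, rather than on the sample mean, is what makes the intercept interpretable as "more natural than unnatural". The Python sketch below illustrates the transformation. The endpoints of the rating scale are not given in this appendix, so the 1 to 7 range used here is purely an assumption, as are the function name and the toy ratings.

```python
def center_on_midpoint(rating: float, scale_min: int = 1, scale_max: int = 7) -> float:
    """Shift a Likert rating so that 0 is the middle of the scale.

    ASSUMPTION: a 1-7 scale; pass other endpoints for a different scale.
    """
    midpoint = (scale_min + scale_max) / 2
    return rating - midpoint


# Toy ratings (hypothetical).
toy_ratings = [7, 5, 4, 2]
centered = [center_on_midpoint(r) for r in toy_ratings]
mean_centered = sum(centered) / len(centered)

print(centered)       # [3.0, 1.0, 0.0, -2.0]
print(mean_centered)  # 0.5
```

A positive mean of the centered ratings, as in this toy example, indicates ratings that sit above the scale midpoint on average.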

Table A.16: Experiment 3: Naturalness Ratings. Model results for the effect of Referent Type (generic, each, every vs masculine, feminine, gender-neutral names) on sentence naturalness ratings for singular they.
Experiment 3: Naturalness Ratings for Singular They
  Naturalness Ratings
Predictors Estimates SE t p
(Intercept) 1.067 0.104 10.234 <0.001
Referent Type: Proper Name (-.5) vs Indefinite (+.5) 0.981 0.200 4.893 <0.001
Random Effects
σ2 1.294
τ00 Participant 0.938
τ00 Item 0.022
τ11 Referent Type | Participant 3.229
ρ01 Participant -0.555
ICC 0.577
N Item 6
N Participant 159
Observations 954
Marginal R2 / Conditional R2 0.073 / 0.608
Table A.17: Experiment 3: Gender Beliefs. Question texts, distributions, means, and SDs for items in the gender beliefs scale (Nagoshi et al., 2008).
Experiment 3: Gender Binary & Gender Essentialism Beliefs
Item M SD
I avoid people on the street whose gender is unclear to me. 0.74 1.32
I am uncomfortable around people who don't conform to traditional gender roles, e.g., aggressive women or emotional men. 1.14 1.66
I would be upset if someone I'd known a long time revealed to me that they used to be another gender. 1.40 1.81
I think there is something wrong with a person who says that they are neither a man nor a woman. 1.97 2.25
I believe that a person can never change their gender. 2.08 2.29
When I meet someone, it is important for me to be able to identify them as a man or a woman. 2.31 2.11
I don't like it when someone is flirting with me, and I can't tell if they are a man or a woman. 2.35 2.16
A person's genitalia define what gender they are, e.g., a penis defines a person as being a man, a vagina defines a person as being a woman. 2.52 2.41
I believe that the male/female dichotomy is natural. 3.50 2.17
Total 18.02 14.38
0: Strongly Disagree – 6: Strongly Agree


A.3.3 Additional Pronoun Accuracy Results

Figure A.8: Experiment 3: [A] Distribution of pronouns produced for each of the 6 character images (Drucker, 2019). [B] Mean accuracy for each of the 18 name + image + pronoun combinations, summarized across Nametag & Introduction conditions. Error bars indicate 95% CIs calculated over the by-item means.

A.3.4 Participant Covariate Analyses

Figure A.9: Experiment 3: Distribution of mean-centered, rescaled participant covariates tested for model inclusion.
Table A.18: Experiment 3: Model results for the effects of Pronoun, Nametag, Introduction, Gender Beliefs, and Familiarity with Pronoun-Sharing on Accuracy.
Experiment 3: Participant Covariates
  Production Accuracy
Predictors Log-Odds SE z p
(Intercept) 11.431 1.398 8.175 <0.001
Pronoun Pair: T|HS (-.66) vs HS|T (+.33) + HS|SH (+.33) 4.009 1.102 3.638 <0.001
Pronoun Pair: HS|T (-.5) vs HS|SH (+.5) -0.900 0.730 -1.233 0.218
Nametag (-.5; +.5) 1.332 1.136 1.173 0.241
Intro (-.5; +.5) 0.462 1.188 0.389 0.697
Gender Beliefs (mean-centered, higher binary endorsement +) -10.067 2.927 -3.439 0.001
Familiarity with Pronoun Sharing (mean-centered) -3.232 1.655 -1.953 0.051
Pronoun (T|HS vs HS|T + HS|SH) * Nametag 2.734 0.940 2.908 0.004
Pronoun (HS|T vs HS|SH) * Nametag -0.326 1.189 -0.274 0.784
Pronoun (T|HS vs HS|T + HS|SH)* Intro -3.445 1.223 -2.816 0.005
Pronoun (HS|T vs HS|SH) * Intro -1.450 1.416 -1.024 0.306
Nametag * Intro -1.428 2.286 -0.625 0.532
Pronoun (T|HS vs HS|T + HS|SH) * Gender Beliefs 6.504 1.841 3.533 <0.001
Pronoun (HS|T vs HS|SH) * Gender Beliefs -4.167 2.573 -1.620 0.105
Intro * Gender Beliefs 8.753 5.573 1.571 0.116
Pronoun (T|HS vs HS|T + HS|SH) * Familiarity -2.048 1.127 -1.817 0.069
Pronoun (HS|T vs HS|SH) * Familiarity 1.067 1.452 0.735 0.462
Nametag * Gender Beliefs 3.090 4.881 0.633 0.527
Intro * Familiarity -0.945 3.261 -0.290 0.772
Pronoun (T|HS vs HS|T + HS|SH) * Nametag * Intro 6.163 1.651 3.733 <0.001
Pronoun (HS|T vs HS|SH) * Nametag * Intro 0.663 2.248 0.295 0.768
Pronoun (T|HS vs HS|T + HS|SH) * Gender Beliefs * Intro 13.319 4.284 3.109 0.002
Pronoun (HS|T vs HS|SH) * Gender Beliefs * Intro 2.599 4.359 0.596 0.551
Pronoun (T|HS vs HS|T + HS|SH) * Nametag * Gender Beliefs -11.034 4.642 -2.377 0.017
Pronoun (HS|T vs HS|SH) * Nametag * Gender Beliefs -9.794 5.381 -1.820 0.069
Pronoun (T|HS vs HS|T + HS|SH) * Intro * Familiarity 6.643 2.567 2.588 0.010
Pronoun (HS|T vs HS|SH) * Intro * Familiarity 4.790 3.032 1.580 0.114
Random Effects
τ00 Participant 34.933
τ00 Character 3.143
N Participant 158
N Character 18
Observations 4640

A.4 Experiment 4

A.4.1 Additional Survey Results

Table A.19: Experiment 4: Gender Beliefs. Question texts, distributions, means, and SDs for items in the gender beliefs scale (Nagoshi et al., 2008).

Experiment 4: Gender Binary & Gender Essentialism Beliefs
Item M SD
I avoid people on the street whose gender is unclear to me. 0.67 1.30
I think there is something wrong with a person who says that they are neither a man nor a woman. 0.70 1.42
I am uncomfortable around people who don't conform to traditional gender roles, e.g., aggressive women or emotional men. 0.90 1.65
I believe that a person can never change their gender. 1.03 1.77
I would be upset if someone I'd known a long time revealed to me that they used to be another gender. 1.37 2.16
A person's genitalia define what gender they are, e.g., a penis defines a person as being a man, a vagina defines a person as being a woman. 1.70 1.95
When I meet someone, it is important for me to be able to identify them as a man or a woman. 1.87 1.81
I don't like it when someone is flirting with me, and I can't tell if they are a man or a woman. 2.63 2.14
I believe that the male/female dichotomy is natural. 3.07 2.00
Total 13.93 12.56
0: Strongly Disagree – 6: Strongly Agree


As in Experiment 3, the difference in naturalness ratings between singular they coreferring with generic and quantified referents [Indefinite] and with masculine, feminine, and gender-neutral names [Proper Name] was tested using a linear mixed-effects model (Table A.20). Referent Type was mean-center effects coded, and the model also included by-item intercepts and by-participant intercepts and slopes for Referent Type. Ratings were centered such that a score of 0 indicated the center of the Likert scale, so the significant intercept means that both types of sentences were rated as more natural than unnatural (β = 1.07, t = 3.89, p < .01). There was no significant difference between proper name and indefinite referents (β = 0.19, t = 0.36, p = .73).

Table A.20: Experiment 4: Model results for the effect of Referent Type (generic, each, every vs masculine, feminine, gender-neutral names) on sentence naturalness ratings for singular they.
Experiment 4: Naturalness Ratings for Singular They
  Naturalness Ratings
Predictors Estimates SE t p
(Intercept) 1.072 0.276 3.892 0.004
Referent Type: Proper Name (-.5) vs Indefinite (+.5) 0.189 0.519 0.364 0.726
Random Effects
σ2 1.420
τ00 Participant 0.807
τ00 Item 0.247
τ11 Referent Type | Participant 2.200
ρ01 Participant -0.317
ICC 0.530
N Item 6
N Participant 30
Observations 180
Marginal R2 / Conditional R2 0.003 / 0.532

A.4.2 Match Judgments

Table A.21: Experiment 4: Test Trial Match Rates. Model results for the effect of Pronoun Pair on the likelihood of judging the story to match the scene in test trials (1 = match = correct).
Experiment 4: Match Judgment Rates in Test Trials
  Match
Predictors Log-Odds SE z p
(Intercept) 3.087 0.254 12.176 <0.001
Pronoun Pair: They|HeShe (-.66) vs HeShe|They (+.33) + HeShe|SheHe (+.33) 0.330 0.245 1.345 0.179
Pronoun Pair: HeShe|They (-.5) vs HeShe|SheHe (+.5) 0.255 0.275 0.926 0.355
Random Effects
τ00 Participant 1.536
τ11 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant 0.341
τ11 Pronoun Pair (HS|T vs HS|SH) | Participant 0.120
ρ01 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant 0.277
ρ01 Pronoun Pair (HS|T vs HS|SH) | Participant -0.637
N Participant 30
Observations 2877
Table A.22: Experiment 4: Wrong Pronoun Trial Match Rates. Model results for the effect of Pronoun on judging the story to match the scene in wrong pronoun trials (1 = match = incorrect). There were no wrong pronoun trials for the HeShe|They condition, so the They|HeShe (they/them character referred to with he/him or she/her, whichever the competitor did not use) and the HeShe|SheHe (he/him and she/her characters referred to with they/them) conditions were mean-center effects coded.
Experiment 4: Match Judgment Rates in Wrong Pronoun Trials
  Match
Predictors Log-Odds SE z p
(Intercept) -0.373 0.328 -1.137 0.255
Correct Pronoun: They (-.5) vs He + She (+.5) -0.076 0.362 -0.211 0.833
Random Effects
τ00 Story 0.190
τ00 Participant 2.206
τ11 Pronoun | Story 0.140
ρ01 Pronoun | Story -0.507
N Participant 30
N Story 35
Observations 238
Table A.23: Experiment 4: Test Trial Match RT. Model results for the effect of Pronoun Pair on reaction times for test trial match judgments.
Experiment 4: Match Judgment RT in Test Trials
  RT (ms)
Predictors Estimates SE t p
(Intercept) 3888.739 32.056 121.311 <0.001
Pronoun Pair: They|HeShe (-.66) vs HeShe|They (+.33) + HeShe|SheHe (+.33) -506.668 40.874 -12.396 <0.001
Pronoun Pair: HeShe|They (-.5) vs HeShe|SheHe (+.5) 32.961 43.864 0.751 0.452
Random Effects
σ2 0.014
τ00 Story 126
τ00 Participant 305041
τ11 Pronoun Pair (T|HS vs HS|T + HS|SH) | Story 397848
τ11 Pronoun Pair (HS|T vs HS|SH) | Story 592385
τ11 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant 114829
τ11 Pronoun Pair (HS|T vs HS|SH) | Participant 151199
ρ01 Pronoun Pair (T|HS vs HS|T + HS|SH) | Story -0.324
ρ01 Pronoun Pair (HS|T vs HS|SH) | Story -0.961
ρ01 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant -0.833
ρ01 Pronoun Pair (HS|T vs HS|SH) | Participant 0.092
N Participant 30
N Story 60
Observations 2842

A.4.3 Additional Figures

Figure A.10: Experiment 4: Name Window. Looks to the named characters at the beginning of the story, with the first name beginning at 0ms and the second name beginning ~1000ms later.

Figure A.11: Experiment 4: Preview Window. Looks to each character during the 1000ms preview time before audio started, split by the character’s pronouns.

A.4.4 Additional Eyetracking Results

The Order of Mention effect was estimated separately for each Pronoun Pair condition by running three models, each with one Pronoun Pair condition coded as 0 and the other two Pronoun Pair conditions coded as 1. The maximal random effects structures that converged (Bates et al., 2015; R Core Team, 2023; Voeten, 2023) only included a subset of those in the main model (Table 4.2). For the by-participant effects, each kept all 3 slopes (AR(1), Order, and Trial Number). For the by-item effects, the HeShe|They reference model included both slopes (Order, Trend), the HeShe|SheHe reference model only included slopes for Trend, and the They|HeShe reference model only included intercepts.
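
The re-coding scheme described above can be sketched as follows. When the Pronoun Pair predictor is coded 0 for one condition and 1 for the other two, and the model includes a Pronoun Pair by Order interaction, the Order coefficient estimates the Order effect within the condition coded 0. The Python snippet below only builds the three codings; it does not fit any model, and the function and variable names are hypothetical.

```python
CONDITIONS = ("HeShe|SheHe", "HeShe|They", "They|HeShe")


def reference_coding(reference: str) -> dict:
    """Return {condition: code}, with the reference condition coded 0
    and the other two conditions coded 1."""
    if reference not in CONDITIONS:
        raise ValueError(f"unknown condition: {reference}")
    return {cond: 0 if cond == reference else 1 for cond in CONDITIONS}


# One coding per model: each condition takes a turn as the reference group.
codings = {ref: reference_coding(ref) for ref in CONDITIONS}

for ref, coding in codings.items():
    print(ref, coding)
# HeShe|SheHe {'HeShe|SheHe': 0, 'HeShe|They': 1, 'They|HeShe': 1}
# HeShe|They {'HeShe|SheHe': 1, 'HeShe|They': 0, 'They|HeShe': 1}
# They|HeShe {'HeShe|SheHe': 1, 'HeShe|They': 1, 'They|HeShe': 0}
```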

In HeShe|SheHe trials (Table A.24), neither the main effect of Order (β = 0.05, z = 0.47, p = .64) nor its interaction with Pronoun Pair (β = 0.06, z = 0.70, p = .49) was significant. In HeShe|They trials (Table A.25), the main effect of Order was significant as anticipated, with participants more likely to be looking at target characters who had been named first in the story than target characters who had been named second (β = 0.22, z = 2.20, p < .05). The interaction between Order and Pronoun Pair was also significant (β = -0.20, z = -2.20, p < .05). Probing this interaction indicated that listeners were less likely to be looking at the target in HeShe|They trials than in HeShe|SheHe and They|HeShe trials when the target was mentioned second (β = 0.17, z = 2.51, p < .05), but not when the target was mentioned first (β = -0.04, z = -0.62, p = .54). In They|HeShe trials (Table A.26), neither the main effect of Order (β = 0.00, z = -0.03, p = .97) nor its interaction with Pronoun Pair (β = 0.14, z = 1.48, p = .14) was significant.

Table A.24: Experiment 4: Model results for the effects of Pronoun Pair and Order on the likelihood of looking at the target character (=1) or not (=0). The HeShe|SheHe condition is coded as 0, and the HeShe|They and They|HeShe conditions are coded as 1.
Experiment 4: Looks to the Target Character (HeShe|SheHe As Reference Group)
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.480 0.089 -50.111 <0.001
AR(1) (0, 1) 9.971 0.109 91.859 <0.001
Pronoun Pair: HeShe|SheHe (0) vs HeShe|They (1) + They|HeShe (1) -0.347 0.047 -7.454 <0.001
Order: Target Mentioned Second (-.5) vs First (+.5) 0.046 0.098 0.467 0.641
Trend (mean-centered) 0.049 0.038 1.309 0.191
Pronoun Pair * Order 0.065 0.093 0.698 0.485
Random Effects
τ00 Story 0.005
τ00 Participant 0.196
τ11 Trend | Story 0.004
τ11 AR(1) | Participant 0.268
τ11 Order | Participant 0.110
τ11 Trial Number | Participant 0.027
ρ01 Trend | Story 0.018
ρ01 AR(1) | Participant -0.296
ρ01 Order | Participant 0.084
ρ01 Trial Number | Participant 0.508
N Participant 30
N Story 60
Observations 296022
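
To read the log-odds in Table A.24 on the probability scale, the inverse logit can be applied to a sum of fixed effects. The Python sketch below does this for the intercept with and without the AR(1) term, holding Order and Trend at 0 and Pronoun Pair at its reference level, and ignoring random effects. It assumes that AR(1) codes whether the preceding sample was a look to the target, which the (0, 1) coding in the table suggests but does not state. The helper name is illustrative.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


INTERCEPT = -4.480  # Table A.24, HeShe|SheHe as reference group
AR1 = 9.971         # Table A.24, AR(1) (0, 1)

p_prev_off = inv_logit(INTERCEPT)       # AR(1) = 0
p_prev_on = inv_logit(INTERCEPT + AR1)  # AR(1) = 1

print(f"P(look to target | AR(1) = 0) = {p_prev_off:.3f}")
print(f"P(look to target | AR(1) = 1) = {p_prev_on:.3f}")
```

Under these assumptions the predicted probability of a look to the target is about 0.01 when AR(1) is 0 and above 0.99 when it is 1, which is why the remaining coefficients are small by comparison.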
Table A.25: Experiment 4: Model results for the effects of Pronoun Pair and Order on the likelihood of looking at the target character (=1) or not (=0). The HeShe|They condition is coded as 0, and the HeShe|SheHe and They|HeShe conditions are coded as 1.
Experiment 4: Looks to the Target Character
(HeShe|They As Reference Group)
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.768 0.088 -54.437 <0.001
AR(1) (0, 1) 9.988 0.110 90.880 <0.001
Pronoun Pair: HeShe|They (0) vs HeShe|SheHe (1) + They|HeShe (1) 0.062 0.046 1.329 0.184
Order: Target Mentioned Second (-.5) vs First (+.5) 0.218 0.099 2.196 0.028
Trend (mean-centered) 0.044 0.038 1.152 0.249
Pronoun Pair * Order -0.204 0.093 -2.205 0.027
Random Effects
τ00 Story 0.006
τ00 Participant 0.189
τ11 Order | Story 0.025
τ11 Trend | Story 0.005
τ11 AR(1) | Participant 0.274
τ11 Order | Participant 0.106
τ11 Trial Number | Participant 0.032
ρ01 Order | Story 0.951
ρ01 Trend | Story 0.130
ρ01 AR(1) | Participant -0.289
ρ01 Order | Participant 0.097
ρ01 Trial Number | Participant 0.544
N Participant 30
N Story 60
Observations 296022
Table A.26: Experiment 4: Model results for the effects of Pronoun Pair and Order on the likelihood of looking at the target character (=1) or not (=0). The They|HeShe condition is coded as 0, and the HeShe|They and HeShe|SheHe conditions are coded as 1.
Experiment 4: Looks to the Target Character
(They|HeShe As Reference Group)
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.901 0.089 -54.975 <0.001
AR(1) (0, 1) 9.983 0.110 90.621 <0.001
Pronoun Pair: They|HeShe (0) vs HeShe|SheHe (1) + HeShe|They (1) 0.273 0.046 5.884 <0.001
Order: Target Mentioned Second (-.5) vs First (+.5) -0.003 0.099 -0.034 0.973
Trend (mean-centered) 0.047 0.037 1.269 0.205
Pronoun Pair * Order 0.138 0.093 1.479 0.139
Random Effects
τ00 Story 0.005
τ00 Participant 0.192
τ11 AR(1) | Participant 0.277
τ11 Order | Participant 0.110
τ11 Trial Number | Participant 0.030
ρ01 AR(1) | Participant -0.301
ρ01 Order | Participant 0.101
ρ01 Trial Number | Participant 0.493
N Participant 30
N Story 60
Observations 296022
Table A.27: Experiment 4: Model results for the effects of Target Pronoun and Order on the likelihood of looking at the target character (=1) or not (=0) during the window from 200 ms after pronoun onset to 1210 ms, the earliest shape word onset.
Experiment 4: Target Pronoun
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.723 0.084 -56.372 <0.001
AR(1) (0, 1) 9.999 0.108 92.291 <0.001
Target Pronoun: They (-.66) vs He (+.33) + She (+.33) 0.275 0.047 5.864 <0.001
Target Pronoun: She (-.5) vs He (+.5) 0.039 0.053 0.728 0.467
Order: Target Mentioned Second (-.5) vs First (+.5) 0.085 0.077 1.099 0.272
Trend (mean-centered) 0.047 0.037 1.291 0.197
Target Pronoun (They vs He + She) * Order 0.139 0.094 1.478 0.140
Target Pronoun (She vs He) * Order 0.200 0.107 1.867 0.062
Random Effects
τ00 Story 0.005
τ00 Participant 0.191
τ11 AR(1) | Participant 0.278
τ11 Order | Participant 0.111
τ11 Trial Number | Participant 0.029
τ11 Order * Trial Number | Participant 0.014
ρ01 AR(1) | Participant -0.300
ρ01 Order | Participant 0.103
ρ01 Trial Number | Participant 0.488
ρ01 Order * Trial Number | Participant -0.339
N Participant 30
N Story 60
Observations 296022
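
The two Target Pronoun contrasts and the Order coding in Table A.27 can be written out as a small lookup. This Python sketch uses the rounded codes printed in the table (-.66, +.33, +.33 and -.5, +.5). It assumes that They takes 0 on the She vs He contrast, which the table does not show explicitly, and all names are illustrative.

```python
# Contrast codes as printed in Table A.27 (rounded to two decimals).
# Assumption: They is coded 0 on the She vs He contrast.
TARGET_PRONOUN_CONTRASTS = {
    "they": (-0.66, 0.0),
    "he": (0.33, 0.5),
    "she": (0.33, -0.5),
}
ORDER_CODES = {"second": -0.5, "first": 0.5}


def code_trial(pronoun, order):
    """Return (they_vs_he_she, she_vs_he, order) predictor values for one trial."""
    they_vs_he_she, she_vs_he = TARGET_PRONOUN_CONTRASTS[pronoun]
    return they_vs_he_she, she_vs_he, ORDER_CODES[order]


# Each contrast sums to zero across the three pronouns, and the two are orthogonal.
columns = list(zip(*TARGET_PRONOUN_CONTRASTS.values()))
assert all(abs(sum(col)) < 1e-9 for col in columns)
assert abs(sum(a * b for a, b in zip(*columns))) < 1e-9

print(code_trial("they", "first"))
```

With this coding, the first contrast compares they/them targets against the average of he/him and she/her targets, and the second compares she/her against he/him.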
Table A.28: Experiment 4: Trend Interactions. Model results for the interactions between Trend, Pronoun Pair, and Order on the likelihood of looking at the target character (=1) or not (=0).
Experiment 4: Interactions with Trend
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.711 0.083 -56.631 <0.001
AR(1) (0, 1) 9.974 0.109 91.708 <0.001
Pronoun Pair: They|HeShe (-.66) vs HeShe|They (+.33) + HeShe|SheHe (+.33) 0.284 0.061 4.624 <0.001
Pronoun Pair: HeShe|They (-.5) vs HeShe|SheHe (+.5) 0.264 0.073 3.594 <0.001
Order: Target Mentioned Second (-.5) vs First (+.5) 0.085 0.078 1.082 0.279
Trend (mean-centered) 0.058 0.037 1.587 0.113
Pronoun Pair (They|HeShe vs HeShe|They + HeShe|SheHe) * Order 0.132 0.094 1.403 0.161
Pronoun Pair (HeShe|They vs HeShe|SheHe) * Order -0.188 0.107 -1.755 0.079
Pronoun Pair (T|HS vs HS|T + HS|SH) * Trend -0.109 0.079 -1.392 0.164
Pronoun Pair (HS|T vs HS|SH) * Trend -0.135 0.089 -1.519 0.129
Order * Trend 0.032 0.073 0.438 0.661
Pronoun Pair (T|HS vs HS|T + HS|SH) * Order * Trend -0.165 0.157 -1.054 0.292
Pronoun Pair (HS|T vs HS|SH) * Order * Trend -0.115 0.176 -0.654 0.513
Random Effects
τ00 Participant 0.193
τ11 AR(1) | Participant 0.268
τ11 Order | Participant 0.118
τ11 Trial Number | Participant 0.030
τ11 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant 0.038
τ11 Pronoun Pair (HS|T vs HS|SH) | Participant 0.065
ρ01 AR(1) | Participant -0.285
ρ01 Order | Participant 0.063
ρ01 Trial Number | Participant 0.463
ρ01 Pronoun Pair (T|HS vs HS|T + HS|SH) | Participant 0.441
ρ01 Pronoun Pair (HS|T vs HS|SH) | Participant 0.272
N Participant 30
Observations 296022
Table A.29: Experiment 4: AR(1) Interactions. Model results for the interactions between AR(1), Pronoun Pair, and Order on the likelihood of looking at the target character (=1) or not (=0).
Experiment 4: Interactions with AR(1)
  Looks to Target
Predictors Log-Odds SE z p
(Intercept) -4.708 0.085 -55.706 <0.001
AR(1) (0, 1) 9.972 0.109 91.872 <0.001
Pronoun Pair: They|HeShe (-.66) vs HeShe|They (+.33) + HeShe|SheHe (+.33) 0.322 0.060 5.378 <0.001
Pronoun Pair: HeShe|They (-.5) vs HeShe|SheHe (+.5) 0.276 0.066 4.170 <0.001
Order: Target Mentioned Second (-.5) vs First (+.5) 0.081 0.083 0.976 0.329
Trend (mean-centered) 0.052 0.037 1.409 0.159
Pronoun Pair (They|HeShe vs HeShe|They + HeShe|SheHe) * Order 0.070 0.120 0.582 0.561
Pronoun Pair (HeShe|They vs HeShe|SheHe) * Order -0.016 0.132 -0.119 0.906
Pronoun Pair (T|HS vs HS|T + HS|SH) * AR(1) -0.111 0.098 -1.131 0.258
Pronoun Pair (HS|T vs HS|SH) * AR(1) -0.002 0.112 -0.014 0.989
AR(1) * Order 0.022 0.093 0.233 0.815
Pronoun Pair (T|HS vs HS|T + HS|SH) * Order * AR(1) 0.179 0.195 0.917 0.359
Pronoun Pair (HS|T vs HS|SH) * Order * AR(1) -0.447 0.223 -2.001 0.045
Random Effects
τ00 Story 0.006
τ00 Participant 0.197
τ11 Trial Number | Story 0.012
τ11 AR(1) | Participant 0.270
τ11 Order | Participant 0.113
τ11 Trial Number | Participant 0.027
ρ01 Trial Number | Story 0.048
ρ01 AR(1) | Participant -0.293
ρ01 Order | Participant 0.088
ρ01 Trial Number | Participant 0.485
N Participant 30
N Story 60
Observations 296022