3  Experiment 3

Effects of Including Pronouns on Nametags and in Introductions on Spoken Production


3.1 Motivation

Encouraging people to include their pronouns when introducing themselves, and providing space for people to indicate their pronouns in display names, email signatures, and nametags, are common recommendations for making environments inclusive of TGD people (Richards & Barker, 2013). Options to specify your pronouns are currently included in many social media platforms such as Instagram and LinkedIn, in institutional platforms such as Brightspace and Slack, and in tools such as Zoom and GitHub. Ideally, group norms of indicating pronouns make disclosure less marked, which supports the individuals who need to state their pronouns explicitly in order to avoid being misgendered. In most contexts, this includes most people who use they/them pronouns.

Some recent research has investigated the effects of pronoun-sharing practices. Both TGD and other LGBQ+ people evaluated a potential workplace more positively when a biography of a staff member included that she used she/her pronouns, when she/her would have been assumed (Johnson et al., 2021). This suggests that other people indicating their pronouns can act as an identity-safety cue when TGD and LGBQ+ people are forming initial appraisals of an environment, so they may be more likely to choose that environment and be more comfortable being out.1 Additionally, directly explaining that a character used they/them supported people’s ability to correctly comprehend they as singular, not plural (Arnold et al., 2021) (see Section 0.4.4).

An unanswered question, however, is whether pronoun-sharing practices affect people’s language production: specifically, whether they reduce the frequency of misgendering and support accurate use of singular they. While creating an environment where TGD people feel safe asking for the correct gendered language and where others understand the usage of singular they are both good outcomes, EDI practices need to affect allies’ concrete behavior (e.g., Kattari et al., 2018), beyond their knowledge and attitudes.

In the first two experiments, I have argued for a model where learning to use they/them pronouns requires retrieving information about a person’s pronouns from episodic memory, instead of inferring pronouns from morphosyntactic features of their name or from an inference about their gender. Experiment 1 demonstrated that when a character’s pronouns cannot be inferred from their name, people can learn to associate pronouns with the character and use that information in language production, and Experiment 2 demonstrated that accurately producing they/them pronouns can be supported by providing participants with information about why paying attention to a person’s pronouns is important. Experiment 3 moves from testing an intervention that may influence how participants pay attention to and attempt to recall information about a person’s pronouns to comparing different ways of presenting information about pronouns. The current experiment investigates two practices for providing explicit information about what pronouns a person uses: stating pronouns when introducing someone, which makes this information highly salient at the beginning of the conversation, and including pronouns on someone’s nametag/display name, which keeps this information accessible throughout the conversation. Ideally, pronoun-sharing practices will support production of singular they. However, pronoun-sharing practices may not consistently impact pronoun production, either at all or to a degree that has real-world applicability. The frequency of seemingly counterintuitive errors like “she uses they/them” shows that speakers may have information about a person’s pronouns available but still produce the incorrect pronoun.

3.2 Methods

The design and analysis plan were preregistered on the Open Science Framework. Sources and attributions for the images are included with the materials, and the edited images are available upon request. The de-identified data and analysis code are available at this dissertation’s GitHub repository.

3.2.1 Participants

A simulation-based power analysis using the simr package in R (Green & MacLeod, 2016) estimated the number of participants required to detect a 2-way interaction between the condition manipulation and pronoun type with the same effect size (β = -1.91, OR = 0.15) as the production task in Experiment 2. This indicated that 156 participants, each completing 30 trials, would have 0.93 [0.91, 0.94] power at α = .05 to detect the interaction.
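The effect size and power interval above can be unpacked with a little arithmetic. The Python sketch below converts the log-odds effect to an odds ratio and computes an exact (Clopper-Pearson) binomial interval for a simulated power estimate. It assumes 930 significant results out of 1000 simulations; 1000 is simr's default, but the number of simulations behind the reported interval is not stated here, and treating that interval as Clopper-Pearson is also an assumption. The helper names are illustrative.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log binomial probability mass, via lgamma to avoid huge intermediate values."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def tail_at_least(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.exp(binom_logpmf(k, n, p)) for k in range(x, n + 1))


def tail_at_most(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.exp(binom_logpmf(k, n, p)) for k in range(0, x + 1))


def bisect_root(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of a monotone f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial confidence interval for successes / n."""
    eps = 1e-12  # keep p strictly inside (0, 1) so the logs are defined
    lower = 0.0 if successes == 0 else bisect_root(
        lambda p: tail_at_least(successes, n, p) - alpha / 2, eps, 1 - eps)
    upper = 1.0 if successes == n else bisect_root(
        lambda p: tail_at_most(successes, n, p) - alpha / 2, eps, 1 - eps)
    return lower, upper


beta = -1.91  # log-odds interaction estimate taken from Experiment 2
print(f"odds ratio = {math.exp(beta):.2f}")  # odds ratio = 0.15

# Hypothetical: 930 of 1000 simulated datasets gave p < .05 for the interaction
lo, hi = clopper_pearson(930, 1000)
print(f"power = {930 / 1000:.2f} [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the interval comes out at roughly 0.91 to 0.95, close to the reported [0.91, 0.94]; a different simulation count would widen or narrow it.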

Participants were recruited from Prolific (Peer et al., 2022) and were required to be over the age of 18, be native or advanced speakers of English, and have a device with a microphone to record audio. The task took approximately 20 minutes. A total of 162 participants are included in the analysis; an additional 11 participants were excluded for stopping the study before completing 25 test trials, having technical errors saving data, or having more than 5 test trial recordings that did not include a response to the task. Participant demographics are described in Table 3.1.

3.2.2 Materials

Characters

Participants were introduced to 3 characters, each of whom was associated with a set of pronouns (1 he/him, 1 she/her, 1 they/them), a name, an image, a pictured brother character, and a pictured sister character. The character images were selected from The Gender Spectrum Collection, a stock photo library created to provide a diverse range of images of transgender and nonbinary people (Drucker, 2019). The sibling images were selected from a free-use stock photo database. Each image was edited to show 1 person from the shoulders up with a white background. Nametags were shown in white text on a black bar along the bottom of the image, resembling display names in the Zoom interface. For characters, the nametags showed the first name and, depending on the condition, their pronouns. For siblings, the nametags showed [character name]’s [brother/sister], in order to limit the number of names participants needed to learn and to elicit possessive pronouns referring to the character.

Across the lists, there were 6 images for the main characters. These were selected through a norming study of 12 images. Participants (N = 30) on Prolific saw each image paired with a sentence completion prompt, which referred to the pictured person as “this person” and did not include a name (e.g., This person made a mug of tea. Before it cooled…). No information about pronouns was included, either in the instructions or in the prompts. Participants wrote a completion to each prompt, in order to measure which pronouns, if any, they chose to refer to the character (Table A.14). Unlike in the first two studies, participants produced they/them frequently (71% of responses). This is potentially due to the prompts not including names, as well as to the different participant population on Prolific compared to MTurk. Based on the results of this norming study, 3 images for which participants did not produce she/her pronouns and 3 images for which participants did not produce he/him pronouns were selected. The goal was to create stimuli where we could expect participants to be choosing primarily between he/him and they/them or between she/her and they/them, rather than among all three, which may involve a different mechanism. This design allows us to test whether including pronouns on nametags and in introductions increases accuracy for they/them, relative to the pronouns that speakers would typically have defaulted to. Many people who use they/them find themselves in this situation, wanting others to stop using he/him or she/her pronouns for them. Finally, the choice between he/him and she/her involves additional social dynamics that are outside the scope of the current study.

Participants were randomly assigned to 1 of 6 lists, in order to counterbalance the images and names associated with the character who uses they/them. Out of the 6 images, 3 appeared twice with he/him and once with they/them across lists, and 3 appeared twice with she/her and once with they/them across lists. There were 6 names, all gender-neutral: Alex, Casey, Jaime, Jordan, Sam, Taylor (Flowers, 2015). While people who use they/them pronouns use a variety of names, this experiment uses gender-neutral names because counterbalancing the gender associations of the names within lists was not feasible. Critically, across lists they/them appears once with each image and once with each name, in order to avoid confounding interpretations about which aspects of a person’s name or appearance may make it easier for someone to learn that they use they/them pronouns.
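The counterbalancing constraints above can be checked mechanically. The Python sketch below builds one hypothetical set of 6 lists that satisfies them: each image and each name appears with they/them exactly once, each of the three images that drew no she/her responses in norming appears twice with he/him, and each of the three that drew no he/him responses appears twice with she/her. The image labels (M1 to M3, F1 to F3) and the cyclic assignment are illustrative assumptions, not the lists actually used.

```python
from collections import Counter

NAMES = ["Alex", "Casey", "Jaime", "Jordan", "Sam", "Taylor"]
# Hypothetical labels: M* images drew no she/her responses in norming,
# F* images drew no he/him responses.
M_IMAGES = ["M1", "M2", "M3"]
F_IMAGES = ["F1", "F2", "F3"]


def build_lists():
    """Return 6 lists, each mapping a pronoun set to a (name, image) pair."""
    lists = []
    for i in range(6):
        j = i % 3
        if i < 3:  # they/them character uses an M image
            they_img, he_img, she_img = M_IMAGES[j], M_IMAGES[(j + 1) % 3], F_IMAGES[j]
        else:  # they/them character uses an F image
            they_img, he_img, she_img = F_IMAGES[j], M_IMAGES[j], F_IMAGES[(j + 1) % 3]
        lists.append({
            "they/them": (NAMES[i], they_img),
            "he/him": (NAMES[(i + 1) % 6], he_img),
            "she/her": (NAMES[(i + 2) % 6], she_img),
        })
    return lists


def satisfies_constraints(lists) -> bool:
    """Check the counterbalancing constraints described in the text."""
    they_imgs = Counter(lst["they/them"][1] for lst in lists)
    they_names = Counter(lst["they/them"][0] for lst in lists)
    he_imgs = Counter(lst["he/him"][1] for lst in lists)
    she_imgs = Counter(lst["she/her"][1] for lst in lists)
    return (
        all(they_imgs[img] == 1 for img in M_IMAGES + F_IMAGES)
        and all(they_names[name] == 1 for name in NAMES)
        and all(he_imgs[img] == 2 for img in M_IMAGES)
        and all(she_imgs[img] == 2 for img in F_IMAGES)
        # no name or image is reused within a single list
        and all(len({name for name, _ in lst.values()}) == 3 for lst in lists)
        and all(len({img for _, img in lst.values()}) == 3 for lst in lists)
    )


for k, lst in enumerate(build_lists(), start=1):
    print(k, lst)
print(satisfies_constraints(build_lists()))  # True
```

A cyclic shift is only one way to meet these constraints; any assignment that passes the check keeps they/them unconfounded with a particular name or image.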

Pronoun Elicitation Task

Following a task established by Pozzan & Antón-Méndez (2017), each trial showed 2 characters in the center of the screen, with their siblings in the 4 corners (Figure 3.1). An animation showed an object moving from a character to one of their siblings, and participants verbally described what happened. This was designed to elicit possessive pronouns about the target character, e.g., Jaime gave the apple to their brother. It also allowed participants to produce subject pronouns referring to the character, e.g., They gave the apple to their brother, or to avoid pronouns entirely, e.g., Jaime gave the apple to Jaime’s brother. Trials manipulated which pronouns the two pictured characters used [Pronoun Pair]: they/them targets with he/him or she/her distractors [They|HeShe], he/him and she/her targets with they/them distractors [HeShe|They], and he/him and she/her targets with she/her or he/him distractors [HeShe|SheHe]. Trials also counterbalanced whether the object was passed to the brother or the sister and the locations of the characters. These trial frames were identical for each of the 6 character lists. Unlike Pozzan & Antón-Méndez (2017), no filler trials were included, as it was not possible to conceal that the study was targeting pronoun production.

Figure 3.1: Experiment 3: Example Trial in the +⁠Nametag condition, which preferentially
elicited “Jaime gave the apple to their brother.”

3.2.3 Procedure

Introductions to Characters

Participants were randomly assigned to 1 of 4 between-participants conditions, manipulating what information about pronouns was given [+⁠Nametag vs –⁠Nametag; +⁠Introduction vs –⁠Introduction], then to 1 of 6 lists within each condition, counterbalancing the images and names of the characters. First, participants read introductions to 3 characters (1 he/him, 1 she/her, 1 they/them) (Figure 3.2). The characters were introduced by name, and in the +⁠Introduction conditions, their pronouns were explicitly stated (e.g., This is Jaime, who uses they/them pronouns). The images associated with each character included their name, resembling a display name in Zoom; in the +⁠Nametag conditions, the images also included the character’s pronouns in parentheses after their name. Each character had a brother and a sister, whose nametags indicated their relationship to the character. In all conditions, the facts about the character and the introductions to their siblings included 3 instances of the character’s pronouns, with the character facts preceding the sibling introductions to reduce the ambiguity of singular they (i.e., so that they would not be read as plural, referring to both the character and the sibling introduced next).

Figure 3.2: Experiment 3: Stimuli and Procedure. [A] Example stimuli for the introductions to the characters, shown for the +⁠Nametag +⁠Introduction and –⁠Nametag –⁠Introduction conditions. [B] Experiment procedure.

Speech Production

Participants saw 1 example trial for each character, which demonstrated the frame [Name] gave the [object] to [their/his/her] [brother/sister] and included another instance of the character’s pronouns. Participants then completed 1 practice trial for each character, to learn the timing of the task. Each scene, with the object moving from beside a character to beside their brother or sister, took a total of 5 seconds. Then the microphone recorded for 8 seconds, with the images remaining on the screen. After each practice trial, participants read feedback in the frame Did you say something like, “[Name] gave the [object] to [their/his/her] [brother/sister]?” At this point in the experiment, participants in the condition where the characters’ pronouns were not directly indicated (–⁠Nametag –⁠Introduction) had observed 5 examples of each character’s pronouns. Participants then completed 30 test trials, which did not include feedback. The order of all trials was randomized. Trials were divided evenly among 3 within-subjects conditions, which varied the pronouns of both the target character and the other pictured character [Pronoun Pair: They|HeShe; HeShe|They; HeShe|SheHe].

Survey

After the speech production task, participants completed a survey measuring their prior beliefs about singular they and TGD identities. First, they judged 6 sentences using singular they: coreferring with generic antecedents (e.g., the ideal barista), quantified antecedents (e.g., each dog owner, every music fan), and proper names (masculine, feminine, and gender-neutral). Items were drawn from Conrod (2019), and participants rated them on a Likert scale with 1 being “very unnatural” and 7 being “very natural.” Second, participants were asked about their prior familiarity with using they/them and with pronoun-sharing practices. They could choose one or more options: use they/them for themself, close to someone who uses they/them, have met someone who uses they/them, have heard about using they/them but have not met anyone who does, and have not heard anything about using they/them. For including pronouns in introductions and in places like nametags or signatures, participants indicated frequency in the groups they were a part of (all, most, some, a few, none) and for themselves (always, usually, sometimes, rarely, never because prefer not to, never because had not heard of it). Third, participants completed Nagoshi et al.’s Transphobia Scale, which measures endorsement of gender essentialism and the gender binary, and discomfort with people who violate these expectations (Nagoshi et al., 2008; see also Tebbe & Moradi, 2012); this is referred to as the Gender Beliefs measure from here forward.

Finally, participants completed demographic questions. They were asked about their age, gender, and sexuality, as prior work indicates that being younger and part of the LGBTQ+ community correlates with higher acceptability ratings for singular they (Camilliere et al., 2021; Conrod, 2019; Hekanaho, 2020; Hernandez, 2020). The question about gender had two steps: a free-response box, then options to indicate whether their gender was the same as or different from the sex indicated on their original birth certificate. This follows recommendations for identifying the broadest set of TGD people: anyone whose gender does not match their sex assigned at birth, not all of whom call themselves transgender. This format also accounts for the fact that terms for gender vary widely, allowing participants to choose the language that best describes them while avoiding reliance on terms that many participants may not be familiar with (Ansara & Hegarty, 2014; Cameron & Stinson, 2019; NASEM, 2022; Zimman, 2017). In addition to or instead of the options for sex assigned at birth, participants could indicate whether they considered themselves cisgender, transgender, or neither. Although these factors do not relate directly to the current research questions, participants were also asked about their race, ethnicity, and education level, in order to characterize the participant sample (Buchanan et al., 2021). All demographic questions included the option to not respond. The experiment was coded and hosted using PCIbex (Zehr & Schwartz, 2018).

3.3 Predictions

Like in the first two experiments, we expect to observe lower accuracy for they/them characters compared to he/him and she/her characters, but recall that the characters in Experiment 3 differ in several ways. First, instead of masculine or feminine names, where two thirds of the characters used the expected he/him or she/her and one third used they/them, all of the names here are gender-neutral. If participants rely on lexical knowledge about the gender associations of the name to select a pronoun, responses would be split between he/him and she/her, instead of being strongly biased toward one or the other. Second, the characters now include images, which provide additional information that participants may use to infer the character’s gender and then to select pronouns. Third, while still brief, the introductions to the characters here contain more repetitions of the character’s pronouns. The introductions in Experiments 1 & 2 stated the characters’ pronouns directly but did not use pronouns to refer to them (e.g., This is Emily, who uses they/them pronouns. Emily…), whereas participants in all four Experiment 3 conditions see each character’s pronouns used three times in the introductions, once in the example trials, and once in the practice trials. Putting aside potential differences between spoken and written production for the moment, this means that we may see higher accuracy for they/them in the –⁠Nametag –⁠Introduction condition than in Experiments 1 & 2, since information about the characters’ pronouns is presented multiple times.

The primary hypotheses concern whether the Nametag and Introduction conditions attenuate the lower accuracy of singular they. Including pronouns in introductions makes the information about pronouns salient at the beginning. If speakers use this information in production—presumably by retrieving it from episodic memory—we would expect to see a smaller penalty for they/them in the two +⁠Introduction conditions. Alternatively, if speakers do not remember the information about the character’s pronouns from the beginning of the experiment, or if this information is not used when selecting the pronoun to produce, we would see no differences between the +⁠Introduction and –⁠Introduction conditions.

Including pronouns on the characters’ nametags keeps the information accessible throughout the experiment, and compared to the Introduction manipulation, does not require speakers to retrieve information from episodic memory. If speakers use the nametag information when selecting the pronoun to produce, we would expect to see a smaller penalty for they/them in the two +⁠Nametag conditions. If speakers do not use the nametag information, instead relying on their lexical knowledge of the name or an inference about the character’s gender based on their appearance, we would see no differences between the +⁠Nametag and –⁠Nametag conditions.

If the introductions and nametags do reduce the relative difficulty of singular they, the combination of the two may be more effective than either one alone, resulting in higher accuracy for the +⁠Nametag +⁠Introduction condition compared to the +⁠Nametag –⁠Introduction and –⁠Nametag +⁠Introduction conditions. This could occur if including pronouns in introductions directs people to pay attention to the nametags, and if the nametags serve as a cue to retrieve a memory of the introduction information.

3.4 Results

3.4.1 Participant Backgrounds

Participants were older than the typical college student sample (range = 19–81, Mdn = 34). Around half the participants were women, and 6 were under the nonbinary umbrella. Six participants said that their gender was different from their sex assigned at birth and/or that they considered themselves transgender, and 38 were LGBQ+ (Table 3.1). These rates are somewhat higher than the U.S. average, but in line with previous data about the Prolific participant population (Douglas et al., 2023). Nearly all participants were at least somewhat familiar with singular they before the experiment: a third had heard about people using they/them pronouns but had not met anyone who does, a third had met but were not close to anyone who uses they/them, and a third were close to someone who uses they/them and/or used they/them themselves (Figure 3.3D). Similarly, most participants were familiar with including pronouns when introducing oneself and on nametags/display names, but did not consider it part of their own or their social circles’ norms. When describing their own habits, about half said they never do either because they prefer not to, about a quarter said they do so rarely or sometimes, and about a tenth said they do so usually or always (Figure 3.3B). When describing what people around them do, about a third were never around people who share pronouns, about half were rarely or sometimes around people who share pronouns, and about a fifth were usually or always around people who share pronouns (Figure 3.3C). Including pronouns on nametags was somewhat more common than including pronouns in introductions, which is unsurprising given that the former can be less marked.
When rating the naturalness of singular they coreferring with different types of antecedents (Figure 3.3A), acceptance with indefinite antecedents was generally high (M = 5.56, SD = 1.48), while acceptance with proper names was more variable (M = 4.58, SD = 1.97) and significantly lower (β = 0.98, t = 4.89, p < .001) (Table A.16). For gender beliefs (Nagoshi et al., 2008), responses on a 1–7 Likert scale were rescaled to the 0–6 range and summed, giving a total range of 0–54, with higher scores indicating stronger endorsement of the gender binary and gender essentialism and thus less favorable attitudes about trans and gender-nonconforming people (Figure 3.3E). Participant totals spanned nearly the entire scale (range = 0–52) but were skewed toward the lower end, with a mean response that was moderately favorable toward trans and gender-nonconforming people (M = 18.02, SD = 14.38; see Table A.17 for item text and means). While this experiment did not include direct measures of political affiliation, other studies show that the Prolific population skews left, with 35% identifying as strong Democrats and only 20% identifying as strong, weak, or independent Republicans (Douglas et al., 2023).

Table 3.1: Experiment 3: Participant Demographics. The trans & gender diverse and sexuality categories have variable totals, as participants could select multiple options. Participant education, English experience, and race/ethnicity are included in the appendix (Table A.15).

Experiment 3: Participant Demographics

Age (total = 162)
  18–24: 31
  25–34: 53
  35–44: 35
  45–54: 19
  55–64: 12
  65–74: 8
  75+: 1
  Prefer not to answer / Missing data: 3

Gender (total = 162)
  Male: 63
  Female: 88
  Nonbinary: 4
  Genderqueer: 1
  Female/nonbinary: 1
  Woman/questioning: 1
  Prefer not to answer / Missing data: 4

Transgender & Gender-Diverse (total selections = 238)
  I consider myself cisgender: 63
  I consider myself transgender: 4
  I don't consider myself cisgender or transgender: 12
  My gender is the same as what was written on my original birth certificate: 150
  My gender is different than what was written on my original birth certificate: 6
  Prefer not to answer / Missing data: 3

Sexuality (total selections = 175)
  Asexual: 6
  Bisexual/Pansexual: 24
  Gay/Lesbian: 6
  Heterosexual/Straight: 120
  Queer: 9
  Questioning: 3
  I use a different term: 2
  Prefer not to answer / Missing data: 5

Total Participants: 162

Figure 3.3: [A] Naturalness ratings (1 = very unnatural, 7 = very natural) for singular they coreferring with indefinites and proper names. [B] Frequency that participants include their pronouns when introducing themselves and in places like nametags. [C] Frequency that the participants’ social circles include pronouns in introductions and on nametags. [D] Experience with using they/them. [E] Gender binary and essentialism beliefs (Nagoshi et al., 2008), with higher scores indicating higher endorsement and thus more negative attitudes about trans and gender-nonconforming people. The black line is the mean response.

3.4.2 Distribution of Pronouns Produced

Trials were automatically transcribed using Whisper (Radford et al., 2022), and the transcripts were then checked and edited to include disfluencies. After excluding trials that were inaudible or did not include a response to the task (1.17% of data), 4803 trials were included in the analysis. Each trial was coded for the pronoun(s) referring to the target character, which occurred in nearly all trials (98.44%). Because subject pronouns were infrequent (1 he, 2 she, 2 they) and did not occur in trials without a corresponding possessive pronoun, the analyses only include possessive pronouns. There were no outliers among the 6 lists varying the name-image-pronoun combinations (Figure A.8).

Figure 3.4A shows the distribution of final pronouns produced by target character and condition. Trials with one pronoun are shown in darker colors; trials with multiple pronouns (e.g., Jaime gave the apple to her bro—to their brother, 1.60%) show the final pronoun in lighter colors. Participants were numerically more likely to not use pronouns for they/them characters (N = 31) than for he/him (N = 22) and she/her characters (N = 22), but the comparison between the mean rates of no-pronoun responses for they/them and he/him + she/her was not statistically significant, t(2747.18) = -1.42, p = .16. Rather, participants who only used names (e.g., Jaime gave the apple to Jaime’s brother) tended to do so for all 3 characters. In trials where participants produced multiple pronouns, self-corrections from his to their (N = 14) and her to their (N = 23) were more common than self-corrections from their to his (N = 5) or her (N = 8), or between his and her (N = 12, N = 15). Unexpectedly, their responses for each participant were typically at floor or near ceiling (Figure 3.4B), with the Nametag and Introduction conditions affecting whether a participant produced their at all, more than affecting accuracy within participants who produced their in some trials.
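The rule of coding a trial by its final pronoun can be made concrete. The Python sketch below extracts possessive pronouns (his, her, their) from a transcript with a regular expression and codes the trial by the last one produced, as in the self-correction example above. It is a simplified illustration, not the coding procedure used for this experiment: it assumes every possessive form in the transcript refers to the target character, which fits the elicited frame but would misfire on a response like gave her the apple. The function name and output fields are hypothetical.

```python
import re

# Possessive pronoun -> the pronoun set it signals
POSSESSIVES = {"his": "he/him", "her": "she/her", "their": "they/them"}
# Word boundaries keep "her" from matching inside words like "brother"
PATTERN = re.compile(r"\b(his|her|their)\b", re.IGNORECASE)


def code_trial(transcript: str, target: str) -> dict:
    """Code one trial by its final possessive pronoun.

    Returns final=None and accurate=None when no pronoun was produced,
    since such trials are excluded from the accuracy analysis.
    """
    found = [m.group(1).lower() for m in PATTERN.finditer(transcript)]
    if not found:
        return {"pronouns": [], "final": None, "accurate": None, "mixed": False}
    final = POSSESSIVES[found[-1]]
    return {
        "pronouns": found,
        "final": final,
        "accurate": final == target,
        "mixed": len(set(found)) > 1,  # produced different pronouns
    }


print(code_trial("Jaime gave the apple to her bro, to their brother", "they/them"))
# {'pronouns': ['her', 'their'], 'final': 'they/them', 'accurate': True, 'mixed': True}
```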

Figure 3.4: Experiment 3: Distribution of Responses. [A] Final pronoun produced. Trials where participants used multiple pronouns to refer to the character (e.g., Jaime gave the apple to her bro—to their brother) are grouped based on the final pronoun and shown in lighter colors. [B] Number of their responses per participant.

3.4.3 Pronoun Accuracy

The primary analysis measured the accuracy of pronouns referring to the target character (Figure 3.5). Trials where participants produced different pronouns (1.60%) were coded based on the final pronoun (e.g., Jaime gave the apple to her bro—to their brother would be coded based on the accuracy of their); trials with no pronouns (1.56%) were excluded from this analysis. Pronoun was manipulated for both the character described (target) and the other character pictured on the screen (distractor). The first Pronoun Pair contrast compared trials with they/them target characters to trials with he/him and she/her target characters (They|HeShe vs HeShe|They + HeShe|SheHe). Within he/him and she/her character trials, the second contrast compared trials with they/them distractors to trials with he/him and she/her distractors (HeShe|They vs HeShe|SheHe). There were no significant effects of the second Pronoun Pair contrast, so for simplicity’s sake, the first Pronoun Pair contrast is referred to as the effect of Pronoun from here on. The fixed effects of Nametag and Introduction (between-participants) were both mean-centered and effects coded (-.5, +.5). The maximal model justified by the experimental design (Baayen et al., 2008; Barr et al., 2013) included by-participant slopes for Pronoun and by-item intercepts. Item was defined as the combinations of names, images, and pronouns that varied across the 6 lists of characters; because these combinations did not fully vary across pronouns, by-item random slopes were not included. The final model (Table 3.2) included all interactions between fixed effects, plus by-item and by-participant intercepts (Bates et al., 2015; R Core Team, 2023; Voeten, 2023).
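The coding scheme above can be written out explicitly. In this Python sketch, each Pronoun Pair condition gets two centered, orthogonal contrast codes, Nametag and Introduction are effects coded as -.5/+.5, and interaction columns are products of the main-effect codes. Table 3.2 prints the first contrast as -.66/+.33, so using exact thirds is an assumption here; the function name is illustrative.

```python
from fractions import Fraction as Fr

# (contrast 1, contrast 2) for each Pronoun Pair condition, target|distractor.
# Contrast 1: they/them targets vs he/him + she/her targets.
# Contrast 2: within he/him + she/her targets, they/them vs he/him or she/her distractors.
PRONOUN_PAIR = {
    "They|HeShe": (Fr(-2, 3), Fr(0)),
    "HeShe|They": (Fr(1, 3), Fr(-1, 2)),
    "HeShe|SheHe": (Fr(1, 3), Fr(1, 2)),
}
EFFECT = {False: Fr(-1, 2), True: Fr(1, 2)}  # absent (-) vs present (+)


def design_row(pair: str, nametag: bool, introduction: bool) -> dict:
    """Fixed-effect predictors for one trial; interactions are products of codes."""
    c1, c2 = PRONOUN_PAIR[pair]
    n, i = EFFECT[nametag], EFFECT[introduction]
    return {
        "P1": c1, "P2": c2, "N": n, "I": i,
        "P1:N": c1 * n, "P2:N": c2 * n,
        "P1:I": c1 * i, "P2:I": c2 * i,
        "N:I": n * i,
        "P1:N:I": c1 * n * i, "P2:N:I": c2 * n * i,
    }


codes = list(PRONOUN_PAIR.values())
# With equal trials per condition, both contrasts sum to zero and are orthogonal.
print(sum(c1 for c1, _ in codes), sum(c2 for _, c2 in codes),
      sum(c1 * c2 for c1, c2 in codes))  # 0 0 0
print(design_row("They|HeShe", nametag=False, introduction=True)["P1:N:I"])  # 1/6
```

Because the two contrast-1 codes differ by exactly 1, the Pronoun coefficient can be read as the log-odds difference between he/him + she/her targets and they/them targets.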

Figure 3.5: Experiment 3: Production Accuracy, split by Nametag and Introduction conditions. Trials using no pronouns are excluded. By-participant means are shown as points; error bars indicate 95% CIs calculated over the by-participant means.

Across all conditions, participants were more likely to produce the correct pronoun than not (β = 13.16, z = 12.24, p < .001). Participants were more accurate for he/him and she/her characters (M = 0.98 across Nametag and Introduction conditions) than for they/them characters (M = 0.88) (β = 5.05, z = 5.09, p < .001). Within he/him and she/her trials, there was no significant difference between trials where the other pictured character used he/him or she/her and trials where the other character used they/them (β = -0.76, z = -1.61, p = .11). The main effects of Nametag and Introduction were not significant (β = 1.71, z = 1.54, p = .12; β = -0.80, z = -0.72, p = .47). There were significant two-way interactions between Pronoun and Introduction (β = -2.58, z = -3.36, p < .001) and between Pronoun and Nametag (β = 2.32, z = 2.91, p < .01), and both were qualified by a significant three-way interaction between Pronoun, Nametag, and Introduction (β = 6.24, z = 3.91, p < .001). Post-hoc comparisons showed that Introduction attenuated the difference in accuracy between they/them and he/him + she/her in the –⁠Nametag conditions (β = -5.70, z = -6.92, p < .001), but not in the +⁠Nametag conditions (β = 0.54, z = 0.41, p = .68). Figure 3.6 shows the means for each condition: accuracy for he/him and she/her characters was near ceiling in all conditions, and accuracy for they/them characters was highest in the –⁠Nametag +⁠Introduction condition (M = 0.95, SD = 0.21), slightly lower in the +⁠Nametag +⁠Introduction (M = 0.91, SD = 0.28) and +⁠Nametag –⁠Introduction (M = 0.91, SD = 0.29) conditions, and lowest in the –⁠Nametag –⁠Introduction condition (M = 0.73, SD = 0.44).
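The post-hoc point estimates follow from the fixed effects in Table 3.2. Because Nametag is coded -.5/+.5, the Pronoun × Introduction interaction evaluated at one Nametag level is the two-way coefficient plus the three-way coefficient times that Nametag code. The Python sketch below does this arithmetic and also computes the model-implied Pronoun contrast (log-odds, random effects at zero) in each of the four cells. It reproduces the point estimates only up to rounding; the post-hoc z values would need the coefficient covariance matrix, which is not reported here. The function names are illustrative.

```python
# Fixed effects from Table 3.2 (log-odds). P = first Pronoun Pair contrast,
# N = Nametag, I = Introduction; N and I are effects coded as -.5/+.5.
B = {"P": 5.047, "P:N": 2.324, "P:I": -2.575, "P:N:I": 6.240}
LEVELS = {"-": -0.5, "+": 0.5}


def pronoun_by_intro(nametag: float) -> float:
    """Simple Pronoun x Introduction interaction at one Nametag code."""
    return B["P:I"] + B["P:N:I"] * nametag


def pronoun_gap(nametag: float, intro: float) -> float:
    """Model-implied (he/him + she/her) minus they/them difference in log-odds for one cell."""
    return (B["P"] + B["P:N"] * nametag + B["P:I"] * intro
            + B["P:N:I"] * nametag * intro)


for label, n in LEVELS.items():
    print(f"{label}Nametag: Introduction effect on gap = {pronoun_by_intro(n):.3f}")
# -Nametag: -5.695, +Nametag: 0.545

for n_label, n in LEVELS.items():
    for i_label, i in LEVELS.items():
        print(f"{n_label}Nametag {i_label}Introduction: gap = {pronoun_gap(n, i):.3f}")
```

The smallest model-implied gap falls in the -Nametag +Introduction cell, matching the condition with the highest they/them accuracy.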

Figure 3.6: Experiment 3: Condition Means. Means and 95% CIs of accuracy for he/him + she/her characters and they/them characters, split by Nametag and Introduction conditions.
Table 3.2: Experiment 3: Production Accuracy. Model results for the effects of Pronoun Pair, Nametag, and Introduction on Pronoun Accuracy (=1), with trials that did not include a pronoun referring to the target character excluded, and trials that contained different pronouns coded based on the final one.
Experiment 3: Production Accuracy
Predictor                                                Log-Odds     SE        z       p
(Intercept)                                                13.162  1.075   12.239  <0.001
Pronoun Pair: T|HS (-.66) vs HS|T (+.33) + HS|SH (+.33)     5.047  0.991    5.093  <0.001
Pronoun Pair: HS|T (-.5) vs HS|SH (+.5)                    -0.763  0.476   -1.605   0.108
Nametag (-.5; +.5)                                          1.706  1.109    1.538   0.124
Introduction (-.5; +.5)                                    -0.797  1.109   -0.718   0.473
Pronoun (T|HS vs HS|T + HS|SH) * Nametag                    2.324  0.800    2.907   0.004
Pronoun (HS|T vs HS|SH) * Nametag                          -1.045  0.951   -1.099   0.272
Pronoun (T|HS vs HS|T + HS|SH) * Introduction              -2.575  0.767   -3.357   0.001
Pronoun (HS|T vs HS|SH) * Introduction                     -0.294  0.951   -0.309   0.757
Nametag * Introduction                                      1.296  2.270    0.571   0.568
Pronoun (T|HS vs HS|T + HS|SH) * Nametag * Introduction     6.240  1.597    3.906  <0.001
Pronoun (HS|T vs HS|SH) * Nametag * Introduction            0.324  1.903    0.170   0.865
Random Effects
τ00 Participant                                            71.775
τ00 Character                                               2.841
N Participant                                                 161
N Character                                                    18
Observations                                                 4729

3.4.4 Exploratory Analyses

To estimate internal reliability, I used the Bayesian mixed-effects model approach described in Staub (2021). The trials were split in half, so that each half of the data included 5 he/him, 5 she/her, and 5 they/them characters for each participant. Pronoun was coded as 2 separate variables: the first comparing they/them (-.66) to he/him (+.33) and she/her (+.33) in even trials, with odd trials coded as 0, and the second comparing they/them to he/him + she/her in odd trials, with even trials coded as 0. The brms package in R (Bürkner, 2017) fit a model with the odd and even trial Pronoun variables as fixed effects predicting accuracy and as by-participant random slopes. The model kept the default priors and was fit using 4 chains, each with 4000 iterations, of which 2000 were warm-up. The random slope estimates represent the relative accuracy of they/them compared to he/him + she/her for each participant, and these estimates were strongly correlated between halves of the data, r = 0.97 [0.90, 1.00]. This matches the distribution of results for they/them characters (Figure 3.4B), where participants tended to produce singular they in all or nearly all trials, or in none.

After confirming that the task showed high internal reliability, I conducted exploratory analyses with participant covariates that have previously been shown to correlate with acceptability ratings for singular they. For the sentence naturalness ratings (1–7, with 7 as “very natural”), I calculated each participant’s mean rating for the generic, each, and every sentences [Indefinite Ratings] and for the 3 proper name sentences [Name Ratings]. For experience using they/them pronouns [Familiarity], participants were split into 3 similar-sized groups: those who had not heard about it before the study, or had heard about it but had not met anyone who uses they/them (N = 55); those who had met someone who uses they/them but were not close to them (N = 53); and those who used they/them pronouns themself and/or were close to someone who uses they/them (N = 51). For familiarity with pronoun-sharing practices [Sharing], “none” and “never, because I had not heard about this” responses were coded as 0; “never, because I prefer not to” responses were coded as 1; “rarely” and “a few” responses were coded as 2; “sometimes” and “some” responses were coded as 3; “usually” and “most” responses were coded as 4; and “always” and “all” responses were coded as 5. These 4 questions were summed to create 1 composite score, with 0 indicating the lowest familiarity and 20 the highest (M = 7.24, SD = 4.85). Responses for the gender binary and gender essentialism beliefs measure [Gender Beliefs] were rescaled from 1–7 to 0–6 and summed, with higher scores indicating stronger endorsement. Sexuality was coded as 1 for participants who said they were asexual, bisexual/pansexual, gay/lesbian, and/or queer (N = 38) and 0 otherwise (N = 121). Because only 6 participants said they were transgender and/or that their gender is different from their sex assigned at birth, analyzing gender identity as a separate factor is difficult. Instead, I treated this variable as a single LGBTQ+ Identity variable; all 6 TGD participants were also LGBQ+ and were therefore already coded as 1.
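The covariate scoring rules above can be expressed as small functions. This is a sketch under stated assumptions: the response strings are copied from the text and may not match the survey’s exact option labels, the function names are hypothetical, and the number of Gender Beliefs items is left open because it is not given here.

```python
# Response-to-score mapping for the four pronoun-sharing questions, as described above.
SHARING_SCORES = {
    "none": 0,
    "never, because I had not heard about this": 0,
    "never, because I prefer not to": 1,
    "rarely": 2, "a few": 2,
    "sometimes": 3, "some": 3,
    "usually": 4, "most": 4,
    "always": 5, "all": 5,
}

# Sexuality labels coded 1 for LGBTQ+ Identity.
LGBQ_LABELS = {"asexual", "bisexual/pansexual", "gay/lesbian", "queer"}


def sharing_score(responses):
    """Sum the 0-5 codes for the four pronoun-sharing questions (range 0-20)."""
    if len(responses) != 4:
        raise ValueError("expected responses to exactly 4 sharing questions")
    return sum(SHARING_SCORES[r] for r in responses)


def gender_beliefs_score(ratings):
    """Rescale each 1-7 rating to 0-6 and sum; higher = stronger endorsement.
    The number of items is not fixed here because it is not stated in the text."""
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must be on the 1-7 scale")
    return sum(r - 1 for r in ratings)


def lgbtq_identity(sexuality_labels):
    """1 if any selected sexuality label is in the LGBQ+ set, else 0."""
    return int(bool(LGBQ_LABELS & set(sexuality_labels)))


if __name__ == "__main__":
    print(sharing_score(["none", "rarely", "some", "always"]))
    print(gender_beliefs_score([1, 7, 4]))
    print(lgbtq_identity(["queer", "bisexual/pansexual"]))
```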

The strongest correlation between participant covariates (Figure 3.7) was between familiarity with pronoun-sharing practices and familiarity with using they/them pronouns (r = 0.62, p < .001), which is unsurprising given that people who use they/them nearly always have to state their pronouns explicitly in order not to be misgendered. The brief naturalness ratings questionnaire largely replicated prior results correlating gender beliefs, familiarity, LGBTQ+ identity, and age with judgments about singular they (Camilliere et al., 2021; Conrod, 2019; Hekanaho, 2020; Hernandez, 2020; Minkin & Brown, 2021; Nichols et al., 2019; Parker et al., 2019). The second-strongest correlation was between naturalness ratings and gender beliefs: participants who more strongly endorsed the gender binary and gender essentialism rated singular they coreferring with proper names as less natural (r = -0.52, p < .001) (see in particular Hernandez, 2020). LGBTQ+ participants (r = 0.30, p < .001) and participants more familiar with using they/them (r = 0.33, p < .001) and with pronoun-sharing practices (r = 0.38, p < .001) rated they coreferring with proper names as more natural, while older participants rated it as less natural (r = -0.22, p < .01). However, ratings for indefinite singular they (generic, each, every) were not significantly correlated with any other participant covariate, including ratings for they coreferring with proper names (r = 0.07, p = .40).

Figure 3.7: Experiment 3: Correlations Between Participant Covariates. Age, LGBTQ+ identity, familiarity with using they/them pronouns and pronoun-sharing practices, naturalness ratings for they coreferring with indefinite referents and with proper names.

I then tested whether adding these participant covariates to the hypothesis-testing model improved model fit. Age, Familiarity, Gender Beliefs, Name Ratings, and Sharing were mean-centered, and LGBTQ+ Identity was effects-coded and mean-centered. The distributions of the rescaled variables are shown in Figure A.9. I used the buildmer package in R (Bates et al., 2015; R Core Team, 2023; Voeten, 2023) to identify the most complex converging model, allowing all interactions between fixed effects. The package then performed backward stepwise elimination to remove participant covariate terms that did not significantly contribute to model fit, while retaining all of the fixed and random effects from the hypothesis-testing model. The final model included Gender Beliefs, Familiarity, and a subset of their two- and three-way interactions (Table A.18). No effects of Familiarity were significant after Bonferroni correction for multiple comparisons.
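Two of the preprocessing steps above, mean-centering and centered effects coding, are sketched below together with the Bonferroni correction. This does not reproduce buildmer’s model search or stepwise elimination. The function names are hypothetical, and "effects-coded and mean-centered" is taken to mean coding the 0/1 LGBTQ+ indicator as -.5/+.5 and then subtracting its sample mean, which is one reading of the description above. The number of comparisons used for the Bonferroni correction is not stated here, so the example p-values are illustrative only.

```python
def mean_center(values):
    """Subtract the sample mean so the variable averages to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def centered_effects_code(flags):
    """Code a 0/1 indicator as -0.5/+0.5, then mean-center it.
    With unequal group sizes the centered codes are no longer symmetric."""
    return mean_center([f - 0.5 for f in flags])


def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: multiply each p by the number of tests (capped
    at 1) and flag which adjusted values fall below alpha."""
    k = len(p_values)
    adjusted = [min(1.0, p * k) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


if __name__ == "__main__":
    # Group sizes for LGBTQ+ Identity as reported in the text (38 coded 1, 121 coded 0).
    codes = centered_effects_code([1] * 38 + [0] * 121)
    print(round(codes[0], 2), round(codes[-1], 2))
    # Illustrative p-values only; these are not values from the model.
    print(bonferroni([0.001, 0.004, 0.03]))
```

With groups of 38 and 121, the centered codes come out at roughly +0.76 and -0.24 rather than ±0.5, so the LGBTQ+ main effect is still a difference between groups, while the other terms are evaluated at the sample’s average value of the indicator.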

Participants who more strongly endorsed the gender binary and gender essentialism were less accurate overall (β = -10.07, z = -3.44, p < .001) and showed a larger relative difference in accuracy between they/them and he/him + she/her (β = 6.50, z = 3.53, p < .001) (Figure 3.8). The interaction between Pronoun, Introduction, and Gender Beliefs was marginally significant after correction for multiple comparisons (β = 13.32, z = 3.11, p < .01), such that Gender Beliefs had a larger effect on the relative accuracy of they/them in the –⁠Introduction conditions than in the +⁠Introduction conditions.

Figure 3.8: Experiment 3: Accuracy by Gender Beliefs. By-participant mean accuracy for they/them characters, predicted by endorsement of the gender binary and gender essentialism. Points are by-participant means; the line is a GLM fit over the raw data.

3.5 Discussion

In Experiment 3, participants learned about three characters, each associated with a set of pronouns (1 he/him, 1 she/her, 1 they/them), a gender-neutral name, an image, and two sibling images. In all conditions, participants saw a total of five examples of each character’s pronouns in use before beginning the test trials. Additional information about the characters’ pronouns varied by two factors: the Introduction conditions manipulated whether the introductions to the characters explicitly stated who uses __ pronouns, and the Nametag conditions manipulated whether the character images included the character’s pronouns alongside their name. In each trial, participants saw two characters in the center, with their four siblings in the corners. An object moved from a character to a sibling, prompting spoken descriptions in the frame “Jaime gave the apple to their brother” (Pozzan & Antón-Méndez, 2017). This structure encouraged, but did not require, participants to produce a possessive pronoun.

Baseline accuracy for they/them characters was high compared to the first two experiments: participants in the –⁠Nametag –⁠Introduction condition correctly produced singular they in about three quarters of trials. Both the nametag and introduction manipulations facilitated singular they, with accuracy rising to over 90% in conditions with either or both. Unexpectedly, accuracy for they/them characters was highest in the –⁠Nametag +⁠Introduction condition and slightly lower in the +⁠Nametag +⁠Introduction and +⁠Nametag –⁠Introduction conditions. Generally, the Nametag and Introduction conditions tended to affect whether participants used singular they at all, with most participants producing singular they in all or nearly all trials, or in none.

The original goal of this experiment was to investigate how introductions and nametags might reduce the number of errors speakers make, potentially mirroring the real-life situation where well-intentioned people get they/them pronouns right when they are paying attention but frequently default back to he/him or she/her when the demands of the conversation direct their attention elsewhere. In practice, the speech production task proved relatively easy: participants did not have to remember names for the characters and their siblings, the trial pacing did not require them to rush, and the objects were typically easy to name. While it would not have been possible to conceal that the experiment was about pronouns, producing pronouns was likely the most difficult aspect of the task, and participants were likely able to focus their attention on it. From this perspective, the task was too easy for its original goal.

However, the all-or-nothing distribution of responses means that the task showed high internal reliability, warranting individual differences analyses. Participants were recruited from Prolific and varied widely in their experience using they/them pronouns, naturalness ratings for various forms of singular they (Conrod, 2019), experience with pronoun-sharing practices, and beliefs about the gender binary and gender essentialism (Nagoshi et al., 2008). This experiment replicated the expected relationships between these participant characteristics and acceptability judgments for they coreferring with proper names: age and endorsement of the gender binary were negatively correlated with naturalness ratings, while LGBTQ+ identity, experience with they/them, and familiarity with pronoun-sharing were positively correlated with them (Camilliere et al., 2021; Conrod, 2019; Hekanaho, 2020; Hernandez, 2020; Minkin & Brown, 2021; Parker et al., 2019). When I tested whether adding participant covariates to the hypothesis-testing model improved fit, however, acceptability ratings did not predict production accuracy. Instead, gender beliefs were the strongest predictor of accuracy for they/them characters: participants who more strongly endorsed the gender binary and gender essentialism, and expressed more discomfort with gender non-conforming people, were less likely to use singular they.

Collecting speech production data online came with both advantages and drawbacks. Compared to in-lab experiments, recruiting participants on Prolific yielded a sample more diverse in age, language experience, and sociopolitical beliefs (Douglas et al., 2023), and made it possible to collect enough data for the analyses to be appropriately powered. Because participants never interacted directly with an experimenter, social desirability pressures may also have been different. While it was clear that the experiment expected them to use they/them pronouns, speakers may feel differently about refusing or failing to do so when an addressee is present. How the social context affects pronoun production, whether the addressee is another participant completing the same task or a researcher whom participants may infer is LGBTQ+ and invested in the outcome of the experiment, is an area for future research.


  1. However, see McGonagill (2023), discussed in Section 0.7, for data showing that nonbinary people felt that being out would negatively affect their job search, and that this concern was borne out in experiments measuring resume response rates and in hiring manager surveys: when an otherwise-identical resume included they/them pronouns, the applicant received less interest and was rated as less qualified.