The Probable Error of a Man

Content notes: trans identity, gender dysphoria, testosterone/HRT, medical systems, fertility, statistical concepts as a heavy-handed metaphor, family dynamics, pregnancy/IUI

Oct 26, 2025

So this Sunday morning, instead of doing my chores or catching up on the countless tasks I have (some for work, some for extracurricular hobbies, some for social maintenance, some for apartment maintenance... oh and I haven’t showered yet), I submitted a contribution to Atlas Obscura for the “William Sealy Gosset Plaque at the Guinness Storehouse.” (Update: the Atlas Obscura listing is up!!)

For those who don’t spend their free time nerding out over statistical history: William Sealy Gosset was the chemist and statistician who developed the t-test while working at Guinness Brewery in Dublin. Because Guinness didn’t want competitors knowing about their statistical methods, he published under the pseudonym “Student”—which is why we call it Student’s t-test to this day.

In 1908, Gosset revolutionized statistics by asking: What do you do when your sample size is too small for the normal curve to apply?

His answer became Student’s t-distribution, a probability distribution that accounts for uncertainty when you’re working with limited data. When n is small (say, n=10 or less), the tails get fatter. The extreme values become more probable than the normal distribution would predict. You need wider confidence intervals to capture the truth.
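For the nerds who want to see the fat tails in numbers, here is a minimal sketch using only Python's standard library. It compares the chance of landing more than 2 units from the center under a t-distribution and under the normal curve. The function names and the cutoff of 2 are illustrative choices, not anything from Gosset's paper, and the integration is a plain Simpson's rule rather than a library routine.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_tail(cutoff: float, df: int, steps: int = 2000) -> float:
    """P(|T| > cutoff), via Simpson's rule on the density over [0, cutoff].

    `steps` must be even for Simpson's rule.
    """
    h = cutoff / steps
    total = t_pdf(0.0, df) + t_pdf(cutoff, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < cutoff)
    return 1 - 2 * central_half

def normal_two_sided_tail(cutoff: float) -> float:
    """P(|Z| > cutoff) for a standard normal variable."""
    return math.erfc(cutoff / math.sqrt(2))

if __name__ == "__main__":
    for df in (2, 9, 30):
        print(f"df={df:>2}: P(|T| > 2) = {t_two_sided_tail(2.0, df):.4f}")
    print(f"normal: P(|Z| > 2) = {normal_two_sided_tail(2.0):.4f}")
```

With 2 degrees of freedom the tail probability is roughly 18%, against about 4.6% for the normal curve, and the gap narrows as the degrees of freedom grow.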

(I think about this constantly. Sample size n=1. Population: me. Variable: gender. Question: What are the confidence intervals around “man” when the only data point is my own embodied experience?)

The Plaque at the Storehouse

In 2018—two years into my PhD program—I went to Dublin, Ireland, for my first international conference, where I was speaking. The 8th World Congress of Biomechanics, July 8-12, 2018. My talk was titled after my first “first-author” paper: “You can tell by the way I use my walk: Predicting the presence of cognitive load with gait measurements.”

During the conference, we toured the Guinness Storehouse. When I saw the William Sealy Gosset plaque, I absolutely geeked out. I took pictures. I got excited in that specific way that only fellow statistics nerds understand. No one else on the tour—a tour full of academics at a biomechanics conference!—seemed to share my enthusiasm.

I was really proud of that paper. I think I’m a better writer now, though.

Sometimes I wish I had done my PhD when I was more aware of who I was.

The Weight of Not Knowing

I was so physically and mentally unwell throughout my doctoral program. The chronic pain was a constant haze. I was still entangled in the politics of my family, terrified that my brother was going to die. I chose all the wrong partners. I was struggling just to keep up, let alone thrive.

That Dublin conference was weird. I was deeply in love with the person who would become my ex-spouse. I was there alone while my family called about yet another incident with my brother. I was utterly depressed and found it so hard to move with my chronic pain... but I forced myself to move anyway. I gave my talk. I toured the city and country as best I could. I drank an absurd amount of hot chocolate.

I also ran into my undergraduate research advisor at the conference. He hugged me.

It was strange because when I worked for him at Cornell, I felt like I wasn’t properly mentored. Like, maybe I wasn’t cut out for this research thing with how bad my health was. I remember really loving that lab—we were doing biomechanics research on bovine and human bone. I helped embed bone specimens for histological processing prior to imaging. During one spring break, I spent the entire week suited up in protective gear, helping clean flesh from human spines with power tools.

AND I LOVED EVERY MINUTE OF IT.

I even helped with rat experiments! But when it came time to write up what I had done... I froze. I froze during lab presentations. I felt like the dumbest undergraduate researcher ever—and I was an engineer at an Ivy League school. I remember having a panic attack before one of my last presentations.

I didn’t know I was AuDHD back then. I didn’t have my formal interstitial cystitis diagnosis yet, let alone my endometriosis. I didn’t know that PTSD was wreaking havoc on my life. I just thought I was a dumb person who got “lucky” even to be there, even though I had worked so incredibly hard to get to Cornell for undergrad.

The Statistical Significance of Self-Doubt

And now... it’s been over four years since I graduated with my PhD.

I’m a statistician and data scientist, but I’m not a good statistician.

Sure, I’ve spent quite a long time in my life doing statistics. I’ve written code. I’ve taken the classes. I’ve worked in staff capacities. I got the PhD in Biomedical Informatics. I’ve developed statistical protocols, built pipelines, and performed so much analysis. I’ve even collaborated on interesting clinical NLP projects. And now I’m a Senior Data Scientist, chest-deep in building out a data science and operational analytics system for my current job.

But to date, I only have 7 papers or so to my name, and not all of them are “first author.”

It’s hard to be a thriving academic when your physical health and your relational life have both been in shambles for the better part of your twenties and thirties.

Here’s what’s frustrating: I’m in a non-academic role now where I’m not expected to publish. I make more money (and money does help when you’re disabled!). I’m more authentic about my gender identity—I’m more me overall. I’m not in toxic relationships anymore. I have more mental and physical health resources.

And I feel like I’m so much more capable of actually doing the work that it would take to be an academic.

I know it’s not too late to go back to academia.

I know it’s not too late, even when authoritarianism in the world is on the rise, and I’m trying to create a baby by myself.

The Sample Size Problem (Revisited)

But before I can talk about whether I’m a “good enough” statistician, I need to talk about what Gosset actually understood that most people miss.

Gosset’s paper opens with a statement about experimental populations: “Any experiment may be regarded as forming an individual of a ‘population’ of experiments which might be performed under the same conditions.”

But what happens when the “same conditions” don’t exist? When you’re the experiment that breaks the experimental design? When your body is simultaneously the hypothesis, the method, the data collection instrument, and the result?

My sample size is catastrophically small. n=1. Just me, just this body, just this lifetime of trying to calculate my own mean.

The Measurement Problem

In February, the week of my last testosterone refill was also the week I got my prenatal vitamins filled (I started them quite early, in preparation). The pharmacy tech had to explain to the other pharmacy techs that I wasn’t pregnant yet and could still be on T.

“This is for gender.”

(This is for the probable error of assuming I was ever a woman. This is for the variance around a mean that was always miscalculated. This is a correction factor for a lifetime of being measured against the wrong distribution.)

When I started testosterone in August 2024 (you can read about that journey here), it was part of a larger transformation toward becoming more fully myself. I stopped taking T around the beginning of February 2025, and that decision was also about honoring what my body and life needed in that moment—specifically, the possibility of pregnancy.

The medical establishment loves its measurements. Testosterone levels: 300-1000 ng/dL for “men,” 15-70 ng/dL for “women.” When I was on T, mine rose pretty quickly into the hundreds—solidly in “male range,” whatever that means when your body refuses to resolve into a single category.

I could see my face changing… I was very slowly becoming less soft, and my angles were becoming more defined. I was slightly annoyed that my skin texture was changing, but every time I did my T injection, I felt… so good.

I’m still genderfluid though… I love fashion and makeup (things that have somehow become associated with gender expression even though they’re social constructs), and I’ve given myself the permission and freedom to express myself and my body any way I’d like. It’s nice to live somewhere where I can still do that (how many places in the world am I not allowed to do that?).

I’m softer again, now that I’ve been off of T for so long… I’ve regulated my hormones to be as fertile as I can be. I’m genuinely so excited for pregnancy, even though I know how painful my body can make situations for me.

Here’s what Gosset understood that the gender binary doesn’t. When you’re working with small samples, the extremes become more probable. When n=1—when you’re the only population you can truly know from the inside—you can’t rely on population parameters. You need a different kind of math entirely.

The t-distribution has fatter tails precisely because it accounts for this uncertainty. It says: With limited information, we must be humble about our certainty. The unusual becomes more likely. The outliers are part of the story.

(I am the fat tail of the gender distribution. I am statistically significant precisely because I shouldn’t exist according to the model.)

The Standard Deviation of Self

Gosset’s paper investigates “the error of random sampling”—how much the mean of a sample deviates from the true population mean. He writes: “If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty.”

Source of uncertainty #1: My embodied experience deviates widely from what was predicted. I was assigned female at birth (AFAB in the clinical notes, like it’s a diagnosis even though it’s an imperfect label, at best). But my body and gender have always operated on a different probability distribution than the one they mapped me onto.

Source of uncertainty #2: The population parameters themselves are suspect. What even IS the “normal distribution” of gender? Who decided? (Spoiler: colonial medicine, rigid binaries, the pathologization of variance, and a whole lot of people who never asked whether their measurement instruments were measuring what they thought they were measuring.)

The standard deviation of my gender identity? Incalculable. Not because it’s chaotic, but because standard deviation assumes we know what we’re measuring deviations from.

The Population Mean (That Never Existed)

Here’s what the medical gender machine tried to tell me:

Population: FEMALES
Mean characteristics: [long list of anatomical and hormonal parameters]
Your deviation from mean: [margin of error too small to matter]
Conclusion: You are a woman.

Here’s what that calculation missed:

The population they assigned me was the wrong population.

Gosset dealt with this exact problem when measuring barley yields. If you sample from Field A but your statistical model assumes Field B, your confidence intervals don’t mean anything. You’re calculating precision around the wrong center point.

I spent almost three decades trying to reduce my variance—to bring my measurements closer to the “female” population mean. Gender performance as outlier management. Constantly adjusting myself to fit tighter confidence intervals around “woman.”

It never worked because I was trying to minimize my deviation from a distribution I was never drawn from in the first place.

The Degrees of Freedom Problem

In Gosset’s t-distribution, degrees of freedom matter intensely. The formula is based on n-1 (sample size minus one). When your sample size is 2, you have 1 degree of freedom. When it’s 10, you have 9. The more degrees of freedom, the more your t-distribution approximates the normal curve.

When your sample size is 1 (n=1), your degrees of freedom equal zero.
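The n-1 is not decoration: the sample standard deviation divides by it, so a single data point leaves nothing to divide by. A small sketch with Python's standard `statistics` module shows this. The helper name `sample_sd_or_none` and the example numbers are made up for illustration.

```python
import statistics

def sample_sd_or_none(values):
    """Return the sample standard deviation, or None when it is undefined."""
    try:
        return statistics.stdev(values)  # divides by n - 1
    except statistics.StatisticsError:
        return None  # n < 2: zero degrees of freedom, nothing to divide by

if __name__ == "__main__":
    print(sample_sd_or_none([4.0]))       # one observation: spread is undefined
    print(sample_sd_or_none([4.0, 6.0]))  # two observations: one degree of freedom
```

With one observation the function returns `None`, because the library refuses to estimate spread from a single point. With two, it returns the square root of 2.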

(This is the existential mathematics of singular embodiment: I have zero degrees of freedom when it comes to being anyone other than myself. I am constrained by the n=1 condition of consciousness. I can’t sample from alternative versions of myself. I can only work with the data I have.)

But the plot twist that saves me from statistical despair is that the very act of choosing to transition adds degrees of freedom.

Before testosterone: n=1, df=0. Just me, stuck in a body that operated according to parameters I didn’t choose, producing data I didn’t consent to.

After testosterone: Still n=1, but the degrees of freedom increased. Because I introduced a new variable (exogenous androgens). Because I rejected the null hypothesis (that my assigned gender was correct). Because I ran a new experiment on myself.

Degrees of freedom aren’t just statistical abstractions. They’re literally about FREEDOM. About having options. About not being locked into a single predetermined outcome.

And now… I’m in the middle of the best experiment of my life (yes, this is what I’ll tell my child… I wonder if they’ll roll their eyes or just feel nice?). I’m so excited to have a child.

The Hypothesis Test I Failed (On Purpose)

The medical establishment’s null hypothesis for my body:

H₀: This patient is female and will remain within normal parameters for female physiology.

My lived experience as an alternative hypothesis:

H₁: This patient’s gender identity does not correspond to assigned sex; intervention is required.

To prove H₁, you’re supposed to provide overwhelming evidence. The standard is ridiculously high—not p<0.05, more like p<0.0001. You have to be SO CERTAIN that you’re trans that there’s virtually no probability you’re wrong.

(This is bad statistics, by the way. Setting your significance threshold so strict that you’re basically requiring people to be impossibly certain about subjective experience before they can access medical care? That’s not science; that’s gatekeeping dressed up in Greek letters.)

I spent years trying to meet that threshold. Trying to gather enough evidence that my “sample mean” (my lived experience of gender) was significantly different from the “population mean” (womanhood) that I could reject H₀ with confidence.

The problem? I was using the wrong test entirely.

Gosset’s insight was that with small samples, you can’t use the z-test (which assumes you know the population standard deviation). You have to use the t-test (which estimates it from your sample).
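The difference between the two tests fits in a few lines. This is a rough sketch, not a full hypothesis test: it computes only the test statistics, and the function names and arguments (`sample`, `mu0`, `sigma`) are illustrative assumptions.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """z-test statistic: the population SD `sigma` is assumed to be known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """One-sample t statistic: the SD is estimated from the sample itself."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample SD, with n - 1 in the denominator
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]  # made-up numbers, purely for illustration
    print(t_statistic(data, mu0=0.0))
    print(z_statistic(data, mu0=0.0, sigma=1.0))
```

The two formulas look identical. The only change is where the standard deviation comes from, and that is why the t statistic has to be judged against the fatter-tailed t-distribution rather than the normal curve.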

Applied to gender: I can’t use other people’s experiences to set my significance threshold. I can only work with my own embodied data. My sample. My t-statistic. My degrees of freedom.

When I finally understood this, everything changed. I stopped trying to prove I was trans “enough” by external standards. I started treating my own experience as the legitimate sample it is.

The Confidence Interval of Manhood

I’m not a man in the way the medical establishment measures manhood.

When I was on T, my testosterone levels were somewhat in “male range,” but my voice has remained high. I grew slightly more facial hair (thanks, PCOS, genetics + those months of T), but I also have huge breasts that I may or may not surgically alter. I have a clitoris that grew—what I lovingly call my t-dick—definitely not a penis in the standard urological sense, but also definitely something that mediates between categories.

(See also: my previous Substack post about the lingam as metaphor. The yoni and lingam aren’t gendered in the Western binary sense anyway. They’re about energetic principles—giving and receiving, expanding and contracting, projecting and containing. Hinduism has always known that gender is more complex than colonial medicine tried to make it.)

So what’s my confidence interval around “man”?

If manhood is a point estimate—a single value on a number line—then I’m nowhere near it.

But if manhood is a confidence interval—a range of probable values, a distribution rather than a single point—then I’m definitely in there somewhere.

Gosset’s tables show that for small samples, a 95% confidence interval is MUCH wider than it would be for large samples. The smaller your n, the wider your confidence interval needs to be to capture the truth.
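To put rough numbers on "MUCH wider," here is a sketch that holds the sample standard deviation fixed and lets only n change. The critical values are standard two-sided 95% values from published t-tables, rounded to three decimals. The fixed standard deviation of 1.0 and the names `T_CRIT_95` and `ci_half_width` are assumptions for illustration.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard published table values, rounded to three decimals.
T_CRIT_95 = {1: 12.706, 2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}
Z_CRIT_95 = 1.960  # the large-sample (normal) limit, for comparison

def ci_half_width(sample_sd: float, n: int) -> float:
    """Half-width of a 95% t-interval for the mean: t* times s / sqrt(n)."""
    return T_CRIT_95[n - 1] * sample_sd / math.sqrt(n)

if __name__ == "__main__":
    s = 1.0  # hypothetical sample SD, held fixed so only n changes
    for n in (2, 3, 5, 10, 30):
        print(f"n={n:>2}: mean ± {ci_half_width(s, n):.3f}")
    print(f"normal multiplier for comparison: {Z_CRIT_95}")
```

The interval narrows for two reasons at once as n grows: the critical value falls toward 1.96, and the standard error shrinks with the square root of n.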

When n=1, my confidence interval for “man” is necessarily wide. Wide enough to include: a mustache, more body definition (however one defines that), and breasts. T-dick and the capacity for pregnancy. “Sir” and “ma’am” on the same day. All of it. All within the interval.

Right Now

Today is October 26, 2025.

I am:

7 days post-IUI (maybe-pregnant, maybe-not)
Several months off testosterone (definitely still masculinized, definitely capable of ovulating)
35 years old (significantly older than optimal fertility suggests, young enough to try)
Trans masculine (confident in this description)
Non-binary genderfluid (also confident in this description)
Possibly going to be a father (experimenting with that word intentionally)
Possibly going to be a mother (sometimes this word fits too)
Exploring other parental terms (I have one that I think is cute)
Definitely going to be a parent if any of this works (the only word that always fits, and well… maybe “definitely” isn’t that statistically accurate to say. Who knows what the future will bring?)

All of these statements are true simultaneously. None of them cancel each other out. They all exist within my confidence interval of self.

The probable error of a man includes: all of this. The variance. The multitudes. The refusal to resolve into a single point.

The journey hasn’t been linear. Nothing about my life has been linear—not my gender, not my health, not my career trajectory, not my path to possibly becoming a parent.

What Gosset Knew (That Gender Medicine Doesn’t)

There’s something profound about standing in the Guinness Storehouse in 2018, looking at William Sealy Gosset’s plaque, that feels emblematic of my entire academic journey—and my gender journey too.

Here was a brilliant statistician who had to publish under a pseudonym (“Student”) because his employer probably wouldn’t allow him to publish under his real name. He had to mask his identity to make his contribution.

There’s something there about trans people and pseudonyms, about having to hide to be heard, about institutions that want your labor but not your visibility.

I have to fight for my visibility in the workplace constantly… and not just that, I have to justify who I am constantly. (I know I chose “Rose” as my chosen English name… but I swear I didn’t choose it because it’s feminine.)

But here’s what Gosset’s work ultimately proved: The individual matters. The small sample matters. You can’t just dismiss n=1 as “not enough data” and move on.

Sometimes n=1 is all you have. Sometimes you ARE the experiment. Sometimes your embodied experience is the only data you’ll ever have access to from the inside.

And that’s legitimate. That’s real. That’s enough to build inference on.

You just need the right distribution.

The t-test is fundamentally about asking: “Is this difference real, or could it have happened by chance?” It’s about quantifying our confidence in the face of variability and small sample sizes. It’s about being rigorous even when—especially when—you don’t have perfect data or perfect conditions.

Gosset titled the paper “The Probable Error of a Mean.” The probable error: that beautiful, honest acknowledgment that we are working with imperfect measurements of an imperfect world.

The Distribution I Choose

I choose the “t-distribution” of gender.

The one with fat tails that make room for bodies like mine. The one that says “with limited information, we should be humble about certainty.” The one that treats n=1 as valid rather than dismissing it as insufficient.

I choose confidence intervals over point estimates. Probability distributions over binary categories. Bayesian updating over dogmatic certainty.

I choose to be my own data. My own experiment. My own population of one, legitimately sampled, honestly measured.

Maybe I’m not a “good” statistician by traditional academic metrics—7 papers, no tenure-track position, needing clarity in the workplace, being an introvert who doesn’t like public speaking, and working in industry instead of academia.

But I know how to honor uncertainty. I know how to work with what I have. I know how to be rigorous even when my body is in pain, even when my gender doesn’t fit into neat categories, even when my life doesn’t follow the expected trajectory.

I know how to keep moving, keep analyzing, keep asking questions—even when I feel like an impostor standing in front of a plaque at a brewery, the only person on the tour who understands what it represents.

The Probable Error

The probable error of this man?

Approximately ±4.303 standard errors (df=2, 95% confidence interval), because I’m working with limited information and I’m okay with that.

The probable error of this man includes:

Growing a human (maybe?? Results in one more week)
Having breasts and facial hair simultaneously (definitely)
Confusing the hell out of medical systems (constantly)
Not having enough publications to be a “good” academic (objectively true)
Being capable of the work anyway (also true)
Teaching my future child that gender is more interesting than anyone admits (hopefully)

All within the confidence interval. All are statistically probable once you use the right distribution.

The Experiment Continues

My IUI results: Unknown. One more week until I can test (I know I’ve repeated myself a lot here, but the anxiety is real). The hypothesis test isn’t complete.

My gender: Unknown in the sense that it refuses stable measurement. Known in the sense that I’m intimately familiar with the probability distribution.

My academic worth: Unknown by publication metrics. Known by the fact that I’m still here, still thinking, still writing, still building analytical systems that matter.

My body: Producing data that breaks the models. Requiring new mathematics. Insisting on its own legitimacy despite the too-small sample size.

Gosset revolutionized statistics by taking small samples seriously. By refusing to dismiss limited information as worthless. By building a framework that made the individual experiment meaningful.

I’m revolutionizing my own life by taking my singular embodiment seriously. By refusing to dismiss my n=1 as insufficient evidence. By building a framework that makes my individual experience meaningful.

The probable error of a man is large when you use the wrong distribution.

The probable error of a man is manageable when you accept uncertainty as epistemologically honest.

The probable error of THIS (genderfluid, feminine, masculine, non-binary, etc.) man includes everything I’ve become and everything I’m still becoming.

Sample size: 1
Degrees of freedom: Finally, after 35 years, more than zero
Confidence level: 95%, which is all anyone can ever honestly claim
Conclusion: Statistically significant. Will continue the experiment. Results pending.

That’s probably good enough.

Selected References

Gosset, W.S. (Student). (1908). “The Probable Error of a Mean.” Biometrika 6(1):1-25. — The foundational paper on small-sample statistics that introduced the t-distribution.
