standard normal distribution question

**JeenLeen** · 2018-02-25, 04:50 PM (ISO 8601)

In generating standard normal (mean 0, standard deviation 1) variables, we find most values come between -3 and 3.

My question is: is there a chance (which for all practical purposes is 0%, but is technically higher than that) that we could get an absurdly high or low value, like 3000 or -3000? The probability is probably smaller than one over the number of atoms in the universe, and maybe no computer holds enough decimal places to give me the probability, but it's a number that at least in theory could be computed, right?

I'm mainly asking since I have a school problem where I'm asked to find the smallest possible value that could fall into a random standard normal distribution. I think that, technically, the smallest number is essentially negative infinity, but for practical purposes (which is what really matters here) doing something like a confidence interval of what falls between -3 and 3 would work.

**Kato** · 2018-02-25, 05:15 PM (ISO 8601)

Originally Posted by JeenLeen

In generating standard normal (mean 0, standard deviation 1) variables, we find most values come between -3 and 3.

My question is: is there a chance (which for all practical purposes is 0%, but is technically higher than that) that we could get an absurdly high or low value, like 3000 or -3000? The probability is probably smaller than one over the number of atoms in the universe, and maybe no computer holds enough decimal places to give me the probability, but it's a number that at least in theory could be computed, right?

I'm mainly asking since I have a school problem where I'm asked to find the smallest possible value that could fall into a random standard normal distribution. I think that, technically, the smallest number is essentially negative infinity, but for practical purposes (which is what really matters here) doing something like a confidence interval of what falls between -3 and 3 would work.

Well... what exactly are we talking about here? If it is purely hypothetical: Yeah, I guess minus infinity would be the answer. If we talk about some application, the likely answer is dependent on the application... If it's an IT question, it depends on the programming of your RNG and what it will consider a valid probability.

**jayem** · 2018-02-25, 06:08 PM (ISO 8601)

That does seem odd. Also, for any range above 1 S.D most points are inside but not all, so 3 (99.7% IIRC) is a bit arbitrary; 6 sigma also has precedent.
Is it a true normal? But otherwise if the outcome is physically possible it's possible, although Excel will be unable to store the value at around 40S.D*

* it can store the value of the distribution at 38 S.D, which is less than the probability of 37-38 (obviously, and hence of 37-infinity).
It can't store the value at 39, and as the density should fall by more than a factor of two with each unit step, the total area beyond 38 has to be much less than N(38) + 0.5*N(38) + 0.25*N(38) + ... = 2*N(38).

With that logic, I reckon P(>3000) should be between N(2999) and N(3001), i.e. roughly (1/3)*exp(-9,000,000/2), so a number with somewhere around two million zeros after the decimal point?
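For anyone who wants to check that estimate, here is a rough sketch in Python. It assumes the standard large-x approximation P(Z > x) ≈ φ(x)/x, where φ is the standard normal density, and works on a log10 scale because the number itself is far too small for a double. The function names are made up for illustration. The second function shows the underflow issue: in ordinary double precision the density is still representable at 37 S.D but comes out as exactly 0 at 39.

```python
import math


def log10_upper_tail(x: float) -> float:
    """log10 of the large-x estimate phi(x)/x for P(Z > x), Z standard normal."""
    # log10(exp(-x^2/2)) minus log10(x * sqrt(2*pi))
    return -(x * x / 2) / math.log(10) - math.log10(x * math.sqrt(2 * math.pi))


def density(x: float) -> float:
    """Standard normal density evaluated directly in double precision."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


print(log10_upper_tail(3000))  # roughly -1.95 million
print(density(37))             # tiny but still positive
print(density(39))             # underflows to 0.0
```

So the probability has a little under two million zeros after the decimal point, which matches the back-of-envelope figure.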

**JeenLeen** · 2018-02-25, 07:58 PM (ISO 8601)

Context is theoretical.

The full context is this is a problem related to item response theory, that is, statistics based on test items, like for the SAT or ACT. I'm being asked what the minimum ability level someone could have to get a certain number of items right. I'm thinking the answer would technically be -infinity, although the real answer is, well, more realistic.

Thanks for your answers. My main concern was a worry about whether the standard normal distribution actually did have a lower or upper cut-off point. I was pretty sure it didn't, but sometimes there are details that are so obscure that they are left out of discussions.

**Jay R** · 2018-02-25, 10:48 PM (ISO 8601)

Yes, the distribution is asymptotic. There is no lower or upper bound.

**Kato** · 2018-02-26, 02:24 AM (ISO 8601)

Originally Posted by JeenLeen

Context is theoretical.

The full context is this is a problem related to item response theory, that is, statistics based on test items, like for the SAT or ACT. I'm being asked what the minimum ability level someone could have to get a certain number of items right. I'm thinking the answer would technically be -infinity, although the real answer is, well, more realistic.

Thanks for your answers. My main concern was a worry about whether the standard normal distribution actually did have a lower or upper cut-off point. I was pretty sure it didn't, but sometimes there are details that are so obscure that they are left out of discussions.

Can you get a negative score on these tests?

sorry, I'm not American.

**jayem** · 2018-02-26, 02:51 AM (ISO 8601)

Originally Posted by Kato

Can you get a negative score on these tests?

sorry, I'm not American.

I think the question is the other way round. Given a fixed score, what % of people get over it? For which I don't think the cut-off makes too much difference (unless you actively ask about 0), if you can assume that it 'would be normal'. Although obviously it shows that something isn't normal there.

You could explicitly give confidence levels, perhaps? "We're 95% confident that ...", though that runs the risk of the examiner thinking you are unsure, and runs the risk of doubling down on what could be a misunderstanding.

**Kato** · 2018-02-26, 04:46 AM (ISO 8601)

Originally Posted by jayem

I think the question is the other way round. Given a fixed score, what % of people get over it? For which I don't think the cut-off makes too much difference (unless you actively ask about 0), if you can assume that it 'would be normal'. Although obviously it shows that something isn't normal there.

Well, then there is still the (real life) limitation that zero is the lower limit since no less than zero (percent) of people can get a certain score. While normal distribution is handy as a tool, it's always a good idea to consider what you use it for, since very rarely the lower / upper ends of the distribution make sense in reality.

**JeenLeen** · 2018-02-26, 09:47 AM (ISO 8601)

Originally Posted by Kato

Can you get a negative score on these tests?

sorry, I'm not American.

Those tests (and most... all?... tests--at least all I've heard of) have an upper and lower bound. BUT that's due to practical limitations of estimating ability.
In theory (or, well, one school of thought--there's a couple in the theoretical realms), a person has a latent trait called ability in X (say, math ability). You can make a test to try to assess their math ability, but you can never really capture what it truly is due to measurement error and stuff like the person being lucky at guessing or having an off day.

In practice, we have answers to a test and try to use statistics about the test items to estimate the person's ability, to give a fair score based on observed score.
My question was going at it from the other angle.

I actually re-read the question, and found out I was supposed to look for a 90% confidence interval, not some "smallest possible limit", so that's bad on me for not reading carefully. Though it was nice for me to have to think about the normal distribution in this way, and now I understand it better.
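As a rough illustration of what that looks like, here is a small Python sketch. It assumes the interval wanted is the central 90% of a standard normal (5% cut off each tail), which may not match the exact wording of the assignment. It also checks how much probability sits between -3 and 3.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Central 90% interval: 5% of the probability lies below `lower`, 5% above `upper`
lower, upper = z.inv_cdf(0.05), z.inv_cdf(0.95)

# Share of the distribution that falls between -3 and 3
coverage_3sd = z.cdf(3) - z.cdf(-3)

print(lower, upper)   # about -1.645 and 1.645
print(coverage_3sd)   # about 0.9973
```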

**warty goblin** · 2018-02-26, 10:27 AM (ISO 8601)

It's important to realize that nothing in reality is actually exactly normally distributed*. Many things are well approximated by a normal distribution, and the mathematical pliability of the normal makes it a very attractive choice. But nothing is actually normal.

So for instance we model human heights as normally distributed, but you never find people with a height of 1 inch, let alone negative values. In practice this isn't a problem, because given the mean and variance in heights, the amount of mass that a normal distribution puts below zero is entirely negligible.
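To put a number on "entirely negligible", here is a quick Python sketch. The mean of 170 cm and standard deviation of 10 cm are made-up round figures for illustration, not real survey values, and the helper name is hypothetical. It uses `math.erfc` so the tiny tail probability is not lost to rounding.

```python
import math


def normal_lower_tail(x: float, mean: float, sd: float) -> float:
    """P(X < x) for X ~ Normal(mean, sd), computed via erfc to keep small tails accurate."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Hypothetical height model: mean 170 cm, standard deviation 10 cm (assumed values)
p_below_zero = normal_lower_tail(0.0, 170.0, 10.0)
print(p_below_zero)  # on the order of 1e-65
```

With those assumed numbers, zero is 17 standard deviations below the mean, so the model puts essentially no probability on negative heights.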

For the case of a latent variable like ability, you need to ask yourself whether or not it's believable that you can get people of arbitrarily high or low ability. Myself, I'm pretty sure human capability is bounded, though I suppose the difference between that and having astronomically small probabilities of people of really out there ability levels is rather philosophical. Exams can be a case where the bounded nature of the possible scores presents a problem for a normal approximation though. People write perfect or near perfect scores in the GRE rather more than one would expect for instance, because the exam's got nothing after high school algebra on it so it's a cakewalk if you did a math heavy undergrad.

*A consequence of this is that tests to determine if your data is normally distributed are, in practice, tests of how big your sample size is. Get enough data and you'll conclude your data isn't normal.

**wumpus** · 2018-02-26, 10:41 AM (ISO 8601)

According to the ever-reliable wiki, the odds of 3000 or -3000 should be roughly 1/(3000)², or 1 in 9 million.

As noted above, normal distributions typically model some other distribution, but the Gaussian is close enough. Often weather records are forced into normal distributions when they are obviously fractal, and fractal distributions are famous for having much higher probabilities in the "long tails" you are looking at. To achieve 7 digits of accuracy, not only would your distribution have to rely on things that are *absolutely* producing a normal distribution, but absolutely nothing else could be interfering with your system. In general, this only happens in very precise physics experiments after some very smart guys (gender not specified) spend years removing all the noise.

**jayem** · 2018-02-26, 05:53 PM (ISO 8601)

Originally Posted by wumpus

According to the ever-reliable wiki, the odds of 3000 or -3000 should be roughly 1/(3000)², or 1 in 9 million.

As noted above, normal distributions typically model some other distribution, but the Gaussian is close enough.

I think you've missed raising e to the power of that? Which makes it even more tiny.

In addition, one thing that absolutely does tend to be Normally Distributed is the average (or sum) of a random set of data drawn from an (arbitrary) distribution.
So with care, you can manipulate the way you gather data, to allow use of the Central Limit Theorem.
So even something that has a spiky flat probability (like dice) combine them in boxes of 5 and you get the 5th binomial distribution which is a lot more Normal. And counter-intuitively by taking 20 readings from that nearly-known distribution (100 throws), gives you more accuracy and confidence in the mean than just straight off using the 100 throws.
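One way to see the dice example without any simulation is to enumerate every outcome and compare the excess kurtosis, which is 0 for a normal distribution. This is only a sketch, and the function name is invented for illustration. It checks the shape of the sum of the dice, not the claim about accuracy of the mean.

```python
from itertools import product


def excess_kurtosis_of_dice_sum(n_dice: int) -> float:
    """Exact excess kurtosis of the sum of n fair six-sided dice (0 for a normal)."""
    # Enumerate all 6**n equally likely outcomes and take their sums
    sums = [sum(faces) for faces in product(range(1, 7), repeat=n_dice)]
    n = len(sums)
    mean = sum(sums) / n
    var = sum((s - mean) ** 2 for s in sums) / n
    fourth = sum((s - mean) ** 4 for s in sums) / n
    return fourth / var ** 2 - 3.0


print(excess_kurtosis_of_dice_sum(1))  # about -1.27 (flat distribution)
print(excess_kurtosis_of_dice_sum(5))  # about -0.25 (much closer to normal)
```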

**warty goblin** · 2018-02-27, 08:59 AM (ISO 8601)

Originally Posted by jayem

I think you've missed raising e to the power of that? Which makes it even more tiny.

In addition, one thing that absolutely does tend to be Normally Distributed is the average (or sum) of a random set of data drawn from an (arbitrary) distribution.
So with care, you can manipulate the way you gather data, to allow use of the Central Limit Theorem.
So even something that has a spiky flat probability (like dice) combine them in boxes of 5 and you get the 5th binomial distribution which is a lot more Normal. And counter-intuitively by taking 20 readings from that nearly-known distribution (100 throws), gives you more accuracy and confidence in the mean than just straight off using the 100 throws.

It doesn't quite work for any arbitrary distribution, since the CLT requires at least two moments to exist. So the CLT doesn't work if for some reason you find yourself sampling from a Cauchy or a number of other very heavy-tailed distributions. I've seen it argued for instance that because the prevalence of human violence is dominated by high body count but low probability events like World War II, an analysis based on the mean, which suggests people are getting less violent, is misleading because the mean doesn't really drive the behavior of the system. The rare tail events do, and there's simply not enough time since WWII to know if we're actually less likely to have giant world wars.

And yes, the limiting distribution of the sqrt(n)(xbar - mu) is normal. So is sqrt(n)(theta-hat - theta), where theta-hat is the MLE of theta, under some regularity conditions. These are again approximations however, albeit often fairly good ones. One of the advantages of modern computer based methods like the bootstrap is that we don't need to rely so heavily on these approximations.
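A small simulation makes the Cauchy point concrete. This is just a sketch with an arbitrary seed and arbitrary sample sizes; the helper names are made up. It compares how spread out the sample means are (measured by interquartile range) for normal draws versus Cauchy draws as the sample size grows.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_gauss() -> float:
    """One standard normal draw."""
    return rng.gauss(0.0, 1.0)


def draw_cauchy() -> float:
    """One standard Cauchy draw via the inverse-CDF transform of a uniform."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, n: int, reps: int) -> float:
    """Interquartile range of `reps` sample means, each based on `n` draws."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for n in (1, 10, 100):
    print(n, iqr_of_sample_means(draw_gauss, n, 2000),
          iqr_of_sample_means(draw_cauchy, n, 2000))
```

The normal column should shrink roughly like 1/sqrt(n), while the Cauchy column should stay near 2 no matter how many draws go into each mean, since the mean of Cauchy draws is itself standard Cauchy.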

**wumpus** · 2018-02-27, 10:38 AM (ISO 8601)

Originally Posted by warty goblin

One of the advantages of modern computer based methods like the bootstrap is that we don't need to rely so heavily on these approximations.

The disadvantage is that you might actually believe the approximation and fail to notice that the data only supports 3 digits of accuracy. Any event at 3000 standard deviations would likely rely on a completely different source than typically seen, and not be part of the equation at all.

I suspect part of the reason we are seeing so many 100 year and 1000 year floods comes as a result of forcing chaotic processes into Gaussian functions. The tails are simply going to be much higher than expected.
