
Thread: F statistic/distribution question


    Re: F statistic/distribution question

    If I'm doing a t-test of mu_1 - mu_2 = 0, then although I'm talking about two parameters, the difference between mu_1 and mu_2 is one-dimensional. Thus the test can have sides, because mu_1 - mu_2 falls somewhere on the real line: less than zero, equal to zero (H0), or greater than zero. The sides of the hypothesis test describe which of those deviations I care about.
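
    As a minimal sketch of that sides distinction (the data and group names below are invented, and I'm assuming a reasonably recent scipy with the alternative argument):

```python
# Hypothetical two-sample t-test of H0: mu_1 - mu_2 = 0, showing how the
# "side" of the test is a statement about where mu_1 - mu_2 falls on the
# real line. Data are made up; requires scipy >= 1.6 for `alternative`.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=0.0, scale=1.0, size=30)
group2 = rng.normal(loc=0.5, scale=1.0, size=30)

# Two-sided: any deviation of mu_1 - mu_2 from zero counts as evidence.
t_two, p_two = stats.ttest_ind(group1, group2, alternative="two-sided")

# One-sided: only mu_1 - mu_2 < 0 counts as evidence against H0.
t_less, p_less = stats.ttest_ind(group1, group2, alternative="less")

print(t_two, p_two)    # same t statistic either way...
print(t_less, p_less)  # ...but only one tail contributes to this p-value
```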

    I can also test this with an F distribution; as I said before, the F statistic is exactly the square of the t statistic. Now I'm making a hypothesis about (mu_1 - mu_2)^2, which is either zero (H0) or not (HA). There are no sides left, because I'm looking at a squared difference in means, and that is always greater than or equal to zero. The F test can't tell me anything about how mu_1 and mu_2 differ, because I take the square. The t-test can, and is therefore generally preferable when comparing only two means.
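
    A quick numerical check of that relationship (invented data again; I'm assuming the pooled, equal-variance form of the two-sample t-test, which is the one whose square matches the one-way ANOVA F):

```python
# For two groups, the one-way ANOVA F statistic equals the square of the
# pooled two-sample t statistic, and the F p-value equals the two-sided
# t p-value. Data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=25)
b = rng.normal(0.3, 1.0, size=25)

t_stat, t_p = stats.ttest_ind(a, b, equal_var=True)  # pooled t-test
f_stat, f_p = stats.f_oneway(a, b)                   # ANOVA with two groups

print(t_stat**2, f_stat)  # equal up to floating point
print(t_p, f_p)           # also equal: the F test is inherently two-sided here
```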

    In practice, most things you test with an F distribution have complex hypotheses; you can't really say they're one- or two-sided, because they are statements about the relative values of parameters in a high-dimensional space. The t-test can't do this; it can only test a single comparison at a time.

    Consider the simplest ANOVA scenario that doesn't reduce to a t-test. I've got three treatments, and I want to know if they have different means: does mu_1 = mu_2 = mu_3 or not? Now I care about two differences, mu_1 - mu_2 and mu_1 - mu_3*. Under H0, both of these differences are zero. The F test asks whether (mu_1 - mu_2)^2 = 0 and (mu_1 - mu_3)^2 = 0 hold simultaneously. Again, there are no sides, both because I've squared things and because sides aren't really a meaningful concept in the two-dimensional space of differences between three means.
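
    A minimal sketch of that three-treatment test (the treatment data are invented): the F test delivers a single p-value for the joint hypothesis, and nothing in its output says which means differ or in which direction.

```python
# One-way ANOVA for three hypothetical treatments: tests the joint
# H0: mu_1 = mu_2 = mu_3, with no notion of a one- or two-sided alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t1 = rng.normal(10.0, 2.0, size=20)
t2 = rng.normal(10.0, 2.0, size=20)
t3 = rng.normal(12.0, 2.0, size=20)

f_stat, p_value = stats.f_oneway(t1, t2, t3)
print(f_stat, p_value)  # a small p-value only says "not all means are equal"
```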


    edit: Under H0, the expectation of MSTR, the treatment mean square, is sigma^2. The expectation of MSE, the error mean square, is also sigma^2. The test statistic is F = MSTR/MSE, but E(F) = E(MSTR/MSE) != E(MSTR)/E(MSE) under H0. The expectation is actually df_denominator/(df_denominator - 2) when df_denominator > 2, and otherwise the distribution fails to have an expectation**. If you have a lot of error degrees of freedom, this gets very close to one, but it is not one.
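
    A quick check of that expectation (the degrees of freedom below are hypothetical, e.g. 3 treatments and 60 observations):

```python
# Mean of the F distribution under H0: df_denominator/(df_denominator - 2)
# when df_denominator > 2; close to, but not exactly, 1 for large df.
from scipy import stats

dfn, dfd = 2, 57                 # hypothetical numerator/denominator df
print(stats.f(dfn, dfd).mean())  # mean of the F(2, 57) distribution
print(dfd / (dfd - 2))           # closed form: 57/55, about 1.036
# For df_denominator <= 2 the F distribution has no finite mean (see **).
```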

    *These particular differences are arbitrary. I could also pick mu_1 - mu_2 and mu_2 - mu_3. The point is, there are only two, and any other difference of treatment means is a linear combination of these two; for example, mu_2 - mu_3 = (mu_1 - mu_3) - (mu_1 - mu_2).

    **Which is vaguely surprising, until one recalls that a t with df=1 is Cauchy. One need not have a mean to have a valid test statistic and p-value, because p-values are probabilities, not statements about the mean of the test statistic.
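
    A one-line sanity check of that footnote (assuming scipy's parameterizations): the t density with df = 1 and the standard Cauchy density agree everywhere.

```python
# A t distribution with 1 degree of freedom is the standard Cauchy,
# so the two densities are identical: 1 / (pi * (1 + x^2)).
import numpy as np
from scipy import stats

x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(stats.t.pdf(x, df=1), stats.cauchy.pdf(x)))  # True
```
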
    Last edited by warty goblin; 2018-05-11 at 02:16 PM.