Sometimes, sample proportions are continuous rather than of the binomial form (number of successes)/(number of trials). Each observation can be any real number between 0 and 1, such as the proportion of a tooth surface that is covered with plaque. For independent responses $\{y_i\}$, Aitchison and Shen (1980) and Bartlett (1937) modeled $\operatorname{logit}(Y_i) \sim N(\beta_i, \sigma^2)$. Then $Y_i$ itself is said to have a logistic-normal distribution.
a. Expressing a $N(\beta, \sigma^2)$ variate as $\beta + \sigma Z$, where $Z$ is standard normal, show that $Y_i = \exp(\beta_i + \sigma Z)/[1 + \exp(\beta_i + \sigma Z)]$.
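Part (a) amounts to applying the inverse logit $g(x) = e^x/(1+e^x)$ to $\beta_i + \sigma Z$. A minimal Python sketch of that map, using only the standard library; the helper names `expit`, `logit`, and `logistic_normal_value` are hypothetical and not part of the original exercise:

```python
import math


def expit(x: float) -> float:
    """Inverse logit e^x / (1 + e^x), arranged so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """log[p / (1 - p)] for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic_normal_value(beta: float, sigma: float, z: float) -> float:
    """Y = exp(beta + sigma*z) / [1 + exp(beta + sigma*z)] for a standard-normal value z."""
    return expit(beta + sigma * z)


# Applying logit to Y recovers the normal variate beta + sigma*z.
y = logistic_normal_value(0.5, 0.3, 1.2)
print(y, logit(y))
```

Branching on the sign of the argument is a design choice: algebraically both branches equal $e^x/(1+e^x)$, but this form keeps `math.exp` from overflowing when $\beta_i + \sigma Z$ is large in magnitude.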
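b. Show that for small $\sigma$, expanding about $\sigma Z = 0$ gives $Y_i \approx \frac{e^{\beta_i}}{1+e^{\beta_i}} + \frac{e^{\beta_i}}{1+e^{\beta_i}}\cdot\frac{1}{1+e^{\beta_i}}\,\sigma Z$.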
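One way to approach part (b) is a first-order Taylor expansion of the inverse logit $g(x) = e^x/(1+e^x)$ about $x = \beta_i$. The sketch below writes the remainder in Lagrange form, with $\xi$ between $\beta_i$ and $\beta_i + \sigma Z$, and uses the elementary bound $|g''(x)| \le 1/(6\sqrt{3})$, so the neglected term is at most $\sigma^2 Z^2/(12\sqrt{3})$:

```latex
g(x) = \frac{e^{x}}{1+e^{x}}, \qquad
g'(x) = \frac{e^{x}}{(1+e^{x})^{2}} = g(x)\,[1-g(x)], \qquad
g''(x) = g(x)\,[1-g(x)]\,[1-2g(x)]

Y_i = g(\beta_i + \sigma Z)
    = g(\beta_i) + g'(\beta_i)\,\sigma Z + \tfrac{1}{2}\,g''(\xi)\,\sigma^{2} Z^{2}
    \approx \frac{e^{\beta_i}}{1+e^{\beta_i}}
      + \frac{e^{\beta_i}}{1+e^{\beta_i}}\cdot\frac{1}{1+e^{\beta_i}}\,\sigma Z
```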
c. Letting $\mu_i = e^{\beta_i}/(1 + e^{\beta_i})$, show that when $\sigma$ is close to 0, $E(Y_i) \approx \mu_i$ and $\operatorname{var}(Y_i) \approx [\mu_i(1-\mu_i)]^2 \sigma^2$.
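The approximations in part (c) can be checked by simulation. The sketch below is a hypothetical Monte Carlo check that assumes the illustrative values $\beta_i = 0.5$ and $\sigma = 0.1$ (not from the original exercise); it compares the sample mean and variance of simulated $Y_i$ with $\mu_i$ and $[\mu_i(1-\mu_i)]^2\sigma^2$.

```python
import math
import random
import statistics


def expit(x: float) -> float:
    """Inverse logit, arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_normal_moments(beta, sigma, n_draws=200_000, seed=1):
    """Monte Carlo mean and variance of Y = expit(beta + sigma*Z), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = [expit(beta + sigma * rng.gauss(0.0, 1.0)) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.pvariance(draws)


def approx_moments(beta, sigma):
    """Small-sigma approximations: E(Y) ~ mu, var(Y) ~ [mu(1 - mu)]^2 sigma^2."""
    mu = expit(beta)
    return mu, (mu * (1.0 - mu)) ** 2 * sigma ** 2


# Illustrative (assumed) parameter values.
mc_mean, mc_var = logistic_normal_moments(0.5, 0.1)
mu, approx_var = approx_moments(0.5, 0.1)
print("simulated mean, mu:        ", mc_mean, mu)
print("simulated var, approx var: ", mc_var, approx_var)
```

The agreement is expected to deteriorate as $\sigma$ grows, since the expansion behind part (c) drops terms of order $\sigma^2$.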
d. For independent continuous proportions $\{y_i\}$, let $\mu_i = E(Y_i)$. For a GLM, it is sensible to use an inverse cdf link for $\mu_i$, but it is unclear how to choose a distribution for $Y_i$. The approximate moments for the logistic-normal motivate a quasi-likelihood (QL) approach (Wedderburn 1974) with variance function $v(\mu_i) = \phi[\mu_i(1-\mu_i)]^2$ for unknown $\phi$. Explain why this provides results similar to those from fitting a normal regression model to the sample logits assuming constant variance. (The QL approach has the advantage of not requiring adjustment of 0 or 1 observations, for which sample logits do not exist.)
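A sketch of the mechanism behind part (d), assuming the logit link and a single covariate $x$. With $d\mu/d\eta = \mu(1-\mu)$, the IRLS weight $(d\mu/d\eta)^2/v(\mu)$ equals the constant $1/\phi$, so each Fisher-scoring step is an unweighted least-squares fit to the working response $z = \eta + (y-\mu)/[\mu(1-\mu)]$. The helper names and the synthetic logistic-normal data are hypothetical, chosen only to compare the two fits:

```python
import math
import random


def expit(x: float) -> float:
    """Inverse logit, arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def simple_ols(x, z):
    """Unweighted least-squares intercept and slope for z = a + b*x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope


def quasi_logit_fit(x, y, tol=1e-10, max_iter=100):
    """QL fit of logit(mu) = a + b*x with variance function phi*[mu(1-mu)]^2.

    Every IRLS weight is the same constant, so each step is an unweighted
    least-squares fit to z = eta + (y - mu)/[mu(1 - mu)], a first-order
    approximation to logit(y). Observations y = 0 or 1 need no adjustment.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        eta = [a + b * xi for xi in x]
        mu = [expit(e) for e in eta]
        z = [e + (yi - m) / (m * (1.0 - m)) for e, yi, m in zip(eta, y, mu)]
        a_new, b_new = simple_ols(x, z)
        if abs(a_new - a) + abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    raise RuntimeError("IRLS did not converge")


def synthetic_logistic_normal(n, a, b, sigma, seed=0):
    """Hypothetical synthetic data with logit(y_i) ~ N(a + b*x_i, sigma^2)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [expit(a + b * xi + sigma * rng.gauss(0.0, 1.0)) for xi in x]
    return x, y


x, y = synthetic_logistic_normal(200, a=0.3, b=1.0, sigma=0.2, seed=7)
ql = quasi_logit_fit(x, y)
ols = simple_ols(x, [logit(yi) for yi in y])
print("QL estimates:        ", ql)
print("OLS on sample logits:", ols)
```

The two fits differ only in the response being regressed: $z$ linearizes $\operatorname{logit}(y)$ about the current $\mu$, so the estimates nearly agree when $\sigma$ is small, while $z$ stays finite for $y = 0$ or $1$.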
e. Wedderburn (1974) gave an example in which the response is the proportion of a leaf showing a type of blotch. Envision an approximation of binomial form based on cutting each leaf into a large number of small regions of the same size and observing for each region whether it is mostly covered with blotch. Explain why this suggests that $v(\mu_i) = \phi\mu_i(1-\mu_i)$. What violation of the binomial assumptions might make this questionable? [The parametric family of beta distributions has a variance function of this form. Barndorff-Nielsen and Jorgensen (1991) proposed a distribution having $v(\mu_i) = \phi[\mu_i(1-\mu_i)]^3$.]
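The two variance functions of the form $\phi\mu_i(1-\mu_i)$ in part (e) can be checked with exact rational arithmetic. This sketch assumes the idealized region model in which each of $n$ equal regions is blotched independently with the same probability $\mu$; the function names and the values $n = 400$ and $\alpha + \beta = 5$ are hypothetical. It illustrates only the form of the variance function, not which binomial assumptions fail for real leaves.

```python
from fractions import Fraction


def region_fraction_variance(mu, n_regions):
    """Variance of the blotched fraction of a leaf cut into n_regions equal regions,
    assuming (hypothetically) each region is blotched independently with probability mu."""
    return mu * (1 - mu) / n_regions


def beta_mean_variance(shape_a, shape_b):
    """Mean and variance of a Beta(shape_a, shape_b) distribution."""
    s = shape_a + shape_b
    mean = shape_a / s
    var = shape_a * shape_b / (s ** 2 * (s + 1))
    return mean, var


# In both models var / [mu(1 - mu)] does not depend on mu:
# phi = 1/n for the region model and phi = 1/(a + b + 1) for the beta family.
for mu in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    print(mu, region_fraction_variance(mu, 400) / (mu * (1 - mu)))

for a, b in ((Fraction(1), Fraction(4)), (Fraction(5, 2), Fraction(5, 2)), (Fraction(4), Fraction(1))):
    mean, var = beta_mean_variance(a, b)
    print(mean, var / (mean * (1 - mean)))
```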