# Probability Distributions
# May 15, 2012
# Stat560

This exercise uses a population genetics problem as an example of
maximum likelihood estimation (MLE). It will also serve as some 
familiarization with how you write functions of variables in R.

Suppose that there is a disease (let's say halitosis) which is partly 
genetically determined.  The genotype  aa  has a 40% chance of getting the 
disease, and the other two possible genotypes,  AA  and  Aa, each has a 10% 
chance of getting the disease.  We want to estimate the frequency of the  A  
allele, and put a rough confidence interval on it.

If the gene frequency of the  A  allele is  p,  and that of  a  is  1-p,
then the frequency of the disease is expected to be (if the genotypes
are in Hardy-Weinberg proportions as a result of random mating:

   F   =   p^2 (0.1) + 2p(1-p) (0.1) + (1-p)^2 (0.4)

(where the notation "^2" means "squared").  Now suppose we observe
1000 indviduals and find that the 182 of them have the disease.

Using a binomial distribution, the probability that 182 out of 1000
have the disease is the binomial class probability for 182 out of 1000
when the probability of the event is F (which is a function of p).

This is  

             1000!
            --------  F^(182) (1-F)^(818)
            182! 818!

we want to maximize this by varying  p.  The first thing to notice
is that the factorial stuff in front doesn't matter -- it will be the
same for all values of  F.  So we drop it.

Then the log-likelihood is    182 log(F) + 818 log(1-F)

We want to write a function  LL, the log-likelihood for  p.
For this we create two functions (just copy them but look at them too
and try to figure out what they do, as this is how you make your own
functions):

F <- function(p) {
 return(0.1*(p^2+2*p*(1-p)) + 0.4*(1-p)^2)
}

LL <- function(f) {
  return(182*log(f)+818*log(1-f))
}

Note -- more generally you put statements inside the parentheses that define
the body of the function, and these statements are separated by semicolons.
The last one should be  return(...) which computes the value of the
value of the function.  A function can also have multiple arguments such
as LL(a, b, x)

Enter these commands.  Then

1. Calculate LL(F(p)) for some values of  p.

2. To make life easier, using commands like  seq( ... ) to make a vector
(called, say, x) of gene frequencies from just above zero to nearly 1 (don't 
use 0 or 1 as the logarithms will explode, which is scary).

3. Make a vector of log-likelihood values by doing LL(F(x)).

4. Plot this against  x

5. Repeat commands 2 and 3 to narrow down the search, using  seq(...)
in appropriate ways.