# Outline of PRML with a few details - 2

# 1. Binary Variables

**Likelihood:** $ p(D | \mu) = \prod_{n=1}^{N}\mu ^ {x_n} (1-\mu)^{1 - x_n}$

**Beta distribution (Prior):** $Beta(\mu | a, b) = \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)}\mu^{a-1}(1 - \mu)^{b-1}$ *Here the term with $\Gamma$ is the normalization constant.*

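**Posterior:** $p(\mu | m, l, a, b) = \frac{\Gamma(m + a + l + b)}{\Gamma(m + a) \Gamma(l + b)}\mu^{m + a - 1}(1 - \mu)^{l + b - 1}$, where $m$ is the number of observations with $x_n = 1$ and $l = N - m$ is the number with $x_n = 0$.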

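The reason for choosing the beta distribution as the prior is that it has the same functional form in $\mu$ as the likelihood (both are products of powers of $\mu$ and $1 - \mu$). Multiplying the two gives something proportional to $\mu^{m + a - 1}(1 - \mu)^{l + b - 1}$, so the posterior is again a beta distribution. This property is called *conjugacy*.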
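The conjugate update can be sketched in a few lines of Python. This is a minimal illustration, not code from PRML: the function names `beta_pdf` and `beta_posterior`, the toy data, and the prior values `a = 2`, `b = 2` are all assumptions made for the example.

```python
from math import exp, lgamma, log

def beta_pdf(mu, a, b):
    """Density of Beta(mu | a, b) for 0 < mu < 1, computed in log space."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)  # the Gamma normalizer
    return exp(log_norm + (a - 1) * log(mu) + (b - 1) * log(1 - mu))

def beta_posterior(data, a, b):
    """Return the posterior parameters (a + m, b + l) for binary data."""
    m = sum(data)        # number of observations with x_n = 1
    l = len(data) - m    # number of observations with x_n = 0
    return a + m, b + l

# Hypothetical toy data: four ones and two zeros.
data = [1, 0, 1, 1, 0, 1]
a_post, b_post = beta_posterior(data, a=2.0, b=2.0)

# Mean of Beta(a, b) is a / (a + b).
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)
```

With this data the update only adds the counts to the prior parameters, which is why $a$ and $b$ are often read as effective prior counts of $x = 1$ and $x = 0$.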

# 2. Multinomial Variables

**Likelihood:** $Mult(m_1, m_2, \ldots, m_K | \mu, N) = \begin{pmatrix} N \\ m_1 m_2 \ldots m_K \end{pmatrix} \prod_{k=1}^{K}\mu_k^{m_k}$

**Dirichlet distribution (Prior):** $Dir(\mu | \alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \ldots \Gamma(\alpha_K)} \prod_{k=1}^{K}\mu_k^{\alpha_k-1}$, where $\alpha_0 = \sum_{k=1}^{K}\alpha_k$. *Here the term with $\Gamma$ is the normalization constant.*

**Posterior:** $p(\mu | D, \alpha) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1) \ldots \Gamma(\alpha_K + m_K)} \prod_{k=1}^{K}\mu_k^{\alpha_k + m_k - 1}$

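The reason for choosing the Dirichlet distribution as the prior is that it has the same functional form in $\mu$ as the multinomial likelihood (a product of powers $\mu_k$), so the posterior is again a Dirichlet distribution. This property is called *conjugacy*.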
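As with the binary case, the posterior update amounts to adding the observed counts to the prior parameters. The following is a minimal sketch under assumed names: `dirichlet_posterior`, `dirichlet_mean`, the counts, and the uniform prior are illustrative choices, not part of the original notes.

```python
def dirichlet_posterior(counts, alpha):
    """Return posterior parameters alpha_k + m_k for observed counts m_k."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length K")
    return [a_k + m_k for a_k, m_k in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(mu | alpha): E[mu_k] = alpha_k / alpha_0."""
    alpha_0 = sum(alpha)
    return [a_k / alpha_0 for a_k in alpha]

# Hypothetical example with K = 3 states and N = 10 observations.
counts = [3, 1, 6]               # m_1, m_2, m_3
alpha_prior = [1.0, 1.0, 1.0]    # uniform prior over the simplex
alpha_post = dirichlet_posterior(counts, alpha_prior)
post_mean = dirichlet_mean(alpha_post)
print(alpha_post, post_mean)
```

Here `alpha_post` is `[4.0, 2.0, 7.0]`, and its sum equals $\alpha_0 + N$, matching the argument of the Gamma function in the posterior's numerator.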