Maximum Likelihood Estimation

In the previous post, we introduced the Accelerated Failure Time (AFT) model.

\begin{equation*}\ln{(y_i)} = \sum_{j=1}^d w_j x_{ij} + \sigma\epsilon_i \tag{1}\end{equation*}

We’d now like to estimate the weights w_1, w_2, \ldots, w_d. We will use a well-known technique in statistics called Maximum Likelihood Estimation (MLE):

Principle of Maximum Likelihood Estimation. Given a set of observations, the “best” estimate for the parameter \theta is the value \theta^{\mathrm{MLE}} that maximizes the likelihood of those observations.

Whew! That’s a mouthful of words! In the remainder of the post, we’ll take some time to unpack this definition. Then in a subsequent post we’ll use MLE to estimate the weights w_{*} in the AFT model (1).
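To make the principle concrete before unpacking it, here is a minimal sketch on a toy problem that is not the AFT model: estimating the success probability \theta of a coin from a handful of flips. The data, the function names, and the grid search are all illustrative assumptions; real estimators usually rely on calculus or a numerical optimizer rather than a grid.

```python
import math

def bernoulli_log_likelihood(theta, observations):
    """Log-likelihood of 0/1 observations under success probability theta."""
    return sum(
        math.log(theta) if obs == 1 else math.log(1.0 - theta)
        for obs in observations
    )

def grid_search_mle(observations, grid_size=999):
    """Return the grid point in (0, 1) with the highest log-likelihood."""
    candidates = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(candidates, key=lambda t: bernoulli_log_likelihood(t, observations))

# Hypothetical observations: 7 successes in 10 flips.
observations = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
theta_mle = grid_search_mle(observations)
sample_mean = sum(observations) / len(observations)
print(theta_mle, sample_mean)
```

For this toy model the maximizer coincides with the sample mean, which is the known closed-form MLE for a Bernoulli parameter. Working with the log of the likelihood leaves the maximizer unchanged and avoids multiplying many small probabilities together.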


Accelerated Failure Time model

Survival analysis is a huge topic in statistics. For a comprehensive survey, see this article from ACM. In this post, we will cover one popular model known as Accelerated Failure Time (AFT). The AFT model makes the following key assumptions:

Accelerated Failure Time assumption

1. A unit increase in each input feature multiplies the survival time by a constant factor.
2. The effects of features on the log survival time are additive.
3. Noise in the training data is random and does not depend on any particular data point.
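The assumptions above can be sketched in a few lines of code. The snippet below generates survival times from a log-linear model of the form \ln(y) = \sum_j w_j x_j + \sigma\epsilon. The weights, the feature values, and the choice of standard normal noise are assumptions made purely for illustration; the excerpt does not specify a noise distribution.

```python
import math
import random

def noiseless_survival_time(x, w):
    """Survival time implied by the features alone: exp(sum_j w_j * x_j)."""
    return math.exp(sum(w_j * x_j for w_j, x_j in zip(w, x)))

def simulate_survival_time(x, w, sigma, rng):
    """Draw one survival time from ln(y) = sum_j w_j * x_j + sigma * eps."""
    eps = rng.gauss(0.0, 1.0)  # assumed noise; independent of the data point x
    return noiseless_survival_time(x, w) * math.exp(sigma * eps)

# Hypothetical weights and feature vectors.
w = [0.5, -0.2]
x = [1.0, 3.0]
x_plus_one = [2.0, 3.0]  # same point with the first feature increased by 1

# Adding 1 to feature 0 adds w[0] to the log survival time,
# so the survival time itself is multiplied by exp(w[0]).
ratio = noiseless_survival_time(x_plus_one, w) / noiseless_survival_time(x, w)
print(ratio, math.exp(w[0]))

rng = random.Random(0)
sample = simulate_survival_time(x, w, sigma=1.0, rng=rng)
print(sample)
```

The ratio depends only on the weight of the changed feature, not on the values of the other features, which is what additivity on the log scale means in practice.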


What is survival analysis?

Survival analysis is a discipline within statistics where the statistician models the distribution of time to an event of interest. The rest of this post will unpack this definition.

Survival analysis is a special kind of regression and differs from the conventional regression task as follows:

  • The label is always positive, since you cannot wait a negative amount of time until the event occurs.
  • The label may not be fully known (that is, it may be censored), because “it takes time to measure time.”
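One way to picture both properties is to store each label as a range rather than a single number. The sketch below is a hypothetical representation, not one taken from the post: a fully observed label has equal lower and upper bounds, while a label for an event that had not yet occurred when observation stopped gets an infinite upper bound. The class and function names are made up for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SurvivalLabel:
    """Time to event, known only to lie in [lower, upper]."""
    lower: float
    upper: float

    def __post_init__(self):
        # Waiting times cannot be negative, and the bounds must be ordered.
        if not (0 <= self.lower <= self.upper and self.upper > 0):
            raise ValueError("expected 0 <= lower <= upper and upper > 0")

    @property
    def is_censored(self) -> bool:
        return self.lower != self.upper

def observe(start_day: float, event_day: Optional[float], study_end_day: float) -> SurvivalLabel:
    """Build a label from what is known when the study ends."""
    if event_day is not None and event_day <= study_end_day:
        elapsed = event_day - start_day
        return SurvivalLabel(elapsed, elapsed)  # event seen: exact label
    # Event not seen yet: we only know it takes longer than the time observed.
    return SurvivalLabel(study_end_day - start_day, math.inf)

exact = observe(start_day=0, event_day=30, study_end_day=100)
censored = observe(start_day=10, event_day=None, study_end_day=100)
print(exact, exact.is_censored)
print(censored, censored.is_censored)
```

The second label records only that the event takes longer than 90 days, which is the sense in which it takes time to measure time.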