Survival analysis is a huge topic in statistics. For a comprehensive survey, see this article from ACM. In this post, we will cover one popular model known as **Accelerated Failure Time (AFT)**. The AFT model makes the following key assumptions:

**Accelerated Failure Time assumption**

1. A unit increase in each input feature multiplies the survival time by a constant factor.

2. The effects of features on the log survival time are additive.

3. Noise in the training data is random and does not depend on any particular data point.

In math, we express the AFT assumption as follows:

$$\ln y_i = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_d x_{id} + \sigma \epsilon_i \tag{1}$$

where

- $y_i$ is the (true) label for the $i$-th data point.
- $x_{ij}$ is the value of the $j$-th feature for the $i$-th data point.
- $w_j$ is the weight (coefficient) associated with the $j$-th feature.
- $\epsilon_i$ is a random Gaussian noise drawn from the normal distribution with mean 0 and standard deviation 1, i.e. $\epsilon_i \sim \mathcal{N}(0, 1)$. We assume that $\epsilon_1, \ldots, \epsilon_n$ are i.i.d.
- $\sigma$ is a parameter that scales the size of the Gaussian noise $\epsilon_i$.
- $d$ is the number of features available in the training data.
- $n$ is the size of training data.
- $\ln$ is the natural logarithm.
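As a rough illustration of the generative story behind the AFT assumption, the sketch below draws synthetic survival times whose logarithm is a weighted sum of features plus scaled standard normal noise. The sizes, weights, noise scale, and the helper name `simulate_aft` are all made up for this example; they are not taken from any library or dataset.

```python
import numpy as np

def simulate_aft(X, w, sigma, rng):
    """Draw survival times y with ln y = X @ w + sigma * eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(X.shape[0])  # one i.i.d. noise term per data point
    log_y = X @ w + sigma * eps            # linear model on the log scale
    return np.exp(log_y)                   # survival times are always positive

rng = np.random.default_rng(0)
n, d = 1000, 3                             # hypothetical data size and feature count
X = rng.uniform(0.0, 1.0, size=(n, d))     # hypothetical feature matrix
w = np.array([0.5, -1.0, 2.0])             # hypothetical weights
sigma = 0.3                                # hypothetical noise scale

y = simulate_aft(X, w, sigma, rng)

# Undo the model: the rescaled log-residuals should look like N(0, 1) draws.
resid = (np.log(y) - X @ w) / sigma
print("residual mean:", resid.mean())
print("residual std: ", resid.std())
```

If the assumption holds, the recovered residuals should have mean close to 0 and standard deviation close to 1, regardless of which rows of `X` they came from.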

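Let’s make a few observations. First, features have a multiplicative effect on the survival time. Suppose that we modify the $j$-th feature of the $i$-th training data point from $x_{ij}$ to $x_{ij} + 1$ (adding one unit) while keeping the other features and the noise term the same. Then the new value of $y_i$ is $e^{w_j}$ times the old value:

$$\ln y_i^{\text{new}} = w_1 x_{i1} + \cdots + w_j (x_{ij} + 1) + \cdots + w_d x_{id} + \sigma \epsilon_i = \ln y_i^{\text{old}} + w_j \tag{2}$$

$$y_i^{\text{new}} = e^{w_j} \, y_i^{\text{old}} \tag{3}$$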
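The multiplicative effect is easy to check numerically. The sketch below assumes the log-linear form $\ln y = \mathbf{w} \cdot \mathbf{x} + \sigma \epsilon$ and uses made-up weights, features, and noise; `survival_time` is a hypothetical helper written for this example.

```python
import numpy as np

def survival_time(x, w, sigma, eps):
    """Survival time under ln y = w . x + sigma * eps."""
    return np.exp(x @ w + sigma * eps)

w = np.array([0.5, -1.0, 2.0])     # hypothetical weights
x_old = np.array([1.0, 2.0, 0.5])  # hypothetical data point
sigma, eps = 0.3, 0.7              # the noise term is held fixed across both evaluations

j = 1                              # index of the feature to increase by one unit
x_new = x_old.copy()
x_new[j] += 1.0

ratio = survival_time(x_new, w, sigma, eps) / survival_time(x_old, w, sigma, eps)
print("observed ratio:", ratio)
print("exp(w[j]):     ", np.exp(w[j]))
```

The two printed numbers agree up to floating-point error, and the ratio does not depend on the starting feature values or on the noise term.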

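Second, the effects of features on the log survival time are additive. That is, we can increase the values of two distinct features $j$ and $k$ simultaneously and their effects will add on top of each other. Define $x_{ij}^{\text{new}} = x_{ij} + 1$ and $x_{ik}^{\text{new}} = x_{ik} + 1$; then $\ln y_i$ increases by $w_j + w_k$, and $y_i$ is multiplied by $e^{w_j + w_k} = e^{w_j} e^{w_k}$:

$$\ln y_i^{\text{new}} = \ln y_i^{\text{old}} + w_j + w_k \tag{4}$$

$$y_i^{\text{new}} = e^{w_j + w_k} \, y_i^{\text{old}} \tag{5}$$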
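Here is a small numerical check of additivity, again assuming the log-linear form $\ln y = \mathbf{w} \cdot \mathbf{x} + \sigma \epsilon$. The weights and feature values are invented, and `log_survival_time` and `bump` are hypothetical helpers for this sketch only.

```python
import numpy as np

def log_survival_time(x, w, sigma, eps):
    """ln y under the log-linear AFT form."""
    return float(x @ w + sigma * eps)

def bump(x, *features):
    """Return a copy of x with each listed feature increased by one unit."""
    x_new = x.copy()
    for f in features:
        x_new[f] += 1.0
    return x_new

w = np.array([0.5, -1.0, 2.0])   # hypothetical weights
x = np.array([1.0, 2.0, 0.5])    # hypothetical data point
sigma, eps = 0.3, 0.7            # noise held fixed throughout
j, k = 0, 2                      # the two features to increase

base = log_survival_time(x, w, sigma, eps)
effect_j = log_survival_time(bump(x, j), w, sigma, eps) - base
effect_k = log_survival_time(bump(x, k), w, sigma, eps) - base
effect_jk = log_survival_time(bump(x, j, k), w, sigma, eps) - base

# On the log scale the effects add; on the original scale the factors multiply.
print("joint effect on ln y:     ", effect_jk)
print("sum of individual effects:", effect_j + effect_k)
print("joint factor on y:        ", np.exp(effect_jk))
print("product of factors:       ", np.exp(w[j]) * np.exp(w[k]))
```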

Third, the error term $\epsilon_i$ is independent of the choice of data point $i$. The terms $\epsilon_1, \ldots, \epsilon_n$ are independently drawn from the standard normal distribution $\mathcal{N}(0, 1)$. This means that we can learn absolutely nothing about $\epsilon_i$ from either the data point $\mathbf{x}_i$ or other error terms $\epsilon_k$ ($k \neq i$). In a future post, we will use this assumption to simplify the task of computing the maximum likelihood estimate for the weights $w_1, \ldots, w_d$.

Lastly, it is straightforward to predict with the AFT model. Given a previously unseen data point $\mathbf{x} = (x_1, \ldots, x_d)$, we compute the point estimate of the log survival time as follows:

$$\ln \hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d \tag{6}$$
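In code, prediction is a single dot product. The sketch below uses hypothetical function names and made-up weights; in practice the weights would come from fitting the model, which this post does not cover.

```python
import numpy as np

def predict_log_survival(x, w):
    """Point estimate of ln y: the noise term has mean zero, so it drops out."""
    return float(np.dot(x, w))

def predict_survival(x, w):
    """Back-transform the log-scale estimate to the original time scale."""
    return float(np.exp(predict_log_survival(x, w)))

w = np.array([0.5, -1.0, 2.0])        # hypothetical fitted weights
x_unseen = np.array([1.0, 2.0, 0.5])  # hypothetical unseen data point

log_y_hat = predict_log_survival(x_unseen, w)
y_hat = predict_survival(x_unseen, w)
print("predicted ln y:", log_y_hat)
print("predicted y:   ", y_hat)
```

One caveat on the back-transform: with Gaussian noise on the log scale, exponentiating the log-scale estimate gives the median survival time under the model, not the mean, because the log-normal distribution is skewed.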

We’ve made quite a few assumptions along the way that make the AFT model simple to understand and use. Unfortunately, the real world is messy, and not all of these assumptions are always justified. For example, what if two features interact with each other, negatively or positively? Then the effects of feature increases won’t be additive. Or what if the error terms are not i.i.d.? For example, it is known that clinical trials have a sex bias, with more men enrolled than women. In this case we’d have good reason to suspect that the error term would have larger variance for women. The problem of non-i.i.d. error terms is well studied in the field of econometrics^{1}. For now, I’ll gloss over this issue, since I don’t know much about econometrics. However, the other issue, the non-additive interaction of features, will be addressed in a future post. (Hint: use decision trees!)
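To make the interaction concern concrete, here is a toy sketch in which the log survival time contains a product term between two features. The functional form and every coefficient are invented for illustration; this is not the AFT model itself but an example of what breaks its additivity.

```python
import math

def log_time_with_interaction(x1, x2, w1=0.5, w2=-1.0, w12=0.3):
    """Hypothetical log survival time with an interaction term w12 * x1 * x2."""
    return w1 * x1 + w2 * x2 + w12 * x1 * x2

x1, x2 = 1.0, 2.0  # hypothetical feature values
base = log_time_with_interaction(x1, x2)

effect_1 = log_time_with_interaction(x1 + 1, x2) - base         # raise feature 1 only
effect_2 = log_time_with_interaction(x1, x2 + 1) - base         # raise feature 2 only
effect_both = log_time_with_interaction(x1 + 1, x2 + 1) - base  # raise both at once

# With an interaction, the joint effect differs from the sum of the parts.
gap = effect_both - (effect_1 + effect_2)
print("sum of individual effects:", effect_1 + effect_2)
print("joint effect:             ", effect_both)
print("gap:                      ", gap)
```

In this toy setup the gap equals the interaction coefficient `w12`, so a purely additive model would misjudge the joint effect by exactly that amount.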