Maximum Likelihood Estimation

In the previous post, we introduced the Accelerated Failure Time (AFT) model.

\begin{equation*}\ln(y_i) = \sum_{j=1}^d w_j x_{ij} + \sigma\epsilon_i \tag{1}\end{equation*}
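To make equation (1) concrete, here is a minimal sketch that generates survival times from it. The function name, the weights, and the choice of standard normal noise for \epsilon_i are all assumptions made for illustration; the model itself does not fix the noise distribution.

```python
import math
import random

def simulate_survival_times(weights, feature_rows, sigma, seed=0):
    """Draw one survival time y_i per row via ln(y_i) = sum_j w_j * x_ij + sigma * eps_i."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    times = []
    for row in feature_rows:
        linear_term = sum(w * x for w, x in zip(weights, row))
        noise = rng.gauss(0.0, 1.0)  # assumption: eps_i is standard normal
        times.append(math.exp(linear_term + sigma * noise))
    return times

# Hypothetical weights and feature rows, chosen only for illustration.
example_weights = [0.5, -0.2]
example_rows = [[1.0, 3.0], [2.0, 3.0], [0.0, 1.0]]
print(simulate_survival_times(example_weights, example_rows, sigma=0.3))
```

Because the model is stated for the logarithm of y_i, every simulated survival time is positive no matter which weights or noise values are drawn.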

We’d now like to estimate the weights w_1, w_2, \ldots, w_d. We will use a well-known technique in statistics called Maximum Likelihood Estimation (MLE):

Principle of Maximum Likelihood Estimation. Given a set of observations, the parameter value \theta^{\mathrm{MLE}} that maximizes the likelihood of the observations is chosen as the “best” estimate for the parameter \theta.

Whew! That’s a mouthful of words! In the remainder of the post, we’ll take some time to unpack this definition. Then in a subsequent post we’ll use MLE to estimate the weights w_{*} in the AFT model (1).
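As a warm-up, here is a small sketch of the principle on a much simpler problem than AFT: estimating the bias \theta of a coin from a handful of flips. The data are made up and the helper names are hypothetical. The code evaluates the log-likelihood on a grid of candidate values and keeps the one that scores highest.

```python
import math

def bernoulli_log_likelihood(theta, observations):
    """Log-likelihood of 0/1 observations when each 1 occurs with probability theta."""
    return sum(
        math.log(theta) if x == 1 else math.log(1.0 - theta)
        for x in observations
    )

def grid_search_mle(observations, num_grid_points=999):
    """Return the candidate theta on an even grid in (0, 1) with the highest log-likelihood."""
    candidates = [(k + 1) / (num_grid_points + 1) for k in range(num_grid_points)]
    return max(candidates, key=lambda theta: bernoulli_log_likelihood(theta, observations))

# Made-up data: 7 heads (1) and 3 tails (0).
coin_flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
theta_mle = grid_search_mle(coin_flips)
print(theta_mle)
```

For this data the grid maximizer coincides with the fraction of heads, 7/10, which is the closed-form maximum likelihood estimate for a coin. A grid search is only practical for one or two parameters, so it is used here purely to show what "maximizing the likelihood" means.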


Accelerated Failure Time model

Survival analysis is a huge topic in statistics. For a comprehensive survey, see this article from ACM. In this post, we will cover one popular model known as Accelerated Failure Time (AFT). The AFT model makes the following key assumptions:

Accelerated Failure Time assumption

1. A unit increase in each input feature multiplies the survival time by a constant factor (equivalently, it adds a constant to the log survival time).
2. The effects of features on the log survival time are additive.
3. Noise in the training data is random and does not depend on any particular data point.
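The first two assumptions can be checked numerically. The sketch below uses a noise-free AFT prediction, where the log survival time is the weighted sum of the features, with hypothetical weights and feature values. Raising one feature by one unit adds its weight to the log survival time, so the survival time itself is multiplied by the exponential of that weight.

```python
import math

def predicted_survival_time(weights, features):
    """Noise-free AFT prediction: exponentiate the weighted sum of the features."""
    return math.exp(sum(w * x for w, x in zip(weights, features)))

weights = [0.5, -0.2]       # hypothetical weights
baseline = [1.0, 3.0]       # hypothetical feature values
bumped = [2.0, 3.0]         # same point, first feature raised by one unit

ratio = predicted_survival_time(weights, bumped) / predicted_survival_time(weights, baseline)
print(ratio, math.exp(weights[0]))  # the two values agree up to rounding
```

The ratio does not depend on the baseline feature values, which is what "constant factor" means here.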


What is survival analysis?

Survival analysis is a discipline within statistics where the statistician models the distribution of time to an event of interest. The rest of this post will unpack this definition.

Survival analysis is a special kind of regression and differs from the conventional regression task as follows:

  • The label is always positive, since you cannot wait a negative amount of time until the event occurs.
  • The label may not be fully known (that is, it may be censored), because “it takes time to measure time.”
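One way to picture both properties is to store each label as a pair of bounds on the true time to event. This is only a sketch under that assumption, with a hypothetical helper name: a fully observed label has equal bounds, while a label that is censored because the study ended before the event occurred has no finite upper bound.

```python
import math

def make_label(observed_time, event_observed):
    """Return (lower, upper) bounds on the true time to event."""
    if observed_time <= 0:
        raise ValueError("time to event must be positive")
    if event_observed:
        # The event was seen, so the time is known exactly.
        return (observed_time, observed_time)
    # Still waiting when measurement stopped: the true time is at least observed_time.
    return (observed_time, math.inf)

print(make_label(5.0, event_observed=True))
print(make_label(3.0, event_observed=False))
```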

Setting Up My First Blog

I had great fun setting up my first ever blog. Some comments:

  • vs Self-hosting WordPress. I initially tried out the hosted option because it was easy to set up and I wouldn’t have to worry about the cost of hosting. However, I soon ran into a significant limitation: you are not allowed to install WordPress plugins. It is true that I could remove this limitation by buying the Business plan ($25/month), but at that price I might as well pay for hosting the blog myself. So I installed WordPress on the web host I was using for my personal website. The setup was a breeze, and I had the blog running 20 minutes later.
  • QuickLaTeX plugin is awesome. Why did I care so much about whether I could install plugins? Due to the nature of this blog, I wanted top-quality presentation of mathematical formulas, and thus I needed a good way to embed LaTeX code in the posts. The hosted option already had good support for inline formulas (see here), but unfortunately it lacked cross-references and did not allow embedding some “fancy” LaTeX packages I ended up needing (e.g. algorithmicx). But then I found the QuickLaTeX plugin and was blown away. I can use not only cross-references and equation numbers but also fancy LaTeX packages to typeset pseudocode.

  • Writing is not easy but fun. I thought I was pretty decent at writing. But as I wrote the first posts, I found myself rewriting sentences over and over again. The first time, sentences would come out really awkward; with each pass they’d get better and read more naturally. So writing is work. But it’s fun too: I get to organize what I learned. Somehow writing down my thoughts makes them clearer.

2019 Summer: GSOC!

This summer, I participated in Google Summer of Code (GSOC) 2019 as a mentor. GSOC is an awesome program where Google pays students a stipend and has them work on open source projects under the supervision of mentors.

How did I get involved? On March 8, Toby Hocking floated the idea of co-mentoring a student to work on XGBoost. Eager to get anyone to contribute to XGBoost, I took him up on the offer.

What kind of work did we do? From May 6 to September 3, I mentored Avinash Barnwal as he added a new objective function called Accelerated Failure Time (AFT). A well-known model in statistics, AFT is a popular choice for survival analysis, i.e. modeling time to an event. See Avinash’s post for a summary.

Show me the code! See

Personal takeaway. Before GSOC, I had no idea what survival analysis was, let alone AFT. Thankfully, Avinash gave me a bunch of papers to get me up to speed. I also got to ask him many questions about survival analysis. I intend to write a series of posts to summarize what I learned this summer.

About this blog

I extended XGBoost as part of my master’s thesis. As part of writing my thesis, I had to do a literature review about XGBoost and gradient boosting in general. XGBoost is a popular package (17K+ GitHub stars) and is used in many winning solutions in Kaggle competitions, so I was surprised to learn that there isn’t much material about XGBoost internals. For example, XGBoost makes a crucial adaptation to the well-known Gradient Boosting algorithm to speed up training, but to my knowledge, there is no accessible explanation of the adaptation. The KDD paper on XGBoost is rather light on algorithm description as well. I’ve already seen many users ask about XGBoost internals (see this and this). So far, I’ve shared my master’s thesis with users, and many found it useful. However, there are still many XGBoost internals I have yet to write about.

Hence this blog. It is meant to be an accessible and comprehensive collection of everything XGBoost. It will also serve as study notes for myself, as I review and study different concepts.