About this blog

I extended XGBoost as part of my master’s thesis. While writing the thesis, I had to do a literature review of XGBoost and gradient boosting in general. XGBoost is a popular package (17K+ GitHub stars) used in many winning Kaggle solutions, so I was surprised to learn that there isn’t much material about XGBoost internals. For example, XGBoost makes a crucial adaptation to the well-known gradient boosting algorithm to speed up training [1], but to my knowledge there is no accessible explanation of that adaptation. The KDD paper on XGBoost [2] is rather light on algorithm description as well. I’ve seen many users ask about XGBoost internals (see this and this). So far, I’ve shared my master’s thesis with users, and many found it useful. However, there are still many XGBoost internals I have yet to write about.

Hence this blog. It is meant to be an accessible and comprehensive collection of everything XGBoost. It will also serve as study notes for me as I review and study different concepts.

  1. The original gradient boosting algorithm uses only first-order gradients of the loss, whereas XGBoost also uses second-order gradients (a second-order Taylor expansion of the loss). This speeds up convergence of gradient boosting; see the sketch after these notes.
  2. Published by Tianqi Chen, the original creator of XGBoost.
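
To give a rough idea of what note 1 refers to, here is the standard formulation from the XGBoost paper, using its usual notation: $g_i$ and $h_i$ are the first and second derivatives of the loss for example $i$, $f_t$ is the tree added at round $t$, $I_j$ is the set of examples in leaf $j$, and $\lambda$ is the L2 regularization weight. The second-order approximation gives a closed-form optimal leaf weight:

$$
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t),
\qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.
$$

Classic gradient boosting, by contrast, fits each new tree to the negative gradients $-g_i$ alone and then searches for a step size, which is one reason the second-order variant tends to converge in fewer rounds.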
