Setting Up My First Blog

I had great fun setting up my first ever blog. Some comments:

  • WordPress.com vs. self-hosting WordPress. I initially tried WordPress.com because it was easy to set up and I wouldn’t have to worry about the cost of hosting. However, I soon ran into a significant limitation: you are not allowed to install WordPress plugins. I could have removed this limitation by buying the Business plan ($25/month), but at that price I might as well pay to host the blog myself. So I installed WordPress on the web host I was already using for my personal website (NearlyFreeSpeech.net). The setup was a breeze, and I had blog.hyunsu-cho.io up 20 minutes later.
  • The QuickLaTeX plugin is awesome. Why did I care so much about whether I could install plugins? Due to the nature of this blog, I wanted top-quality presentation of mathematical formulas, so I needed a good way to embed LaTeX code in my posts. WordPress.com already has decent support for inline formulas (see here), but unfortunately it lacks cross-references and does not allow some of the “fancy” LaTeX packages I ended up needing (e.g. algorithmicx). Then I found the QuickLaTeX plugin and was blown away: I can use not only cross-references and equation numbers but also fancy LaTeX packages to typeset pseudocode. See the rendered output below for yourself; a sketch of the kind of LaTeX source involved appears after this list:

[Example pseudocode and numbered equations, rendered by QuickLaTeX.com]

  • Writing is not easy, but it is fun. I thought I was pretty decent at writing, but as I wrote the first posts, I found myself re-writing sentences over and over. The first attempt would come out awkward; successive revisions would read more naturally. So writing is work. But it is fun too: I get to organize what I learned, and somehow writing down my thoughts makes them clearer.
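As promised above, here is a hypothetical sketch of the kind of LaTeX source QuickLaTeX can render inside a post: a numbered, cross-referenced equation plus pseudocode typeset with algorithmicx. The equation and algorithm are made up for illustration (this is not the source of the image above), and the preamble is included only so the snippet compiles on its own; inside a post you would embed just the environments.

```latex
\documentclass{article}
\usepackage{amsmath}        % numbered equations and \eqref cross-references
\usepackage{algpseudocode}  % algorithmicx layout for pseudocode
\begin{document}

% A numbered equation with a label we can cross-reference later.
\begin{equation}
\label{eq:objective}
\mathcal{L}(\theta) = \sum_{i=1}^{n} \ell\bigl(y_i, \hat{y}_i\bigr) + \Omega(\theta)
\end{equation}

% Pseudocode typeset with algorithmicx; [1] numbers every line.
\begin{algorithmic}[1]
\State Initialize $\theta \gets \theta_0$
\For{$t = 1, \dots, T$}
    \State $\theta \gets \theta - \eta \nabla_\theta \mathcal{L}(\theta)$
        \Comment{gradient step on Eq.~\eqref{eq:objective}}
\EndFor
\State \Return $\theta$
\end{algorithmic}

\end{document}
```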

2019 Summer: GSOC!

This summer, I participated in Google Summer of Code (GSOC) 2019 as a mentor. GSOC is an awesome program in which Google pays students a stipend to work on open source projects under the supervision of mentors.

How did I get involved? On March 8, Toby Hocking1 floated the idea of co-mentoring a student to work on XGBoost. Eager to get anyone to contribute to XGBoost, I took him up on the offer.

What kind of work did we do? From May 6 to September 3, I mentored Avinash Barnwal2 to add a new objective function called Accelerated Failure Time (AFT). A well-known model in statistics, AFT is a popular choice for survival analysis, i.e. modeling the time until an event occurs. See Avinash’s post for a summary.
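For a rough flavor of the model (my own one-line summary, not a quote from Avinash’s post): AFT assumes the logarithm of the survival time equals a predictor plus a noise term with a known distribution and scale; in the boosted-tree setting, the linear predictor is replaced by the output of the tree ensemble. In LaTeX form:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Classical AFT: log survival time is linear in the features plus noise,
% where Z follows a known distribution (normal, logistic, extreme-value)
% and sigma is a scale parameter.
\begin{equation}
\ln Y = \mathbf{x}^\top \boldsymbol{\beta} + \sigma Z
\end{equation}

% Tree-boosted variant: the linear term is replaced by the output of the
% tree ensemble $\mathcal{T}(\mathbf{x})$.
\begin{equation}
\ln Y = \mathcal{T}(\mathbf{x}) + \sigma Z
\end{equation}

\end{document}
```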

Show me the code! See https://github.com/dmlc/xgboost/pull/4763
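If you just want to see what using the objective looks like (as opposed to the implementation in the PR), here is a minimal Python sketch. It assumes the parameter names from released versions of XGBoost (objective='survival:aft', aft_loss_distribution, aft_loss_distribution_scale, and interval labels via label_lower_bound / label_upper_bound); check your version’s documentation, since the exact names may differ from the PR.

```python
import numpy as np
import xgboost as xgb

# Toy data: 100 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Survival labels are intervals [lower, upper]:
#   uncensored observation  -> lower == upper (exact event time)
#   right-censored           -> upper == +inf (event not yet observed)
y_lower = rng.uniform(1.0, 10.0, size=100)
y_upper = np.where(rng.random(100) < 0.2, np.inf, y_lower)

dtrain = xgb.DMatrix(X)
dtrain.set_float_info("label_lower_bound", y_lower)
dtrain.set_float_info("label_upper_bound", y_upper)

params = {
    "objective": "survival:aft",         # AFT objective
    "eval_metric": "aft-nloglik",        # negative log likelihood of the AFT model
    "aft_loss_distribution": "normal",   # distribution of the noise term Z
    "aft_loss_distribution_scale": 1.2,  # scale parameter sigma
    "tree_method": "hist",
    "learning_rate": 0.05,
    "max_depth": 3,
}

bst = xgb.train(params, dtrain, num_boost_round=50, evals=[(dtrain, "train")])
print(bst.predict(dtrain)[:5])           # predicted survival times
```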

Personal takeaway. Before GSOC, I had no idea what survival analysis was, let alone AFT. Thankfully, Avinash gave me a bunch of papers to get me up to speed, and I got to ask him many questions about survival analysis. I intend to write a series of posts summarizing what I learned this summer.

About this blog

I extended XGBoost as part of my master’s thesis, and while writing the thesis I had to do a literature review of XGBoost and gradient boosting in general. XGBoost is a popular package (17K+ GitHub stars) used in many winning solutions in Kaggle competitions, so I was surprised to learn how little material there is about XGBoost internals. For example, XGBoost makes a crucial adaptation to the well-known Gradient Boosting algorithm to speed up training1, but to my knowledge there is no accessible explanation of this adaptation. The KDD paper on XGBoost2 is rather light on algorithm description as well. I have seen many users ask about XGBoost internals (see this and this). So far, I have shared my master’s thesis with users, and many found it useful. However, there are still many XGBoost internals I have yet to write about.

Hence this blog. It is meant to be an accessible and comprehensive collection of everything XGBoost. It will also serve as study notes for myself, as I review and study various concepts.