This post is the first in a series of articles about natural language processing (NLP), a subfield of machine learning concerned with the interaction between computers and human language. This article focuses on attention, a mechanism that forms the backbone of many state-of-the-art language models, including Google’s BERT (Devlin et al., 2018) and OpenAI’s GPT-2 (Radford et al., 2019).