Humans master millions of words, but how can we manipulate large amounts of text computationally, using programming techniques?