I remember the first time I saw a deep learning text generation project that was truly compelling and delightful to me. It was in 2016, when Andy Herd generated new Friends scenes by training a recurrent neural network on all the show's episodes. Herd's work went pretty viral at the time.

At the time I dabbled a bit with Andrej Karpathy's tutorials for character-level RNNs; his work and tutorials undergird a lot of the STUNT TEXT GENERATION work we see in the world. Python is not my strongest language, though, and I never had a real motivation to understand the math of what was going on. I watched masters like Janelle Shane instead.

TensorFlow for R has changed that for me. Not only is the R interface that RStudio has developed just beautiful, but these fun text generation projects now provide a way into understanding how these neural network models work at all, and how they deal with text in particular. Let's step through how to take the text of Pride and Prejudice and generate ✨ new ✨ Jane-Austen-esque text.

This code borrows heavily from a couple of excellent sources:

- Jonathan Nolis' project on offensive license plates (that link is for their code; you can read a great narrative explanation as well)
- RStudio's example code for text generation

Before starting, you will need to install keras, so be sure to check out the details on installation.

## Tokenize

We are going to train a character-level language model, which means the model will take a single character and then predict what character should come next, based on the ones that have come before. First step? We need to take Pride and Prejudice and divide it up into individual characters.

The code below keeps both capital and lowercase letters and builds a model that learns when to use which one. This is computationally more intensive than training a model that only learns about the letters themselves in lowercase; if you want to start off with that kind of model, change to the default behavior of tokenize_characters(), lowercase = TRUE.

```r
library(keras)
library(tidyverse)
library(janeaustenr)
library(tokenizers)

max_length <- 40

text <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  pull(text) %>%
  str_c(collapse = " ") %>%
  tokenize_characters(lowercase = FALSE, strip_non_alphanum = FALSE, simplify = TRUE)

print(sprintf("Corpus length: %d", length(text)))
```

```
## [1] "Corpus length: 684767"
```

```r
chars <- text %>%
  unique() %>%
  sort()

print(sprintf("Total characters: %d", length(chars)))
```

```
## [1] "Total characters: 74"
```

A good start!

## CHOP CHOP CHOP

Next we want to cut the whole text into pieces: sequences of max_length characters. These will be the chunks of text that we use for training.

```r
dataset <- map(
  seq(1, length(text) - max_length - 1, by = 3),
  ~list(sentence = text[.x:(.x + max_length - 1)],
        next_char = text[.x + max_length])
)
```
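If you want to sanity-check what these chunks look like, you can collapse the first one back into a string. This quick peek is just for illustration; it assumes the dataset structure built above, where each element holds a sentence of max_length characters plus the single next_char the model should learn to predict.

```r
# Collapse the first training sequence (max_length characters) back
# into readable text
str_c(dataset[[1]]$sentence, collapse = "")

# ...and the single character the model should predict comes next
dataset[[1]]$next_char
```

Notice that seq() steps through the text three characters at a time, so neighboring chunks overlap heavily; a smaller step would give more (and more redundant) training examples, while a larger step gives fewer.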