Probabilistic language models are once again foundational to many advances in natural language processing research, bringing the exciting opportunity to harness raw text to build language technologies. With the emergence of deep architectures and protocols for finetuning a pretrained language model, many NLP solutions are being cast as simple variations on language modeling. This talk is about challenges in language model-based NLP and some of our work toward solutions. First, we'll consider evaluation of generated language. I'll present some alarming findings about humans and models and make some recommendations. Second, I'll turn to a ubiquitous design limitation in language modeling – the vocabulary – and present a linguistically principled, sample-efficient solution that enables modifying the vocabulary during finetuning and/or deployment. Finally, I'll delve into today's most popular language modeling architecture, the transformer, and show how its attention layers' quadratic runtime can be made linear without affecting accuracy. Taken together, we hope, these advances will broaden the population of people who can effectively use and contribute back to NLP.