Approaching LLMs, softly-softly

This last week at work, in a personal first, I used the ChatGPT API in earnest to prototype something. I am working with Hidden Door on generative narrative experiences. This is basically the perfect application for large language models, applied judiciously, as a piece of the puzzle, and by people who care about safety and control (I use “safety” in the sense of “safe to put in front of people”, not in the sense of existential risk).

I’ve maintained a thread of interest in language models since reading The Unreasonable Effectiveness of Recurrent Neural Networks way back in 2015, several centuries of language model development ago. I remember watching my tiny character-level RNN generate Shakespeare (then, when someone goaded me at a local Python meetup, Trump speeches) one character at a time. It felt kind of magical. In 2018, I joined Fast Forward Labs, right around when BERT was released. We did a whole bunch of experimentation with it: benchmarking on common NLP tasks and building toy apps. Just last year, I made THE EMOTINOMICON using the Cohere API.

Yet, when speaking to a colleague a few days ago, I mentioned I hadn’t really used LLMs. I guess what I meant to say was I haven’t used this very latest generation of LLMs much (which is now also not true). Recent developments — ChatGPT & co. — with their heavy emphasis on prompt engineering, feel like a qualitatively different thing.

Given this sustained, if sparse, interest, it’s maybe a bit surprising that I’ve been cautious about current-generation LLMs. I should be all over this hype wave, if only as a quiet voice of moderation, but I haven’t been. It’s not because I doubt the technology is useful when thoughtfully applied.

I’ve kept my professional distance from LLMs because, well, it seems like lots of people are yelling about them. Not angry yelling, just yelling. (I mean, there’s some angry yelling.) I think I have ChatGPT screenshot fatigue. It’s easy, fast, and cheap engagement bait to post a screenshot and an incredulous reaction. Writing about GPT has the same sort of grift density as Web3. Here’s hoping I can lower that density by one microgrift-per-unit-blog here.

Most of the successes and failures seem, if not obvious, certainly explicable, when we acknowledge that what is happening is (really, really good) prompt completion. It is surprising that prompt completion works so well, but prompt completion is what is happening, mechanistically. A bunch of matrices are multiplied, etc., and we repeatedly sample the next token from a distribution conditioned on the tokens so far.
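To make "repeatedly sample from a conditional distribution" concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: TOY_LOGITS is a hypothetical lookup table standing in for the model, and it conditions only on the previous token, where a real LLM conditions on the whole context. The loop is the part that carries over: score the candidates, turn scores into probabilities, sample, append, repeat.

```python
import math
import random

# Hypothetical toy "model": maps the previous token to unnormalized scores
# (logits) over possible next tokens. A real LLM computes these scores with a
# neural network conditioned on the entire context, not a lookup table.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.2},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "</s>": 0.5},
    "dog": {"sat": 1.0, "</s>": 1.0},
    "sat": {"</s>": 3.0},
}


def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution over tokens."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def complete(prompt, max_tokens=10, temperature=1.0, rng=None):
    """Extend a list of tokens by sampling one token at a time."""
    rng = rng or random.Random()
    tokens = list(prompt)
    for _ in range(max_tokens):
        # Conditional distribution over the next token, given the context.
        probs = softmax(TOY_LOGITS[tokens[-1]], temperature)
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "</s>":  # end-of-sequence marker
            break
        tokens.append(next_token)
    return tokens


print(" ".join(complete(["<s>"])))
```

Run it a few times and the output changes, which is the nondeterminism everything downstream has to be built around. Lowering the temperature concentrates probability on the highest-scoring token; raising it flattens the distribution.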

GPT is just a sequential model of language (yeah, yeah, I’m ignoring the multi-modal abilities of GPT-4. Spirit, not letter.), but it turns out that a sufficiently good sequential model of language is enough to kind of fake “reasoning”, at least about things that can be interpolated from the content of the internet. We can build useful stuff with that level of reasoning. I like Simon Willison’s analogy: GPT as a (nondeterministic) calculator for words. I also like sci-fi author Ted Chiang’s: ChatGPT is a Blurry JPEG of the Web. Neither is precise, but that’s the point of an analogy, and both are more useful mental models than GPT-as-oracle.

I’m finally allowing myself to get a little bit excited, and explore applications beyond text generation. We can build around the fact that LLM output is probabilistic text. We can apply parsers for structured output, allow for regenerating responses, and provide sensible fallbacks. We can pipe into an embedding model and place ourselves in the familiar land of similarity search. We can do the engineering around language models to make them genuinely useful, and it isn’t even very hard engineering.
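As a rough sketch of the parse, regenerate, and fall back pattern, assuming we have asked the model to reply with a small JSON object: the generate argument below is a hypothetical stand-in for whatever function actually calls a model, and the "mood" field, the retry count, and the "neutral" default are all invented for the example.

```python
import json


def parse_mood(raw):
    """Return the 'mood' string from a JSON reply, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("mood"), str):
        return data["mood"]
    return None


def mood_with_fallback(generate, prompt, max_attempts=3, fallback="neutral"):
    """Ask for a response, regenerate if it fails to parse, then fall back.

    `generate` is any callable taking a prompt string and returning text.
    It is a placeholder for a real model call, not a real API.
    """
    for _ in range(max_attempts):
        mood = parse_mood(generate(prompt))
        if mood is not None:
            return mood
    return fallback


def make_scripted_model(responses):
    """Fake model for demonstration: replays canned responses in order."""
    remaining = iter(responses)
    return lambda prompt: next(remaining)


# The first reply is chatty prose, the second is the JSON we asked for.
flaky = make_scripted_model(["Sure! The mood is spooky.", '{"mood": "spooky"}'])
print(mood_with_fallback(flaky, "Describe the mood of this scene as JSON."))
```

The calling code never sees a malformed response: it gets either a validated value or a known default. Which default is sensible, and how many regenerations are worth paying for, depends on the application.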

The ability of LLMs to make interpolative queries of essentially unbounded amounts of text seems to me to represent a novel computational capability. A capability in the same sense that structured query languages, and concurrency primitives, and, I dunno, the DOM, each give us some capabilities. LLMs will not replace all of software, though as they find their way into the software development process itself, I expect an exponentially increasing amount of software will have involved an LLM in its construction. One must hope that they are employed intelligently, as an aid, and not carelessly, as a replacement for thinking. They are not fully capable general-purpose reasoners, and they are not sentient. They are a usefully different sort of capability than what brains have, and what previous software systems have had.

I’m looking forward to the breathlessness waning, and exploring how we integrate these things into useful tools for humans to use.