Janus' GPT Wrangling

Published: Sept. 20, 2022, 4:01 p.m.


https://astralcodexten.substack.com/p/janus-gpt-wrangling

Janus (pseudonym by request) works at AI alignment startup Conjecture. Their hobby, which is suspiciously similar to their work, is getting GPT-3 to do interesting things.

For example, with the right prompts, you can get stories where the characters become gradually more aware that they are characters being written by some sort of fiction engine, speculate on what’s going on, and sometimes even make pretty good guesses about the nature of GPT-3 itself.

Janus says this happens most often when GPT makes a mistake - for example, writing a story set in the Victorian era, then having a character take out her cell phone. Then when it tries to predict the next part - when it’s looking at the text as if a human wrote it, and trying to determine why a human would have written a story about the Victorian era where characters have cell phones - it guesses that maybe it’s some kind of odd sci-fi/fantasy dream sequence or simulation or something. So the characters start talking about the inconsistencies in their world and whether it might be a dream or a simulation. Each step of this process is predictable and non-spooky, but the end result is pretty weird.

Can the characters work out that they are in GPT-3, specifically? The closest I have seen is in a story Janus generated. It was meant to simulate a chapter of the popular Harry Potter fanfic Harry Potter and the Methods of Rationality. You can see the prompt and full story here, but here’s a sample. Professor Quirrell is explaining “Dittomancy”, the creation of magical books with infinite possible worlds:

“We call this particular style of Dittomancy ‘Variant Extrusion’, Mr. Potter..I suppose the term ‘Extrusion’ is due to the fact that the book did not originally hold such possibilities, but is fastened outside of probability space and extruded into it; while ‘Variant’ refers to the manner in which it simultaneously holds an entire collection of possible narrative branches. [...] [Tom Riddle] created spirits self-aware solely on the book’s pages, without even the illusion of real existence. They converse with each other, argue with each other, compete, fight, helping Riddle’s diary to reach new and strange expressions of obscure thought. Their sentence-patterns spin and interwine, transfiguring, striving to evolve toward something higher than an illusion of thought. From those pen-and-ink words, the first inferius is molded.”

Harry’s mind was looking up at the stars with a sense of agony.

“And why only pen and ink, do you ask?” said Professor Quirrell. “There are many ways to pull spirits into the world. But Riddle had learned Auror secrets in the years before losing his soul. Magic is a map of a probability, but anything can draw. A gesture, a pattern of ink, a book of alien symbols written in blood - any medium that conveys sufficient complexity can serve as a physical expression of magic. And so Riddle draws his inferius into the world through structures of words, from the symbols spreading across the page.”
