That's what she said: software that tells dirty jokes

Double entendres have been making us laugh since the days of Chaucer and Shakespeare, but up until now computers weren't in on the joke.

Chloé Kiddon and Yuriy Brun, two computer scientists at the University of Washington, have developed a system for recognising a particular type of double entendre - the "that's what she said" joke, in which seemingly innocent sentences can be transformed into lewd utterances by appending just four short words.

The pair describe the "TWSS problem" as recognising when it is funny to follow a sentence with "that's what she said", and give "Don't you think these buns are a little too big for this meat?" as one example. The UK equivalent, used in the same way, is to follow a sentence with "as the actress said to the bishop".

Automating this process means identifying sentences that contain potential euphemisms and follow a particular structure - a "hard natural language understanding problem", say the researchers. Kiddon and Brun began by analysing two different bodies of text - one containing 1.5 million erotic sentences, and another with 57,000 from standard literature.

They then scored nouns, adjectives and verbs with a "sexiness" function to help determine whether a sentence is a potential TWSS. Nouns with a high sexiness score include "rod" and "meat", while raunchy adjectives include "hot" and "wet".
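
The article does not spell out how the sexiness function is computed, so the following is only a minimal sketch of one plausible approach: score a word by how much more often it appears in an erotic corpus than in a standard one. The function names, the smoothing constant and the toy sentences are all assumptions made for illustration, not the researchers' actual definition or data.

```python
from collections import Counter


def relative_frequencies(sentences):
    """Map each lowercased word to its share of all tokens in the corpus."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def make_sexiness(erotic_sentences, standard_sentences, smoothing=1e-6):
    """Build a scoring function from two corpora (hypothetical helper).

    The score is the ratio of a word's relative frequency in the erotic
    corpus to its relative frequency in the standard corpus. The smoothing
    term is an assumption that avoids division by zero for unseen words.
    """
    erotic = relative_frequencies(erotic_sentences)
    standard = relative_frequencies(standard_sentences)

    def sexiness(word):
        w = word.lower()
        return (erotic.get(w, 0.0) + smoothing) / (standard.get(w, 0.0) + smoothing)

    return sexiness


# Toy corpora, invented for this example only.
erotic_toy = ["the meat was hot", "the rod was wet"]
standard_toy = ["the table was old", "the meat was cold", "the book was long"]

sexiness = make_sexiness(erotic_toy, standard_toy)
for word in ["rod", "meat", "the", "table"]:
    print(word, sexiness(word))
```

A ratio near 1 marks a word that is equally common in both corpora, such as "the" here, while a word seen only in the erotic corpus scores far higher.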

Their automated system, known as Double Entendre via Noun Transfer or DEviaNT, rates sentences for their TWSS potential by looking for particular elements such as nouns that can be interpreted in multiple ways. The researchers trained DEviaNT by gathering jokes from twssstories.com and non-TWSS text from sites such as wikiquote.org.

The system turned out to be around 70% accurate, but the pair say this is deceptively low because much of the training data did not consist of TWSS jokes, and with a more even data set it could achieve 99.5% precision.
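
To see why an uneven data set can make a classifier look worse than it is, consider how precision depends on the share of genuine jokes in the data. The sketch below uses the standard definition of precision, true positives divided by all positive predictions. The recall and false-positive rates are hypothetical values chosen for illustration, not figures from the researchers.

```python
def precision(recall, false_positive_rate, positive_share):
    """Precision for a classifier with fixed per-class error rates.

    recall: fraction of real TWSS sentences that get flagged
    false_positive_rate: fraction of ordinary sentences wrongly flagged
    positive_share: fraction of the data set that really is TWSS
    """
    true_positives = recall * positive_share
    false_positives = false_positive_rate * (1.0 - positive_share)
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged > 0 else 0.0


# Hypothetical rates, chosen only to show the effect of class balance.
RECALL = 0.5
FALSE_POSITIVE_RATE = 0.01

for share in (0.5, 0.1, 0.01):
    print(f"TWSS share {share:.0%}: precision {precision(RECALL, FALSE_POSITIVE_RATE, share):.3f}")
```

With the same per-class error rates, precision falls as genuine jokes become rarer, because the few false alarms among the many ordinary sentences start to outnumber the correct detections.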

The results will be presented at the Annual Meeting of the Association for Computational Linguistics in June. Future work could extend DEviaNT to identify other kinds of jokes, say the researchers, who write: "The technique of metaphorical mapping may be generalised to identify other types of double entendres and other forms of humor".

That's what she said.

(Image: Rachael Voorhees/Flickr)

This post originally appeared on New Scientist.