Computer program deciphers a dead language that mystified linguists

The lost language of Ugaritic was last spoken 3,500 years ago. It survives on just a few tablets, and linguists could only translate it with years of hard work and plenty of luck. A computer deciphered it in hours.

The computer program relies on a few basic assumptions in order to make intuitive guesses about the language's structure. Most importantly, the lost language has to be closely related to a known, deciphered language, which in the case of Ugaritic is Hebrew. Second, the alphabets of the two languages need to share some consistent correlations between the individual letters or symbols. There should also be recognizable cognates of words between the two languages, and words that have prefixes or suffixes in one language (like verbs that end in "-ing" or "-ed" in English) should show the same features in the other language.

That might seem like a lot of information for the program to require, but even all that is no guarantee of decipherment. After Ugaritic was first discovered in 1929, it remained untranslatable for years. It finally revealed some of its secrets to German cryptographer Hans Bauer, who was only able to make substantial headway when he guessed that the drawing of an ax was next to the Ugaritic word for "ax." Even this breakthrough wasn't a complete success: although Bauer's guess was correct, he matched the wrong sounds and letters together, resulting in a mistranslation.

So the question for the computer program wasn't just how quickly it could translate Ugaritic compared to its human counterparts, but also whether it could avoid the mistakes and pitfalls that had slowed down the initial decipherment. The program worked by looking for correlations and correspondences at the various levels of language described above: individual sounds and letters, different segments of the word, and cognates between languages. It then mapped the similarities between Hebrew and Ugaritic, starting with the sounds and then bringing in the other aspects to figure out the most probable matches. By cross-referencing these different parts of language and repeating the process hundreds of thousands of times, the program arrived at a fully deciphered Ugaritic.
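To make the cross-referencing idea concrete, here is a toy sketch in Python. It is not the researchers' method: the article describes a probabilistic search repeated hundreds of thousands of times, while this sketch swaps in plain constraint propagation on an invented example. The function names (`pattern`, `consistent`, `decipher`), the five-word English "lexicon" standing in for Hebrew, and the capital-letter "script" standing in for Ugaritic are all assumptions made up for illustration. The idea it shows is the same, though: a word-level match pins down some letters, and those letters in turn narrow down the remaining word matches.

```python
def pattern(word):
    """Letter-repetition pattern, e.g. 'door' -> (0, 1, 1, 2)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)


def consistent(cipher_word, plain_word, mapping):
    """True if pairing the two words does not contradict the letter mapping."""
    used = {p: c for c, p in mapping.items()}  # reverse lookup: known -> unknown
    for c, p in zip(cipher_word, plain_word):
        if mapping.get(c, p) != p or used.get(p, c) != c:
            return False
    return True


def decipher(cipher_words, lexicon):
    """Alternate between word-level and letter-level evidence until stable."""
    # Word level: a candidate cognate must share the repetition pattern.
    candidates = {
        w: [k for k in lexicon if pattern(k) == pattern(w)]
        for w in cipher_words
    }
    mapping = {}  # unknown letter -> known letter
    changed = True
    while changed:
        changed = False
        for w, cands in candidates.items():
            # Letter level: drop candidates that contradict what we know.
            kept = [k for k in cands if consistent(w, k, mapping)]
            if kept != cands:
                candidates[w] = kept
                changed = True
            # A word with one remaining candidate fixes its letters.
            if len(kept) == 1:
                for c, p in zip(w, kept[0]):
                    if c not in mapping:
                        mapping[c] = p
                        changed = True
    return mapping, candidates


# Hypothetical data: English words play the role of the known language,
# capital letters the role of the undeciphered script.
lexicon = ["king", "ring", "gold", "door", "road"]
cipher_words = ["ABCD", "EBCD", "HFFE", "EFIH", "DFGH"]

mapping, candidates = decipher(cipher_words, lexicon)
decoded = "".join(mapping[c] for c in "ABCD")
print(decoded)  # king
```

In this toy run, only "HFFE" has a doubled letter, so it can only be "door". That single match fixes three letters, which is enough to resolve "road" and "gold", and those in turn separate "king" from "ring". A real decipherment has no such clean constraints, which is why a probabilistic model that weighs many imperfect matches is needed instead.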

The results were stunning. Of the thirty letters in the Ugaritic alphabet, the computer correctly identified twenty-nine. Of the roughly one third of all Ugaritic words that share Hebrew cognates, the program figured out sixty percent, and many of the errors were only off by a letter or two. These results are particularly encouraging because the program still doesn't use any contextual clues, meaning it can't differentiate between the different uses of a Ugaritic word that means both "daughter" and "house", something that is (thankfully) pretty easy to identify in context. The program also wasn't able to use the "ax" coincidence that had made the human decipherment of the language possible. Best of all, the program did all this in only a few hours.
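One common way to put a number on "off by a letter or two" is edit distance: the fewest single-letter insertions, deletions, or substitutions needed to turn one word into another. The sketch below is only an illustration of that measure; the article does not say how the researchers scored near misses, and the example words are ordinary English, not Ugaritic or Hebrew.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Illustrative English words only.
print(edit_distance("house", "horse"))     # 1
print(edit_distance("kitten", "sitting"))  # 3

# The letter score reported in the article: 29 of 30 letters correct.
letter_accuracy = 29 / 30
print(round(letter_accuracy, 3))  # 0.967
```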

Ugaritic itself is a fascinating language. Spoken 3,500 years ago in the city of Ugarit, located in modern Syria, the language is a Semitic relative of Hebrew, although its alphabet closely resembles the cuneiform used in ancient Sumer. The surviving Ugaritic texts tell the stories of a Canaanite religion that is similar but not identical to that recorded in the Old Testament, providing Bible scholars a unique opportunity to examine how the Bible and ancient Israelite culture developed in relation to their neighbors.

[Original paper]