These new language systems learn by analyzing millions of sentences written by humans. A system built by OpenAI, a lab based in San Francisco, analyzed thousands of self-published books, including romance novels, science fiction and more. Google’s Bert analyzed these same books plus the length and breadth of Wikipedia.
Each system learned a particular skill by analyzing all that text. OpenAI’s technology learned to guess the next word in a sentence. Bert learned to guess missing words anywhere in a sentence. But in mastering these specific tasks, they also learned about how language is pieced together.
If Bert can guess the missing words in millions of sentences (such as “the man walked into a store and bought a ____ of milk”), it can also understand many of the fundamental relationships between words in the English language, said Jacob Devlin, the Google researcher who oversaw the creation of Bert. (Bert is short for Bidirectional Encoder Representations from Transformers.)
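The difference between the two training tasks can be sketched in a few lines of Python. This is a simplified illustration, not OpenAI’s or Google’s code. It splits text on spaces, whereas the real systems break words into smaller pieces. The function names are made up for illustration, and “gallon” is a hypothetical filler for the blank in the example sentence. Bert hides roughly 15 percent of words at random during training; this sketch hides one word chosen by hand.

```python
def next_word_examples(tokens):
    # Next-word objective (hypothetical helper): each training example pairs
    # the words *before* a position with the word at that position.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, positions, mask_token="[MASK]"):
    # Masked-word objective (hypothetical helper): hide the chosen words and
    # keep everything else, so context on BOTH sides of each gap stays visible.
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = tokens[p]
        masked[p] = mask_token
    return masked, targets


# "gallon" is an illustrative answer for the article's blank, not from the source.
sentence = "the man walked into a store and bought a gallon of milk".split()
gap = sentence.index("gallon")

# Next-word view: only the left-hand words are available as clues.
context, target = next_word_examples(sentence)[gap - 1]
print(" ".join(context), "->", target)
# the man walked into a store and bought a -> gallon

# Masked view: the words after the gap ("of milk") remain available as clues.
masked, targets = masked_example(sentence, [gap])
print(" ".join(masked), "->", targets)
# the man walked into a store and bought a [MASK] of milk -> {9: 'gallon'}
```

The next-word example never sees “of milk,” the clue that most strongly suggests a quantity word belongs in the blank. The masked example keeps it. That two-sided view of a sentence is what the “Bidirectional” in Bert’s name refers to.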
The system can apply this knowledge to other tasks. If researchers provide Bert with a bunch of questions and their answers, it learns to answer other questions on its own. If they feed it news headlines that describe the same event, it learns to recognize when two sentences are similar. Ordinarily, machines can recognize only an exact match.
Bert can handle the “common sense” test from the Allen Institute. It can also handle a reading comprehension test in which it answers questions about encyclopedia articles. What is oxygen? What is precipitation? In another test, it judges the sentiment of a movie review: Is the review positive or negative?
This kind of technology is “a step toward a lot of still-faraway goals in A.I., like technologies that can summarize and synthesize big, messy collections of information to help people make important decisions,” said Sam Bowman, a professor at New York University who specializes in natural language research.
In the weeks after the release of OpenAI’s system, outside researchers applied it to conversation. An independent group of researchers used OpenAI’s technology to create a system that leads a competition, organized by several top labs including the Facebook AI Lab, to build the best chatbot. And this month, Google “open sourced” its Bert system, so others can apply it to additional tasks. Mr. Devlin and his colleagues have already trained it on 102 languages.