Recent Progress in Natural Language Machine Learning and Processing

In October 2019, Google began applying a new Transformer-based machine learning technique for natural language processing pre-training to its search results, at first only for English-language queries. The technique is called Bidirectional Encoder Representations from Transformers (BERT), and it appears to have a significant effect on how Google search queries are (and will be) processed.

In terms of language machine learning, BERT has introduced evolutionary changes in how machines can understand and interpret language, even without complete, meaningful context. BERT is unique in that it implements bidirectional, unsupervised language representation. This represents a major step in language machine learning, with the potential to create applications, programs, and even machines that “understand” exactly what you are saying, no matter how you may choose to say it.

It is this bidirectional aspect that makes BERT unique: in the past, similar algorithms were only capable of using the preceding words or the following words, each independently of the other, to determine the context and potential responses to a query. While BERT builds upon and incorporates previous models, it is the first to use the bidirectional approach.

Previous models were limited to examining sequences of text either from left to right or from right to left in order to determine the context of the words being used and to produce meaningful responses. Some have argued that BERT is, in reality, non-directional, though the distinction hardly seems more than a matter of semantics.
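
To make the bidirectional idea concrete, the following sketch uses the open source Hugging Face “transformers” library and the publicly released bert-base-uncased model (an illustrative choice, not mentioned above) to show BERT’s masked-word prediction, where the words on both sides of the blank inform the guess.

```python
# Minimal sketch: BERT fills in a masked word using context from BOTH sides.
# Assumes the Hugging Face "transformers" library and the public
# "bert-base-uncased" checkpoint (an illustrative choice, not from the article).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The words before the blank ("She applied") and after it ("to her face")
# are both visible to the model at the same time.
for prediction in fill_mask("She applied [MASK] to her face before the show."):
    print(prediction["token_str"], round(prediction["score"], 3))
```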

Despite BERT, Human Translators Will Still Be Needed

It is true that BERT is capable of developing an inherent and deep “understanding” of language and its usage. The implications thus extend far beyond web searches. The same approach may be used to assist in the evaluation of websites and, ultimately, to change how language machine learning is used to judge the merits of the content on a page and its relevance in much more detail (see the last section).

BERT may also bring about radical changes in professional translation services, given its ability to recognize figures of speech and other subtle, nuanced linguistic characteristics that have previously been recognizable only to human translators and interpreters. Thus we can expect machine translation to come closer to rivaling human translation services. However, in order to avoid potentially embarrassing translation mistakes, qualified human translators will probably still be needed for a very long time.

Humans generally have a sufficient, inherent understanding not only of the principles of grammar but also of words in the context of the conversation at hand. Machine translation services have generally lacked this capacity, at least before the days of BERT. To demonstrate this, consider a single example of polysemy (the coexistence of many possible meanings for a word or phrase):

“Makeup.” How do you translate or interpret that word without any context? It could mean the makeup that someone uses to decorate or otherwise enhance their face and facial features. Makeup may refer to the physical composition, or makeup, of an item. Makeup may even be the happy ending that follows an unsuccessful breakup.

While this is one of those cases that makes the work of a live interpreter exceptionally challenging, the human mind, by and large, is capable of understanding which sense of the word is in use based on the context of the conversation. A machine still has no real understanding of these words in the human sense.
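
One way to see a BERT-style model distinguishing these senses is to compare the contextual vectors it assigns to the same word in different sentences. The sketch below is illustrative only, assuming PyTorch and the Hugging Face “transformers” library with the bert-base-uncased checkpoint, none of which appear in the original article.

```python
# Illustrative sketch: the vector BERT assigns to "makeup" shifts with context.
# Assumes PyTorch and Hugging Face "transformers" (not named in the article).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Average the hidden states of the word-pieces that spell out `word`."""
    pieces = tokenizer.tokenize(word)
    tokens = tokenizer.tokenize(sentence)
    start = next(i for i in range(len(tokens))
                 if tokens[i:i + len(pieces)] == pieces)
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (num_tokens, 768)
    # +1 skips the [CLS] token that the tokenizer prepends to every sequence
    return hidden[start + 1 : start + 1 + len(pieces)].mean(dim=0)

cosmetic_a = word_vector("she bought new makeup for the show", "makeup")
cosmetic_b = word_vector("her makeup was done before the party", "makeup")
chemical = word_vector("the chemical makeup of the alloy was tested", "makeup")

cos = torch.nn.functional.cosine_similarity
print("same sense:     ", cos(cosmetic_a, cosmetic_b, dim=0).item())
print("different sense:", cos(cosmetic_a, chemical, dim=0).item())
```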

Open Source Communities and Natural Language Processing

One move that was largely unexpected, though perhaps not by everyone, was that all of the source code for BERT was made available to developers as open source. Is Google learning from the success of Linux? Google released the BERT source code in October 2018. Will BERT be improved now that so many developers have access to the source code?

BERT comes in two models: BERTlarge and BERTbase. If there is any real challenge here for the open source community, it is the computing power required for BERTlarge, as opposed to merely experimenting with BERTbase.
BERTlarge uses 345 million parameters, making it, at the time of its release, the largest model of its kind. BERTbase, on the other hand, uses the same architecture but a comparatively modest 110 million parameters. Both models were pre-trained on the same data set of roughly 3.3 billion words.
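
For a rough sense of scale, the two standard configurations can be inspected directly. This sketch assumes the Hugging Face “transformers” library and the publicly hosted bert-base-uncased and bert-large-uncased configurations; exact parameter counts vary slightly depending on which prediction heads are included.

```python
# Sketch: compare the two released BERT configurations by size.
# Assumes Hugging Face "transformers"; builds randomly initialized models from
# the public configs, so no pre-trained weights are downloaded.
from transformers import BertConfig, BertModel

for name in ("bert-base-uncased", "bert-large-uncased"):
    config = BertConfig.from_pretrained(name)
    model = BertModel(config)
    print(f"{name}: {config.num_hidden_layers} layers, "
          f"{config.hidden_size} hidden size, "
          f"{config.num_attention_heads} attention heads, "
          f"~{model.num_parameters() / 1e6:.0f}M parameters")
```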

While computing power may be challenging for some smaller developers, Google should have no such concerns, having developed Tensor Processing Units (TPUs) to handle such tasks. The TPU is a custom integrated circuit designed specifically for machine learning and for Google's open source machine learning framework, TensorFlow.
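
As a rough illustration of how that hardware is used in practice, a TensorFlow job typically attaches to a Cloud TPU before training begins. The specific calls below are a common TensorFlow 2.x pattern and are an assumption, not something described above.

```python
# Rough sketch (TensorFlow 2.x): attach a training job to a Cloud TPU.
# The article names TensorFlow and the TPU; the setup below is a typical
# TF 2.x pattern and is an assumption, not drawn from the article.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()  # locate the TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Any model built here (e.g. a BERT fine-tuning head) is replicated
    # across the TPU cores automatically.
    pass
```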

Implications for Language Machine Learning and AI Research

Beyond improvements to the Google search engine, other exciting implications are appearing in academic and research papers from developers who use the BERT source code in their own language machine learning research.

BERT now allows language machine learning applications to “recognize” figures of speech and other linguistic “anomalies” even with incomplete context:
The English language has roughly 180,000 words in relatively common use and about 50,000 obsolete words. Add industry-specific language, antiquated words, and other figurative and literal usages, and the result is the Oxford English Dictionary (unabridged and unexpurgated edition) with approximately 600,000 listings.

The challenge for language machine learning applications is to put all of those words into context, and to filter out figures of speech, localized vernacular, and other lexical anomalies that may or may not merit a literal translation.
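
BERT sidesteps part of this vocabulary problem with subword tokenization: its WordPiece vocabulary covers English with roughly 30,000 units, so rare or antiquated words are broken into familiar pieces rather than treated as unknown. A minimal sketch, again assuming the Hugging Face “transformers” library:

```python
# Minimal sketch: BERT's WordPiece tokenizer splits rare words into subword
# pieces instead of failing on them. Assumes Hugging Face "transformers".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(len(tokenizer))                       # vocabulary size, roughly 30,000
print(tokenizer.tokenize("a perfunctory bildungsroman"))
# Unfamiliar words come back as '##'-prefixed pieces the model has seen before.
```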

BERT may even be a candidate to finally satisfy the requirements of the Turing test.

Implications for Website Content

Since the early days of personal computers, word processors have had the capacity to recognize basic grammar and language. These days, there are websites, apps, and programs such as Hemingway and Grammarly that help people write more succinctly and clearly, though not always for free.

When the power of BERT is combined with the power of Google's bots and spiders, the content on websites may become even more important than it is now: as computers come to have a better “understanding” of the content on a website, it will be substantially easier to determine which pages are written solely for the search engines. As a consequence, many web content writers may very well be forced to write, once again, for human visitors.

Image Source: Google.com - Official BERT Logo