Computers just got a lot better at mimicking human language. Researchers created computer programs that can write long passages of coherent, original text.

Language models like GPT-2, Grover, and CTRL create text passages that read as if written by someone fluent in the language, but not in the truth. The field behind them, Natural Language Processing (NLP), didn’t exactly set out to create a fake news machine. Rather, these text generators are a byproduct of a line of research into massive pretrained language models: machine learning programs that store vast statistical maps of how we use our language. So far, the technology’s creative uses seem to outnumber its malicious ones. But it’s not difficult to imagine how these text-fakes could cause harm, especially as the models become widely shared and deployable by anyone with basic know-how.

Read more here: 

Machine Learning with Phil ponders the question: “is it better to specialize or generalize in artificial intelligence and deep learning?”

The answer depends on your career aspirations. Do you want to be a deep learning research professor?

Do you want to go to work for Google, Facebook, or other global mega corporations?

Or do you want to be your own unicorn startup founder?

Each path has its own specialization requirements, which Phil breaks down in this video.

Machine Learning with Phil shows you how to do sentiment analysis with TensorFlow 2 in this natural language processing (NLP) tutorial.

This natural language processing model is relatively straightforward: an encoder coupled to some bidirectional layers and a couple of dense layers that handle the classification. We’ll compare two different models, one with a single LSTM layer and the other with two LSTM layers and some dropout.
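
As a rough illustration of the two architectures described above, here is a minimal Keras sketch. It is not the code from the video: the function names, vocabulary size, layer widths, and dropout rate are all assumptions, and a plain Embedding layer stands in for whatever text encoder the tutorial uses.

```python
import tensorflow as tf

# Assumed values, chosen only for illustration.
VOCAB_SIZE = 8192
EMBED_DIM = 64


def build_single_lstm(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM):
    """One bidirectional LSTM layer followed by dense classification layers."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # a single logit: positive vs. negative
    ])


def build_stacked_lstm(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM):
    """Two bidirectional LSTM layers plus dropout before the output layer."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        # return_sequences=True passes the full sequence to the second LSTM.
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1),
    ])
```

Either model can then be compiled with a binary cross-entropy loss (with from_logits=True, since the last layer outputs a raw logit) and trained on padded batches of integer token IDs.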

Here’s a talk by Danny Luo: “Pre-training of Deep Bidirectional Transformers for Language Understanding.”

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Toronto Deep Learning Series, 6 November 2018


Natural language processing (NLP) powered by deep learning is about to change the game for many organizations interested in AI, thanks in particular to BERT (Bidirectional Encoder Representations from Transformers).

Watch this webinar if you want to learn how BERT will power a new wave of language-based applications, from sentiment analysis to automatic text summarization to similarity assessment and more.

Microsoft Research features a talk by Wei Wen on Efficient and Scalable Deep Learning (slides)

In deep learning, researchers keep gaining higher performance by using larger models. However, there are two obstacles blocking the community to build larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference of larger models is also slow, which disables their deployment to computation constrained applications. In this talk, I will introduce some of our efforts to remove those obstacles. On the training side, we propose TernGrad to reduce communication bottleneck to scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks to remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work to utilize my research to overcome scaling issues in Natural Language Processing.
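
To give a feel for how TernGrad can cut communication, here is a minimal pure-Python sketch of gradient ternarization as it is commonly described: each gradient component is stochastically mapped to -s, 0, or +s, where s is the largest absolute component, so a worker only needs to send one scalar plus a three-valued code per component. The function name is hypothetical, and this is a single-vector illustration under those assumptions, not the distributed implementation from the talk.

```python
import random


def ternarize(grad, rng=random):
    """Stochastically map each gradient component to -s, 0, or +s.

    s is the largest absolute value in grad. A component g keeps its sign
    (scaled to s) with probability |g| / s and becomes 0 otherwise, so the
    expected value of each output component equals the original g.
    """
    s = max(abs(g) for g in grad)
    if s == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        keep = rng.random() < abs(g) / s  # Bernoulli(|g| / s)
        sign = 1.0 if g > 0 else -1.0
        out.append(sign * s if keep else 0.0)
    return out
```

Averaging many ternarized copies of the same gradient recovers the original values, which is why training can still make progress even though each individual message is very coarse.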

See more on this talk at Microsoft Research: