Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations.

Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I discuss two alternative approaches to constructing unbiased gradient estimates in machine learning problems. The first approach uses randomized truncation of objective functions defined as loops or limits. Such objectives arise in settings ranging from hyperparameter selection, to fitting parameters of differential equations, to variational inference using lower bounds on the log-marginal likelihood.
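To make the first idea concrete, here is a minimal sketch (mine, not the speaker's code) of Russian-roulette truncation: an infinite sum is cut off at a random depth, and each surviving term is divided by its survival probability so the single-sample estimate stays unbiased. The function name and the survival probability `q` are illustrative choices.

```python
import random

def roulette_sum(term, q=0.7, rng=random.Random(0)):
    """Unbiased single-sample estimate of sum_{k>=0} term(k):
    stop after each term with probability 1 - q, and divide term k
    by its survival probability q**k to keep the estimate unbiased."""
    total, k, weight = 0.0, 0, 1.0
    while True:
        total += term(k) / weight   # weight == q**k == P(reaching term k)
        if rng.random() >= q:       # stop with probability 1 - q
            return total
        k += 1
        weight *= q
```

Applied to the terms of a gradient series (e.g., the iterates of an unrolled loop), the same reweighting yields unbiased gradient estimates without ever running the loop to convergence.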

The second approach revisits the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost.

MIT Introduction to Deep Learning 6.S191: Lecture 6 with Ava Soleimany.


Lecture Outline

  • 0:00 – Introduction
  • 0:58 – Course logistics
  • 3:59 – Upcoming guest lectures
  • 5:35 – Deep learning and expressivity of NNs
  • 10:02 – Generalization of deep models
  • 14:14 – Adversarial attacks
  • 17:00 – Limitations summary
  • 18:18 – Structure in deep learning
  • 22:53 – Uncertainty & Bayesian deep learning
  • 28:09 – Deep evidential regression
  • 33:08 – AutoML
  • 36:43 – Conclusion

Microsoft Research features a talk by Wei Wen on Efficient and Scalable Deep Learning (slides)

In deep learning, researchers keep achieving higher performance by using larger models. However, two obstacles prevent the community from building larger models: (1) training larger models is more time-consuming, which slows down model-design exploration, and (2) inference with larger models is also slow, which prevents their deployment in computation-constrained applications. In this talk, I will introduce some of our efforts to remove those obstacles. On the training side, we propose TernGrad to reduce the communication bottleneck and scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks that remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work to apply my research to scaling issues in Natural Language Processing.
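As a hedged sketch of the core ternarization idea behind TernGrad (my reading of the technique, not the authors' code): each gradient coordinate is stochastically rounded to one of {-s, 0, +s}, where s is the per-tensor maximum magnitude, so the quantized gradient is unbiased in expectation while needing only about two bits per value on the wire.

```python
import random

def terngrad(grads, rng=random.Random(0)):
    """Stochastically ternarize a gradient vector to {-s, 0, +s},
    where s = max |g_i|, so that E[quantized] == grads (unbiased)."""
    s = max(abs(g) for g in grads)
    if s == 0:
        return [0.0] * len(grads)
    quantized = []
    for g in grads:
        # Keep the signed scale with probability |g| / s, else send 0.
        if rng.random() < abs(g) / s:
            quantized.append(s if g > 0 else -s)
        else:
            quantized.append(0.0)
    return quantized
```

Because only the scalar s and a ternary code per coordinate need to be communicated, workers in a distributed job exchange far less data per step than with full-precision gradients.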

See more on this talk at Microsoft Research:

Jon Wood shows us how to install the C# Jupyter Kernel and then uses it to build an ML.NET AutoML experiment with the DataFrame package.

Installation instructions – https://github.com/dotnet/try/issues/408#issue-487051763

Notebook – https://github.com/jwood803/MLNetExamples/blob/master/MLNetExamples/Notebooks/Dataframe%20with%20AutoML.ipynb

Sample Notebook from Microsoft – https://github.com/dotnet/try/blob/master/NotebookExamples/csharp/Samples/HousingML.ipynb

ML.NET allows .NET developers to easily build and consume machine learning models in their .NET applications.

In this episode, Bri Achtman joins Rich to show off some really interesting scenarios that ML.NET and its family of tools enable. They talk about training models, AutoML, the ML.NET CLI, and even a Visual Studio extension for training models!

Useful Links

In this video, Siraj Raval explores Automatic Machine Learning, or “AutoML,” a field of Artificial Intelligence that’s been gaining a lot of ground of late. The idea is that any machine learning task involves a whole lot of steps: cleaning a dataset, choosing a model, deciding on the right configuration for that model, deciding which features are most relevant, and so on.

From the video description:

The goal of AutoML is to automate all of that up to a point where all a data scientist would need to do is tell a machine to perform some task using a dataset and wait for it to learn how by itself. In this episode, I’m going to explain several popular AutoML techniques, then compare top AutoML frameworks like AutoKeras, auto-sklearn, H2O, Ludwig, etc. to help you decide which one will be the best for your needs.

What will the future of Data Science work look like when technologies like AutoML promise to automate much of it?

Here’s an interesting look from TDWI.

AutoML is the umbrella term for tools and platforms that automate the steps of selecting the right model and optimizing its hyperparameters to produce the best possible model for a given dataset. Libraries such as auto-sklearn and Auto-WEKA provide these AutoML capabilities.
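To make that definition concrete, here is a toy sketch of the simplest possible AutoML loop: random search over model families and a hyperparameter. The model names and scoring function below are synthetic placeholders, not the API of auto-sklearn or any real library.

```python
import random

# Hypothetical scoring function standing in for "train a model and
# measure validation accuracy"; the model families and their score
# surfaces are made up, purely for illustration.
def evaluate(model, lr):
    if model == "tree":
        return 0.8 - (lr - 0.3) ** 2
    return 0.9 - (lr - 0.1) ** 2  # "linear"

def random_search(n_trials=200, rng=random.Random(0)):
    """Try random (model, hyperparameter) pairs and keep the best."""
    best = None
    for _ in range(n_trials):
        model = rng.choice(["tree", "linear"])
        lr = rng.uniform(0.0, 1.0)
        score = evaluate(model, lr)
        if best is None or score > best[0]:
            best = (score, model, lr)
    return best
```

Real AutoML systems replace the random sampler with smarter strategies (Bayesian optimization, meta-learning, bandits), but the outer loop — propose a configuration, evaluate it, keep the best — is the same.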

Siraj Raval has a great talk on how genetic algorithms and neuroevolutionary strategies offer us a way to replicate the process of natural selection in silico. (Bonus points for the Latin usage, Siraj!)

Google already uses self-creating AI as part of its AutoML service, which finds the best model for each customer.