Matteo Interlandi on Project Hummingbird

Hello and Welcome to Data Driven.

In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird.

Listen below or at the episode page on the Data Driven website.

Transcript

00:00:00 BAILey

Hello and welcome to Data Driven.

00:00:02 BAILey

In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird.

00:00:09 BAILey

Now on with the show.

00:00:10 Frank

Hello and welcome to Data Driven.

00:00:21 Frank

The podcast where we explore the emerging fields of data science, machine learning and artificial intelligence.

00:00:27 Frank

If you like to think of data as the new oil, then you can consider us

00:00:30 Frank

Car Talk, because we focus on where the rubber meets the virtual road. And with me on this epic road trip

00:00:36 Frank

down the information superhighway, as always, is Andy Leonard.

00:00:39 Frank

How you doing Andy?

00:00:40 Andy

I’m well, Frank, how are you?

00:00:41 Frank

I’m doing alright. We’re recording this on Wednesday, September 1st, 2021, and

00:00:51 Frank

the remnants of Hurricane Ida are ripping through the DC area.

00:00:57 Frank

Uh, so if, uh, if I suddenly get dropped, that’s because we probably lost power.

00:01:03 Frank

But I do have the backup generator, the one that the professionals installed, and my

00:01:10 Frank

duct-taped-together solar generator, so

00:01:15 Frank

I will be offline

00:01:17 Frank

for a short

00:01:18 Frank

bit and hopefully come back online.

00:01:20 Frank

How are you doing, Andy?

00:01:23 Andy

I’m doing alright, Frank. We are, you know, about, gosh, 250 miles south of you. We didn’t get nearly the effects of Hurricane Ida that you did.

00:01:34 Andy

We’re getting a little bit of rain now.

00:01:36 Andy

We’ve had some wind.

00:01:37 Andy

gusts, but it’s been really mild. And if you look on the radar,

00:01:41 Andy

I got to watch it and track it, and I do;

00:01:43 Andy

I’m a weather weenie, an amateur, but it just kind of went around us to the west, and it actually started east when it got a little north of us and aimed right for your house.

00:01:54 Andy

I was looking outside that’s where Frank lived, right?

00:01:56 Andy

And look, the eye is coming right for

00:01:58 Andy

Frank, what’s left of it.

00:02:00 Frank

Well, fortunately we’re safe.

00:02:02 Frank

There was some kind of flooding in Rockville and the small overnight, and some folks they got up.

00:02:09 Frank

Nobody died that I’m

00:02:10 Frank

aware of, so.

00:02:11

It it says.

00:02:12 Frank

You know, we’re not

00:02:13 Frank

accustomed to floods or hurricanes or tornadoes up here in DC, and we’re more used to the human threats of, you know, little things like terrorism and things

00:02:25 Frank

like that, but.

00:02:26 Andy

Yeah, yeah, you guys have got a little bit more to worry about there than we do here in Farmville, right?

00:02:32 Andy

But you know these days.

00:02:33 Andy

Who knows?

00:02:35 Andy

Uh, definitely our thoughts and prayers are with the folks in Louisiana and Mississippi.

00:02:40 Andy

They were hit very hard.

00:02:42 Andy

I’ve got friends in Georgia, western Georgia, who were telling me that

00:02:47 Andy

they took a beating as well, and, you know, it just looks horrible.

00:02:53 Andy

You know, I’ve been in a few of those places after hurricanes have hit, as part of, like, church efforts to help clean up and stabilize and stuff like that.

00:03:04 Andy

It looks like I don’t know.

00:03:06 Andy

They people describe it as like a war.

00:03:09 Andy

I’ve never been in a war so I don’t know.

00:03:10 Andy

I’ve seen pictures and.

00:03:13 Andy

There’s a lot.

00:03:14 Andy

It looks like a lot of stuff is blowing over, and that sort of.

00:03:16 Andy

Stuff, it’s just.

00:03:18 Andy

So, and they’re talking weeks and weeks before power comes back on.

00:03:22 Frank

That’s horrible, that’s.

00:03:23 Andy

Similar places, yeah.

00:03:25 Frank

That’s that’s.

00:03:26 Frank

probably going to do more damage, for a lot of things.

00:03:30 Andy

Were you worried?

00:03:30

But on a.

00:03:30 Frank

More positive note, uh, a positive note.

00:03:31 Andy

Yes, on a positive note.

00:03:35 Frank

Uh, we are.

00:03:37 Frank

I am super excited to have a special guest and I say super excited because he’s from Microsoft.

00:03:42 Frank

He’s a senior scientist in GSL at Microsoft, working on scalable machine learning systems.

00:03:50 Frank

Before he was at Microsoft, he was a postdoc scholar in the Computer Science department at UCLA, and he was doing a lot of interesting stuff there.

00:04:03 Frank

He was doing research at Qatar or Qatar.

00:04:05 Frank

I’m not sure how to say that exactly, but he has a PhD in computer science

00:04:11 Frank

from the University

00:04:12 Frank

of Modena and, oh,

00:04:15 Frank

I’m going to botch this.

00:04:15 Frank

Reggio Emilia.

00:04:17 Frank

Welcome to the show, Matteo.

00:04:22 Frank

Awesome, so we are really excited to have you here.

00:04:25 Frank

We actually booked you a whole month in advance.

00:04:27 Frank

I’ve been looking forward to this.

00:04:29 Frank

Yeah, because you’re coming by way of some of the folks at the MLADS conference.

00:04:35 Frank

And for those who don’t know, I’ve mentioned this:

00:04:37 Frank

MLADS stands for Machine Learning and Data Science Summit.

00:04:40 Frank

It used to be in person I think now it’s entirely virtual for the foreseeable future.

00:04:45 Frank

Uh, but I attended MLADS in the summer of 2016 and it was, uh, it was life-altering, and I don’t say that

00:04:55 Frank

lightly, so.

00:04:56 Frank

So Microsoft does amazing work in the machine learning and data science space.

00:05:02 Frank

Very much cutting edge stuff very much I.

00:05:06 Frank

I wouldn’t say under the radar, but Microsoft does not do a great job tooting its own horn, so we’re very excited for you to come on, Matteo, and talk about this little project that you’re working on.

00:05:17 Frank

And what is the, does it have a code name, or what?

00:05:20 Frank

What is it called?

00:05:22 Matteo

Hummingbird is actually the code name. I mean, we

00:05:26 Matteo

don’t have any specific internal names for

00:05:28 Matteo

this.

00:05:28 Frank

OK, what does GSL stand for?

00:05:32 Frank

That was my that was my first question.

00:05:33 Frank

When I saw your bio.

00:05:35 Matteo

Uh, it’s for Gray Systems Lab, and it’s named after Jim Gray, who

00:05:41

Oh, OK.

00:05:41 Matteo

is a Turing Award winner, yeah.

00:05:45

OK.

00:05:46 Matteo

So this is the research lab named after him, yeah, and it’s within the Azure Data organization.

00:05:49

Oh, interesting.

00:05:53 Frank

And uhm, So what?

00:05:56 Frank

What what cool stuff does Hummingbird do?

00:06:00 Matteo

So, Hummingbird, uh,

00:06:03 Matteo

is a little bit of a weird project, in the sense that when we started this project we didn’t know if it was going

00:06:10 Matteo

to be a success or not.

00:06:12 Matteo

Because what we try to do, basically, is to translate traditional machine learning models into neural networks.

00:06:22 Matteo

Actually, not into neural network format but into tensor programs, such that we can then run them over tensor runtimes such as PyTorch

00:06:30 Matteo

and TensorFlow.

00:06:32 Matteo

Uhm, so when we started this project, the idea actually was: hey, there is a lot of investment in general pouring into these neural network frameworks, and,

00:06:45 Matteo

coming from the Azure Data organization, we are instead more interested in these traditional machine learning methods such as decision trees,

00:06:52 Matteo

linear models, one-hot encoding, all those boring traditional algorithms.

00:07:00 Matteo

And so we looked at this

00:07:01 Matteo

neural network ecosystem and said, hey, how can we take advantage of all this technology that is built

00:07:05 Matteo

into this domain? So you can run neural

00:07:08 Matteo

networks over CPU,

00:07:10 Matteo

over GPU; then you can use, like, fancy compilers to compile and generate the tensor programs,

00:07:16 Matteo

all those sorts of techniques. And we were

00:07:19 Matteo

kind of struggling

00:07:20 Matteo

to see what we could do with this stack, and what we came up with is this Hummingbird project.

00:07:27 Matteo

So we basically take

00:07:32 Matteo

traditional machine learning pipelines, composed of featurizers and machine learning models,

00:07:37 Matteo

after they are trained.

00:07:39 Matteo

So first you need to train it using scikit-learn or ML.NET or,

00:07:43 Matteo

uhm, one of those traditional machine learning platforms, and then once it is trained, we basically convert it into a set of tensor operations.

00:07:54 Matteo

In the current version we basically use PyTorch for doing this conversion, and then basically you have a PyTorch model, so you can do whatever you can do with PyTorch

00:08:03 Matteo

models. So you can deploy it into PyTorch

00:08:08 Matteo

deployments, you can run it over CPU or over GPU, or you can export it to TorchScript if you want to get rid of all the Python dependencies and just have a C++ program. You can

00:08:19 Matteo

do all those tricks.
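To make the idea of turning a trained tree into tensor operations more concrete, here is a minimal sketch. It is an illustration only: the matrices are built by hand for a tiny, made-up decision tree, plain Python lists stand in for tensors, and the matrix-multiplication formulation shown is just one possible way to do such a translation, not a description of what Hummingbird does internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Toy tree (hypothetical): node0 tests x[0] < 0.5, node1 tests x[1] < 2.0.
#   node0 true  -> node1;  node1 true -> leaf0, node1 false -> leaf1
#   node0 false -> leaf2
A = [[1, 0], [0, 1]]           # feature-to-node: which feature each node tests
B = [0.5, 2.0]                 # threshold of each internal node
C = [[1, 1, -1], [1, -1, 0]]   # node-to-leaf: +1 left subtree, -1 right, 0 unrelated
D = [2, 1, 0]                  # number of "go left" decisions on the path to each leaf
E = [[1, 0], [0, 1], [1, 0]]   # leaf-to-class (one-hot class label per leaf)

def predict_tensor(batch):
    """Evaluate the tree for a whole batch using only matrix ops and comparisons."""
    tested = matmul(batch, A)
    went_left = [[1 if v < t else 0 for v, t in zip(row, B)] for row in tested]
    path_score = matmul(went_left, C)
    leaf_one_hot = [[1 if s == d else 0 for s, d in zip(row, D)] for row in path_score]
    class_scores = matmul(leaf_one_hot, E)
    return [row.index(max(row)) for row in class_scores]

def predict_branching(batch):
    """The same tree written as ordinary if/else traversal, for comparison."""
    out = []
    for x in batch:
        if x[0] < 0.5:
            out.append(0 if x[1] < 2.0 else 1)
        else:
            out.append(0)
    return out
```

Because the tensor version contains no data-dependent branching, the whole batch flows through the same few matrix operations, which is the kind of workload tensor runtimes and GPUs are built for.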

00:08:22 Frank

Interesting, does it impact accuracy precision?

00:08:26 Frank

Does it improve it?

00:08:27 Frank

Keep it the same.

00:08:29 Matteo

We try to keep it the same, so we are able to keep

00:08:33 Matteo

it the same up to floating-point rounding.

00:08:36 Matteo

So since we use, you know, PyTorch to run these programs, and not, like, scikit-learn or ML.NET,

00:08:44 Matteo

there are some differences in how they do, you know, floating-point operations.

00:08:48 Matteo

So the

00:08:49 Matteo

accuracy is the same up to rounding in the floating points, which sometimes, actually,

00:08:54 Matteo

can be quite a bit, but most of the time it is really small, almost not noticeable.
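As a small illustration of why two runtimes can disagree in the last digits, the sketch below shows that floating-point addition depends on the order of operations, and how a test might compare outputs with a tolerance instead of exact equality. The helper name and the tolerance value are assumptions chosen for the example, not something taken from the Hummingbird test suite.

```python
import math

# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

def same_up_to_rounding(a, b, rel_tol=1e-9):
    """Hypothetical test helper: treat values as equal if they differ only by rounding."""
    return math.isclose(a, b, rel_tol=rel_tol)

print(left_to_right == right_to_left)                     # exact comparison
print(same_up_to_rounding(left_to_right, right_to_left))  # tolerant comparison
```

A runtime that reorders or fuses arithmetic can therefore produce slightly different scores from the original library even when both are correct, which is why comparisons in tests are usually tolerance-based.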

00:09:00 Frank

Interesting, interesting, uhm.

00:09:03 Frank

Would you know

00:09:05 Frank

if there was, like,

00:09:06 Frank

a discrepancy, or do you do that as part of testing?

00:09:09 Matteo

It’s part of testing.

00:09:10 Frank

Right, all software is tested, right Andy?

00:09:11 Matteo

So we have we have.

00:09:13 Frank

Sometimes intentionally is that the email.

00:09:15 Andy

That’s right.

00:09:17 Frank

Andy has a saying where all software is, I forget exactly what it is.

00:09:21 Frank

But what is it?

00:09:23 Andy

Yeah, all software is tested, some intentionally.

00:09:27 Frank

There you go.

00:09:30 Frank

Uhm, so what’s the?

00:09:33 Frank

What’s the real?

00:09:34 Frank

What are?

00:09:34 Frank

What are the advantages of converting a traditional model over to a tensor model?

00:09:41 Frank

Is it?

00:09:41 Frank

Is it portability?

00:09:42 Frank

Is it speed?

00:09:43 Frank

You did mention that you can run it on.

00:09:45 Frank

You could take advantage of GPU as well as CPU.

00:09:51 Matteo

Yes, exactly. Mostly it is related to speed: you can basically run your scikit-learn model on GPU end to end, and this usually provides quite a bit of speedup. For some of our examples we even saw, like, two orders of magnitude speedups

00:10:11 Matteo

for some of the models.

00:10:13 Matteo

And, uhm, usually we try to show that

00:10:18 Matteo

if you use GPU

00:10:19 Matteo

it can be much faster, but on CPU we try to be as close as possible to scikit-learn or the baseline model.

00:10:27 Matteo

Sometimes we can, sometimes we are a little bit slower.

00:10:31 Matteo

Uh, but we.

00:10:32 Matteo

We had some really interesting result.

00:10:34 Matteo

Like, for instance, we did some experiments with

00:10:39 Matteo

some folks at TVM, and we took some XGBoost models, and we compiled some trained XGBoost models,

00:10:47 Matteo

uh, using Hummingbird and TVM. We basically do code generation, and we showed that the model that was compiled from Python was even faster than the native C++ implementation that they have in XGBoost, on both CPU and GPU. Yeah, that was kind of, OK, what’s going on?

00:11:06 Matteo

This is not.

00:11:08 Matteo

This was not expected.

00:11:08 Frank

Wait, did you say it was faster than a C++ implementation?

00:11:11 Matteo

Yes, I mean, XGBoost uses

00:11:13 Matteo

C++ underneath; even scikit-learn,

00:11:15 Matteo

you know, they use, like,

00:11:16 Matteo

some C++ library. And yeah, using TVM for doing the code generation, they are able to do, like, operator fusion, which you don’t normally have for these traditional models.

00:11:28 Matteo

So with all these tricks, basically, that are coming from the neural network

00:11:31 Matteo

frameworks, we were able to get, like,

00:11:34 Matteo

these surprising numbers.
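For readers unfamiliar with operator fusion, the following sketch shows the general idea in plain Python. It is a hand-written illustration under simple assumptions (three elementwise operations chosen arbitrarily); a compiler such as TVM performs this kind of rewrite automatically on generated code, and nothing here reflects its actual API.

```python
def unfused(values, scale, offset):
    """Three separate passes over the data, each materializing an intermediate list."""
    scaled = [v * scale for v in values]
    shifted = [v + offset for v in scaled]
    return [max(v, 0.0) for v in shifted]

def fused(values, scale, offset):
    """The same three operations combined into one pass with no intermediates."""
    return [max(v * scale + offset, 0.0) for v in values]
```

Both functions compute the same result; the fused version simply touches memory once per element instead of three times, which is where much of the speedup from fusion typically comes from.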

00:11:36 Frank

Interesting, so that’s a real performance boost, and probably if you scale that up into the cloud that probably.

00:11:44 Frank

Means a lot of money saving too in terms of on cloud computing things like, I imagine a company like the size of Microsoft would be very interested in getting better results faster with less cloud compute.

00:11:56 Frank

You did mention an acronym, I just wanna make sure folks know.

00:11:59 Frank

What that is?

00:12:00 Frank

TVM, what is that?

00:12:03 Matteo

Uh, I don’t know what it stands for exactly, uh, something tensor maybe?

00:12:08 Frank

Andy looks like he knows, but he’s on mute.

00:12:10 Andy

I don’t, yeah I I don’t know.

00:12:13 Frank

OK, I’m just curious.

00:12:13 Andy

I’ll go look it up.

00:12:15 Frank

There you go.

00:12:16 Andy

TVM acronym.

00:12:19 Matteo

I think it is for Tensor Virtual Machine, but I’m

00:12:21 Matteo

not sure if this is correct.

00:12:22 Frank

That sounds about right.

00:12:23 Frank

Tensor, yeah, tensor

00:12:26 Frank

vector machine.

00:12:28 Andy

Ah, I see.

00:12:30 Andy

So thanks very much comes up, that’s interesting.

00:12:34 Frank

Well, we’ll figure out what it is and put it in

00:12:36 Andy

Put tensor in here. TVM, you said?

00:12:36 Frank

the description, so.

00:12:40 Frank

Yeah, yeah.

00:12:40 Matteo

Yes, it is a GitHub project, but I think it is also an Apache project, and these are our top where you have.

00:12:45 Andy

Yeah, there: tvm.apache.org, yeah.

00:12:50 Andy

And it doesn’t tell me what it stands for, but that’s that’s where you can go and learn more about it.

00:12:55 Andy

It’s, according to the website, an end-to-end machine learning compiler framework for CPUs, GPUs, and accelerators.

00:13:05 Andy

Interesting, it does sound interesting, yeah.

00:13:09 Frank

That’s what’s great about this space.

00:13:10 Frank

There’s so much you could geek out on and spend like.

00:13:15 Frank

Like, I’m just looking through, I found an article on machinelearningknowledge.ai about Hummingbird, and it’s just like, wow.

00:13:25 Frank

They basically it looks like they copied and pasted the fake.

00:13:29 Frank

From here.

00:13:29 Frank

It’s intelligent, but it does look fascinating in terms of what it can.

00:13:35 Frank

Do so so.

00:13:36 Frank

What what motivated what motivated the creation of Hummingbird?

00:13:43 Matteo

So the motivation was actually different. The initial motivation was actually to try

00:13:51 Matteo

to,

00:13:54 Matteo

uh, not accelerate

00:13:56 Matteo

the traditional machine learning pipelines, but to use differentiation,

00:14:00 Matteo

uhm, basically all this, uh, backpropagation,

00:14:04 Matteo

all these tools that are used for training neural networks, and try to translate them over to traditional machine learning models.

00:14:11 Matteo

So try to do, basically, backpropagation over scikit-learn pipelines.

00:14:15 Matteo

And that is the biggest tool.

00:14:17 Matteo

So we started with this tool that basically was translating these traditional machine learning pipelines,

00:14:22 Matteo

scikit-learn-only pipelines at the beginning, into PyTorch, such that we could do end-to-end differentiation.

00:14:27 Matteo

But then once

00:14:28 Matteo

we were at that

00:14:29 Matteo

point, and of course, as you can imagine, we were trying to do end-to-end differentiation for increasing the accuracy of the pipeline, to see whether, if you use backpropagation, you can increase accuracy.

00:14:40 Matteo

And then once we did this translation, we basically realized that, OK, since we are on PyTorch, we can exploit all these other, uh, you know, PyTorch frameworks and hardware acceleration and all those other things.

00:14:52 Matteo

And then basically we kind of ditched this idea of doing end-to-end differentiation and running backpropagation over the pipelines, and instead we focused more

00:15:00 Matteo

on building a system for accelerating inference, prediction, over traditional machine learning.

00:15:07 Andy

So I’m curious, Matteo.

00:15:09 Andy

This is not my forte. Frank’s the data scientist of our pair

00:15:13 Andy

here. I am a data engineer, so can you give me an example of a problem? I get the speed part of this, I really do.

00:15:25 Andy

We need that in data engineering too.

00:15:27 Andy

I think everyone needs that performance part, but can you give me an example of something that you’ve applied this to?

00:15:34 Andy

And you already gave us, you know, an interesting number about how much faster it was.

00:15:39 Andy

A couple of good references from that.

00:15:41 Andy

Was there something in particular that you’ve worked on or that your team has worked on and applied this and saw some you know some interesting results?

00:15:52 Matteo

So, I mean, first of all, I’m a database person too.

00:15:54 Matteo

I’m not a machine learning person, so I think we would be speaking the same language.

00:15:57 Andy

OK.

00:15:59 Matteo

I’m a database person that,

00:16:02 Matteo

yeah, it’s,

00:16:03 Matteo

I’m trying to basically understand the whole machine learning domain and see how much databases can take advantage of these techniques.

00:16:10 Matteo

And my needs help.

00:16:12 Matteo

Uh, I mean, the start of my investigation was traditional methods, because those are the ones that

00:16:17 Matteo

you in general

00:16:18 Matteo

use over tabular data, which is the kind of data that we have

00:16:23 Matteo

the most.

00:16:23 Matteo

Um, and so, related to use cases,

00:16:30 Matteo

Let me think so we.

00:16:32 Matteo

Uhm, so we tried to use it internally for some of our first-party customers,

00:16:38 Matteo

uhm, just because they have, like, scikit-learn models

00:16:42 Matteo

and they want to kind

00:16:43 Matteo

of try to see if they can speed up the inference,

00:16:46 Matteo

the prediction over these models.

00:16:48 Matteo

Uhm, when someone reaches out from outside, mostly they try to accelerate, like, tree-based algorithms such as gradient boosting, LightGBM, XGBoost, those

00:17:03 Matteo

things, and yeah.

00:17:06 Matteo

Yeah, in general the use cases are really

00:17:08 Matteo

simple: you know, you have scikit-learn models and you want to deploy your scikit-learn models,

00:17:14 Matteo

uh, and when you deploy you want to take advantage of GPU,

00:17:18 Matteo

either because you already have some GPU deployments, so you already have some neural network

00:17:22 Matteo

there, and you also want to take advantage of the GPU that you have in your deployment with these

00:17:30 Matteo

traditional models, or just because you have, like, a traditional model and you want to speed up the inference.

00:17:32

Got you?

00:17:38 Matteo

I have to say that most of the performance boost we usually see is related to batch inference, so not when you’re doing single-point inference, but when you have, like, a batch of records, so that we can basically saturate the performance of a GPU, for instance.
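The difference between single-point and batch inference can be sketched as follows. This is a minimal illustration: `CountingModel` is a hypothetical stand-in for a converted model (its scoring rule is arbitrary), and the only thing it demonstrates is how batching reduces the number of calls into the runtime for the same set of records.

```python
def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

class CountingModel:
    """Hypothetical stand-in for a converted model; counts how often predict is called."""

    def __init__(self):
        self.calls = 0

    def predict(self, batch):
        self.calls += 1
        # Arbitrary toy rule: positive if the feature sum exceeds 1.0.
        return [sum(row) > 1.0 for row in batch]

def score(model, records, batch_size):
    """Score all records, calling the model once per batch."""
    predictions = []
    for batch in batches(records, batch_size):
        predictions.extend(model.predict(batch))
    return predictions
```

With a batch size of 1 the model is invoked once per record; with larger batches the same predictions come back from far fewer invocations, each of which gives the hardware more parallel work to do.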

00:17:55 Andy

So just to follow up on that, then it sounds like a lot of what you’re doing is.

00:18:02 Andy

You know you’re focused on the on the tool that does these translations for you into other platforms.

00:18:08 Andy

Other technologies allows you to use you know GPU versus CPU, and I think what you’re creating if I understand you and I didn’t do my homework, apologies.

00:18:20 Andy

I think what you’re building is a way

00:18:22 Andy

to do exactly what we were joking about earlier, about testing.

00:18:27 Andy

You want to see how can I get the peak performance?

00:18:31 Andy

for, you know, this part of that,

00:18:33 Andy

Maybe this module or this operation of the batch and maybe the answer here and you mentioned this may be the answer here.

00:18:41 Andy

Is CPUs or GPUs? Maybe it’s C++ and you’re just able to, you know, kind of pick the high spots and say I’m getting order.

00:18:50 Andy

Case of performance.

00:18:51 Andy

The low spots right?

00:18:52 Andy

Just stuff that runs it fast.

00:18:54 Andy

And then you can put that together and hand it back to your client or someone who’s interested in it and say right now, given the volume and the data and the state of hardware, you can get the maximum performance.

00:19:07 Andy

if you do this part here and that part there. Is that fair?

00:19:13 Matteo

So you’re actually looking into some future work that we are investigating now, so kind

00:19:18 Matteo

of mix and matching

00:19:18

OK.

00:19:19 Matteo

the different runtimes for the different parts of the pipeline.

00:19:22 Matteo

So what we focus on right now, actually, is trying to translate the machine learning models end to end, so taking the featurizers and all the models,

00:19:31 Matteo

because basically we saw that that is where, most of the time, we can get the maximum performance: by looking at the model end to end, we can run it completely over the GPU instead of having to go back and forth from GPU to CPU, for example.

00:19:47 Matteo

But what you point out is something that we are considering.

00:19:51 Matteo

So kind of look at the model not as, you know, a unique

00:19:55 Matteo

black-box kind of artifact, but as something that we can actually split into different parts, and eventually we can run it over different hardware, over different runtimes,

00:20:08 Matteo

such as TVM,

00:20:09 Matteo

as I said before, so some parts run on TVM and some parts run on PyTorch, those sort

00:20:14 Frank

of things. So kind of like a meta-optimizer.
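One way to picture such a meta-optimizer is as a small planning problem: each pipeline stage has a cost on each backend, and switching backends between stages costs something extra. The sketch below is purely illustrative; the function name, the cost numbers, and the backend labels are all assumptions, and it is not how Hummingbird or any planned extension of it is implemented.

```python
def plan_placement(stage_costs, transfer_cost):
    """Pick a backend per stage minimizing total cost (dynamic programming).

    stage_costs: list of dicts mapping backend name -> cost of that stage there.
                 Every stage is assumed to list the same backends.
    transfer_cost: extra cost paid whenever two adjacent stages use different backends.
    Returns (total_cost, [backend for each stage]).
    """
    backends = list(stage_costs[0])
    # best[b] = cheapest (cost, placement) for the stages so far, ending on backend b
    best = {b: (stage_costs[0][b], [b]) for b in backends}
    for costs in stage_costs[1:]:
        new_best = {}
        for b in backends:
            candidates = []
            for prev, (prev_cost, prev_plan) in best.items():
                hop = 0 if prev == b else transfer_cost
                candidates.append((prev_cost + hop + costs[b], prev_plan + [b]))
            new_best[b] = min(candidates)
        best = new_best
    return min(best.values())

# Hypothetical costs: a featurizer that is slightly cheaper on CPU,
# followed by a tree model that is much cheaper on GPU.
pipeline = [{"cpu": 2.0, "gpu": 3.0}, {"cpu": 10.0, "gpu": 1.0}]
```

With an expensive transfer the cheapest plan keeps the whole pipeline on one device, which matches the end-to-end approach described above; only when moving data is cheap does splitting the pipeline across backends pay off.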

00:20:15 Andy

OK, it’s a combination.

00:20:18 Andy

Like that’s exactly where I was going.

00:20:19 Andy

Yeah, it’s like you’re tuning stored procs, Matteo.

00:20:24 Andy

And you’re deciding I want this one to run on SQL Server.

00:20:27 Andy

I want that one to go to Postgres.

00:20:29 Andy

And yeah, it’s just that that is interesting that you can span hardware and software.

00:20:36 Andy

You can pick platforms in the software.

00:20:39 Andy

To do it.

00:20:40 Andy

And I I’m with you.

00:20:41 Andy

I got my head around it now, and I think that’s really, really cool. This just sounds like something that’s going to accelerate the field, really.

00:20:51 Andy

Because the less time you’re sitting around twiddling your thumbs waiting for a result, you know, the more you can get done.

00:20:59 Andy

I mean, that’s just.

00:21:00 Andy

Common sense, so I love what you guys are doing.

00:21:01

Yeah, yeah exactly.

00:21:04 Andy

That’s that’s really cool and I like that.

00:21:07 Andy

I don’t think I’ve ever heard anybody talk about.

00:21:10 Andy

You know, changing libraries and changing you know hardware platforms even.

00:21:17 Andy

I mean it’s it’s hard to even say I don’t know what you’d even classify that as because running different chips you know, running the processes on different chipsets.

00:21:26 Andy

That’s something we used to do back.

00:21:28 Andy

in the seventies, you know.

00:21:29 Andy

I mean, but it was.

00:21:30 Frank

Let’s just say that harkens back to, like, the

00:21:31 Andy

mainframe days. It kind of does. I mean, 6800s and Z80s and all of that, but

00:21:39 Andy

I mean, this is way, way, way more advanced than all that, but I like the idea.

00:21:46 Andy

I like being able to to do that and I hear what you’re saying right now.

00:21:50 Andy

You’re just after picking a platform, picking an approach, and saying, you know, we’re going to run this.

00:21:57 Andy

We’re going to generate C++. It’s going to run on CPUs, and overall that’s going to be your fastest result. It’s going to give you your best performance.

00:22:06 Andy

I I get you.

00:22:07 Andy

But that I I didn’t realize I jumped ahead there.

00:22:10 Andy

But that happens sometimes rare, but it happens.

00:22:15 Andy

Y’all could totally take that idea, Matteo, and run with it.

00:22:18

Yeah, if you.

00:22:19 Matteo

We can write the paper together if you want to.

00:22:22 Frank

There you go.

00:22:22 Frank

You know, right?

00:22:23 Andy

Away I could.

00:22:24 Andy

I could do the punctuation.

00:22:28 Frank

He’s really good at.

00:22:29 Frank

reviewing stuff. I will say that from personal experience, from him reviewing my articles in the now-defunct MSDN Magazine.

00:22:31

Here we go.

00:22:38 Andy

I remember that those were fun.

00:22:39 Andy

I learned a lot reviewing your articles.

00:22:42 Andy

Frank ’cause you were always on the cutting edge.

00:22:44 Frank

I try.

00:22:45 Andy

Yeah, neat stuff what?

00:22:46 Frank

But this Hummingbird stuff looks really cool, and it looks like it’s as easy to install as pip install hummingbird.

00:22:54 Matteo

Just be missing.

00:22:54 Frank

hummingbird-ml, I think it is.

00:22:57 Matteo

Yes, yeah, that name was already taken, of course.

00:23:00 Frank

Well, yeah, but no.

00:23:02 Frank

This is really cool.

00:23:02 Frank

Like I I I like where this is going.

00:23:05 Frank

I like the potential for it.

00:23:06 Frank

’cause you with the cloud you know you.

00:23:09 Frank

You think about.

00:23:11 Frank

Database as a.

00:23:12 Frank

Service like you don’t.

00:23:13 Frank

you know, you don’t care what the hardware is. I mean, you care, but from the end developer’s point of view,

00:23:19 Frank

they won’t necessarily care what type of hardware it is.

00:23:21 Frank

This does open.

00:23:22 Frank

Up some very interesting possibilities, just just kind of piggybacking on kind of what Andy said.

00:23:27 Frank

It’s like, wow, I mean one of the things and I forget who said it?

00:23:31 Frank

Might have been Kevin Hazzard, who said that you know now we live in an age where we’re not dealing with just spinning platters.

00:23:39 Frank

We can imagine

00:23:41 Frank

what a database, I’m butchering what he said.

00:23:44 Frank

But he he did say he says a lot of profound things and one of the most profound things he said was something like you know what?

00:23:50 Frank

What would a database in the future look like?

00:23:52 Frank

Because we’re not.

00:23:52 Frank

dealing with spinning platters. Did

00:23:54 Frank

I get that right, Andy, or something along those lines?

00:23:55 Andy

You did. He blogged about it at devjourney.com. We’ll have to look that up for the show notes, but Kevin is one of those.

00:24:06 Andy

He’s a pretty pretty, profound thinker, and

00:24:08 Frank

I was going to say, uh, he’s a very deep thinker, like, he’s always, like, 10 moves ahead.

00:24:09 Andy

Yeah, I could tell.

00:24:14 Andy

Yeah, and I could tell reading the article ’cause I’ve known him for it.

00:24:18 Andy

Sort of you.

00:24:19 Andy

We’ve known him for a decade or more and he was struggling with trying to articulate the concept.

00:24:25 Andy

And if it’s tripping someone like Kevin Hazzard up, it’s a pretty powerful concept.

00:24:30 Frank

Right, right?

00:24:31 Andy

But he did a good job on devjourney.com. He’s not blogging as much ’cause he’s just too stinking busy. But yeah, you’re right. And I had a similar conversation

00:24:44 Andy

With you know with with my son Stevie Ray not too long ago we were talking about.

00:24:52 Andy

You know flash drives, and you know that the memory that we have now is so much faster than the platters and I I made this comment to him and I kind of stopped and thought I don’t know if that’s accurate or not and maybe Mateo since you’re here working on a cutting edge, you can help us.

00:25:08 Andy

We were just poking around thinking about operating systems.

00:25:11 Andy

And we do a lot here at the house in Farmville, VA with IoT.

00:25:16 Andy

In fact, he’s building a new collection of sensors for me right now for an Arduino.

00:25:20 Andy

So we’re going to hook it to a Pi, because Pis can talk to, you know, to the Internet, they can talk to our router, and that’s the next big secret. Don’t tell anybody.

00:25:31 Andy

Kidding, but.

00:25:33 Andy

One of the neat things about these Pi architectures versus even the really powerful servers that we have right now is, both,

00:25:42 Andy

You can compare them.

00:25:43 Andy

They’re both messaging systems, they’re they’re just passing around messages physically on a bus.

00:25:47 Andy

When you get to that Pi level, and that’s how I learned it, so I’m really excited about him learning.

00:25:52 Andy

That way, but.

00:25:53 Andy

Nobody thought about because we didn’t.

00:25:55 Andy

We couldn’t conceive of it when hard drives came out.

00:25:58 Andy

Nobody thought about building.

00:26:00 Andy

The OS or something.

00:26:02 Andy

you know, second generation or higher language on that without those spinning disks.

00:26:08 Andy

And here’s the here’s my long winded place.

00:26:11 Andy

I wanted to get to is I don’t know.

00:26:15 Andy

if we’re there now, even. I imagine there’s probably some OSes out there that

00:26:22 Andy

are sitting on GitHub; there’s probably 100 of them by now where people are doing exactly that. They’re taking advantage of the new IO, if you will, but I don’t think the big systems are doing it. I don’t think the major popular operating systems are, and for good reason. They’re stable, it’s.

00:26:42 Andy

It’s hard to change all of that.

00:26:42 Frank

Well, there’s a lot of inertia.

00:26:45 Frank

When you when you have a widely deployed operating system, you you get a lot of inertia and you know I’m not.

00:26:51 Frank

And I’m not talking about just Windows, I mean iOS.

00:26:53 Frank

I mean Android, I mean Linux like.

00:26:54

Sure, sure.

00:26:55 Frank

Once you have a wide install base, you you lose the.

00:26:58 Frank

Ability to be very experimental.

00:27:01 Andy

Yeah, I totally concur with that, and I see.

00:27:05 Andy

I see the cloud, I see Azure.

00:27:07 Andy

I see the you know that this leap that’s happened and it’s just it’s crazy to try.

00:27:13 Andy

I don’t even keep up with it, but just reading tidbits, reading and editing Frank’s articles and the like, it’s just taking these quantum leaps.

00:27:21 Andy

It’s like 10 years worth of stuff happening every six months.

00:27:26 Andy

And you guys just keep knocking it out, and I imagine at some you know at the Gray Systems lab that you’re surrounded by people who are just, you know, in Star Trek land or something.

00:27:41 Matteo

Happy yeah yeah.

00:27:44 Matteo

Yeah I totally agree on every.

00:27:45 Matteo

All the things that you said.

00:27:46 Matteo

Like I I was presenting a project related to Hummingbird.

00:27:50 Matteo

Actually kind of like a few days ago and I was preparing my.

00:27:54 Matteo

And I and I.

00:27:55 Matteo

Come up with this slide, I think.

00:27:56 Matteo

It was from just.

00:27:57 Matteo

Doing a few years back and.

00:27:59 Matteo

It basically was showing the number.

00:28:01 Matteo

Of papers that.

00:28:01 Matteo

Were published on machine learning, or made public on arXiv, and in 2018 there were 100 papers a day published just on machine learning.

00:28:11 Andy

My fingers.

00:28:13 Matteo

Just to give an idea of how fast the pace of innovation is now, especially in the machine learning and neural network domain.

00:28:22 Matteo

The operating system and database domain is a little bit slower, I would say, because, as Frank said, there is inertia there: these systems are deployed, and if you want to add even new hardware, it takes forever.

00:28:37 Matteo

So I see at Microsoft what happens when you have, like, a new hardware capability and you want to exploit it.

00:28:42 Matteo

It just sticks.

00:28:45 Matteo

And this is just because, you know, they’re used by many people, and even if you want to make a small change, it’s hard.

00:28:53 Andy

And I’m seeing the articles about Windows 11 where when you try to make a change like that and say hey you need this minimum hardware.

00:29:00 Andy

Now everybody is going.

00:29:03 Frank

Oh yeah, yeah, everybody got the pitchforks out and like freaking out and like, yeah, I mean I, I remember I was at I was at Microsoft doing evangelism on the shift to Windows 8.

00:29:15 Frank

Just you would not believe this.

00:29:17 Frank

Well maybe you would, I don’t know.

00:29:18 Frank

But like just the horror on people’s faces when they got rid of the Start button, like it was the end of the world, like you were killing somebody’s grandma.

00:29:26 Frank

Like you know it’s just.

00:29:27 Frank

Like it was, just like I mean, I disagree with the decision that was made, but but let’s let’s put it in perspective.

00:29:34 Frank

You know?

00:29:37 Frank

But, uh, but yeah, I mean.

00:29:37 Andy

You could still get there.

00:29:40

You can still start.

00:29:41 Andy

Things, but you could.

00:29:42 Frank

Still start things like in and and before.

00:29:46 Frank

This is funny, like, this is just a complete sidetrack, Matteo.

00:29:50 Frank

We do this a lot.

00:29:51 Andy

’Cause it never happens, Matteo.

00:29:53 Frank

Before keyboards had the Windows key, you could hit Ctrl+Esc and it pulls up the same thing, like.

00:30:01 Frank

Like I don’t know like it’s just.

00:30:03 Frank

Not the end.

00:30:03 Frank

Of the world. Anyway, sorry, I flashed back to 2012, but so, Matteo.

00:30:10 Frank

We have a bunch of kind of pre canned questions we’re going to ask you.

00:30:14 Frank

We ask these of all of our guests.

00:30:16 Frank

About half of them are kind of fill-in-the-blanks, but the first one is: how did you find.

00:30:22 Frank

Your way into data.

00:30:23 Frank

Did you find data or did data find you?

00:30:27 Matteo

Uh, I would say data found me.

00:30:32 Matteo

I think it was mostly because when I started my PhD, I wanted to do distributed systems.

00:30:39 Matteo

And for some reason I end up doing distributed system in a lab in a database lab.

00:30:44 Matteo

So I think that is why I think the data found me because I want I wanted to do something else.

00:30:49 Matteo

But then I end up doing data that probably was.

00:30:54 Matteo

I was really lucky to be honest.

00:30:57 Andy

Cool, very cool.

00:31:00 Andy

So our second question is what’s the favorite part?

00:31:03 Andy

Your favorite part of your current job?

00:31:08

Uh, no, this is.

00:31:09 Matteo

A hard question.

00:31:11 Matteo

Uh, I will say that I really love my management, in the sense that they allow me, us in general, to be.

00:31:20 Matteo

Sort of independent, in the sense that, you know, we are researchers and they allow us.

00:31:28 Matteo

They they find a way to.

00:31:30 Matteo

Kind of strike.

00:31:31 Matteo

A balance between having us be independent and kind of do our own research with crazy ideas like the one that.

00:31:37 Matteo

I presented with Hummingbird.

00:31:39 Matteo

And still be kind of, you know.

00:31:41 Matteo

With our foot on the ground and and kind of helping product improve improve.

00:31:46 Matteo

The system etc.

00:31:48 Matteo

So I think that is mostly what I love. So on one hand I can kind of look into what we.

00:31:53 Matteo

Can do next.

00:31:54 Matteo

Like having the operators running over different targets, and on the other I can kind of see what the real problems are that are coming from product and how we.

00:32:03 Matteo

Can solve.

00:32:03 Matteo

Them and I love this to be honest and I love this.

00:32:08 Frank

Awesome. Our first complete-this-sentence: When I’m not working, I enjoy blank.

00:32:15 Matteo

I would say work but they will not.

00:32:23

Yeah, I don’t know.

00:32:25 Matteo

Maybe family at this point, maybe family spending a lot of time in family with the commute time.

00:32:29 Matteo

We are often at home and I have a two-year-old that is driving us nuts.

00:32:39 Andy

That’s pretty cool.

00:32:41 Andy

So we have.

00:32:41 Frank

My youngest did kindergarten over Zoom, and it’s just as chaotic as it sounds.

00:32:47 Frank

Let’s put it that way.

00:32:50 Matteo

Yeah, I cannot imagine I mean to be honest.

00:32:52 Matteo

Now he’s in daycare, and we are really happy that now he is in daycare because, I mean, you know, at that age.

00:32:57 Matteo

But I guess that every kid needs to have interaction with.

00:33:00 Matteo

The with other.

00:33:01 Matteo

Kids and just stay at home is not, is not is not healthy, but I can’t imagine how.

00:33:06 Matteo

Hard it is to.

00:33:07 Matteo

Have like one year at home and.

00:33:09 Matteo

Having class or two courses.

00:33:12 Matteo

Yeah, I agree.

00:33:15 Andy

Go ahead, I’m sorry.

00:33:17 Matteo

I just said, I hope that all this.

00:33:18 Matteo

This situation will end soon.

00:33:20 Frank

Me too yeah.

00:33:21 Matteo

It means it doesn’t like you, but.

00:33:23 Andy

Yeah, same here.

00:33:25 Andy

I think we all do. Uh, I think it’s going to be one of those things where we look back for decades, probably, and see these little things that we’re really not noticing right now.

00:33:36 Andy

We’re just coping and managing and going on that.

00:33:40 Andy

You know, we’re gonna look back and go.

00:33:41 Andy

Wow, you know that changed this.

00:33:44 Andy

And that, and there’s all these things that come from it.

00:33:47 Andy

I, I hope, mostly good.

00:33:48 Andy

But I think it takes us time to figure out the good.

00:33:53 Andy

I I look forward to that time.

00:33:56 Andy

When we are.

00:33:56 Andy

Reflecting and reminiscing on stuff like this.

00:34:01 Andy

I I want to, but we have to be on.

00:34:03 Andy

The other side though.

00:34:05 Andy

Yes. Our second of three complete-the-sentences is: I think the coolest thing in technology today is blank.

00:34:20

I I.

00:34:23 Matteo

I mean, as a researcher.

00:34:24 Matteo

Usually I’m attracted by things that I don’t know.

00:34:28 Matteo

Uh, so I will say something like quantum computing, because I don’t know anything about quantum computing.

00:34:36 Matteo

Yeah, I I don’t know.

00:34:39 Frank

So go to impactquantum.com.

00:34:44 Andy

I’m smiling because I was waiting for Frank.

00:34:46 Frank

I actually it’s funny because in the I.

00:34:50 Frank

Went to the last MLADS that was held in person. It was fall 2019, and the second-day keynote was a hardware keynote, and you know, I go to a, uh, data science conference.

00:35:01 Frank

I want data science. Like, I was kind of mad that they had a hardware person up, but then she started talking about quantum and it just blew.

00:35:08 Frank

My mind, and ever since then I I.

00:35:11 Frank

I’ve really wanted to, I really.

00:35:14 Frank

I was just so overly excited about, like quantum computing, but the thing about quantum computing is, you know that night at the hotel.

00:35:22 Frank

Like, you know, I installed the Q# SDK and stuff like that, and then I was like, OK, now what?

00:35:27 Frank

Because it made no flippin sense.

00:35:32 Frank

So I’ve been kind of on this, you know, intermittently, this journey of kind of learning more about quantum computing, so starting the podcast on Impact Quantum and then starting kind of like the blog.

00:35:42 Frank

Have kind of forced me to keep at least a regular cadence of figuring out what’s going on there, so it’s fascinating.

00:35:49 Frank

I will say the one thing I’ve learned is the importance of linear algebra.

00:35:53 Frank

Apparently, linear algebra and the way the algorithms work in quantum systems tend to explain each other very well so.

00:36:02 Frank

But yeah, so definitely,

00:36:05 Frank

impactquantum.com is.

00:36:06 Frank

A blog I started last week, and I’m regularly updating it.

00:36:13 Frank

But that’s you know, ending the shameless plug.

00:36:15 Frank

But I agree with you, I think quantum computing would be a very cool thing to explore for a number of reasons.

00:36:21 Frank

The next and final complete-the-sentence is: I look forward to the day when I can use technology to blank.

00:36:32 Matteo

To use technology to not have to drive the car. That is, like, self-driving cars is something. I live in Los Angeles, so for me, self-driving cars.

00:36:40 Matteo

Can be.

00:36:40 Matteo

Kind of a complete life change.

00:36:45 Frank

I totally agree, I I I used to enjoy driving like I used to.

00:36:50 Frank

I grew up.

00:36:52 Frank

I didn’t have a license until I was like 21, so for me, I’ve done my time on mass transit.

00:36:57 Frank

I’ll put it that way, but like, living in DC, everywhere is just bumper to bumper too. Probably a lot like LA, and it just really takes the joy out of it. And you know.

00:37:10 Frank

One of the things my last job.

00:37:11 Frank

At Microsoft I was at the MTC.

00:37:13 Frank

And the only reason I didn’t want to take that job was because I had to drive to Virginia.

00:37:20 Frank

Which, despite it being 9 miles as the crow flies, could take.

00:37:25 Frank

Could take 90.

00:37:26 Frank

Minutes to two hours, but as I don’t want to say as luck would have it, ’cause it certainly wasn’t lucky.

00:37:33 Frank

The pandemic kind of made it so I could work remotely and never had to do it.

00:37:37 Frank

But you know, I I I share your dream.

00:37:40 Frank

That day of the.

00:37:41 Frank

Of the driverless of the you know self driving cars so you can.

00:37:44 Frank

You can read you can you know be on the computer you can do work while you’re driving and things like.

00:37:48 Frank

That yeah, I’m I’m right there with you.

00:37:51 Matteo

Yeah, I I totally agree.

00:37:52 Matteo

With what you said.

00:37:53 Matteo

I mean, I’m from Italy, and I’m from Modena, which is where.

00:37:59 Matteo

Basically, we say we like a fast car and good food, so we have like Ferrari we have Ducati we have.

00:38:06 Matteo

They rolled into over that so.

00:38:08 Matteo

I was growing up with like hearing the Ferrari when they tried in.

00:38:11 Matteo

The in the.

00:38:13 Matteo

In the circuit in Fiorano.

00:38:16 Matteo

I I.

00:38:16 Matteo

Lived like, I think, 3.

00:38:18 Matteo

Or 4 miles from Fiorano, and I could still hear, when they turned.

00:38:21 Matteo

The engine on, how.

00:38:22 Matteo

Loud that was. So I really like cars, but.

00:38:25 Matteo

Yeah, I can not stand.

00:38:28 Matteo

You know, being in traffic, in line with other cars, just for, for instance, going to work or going grocery shopping.

00:38:35 Matteo

And it’s just kind of a waste of time.

00:38:37 Frank

Especially a Ferrari. A Ferrari is meant to run free.

00:38:42 Andy

Yes, yes.

00:38:44 Andy

But that thing in Texas.

00:38:46 Frank

That’s right my my neighbor, a couple of my neighbors have.

00:38:48 Andy

Let her go.

00:38:51 Frank

One of my neighbors has a Ferrari, and you can hear it go by. It sounds beautiful, so I totally relate. Somebody down the street owns a Jaguar V12.

00:39:05 Frank

And when that thing goes by, it’s like angels singing I.

00:39:09 Frank

I know it’s a British car and an Italian car, and that’s probably heresy.

00:39:12 Frank

But I will say it sounds impressive.

00:39:16 Frank

Uh, so so it sounds like.

00:39:20 Frank

You might also be a car guy.

00:39:22 Frank

Or at least used.

00:39:23 Frank

To be yeah.

00:39:24 Matteo

Yeah, yesterday.

00:39:26 Andy

Back home

00:39:28 Andy

So our next one is share something different about yourself, but a little caution.

00:39:35 Andy

It’s a.

00:39:36 Andy

It’s a family friendly podcast.

00:39:38 Andy

We want to keep that iTunes clean rating here, so don’t make us edit.

00:39:48 Matteo

Yeah, I don’t know.

00:39:49 Matteo

I mean, I don’t know what to share, really.

00:39:51 Matteo

I’m kind of spending all my time either at work or with family, so I probably have the most boring life ever.

00:39:58 Matteo

Do you think that?

00:40:00 Matteo

I I think it is good.

00:40:02 Matteo

I mean I don’t know.

00:40:02 Matteo

If it’s good, the fact that now we are.

00:40:04 Matteo

Working from home.

00:40:05 Matteo

I have kind of more time to.

00:40:08 Matteo

Focus on other different things.

00:40:10 Matteo

Like, for instance, I can watch stocks, right? Before, I couldn’t watch stocks while I was at work.

00:40:16 Matteo

Uh, because I have my laptop, and when I have a meeting I can just take.

00:40:20 Matteo

A peek, and of course I can check my stocks there.

00:40:23 Matteo

Uh, while while I’m while I’m working.

00:40:27 Matteo

Uh, and yeah, and like I think it kind of yeah, kind of like a uh.

00:40:33 Matteo

Kind of looking at the stock market, especially because now is.

00:40:37 Matteo

A little bit.

00:40:37 Matteo

There’s a little bit of fraud around, so all these meme stocks, etc.

00:40:41 Matteo

Are, you know, exciting, but it’s a little bit dangerous, so.

00:40:48 Frank

It’s become like a sport, if you will.

00:40:52 Matteo

Yeah, I mean, I was trying this, the.

00:40:55 Matteo

Robinhood app.

00:40:56 Matteo

When they say gamification of the stock market, I don’t know if you have tried it, it is crazy.

00:41:00 Matteo

It looks like gambling at all.

00:41:03 Frank

Right?

00:41:03 Matteo

It looks like.

00:41:08 Frank

And the final question, do you listen to audiobooks, and if so, do you have any recommendations?

00:41:16 Matteo

No, I don’t listen to audiobooks.

00:41:18 Matteo

I think I’m more kind of on the old.

00:41:20 Matteo

Style I would say I.

00:41:23 Matteo

I prefer, you know, to read.

00:41:25 Matteo

Uh, rather than listen.

00:41:27 Matteo

You know, I.

00:41:28 Matteo

Don’t know why.

00:41:29 Matteo

I don’t know why.

00:41:31 Frank

I think it.

00:41:31 Frank

I think it depends on the person like.

00:41:33 Frank

I think it depends on kind of what you’re comfortable with.

00:41:36 Frank

I mean, my audiobook listening is nowhere near where it was when I would drive everywhere all the time.

00:41:42 Frank

So yeah, yeah. So the reason we ask is ’cause Audible is a sponsor of the show, and if you go to thedatadrivenbook.com you can sign.

00:41:53 Frank

Up for a free.

00:41:53 Frank

Audible membership, and if you sign up then they give us a little pat on the back and probably enough money to buy a Starbucks.

00:42:02 Frank

Help support the show.

00:42:05 Frank

And they’ve actually been one of our number one.

00:42:07 Frank

Sponsors so far.

00:42:08 Frank

Because of this program so.

00:42:10 Frank

Yeah, so you mentioned you had a website where can folks find out more about you?

00:42:19 Matteo

What is my website?

00:42:20 Matteo

I think it is.

00:42:22 Matteo

I I don’t remember.

00:42:24 Matteo

Uh, oh, interesaaat. It’s a GitHub website: interesaaat.github.io.

00:42:29 Frank

All right, we’ll make sure it goes on the show.

00:42:32 Frank

Notes so folks can find out more about this, and definitely go to your favorite command line prompt and type in pip install hummingbird-ml to check out what’s going on.

00:42:44 Frank

I’m definitely going to experiment with this.

00:42:46 Frank

’cause it does look fascinating and and like Andy said, the potential for this is fascinating.

00:42:52 Frank

Because this could end up in, this could end up in a lot of different places, ’cause it solves a lot of different problems.

00:43:00 Frank

So, anything else, Matteo?

00:43:03 Matteo

Yeah, if you try it let us know, and we are kind of, you know, looking for contributors and feedback.

00:43:08 Matteo

So if you try it let us know what do you think and how we can improve.

00:43:12 Frank

Awesome, thanks, and I’ll let the nice British lady end the show.

00:43:16 BAILey

Thanks for listening to Data Driven.

00:43:18 BAILey

We know you’re busy and we appreciate you.

00:43:20 BAILey

Listening to our podcast, but we have a favor to ask.

00:43:24 BAILey

Please rate and review our podcast on iTunes, Amazon Music, Stitcher or wherever you subscribe to us.

00:43:31 BAILey

You have subscribed to us, haven’t you? Having high ratings and reviews helps us improve the quality of our show and rank us more favorably with the search algorithms.

00:43:42 BAILey

That means more people listen to us, spreading the joy. And can’t the world use a little more joy these days?

00:43:50 BAILey

Now go do your part to make the world just a little better and be sure to rate and review the show.

Frank

#DataScientist, #DataEngineer, Blogger, Vlogger, Podcaster at http://DataDriven.tv . Back @Microsoft to help customers leverage #AI Opinions mine. #武當派 fan. I blog to help you become a better data scientist/ML engineer Opinions are mine. All mine.