ROBERT CROWE: I'm Robert Crowe. And we are here today to talk about production pipelines, ML pipelines. So we're not going to be talking about ML modeling too much or different architectures. This is really all focused on when you have a model and you want to put it into production so that you can offer a product or a service or some internal service within your company, and it's something that you need to maintain over the lifetime of that deployment. So normally when we think about ML, we think about modeling code, because it's the heart of what we do. Modeling and the results that we get from the amazing models that we're producing these days, that's the reason we're all here, the results we can produce. It's what papers are written about, for the most part, overwhelmingly. The majority are written about architectures and results and different approaches to doing ML. It's great stuff. I love it. I'm sure you do too. But when you move to putting something into production, you discover that there are a lot of other pieces that are very important to making that model that you spent a lot of time putting together available and robust over the lifetime of a product or a service that you're going to offer out to the world so that they can really experience the benefits of the model that you've worked on. And those pieces are what TFX is all about. In machine learning, we're familiar with a lot of the issues that we have to deal with, things like where do I get labeled data. How do I generate the labels for the data that I have? I may have terabytes of data, but I need labels for it. Does my labeled data cover the feature space that I'm going to see when I actually run inference against it? Is my dimensionality-- is it minimized? Or can I do more to try to simplify my feature set, my feature vector, to make my model more efficient? Do I really have the predictive information in the data that I'm choosing? And then we need to think about fairness as well.
Are we serving all of the customers that we're trying to serve fairly, no matter where they are, what religion they are, what language they speak, or what demographic they might be? Because you want to serve those people as well as you can. You don't want to unfairly disadvantage people. And we may have rare conditions too, especially in things like health care, where we're making a prediction that's going to be pretty important to someone's life. And it may be a condition that occurs very rarely. But a big one when you go into production is understanding the data lifecycle. Because once you've gone through that initial training and you've put something into production, that's just the start of the process. You're now going to try to maintain that over a lifetime, and the world changes. Your data changes. Conditions in your domain change. Along with that, you're now doing production software deployment. So you have all of the normal things that you have to deal with in any software deployment, things like scalability. Will I need to scale up? Is my solution ready to do that? Can I extend it? Is it something that I can build on? Modularity, best practices, testability. How do I test an ML solution? And security and safety, because we know there are attacks on ML models that are getting pretty sophisticated these days. Google created TFX for us to use. We created it because we needed it. It was not the first production ML framework that we developed. We've actually learned over many years, because we have ML all over Google taking in billions of inference requests, really on a planet scale. And we needed something that would be maintainable and usable at a very large production scale, with large data sets and large loads, over a lifetime. So TFX has evolved from earlier attempts. And it is now what most of the products and services at Google use.
And now we're also making it available to the world as an open-source product, available for you to use now for your production deployments. It's also used by several of our partners and by other companies that have adopted TFX. You may have heard talks from some of these at the conference already. And there's a nice quote there from Twitter, where they did an evaluation. They were coming from a Torch-based environment, looked at the whole suite, the whole ecosystem of TensorFlow, and moved everything that they did to TensorFlow. One of the big contributors to that was the availability of TFX. The vision is to provide a platform for everyone to use. Along with that, there are some best practices and approaches that we're trying to really make popular in the world, things like strongly-typed artifacts, so that when your different components produce artifacts, they have a strong type. Pipeline configuration, workflow execution, being able to deploy on different platforms, different distributed pipeline platforms, using different orchestrators, different underlying execution engines-- trying to make that as flexible as possible. There are some horizontal layers that tie together the different components in TFX. And we'll talk about components here in a little bit. And we have a demo as well that will show you some of the code and some of the components that we're talking about. The horizontal layers-- an important one there is metadata storage. So each of the components produces and consumes artifacts. You want to be able to store those. And you may want to do comparisons across months or years to see how things changed, because change becomes a central theme of what you're going to do in a production deployment. This is a conceptual look at the different parts of TFX. On the top, we have tasks-- a conceptual look at tasks. So things like ingesting data or training a model or serving the model.
Below that, we have libraries that are available, again, as open-source components that you can leverage. They're leveraged by the components within TFX to do much of what they do. And on the bottom row, in orange-- a good color for Halloween-- we have the TFX components. And we're going to get into some detail about how your data will flow through the TFX pipeline to go from ingesting data to a finished trained model on the other side. So what is a component? A component has three parts. This is a particular component, but it could be any of them. Two of those parts, the driver and publisher, are largely boilerplate code that you could change. You probably won't. A driver consumes artifacts and begins the execution of your component. A publisher takes the output from the component and puts it back into the metadata store. The executor is really where the work is done in each of the components. And that's also a part that you can change. So you can take an existing component, override the executor in it, and produce a completely different component that does completely different processing. Each of the components has a configuration. And for TFX, that configuration is written in Python. And it's usually fairly simple. Some of the components are a little more complex. But most of them take just a couple of lines of code to configure. The key essential aspect here that I've alluded to is that there is a metadata store. The component will pull data from that store as it becomes available. So there's a set of dependencies that determine which artifacts that component depends on. It'll do whatever it's going to do. And it's going to write the result back into the metadata store. Over the lifetime of a model deployment, you start to build a metadata store that is a record of the entire lifetime of your model. And the way that your data has changed, the way your model has changed, the way your metrics have changed-- it becomes a very powerful tool.
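As an illustration of how small that configuration usually is, here is a sketch of configuring two standard components in the TFX Python API. The exact signatures have varied across TFX releases, and the data path is a placeholder:

```python
# Sketch of TFX component configuration in Python (signatures vary
# across TFX versions; 'data/' is a placeholder path).
from tfx.components import CsvExampleGen, StatisticsGen

# Ingest CSV files and emit tf.Example artifacts into the pipeline.
example_gen = CsvExampleGen(input_base='data/')

# Compute statistics over the examples the previous component produced.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
```

Each component declares its inputs in terms of another component's outputs, which is how the dependency graph and the metadata-store reads and writes get wired up.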
Components communicate through the metadata store. So an initial component will produce an artifact and put it in the metadata store. The components that depend on that artifact will then read from the metadata store, do whatever they're going to do, put their result into it, and so on. And that's how we flow through the pipeline. So the metadata store I keep talking about-- what is it? What does it contain? There are really three kinds of things that it contains. First, the artifacts themselves. They could be trained models, they could be data sets, they could be metrics, they could be splits. There are a number of different types of objects that are in the metadata store. Those are grouped into execution records. So when you execute the pipeline, that becomes an execution run. And the artifacts that are associated with that run are grouped under that execution run. So again, when you're trying to analyze what's been happening with your pipeline, that becomes very important. Also, the lineage of those artifacts-- so which artifact was produced by which component, which consumed which inputs, and so on. So that gives us some functionality that becomes very powerful over the lifetime of a model. You can find out which data a model was trained on, for example. If you're comparing the results of two different model trainings that you've done, tracing it back to how the data changed can be really important. And we have some tools that allow you to do that. So TensorBoard, for example, will allow you to compare the metrics from, say, a model that you trained six months ago and the model that you just trained now to try to understand. I mean, you could see that it was different, but why-- why was it different? And warm-starting becomes very powerful too, especially when you're dealing with large amounts of data that could take hours or days to process, being able to pull that data from cache.
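To make those three kinds of records concrete, here is a toy plain-Python sketch of a metadata store tracking artifacts, execution records, and lineage. This is purely illustrative and is not the actual ML Metadata API that TFX uses:

```python
# Toy sketch of a metadata store: artifacts, execution records, and
# lineage links between them. Illustrative only -- not the real
# ML Metadata (MLMD) API used by TFX.

class MetadataStore:
    def __init__(self):
        self.artifacts = {}    # artifact_id -> payload (model, data set, metrics, ...)
        self.executions = []   # execution records, each with lineage info

    def record_execution(self, component, inputs, outputs):
        """Record one component run: what it consumed and what it produced."""
        for artifact_id, payload in outputs.items():
            self.artifacts[artifact_id] = payload
        self.executions.append({
            'component': component,
            'inputs': list(inputs),
            'outputs': list(outputs),
        })

    def lineage(self, artifact_id):
        """Trace which component run produced a given artifact."""
        for record in self.executions:
            if artifact_id in record['outputs']:
                return record
        return None


store = MetadataStore()
store.record_execution('ExampleGen', inputs=[], outputs={'examples': 'examples-artifact'})
store.record_execution('Trainer', inputs=['examples'], outputs={'model': 'model-artifact'})

# Which data was the model trained on?
print(store.lineage('model')['inputs'])  # -> ['examples']
```

The real store also groups artifacts under pipeline-level execution runs, but the lookup above is the essence of questions like "which data was this model trained on?"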
If the inputs haven't changed, being able to pull results from cache rather than rerunning that component every time becomes a very powerful tool as well. So there's a set of standard components that are shipped with TFX. But I want you to be aware from the start that you are not limited to those standard components. This is a good place to start. It'll get you pretty far down the road. But you will probably have needs-- you may or may not-- where you need to extend the components that are available. And you can do that. You can do that in a couple of different ways. This is sort of the canonical pipeline that we talk about. So on the left, we're ingesting our data. We flow through, we split our data, we calculate some statistics against it. And we'll talk about this in some detail. We then make sure that we don't have problems with our data, and try to understand what types our features are. We do some feature engineering, we train. This probably sounds familiar. If you've ever been through an ML development process, this is mirroring exactly what you always do. Then you're going to check your metrics across that.
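Expressed in the Python configuration style shown earlier, that canonical flow looks roughly like this sketch. These are the standard TFX component names, but exact signatures vary by version; the data path and module files are hypothetical placeholders, and some arguments (such as the trainer's train and eval args) are omitted for brevity:

```python
# Sketch of the canonical TFX pipeline: ingest -> statistics -> schema
# -> validation -> feature engineering -> training. Signatures vary by
# TFX version; 'data/', 'preprocessing.py', and 'trainer.py' are
# hypothetical placeholders, and some arguments are omitted.
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer)

example_gen = CsvExampleGen(input_base='data/')     # ingest and split the data
statistics_gen = StatisticsGen(                     # calculate statistics
    examples=example_gen.outputs['examples'])
schema_gen = SchemaGen(                             # infer feature types
    statistics=statistics_gen.outputs['statistics'])
example_validator = ExampleValidator(               # check for data problems
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
transform = Transform(                              # feature engineering
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='preprocessing.py')
trainer = Trainer(                                  # train the model
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    module_file='trainer.py')
```

An Evaluator component would then check metrics against the trained model.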