Name: TensorFlowデータセットを使った作業 (TensorFlow Meets) (Working with TensorFlow Datasets (TensorFlow Meets))
Uploaded: 2021-01-14T10:34:33.000Z
Duration: 8 min 11 s
Description: VoiceTubeの動画で発音を聞きながら英語表現を覚えよう！学べる英語：

I'm Laurence Moroney  and I'm really delighted

to have Ryan Sepassi from the Google AI research team.

And Ryan, you work on  TensorFlow Datasets, right?

- Could you tell us a bit more about it? - Absolutely.

and what it aims to do is standardize the interface

So we've actually shipped already  with about 30 datasets

Nice, now one of the things that I found that,

particularly when trying to learn TensorFlow,

that a lot of the codes that you see in tutorials is all about,

"Here's where you go to get the data  and then you download it

and then you unzip it here  and then you put these files here

and the labels from there," and that kind of stuff.

Now part of TFDS is really to try  and simplify all of that, right?

That's absolutely right. We noticed the same thing.

Researchers and folks  who do machine learning,

is just clean the input data  right off the bat.

And input data is in all sorts of formats,

when you're generating data  at rest to share it,

it makes sense to have it in a certain format.

But we as machine learning practitioners,

that's ready to feed into a machine learning pipeline.

And so TFDS knows the format  of all the source datasets,

pre-processes them,  puts them in a standard format,

is ready to feed into your machine learning pipeline.

So it should help both advanced practitioners

and people who are  just trying to get started,

it's like a one-liner,  to get a ton of datasets.

like for people who are just  trying to get started,

all of the tutorials all seem  to have the same old datasets, right?--

It was like MNIST handwriting,  it was Fashion-MNIST--

and they were relatively easy  to get the data and to actually use it.

Now we can actually have a little bit more variety.

Yeah, break out of the box of MNIST  and Fashion-MNIST--

Not that there's anything wrong with them--

They spurred the field on a lot, datasets always do,

but it's great to get,  yeah, some variety.

Especially for beginners, you start out  with these small datasets

to sort of larger problems, larger models,

you're not going to have  to change very much.

but I've been teaching a course on Coursera,

and one of the things  that was the feedback

when we were designing  this course was that,

"Hey, we just don't want to do  the same old datasets again.

and maybe find a dataset  from Kaggle and then work on that,

and it was like, "No, let's see  if we can build our own dataset

So we have a learning datasets  that somebody can use,

and now the whole community  can use that dataset,

And the process behind that was  pretty seamless and pretty painless.

Yeah, super glad that TensorFlow Datasets  could be helpful there.

Now I can say I'm a little spoiled that I had somebody help me.

and Google helped me  to get all the metadata

that I needed to be able  to publish my set into it.

Yeah, it should be  really straightforward.

We've tried to make it  as easy as possible

and we actually have a really  detailed guide that you can follow.

And lots of people externally  have actually now contributed datasets

If you go to the  tensorflow.org/datasets site

we have a link to a guide  to add a dataset

it's pretty straightforward  to get one in.

The essential piece is  to iterate through the source data.

So whatever the format  of the source data is--

it's NumPy arrays, it's in a binary format,

or in pickled format, or whatever it is--

you just iterate through it  and yield records

and then document exactly  what the features are--

I have a class label feature,  it's got these classes;

I've got a text feature, I want to use this vocabulary.

字幕リスト動画再生

TensorFlowデータセットを使った作業 (TensorFlow Meets) (Working with TensorFlow Datasets (TensorFlow Meets))

episode

absolutely

awesome

enormous