Name: When Will My Computer Understand Me?
Uploaded: 2021-01-14T04:46:49.000Z
Duration: 2 min 48 s
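Description: Learn English expressions while listening to the pronunciation in VoiceTube videos! English you can learn: to-infinitives, modal auxiliary verbs (A2), gerunds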

Katrin Erk>> It's so incredibly [hard] to process language, which is ridiculous

because little children can speak. Little kids learn language

by being in the middle of the world. So, they have all these

visions, sound and they have language to go along with that.

The way that computers learn language nowadays is

[as if someone handed them] a huge pile of The New York Times and said, "OK, here you go, now learn language."

One problem with human language is that it's horribly

ambiguous. Take, for example, the word "run." You can

run your car into a bog. Run-ing, running.

Traditionally what people have done is try to get the computer to

pick up on patterns using a dictionary. So,

here is this clear list of senses, so "run" has, say, 20 [senses, and]

if you had one occurrence like "he ran the company" you would have to say,

"OK, that is sense #12 and not sense #11 and not sense #13."

Trouble is, all of those meanings are somewhat related

but they're still different because you draw different conclusions.
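
[Editor's sketch: the sense-list approach described above can be illustrated with a small word-overlap heuristic. The three-entry sense inventory, the sense ids, and the `pick_sense` helper are all hypothetical and are not from the talk; a real dictionary lists many more senses, and the speaker does not say which method was actually used.]

```python
# Hypothetical mini sense inventory for "run" (illustration only).
SENSES = {
    "run#1": "move fast on foot",
    "run#2": "manage or direct a company or organization",
    "run#3": "operate a machine or program",
}

def pick_sense(context_words, senses=SENSES):
    """Return the sense id whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}

    def overlap(sense_id):
        return len(context & set(senses[sense_id].split()))

    # The approach forces exactly one sense to win, even when several are related.
    return max(senses, key=overlap)

chosen = pick_sense("he ran the company".split())
```

[Here `chosen` is the "manage a company" entry because its gloss shares the word "company" with the sentence. The hard choice of a single sense id is the limitation the speaker points out.]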

Now, what's the poor computer to do? So, I'm thinking

maybe we can't distinguish senses as clearly as to say, "here's where one

sense begins, here's where the next sense stops." What I'm doing

is to represent each time you use a word like run

with the context in which it appears. So, all of this context

you can put into numbers and then represent such a context as a point

in a high-dimensional space. And in order to represent

words as these points in high-dimensional space I need a whole

lot of data. I need to have all that context and I need to have it in a form

that I can count, which means 100 million words is good,

a billion is better, 2 billion or 3 billion words, yeah,

then you can actually get decent models. But if you want to

compute with that amount of data and you do this on a single desktop machine

then you better be prepared to wait for a long time, and I did that for a while.

I'd start an experiment, wait 3 weeks to see how it came out.

With TACC I can do the same things in a couple of hours. So, we need supercomputing because

in all of natural language processing these days, but in particular when you want to do

stuff with word meanings, you need to use a lot of data, and I mean a lot of data.
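
[Editor's sketch: a minimal illustration of the idea described in the talk, that each occurrence of a word can be represented by counts of its surrounding words and treated as a point in a high-dimensional space. The window size, the example sentences, and the function names `context_vector` and `cosine` are assumptions made for this illustration; the speaker's actual models are built from hundreds of millions of words and are not specified here.]

```python
import math
from collections import Counter

def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of tokens[index]."""
    lo = max(0, index - window)
    neighbors = tokens[lo:index] + tokens[index + 1:index + 1 + window]
    return Counter(neighbors)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy sentences (assumed); each dimension of the space is one context word.
a = "she will run the company well".split()
b = "they run the company together".split()
c = "kids run in the park".split()

va = context_vector(a, a.index("run"))
vb = context_vector(b, b.index("run"))
vc = context_vector(c, c.index("run"))

# Occurrences used in similar contexts end up closer together.
closer_to_b = cosine(va, vb) > cosine(va, vc)
```

[In this toy example the two "run the company" occurrences are more similar to each other than either is to "run in the park", without any hard boundary between senses. With only three sentences the counts are far too sparse to mean much, which is why the approach needs very large corpora and, as the speaker says, a lot of computing.]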