字幕表 動画を再生する 英語字幕をプリント Katrin Erk>> It's so incredibly to process language which is ridiculous because little children can speak. Little kids learn language by being in the middle of the world. So, they have all these visions, sound and they have language to go along with that. The way that computers learn language nowadays is more like if you sat a baby down with a huge pile of The New York Times and said, "OK, here you go, now learn language." One problem with human language is that it's horribly ambiguous. Take, for example, the word "run." You can run a race, run a company or you run your car into a bog. Run-ing, running. Traditionally what people have done is try to get the computer to pick up on patterns using a dictionary. So, here is this clear list of senses, so, run has say, 20 and here's the list. And then, if you had one occurrence like "he ran the company" you would have to say, "OK, that is sense #12 and not sense #11 and not sense #13." Trouble is, all of those meanings are somewhat related but, they're still different because you draw different conclusions. Now, what's the poor computer to do. So, I'm thinking maybe we can't distinguish senses as clearly say, here's where one sense begins, here's where the next sense stops. What I'm doing is to represent each time you use a word like run with the context in which it appears. So, all of this context you can put into numbers and then present such a context as a point in a high dimensional space. And in order to represent words as these points in high dimensional space I need a whole lot of data. I need to have all that context and I need to have it in a form that I can count it, which means, 100 million words is good, a billion is better, 2 billion or 3 billion words, yeah, then you can actually get decent models. But if you want to compute with that amount of data and you do this on a single desktop machine then you better be prepared to wait for a long time and I did that for a while. I'd start an experiment, wait 3 weeks to see how it got out, that's painful, really painful. With TACC I can do the same things in a couple hours. So, we need super-computing because all of natural language processing these days but, in particular when you want to do stuff with word meanings you need to use a lot of data and I mean a lot of data.
A2 初級 いつになったら私のコンピュータは私を理解してくれるのでしょうか? (When Will My Computer Understand Me?) 159 4 Nana Chen に公開 2021 年 01 月 14 日 シェア シェア 保存 報告 動画の中の単語