This is all a conspiracy, don't you know that, it's a conspiracy. Yes, yes, yes!

Good evening, my fellow Americans. Fate has ordained that the men who went to the moon to explore in peace will stay on the moon to rest in peace.

That President Nixon video you just watched is a deepfake. It was created by a team at MIT as an educational tool to highlight how manipulated videos can spread misinformation, and even rewrite history. Deepfakes have become a new form of altering reality, and they're spreading fast. The good ones can chip away at our ability to discern fact from fiction, testing whether seeing is really believing. Some have playful intentions, while others can cause serious harm.

People have put out high-profile examples that have been very good, and I think that moved the discussion forward, both in terms of, wow, this is what's possible given enough time and resources, and can we actually tell, at some point in time, whether things are real or not? A deepfake doesn't have to be a complete picture of something. It can be a small part that's just enough to really change the message of the medium.

See, I would never say these things, at least not in a public address. But someone else would. Someone like Jordan Peele.

A deepfake is a video or an audio clip that's been altered to change the content using deep learning models. The deep part of the deepfake that you might be accustomed to seeing often relies on a specific machine learning tool.

A GAN is a generative adversarial network, and it's a kind of machine learning technique. So in the case of deepfake generation, you have one system that's trying to create a face, for example. And then you have an adversary that is designed to detect deepfakes. And you use these two together to help the first one become very successful at generating faces that are very hard to detect by another machine learning technique. And they just go back and forth. And the better the adversary, the better the producer will be. (A minimal code sketch of this generator-versus-adversary loop appears below.)

One of the reasons why GANs have become a go-to tool for deepfake creators is the data revolution we're living in.

Deep learning has been around a long time; neural networks were around in the '90s, and then they disappeared. And what happened was the internet. The internet is providing enormous amounts of data for people to train these things with, along with armies of people giving annotations. That allowed these neural networks, which really were starved for data in the '90s, to come to their full potential.

While this deep learning technology improves every day, it's still not perfect.

If you try to generate the entire thing, it looks like a video game. Much worse than a video game in many ways. And so people have focused on just changing very specific things, like a very small part of a face, to make it kind of resemble a celebrity in a still image, or being able to do that and allow it to go on for a few frames in a video.

Deepfakes first started to pop up in 2017, after a Reddit user posted videos showing famous actresses in porn. Today, these videos still predominantly target women, but have widened the net to include politicians saying and doing things that haven't happened.

It's a future danger. And a lot of the groups that we work with are really focused on future dangers and potential dangers, and staying abreast of that.

One of these interested groups has been DARPA. They sent out a call to researchers about a program called Media Forensics, also known as MediFor.
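The generator-versus-adversary loop described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration in PyTorch using toy one-dimensional data rather than faces; the model sizes, data, and training settings are assumptions for the example, not anything from the systems discussed in the video.

```python
# Toy GAN training loop: a "producer" (generator) learns to turn random noise
# into samples that resemble real data, while an "adversary" (discriminator)
# learns to tell real samples from generated ones. They alternate, and each
# one's improvement pressures the other to improve.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n=64):
    # Stand-in for real data: samples from a Gaussian centred at 4.0.
    # A face-generating GAN would use real images here instead.
    return torch.randn(n, 1) * 0.5 + 4.0

# The producer: maps 8-dimensional noise to a single number.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# The adversary: scores a sample with the probability that it is real.
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Train the adversary: real samples get label 1, generated samples label 0.
    real = real_batch()
    fake = generator(torch.randn(64, 8)).detach()  # detach so G isn't updated here
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the producer: try to make the adversary call its output real (label 1).
    fake = generator(torch.randn(64, 8))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# After training, generated samples should drift toward the real mean (about 4.0).
print(generator(torch.randn(1000, 8)).mean().item())
```

The key point is the alternation: the adversary is trained to separate real from generated samples, and the producer is trained to fool it, so the better the adversary gets, the better the producer has to become.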
It's a DARPA project that's geared towards the analysis of media. And originally it started off very much focused on still imagery, and detecting: did someone insert something into this image? Did someone remove something? It was before deepfakes became prominent. The project's focus changed when deepfakes emerged.

At SRI International, Aaron and his team have been working across disciplines to create a multi-pronged approach for detecting deepfakes. The system they've developed is called SAVI.

So our group focused on speech. And in the context of this SAVI program, we worked with people in the Artificial Intelligence Center who are doing vision, and put our technologies together to collaborate on coming up with a set of tools that can detect things like: here's the face. Here's the identity of the face. It's the same person that was earlier in the video. The lips are moving, okay. And then we use our speech technology and say, "Can we verify that this piece of audio and this piece of audio came from the same speaker or a different speaker?" And then put those together as a tool that would say, "If you see a face and you see the lips moving, the voice should be the same, or you want to flag something."

However, there is always a worry that making these detection systems more available could unintentionally provide deepfake creators with workarounds. If released, the methods meant to catch altered media could potentially drive the next generation of deepfakes. As a result, these detection systems have to evolve. In its newest iteration, Aaron gave us a run-through of how various aspects of the system work, without giving too much away.

This is an explicit lip-sync detection. What we're doing here is we're learning from audio and visual tracks what the lip movement should be given some speech, and vice versa. And we're detecting when that deviates from what you would expect to see and hear.

While some techniques can work well on their own, most fare better when combined into a larger detection system (a toy score-fusion sketch appears at the end of this transcript).

So in this video you'll see Barack Obama giving a speech about Tom Vilsack, one of his departing cabinet members. And we're running this live through our system here, which is processing it to identify two kinds of information. The top one, where it says natural, is a model that's detecting whether this is natural or some type of synthesized or generated speech, essentially a deepfake. The bottom one is detecting identity based on voice. So we have a model of Barack Obama, and it's saying this continues to verify as Obama, and this will continue like this until now, when we get Jordan Peele imitating Barack Obama.

We're entering an era in which our enemies can make it look like anyone is saying anything at any point in time.

And that whole section here was Jordan Peele. He's natural, but he's not Obama. I would say for detection of synthesis or voice conversion, we're in the sub-5% error-rate range for what I would call laboratory conditions. And probably in the real world, it would be higher than that. That's why having these multi-pronged things is really important.

However, technology is only part of the equation. How we as a society respond to these altered pieces of content is just as important.

The media tends to focus on the technological aspects of things rather than the social. The problem is less the deepfakes and more the people who are very willing to believe something that is probably not well done, because it confirms something that they already believe. Reality becomes an opinion rather than fact.
And it gives you license to misbelieve reality. It's really hard to predict what will happen. You don't know if this is going to be something that five years from now people actually nail down or if it's 40 years from now. It's one of those things that is still sort of exciting, interesting and new and you don't know what the limitations are yet.
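As a closing illustration of the multi-pronged detection idea described above, here is a toy score-fusion sketch in Python. It is a hypothetical example, not the SAVI system: the three component scores (lip-sync consistency, speaker verification, and natural-versus-synthetic speech) are assumed to come from separately trained detectors, and the names and thresholds are made up for the example.

```python
# Hypothetical fusion of detector outputs, in the spirit of combining several
# weaker signals (lip sync, speaker identity, synthetic-speech detection)
# into one flagging decision.
from dataclasses import dataclass

@dataclass
class ClipScores:
    lip_sync: float        # 0..1: mouth motion matches what the audio predicts
    same_speaker: float    # 0..1: voice verifies against the on-screen identity
    natural_speech: float  # 0..1: speech looks natural rather than synthesized

def flag_clip(scores: ClipScores,
              lip_thresh: float = 0.5,
              speaker_thresh: float = 0.5,
              natural_thresh: float = 0.5):
    """Flag a clip if any single detector falls below its threshold, and say why."""
    reasons = []
    if scores.lip_sync < lip_thresh:
        reasons.append("lip movement deviates from what the audio predicts")
    if scores.same_speaker < speaker_thresh:
        reasons.append("voice does not verify as the on-screen identity")
    if scores.natural_speech < natural_thresh:
        reasons.append("speech appears synthesized or voice-converted")
    return len(reasons) > 0, reasons

# Example: an impersonator produces natural speech with the wrong voice,
# so only the speaker-verification check fails.
flagged, why = flag_clip(ClipScores(lip_sync=0.9, same_speaker=0.1, natural_speech=0.95))
print(flagged, why)
```

In this toy setup, the Jordan Peele segment described earlier would pass the natural-speech check but fail speaker verification, which is why combining signals can catch cases that any single detector would miss.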