
A month since the release of GPT-2

Author: Piotr Czapla

Remember GPT-2 from OpenAI? (You can read my first reaction here.) More than a month has passed since the release, but it is still all over the news, and the fact that they haven't released the pre-trained model has sparked a lively discussion in the Deep Learning community.

I lean towards waiting with the release, as fake text is a bit of a different beast than fake videos or images. With images, we have had years to adjust to the change thanks to Photoshop. With text, we haven't had anything like that before, so it is harder to spot the fakes. Sarah Constantin explains this well in her article Humans Who Are Not Concentrating Are Not General Intelligences. If you haven't read it yet, make sure to give it a look; it will be worth your time.

But I'm not sure that the 6-month quarantine proposed by OpenAI is justified. We already have good models that can simulate Wikipedia; have a look at this generated article about Jeremy Howard.

Moreover, it has been argued that investing about 100k is enough to train a model like GPT-2, so it won't be long before other, larger players replicate it. There is a risk that if the model isn't released soon enough, we won't be able to build models that actually help fight fake news, like the research done by MIT, IBM, and Harvard. The researchers created a model that estimates the likelihood that a piece of text was written by an AI. It requires the weights of the GPT-2 model, and it will only work as long as GPT-2 keeps making detectable mistakes. It isn't much, but it is a step in the right direction. I would love to see it implemented in browsers.
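To give a feeling for how such a detector can work, here is a minimal sketch of the underlying idea, using the Hugging Face transformers library and the small public GPT-2 weights. The function name top_k_fraction and the choice of k are illustrative assumptions of mine, not the researchers' actual code: the sketch simply measures how often each token in a text falls inside GPT-2's own top-k predictions, since sampled text tends to stay suspiciously "predictable" to the model that generated it.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the publicly released small GPT-2 model and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def top_k_fraction(text, k=10):
    # Fraction of tokens that fall inside the model's top-k predictions.
    # Generated text tends to score high, because the sampler keeps
    # picking tokens the model already ranks highly; human text is
    # usually more surprising.
    ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids)[0]          # shape: (1, seq_len, vocab_size)
    hits = 0
    for pos in range(ids.size(1) - 1):
        top_ids = torch.topk(logits[0, pos], k).indices
        if ids[0, pos + 1] in top_ids:  # was the next token "predictable"?
            hits += 1
    return hits / max(ids.size(1) - 1, 1)

print(top_k_fraction("The quick brown fox jumps over the lazy dog."))
```

Human-written text usually yields a noticeably lower fraction than text sampled from GPT-2 itself, and that gap is the signal such detectors exploit; it also explains why they need the generator's weights and stop working once the generator no longer makes those mistakes.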

Overall, I'm excited to see the NLP space evolving so rapidly; after all, models like these were bound to appear sooner or later. I just hoped they would appear more gradually.