Tumblr: The cure to LLMs and AI image synthesisers

There was a pretty big uproar recently when Fartomatt announced that Automattic had signed a deal with Midjourney and OpenAI to provide user-created data to their AI processes. A toothless "opt out" button was added to everyone's settings menu to appease people and give the impression that the average schmoe has a say in how their data is handled, but it is a mere illusion.

However, here's the thing. Tumblr was part of the very first data scrape when these models were being trained in the first place. The whole of the internet was upended into a funnel and force-fed into these banks of computers under the mistaken assumption that "more is better". The only thing that has changed is that, if these companies want to continue using Tumblr to train their models, they now need to pay Matt a certain number of millions to access that data. Without getting too far into how unreliable 95% of the internet is when it comes to basic information, Tumblr is quite possibly the most toxic poison you could feed to a large language model. Midjourney thought it was going to get reams and reams of slick, airbrush-style art of realistic Steven Universe characters and handmade indie manga that looks like it came off the document scanner. What they're getting is... well... this, mainly.

[Image: "Get drinked idiot"]

It bears mentioning that the title of this piece is "Get drinked idiot". And you thought LLMs were bad at English grammar and punctuation before the Tumblr deal.

The approach to AI was based upon a faulty premise: that a computer could ever make sense of the chaotic mess that occurs inside the human cranium. In order to create the appearance of understanding without doing any actual work in creating an artificial intelligence, the developers of GPT-2 and GPT-3 simply made a complicated averaging program. Computers aren't so great at abstract thought, but they're really good at weighted averages; and since the appearance of progress was all the developers needed to put forward in order to attract venture capitalists, that's all this so-called "AI" is. There's no intelligence behind it, just the same brute-force number-crunching that powered the blockchain. Oh, yeah, talking of which: isn't it interesting how AI suddenly took off after NFTs crashed and burned? The same computers that were crunching numbers day and night to find an open space on the blockchain to mint a new NFT are now doing this. Generative AI doesn't bloody work; it's been proving that it doesn't work time after time, sometimes several times in the span of a single day. Eventually, investors are going to realise this and AI will crash harder than NFTs ever did.

Eventually, ChatGPT is going to make a weighted average of the entire internet, decide that the answer is "bibby", and all the techbros who dumped their entire fortunes into this failing technology are going to go looking for patient zero. When they do, they'll arrive at Matt Mullenweg's Amazing Data Ruiner, and that will be the end of his rather transphobic career. He'll be running off into the sunset, hotly pursued by Sam Altman, David Holz, and the entire OpenAI board of directors, all carrying torches and pitchforks.

--2 March 2024--
