tl;dr AI slop is here to stay. We cannot detect AI content, thus we must mark human content. This requires identifying the human to a varying degree. The fundamental technology exists, and it will be integrated into browsers.

AI slop is here to stay

I tried to search for C6 envelope templates on the web – hardly a topic to bring forth the next billionaire. Nonetheless, two-thirds of the webpages were clearly AI generated. (Current research estimates ⅓ of new websites to be AI slop.)

My ad-blocker saved me, but apparently even slop sites on such a benign topic are economically viable. This won’t stop once Claude et al. charge profitable fees (read: > $1,000 per user per month) for their service. Such a slop site plays the long game. The economic incentive doesn’t change even if it takes a local model on a tiny machine a week to create the site.

Detecting AI content is impossible

AI can create content at superhuman speed, overwhelming any manual review system. Therefore, we would need to rely on automated detection systems. The only realistic chance is AI systems that detect AI content. We can only lose this arms race: while creating the slop site, the “author” can run it against AI detectors and fine-tune until it passes.

Watermarking does work technically for higher-bandwidth content (images, audio, video), but the “authors” can easily use custom AI systems that don’t create watermarks.

We need to recognize genuine human content

We can’t detect AI content, but still want to enjoy human-made content. But why?

In the C6 envelope example above, none of the AI slop sites hosted actual templates. They contained lots of generic text on the topic, and linked to templates on, or stole templates from, other sites. They made it harder to find what I was actually looking for.

Skimming a site for downloadable templates at least doesn’t take too much mental effort. But for a lot of slop sites, we can only determine their uselessness after reading for some time. They consume our mental resources without providing value.

At least such slop sites are only after our money (via ads). Others want to misinform us. Recognizing such attempts could very well prove to be vital for our societies.

Future progress on AI also depends on recognizing human-written text: LLMs trained on their own generated output degrade, a phenomenon known as model collapse.

Current AI crawlers overload many websites by bombarding them with thousands of requests, ignoring the sites’ rules (robots.txt) about which parts they should leave alone. With reliable identification, a website could simply terminate any connection from non-human counterparts.
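The rules these crawlers ignore look like this – a minimal robots.txt asking two well-known AI crawler user agents (OpenAI’s GPTBot and Anthropic’s ClaudeBot) to stay away entirely, while leaving the site open to everyone else. Compliance is entirely voluntary, which is exactly the problem:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
```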

If you can’t detect the enemy, identify the allies

If we can’t detect the bad apples, we need to identify the good ones. We can’t rely on “I’m a human, pinky swear” promises, as any AI can forge such statements. We need a (mostly) tamper-proof system of identification.

Thankfully, we don’t need to invent such a system from scratch: the public key infrastructure maintained and enforced by the CA/Browser Forum serves a similar purpose. It makes sure that visiting https://www.wikipedia.org/ shows us the contents of Wikipedia, without anybody tampering with them. It is fully integrated into the browser; we don’t need to do anything special.

It is a very complex but mostly working system that avoids all-powerful gatekeepers: the “trust roots” are called certificate authorities, and we can have lots of them; any website can participate; and all browsers support the standard.
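The cryptographic primitive underneath this infrastructure – digital signatures – is also what could mark human content: sign it once, and any later tampering is detectable. A minimal sketch using Node’s built-in crypto with an Ed25519 key pair; the `attestHuman`/`verifyHuman` names are made up for illustration, and in a real system the public key would be certified by a trust root rather than generated locally:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical "human attestation" key pair. In a real system the public
// key would be certified by a trust root (a CA, or a country).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The author signs their content; the signature travels with it.
function attestHuman(content: string): Buffer {
  return sign(null, Buffer.from(content, "utf8"), privateKey);
}

// Anyone holding the certified public key can check the attestation.
function verifyHuman(content: string, signature: Buffer): boolean {
  return verify(null, Buffer.from(content, "utf8"), publicKey, signature);
}

const post = "I measured three C6 envelope templates by hand.";
const sig = attestHuman(post);
console.log(verifyHuman(post, sig));               // true
console.log(verifyHuman(post + " (edited)", sig)); // false – tampering detected
```

The point is not the signing itself but the chain of trust behind the public key – exactly what the CA infrastructure already provides for websites.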

HTTPS gives us users confidence that we see the genuine website – but how can the website trust the user? We also have existing technology for this: the Web Authentication API standard defines how a website can trust a browser to identify the user in front of it, e.g. via user presence, meaning pressing a physical button.
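In browser terms, such a user-presence check boils down to an assertion request. A sketch of the options a site might construct for `navigator.credentials.get()` – the relying-party ID `forum.example` is illustrative, and the actual authenticator call only works in a browser:

```typescript
// Sketch of WebAuthn assertion options. WebAuthn always tests user presence
// (e.g. a button press); userVerification additionally asks for PIN/biometrics.
function buildPresenceCheck(challenge: Uint8Array) {
  return {
    publicKey: {
      challenge,                             // random server bytes, replay protection
      rpId: "forum.example",                 // illustrative: the site asking "is a human there?"
      userVerification: "preferred" as const,
      timeout: 60_000,
    },
  };
}

// In the browser, the site would hand these options to the authenticator:
//   const assertion = await navigator.credentials.get(buildPresenceCheck(challenge));
// The returned assertion is signed by the authenticator's private key, which
// never leaves the hardware – a forged "I'm a human" reply won't verify.

const options = buildPresenceCheck(new Uint8Array(32));
console.log(options.publicKey.userVerification); // "preferred"
```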

These technologies are building blocks for a future standard that reliably identifies humans. To fight AI slop this standard doesn’t need to be bullet-proof; it must only make AI slop more expensive than troll farms and scam centers. (Such a standard might even help against the latter exploitations, if it included revocation of misused anonymous identities.)

This would not stop humans from using AI to create their content, and that’s ok: use cases like translation, grammar fixes, and picture touch-up are legitimate and helpful AI applications. As long as a human is involved and takes responsibility for the result, this doesn’t add to the tide of AI slop.

All your identity are belong to us?

Does this mean every website will know everything about every visitor? Thankfully, no. Standards like ISO 18013-5 (also used in the European age verification app) allow for different levels of attestation. Similar technology will be part of every web browser, just as HTTPS is now. The “trust roots” will be countries, the same as for official documents like driver’s licenses or passports.

Each website can choose the level of attestation it requires:

unknown
No attestation at all – same as today’s public websites. This will be the Internet’s 4chan. It exists, some may visit it for specific use cases, but it will generally be ignored. Search engines will not show content from such sites (at least by default).
anonymous
The website only asks that the user is a human; it doesn’t care who the user is, and never gets this information. This will be the default for most websites, akin to a reliable version of today’s “I’m not a bot” checkboxes. The aforementioned standards ensure that only serious collusion between browsers, internet providers, website owners, and countries might reveal the actual identity of the visitor.
pseudonymous
The website asks for an alias (or username) of the human user. The same website always gets the same alias for the same user, but a different website will get a different alias.
This will be for websites like forums that in general don’t care who the user is, but want to prevent the same user from creating multiple accounts. Depending on the implementation, the website together with the originating country might reveal the actual identity of the visitor.
identified
The website asks for a proven identity of the human user. This is already used today for websites like banks, company-internal systems, or government agencies. The website owner already knows the user’s identity. The user also expects to be identified – nobody wants anonymous logins to their bank account!
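The per-site alias property of the pseudonymous level can be pictured as a keyed hash: the same user secret and the same site always yield the same alias, while a different site yields an unlinkable one. A toy sketch – real attestation schemes are far more involved, and the HMAC construction, secret, and site names here are purely illustrative:

```typescript
import { createHmac } from "node:crypto";

// Illustrative long-term secret held only by the user's identity wallet.
const userSecret = "key-only-the-wallet-knows";

// Derive a stable per-site pseudonym: HMAC(userSecret, siteOrigin).
// Without the secret, aliases from different sites cannot be linked.
function aliasFor(siteOrigin: string): string {
  return createHmac("sha256", userSecret)
    .update(siteOrigin)
    .digest("hex")
    .slice(0, 16);
}

console.log(aliasFor("forum.example") === aliasFor("forum.example")); // true: same site, same alias
console.log(aliasFor("forum.example") === aliasFor("shop.example"));  // false: sites can't collude by alias
```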

The end of free Internet?!

Lots of people consider anonymous Internet usage the norm. This might not be warranted – most countries (including countries that claim to value freedom) have the technical ability to identify users via their IP addresses. Actual anonymity requires serious effort and constant vigilance about any potentially identifying information.

I’m not saying I like this trajectory. I argue we won’t have a good choice – either drown in AI slop or give up some level of anonymity. Done right, we might win at least a bit of consolation: actual anonymity would still be possible with the right tools; anonymous websites could be really anonymous, also towards governments; pseudonymous forum access could be sufficient to catch CSAM offenders and prevent governments from enforcing more stringent surveillance measures; and identifying to banks and government websites could become a lot less of a hassle.