Look, we can debate the proper and private way to do Captchas all day, but if we remove the existing implementation we will be plunged into a world of hurt.
I run tucson.social - a tiny instance with barely any users and I find myself really ticked off at other Admin’s abdication of duty when it comes to engaging with the developers.
For all the Fediverse discussion on this, where are the github issue comments? Where is our attempt to convince the devs in this.
No, seriously WHERE ARE THEY?
Oh, you think that just because an “Issue” exists to bring back Captchas is the best you can do?
NO it is not the best we can do, we need to be applying some pressure to the developers here and that requires EVERYONE to do their part.
The Devs can’t make Lemmy an awesome place for us if us admins refuse to meaningfully engage with the project and provide feedback on crucial things like this.
So are you an admin? If so, we need more comments here: https://github.com/LemmyNet/lemmy/issues/3200
We need to make it VERY clear that Captcha is required before v0.18’s release. Not after when we’ll all be scrambling…
EDIT: To be clear I’m talking to all instance admins, not just Beehaw’s.
UPDATE: Our voices were heard! https://github.com/LemmyNet/lemmy/issues/3200#issuecomment-1600505757
The important part was that this was a decision to re-implement the old (if imperfect) solution in time for the upcoming release. mCaptcha and better techs are indeed the better solution, but at least we won’t make ourselves more vulnerable at this critical juncture.
There are other options.
I’m just a hobbyist, but I have built a couple websites with a few hundred users.
A stupidly simple and effective option I’ve been using for several years now, is adding a dummy field to the application form. If you add an address field, and hide it with CSS, users won’t see it and leave it blank. Bots on the other hand will see it and fill it in, because they always fill in everything. So any application that has an address can be automatically dropped. Or at least set aside for manual review.
I don’t know how long such a simple trick will work on larger sites. But other options are possible.
Fun fact, I purposefully goaded the bots into attacking my instance.
Turns out they aren’t even using the web form, they’re going straight to the register api endpoint with python. The api endpoint lives at a different place from the signup page and putting a captcha in front of that page was useless in stopping the bots. Now, we can’t just challenge requests going to the API endpoint since it’s not an interactive session - it would break registration for normal users as well.
The in-built captcha was part of the API form in a way that prevented this attack where the standard Cloudflare rules are either too weak (providing no protection) or too strong (breaking functionality).
In my case I had to create some special rules to exclude python clients and other bots while making sure to keep valid browser attempts working. It was kind of a pain, actually. There’s a lot of Lemmy that seems to trip the optional OWASP managed rules so there’s a lot of “artisanally crafted” exclusions to keep the site functional.
Anyways, I guess my point is form interaction is just one way to spam sites, but this particular attacker is using the backend API and forgoing the sign-up page entirely. Hidden fields wouldn’t be useful here, IMO.
Couldn’t the bots just be programmed to not fill out that field? Or not fill out any field flagged as hidden?
You’d think so.
But it’s not flagged as hidden. Instead you use CSS to set display as none. So the bot needs to do more than look at the direct HTML. It needs to fully analyze all the linked HTML, CSS, and even JavaScript files. Basically it needs to be as complex as a whole browser. It can’t be a simple script anymore. It becomes impracticality complicated for the not maker.
This might work against very generic bots, but it won’t work against specialized bots. Those wouldn’t even need to parse the DOM, just recreate the HTTP requests.
Which is why you’d need something else for popular sites worth targeting directly. But there are more options than standard capta’s. Replacing them isn’t necessarily a bad idea.
This is what I’m worried about. As the fediverse grows and gains popularity it will undoubtedly become worth targeting. It’s not hard to imagine it becoming a lucrative target for things like astroturfing, vote brigading etc bots. For centralized sites it’s not hard to come up with some solutions to at least minimize the problem. But when everyone can just spin up a Lemmy, Kbin, etc instance it becomes a much, much harder problem to tackle because instances can also be ran by bot farms themselves, where they have complete control over the backend and frontend as well. That’s a pretty scary scenario which I’m not sure can be “fixed”. Maybe something can be done on the ActivityPub side, I don’t know.
That’s where simple defederation happens. It’s mostly why behaww cut off lemmy.world.
What if you have 100s or 1000s of such instances? At some point you defeat the entire purpose of the federation.
I foresee islands of instances not federated with each other, like cities without a road connecting them. There could be an island of instances that will be full of bots and illegal shit with open sign ups, and there will be other islands with stricter requirements, effectively no bots run by people who want good social media.
That’s when you go to a federation white-list, instead of black-list.
Yes, but it would take more work specific to this problem, which if it’s not a widespread technique would be viewed as impractical.
Thanks for sharing that tip, I’m working with someone doing a small instance and we aren’t for sure we want to be allowing applications, but if we do this is good to think about!
When you automate a browser process like signing up, you very likely manually set in your code the fields you want to fill, not sure why a bot would do that automatically… I don’t think this would be effective at all
The bots for the most part are generic. They fill in all fields with randomly generated nonsense mostly. If the site is large enough you could make a bespoke script, which is why I’m not sure how well it will scale to large sites.
But that’s only the simplest option. Annother I’ve see is using a collection of movie posters, you have the user pick the title from 5 or 6 options. There are lots of simple ways to defeat bots of all kinds.