It says “opt-out” in the title.
Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.
Spent many years on Reddit and is now exploring new vistas in social media.
Indeed. Firefox already has “sponsored links” and such in the built-in homepage, I simply disable those when I first install it and get on with life.
Big projects like Firefox need big money to support them. If you don’t want Firefox to be beholden to Google, it needs to find ways to earn some on its own.
It’s true, go ahead and read the ToS. It only grants a license to Reddit to use your content. It explicitly says:
You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
And then goes on to enumerate what you’re licensing them to do with it. There’s also a section titled “Changes to these Terms” about how they can change the ToS going forward.
No problem. I’m not a lawyer myself, mind you, but I’ve encountered issues like these enough times over the years that I feel I’ve got a pretty good layman’s grasp. Plus I’ve actually read some of these ToSes and considered them from the perspective of the company running the site, which I suspect most people arguing about this stuff haven’t actually done.
I wish the Fediverse sites running without rigorous ToSes well, of course, but I suspect failing to establish clear rights to use the content people post on them is likely to end up biting them in the long run. At least the bigger ones. Hobby-level websites get away with a lot because they don’t have significant money on the line.
You could ask a lawyer, I suppose. But the basic gist of this is “we don’t know what we might need to do with this data in the future, so we put ‘we can do anything with this data’ into the ToS so that we know that if the need arises we won’t find ourselves unable to do what we need to do with it.” Any website that doesn’t do this could find itself unable to implement new features or comply with new laws they didn’t think of when crafting the original ToS.
At the very minimum a ToS needs to have some way to update and apply retroactively to old data, which ends up being “we can do anything with this data” with extra steps.
Have you not experimented with LLMs? They come up with new things all the time.
A user’s data still belongs to the user when they post it on sites like Reddit and such, too. The ToS doesn’t take ownership away from them, at least not in any case that I’ve seen. It just gives the site the license to use it as well.
If it makes you feel better, the thing that annoys me most is not so much that this is happening but more how everybody is suddenly surprised by it and complaining about it. The data-harvesting itself doesn’t really harm anyone.
I’m just venting, really. I know it’s not going to make a real difference.
I suppose if you go waaaay back it was different, true. Back in the days of Usenet (as a discussion forum rather than as the piracy filesharing system it’s mostly used for nowadays) there weren’t these sorts of ToS on it and everything got freely archived in numerous different places because that’s just how it was. It was the first Fediverse, I suppose.
The ironic thing is that kbin.social’s ToS has no “ownership” stuff in it either. For now, at least, the new ActivityPub-based Fediverse is in the same position that Usenet was - I assume a lot of the other instances also don’t bother with much of a ToS and the posts get shared around beyond any one instance’s control anyway. So maybe this grumpy old-timer will get to see a bit of the good old days return, for a little while. That’ll be nice.
Well, a large part of my frustration stems from the “I’ve seen this for decades” part - longer than many of the people who are now raising a ruckus have been alive. So IMO it’s always been this way and the “social contract we’ve adapted to” is “the social contract that we imagined existed despite there being ample evidence there was no such thing.” I’m so tired of the surprised-pikachu reactions.
Combined with the selfish “wait a minute, the stuff I gave away for fun is worth money to someone else now? I want money too! Or I’m going to destroy my stuff so that nobody gets any value out of it!” reactions, I find myself bizarrely ambivalent and not exactly on the side of the common man vs. the big evil corporations this time.
I wouldn’t really trust that promise, frankly. I just checked their terms of service and it has the usual clause:
You must own all rights, title, and interest, including all intellectual property rights, in and to, the User Content you make available on the Services. ASSC requires licenses from you for that User Content to operate the Services. By posting User Content on the Services, you grant ASSC a royalty-free, perpetual, irrevocable, non-exclusive, sublicensable, worldwide license to use, reproduce, distribute, perform, publicly display or prepare derivative works of your User Content.
Which isn’t really surprising, it’s standard boilerplate for a reason. They don’t want to be caught in a situation where they can’t function legally any more. They say they won’t sell the company or your data, and they might even believe that right now, but who knows what the future might bring? They have the ability to do so if the circumstances arise.
Hardly. They earn money by being paid by their users, but they can earn more money by being paid by their users and also selling their users’ data. The goal is more money, so it makes sense for them to do that. It’s not crazy.
From the WordPress Terms of Service:
License. By uploading or sharing Content, you grant us a worldwide, royalty-free, transferable, sub-licensable, and non-exclusive license to use, reproduce, modify, distribute, adapt, publicly display, and publish the Content solely for the purpose of providing and improving our products and Services and promoting your website. This license also allows us to make any publicly-posted Content available to select third parties (through Firehose, for example) so that these third parties can analyze and distribute (but not publicly display) the Content through their services.
Emphasis added. They told you what they could do with the content you gave them, you just didn’t listen.
I’m sorry if I’m coming across harsh here, but I’m seeing this same error being made over and over again. It’s being made frequently right now thanks to the big shakeups happening in social media and the sudden rise of AI, but I’ve seen it sporadically over the decades that I’ve been online. So it bears driving home:
Are you serious? We’re speaking in the Fediverse right now. It’s notable in its difference. Though instances have their own TOSes, so it’d be pretty trivial to set one up to harvest content for AI training as well.
Indeed. I frequently use LLMs as brainstorming buddies while working on creative things, like RPG adventure planning and character creation. I want the AI to come up with new and unexpected things that never existed before.
If I have need of the AI to account for “ground truths” then I use things like retrieval-augmented generation or database plugins that inject that stuff into the context.
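For anyone unfamiliar with the idea, here is a rough sketch of what "injecting that stuff into the context" can look like. It is a toy, not how any particular plugin works: real retrieval-augmented setups usually use embeddings and a vector store, while this one just ranks notes by keyword overlap. The function names, the stopword list, and the campaign notes are all made up for illustration.

```python
import re

# Hypothetical stopword list, just enough for this toy example.
STOPWORDS = {"the", "is", "in", "a", "an", "of", "who", "was", "what"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def retrieve(question, facts, k=2):
    """Return up to k facts sharing at least one keyword with the question.

    Keyword overlap stands in for the embedding similarity search a
    real retrieval-augmented system would normally use.
    """
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(fact)), i, fact)
              for i, fact in enumerate(facts)]
    # Highest overlap first; ties keep the original order of the notes.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [fact for score, _, fact in ranked[:k] if score > 0]


def build_prompt(question, facts, k=2):
    """Put the retrieved facts ahead of the question in the prompt text."""
    lines = ["Use only the facts below when they are relevant."]
    lines += [f"- {fact}" for fact in retrieve(question, facts, k)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Made-up "ground truths" for a tabletop campaign.
CAMPAIGN_NOTES = [
    "The innkeeper in Redford is named Mara.",
    "The dragon Vex sleeps under Mount Karn.",
    "Redford was founded 200 years ago.",
]

prompt = build_prompt("Who is the innkeeper in Redford?", CAMPAIGN_NOTES)
print(prompt)
```

The resulting string is what would get sent to the model, so it sees the relevant notes alongside the question instead of having to invent an innkeeper on the spot.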
They’re giving you services in exchange for your contents.
Does nobody even think about TOS any more? You don’t have to read any specific one, just realize the basic universal truth that no website is going to accept your contents without some kind of legal protection that allows them to use that content.
Academic Torrents has Reddit data up to December 2023. This data isn’t live-updated; my understanding is that it’s scraped when it’s first posted. That’s how services like removeddit worked: they would show the “original” version of a post or comment from when it was scraped rather than the edited or deleted version that Reddit shows now.
The age isn’t really the most important thing when it comes to training a base AI model. If you want to teach it about current events there are better ways to do that than social media scrapes. Stuff like Reddit is good for teaching an AI about how people talk to each other.
I didn’t say that everything in Star Trek was AGI, just that you can find examples there.
I use quotation marks there because what is often referred to as AI today is not whatsoever what the term once described.
The field of AI has been around for decades and covers a wide range of technologies, many of them much “simpler” than the current crop of generative AI. What is often referred to as AI today is absolutely what the term once described, and still does describe.
What people seem to be conflating is the general term “AI” and the more specific “AGI”, or Artificial General Intelligence. AGI is the stuff you see on Star Trek. Nobody is claiming that current LLMs are AGI, though they may be a significant step along the way to that.
I may be sounding nitpicky here, but this is the fundamental issue that the article is complaining about. People are not well educated about what AI actually is and what it’s good at. It’s good at a huge amount of stuff, it’s really revolutionary, but it’s not good at everything. It’s not the fault of AI when people fail to grasp that, any more than it’s the fault of the car when someone gets into it and then is annoyed it won’t take them to the Moon.
Which is why nobody trains on ONLY AI-generated data.
Really, experts have thought of this stuff already. Because they’re experts. Synthetic data means that the amount of “real” data required is much less, so giant repositories like Reddit aren’t so important.
I find a ton of uses for quick Python scripts hammered out with Bing Chat to get random stuff done.
It’s also super useful when brainstorming and fleshing out stuff for the tabletop roleplaying games I run. Just bounce ideas off it, have it write monologues, etc.