AI Megathread

Griatch

@Faraday said in AI Megathread:

Now I realize that copyright laws and internet regulations are imperfect, but imagine what the world would look like if everyone had just folded over Napster.

More seriously, LLMs are far, far worse than Napster, which hurt recording companies way more than it hurt actual musicians. I’m not taking a stance on the ethics of pirating, but there’s a difference between people copying things that others have made and people outright displacing human creators.

So, if I understand you right, it’s not unethical training sourcing that is the issue for you (as it seems to be for Faraday), but the societal implications of the tech itself?
That’s a valid view. But while we can regulate and fix ethics of training sets, we won’t realistically stop AI being used and possibly upending a lot of people’s jobs in the same way as was done by countless new technologies in the past.

I’m not saying I want people to lose their jobs, I’m just saying this is something we need to learn and adapt to rather than hope that the genie can be put back in its bottle.

Faraday

@Griatch said in AI Megathread:

That’s a valid view. But while we can regulate and fix ethics of training sets, we won’t realistically stop AI being used and possibly upending a lot of people’s jobs in the same way as was done by countless new technologies in the past.

What makes most of these models powerful is the breadth of their training data. Yes, you can make an open-source model, but by definition they don’t work as well. They can’t make Pikachu fighting a duck with a lightsaber because both Pikachu and lightsabers are copyrighted/trademarked properties.

Also, given how poorly people understand intellectual property law, I really question whether these models are truly being trained only on things in the public domain. There’s such a widespread mentality of “well it’s on the internet and doesn’t have a copyright tag attached so it must be free right?” Plus people re-uploading copyrighted stuff to sharing sites with a different license. Maybe they are - I haven’t dug into it - but color me doubtful.

I’m not suggesting that we can - or should - stop all uses of a new technology. I’m just suggesting that the current hype wave that’s overselling what the tech can actually do, coupled with the unethical nature of the training sets, is creating a perfect storm of badness.

Rinel

@Griatch said in AI Megathread:

@Rinel said in AI Megathread:

@Faraday said in AI Megathread:

Now I realize that copyright laws and internet regulations are imperfect, but imagine what the world would look like if everyone had just folded over Napster.

More seriously, LLMs are far, far worse than Napster, which hurt recording companies way more than it hurt actual musicians. I’m not taking a stance on the ethics of pirating, but there’s a difference between people copying things that others have made and people outright displacing human creators.

So, if I understand you right, it’s not unethical training sourcing that is the issue for you (as it seems to be for Faraday), but the societal implications of the tech itself?

Both are issues for me, though if you pressed me I’d say I’m more worried about the effects. If it didn’t have an economic effect, it would be a lot more like pirating media to me.

I very strongly support the implementation of strict regulations on how the models are trained, with requirements that all training data be listed and freely discoverable by the public.

I’m not saying I want people to lose their jobs, I’m just saying this is something we need to learn and adapt to rather than hope that the genie can be put back in its bottle.

One of the reasons I use MJ is to understand what’s going on, so I get what you mean, but we can still shackle the genie for a while.

Griatch

@Faraday You talk as if it’s a clear-cut thing that these models are based on “theft”. Legally speaking, I don’t think this is really established yet - it’s a new type of technology and copyright law has not caught up.

If you (the human) were to study Pikachu (as presented in publicly available, but copyrighted images) and learn in detail how he looks, you would not be breaching copyright. Not until you actually took that knowledge and made fan-art of him would you be in breach of copyright (yes, fan-art is breaching copyright, it's just that it's usually beneficial to the brand and most copyright holders seldom enforce their copyright unless you try to compete or make money off it).

In the same way, an AI may know how Pikachu looks, but one could argue that this knowledge does not in itself infringe on copyright - it just knows how Pikachu looks after all, similarly to you memorizing his looks by just looking.

One could of course say that this knowledge inherently makes it easier for users of the AI to breach copyright. If you were to commission Pikachu from a human artist, both you and the artist could be on the hook for copyright infringement.

So would that put both the AI (i.e. the company behind the AI) and the commissioning human in legal trouble the moment they write that Pikachu prompt? It's interesting that a US federal court has ruled that AI-generated art cannot be copyrighted in itself. So this at least establishes that the AI does not itself have a personhood that can claim copyright (which makes sense).

Now, I personally agree with the sentiment that it doesn't feel good to have my works be included in training sets without my knowledge (yes, I've found at least 5 of my images in the training data). But my feelings (or the feelings of other artists) don't in themselves make this illegal or an act of thievery. That's up to the legal machinery to decide on, and I think it's not at all clear-cut.

Griatch

@Rinel said in AI Megathread:

I very strongly support the implementation of strict regulations on how the models are trained, with requirements that all training data be listed and freely discoverable by the public.

Proprietary models like Midjourney and OpenAI don't release any of this stuff, alas. But if you stick to OSS models, like Stable Diffusion, you can freely search their training data here (they also use other public sources). There are tens of thousands of LLMs for various purposes and active research on Hugging Face alone; they tend to be based on publicly available training data sets.

Faraday

@Griatch said in AI Megathread:

You talk as if it’s a clear-cut thing that these models are based on “theft”. Legally speaking, I don’t think this is really established yet - it’s a new type of technology and copyright law has not caught up.

I do, yes. Obviously the courts have not weighed in yet on the specific lawsuits at play, but that doesn’t prevent people from drawing their conclusions based on available evidence and knowledge of the laws.

I have seen with my own eyes these tools generate images and text that are very clearly copyright-infringing.

Arguing that they are somehow absolved of all responsibility because of how the users use the tools is like arguing that a pirate website or Napster bears no responsibility for being a repository of pirated material because it’s the users who are uploading and downloading the actual files. That has historically not worked out too well for the app makers. It’s the reason YouTube errs on the side of copyright claims - they don’t want to get drawn into that battle.

I also don’t personally find any weight to the argument that AI is ‘just learning like humans learn’. That’s like arguing that NFL teams should be allowed to use Mark Rober’s kicking robot in the Super Bowl because “it kicks just like a human does”.

Faraday

Just came across this latest insanity and felt obliged to share.

As of today, there are about half a dozen books being sold on Amazon, with my name on them, that I did not write or publish. Some huckster generated them using AI. This promises to be a serious problem for the book publishing world.

A brief update: After going back a few times with Amazon on this issue, I was notified the books would not be removed based on the information I provided. Since I do not own copyright in these AI works and since my name is not trademarked, I’m not sure what can be done.

It did eventually get sorted out, but only because this particular author had lawyers to advocate for them with Amazon.

Rinel

@Faraday said in AI Megathread:

I also don’t personally find any weight to the argument that AI is ‘just learning like humans learn’.

It's demonstrably false, as I put forward in the mermaid argument earlier. You can show a human a mermaid and tell them to make one who is half octopus instead of half fish. You can't do that with LLMs. You have to phrase the input differently when trying to generate novel ideas, because LLMs /cannot learn/. They aren't sapient. They aren't even sentient. The fact that you can use certain tools to end up with an approximate result with an LLM doesn't mean the AI is learning.

sao

I disagree that the law hasn't caught up. The law of transformative versus derivative work is directly applicable to the theory behind the training data and its use. What hasn't caught up is legislation. It's already illegal under existing common-law standards; it's just difficult to enforce because it's decided case by case, and a lot of the actual practice is stupidly based on who can afford a fancy IP lawyer and who is going to believe a shifty agreement is lawful just because it was signed.

I got into an argument about this just the other day on wyrdhold but the innocent bystanders were screaming and crying about the crossfire so I had to stop.

The element of human creativity to create a new thing is already the basis of the legal distinction between transformative (new art) and derivative (copied art) work.

Faraday

@sao said in AI Megathread:

The element of human creativity to create a new thing is already the basis of the legal distinction between transformative (new art) and derivative (copied art) work.

Very true. It also staggers me just how many folks cry “but it’s transformative!” like that’s a defense. Transformative art is by default copyright infringement. Fair use is an exception that requires specific criteria. Transformation alone is not enough.

That's why people still need permission to make a movie from a book, or a video game from a movie, or to record a cover song, even though all of these things are "transformative". (YT's rules for covers using Content ID make things murky, but still give the rights holder the control to block it, because it's copyright infringement.)

In other AI news - grocery store app generates deadly “recipes”.

https://www.theguardian.com/world/2023/aug/10/pak-n-save-savey-meal-bot-ai-app-malfunction-recipes

Other instances have involved everything from the dangerous (undercooked meat) to the nonsensical.

Hopefully people will eventually learn that LLMs cannot be trusted for accurate information.

Rinel

@Faraday said in AI Megathread:

Fair use is an exception that requires specific criteria. Transformation alone is not enough.

And determining what is fair use is an absolute fucking mess. I’ve had tons of people get mad at me when I say that fanfic and fanart are generally not fair use, because they’ve been told that if you aren’t selling it then it’s fine. It’s not fine just because you aren’t selling it!

Don’t get me wrong, I support fanart and fanfic and even write fanfic, but I’m well aware that I’m operating in a grey area of the law. I just don’t care about the law when it comes to that sort of thing, because the law is overly restrictive.

As a total aside to this largely tangential post, one of the funnier things to emerge out of this common misconception is the extreme taboo people have on selling fanfic, while fanartists routinely sell their work.

Trashcan

I made an account just to come rant on this topic and then posted in the wrong thread so now I’m here.

I think everyone broadly agrees that plagiarism is morally wrong. Plagiarism has two aspects: 1) the theft of someone else’s work and 2) the misattribution of that work to someone who did not produce it. Both aspects are wrong individually. Why is it that with AI, people are willing to hedge around the second one just because the first one has been rendered fuzzy and unclear?

Transparency is required. If I copy-pasted the world of Popular Franchise X and did a Find-Replace for recognizable words and changed those to something else, and claimed that I had created an Original Theme, everyone would get that this was Wrong. If I did the same thing but said “yes this is shamelessly ripped from Franchise X”, there might be opinions on whether it’s lazy and not worth engaging with, but transparency would have rendered this down from Clearly Unethical all the way to Sort of Low Effort, Isn’t It?.

There is no money to be made in Mushing; we are all doing this for the pure pleasure of reading other people’s writing and having our writing be read. What we receive is entertainment and validation, and the balance of those two vs. how annoying we are OOC makes up our entire reputation in this community. I don’t buy that you can explain away the unethical nature of undue validation being rendered with “but you were entertained and isn’t that enough?” No. You’ve robbed me of a whole half of this experience. At least be honest about it and let me decide if half is enough.

Faraday

@Trashcan said in AI Megathread:

Why is it that with AI, people are willing to hedge around the second one just because the first one has been rendered fuzzy and unclear?

The argument given by many AI defenders is that generative AI is not plagiarism because “the AI is just learning the way humans do”.

For example, if I read a whole lot of articles about D-Day, developed a coherent understanding of the lead-up, events, and effects of D-Day, and then wrote a completely original article about D-Day, all while citing my sources and taking care to quote directly when using other people's words–that's fine, right? That's not plagiarism.

The problem is that people think that’s how generative AI works. It isn’t. It doesn’t have an understanding of D-Day because it doesn’t have any actual intelligence. It doesn’t really know what D-Day is. It can’t distinguish fact from fiction. It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).

At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details. And, critically, it doesn’t and can’t cite its sources because it literally doesn’t know where it’s getting its stuff from. All the data just went into a giant blender of words and concepts.
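To make the "web of word connections" idea concrete, here's a deliberately tiny sketch in Python: a bigram counter that only learns which word tends to follow which. Real LLMs use neural networks over sub-word tokens and long contexts, so treat this as a toy analogy rather than how ChatGPT is actually built; the corpus and function names are made up for illustration. It does show the sourcing point, though: once the counts are merged, nothing records which text contributed what.

```python
from collections import Counter, defaultdict

def build_bigram_model(documents):
    """Count which word follows which, across all documents.

    The counts keep no record of which document a pair came from,
    so the model cannot say where any of its "knowledge" originated.
    """
    follows = defaultdict(Counter)
    for text in documents:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def most_likely_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus, invented for illustration.
corpus = [
    "on d-day paratroopers landed behind the beaches",
    "on d-day amphibious landings hit the beaches",
    "amphibious landings hit the beaches at dawn",
]

model = build_bigram_model(corpus)
print(most_likely_next(model, "amphibious"))  # landings
print(most_likely_next(model, "the"))         # beaches
```

Ask it about anything outside the corpus and it has nothing; ask it about something inside and it just hands back the most common association.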

It may not be the exact same process as copy/paste human plagiarism, but the net output is the same.

Pavel

@Faraday said in AI Megathread:

then it fills out the details

Often wildly incorrectly, too. Which isn’t the point you were making, but it does help reinforce the idea that the AI (which is still a stupid name for the thing) doesn’t know anything.

Tributary

AI is just statistics. There’s nothing intelligent about it, really. AI just looks at the statistics surrounding patterns and does some math to model those patterns.

These tools are not new, and they are not as poorly understood as a lot of fans seem to think. I have a textbook printed in 2006 that has exercises for students to write neural networks, among other things.

Many aspects of biological memories [as compared to computer memories] are not understood.

In 1943 McCulloch and Pitts recognized that a network of simple neurons was capable of universal computation. That means that such a network could, in principle, perform any calculation that could be carried out with the most general computer imaginable. [More precisely, such a network can calculate any computable function in the sense of a general purpose Turing machine.] This attracted a good deal of interest from researchers interested in modeling the brain.

Giordano, Nicholas J. and Hisao Nakanishi, 2006, Computational Physics, Second Edition, Pearson Education, Inc., Upper Saddle River, NJ.

The first edition came out in 1997, and if it’s in a textbook, it’s not cutting edge. None of this is cutting edge. It’s really just that now we have the processing power to do the calculations required and the memory in which to store it.
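For anyone curious what a "network of simple neurons" from that 1943 quote looks like, here's a minimal Python sketch of a threshold unit in the McCulloch-Pitts spirit (simplified: negative weights stand in for their inhibitory inputs). One unit wired as NAND is enough to build any Boolean logic, shown here as XOR from four NANDs. This only covers the logic-gate part; the full Turing-machine claim in the quote also needs memory on top. Function names are my own.

```python
def mp_neuron(inputs, weights, threshold):
    """Threshold unit: fire (1) if the weighted sum reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def nand(a, b):
    # Negative weights act as inhibition: the unit fires unless both inputs are on.
    return mp_neuron([a, b], [-1, -1], -1)

def xor(a, b):
    # XOR wired from four NAND units. NAND is functionally complete,
    # so any Boolean circuit can in principle be built this way.
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), xor(a, b))
```

Nothing in there is more exotic than multiplication, addition, and a comparison.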

That said, I found it impossible to land a job in data analytics despite having master's degrees in math and physics because (in part) people seem to love the idea of math and statistics being far more mysterious than they actually are.

Pavel

@Tributary said in AI Megathread:

people seem to love the idea of math and statistics being far more mysterious than they actually are

They are, to the people with MBAs who inevitably end up running things for some reason. (I’m terrible at math and stats, but I understand some of the principles enough to know it’s not entirely magic.)

Faraday

So funny tangent about plagiarism…

Not only does ChatGPT plagiarize other authors’ work, it even plagiarizes itself. I asked it “how do I build a skill system in AresMUSH” and then asked it the same for Evennia.

For each I got a fairly bland summary of tips that apply to all skill systems everywhere (because literally that’s how it built the info - from the blender of concepts associated with everything it’s ever scanned about “building skill systems”)… but notably it was the SAME SUMMARY.

(For Evennia)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay, completing quests, or other in-game actions. You’ll need to implement a mechanism for characters to spend those points to increase their skill ratings.

(For AresMUSH)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay or completing quests. Create mechanisms for characters to spend these points to increase their skill ratings.

That’s just a snippet. The rest of its advice was pretty identical too.

This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.

(*) - The seed value is normally behind the scenes and randomized to make the responses appear more random/original, but under the hood it’s there and can be controlled. Like how you can use a seed value in Minecraft to build the same world as someone else.
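The seed trick is ordinary seeded pseudo-random sampling. A toy Python sketch, where a fixed vocabulary with made-up weights stands in for a model's next-word probabilities (the vocabulary, weights, and `sample_reply` name are all hypothetical): seed the generator the same way and the "random" picks repeat exactly. This is the idealized case; a real service also has to use the same model version and settings, and hardware quirks can sometimes break exact reproducibility.

```python
import random

def sample_reply(seed, length=6):
    """Generate a toy 'reply' by weighted random choice of words.

    The vocabulary and weights are hypothetical stand-ins for a model's
    next-word probabilities.
    """
    vocab = ["skills", "experience", "points", "quests", "roleplay", "ratings"]
    weights = [5, 4, 4, 2, 2, 1]
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return " ".join(rng.choices(vocab, weights=weights, k=length))

print(sample_reply(42) == sample_reply(42))  # True: same seed, same output
```

Swap in a seed drawn from the clock and the output looks fresh every time, but the process underneath is the same.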

Sage

@Faraday said in AI Megathread:

It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).
At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details.

Which is why what it does is not really plagiarism. Now I’m not going to get into all the metaphysics of how humans ‘know things’ and how capable we are of creativity, and I’m not going to argue about whether the output of ChatGPT is any good.

I'm just going to point out that plagiarism is "to steal and pass off (the ideas or words of another) as one's own" (according to Merriam-Webster). If ChatGPT were to take significant passages from someone's work and pass it off as its own, then sure, that would be plagiarism, but that's not what it is really doing.

I’m also not going to say that it is ok for ChatGPT to be trained on works without payment to the creators of those works. I’m not sure that falls under the terms of ‘fair-use’. (I’m not sure it doesn’t, either. I need more time to fully consider the situation, but considering that OpenAI plans to make money from it, I’m leaning towards ‘not’).

I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.

Trashcan

@Sage said in AI Megathread:

I'm just going to point out that plagiarism is "to steal and pass off (the ideas or words of another) as one's own" (according to Merriam-Webster).

If you do not disclose to other players that you are using AI in your “workflow”, whether or not you’re plagiarizing the people whose content was used to build the LLM, you ARE plagiarizing the LLM because you are “passing off the words of another as one’s own”.

Transparency is required.

Edited to clarify: where your “workflow” includes copy-pasting from the output of an LLM.

Pavel

@Sage said in AI Megathread:

I’m not sure that falls under the terms of ‘fair-use’.

Unfortunately, fair use remains one of those issues that will only be truly decided in the courts.

@Trashcan said in AI Megathread:

If you do not disclose to other players that you are using AI in your “workflow”

I think that depends entirely on what you use it for. If you use it to sketch out a very rough (probably generic) idea, but then put in the work to turn the idea into something actually workable and suitable? That’d be the same, to my mind, as using a name generator.

But if you used it to write an entire character description, or an entire lore file? That’s different.