AI Megathread

Faraday

Why is it that with AI, people are willing to hedge around the second one just because the first one has been rendered fuzzy and unclear?

The argument given by many AI defenders is that generative AI is not plagiarism because “the AI is just learning the way humans do”.

For example, if I read a whole lot of articles about D-Day, developed a coherent understanding of the lead-up, events, and effects of D-Day, and then wrote a completely original article about D-Day, all while citing my sources and taking care to quote directly when using other people’s words, that’s fine, right? That’s not plagiarism.

The problem is that people think that’s how generative AI works. It isn’t. It doesn’t have an understanding of D-Day because it doesn’t have any actual intelligence. It doesn’t really know what D-Day is. It can’t distinguish fact from fiction. It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).

At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details. And, critically, it doesn’t and can’t cite its sources because it literally doesn’t know where it’s getting its stuff from. All the data just went into a giant blender of words and concepts.
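To make the "web of word connections" idea concrete, here is a deliberately tiny sketch: a bigram model that counts which word follows which in a toy corpus, then generates text by sampling from those counts. Real large language models use neural networks over much longer contexts, so treat this as an analogy under that assumption, not a description of how ChatGPT actually works. The corpus sentences are made up for illustration.

```python
import random
from collections import defaultdict

# Toy corpus: three made-up "sources" about D-Day (not real quotes).
corpus = [
    "paratroopers landed before the amphibious landings began",
    "the amphibious landings began at dawn on the beaches",
    "paratroopers landed behind the beaches at night",
]

# Count how often each word is followed by each other word.
# All sources are merged into one table; nothing records which source a pair came from.
follows = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def generate(start, length, rng):
    """Generate up to `length` words, sampling each next word in
    proportion to how often it followed the previous word."""
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # dead end: nothing ever followed this word
            break
        candidates = list(options)
        weights = [options[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(out)

print(generate("paratroopers", 8, random.Random(0)))
```

Every word pair in the output appeared somewhere in the corpus, yet the sentence as a whole may be one no source ever wrote, and the merged table has no way to say which source contributed which pair.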

It may not be the exact same process as copy/paste human plagiarism, but the net output is the same.

Pavel

@Faraday said in AI Megathread:

then it fills out the details

Often wildly incorrectly, too. Which isn’t the point you were making, but it does help reinforce the idea that the AI (which is still a stupid name for the thing) doesn’t know anything.

Tributary

AI is just statistics. There’s nothing intelligent about it, really. AI just looks at the statistics surrounding patterns and does some math to model those patterns.

These tools are not new, and they are not as poorly understood as a lot of fans seem to think. I have a textbook printed in 2006 that has exercises for students to write neural networks, among other things.

Many aspects of biological memories [as compared to computer memories] are not understood.

In 1943 McCulloch and Pitts recognized that a network of simple neurons was capable of universal computation. That means that such a network could, in principle, perform any calculation that could be carried out with the most general computer imaginable. [More precisely, such a network can calculate any computable function in the sense of a general purpose Turing machine.] This attracted a good deal of interest from researchers interested in modeling the brain.

Giordano, Nicholas J. and Hisao Nakanishi, 2006, Computational Physics, Second Edition, Pearson Education, Inc., Upper Saddle River, NJ.

The first edition came out in 1997, and if it’s in a textbook, it’s not cutting edge. None of this is cutting edge. It’s really just that now we have the processing power to do the calculations required and the memory in which to store it.
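As a sketch of how simple the McCulloch-Pitts idea from that textbook passage is: below, a "neuron" is a binary threshold unit that outputs 1 when the weighted sum of its 0/1 inputs reaches a threshold (a common simplified form of the 1943 model). The weights and thresholds are chosen by hand for illustration. Wiring a few units together gives XOR, which no single unit of this kind can compute.

```python
def mp_neuron(inputs, weights, threshold):
    """Binary threshold unit: 1 if the weighted sum of the 0/1 inputs
    reaches the threshold, otherwise 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def and_gate(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def or_gate(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def not_gate(a):
    # A negative (inhibitory) weight with threshold 0 inverts the input.
    return mp_neuron([a], [-1], threshold=0)

def xor_gate(a, b):
    # Two layers of units: (a OR b) AND NOT (a AND b).
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

Since AND, OR, and NOT can be combined into any Boolean circuit, this is the intuition behind the "universal computation" claim; the hard part of modern systems is scale, not mystery.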

That said, I found it impossible to land a job in data analytics despite having master’s degrees in math and physics because (in part) people seem to love the idea of math and statistics being far more mysterious than they actually are.

Pavel

@Tributary said in AI Megathread:

people seem to love the idea of math and statistics being far more mysterious than they actually are

They are, to the people with MBAs who inevitably end up running things for some reason. (I’m terrible at math and stats, but I understand some of the principles enough to know it’s not entirely magic.)

Faraday

So funny tangent about plagiarism…

Not only does ChatGPT plagiarize other authors’ work, it even plagiarizes itself. I asked it “how do I build a skill system in AresMUSH” and then asked it the same for Evennia.

For each I got a fairly bland summary of tips that apply to all skill systems everywhere (because literally that’s how it built the info - from the blender of concepts associated with everything it’s ever scanned about “building skill systems”)… but notably it was the SAME SUMMARY.

(For Evennia)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay, completing quests, or other in-game actions. You’ll need to implement a mechanism for characters to spend those points to increase their skill ratings.

(For AresMUSH)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay or completing quests. Create mechanisms for characters to spend these points to increase their skill ratings.

That’s just a snippet. The rest of its advice was pretty much identical too.
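One rough way to put a number on "the SAME SUMMARY" is Python's standard-library `difflib.SequenceMatcher`, which scores two sequences from 0 (nothing in common) to 1 (identical). Comparing word by word is a choice made here for readability; a character-level comparison would give a different number.

```python
from difflib import SequenceMatcher

evennia = (
    "Decide how characters can improve their skills over time. This could involve "
    "gaining experience points through roleplay, completing quests, or other in-game "
    "actions. You’ll need to implement a mechanism for characters to spend those "
    "points to increase their skill ratings."
)
ares = (
    "Decide how characters can improve their skills over time. This could involve "
    "gaining experience points through roleplay or completing quests. Create "
    "mechanisms for characters to spend these points to increase their skill ratings."
)

# Compare the two answers as sequences of words rather than characters.
matcher = SequenceMatcher(None, evennia.split(), ares.split())
print(f"word-level similarity: {matcher.ratio():.2f}")

# Print each run of words the two answers share verbatim.
for block in matcher.get_matching_blocks():
    if block.size:
        print(" ".join(evennia.split()[block.a:block.a + block.size]))
```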

This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.

(*) - The seed value is normally behind the scenes and randomized to make the responses appear more random/original, but under the hood it’s there and can be controlled. Like how you can use a seed value in Minecraft to build the same world as someone else.
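A minimal sketch of the seed idea, assuming a toy next-word distribution (the words and probabilities below are made up): when the "randomness" comes from a pseudo-random generator, fixing the seed fixes every draw, so the same prompt and seed give the same sequence. Python's `random.Random` stands in here for whatever generator a real model uses.

```python
import random

# Hypothetical next-word probabilities a model might assign after some prompt.
candidates = ["paratroopers", "landings", "beaches", "Normandy"]
probabilities = [0.4, 0.3, 0.2, 0.1]

def sample_words(seed, n=5):
    """Draw n words from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.choices(candidates, weights=probabilities)[0] for _ in range(n)]

# Same seed: identical draws, every time.
print(sample_words(42))
print(sample_words(42))
# A different seed (normally hidden and randomized) usually gives different draws.
print(sample_words(7))
```

This is the same principle as sharing a Minecraft world seed: the seed plus the procedure fully determines the result.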

Sage

@Faraday said in AI Megathread:

It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).
At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details.

Which is why what it does is not really plagiarism. Now I’m not going to get into all the metaphysics of how humans ‘know things’ and how capable we are of creativity, and I’m not going to argue about whether the output of ChatGPT is any good.

I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster). If ChatGPT were to take significant passages from someone’s work and pass them off as its own, then sure, that would be plagiarism, but that’s not what it is really doing.

I’m also not going to say that it is ok for ChatGPT to be trained on works without payment to the creators of those works. I’m not sure that falls under the terms of ‘fair-use’. (I’m not sure it doesn’t, either. I need more time to fully consider the situation, but considering that OpenAI plans to make money from it, I’m leaning towards ‘not’).

I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.

Trashcan

@Sage said in AI Megathread:

I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster).

If you do not disclose to other players that you are using AI in your “workflow”, whether or not you’re plagiarizing the people whose content was used to build the LLM, you ARE plagiarizing the LLM because you are “passing off the words of another as one’s own”.

Transparency is required.

Edited to clarify: where your “workflow” includes copy-pasting from the output of an LLM.

Pavel

@Sage said in AI Megathread:

I’m not sure that falls under the terms of ‘fair-use’.

Unfortunately, fair use remains one of those issues that will only be truly decided in the courts.

@Trashcan said in AI Megathread:

If you do not disclose to other players that you are using AI in your “workflow”

I think that depends entirely on what you use it for. If you use it to sketch out a very rough (probably generic) idea, but then put in the work to turn the idea into something actually workable and suitable? That’d be the same, to my mind, as using a name generator.

But if you used it to write an entire character description, or an entire lore file? That’s different.

Sage

@Trashcan Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.

You are referring to a use case, which is not necessarily a good metric of whether a tool has value. It’s like arguing that a hammer is a terrible tool because it can be used to hit someone in the head.

Trashcan

@Sage
People are out here hitting people in the head with the hammer, so at the moment I am trying to establish common ground that we can all agree on, like “hitting people in the head with the hammer is bad”.

I recognize the debate on whether the hammer itself is bad or not is more nuanced. I think we can all agree slugging people in the head with the hammer is probably wrong, regardless of whether the hammer is made of fair trade rubber or blood diamonds.

Pavel

@Sage said in AI Megathread:

Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.

If I write something in a paper, and don’t cite where I found that information, that’s treated as plagiarism - even if that work is myself from a previous writing. Because I’m taking their idea, without giving them credit for it.

Plagiarising is defined thusly: “to steal and pass off (the ideas or words of another) as one’s own : use (another’s production) without crediting the source” (Merriam-Webster, 2023).

Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.

References

Merriam-Webster. (2023, August 9). Definition of plagiarizing. Merriam-Webster.com. https://www.merriam-webster.com/dictionary/plagiarizing

Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313–313. https://doi.org/10.1126/science.adg7879

Park, Y. J., Kaplan, D. M., Ren, Z., Hsu, C.-W., Li, C., Xu, H., Li, S., & Li, J. (2023). Can ChatGPT be used to generate scientific hypotheses? ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2304.12208

Faraday

@Sage said in AI Megathread:

I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.

In the case where I said that ChatGPT is “plagiarizing itself”, that was meant to be tongue-in-cheek. You can’t, by definition, plagiarize yourself.

But in the broader sense of “is what ChatGPT does plagiarism”, I disagree for the same reasons Pavel cited here:

@Pavel said in AI Megathread:

Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.

We can quibble about the exact lines between plagiarism, copyright infringement, trademark infringement, etc. but it’s just semantics. Fundamentally it’s all about profiting off the work of others without proper attribution, permission, and compensation. Even if a million courts said it was legal (which I highly doubt but we’ll see), you would still not convince me that it wasn’t wrong.

Pavel

@Faraday said in AI Megathread:

You can’t, by definition, plagiarize yourself

Technically you can, though it’s less about “using someone else’s ideas or words” and more about using pre-existing work without acknowledgement. It’s a silly name, as far as I’m concerned, but it’s a thing.

Sage

@Pavel said in AI Megathread:

If I write something in a paper, and don’t cite where I found that information, that’s treated as plagiarism.

By your reasoning, if I say “George Washington was the first president of the United States of America” then it is plagiarism, despite the fact that this is simply something I know. Yes, I did originally learn it from another source, but since I have not stated the source, that is plagiarism.

No, that is simply poor referencing.

Yes, I’m aware that there are professors out there who will decry that as ‘plagiarism’, but guess what? Professors don’t actually get to decide the definition of words beyond the boundaries of their classrooms, and they really need to get over it, because with such a definition nearly everything they produce is also rife with ‘plagiarism’ (but of course since no student would ever dare call them out on this fact they remain relatively insulated from such a fact).

After all, you must have read the phrase ‘If I write something in a paper’, or something close which gave you the idea, and you didn’t cite that, did you? How about the concept of a citation? You forgot to include where that came from, so does that mean you are passing it off as ‘your own’?

The idea that failing to cite yourself is plagiarism is clearly ludicrous because, as your definition shows, the ideas or words need to originate with another person for it to be plagiarism. Yet, yes, those professors will mark it as plagiarism (while not properly citing themselves in most of their handouts to their students).

What those professors, and you by extension, are doing is playing the role of Humpty Dumpty in Lewis Carroll’s “Through the Looking-Glass”:

‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean–neither more nor less.’

Sage

@Pavel said in AI Megathread:

Technically you can

No, you can’t. You can self-plagiarize, but you cannot plagiarize yourself. As a compound word self-plagiarize has its own meaning.

A pineapple is not related to pine or apple trees, after all.

Pavel

@Sage I’m going to assume your hostile tone isn’t intentional.

All it comes down to, whatever your semantics may be, is an ethical code. That is to say a code imposed by an authority or administrative body of some kind. That code defines the words as far as it’s concerned. So plagiarism for me is different to you because I habitually operate under a different ethical code to you. It’s a perspective issue.

And ultimately that’s what is likely to happen vis-à-vis ChatGPT and other such artful devices: some authority will impose an ethical code on its use (many universities are now requiring its use to be cited and indicated on the title page of the paper, for instance), and will define whatever terms it wants. Ludicrous or not.

Sage

@Pavel Hostility was not intended. Instead I was mainly trying to show that the definition that most professors attempt to apply is not only nonsensical, it is also hypocritical.

And yes, the authority does get to define the code as far as it is concerned. That is basically what I stated. I will even agree that they not only get to, but they often need to because words tend to not have absolutely agreed upon definitions (whether Pluto is/was a planet is a good example of the need for specific definition in certain contexts).

However, we are not talking about ‘as far as they are concerned’. ChatGPT is not one of their students, so their definition does not apply as to whether ChatGPT should generally be considered to be plagiarizing.

After all, if the ones in charge decided, for some absurd reason, to call a lack of proper citations ‘murder’ you would not expect the police to arrest students and for them to be tried, would you? Yet I don’t think either of us would argue that they couldn’t call it that, just that it would be foolish for them to do so.

I agree that academic authorities absolutely can, and even should, include rules for its use (or banning its use) in their codes. Again, that is not my argument in the slightest (just as it is not my argument that OpenAI should be free to load whatever material they choose into their models).

My argument is purely as to whether ChatGPT can, itself, be considered to be plagiarizing according to the commonly accepted definitions of plagiarism.

Pavel

@Sage said in AI Megathread:

My argument is purely as to whether ChatGPT can, itself, be considered to be plagiarizing according to the commonly accepted definitions of plagiarism.

I would say both yes and no. It doesn’t have intention, so it doesn’t plagiarise per se. But those who use it have intent, and given that they quite obviously are using other peoples’ work without giving due credit, then it would be plagiarism.

Citing sources isn’t just to give proper credit. It’s to give readers somewhere to look whenever you use information that isn’t found within whatever they’re currently reading. It isn’t only the copying of others’ work that plagiarism stands against, but ensuring the capacity for information transfer and learning.

For instance, using your example of Georgie Washers being the first president of the United States. Someone is going to come across that piece of information, today, for the first time. Not here, sure, but somewhere. The idea of citing that information is to give people somewhere else to look in order to find out more.

But that’s more an interesting tid-bit than anything of relevance.

ChatGPT uses other peoples’ work to produce ‘new’ work, without providing credit or reference to those other works. But it’s also not a being with intentionality. So maybe it can plagiarise, but anyone who uses it in any setting that acknowledges the idea of plagiarism is plagiarising.

ETA:

@Sage said in AI Megathread:

Instead I was mainly trying to show that the definition that most professors attempt to apply is not only nonsensical, it is also hypocritical.

This might be true some places, but in my academic experience the application of the definition is usually thus: if you provide a pertinent fact that the paper you’re currently writing doesn’t demonstrate to be true (or prove, in layman’s terms), then you have to cite either where you learned that fact or a reliable source that does demonstrate it to be true.

Self-plagiarism (which is plagiarising yourself, according to Dictionary.com) is mostly considered bad because it’s just rehashing the same thing you’ve said before and not giving anything new. Especially for students, wherein written work is supposed to assess what you know etc. I think it’s a stupid name but not a terrible concept.

bored

@Faraday said in AI Megathread:

This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.

Technical caveat: depending on the settings, this isn’t always true. Some setups or techniques make outputs non-reproducible (or, more accurately, practically non-reproducible unless you can replicate the entire state of the machine at the time of image creation, including hardware). The early xformers library for SD (Stable Diffusion) was an example of this. Interestingly, the techniques that cause this problem (and you do want reproducibility, because determinism helps you refine your results) are performance-enhancing ones; xformers is an optimization library aimed at NVIDIA GPUs. Performance is always going to be desirable, so it seems plausible that some branches of development will pursue this kind of technology.
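One common source of this kind of non-reproducibility (assumed here for illustration; the post doesn't name the mechanism) is that floating-point addition is not associative. Fast GPU kernels often add partial results in whatever order threads happen to finish, and a different order can change the low bits of a sum, which then ripples through every later step. The sketch below shows the effect in plain Python, with two fixed orderings standing in for two thread schedules.

```python
import math

def naive_sum(values):
    """Add values strictly left to right, like a simple accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total

# Grouping changes the result in floating point.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Same three numbers, two summation orders (as two thread schedules might produce).
order_a = [1e16, 1.0, -1e16]  # the 1.0 is lost when added to 1e16
order_b = [1e16, -1e16, 1.0]
print(naive_sum(order_a), naive_sum(order_b))  # 0.0 1.0

# An exactly rounded sum does not depend on order.
print(math.fsum(order_a), math.fsum(order_b))  # 1.0 1.0
```

Deterministic modes usually trade away some speed to fix the order of operations, which is why performance-focused options tend to be the ones that break reproducibility.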

It will be interesting to see how the legal stuff shakes out in the long term, because I don’t see that this division is clear cut. If you replicated an artist’s exact steps, you could also replicate their art pixel-for-pixel. It’s impractical for a painter, sure: we’re bad at fluid dynamics, so this would be akin to knowing the exact hardware state above. But there are plenty of all-digital artists nowadays. Replicating Photoshop (PS) work is trivial. In fact, the software is already keeping a record of the steps required to generate the pixels, and you can step back and forth through them with undo-redo. Isn’t that the same thing?
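The undo/redo point can be sketched with a hypothetical editor whose canvas is a small grid of color indices and whose only operation is "paint a rectangle". If every operation is recorded, replaying the log on a blank canvas reproduces the image exactly, and undo is just replaying all but the last step. Real editors such as Photoshop store history differently; the class and function names here are made up and only illustrate the principle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaintRect:
    """One recorded editing step (hypothetical): fill a rectangle with a color index."""
    x: int
    y: int
    w: int
    h: int
    color: int

def blank(width=8, height=8):
    return [[0] * width for _ in range(height)]

def apply_step(canvas, op):
    """Paint the rectangle, clipped to the canvas edges."""
    for row in range(op.y, min(op.y + op.h, len(canvas))):
        for col in range(op.x, min(op.x + op.w, len(canvas[0]))):
            canvas[row][col] = op.color

def replay(log):
    """Rebuild the image from scratch by applying every step in order."""
    canvas = blank()
    for op in log:
        apply_step(canvas, op)
    return canvas

history = [PaintRect(0, 0, 4, 4, 1), PaintRect(2, 2, 4, 4, 2), PaintRect(5, 0, 3, 8, 3)]

original = replay(history)
replica = replay(list(history))  # someone else replaying the same steps
undone = replay(history[:-1])    # "undo" = replay all but the last step

print(original == replica)  # True: identical pixels
print(original == undone)   # False: the last step changed the image
```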

At what point is the # of steps taken by an ‘AI artist’ to generate a ~~unique~~ interesting result sufficient to represent ‘creativity’?

Rinel

@bored said in AI Megathread:

At what point is the # of steps taken by an ‘AI artist’ to generate a unique result sufficient to represent ‘creativity’?

when the steps include not using automated image generation and making the art themselves