AI Megathread

Faraday

So funny tangent about plagiarism…

Not only does ChatGPT plagiarize other authors’ work, it even plagiarizes itself. I asked it “how do I build a skill system in AresMUSH” and then asked it the same for Evennia.

For each I got a fairly bland summary of tips that apply to all skill systems everywhere (because literally that’s how it built the info - from the blender of concepts associated with everything it’s ever scanned about “building skill systems”)… but notably it was the SAME SUMMARY.

(For Evennia)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay, completing quests, or other in-game actions. You’ll need to implement a mechanism for characters to spend those points to increase their skill ratings.

(For AresMUSH)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay or completing quests. Create mechanisms for characters to spend these points to increase their skill ratings.

That’s just a snippet. The rest of its advice was pretty identical too.

This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.

(*) - The seed value is normally behind the scenes and randomized to make the responses appear more random/original, but under the hood it’s there and can be controlled. Like how you can use a seed value in Minecraft to build the same world as someone else.
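
As a rough illustration of the seed point, here is a minimal Python sketch. It is a toy, not how ChatGPT or any real model is implemented: the `VOCAB` list and the `generate` function are made up for this example, and a real model picks words from learned probabilities rather than a hard-coded list. It only shows that a seeded pseudo-random generator makes "random" output repeatable.

```python
import random

# Hypothetical toy vocabulary; stands in for a model's learned word associations.
VOCAB = ["skills", "points", "quests", "roleplay", "ratings", "experience"]

def generate(prompt: str, seed: int, length: int = 5) -> str:
    """Pick `length` words pseudo-randomly, driven only by the prompt and seed."""
    # A generator seeded with the same value always yields the same sequence.
    rng = random.Random(f"{seed}:{prompt}")
    return " ".join(rng.choice(VOCAB) for _ in range(length))

first = generate("how do I build a skill system", seed=42)
second = generate("how do I build a skill system", seed=42)
print(first == second)  # True: same prompt + same seed gives identical output
```

Change the seed or the prompt and the output will usually change too, which is roughly what the behind-the-scenes randomized seed is doing.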

Sage

@Faraday said in AI Megathread:

It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).
At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details.

Which is why what it does is not really plagiarism. Now I’m not going to get into all the metaphysics of how humans ‘know things’ and how capable we are of creativity, and I’m not going to argue about whether the output of ChatGPT is any good.

I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster). If ChatGPT were to take significant passages from someone’s work and pass them off as its own, then sure, that would be plagiarism, but that’s not what it is really doing.

I’m also not going to say that it is ok for ChatGPT to be trained on works without payment to the creators of those works. I’m not sure that falls under the terms of ‘fair-use’. (I’m not sure it doesn’t, either. I need more time to fully consider the situation, but considering that OpenAI plans to make money from it, I’m leaning towards ‘not’).

I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.

Trashcan

@Sage said in AI Megathread:

I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster).

If you do not disclose to other players that you are using AI in your “workflow”, whether or not you’re plagiarizing the people whose content was used to build the LLM, you ARE plagiarizing the LLM because you are “passing off the words of another as one’s own”.

Transparency is required.

Edited to clarify: where your “workflow” includes copy-pasting from the output of an LLM.

Pavel

@Sage said in AI Megathread:

I’m not sure that falls under the terms of ‘fair-use’.

Unfortunately, fair use remains one of those issues that will only be truly decided in the courts.

@Trashcan said in AI Megathread:

If you do not disclose to other players that you are using AI in your “workflow”

I think that depends entirely on what you use it for. If you use it to sketch out a very rough (probably generic) idea, but then put in the work to turn the idea into something actually workable and suitable? That’d be the same, to my mind, as using a name generator.

But if you used it to write an entire character description, or an entire lore file? That’s different.

Sage

@Trashcan Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.

You are referring to a use case, which is not necessarily a good metric of whether a tool has value. It’s like arguing that a hammer is a terrible tool because it can be used to hit someone in the head.

Trashcan

@Sage
People are out here hitting people in the head with the hammer, so at the moment I am trying to establish common ground that we can all agree on, like “hitting people in the head with the hammer is bad”.

I recognize the debate on whether the hammer itself is bad or not is more nuanced. I think we can all agree slugging people in the head with the hammer is probably wrong, regardless of whether the hammer is made of fair trade rubber or blood diamonds.

Pavel

@Sage said in AI Megathread:

Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.

If I write something in a paper, and don’t cite where I found that information, that’s treated as plagiarism, even if the source is my own previous writing, because I’m taking an idea without giving credit for it.

Plagiarising is defined thusly: “to steal and pass off (the ideas or words of another) as one’s own : use (another’s production) without crediting the source”(Merriam-Webster, 2023).

Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.

References

Merriam-Webster. (2023, August 9). Definition of plagiarizing. Merriam-Webster.com. https://www.merriam-webster.com/dictionary/plagiarizing

Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313–313. https://doi.org/10.1126/science.adg7879

Park, Y. J., Kaplan, D. M., Ren, Z., Hsu, C.-W., Li, C., Xu, H., Li, S., & Li, J. (2023). Can ChatGPT be used to generate scientific hypotheses? ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2304.12208

Faraday

@Sage said in AI Megathread:

I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.

In the case where I said that ChatGPT is “plagiarizing itself”, that was meant to be tongue-in-cheek. You can’t, by definition, plagiarize yourself.

But in the broader sense of “is what ChatGPT does plagiarism”, I disagree for the same reasons Pavel cited here:

@Pavel said in AI Megathread:

Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.

We can quibble about the exact lines between plagiarism, copyright infringement, trademark infringement, etc. but it’s just semantics. Fundamentally it’s all about profiting off the work of others without proper attribution, permission, and compensation. Even if a million courts said it was legal (which I highly doubt but we’ll see), you would still not convince me that it wasn’t wrong.

Pavel

@Faraday said in AI Megathread:

You can’t, by definition, plagiarize yourself

Technically you can, though it’s less about “using someone else’s ideas or words” and more about using pre-existing work without acknowledgement. It’s a silly name, as far as I’m concerned, but it’s a thing.

Sage

@Pavel said in AI Megathread:

If I write something in a paper, and don’t cite where I found that information, that’s treated as plagiarism.

By your reasoning, if I say “George Washington was the first president of the United States of America” then it is plagiarism, despite the fact that this is simply something I know. Yes, I did originally learn it from another source, but since I have not stated the source, that is plagiarism.

No, that is simply poor referencing.

Yes, I’m aware that there are professors out there who will decry that as ‘plagiarism’, but guess what? Professors don’t actually get to decide the definition of words beyond the boundaries of their classrooms, and they really need to get over it, because with such a definition nearly everything they produce is also rife with ‘plagiarism’ (but of course since no student would ever dare call them out on this fact they remain relatively insulated from such a fact).

After all, you must have read the phrase ‘If I write something in a paper’, or something close which gave you the idea, and you didn’t cite that, did you? How about the concept of a citation? You forgot to include where that came from, so does that mean you are passing it off as ‘your own’?

The idea that not citing yourself is plagiarism is clearly ludicrous because, as your definition shows, the ideas or words need to originate with another person for it to be plagiarism. Yet, yes, those professors will mark it as plagiarism (while not properly citing themselves in most of their handouts to their students).

What those professors, and you by extension, are doing is playing the role of Humpty Dumpty in Lewis Carroll’s “Through the Looking-Glass”:

‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean–neither more nor less.’

Sage

@Pavel said in AI Megathread:

Technically you can

No, you can’t. You can self-plagiarize, but you cannot plagiarize yourself. As a compound word self-plagiarize has its own meaning.

A pineapple is not related to pine or apple trees, after all.

Pavel

@Sage I’m going to assume your hostile tone isn’t intentional.

All it comes down to, whatever your semantics may be, is an ethical code. That is to say a code imposed by an authority or administrative body of some kind. That code defines the words as far as it’s concerned. So plagiarism for me is different to you because I habitually operate under a different ethical code to you. It’s a perspective issue.

And ultimately that’s what is likely to happen vis-à-vis ChatGPT and other such artful devices: Some authority will impose an ethical code on its use (many universities are now requiring its use to be cited and indicated on the title page of the paper, for instance), and will define whatever terms it wants. Ludicrous or not.

Sage

@Pavel Hostility was not intended. Instead I was mainly trying to show that the definition that most professors attempt to apply is not only nonsensical, it is also hypocritical.

And yes, the authority does get to define the code as far as it is concerned. That is basically what I stated. I will even agree that they not only get to, but they often need to because words tend to not have absolutely agreed upon definitions (whether Pluto is/was a planet is a good example of the need for specific definition in certain contexts).

However, we are not talking about ‘as far as they are concerned’. ChatGPT is not one of their students, so their definition does not apply as to whether ChatGPT should generally be considered to be plagiarizing.

After all, if the ones in charge decided, for some absurd reason, to call a lack of proper citations ‘murder’ you would not expect the police to arrest students and for them to be tried, would you? Yet I don’t think either of us would argue that they couldn’t call it that, just that it would be foolish for them to do so.

I agree that academic authorities absolutely can, and even should, include rules for its use (or banning its use) in their codes. Again, that is not my argument in the slightest (just as it is not my argument that OpenAI should be free to load whatever material they choose into their models).

My argument is purely as to whether ChatGPT can, itself, be considered to be plagiarizing according to the commonly accepted definitions of plagiarism.

Pavel

@Sage said in AI Megathread:

My argument is purely as to whether ChatGPT can, itself, be considered to be plagiarizing according to the commonly accepted definitions of plagiarism.

I would say both yes and no. It doesn’t have intention, so it doesn’t plagiarise per se. But those who use it have intent, and given that they quite obviously are using other people’s work without giving due credit, then it would be plagiarism.

Citing sources isn’t just to give proper credit. It’s to give readers somewhere to look whenever you use information that isn’t found within whatever they’re currently reading. It isn’t only the copying of others’ work that plagiarism stands against, but ensuring the capacity for information transfer and learning.

For instance, using your example of Georgie Washers being the first president of the United States. Someone is going to come across that piece of information, today, for the first time. Not here, sure, but somewhere. The idea of citing that information is to give people somewhere else to look in order to find out more.

But that’s more an interesting tid-bit than anything of relevance.

ChatGPT uses other people’s work to produce ‘new’ work, without providing credit or reference to those other works. But it’s also not a being with intentionality. So maybe it can’t plagiarise, but anyone who uses it in any setting that acknowledges the idea of plagiarism is plagiarising.

ETA:

@Sage said in AI Megathread:

Instead I was mainly trying to show that the definition that most professors attempt to apply is not only nonsensical, it is also hypocritical.

This might be true some places, but in my academic experience the application of the definition is usually thus: If you provide a pertinent fact that the current paper you’re writing doesn’t demonstrate to be true (or prove, in layman’s terms), then you have to cite either where you learned that fact or a reliable source that does demonstrate it to be true.

Self-plagiarism (which is plagiarising yourself, according to Dictionary.com) is mostly considered bad because it’s just rehashing the same thing you’ve said before and not giving anything new. Especially for students, wherein written work is supposed to assess what you know etc. I think it’s a stupid name but not a terrible concept.

bored

@Faraday said in AI Megathread:

This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.

Technical caveat: depending on the settings, this isn’t always true. Some setups or techniques will make outputs non-reproducible (or more accurately, practically non-reproducible unless you can replicate the entire state of the machine at the time of image creation, including hardware). The early xformers library for SD was an example of this. Interestingly, the techniques that cause this problem (and it is a problem: you want reproducibility, because determinism helps you improve your results) are performance-enhancing ones; xformers is an optimization that runs on NVIDIA GPUs. Performance is always going to be desirable, so it’s quite possible that some branches of development will pursue this kind of technology.
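
One common reason for this kind of non-reproducibility (an assumption on my part, not something specific to xformers or SD) is that floating-point addition isn’t associative. If a parallel optimization adds the same numbers in a different order from run to run, the result can differ in the last bits, and those small differences can snowball. A minimal Python sketch of just the arithmetic:

```python
# The same three numbers, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
```

Neither answer is "wrong"; they are both valid roundings. The point is only that the order of operations matters, and fast parallel code does not always fix that order.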

It will be interesting to see how the legal stuff shakes out in the long term, because I don’t see that this division is clear cut. If you replicated an artist’s exact steps, you could also replicate their art pixel-for-pixel. It’s impractical for a painter, sure: we’re bad at fluid dynamics, so this would be akin to knowing the exact hardware state above. But there are plenty of all-digital artists nowadays. Replicating PS work is trivial. In fact, the software is already keeping a record of the steps required to generate the pixels, and you can step back and forth through them with undo-redo. Isn’t that the same thing?

At what point is the # of steps taken by an ‘AI artist’ to generate a ~~unique~~ interesting result sufficient to represent ‘creativity’?

Rinel

@bored said in AI Megathread:

At what point is the # of steps taken by an ‘AI artist’ to generate a unique result sufficient to represent ‘creativity’?

when the steps include not using automated image generation and making the art themselves

Pavel

@bored said in AI Megathread:

or more accurately, practically non-reproducible unless you can replicate the entire state of the machine at the time of image creation, including hardware

When you say the entire hardware state, is that meaning things like the exact composition of the silicon in a chip, or is it more macro-scale, like having the same graphics card?

Sage

@Pavel said in AI Megathread:

Citing sources isn’t just to give proper credit. It’s to give readers somewhere to look whenever you use information that isn’t found within whatever they’re currently reading.

Absolutely, and if you want to criticize ChatGPT for not providing links as to where it got the information I wouldn’t argue against that (though it isn’t clear to me if the LLM is even capable of doing that given how it constructs its replies).

It isn’t only the copying of others’ work that plagiarism stands against, but ensuring the capacity for information transfer and learning.

I think that’s the crux of the problem. The commonly accepted definition of plagiarism is very much concerned with the copying of other people’s work. Its etymological roots even come from the Latin word for kidnapping.

In academia, however, the definition and purpose have changed over time to the point where issues such as citation are falling within its purview in those specific circles. As this occurs the academic definition drifts further and further from the commonly accepted one, and this becomes problematic.

That’s all sort of beside the point, however, because for this discussion we are not talking about an academic paper or a student. We are talking about the term as it is generally applied.

But it’s also not a being with intentionality.

I would definitely argue that a machine’s lack of intentionality is not the reasoning behind my thinking. I could quite definitely construct a program whose output I feel would meet the commonly accepted definition of plagiarism (feed in a block of text and the program swaps out words for synonyms). I just don’t think the LLM, as has been described, meets such a definition.
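
For illustration, a program like that could be as small as the following Python sketch. The `SYNONYMS` table and the `swap_synonyms` function are hypothetical, invented for this example; the point is only that the output is the original passage with a few words changed.

```python
import re

# Hypothetical, hand-written synonym table; a real tool would use a thesaurus.
SYNONYMS = {"quick": "fast", "happy": "glad", "big": "large"}

def swap_synonyms(text: str) -> str:
    """Replace each known word with its synonym, leaving everything else intact."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Words not in the table pass through unchanged.
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, text)

print(swap_synonyms("The quick dog was happy."))  # The fast dog was glad.
```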

N.B.: I will add that I could be wrong due to a misunderstanding of the LLM, either from a mistake on my part or untrue statements on the part of OpenAI, but then we get into another whole Pandora’s box of how we ‘know things’. I also agree completely that students using it without attribution are guilty of plagiarism themselves.

I think it’s a stupid name but not a terrible concept.

Agreed on both points. There’s quite definitely a reason for the existence of such a term, it’s just that the term should not really lead to the implication that it does, because of the general definitions involved.

It is sort of like if people used the term ‘self-kidnapping’ for when someone takes a sick day even though they are not really ill.

I think at the end of the day we probably have more in common in how we feel about ChatGPT than we have differences. It is just that I am somewhat opposed to people simply stating that ‘ChatGPT plagiarizes’ without at least providing more context (such as the specific use of plagiarism in this instance).

Ironically, this is because of, as you put it, ‘information transfer and learning’. Without the context it is too likely that the average person will read the sentence and assume it to mean that ChatGPT is functioning like my theoretical program, copying large blocks of text and merely swapping around some words a bit without acknowledgement given to the original writer, as opposed to them understanding that what you are saying is that it does not provide references to where its information has come from.

Tributary

@bored If you replicate an artist’s exact steps with paint and brush on canvas, you still wouldn’t have an original work. You’d have a copy. It wouldn’t be original at all. Indeed, it’s one of the ways the Renaissance painters trained their apprentices. There are paintings which were collaborative efforts and those where it is not clear who actually did the work.

But paintings are not done in pixels. Really, most original digital works only consist of pixels in that they are a model of paint. The value of digital art comes not in its representations on storage media but in the original idea expressed by the artist. In this way, they are just like paintings in that only the one who put the original composition together gets credit for creativity.

Image generating AIs are essentially creating collages of other works, works that they largely are not licensed to use. And while collage can have value as an art form, the fact that something is a collage is part of its acknowledgement as a creative work.

Tributary

@Sage said in AI Megathread:

Ironically, this is because of, as you put it, ‘information transfer and learning’. Without the context it is too likely that the average person will read the sentence and assume it to mean that ChatGPT is functioning like my theoretical program, copying large blocks of text and merely swapping around some words a bit without acknowledgement given to the original writer, as opposed to them understanding that what you are saying is that it does not provide references to where its information has come from.

Are you suggesting that ChatGPT does not just regurgitate paragraphs it finds on the internet? Because it certainly does. Ask @Faraday about how it just vomits up stuff from her website when asked about Ares.