AI Megathread

Sage

@Trashcan Don’t really know about Bing and its implementation. What I do know is this:

import openai  # legacy (pre-1.0) openai-python interface

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "user", "content": "What is AresMUSH"}
    ],
    temperature=1.25,
    max_tokens=1024,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

d = completion.choices
print(d)

gives me the following result:

[<OpenAIObject at 0x113f32450> JSON: {
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "AresMUSH is a text-based roleplaying game engine that allows users to create and \
     run their own roleplay games online. It is built on the MUSH (Multi-User Shared \
     Hallucination) platform, which is a type of virtual world or multi-user dungeon. \
     AresMUSH provides features and tools for creating and managing game settings, \
     characters, and storylines, and allows multiple players to interact with each \
     other in real-time through text-based chat and roleplay. It provides a flexible \
     and customizable framework for creating a wide range of roleplaying settings \
     and experiences."
  },
  "finish_reason": "stop"
}]

It could be that the results @Faraday is reporting come from an older model, from how they set the variables for the question, or from some other aspect of Bing’s implementation. I don’t really know.

Please note, I am not accusing anyone of lying. I’m just showing my results.
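For anyone wondering what "the variables for the question" can change: the call above sets `temperature=1.25`, and temperature rescales the model's scores for candidate next words before one is picked at random, so the same prompt can yield different answers on different runs. This is a toy sketch under stated assumptions (the candidate words and scores are hypothetical, and it is not OpenAI's actual implementation):

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature, rng):
    """Pick one token at random, weighted by its temperature-scaled probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-word candidates and scores, purely for illustration.
tokens = ["engine", "platform", "server", "game"]
logits = [3.0, 2.0, 1.0, 0.5]

for t in (0.2, 1.0, 1.25):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(42)  # fixed seed so this demo repeats exactly
print([sample_token(tokens, logits, 1.25, rng) for _ in range(5)])
```

At low temperature nearly all the probability lands on the top-scoring word; at 1.25 it is spread across the alternatives, which is one reason two people asking the same question can get differently worded answers.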

Faraday

@Sage said in AI Megathread:

It could be that the results @Faraday is reporting is because it’s an older model

I’ve seen this behavior just using the basic ChatGPT website. It’s not just related to AresMUSH either - many other folks have observed the same thing. It’s even the basis for various lawsuits against OpenAI.

The point isn’t that everything ChatGPT generates is copied verbatim from some other website. The issues run deeper than that.

bored

@Pavel This is where I admit I’m not smart enough to give you a total answer, but it’s certainly more than just the brand of card, because the non-reproducibility can occur on the same machine across different sessions.

One of the home installs for Stable Diffusion has a sort of ‘setup test’ where they have you reproduce an image from their text settings: you should get the same picture as in the readme (it happens to be a famous anime girl, because naturally). And then it has some troubleshooting: if she looks wrong in this way, maybe you set this wrong. The images you get are never going to be totally unrelated, they’re similar, but they’re similar along a dimensionality that isn’t the way humans think, it’s similarity in the latent space which is how AI images are represented before they get translated to pixel format. So ‘similar’ might be the same character but in a different pose, or with an extra limb.

That difference feels egregious to a human viewer but might be mathematically quite close. This is why I chose to include that grid picture with my example: it demonstrates what small changes around fixed values can do. The ‘AI art process’ is a lot of this, looking for interesting seeds and then iteratively exploring around them with slight variations. The discrepancies caused by that xformers issue were generally within this range.
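To make the seed idea concrete, here is a toy sketch (not Stable Diffusion's actual code; the vector size, seeds, and linear blend are illustrative assumptions). Generation starts from seeded random noise in latent space, so the same seed reproduces the same starting point, while blending in a little noise from a second seed gives a nearby variation, much like exploring around a fixed seed in a grid. It does not model hardware or library nondeterminism such as the xformers issue.

```python
import math
import random

def latent_noise(seed, size=16):
    """Toy stand-in for initial latent noise: a seeded Gaussian vector."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def blend(a, b, strength):
    """Linear blend of two noise vectors (a simplification; some tools interpolate spherically)."""
    return [(1 - strength) * x + strength * y for x, y in zip(a, b)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

base = latent_noise(1234)
same = latent_noise(1234)
other = latent_noise(5678)
variation = blend(base, other, 0.1)  # hypothetical "variation strength" of 0.1

print(distance(base, same))       # 0.0: same seed, identical starting point
print(distance(base, variation))  # small: a nearby variation
print(distance(base, other))      # larger: an unrelated seed
```

With a linear blend, the variation sits exactly 10% of the way from the base noise toward the other seed's noise, so it stays close in latent space even though the decoded image could still look noticeably different to a human.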

@tributary I’m really not here to argue philosophical things like ‘the value of art’ or what machine intelligence is compared to human intelligence (the ‘it doesn’t understand’ raised frequently in the thread). AI is a terrible term that we’ve just ended up stuck with for legacy reasons, but it tends to send these discussions off on tangents. My point was that many (maybe all) processes are replicable in theory, but the practicality is variable (taken out to the extremes, if we’re doing philosophical wanking, you get ‘does free will exist or are we just executing eons-old chemical reactions writ large?’, etc).

I’m not sure I’m totally convinced that the steps a person takes using AI software are inherently less valuable than the steps someone takes using Photoshop (and again, what if you use BOTH?). That doesn’t mean that I’m against ethical standards for the models, but I’m also not convinced that even the ‘careless’ ones like SD are really egregious ‘theft’ as they exist (finetune models are a different question altogether). They’re certainly no more so than the work of Patreon fan artists who sell work featuring copyrighted characters (and who are a LARGE part of the force advocating against AI).

Sage

@Faraday said in AI Megathread:

I’ve seen this behavior just using the basic chatGPT website. It’s not just related to AresMUSH either - many other folks have observed the same thing. It’s even the basis for various lawsuits against OpenAI.

I can’t really comment on any lawsuits since I don’t have all the necessary information. However, I will point out that just because a lawsuit exists doesn’t mean it’s very good. Judges tend to only throw out lawsuits when they are really, really bad. (N.B.: I’m also not saying that the basis for the lawsuit is bad. Just that the existence of a lawsuit is not a very good indicator of anything beyond a passing measure of merit).

I will also say that I am not maintaining that OpenAI has a right to just scrape up all this material and use it. That has to do with copyright law, though.

The point isn’t that everything ChatGPT generates is copied verbatim from some other website. The issues run deeper than that.

And I’m not trying to argue there are no issues with ChatGPT. I’m simply saying that I don’t think that the model, as described, necessarily falls into the classical definition of ‘plagiarism’. It does not look like it is really copying ideas from any particular source but instead is just sort of guessing how to string words together to answer a question, which is what any of us already do. It’s not really trying to pass off any information conveyed as ‘its own’ (though it certainly lacks attribution).

The fact that OpenAI is taking information from others and using it to profit (by using it to train the AI) does seem to be a problem, but then isn’t Google doing something similar? Of course Google does provide attribution through the link back and this provides a highly useful service to sites that allow Google to take their information, so the cases certainly aren’t identical.

I’m just saying I’m not sure that calling what it does ‘plagiarism’, at least in the classical sense, is the argument that should be made (N.B.: I am only talking about ChatGPT, not about any of the image generating routines, which probably work very differently from the LLM).

Pavel

@Sage said in AI Megathread:

I’m just saying I’m not sure that calling what it does ‘plagiarism’, at least in the classical sense, is the argument that should be made

And yet we are disagreeing with you and providing reasons. I don’t think either party is going to convince the other, and we shall continue talking in circles.

@bored said in AI Megathread:

does free will exist or are we just executing eons-old chemical reactions writ large?

On this I do have an answer I quite like: Maybe it does, maybe it doesn’t, but I need to eat either way.

Sage

@Pavel said in AI Megathread:

And yet we are disagreeing with you and providing reasons.

Well, no, you aren’t actually providing reasons. You are moving the argument from the vernacular to specific minority case usage where it now becomes correct, and there’s nothing really wrong with that, provided you supply that context when you make the initial statement.

Now, if you supplied a reason why you were correct in the common vernacular and I missed it, I apologize. If there were statements made to the effect of ‘under the academic definition of plagiarism, what ChatGPT does is plagiarism’ (and by this I mean as the opening statement, not a supporting statement in a follow up post), then again, I apologize.

However, I have not seen you provide a single reason why you are disagreeing with me when the term is used in the ‘classical sense’ (which is what you have implied by taking that specific quote of mine and then stating you have supplied reasons).

Please note, I do not mean for this to come across as hostile. I am simply trying to point out that what you are trying to imply, at least to the best of what I can see, is not correct.

Pavel

@Sage Technically, I said we are supplying reasons that we disagree, not that we’re correct. And while you may not intend your words to come across as hostile, they’re certainly marching towards arrogance whether intended or not.

I have, in fact, said that ChatGPT is incapable of plagiarism. It is incapable of thought entirely, much less the capacity for intent or making claims at all.

I would further argue, in fact, that plagiarism in the ‘classical sense’ is at least somewhat subjective. But if we are to use the definition you selected earlier, “to steal and pass off (the ideas or words of another) as one’s own” then yes, it does that too. Every “idea” it has is someone else’s. It doesn’t think. It merely reproduces, verbatim or in essence, others’ ideas. It cannot have ideas of its own.

ETA:

Further, I would say that an alternate Merriam-Webster definition suits the ‘classical sense’ better: “present as new and original an idea or product derived from an existing source”

Trashcan

@Sage
Bing is using GPT-4 and I can see in your snippet that you’re using GPT-3.5.

Going back to my original post: “plagiarism” got dragged in here because in “the common vernacular” it’s used as a thing society mostly agrees is unethical. In the common vernacular, it has two parts: theft of words/ideas, and claim of those words/ideas as one’s own.

People can (demonstrably) argue until they are blue in the face about whether the first part is happening.

It is not arguable that if you copy/paste from ChatGPT or some other LLM, and do not disclose that you have done so, you’re doing part #2, not because ChatGPT wrote it but because you didn’t. I’m not going to name and shame because that’s not what this post is about, but that is already happening in our community.

In a community fundamentally based on reading and writing with other people, it should be an easy ask for people to be transparent when that is not what’s actually occurring. People deserve to know when they’re reading the output of an LLM and not their well-written friend Lem.

Sage

@Trashcan said in AI Megathread:

It is not arguable that if you copy/paste from ChatGPT or some other LLM, and do not disclose that you have done so, you’re doing part #2

Absolutely true, however, blaming ChatGPT for that is like blaming the hammer that someone swings.

I’m not saying I think people should be using ChatGPT. I’m not saying that what OpenAI has done to create it is ‘ok’. I’m just addressing what seems to me to be a piece of misinformation floating around, that ChatGPT itself does nothing but plagiarism (in the common sense of the word).

Pavel

@Sage said in AI Megathread:

that is like blaming the hammer that someone swings

Maybe. But to stretch the metaphor a little, ChatGPT is a warhammer rather than your bog standard tool hammer.

Whether intentionally designed that way or not, that’s how people are using it. And I think that matters a whole lot more than the exact definitions of words - I’m more a descriptivist anyway.

ETA: I’ll gladly step back from “ChatGPT plagiarises” verbiage if it can tell me where it gleaned whatever piece of information it’s currently telling me. Right now it’s incapable of actually citing its sources, even going so far as making them up.

Trashcan

@Sage
Bring me another hammer that people are smacking others in the head with and I’ll go off about that too.

Sage

@Trashcan https://www.fox61.com/video/news/local/police-officer-connecticut-middletown-attack-hammer/520-bb812ab0-cf5e-492f-bf99-cfbbc13807cb
https://www.cbs8.com/article/news/crime/man-attacked-with-hammer-city-heights-park/509-17be1fcf-3cc3-4d8d-90be-133503a3847f
https://abc7ny.com/tag/hammer-attack/
https://abc13.com/tag/hammer-attack/
https://abc7chicago.com/tag/hammer-attack/
https://www.lapdonline.org/newsroom/man-attacked-killed-with-hammer-r09336ah/
https://bronx.news12.com/exclusive-video-group-attacks-man-with-hammer-after-argument-on-brooklyn-bus-police-say

I should note that I am relatively sure these are all separate incidents and not multiple links to the same incident, although it is possible I accidentally duplicated one or two.

Pavel

ANYWAY

To what extent do you (general you) want players to tell you when they’re using generative virtual intelligence? If we were to put it on a 0-10 scale with 0 being ‘maybe used it like a name generator once’ with 10 being ‘literally every pose is written by ChatGPT’, where would you say it needs to be mentioned?

Faraday

@Sage said in AI Megathread:

Absolutely true, however, blaming ChatGPT for that is like blaming the hammer that someone swings.

YouTube is a site for sharing videos. Yes, sometimes people upload copyrighted stuff, but there are (admittedly imperfect) systems in place for dealing with it when that happens. More critically, YouTube recognizes that uploading stuff owned by someone else is a problem.

Napster was a site for sharing music. Its very nature abetted and encouraged music piracy. There was utter disregard for the rights of the musicians/studios. Uploading stuff owned by someone else was its core design feature.

Generative AI is far closer to Napster than YouTube. The flaw is not in how it’s used, like it’s a hammer that can be used for good or bad. The flaw is in how it’s built.

Trashcan

@Sage Hitting people in the head with hammers is wrong, as I’ve been saying. Those of you with hammer access, do not yield to temptation.

@Pavel
Ignoring any of the issues with how LLMs currently function for the sake of avoiding that aspect of the discussion:

I wouldn’t care much if it’s used for brainstorming and purely gathering prompts and ideas. I wouldn’t like but wouldn’t really care about their use in things like descs (no one reads them). Lore is similar, because I often expect that to be a group effort anyway, and it’s functional more than evocative. I would prefer to know.

Moving up the scale, anything where it’s being copy-pasted or heavily influencing the text of IC content that is meant to be engaged with either directly (e.g. poses in a scene) or emotionally (e.g. vignettes), I think disclosure becomes required and I would be genuinely upset to discover it absent some sort of disclosure.

Pavel

@Trashcan said in AI Megathread:

I wouldn’t care much if it’s used for brainstorming and purely gathering prompts and ideas

I’ve definitely used it for that, though mostly for characters that might be nice to play one day when I get that itch. Mostly a rough outline, the kind of thing you’d read on a casting call rather than anything with actual character.

Though in my real life, I mostly use it to assemble my scrawled class notes into something more comprehensible. So I fear I’m not the best use-case example for it.

Rinel

@Pavel said in AI Megathread:

ANYWAY

To what extent do you (general you) want players to tell you when they’re using generative virtual intelligence? If we were to put it on a 0-10 scale with 0 being ‘maybe used it like a name generator once’ with 10 being ‘literally every pose is written by ChatGPT’, where would you say it needs to be mentioned?

I don’t care if people use name generators. I want to know if anything more than that is used. And if people want me to say when I use name generators, sure, I’ll accede to that as part of this new paradigm (sometimes I use name generators to get coherent names; sometimes I go to a popular names table and roll some dice).

Faraday

Judge upholds copyright office rule that works generated by AI cannot be copyrighted, stressing “Human authorship is a bedrock requirement.”

Faraday

More news possibly of interest:

New York Times considers joining the slew of lawsuits against OpenAI over illicit use of their content

Also an interesting debate on fair use by a judge and copyright lawyer/expert. Notably, they point out:

The Supreme Court [in its 1985 decision in Harper & Row v The Nation] explained that harm to the rightsholder’s legitimate expectation of copyright revenues was the most significant factor in the fair use evaluation.

In a different Supreme Court case (Campbell v. Acuff-Rose Music, 1994), the court based its decision on copyright’s two fundamental objectives:

the enrichment of public knowledge and financial incentivisation to authors to create. Campbell essentially explains that the fair use zone lies in the circumstance where those two objectives are not at cross-purposes; the enrichment of public knowledge should not justify the fair use defense if it is accomplished by significant impairment of the rightsholder’s legitimate entitlement to profit from the distribution of the work.

Sage

@Faraday, firstly, this is in no way an attempt to say you are in any way incorrect. I’m responding to you simply to keep this as a linked thread.

The last sentence of the first article you linked is interesting to me:

An application for a work created with the help of AI can support a copyright claim if a human “selected or arranged” it in a “sufficiently creative way that the resulting work constitutes an original work of authorship,” [the copyright office] said.

Now, ignoring for the moment what ‘selected or arranged’ and ‘sufficiently creative way’ mean, it sounds like the very valid concern for people like screenwriters isn’t the complete replacement of their jobs by AI but the use of AI to reduce the number of writers required to make a show. I think most of us agree that the technology is still miles away from being able to spit out a script by itself, but what about AI carrying enough of the initial load that production companies are able to shrink their writers’ rooms from 15 people to 10? Likewise, it seems like reporters are at risk of having their numbers reduced and their jobs transformed as they spend more of their time editing copy initially produced by AI.

Of course the counterpoint to that argument is that this always happens with technological advancement. Farriers were far more in demand 150 years ago than mechanics were.

As for the second article, this is more or less part of the issue I’ve been trying to consider. Assuming that they can get LLMs to stop copying large blocks of text, what is the harm to the rightsholder’s expectation of copyright revenue?

I think at the end of the day what is really going to be required is new laws that codify more precisely the expectations and limitations on how generative AIs are allowed to harvest and use information, because the existing laws would take too much work to fit well into the new framework.