On 20 March the Rust Project put out this blog post discussing their regular user survey, in which they ask Rust users what they like about Rust and what annoys them about it, in order to keep in touch with how users feel and guide the project into the future.
Before long, though, some readers were distracted from the findings and the intended message, wondering whether the post had been generated by an LLM. Some said they quickly got the feeling that it had been.
In fact, the post had initially been drafted as LLM-generated text. The author had heavily reworked it, taken responsibility for it as its author, and published it. Once the LLM use became known, reactions turned harsh. Some readers were loudly upset. Other members of the Rust Project team advocated for a retraction, and the author acquiesced, although at the time of this writing the post is still up.
The negative reaction connected with some common sentiments:
- When given LLM-generated text to read, readers may feel offended that the sender expects them to spend time reading something the sender wasn’t willing to take the time and trouble to write themselves.
- When readers aren’t informed that what they’ve read was machine-generated, they may feel a betrayal of trust.
- LLM use in education has raised concerns about plagiarism and students not contributing their own thoughts to assignments.
- LLM use in scientific publication has raised questions about how it impacts originality and credit for authorship.
- What is the difference between creation and generation?
That collision of ideas made me wonder:
If you generate text with an LLM, how much do you have to change it for it to become your own creation?
The authorship problem
In scientific writing, authorship can be a fraught subject. Working relationships, collaborations, and friendships have ended over disagreements about whether someone’s contributions to a paper were sufficient to be granted authorship credit.
I’ve heard people suggest holding the portions of a paper generated by an LLM to the same standards as a person’s contributions, and asking what to do if you think the machine deserves to be credited as an author. I have not heard the inverse question: if someone generates text using an LLM, what, and how much, does that person need to do for the text to become their own work?
How much plagiarism is plagiarism?
Copy and paste is obviously plagiarism. Swapping out certain words so the text isn’t quite identical anymore is still plagiarism but less extreme. Is rewriting individual sentences without changing their structure or meaning still plagiarism? What about individual paragraphs?
I think for a work to not be plagiarism it must reflect the author’s own thought process throughout --- their own stream of consciousness. Under that thinking, using an LLM to generate a starting point destroys any chance that the final work could ever fully reflect the author’s own thought process: no matter how much you change the flesh, if the skeleton is still the LLM’s, the work is not your own.
Rewriting individual sentences, or even whole paragraphs, wouldn’t be enough; the author’s stream of consciousness still wouldn’t be captured. I think it would take a complete restructuring and rewriting to make the work truly reflect the author’s own thinking. Anything less would leave the ghost of the LLM generation within, consciously or subconsciously steering the editor’s thinking along what was generated rather than letting it flow, unaltered, from their own stream of consciousness.
No LLM can generate a stream of consciousness. LLMs are not conscious.
No amount of editing is enough to become creation.
Responsibility isn’t the same
Being responsible for something is different from it being your own work. Lots of people take responsibility for things they didn’t personally create.
From the beginning, the author of the Rust Project blog post took full responsibility for it and always acted in good faith. They were hurt by the response, which at times was sharp, because they cared. But the readers also felt hurt, because they felt deceived.
I always prefer that someone disclose when they used an LLM to generate anything. That’s certainly better than concealing it and having readers figure it out anyway because your LLM use was leaking through.
Conclusion
I think there are some clear lessons to take from this episode.
If you use an LLM, don’t try to hide it. Take responsibility for and be clear about what you did and how you did it. Concealing the fact that you used an LLM for something is simply deceptive.
Be honest with yourself about what you created and what the machine generated. LLMs do not produce a stream of conscious thought; they aren’t conscious. Using one to generate a foundation means building on a foundation you didn’t create. Passing off machine-generated text as your own creation is deceiving yourself too.
Safe travels, Anna. You are forever missed.