It's not clear to me that verbatim would be the only issue. It might produce lines that are similar, but not identical.
The underlying question is whether the output is a derivative work of the training set. Sidestepping similar issues is why GCC and LLVM carry compiler/runtime exemptions alongside their respective licenses.
If simple snippet similarity is enough to trigger a GPL copyright claim, I think that goes too far. It seems the GPL has become an obstacle to invention. I learned to run away when I see it.
It's not limited to similar or identical code. The issue applies to anything 'derived' from copyrighted code. The issue is simply most visible with similar or identical code.
If you have code from an independent origin, this issue doesn't apply. That's how clean room designs bypass copyright. Similarly if the upstream code waives its copyright in certain types of derived works (compiler/runtime exemptions), it doesn't apply.
So if you work on an open source project and learn some techniques from it, and then in your day job you use a similar technique, is that a copyright violation?
Basically does reading GPL code pollute your brain and make it impossible to work for pay later?
If so you should only ever read BSD code, not GPL.
> Basically does reading GPL code pollute your brain and make it impossible to work for pay later?
It seems to me that some people believe it does. Some "clean room" projects specifically instructed developers not to even look at GPL code. I don't have specific examples at hand.
Microsoft appears to believe this (or maybe just the MacBU), because I've met employees who tell me they're not allowed to read any public code, including Stack Overflow answers.
If that's the case, then GPL code should not have been used in the training set. OpenAI should have learned to run away when they saw it. The GPL is purposely designed to protect user freedom (it does not care about any special developer freedom), which is its biggest advantage.