@Starver, I am not talking about AI using sampling, I am talking about your AI product being trained and built on copyrighted material in the first place. If you build your product on copyrighted information, that is a problem.
Again, it may be "a problem", but that's not how copyright law works.
Indeed. Or else nobody should be allowed to be creative in any way whatsoever unless they were a lifelong hermit. No "on the shoulders of giants", or anything like that.
What we're doing is putting AIs through a School Of Life (maybe with overtones of a degree-level study of <Foo> Appreciation, with more emphasis on course materials than on any professorial opinions being imparted).
If ChatGPT effectively read Reddit (or, rather, web-pages that Reddit users mentioned) to build up its LLM of the world, then it is 'sampling' conceptual information from many people who might consider (indeed, websites often do assert) their literal output as copyright. But the point is that it's much the same as ChatGPT being like a user who reports that they think they recall once having heard something (often quite distorted, and at least sometimes plain wrong, due to having no contextual understanding beyond what words often sit near to what other words, albeit cleverly so), rather than straight out pasting what some source says and claiming it as its own 'thoughts' (or doing what happened with the Shetland Times and Shetland News, and probably still happens plenty today).
It is so unable to directly cite sources that, if effectively asked to provide citations, it constructs something that looks sufficiently citation-like but isn't actually a practical one at all. (If such a chatbot were additionally required to provide its true sources for everything it spewed out, then it'd be hard to do less than narrow it down to thousands of 'sources' for how it constructed a hundred-word output, and much of that would be more to do with why it did/did not make use of the Oxford Comma or go with its choice of "isn't"/"is not".) The fact that there are expected to be board coordinates of a certain form in a chess-question's answer is nothing that can be claimed to be an Intellectual Property, and much of the rest of the output is just a glorified Markovian chain that reflects statistically what words should be returned given any particular query.
A 'popBot' might similarly have the experience (compressed) of having heard every week's Top 40 blare out of the radio for a number of decades, which does not in itself pose a copyright issue. And it isn't using eidetic recall/replay of any of those songs to perform any actual identifiable non-original works. (The "Liam Gallagher" voice in the 'AIsis' song is a separate issue, along the lines of DeepFake, but of dubious prior coverage when it comes to performance rights.)
All of which is to say that there may be issues with exactly how the corpus was 'fed' (like with on-demand re-release of classic TV/radio content, the available sources might or might not be effectively licenced/denied for use in a situation which wasn't even considered by anyone, decades ago, in ways that technically may need untangling/renegotiating), but we can't just assume that it was an illegal torrent-dump or bootlegging operation. From then on, is the disassembly and reassembly into a new product really something that would have a George Harrison/Chiffons case to answer? Not as far as the AI is concerned, and its 'parents' may be able to successfully argue not. If only because it would shut out many non-'copying' technical processes. But these are the things that lawyers may be making money (and/or reputations) over, as time passes. At least until there are fully-accredited AI lawyers!