May 23, 2022•522 words
Following up on my blog entry Systemic Improvements to License Preambles, I found that many of my ideas are already addressed in the REUSE guidelines and the SPDX documentation on short identifiers. I was already familiar with the SPDX work, but I hadn't focused on it.
On thinking and reading more - this time with explicit awareness of the prior art - some interesting questions remained.
How do I contact the developers or copyright holders when I need extended permissions? For example, commercial use of code released under a non-commercial license. Right now the email address provided with the copyright statement is the default method.
How can I set or get a complete list of copyright holders? The copyright statement in the header only applies to the creator of the project. Each subsequent contributor has their own copyright. Who are they? Were they acting in a personal capacity or did their employer own the rights? If their employer owned the rights, who was it? How can a contributor supply both their own name and their employer? Right now the best approach would be to mine git history for committer identities.
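As a rough illustration of the git-mining approach, here is a minimal sketch that tallies author identities from `git log`. The function names are made up for this example, and it assumes `git` is installed and the path points at a repository. It reports commit authors; swapping `%aN <%aE>` for `%cN <%cE>` would report committers instead. Neither answers whether the person or their employer holds the rights.

```python
import subprocess
from collections import Counter


def parse_authors(log_text: str) -> Counter:
    """Count commits per "Name <email>" line in `git log` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def list_authors(repo_path: str = ".") -> Counter:
    """Run `git log` in repo_path and tally author identities."""
    # %aN and %aE are the author name and email, mapped through .mailmap if present.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%aN <%aE>"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_authors(result.stdout)


if __name__ == "__main__":
    for identity, commits in list_authors().most_common():
        print(f"{commits:5d}  {identity}")
```

The output is only a starting point: an email domain may hint at an employer, but it is not a statement of ownership.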
Does the file mix code from different sources, with different copyright holders and license selections? This is particularly relevant to snippets pasted from sources like StackOverflow. There should be a way to mark the beginnings and endings, source, copyright holders, and permissions. Snippets are probably the biggest source of complexity. There should exist a way to verify that copied code is coming from a safe source.
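To make the idea of marked snippets concrete, here is a minimal sketch of a scanner that finds delimited regions in a file and collects their metadata. It assumes the begin/end markers and the license and copyright tags follow the SPDX snippet-tag convention (`SPDX-SnippetBegin`, `SPDX-SnippetEnd`, `SPDX-SnippetCopyrightText`), as far as I understand it. `Snippet-Source` is a hypothetical tag invented for this example, and nested snippets are not handled.

```python
import re
from dataclasses import dataclass, field

BEGIN = "SPDX-SnippetBegin"
END = "SPDX-SnippetEnd"
# "Snippet-Source" is a hypothetical tag for this sketch, not a standard one.
TAG = re.compile(
    r"(SPDX-License-Identifier|SPDX-SnippetCopyrightText|Snippet-Source):\s*(.+?)\s*$"
)


@dataclass
class Snippet:
    start_line: int
    end_line: int = 0
    tags: dict = field(default_factory=dict)


def find_snippets(text: str) -> list[Snippet]:
    """Return every begin/end region with the tags found inside it."""
    snippets = []
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        if BEGIN in line:
            current = Snippet(start_line=number)
        elif END in line and current is not None:
            current.end_line = number
            snippets.append(current)
            current = None
        elif current is not None:
            match = TAG.search(line)
            if match:
                current.tags[match.group(1)] = match.group(2)
    return snippets
```

A scanner like this only reports what the markers claim. Verifying that the copied code really comes from a safe source remains a separate problem.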
Where is the repo or canonical source on the internet for this file? It might have a tag to identify the source. If you look at the human-visible footer for my transcription of this folk song, you'll notice "Original Musescore file at ...github URL."
How can this file be identified uniquely? How can I get an identifier that persists across modifications, copies, forks, and hosts? A simple solution is a random number embedded in the file at the time of creation.
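The random-number idea could be sketched like this: generate a random UUID once, write it into a header comment, and read it back later regardless of how the rest of the file has changed. The tag name `File-ID` is hypothetical and not part of SPDX or REUSE.

```python
import re
import uuid

TAG = "File-ID"  # hypothetical tag name, not part of SPDX or REUSE
PATTERN = re.compile(
    rf"{TAG}:\s*([0-9a-f]{{8}}(?:-[0-9a-f]{{4}}){{3}}-[0-9a-f]{{12}})"
)


def new_id_line(comment_prefix: str = "#") -> str:
    """Build a header line carrying a fresh random identifier."""
    return f"{comment_prefix} {TAG}: {uuid.uuid4()}"


def read_id(text: str):
    """Return the embedded identifier, or None if the file has none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None
```

The identifier survives edits, copies, and moves between hosts as long as nobody deletes the line. It says nothing about which copy is the original, and a fork that keeps the line shares the identifier with its parent.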
When I change the license of third-party code, should I record and publish the licensing history? This would apply when the original license permits relicensing. For example, MIT-licensed code can be redistributed under Apache 2.0 as long as the original MIT notice is kept, as far as I understand.
When is it OK to modify the syntax (but not the semantics) of license statements, in whatever format? Can you convert a long license statement such as the classic GPL 1.0 blurb to the equivalent short-form SPDX identifier? Can an SBOM tool that attempts to identify the license on each file save its conclusions into the file? Is there a way to mark such output as tentative?
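The mechanical half of that conversion is easy to sketch. The example below recognizes only the "either version N ... or (at your option) any later version" wording of the GPL blurb and suggests the matching SPDX identifier. The function name and the regular expression are assumptions made for this sketch; a real tool would need far more careful text matching.

```python
import re

# Matches the "any later version" form of the GPL blurb for a single-digit version.
BLURB = re.compile(
    r"GNU General Public License.*?either version (\d)(?: of the License)?,"
    r"\s*or\s*\(at your option\)\s*any later version",
    re.IGNORECASE,
)


def suggest_identifier(header: str):
    """Return a suggested SPDX identifier for a GPL blurb, or None."""
    # Drop leading comment characters, then collapse all whitespace to single spaces.
    lines = [re.sub(r"^[\s#/*;-]+", "", line) for line in header.splitlines()]
    flat = " ".join(" ".join(lines).split())
    match = BLURB.search(flat)
    if not match:
        return None
    return f"GPL-{match.group(1)}.0-or-later"
```

The result is a suggestion and nothing more. Whether a tool may write it back into the file, and how it should flag the line as tentative, is exactly the open question.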
Should the header be "Copyright", "copyright", "(c)", "©" or nothing? A single canonical answer is the best path. Ambiguity is bad. A simple solution is a messaging campaign to standardize on just one marker.
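Until such a standard exists, a tool can at least normalize the variants it encounters. This is a minimal sketch that rewrites the markers listed above to a single form. Picking "Copyright" as the canonical marker is an arbitrary choice for the example, and the pattern assumes the marker is followed by a four-digit year.

```python
import re

# Optional comment prefix, then any of: "copyright", "(c)", "©", or "copyright"
# followed by "(c)" or "©", then a four-digit year and the rest of the line.
MARKER = re.compile(
    r"^(?P<prefix>\W*?)(?:copyright\s*(?:\(c\)|©)?|\(c\)|©)\s*(?P<rest>\d{4}.*)$",
    re.IGNORECASE,
)


def normalize(line: str) -> str:
    """Rewrite a copyright line to the 'Copyright <year> ...' form."""
    match = MARKER.match(line)
    if not match:
        return line  # not a recognizable copyright line; leave untouched
    return f"{match.group('prefix')}Copyright {match.group('rest')}"
```

Lines that do not look like a copyright statement pass through unchanged, so the function can be applied to every line of a header.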
Is there a machine-friendly way to mark unlicensable content like generated files? A simple solution is a new license blurb and short-form identifier saying something like "This file contains uncopyrightable information."