Patterns in Open Source License Compliance
July 19, 2022
I have been investigating problems incorporating third-party sources into proprietary code bases. The goal is to help companies follow the rules when they work with open source.
I'm looking at snippets, not standalone packages tidily encapsulated in their own directories. This kind of copying is free range and messy. It's everywhere. Copy-pasta is just how developers do their jobs. They snarf code from anywhere and everywhere.
That means compliance problems are super common. Managers mostly aren't aware of them: without tooling, they have little visibility into which code came from where.
Developers need training in how to incorporate third-party code, including Stack Overflow answers, open source repos with licenses, and sources without license statements. I imagine most of them just don't know how to handle these situations.
Developers need to start caring about compliance problems in third-party code. If failures cost them and successes benefitted them, they would do the work. CI pipelines should run automated checks (for example, as GitHub Actions workflows) that recognize snippets from third-party sources and flag them for examination by code reviewers. Reviewers need to know how to evaluate the reports.
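To make the idea of a snippet-recognizing check concrete, here is a minimal sketch of one possible approach: fingerprinting short runs of whitespace-normalized lines and comparing them against a corpus of known third-party code. The names `normalize`, `fingerprints`, `flag_snippets`, and the `WINDOW` size are all hypothetical, invented for this illustration. A real tool would need a far larger corpus and smarter matching (renamed variables, reordered code, and so on).

```python
import hashlib
import re

# Hypothetical setting: how many consecutive normalized lines form one fingerprint.
WINDOW = 3


def normalize(source: str) -> list[str]:
    """Remove all whitespace and drop blank lines so reformatting doesn't hide a copy."""
    lines = []
    for raw in source.splitlines():
        line = re.sub(r"\s+", "", raw)
        if line:
            lines.append(line)
    return lines


def fingerprints(source: str, window: int = WINDOW) -> set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(source)
    prints = set()
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        prints.add(hashlib.sha256(chunk.encode("utf-8")).hexdigest())
    return prints


def flag_snippets(changed: str, known: dict[str, str], window: int = WINDOW) -> list[str]:
    """Return the names of known sources that share a fingerprint with the changed code."""
    changed_prints = fingerprints(changed, window)
    return sorted(
        name
        for name, text in known.items()
        if changed_prints & fingerprints(text, window)
    )
```

A CI job could run something like `flag_snippets` over the text added in a pull request and post any matching source names as a comment, leaving the license decision to the human reviewer.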
Engineering management needs to drive these changes. They need to check whether job candidates know how to comply with copyright. They need to train developers in good practices. They need to ask for compliance checking in pull requests.
These aren't hard problems. They aren't high tech. However, they are very large in scale: getting new practices adopted across the software industry is not a small project.