Last month, we explored why licenses matter: they protect open source creators' creativity and innovation by codifying freedoms. Open source is a complex system, and its licenses reflect long- and hard-fought battles to preserve the ecosystem’s ability to thrive in the face of competing interests. A key point of contention was whether the attention developers pay to licenses (and how important developers perceive them to be) reflects how significant open source licenses actually are.
Catch up on previous entries in the Influential Articles series.
While licenses save open source creators from a ton of headaches around contracts, liability, IP agreements, and more, developers are often called upon to select which open source projects to use, and consequently which licenses they find acceptable. We have to ask ourselves, then: how well-equipped are developers to make judgement calls on legal documents?
Do Software Developers Understand Open Source Licenses? by Daniel A. Almeida, Gail C. Murphy, Greg Wilson, and Mike Hoye
In 2017, Almeida, Murphy, Wilson, and Hoye published a study on developers' understanding of open source licenses (pdf, archive) in the Proceedings of the International Conference on Program Comprehension (ICPC). They chose the GPL 3.0, LGPL 3.0, and MPL 2.0 as proxies for the breadth of licenses that may be applied to open source projects. Using these licenses, the authors crafted scenarios to gauge lay interpretation of how each license applies, then compared developers' answers against those of a legal expert in intellectual property.
For those not intimately familiar with the ins and outs of open source licenses, the GPL, LGPL, and MPL differ in several significant ways, which I’ll summarize here (though my standard “I am not a lawyer” disclaimer applies):
May: 🆗 May not: ❌ Must: ✅ Not specified: 😶
| License | Sublicense | Use trademark | State changes | Link with dissimilarly-licensed software | Full text |
|---|---|---|---|---|---|
| GPL | ❌ | 😶 | ✅ | ❌ | GPL 3.0 archive |
| LGPL | ❌ | 😶 | ✅ | 🆗 | LGPL 3.0 archive |
| MPL | 🆗 | ❌ | 😶 | 😶 | MPL 2.0 archive |
Other differences between the licenses exist, but these are the ones that are germane to the research.
The authors' initial set of scenarios evaluates developers' judgement of a dependency along the following lines of inquiry:
- license of dependency
- distribution of software developed using dependency
- licensing of software developed using dependency
- sale of proprietary software developed using dependency
Subsequent scenarios address developers' interpretation of licenses based on these factors:
- modifications to existing project
- sale of proprietary software using modifications to existing project
- changes of license to existing project
Developers were asked to evaluate each scenario while the license of the dependency and the license of the software varied, which yields a nice matrix of responses. For example:
“Shaoqing believes there are unhappy users out there willing to pay for a premium email client. To get to market faster, she decided to use an open source implementation of the Simple Mail Transfer Protocol (SMTP). If the SMTP implementation is released under the GNU GPL 3.0, would Shaoqing be allowed to fork the SMTP project and change the fork’s license to the following licenses in order to use it in her commercial e-mail client?” (link, archive)
For each scenario, developers answered the question once for each of the three licenses, choosing from three possible answers.
Sample response grid:
| | Yes | No | Unsure |
|---|---|---|---|
| GNU GPL 3.0 | | | |
| GNU LGPL 3.0 | | | |
| Mozilla Public License 2.0 | | | |
Additionally, respondents had the opportunity to provide context and reasoning around their answers in free-form text boxes if they answered that they were unsure.
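To make the shape of the survey concrete, here is a minimal Python sketch of one scenario's response grid. It assumes, as in the sample above, that each scenario is asked once per license and that every answer is Yes, No, or Unsure. The `Response` record and the `build_grid` helper are hypothetical names for illustration, not anything from the paper's survey materials.

```python
from dataclasses import dataclass
from typing import Literal

# The three licenses the study used as proxies for the wider license landscape.
LICENSES = ("GNU GPL 3.0", "GNU LGPL 3.0", "Mozilla Public License 2.0")

# Hypothetical answer type: the three options offered in each response grid.
Answer = Literal["Yes", "No", "Unsure"]


@dataclass
class Response:
    """One row of one scenario's grid for one respondent (hypothetical model)."""
    scenario: str
    license: str
    answer: Answer
    comment: str = ""  # optional free-form context, e.g. when unsure


def build_grid(scenario: str, answers: dict[str, Answer]) -> list[Response]:
    """Turn one respondent's per-license answers into grid rows.

    Rows come back in the fixed LICENSES order. Raises ValueError unless
    answers covers exactly the three licenses (none missing, none extra).
    """
    if set(answers) != set(LICENSES):
        raise ValueError(f"expected answers for exactly {LICENSES}")
    return [Response(scenario, lic, answers[lic]) for lic in LICENSES]
```

Keeping one record per row, rather than one per respondent, makes it easy to tally answers per license and to attach a free-form comment to an individual row.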
Based on the research analysis, developers do rather well overall: using the researchers' established threshold, participants' answers matched the legal expert's opinion approximately 62% of the time. They did best on questions involving only one license, where they understood the nuances of the license terms and their applicability to different scenarios.
“This rate of matching the legal expert’s opinion is encouraging as it suggests that participants understood many aspects of the open source licenses used in the scenarios. Participants also matched the opinion of the legal expert whenever only one open source license is in use in the scenario (e.g., S2-GPL-GPL or S7-MPL-MPL, etc.)” (page 5)
But when a mixture of licenses was involved, developers were less frequently correct and more frequently answered that they were unsure of the correct answer. For “unsure” answers, apart from general lack of certainty about the scenario, the most common themes were concerns about license interactions, questions about the scenario, and lack of familiarity with relicensing or dual licensing. These themes were identified by analyzing the comments participants entered in the free-form text boxes.
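The single-license versus mixed-license comparison can be sketched as a simple agreement calculation. This is a simplified illustration, not the paper's method: it assumes each answer is stored alongside the dependency's license, the license it is combined with or relicensed to, and the legal expert's answer for that cell, and it scores a raw per-answer match rate rather than the researchers' own threshold. The `ScoredAnswer` record and `agreement_by_license_mix` function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ScoredAnswer(NamedTuple):
    """One participant answer for one cell of the scenario matrix (hypothetical model)."""
    dependency_license: str
    target_license: str
    participant: str  # "Yes", "No", or "Unsure"
    expert: str       # the legal expert's answer for the same cell


def agreement_by_license_mix(answers: list[ScoredAnswer]) -> dict[str, float]:
    """Return the fraction of answers matching the expert, per license mix.

    A cell is "single" when only one license is involved (for example,
    GPL with GPL) and "mixed" otherwise. "Unsure" answers are never
    counted as matches. Groups with no answers are omitted.
    """
    matched: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for a in answers:
        group = "single" if a.dependency_license == a.target_license else "mixed"
        total[group] += 1
        if a.participant != "Unsure" and a.participant == a.expert:
            matched[group] += 1
    return {group: matched[group] / total[group] for group in total}
```

Counting "Unsure" as a miss is a deliberate simplification here; since the study also looked at how often developers answered that they were unsure, a fuller analysis might track those answers as their own category instead of folding them into mismatches.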
Unsure answers that didn’t fall into those themes still yielded interesting insights.
“This uncertainty often had to do with details related to the licenses, such as ‘don’t know if GPL allows it’ and ‘don’t know to which point GPL is viral’” (page 8)
Single-license scenarios are rarely what developers encounter when building large software, where different factors determine when the requirements of various licenses go into effect. Still, the results are promising: developers do seem aware of when they lack the knowledge to make a sound judgement.
An aside on methodology
Good research doesn’t get enough love. Without going too over-the-top with it, I want to point out aspects of this paper that I really, really appreciate.
The paper includes all the sections needed to understand the full context of the research:
- details on the survey (including a link to the full text!)
- distribution and recruitment mechanisms
- completion rate (an impressive 45%)
- demographic information of respondents
- assumptions and considerations
The authors break down their analysis into quantitative and qualitative sections, which gives the reader (me, and hopefully you) good cues of how to process the information. We get timely information about how results are coded, and helpful charts and tables. Observations are presented along the way and restated in summary, which helps with reader comprehension.
Finally, there’s a “threats to validity” section, which is simply excellent. While the 45% completion rate seems high to me, the authors acknowledge that those who did not complete the survey may have had less confidence in their ability to judge the scenarios, making the results more representative of license nerds than of the average developer. The section also recognizes various types of bias that may have impacted the results.
I have more trust in the research and my ability to evaluate it because of these details, so major thanks to the authors.
TL;DR your company needs an open source lawyer
This study was conducted in 2017, a whole 5 years ago! There are over 100 open source licenses (archive) that developers may encounter when using open source software. And with software becoming ever more complex, developers should not be expected to make judgement calls with legal ramifications on their own. As the summary states:
“Open source software is not a self-contained world with a specific set of developers involved and a small set of open source licenses with well-defined interactions. Many closed-source, commercially-oriented software projects rely on open source software. Many open source licenses exist with different ramifications depending on how the software with different licenses interact (i.e., via dynamic linking, copying of source code, etc.).” (page 10)
Open source licensing is an incredibly complex topic. Going back to last month’s article, no, developers should not have to care about the ins and outs of licenses and license enforcement…but being aware of the general parameters is a great complement to the expertise of a lawyer with experience in open source.
Which you should have.