Dynamic linking and derivative works
By Michael Kerrisk April 24, 2013
Armijn Hemel is an engineer in the Netherlands who was for several
years actively
involved with GPL enforcement at gpl-violations.org. In 2012, he retired
from his core role there, and now concentrates on his own consultancy
business that specializes in licensing compliance issues. In a talk at the
2013 Free Software Legal and Licensing Workshop, he focused on his research
into dynamically linked ELF binaries. The aim of that talk was to present
the results of some of his experimental investigations of the dynamically
linked binaries that are shipped in (mainly) embedded systems and kickstart
some discussion on the implications of dynamic linking for the creation of
derivative works under the GPL license. In passing, he also spent some time
discussing a tool that he uses to automate the kinds of binary analysis
that he presented.
Investigating binaries
Armijn briefly explained ELF (Executable and Linking Format) and the
advantages of dynamic linking. He then provided a simplified summary of
the operation of dynamic linking for his lawyer audience, noting that, at
build time, two pieces of information are recorded in a program
binary: the symbols (functions, variables) that the program needs from each
library, and the list of libraries that the program needs at run time (the
"dependency list"). At run time, the undefined symbols are then
resolved by the dynamic linker, which uses the dependency list to find the
libraries that may have the needed symbols.
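The two recorded pieces of information, and the way they are combined at run time, can be pictured with a toy model. The sketch below is only an illustration: the function name, library name, and symbol names are invented, and a real dynamic linker also deals with symbol versions, weak symbols, and search paths, all of which are ignored here.

```python
# Toy model only: real dynamic linkers also handle symbol versions,
# visibility, weak symbols and search paths, all of which are ignored here.

def resolve(undefined, dependency_list, available_libraries):
    """Map each undefined symbol to the first listed library that defines it.

    undefined: iterable of symbol names the program needs
    dependency_list: library names in the order recorded in the binary
    available_libraries: dict of library name -> set of defined symbols,
        standing in for whatever is installed at run time
    """
    resolution = {}
    for symbol in undefined:
        for library in dependency_list:
            if symbol in available_libraries.get(library, set()):
                resolution[symbol] = library
                break
        else:
            resolution[symbol] = None  # nothing in the list provides it
    return resolution


# Hypothetical names; the same binary meets two different environments.
build_env = {"libfoo.so": {"foo_init", "foo_run"}}
run_env = {"libfoo.so": {"foo_init"}}  # a different libfoo.so at run time

needed = ["foo_init", "foo_run"]
print(resolve(needed, ["libfoo.so"], build_env))
print(resolve(needed, ["libfoo.so"], run_env))
```

The same symbol list and dependency list give different results in the two environments, because the outcome depends on the libraries that are actually present when the program starts.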
The key point is that the build-time and run-time environments may be
different, Armijn said. This is important because, in his opinion, dynamic
linking moves questions about derivative works and the application of the
GPL license into run time, because it is only at run time that libraries
are linked with a program. The dynamically linked libraries that are used
at run time could indeed be different—and have different
licenses—from the libraries that were specified during the static
linking phase. This implies that (depending on who you ask) declaring the
wrong dependencies in a binary could trigger license compliance issues.
For his examples, Armijn showed some results of examining the firmware
for one embedded device, the Trendnet TEW-636APB wireless access
point. (The device and firmware are now a few years old, but serve to
illustrate the points well.) The device came with source code, making it
possible to verify his findings against the build scripts and code. The example program used for the analysis was the
dynamically linked busybox program provided in that firmware,
"because BusyBox is everyone's favorite whipping boy".
Armijn pointed out that if one scans all of the binaries in a
filesystem to obtain the list of undefined and defined symbols in each
binary, it is then (in principle) easy to perform manual symbol
resolution. For any particular binary, one can do this by first using the
command readelf -W --dyn-syms (coupled with suitable
grep commands) to obtain (1) the list of undefined symbols from
the binary and (2) the list of symbols defined by each of the declared
dependencies of that binary. One can then match the undefined symbols from
the binary with the defined symbols in the declared dependencies. As a
result of that process, one can determine any leftover symbols that
could not
be resolved and any superfluous dependencies that were declared by the binary.
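The matching step can be sketched in a few lines of Python. This is a minimal sketch, not the procedure Armijn used: the function names, library names, and symbol names are all hypothetical, and the parser assumes the usual eight-column row layout of readelf -W --dyn-syms output (Num, Value, Size, Type, Bind, Vis, Ndx, Name), with "UND" in the Ndx column marking an undefined symbol.

```python
def split_dynsyms(readelf_output):
    """Split `readelf -W --dyn-syms` text into (undefined, defined) name sets."""
    undefined, defined = set(), set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        name = fields[7].split("@")[0]  # drop any version suffix
        (undefined if fields[6] == "UND" else defined).add(name)
    return undefined, defined


def check_dependencies(undefined, defined_by_dependency):
    """Return (leftover_symbols, superfluous_dependencies).

    undefined: set of undefined symbols of the binary
    defined_by_dependency: dict of declared dependency -> symbols it defines
    """
    leftover = set(undefined)
    superfluous = set()
    for library, defined in defined_by_dependency.items():
        used = undefined & defined
        if not used:
            superfluous.add(library)  # declared, but resolves nothing
        leftover -= used
    return leftover, superfluous


# Invented sample in the assumed readelf layout.
SAMPLE = """
Symbol table '.dynsym' contains 3 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND lib_func
     2: 00001234    40 FUNC    GLOBAL DEFAULT   12 main_helper
"""

undefined, defined = split_dynsyms(SAMPLE)
leftover, superfluous = check_dependencies(
    undefined | {"mystery"},
    {"libone.so": {"lib_func"}, "libtwo.so": {"other_func"}},
)
print(sorted(leftover), sorted(superfluous))
```

In this invented example, "mystery" is a leftover symbol that no declared dependency defines, and libtwo.so is a superfluous dependency.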
Armijn then showed an example of the results he found via this process
when examining the busybox binary in the Trendnet TEW-636APB
firmware. The declared dependencies of the busybox binary
were as follows:
$ readelf -a busybox | grep NEEDED
0x00000001 (NEEDED) Shared library: [libutility.so]
0x00000001 (NEEDED) Shared library: [libnvram.so]
0x00000001 (NEEDED) Shared library: [libapcfg.so]
0x00000001 (NEEDED) Shared library: [libaplog.so]
0x00000001 (NEEDED) Shared library: [libcrypt.so.0]
0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED) Shared library: [libc.so.0]
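For scripted analysis, the same list can be pulled out of the readelf output with a regular expression rather than grep. The following is a small sketch; the function name is an assumption, and the pattern relies only on the "(NEEDED) Shared library: [name]" layout shown above.

```python
import re

# Matches lines such as: 0x00000001 (NEEDED) Shared library: [libc.so.0]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def declared_dependencies(readelf_output):
    """Return the declared dependencies in the order they are listed."""
    return NEEDED_RE.findall(readelf_output)


# The busybox output quoted in the article.
OUTPUT = """\
0x00000001 (NEEDED) Shared library: [libutility.so]
0x00000001 (NEEDED) Shared library: [libnvram.so]
0x00000001 (NEEDED) Shared library: [libapcfg.so]
0x00000001 (NEEDED) Shared library: [libaplog.so]
0x00000001 (NEEDED) Shared library: [libcrypt.so.0]
0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED) Shared library: [libc.so.0]
"""

print(declared_dependencies(OUTPUT))
```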
Many developers believe that the existence of such a dynamic dependency
list in a binary is sufficient to indicate that the binary is a derivative
work, Armijn said. However, he doesn't believe that argument holds
any weight.
The first four busybox dependencies are interesting, Armijn said. Having
looked at thousands of firmware blobs, he did not recognize the
filenames, suggesting that they are proprietary modules. Furthermore, those
modules have no corresponding source code in the source code release, and
since the GPL-licensed busybox links to those four shared objects,
that would imply that there is a license violation.
"Or maybe not." Applying the process described
above to the binaries showed that the libnvram.so and
libaplog.so binaries are not used by busybox, although
they are needed to satisfy the dynamic linker. On the other hand,
busybox did use libutility.so and
libapcfg.so, so that, in Armijn's opinion, there was still a
license violation.
Of course, libnvram.so and libaplog.so might still be
used by other libraries that busybox depends upon. And, in fact,
libnvram.so is used by libapcfg.so, so that the
combination of busybox and libnvram.so might also be a
derived work of busybox. The diagram to the right shows the
complete set of dependencies for the busybox binary. In the
diagram, the black arrows represent dependencies between modules that are
both declared and used to resolve symbols, while the dashed blue arrows
represent dependencies that are declared but unused (i.e., unneeded for the
purpose of symbol resolution).
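A graph with these two edge styles is straightforward to generate once each declared dependency has been classified as used or unused. The sketch below emits Graphviz DOT text; the choice of DOT and the function name are assumptions made for illustration, and only the four busybox edges whose status is stated in the text above are included.

```python
def to_dot(edges):
    """Render (source, target, used) triples as Graphviz DOT text."""
    lines = ["digraph dependencies {"]
    for source, target, used in edges:
        # Solid black: declared and used. Dashed blue: declared but unused.
        style = "color=black" if used else "color=blue, style=dashed"
        lines.append(f'    "{source}" -> "{target}" [{style}];')
    lines.append("}")
    return "\n".join(lines)


# Edges taken from the findings reported above for the busybox binary.
edges = [
    ("busybox", "libutility.so", True),
    ("busybox", "libapcfg.so", True),
    ("busybox", "libnvram.so", False),
    ("busybox", "libaplog.so", False),
]
print(to_dot(edges))
```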
Looking at the source code release showed the line in a makefile that created the
superfluous dependencies for busybox:
LIBRARIES += -lutility -lnvram -lapcfg -laplog
The situation can be still more complicated, Armijn noted. Sometimes
there are hidden dependencies—dependencies that are used to satisfy
symbol resolution but not declared. In the busybox example, he
discovered a number of such hidden dependencies. For example, the
libapcfg.so binary depends on libutility.so and
libaplog.so, but the binary does not declare those
dependencies. (Shared libraries can have dependency lists in the same way
that programs do.) The only reason that the program works is that the
declared dependencies of the busybox binary ensure that the hidden
dependencies of libapcfg.so (and some other binaries) are all
satisfied.
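Hidden dependencies can be found by asking which undeclared libraries elsewhere in the filesystem define the symbols that a binary's declared dependencies leave unresolved. The following is a rough sketch under that assumption. The library names follow the article, but the function name, the symbol names, and the declared list given for libapcfg.so are invented for illustration.

```python
def hidden_dependencies(binary, declared, undefined, defined):
    """Libraries that resolve symbols of `binary` without being declared by it.

    declared: dict of binary -> list of declared dependencies
    undefined: dict of binary -> set of undefined symbols
    defined: dict of library -> set of symbols it defines
    """
    declared_libs = set(declared.get(binary, []))
    unresolved = set(undefined[binary])
    for library in declared_libs:
        unresolved -= defined.get(library, set())
    hidden = {}
    for library, symbols in defined.items():
        if library == binary or library in declared_libs:
            continue
        provided = unresolved & symbols
        if provided:
            hidden[library] = provided
    return hidden


# Symbol names and the declared list are hypothetical.
declared = {"libapcfg.so": ["libc.so.0"]}
undefined = {"libapcfg.so": {"util_trim", "aplog_write", "strlen"}}
defined = {
    "libc.so.0": {"strlen"},
    "libutility.so": {"util_trim"},
    "libaplog.so": {"aplog_write"},
}
print(hidden_dependencies("libapcfg.so", declared, undefined, defined))
```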
Armijn pointed out further oddities that he has observed in the
wild. He has seen examples of two libraries that offer the same symbols
with both libraries being declared as dependencies—in one case, this
was because the code of a smaller library was completely embedded inside a
separate, larger library. He has also seen cases where libraries do not
implement all the symbols that are required by the main program; the result
is that the program may successfully run or it may crash, depending on
whether a particular symbol needs to be (lazily) resolved at run time.
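The first of these oddities, two declared libraries offering the same symbols, can be detected by inverting the table of defined symbols. This is a minimal sketch with invented library and symbol names, not a description of how any existing tool does it.

```python
from collections import defaultdict


def duplicate_providers(defined_by_dependency):
    """Return symbol -> set of libraries, for symbols defined more than once."""
    providers = defaultdict(set)
    for library, symbols in defined_by_dependency.items():
        for symbol in symbols:
            providers[symbol].add(library)
    return {s: libs for s, libs in providers.items() if len(libs) > 1}


# Hypothetical case: a small library embedded whole inside a larger one.
defined_by_dependency = {
    "libsmall.so": {"a", "b"},
    "libbig.so": {"a", "b", "c"},
    "libc.so.0": {"strlen"},
}
print(duplicate_providers(defined_by_dependency))
```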
Binary Analysis Tool
Manually doing this sort of analysis using readelf is
possible, but time-consuming. For that reason, Armijn makes use of a suite
of tools that greatly ease the task. That suite, which goes under the
overall moniker Binary
Analysis Tool (BAT), automates the inspection of filesystem images, making
it simpler to discover binaries that are linked against GPLed binaries and
to spot issues such as superfluous dynamic dependencies.
The potential user base of BAT is varied, including groups performing
GPL enforcement (as Armijn has done in the past), consultants who work with
companies to help ensure that they comply with licensing obligations (as
he does now), and companies who want to ensure that their products are
license-compliant. The difficulty of the last task becomes clear when one
realizes that those companies may have no easy way of checking the
compliance of binaries delivered to them by third parties.
BAT consists of a set of Python scripts that are licensed under the
Apache License v2. Since a common use case for BAT is to examine firmware
blobs, the tool includes hooks that attempt to automatically decompress
such blobs using a range of standard compression tools. A configuration
file allows the user to specify further user-defined hooks for breaking
apart blobs. The tool performs its inspection using various techniques,
including string matching and symbol table matches (rather than
disassembly, which is a legally gray area in some jurisdictions); the main
algorithm is described in this article
[PDF].
There are two primary tools in the BAT suite, bruteforce.py
and batgui. bruteforce.py is a command-line tool that
performs the binary analysis. bruteforce.py takes two inputs: a
configuration file (a default configuration file is provided with the
installation), and a filesystem image. It produces two outputs: a results
file that is used by batgui, and an XML file that is written to standard
output. The XML file can be used by some reporting tools. In addition, the
script produces various output files in a directory specified in the
configuration file.
The results file can be viewed using the batgui script. The
analysis information is displayed in two panes. On the left is a
hierarchical tree structure representing the analyzed filesystem. On the
right, information is displayed for the currently selected file from the
left-hand pane. The right-hand pane has a number of tabs, one of which
("ELF analysis") shows the declared and unused dependencies when a binary
is selected.
The tool is somewhat rough around the edges, but usable for its stated
purpose; it is under active development, principally by Armijn, although
outside contributions
of various kinds are welcomed.
There are several near-term goals for improving the tool,
including detecting unfulfilled dependencies, detecting duplicate
dependencies (where a binary has dependencies on two different libraries
that provide the same symbols), and supporting other languages (such as
Java). The bruteforce.py script already generates dependency
graphs of the form shown in the first part of this article. Those graphs
are placed in the subdirectory specified in the
configuration file; the graphs are not (yet) displayed by batgui,
but supporting that feature is planned.
Derivative work: yes or no?
Having looked at a lot of firmware blobs by now, Armijn has discovered the
superfluous-dependency issue (or hidden dependencies) in nearly all of
them. So, his question to the audience was: does dynamic
linking always make a derivative work? Is the act of dynamic
linking—even if the dependency satisfies no
symbols—enough, on its own, to constitute a derivative work? After all,
he noted, if the dependency is not satisfied at run time, the program will
not even start.
The Free Software Foundation GPLv2 FAQ is
unequivocal that all forms of linking against a GPL-licensed binary
create a derivative work, so that the terms of the GPL apply to all of the
linked modules; many developers agree with this interpretation.
However, Armijn pointed out that there is considerable debate on
whether dynamic linking creates a derivative work, especially among
lawyers. For example, Lawrence Rosen even goes so far as to argue that no kind
of linking (dynamic or static) creates a derivative
work per se, as defined by the GPL. "And to be honest, I think he
has a fair point."
A wide range of opinions came back from the Workshop audience.
Claus-Peter Wiedemann was of the opinion that static or dynamic linking is
just one indicator for a derived work. Other questions need to be answered
as well. Is there a functional dependency between the modules? What data is
exchanged between the two modules? Can the library be exchanged with
another one? Is the interface standard or proprietary? In response, Armijn
asked the following rhetorical question: "So how does one get this message out
to developers, many of whom consider that if they run ldd on a
binary, then the revealed dependencies are sufficient to prove that this is
a derivative work?"
Daniel German suggested that Armijn was looking too hard for a problem. The
superfluous dependency problem seems to be just an unintentional error
(rather than the intentional creation of a derivative work); people could
just fix it, and then everyone is happy.
James Bottomley pointed out what he considered a key difference between
dynamic and static linking, which hinges on who creates the derivative
work. With static linking it is the distributor who creates the derivative
work. With dynamic linking, it is the end user who (at run time) creates
the derivative work. Thus, in his opinion, other legal arguments would need
to be brought forward to claim that the distributor is distributing a
derivative work.
In response to James, another audience member countered that if one
distributes a complete package that includes the shared libraries, it's
hard to argue that the result is not the distribution of a derivative work,
even if the linking is done dynamically. On the other hand, if one supplies
just a piece of code that is then compiled and built by the user, then it
may be more legally viable to claim that a derivative work has not
been created. Till Jaeger made a similar point, noting that the decision
about whether or not a derivative work has been created can't be based on
technical factors alone, but rather on factors such as the relationship
between pieces of code. "Suppose I add an interface to a GPL-licensed
program to implement new functionality in a shared library; then of course
it is a derivative work [regardless of the type of linking]."
Readers interested in further legal perspectives about dynamic linking
and derivative works can also consult Chapter 6 [PDF]
of Lawrence Rosen's book, Open Source Licensing, as well as this
viewpoint [PDF] from another lawyer, Andrew Katz. A discussion of the
GPL and derivative works can be found in this
article from the University of Washington. As noted in the IFOSSLR article Software
Interactions and the GNU General Public License and the associated
Working Paper
on the legal implications of linking [ODT] by the FSF Europe's
Software Interactions working group, even if it is argued that dynamic linking does not
create a derivative work, other arguments can come into play, such as
interdependency and "collectivity"—co-distribution of a program and
the GPLed library against which it dynamically links.
Concluding remarks
Armijn's presentation was thought-provoking, especially insofar as it
raised the subtleties around dynamic linking and derivative works. Although
the Free Software Foundation is adamant that linking of any kind creates a
derivative work, in the end it is the courts that will decide that
point. It is interesting that as one gets closer to the legal system (i.e.,
when one talks to lawyers), opinion about whether (dynamic) linking creates
a derivative work is rather less clear-cut. That is not to say that
dynamically linked binaries will one day be legally interpreted as not
being derivative works; rather, the decision about what constitutes a
derivative work is likely to be argued and decided on much more than single
technical factors such as how two pieces of code are linked together.
At least be creative enough to debate what it means to agree to the license terms.
Actually, what it means to agree to the license terms is not interesting, since agreeing to those terms has no legal significance. There's no contract involved. The copier meets the conditions and thereby has a license to copy, prepare a derivative work, etc. Or he doesn't and he doesn't. There is no subsequent obligation such as when someone agrees to pay interest on a loan and then he has to do it.
Now debating what it means to comply with the conditions of the license - that would be worth debating.
For example, you can get a piece of Titanic, translate it into Klingon and use it in a classroom (fair use).
If your point is that there are exceptions to the restriction in US law on preparing derivative works, I agree. So my blanket statement that you need permission was incorrect (oversimplified, actually).
It's worth noting that the exceptions are not unique to derivative works. They apply to public performances and plain old copies as well.
Perhaps you would like to provide a shred of an argument for this? Otherwise, no one reading this thread, including me, is going to believe you, because the thread contains multiple shreds of argument that the opposite is true.
I wonder if you're confusing copyright in general with GPL, because GPL gives everyone permission to create derivative works with no strings attached as long as they don't distribute the result.
In fact, there's another shred of evidence for the proposition that you need permission to create a derivative work. GPL is full of clauses explicitly giving people permission to modify the program. That strongly suggests the copyright lawyers who wrote it believed permission was required.