Why Bundled Libraries are a problem
These are some of the reasons why it's painful for us to have bundled libraries in the distribution.
- When a security flaw is discovered in a library and bundling is not allowed, The library can be fixed in a single package, that package rebuilt, and when users download it, all the applications that use it are immediately protected. When bundling is allowed, the distribution has to find all the packages that the library occurs in by auditing source code or running a special tool over all elf files in all packages, then all of those packages have to be fixed, all of those packages have to be built, and users have to download and update each of the ones that they are using on their system before they are protected. There is much more work when bundled libraries are involved.
- With security issues, people want to remove as much lag as they can between announcement of a problem and the fix being available for users. When libraries are unbundled, tools like vendor-sec can be used to alert distributions of problems that need patching in their packages before the announcement is made and then they can fix them with zero days of vulnerability. If bundling of libraries occurs, then the problem becomes how to get fixes out to all affected packages. If the distribution patches those packages, they must be careful to not leak the fact that there is a security vulnerability before they are allowed (which means they need to be careful who they share the information and what information they share with others). OTOH, if they do not patch the packages bundling libraries, then those packages are not protected on zero day, but only afterwards.
- When a security flaw appears, the program has to either update to a non-affected version of the library or backport a fix. This can be problematic when the code of the library has undergone many API and code changes since the version that is being bundled and the security fixing patch is very widespread. Many conflicts can arise that need time to fix when trying to backport the fixes but porting the application code to the new API version can also take a lot of time.
- We cannot implicitly trust an upstream application to be on top of security issues that are released in the packages that they care about. What happens if you are not following upstream development and don't know that a security release has been made? What happens if the developer that is responsible for watching bundled upstream development goes on vacation or quits your project? What happens if your application ceases active development? What happens if upstream stops active development and security fixes start originating from distro patches?
Forking is occurring. Once an application starts bundling libraries, it's easy for the project to include local patches to the library to add features that upstream doesn't have or fix bugs that upstream hasn't addressed. This has several negative effects.
- When a security issue appears, it becomes harder to fix the application bundling the library. If you attempt to upgrade to a newer version, you have to make sure your important local modifications get ported to the new version. If you attempt to backport, you have to merge the upstream fix to your own code-base which may have conflicts with the local modifications.
- When working with the library that comes from upstream, there is a community of people who are interested in that library to fall back on for help. When working on your own private copy that community may not be interested in helping you work on your modified sources since they don't have control or knowledge of what your modified sources do.
- Forking dilutes one of the strengths of open-source development. Instead of a project getting stronger with more people supplying patches to help drive the project and build a bigger community, the community of people interested in it are splintering, developing more and more divergent code-bases, solving the same problem over and over in different ways in different private copies of the library. Instead of everyone benefiting, everyone has to pay.
Bugfixes are usually of lesser importance from security issues but share the same issues of hanging onto lingering problems that have been fixed in the main package.
- Old versions of code linger on. If the application can bundle its own version of a library, the incentive to port to newer versions of the library are reduced. This exacerbates the problems of security and bugfix issues. Instead of progressively porting to newer versions of a library as time goes on, porting to newer versions becomes a chore that has to be performed at the same time as addressing a security flaw. This puts time pressure on the project when the work could have been spread out over a longer period if only the porting had been done all along.
Although licensing issues can crop up in any project, projects which bundle code from different sources together are a special source of concern. They make auditing for license issues a larger project.
Treatment of Bundled Libraries
Here are the steps to be followed when a package bundles a library but can be built against the system version:
- Bundled libraries (and/or their source code) must be explicitly deleted during %prep. Build scripts may need to be patched to deal with this situation. Whenever possible, the patching should be done in a way to conditionalize use of the bundled libraries, so that it can be sent upstream for consideration.
- It is not necessary to remove bundled libraries from the source tarball unless there is a legal reason to do so
Requirement if you bundle
- If you bundle a library, you are required to add a virtual provide to your spec file to note that you are bundling. This allows us to search for packages that may be affected by bugs or security issues in older versions of the library. The notation should look like this:
Provides: bundled(zlib) = 1.1.14
bundled() denotes that this is a bundled library virtual provide rather than something that other packages would want to depend on. Inside the paranthesis, the binary package that provides the library is listed. (For instance,
libpng). The version notes which version of the library was bundled. If there's been a lot of incomplete backporting of changes from newer versions of the library, it can be hard to establish what version to use here. A very general rule of thumb is to use the oldest version that seems reasonable as the reason we're doing this is to tell when a library contains issues that have been fixed in newer upstream versions.
A list of known virtual provides for bundled libraries is maintained on a separate page.
As this is a place where we have to convince upstream that there's a problem, it's good to be able to point out that this is a problem for all distributions, not just Fedora. Here's links to other distribution's policies::
- Debian -- http://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles
- Talk given at pycon with a large section on not bundling libraries http://pycon.blip.tv/file/2072580/