From Fedora Project Wiki

Revision as of 11:06, 15 April 2016 by Rathann (talk | contribs) (→‎Security: s/boost/upstream/)

Why Bundled Libraries are a problem

These are some of the reasons why it's painful for us to have bundled libraries in the distribution.

Security

  • When a security flaw is discovered in a library and bundling is not allowed, the library can be fixed in a single package, that package rebuilt, and when users download it, all the applications that use it are immediately protected. When bundling is allowed, the distribution has to find every package that contains the library, either by auditing source code or by running a special tool over all ELF files in all packages. Each of those packages then has to be fixed and rebuilt, and users have to download and update each one installed on their system before they are protected. Bundled libraries mean much more work.
  • With security issues, people want to remove as much lag as they can between the announcement of a problem and the fix being available to users. When libraries are unbundled, channels like vendor-sec can be used to alert distributions to problems that need patching in their packages before the announcement is made, so that they can fix them with zero days of vulnerability. If libraries are bundled, the problem becomes how to get fixes out to all affected packages. If the distribution patches those packages, it must be careful not to leak the existence of the vulnerability before disclosure is allowed, which means being careful about whom the information is shared with and how much of it is shared. On the other hand, if the distribution does not patch the packages that bundle the library, those packages are not protected on day zero, but only afterwards.
  • When a security flaw appears, the program has to either update to a non-affected version of the library or backport a fix. This can be problematic when the library has undergone many API and code changes since the bundled version and the security patch touches many parts of the code. Backporting the fix can produce many conflicts that take time to resolve, but porting the application to the new API version can also take a lot of time.
  • We cannot implicitly trust an upstream application to stay on top of security issues announced for the libraries it bundles. What happens if you are not following upstream development and don't know that a security release has been made? What happens if the developer who is responsible for watching bundled upstream development goes on vacation or quits your project? What happens if your application ceases active development? What happens if upstream stops active development and security fixes start originating from distro patches?

Forking

Bundling encourages forking. Once an application starts bundling libraries, it's easy for the project to include local patches to the library to add features that upstream doesn't have or fix bugs that upstream hasn't addressed. This has several negative effects.

  • When a security issue appears, it becomes harder to fix the application bundling the library. If you attempt to upgrade to a newer version, you have to make sure your important local modifications get ported to the new version. If you attempt to backport, you have to merge the upstream fix into your own code-base, where it may conflict with the local modifications.
  • When working with the library that comes from upstream, there is a community of people interested in that library to fall back on for help. When working on your own private copy, that community may not be interested in helping you, since they have neither control over nor knowledge of what your modified sources do.
  • Forking dilutes one of the strengths of open-source development. Instead of a project getting stronger with more people supplying patches to help drive the project and build a bigger community, the community of people interested in it is splintering, developing more and more divergent code-bases, and solving the same problem over and over in different ways in different private copies of the library. Instead of everyone benefiting, everyone has to pay.

Bugfixes

Bugfixes are usually of lesser importance than security issues, but they raise the same concern: a bundled copy hangs on to problems that have already been fixed in the main package.

Old Code

  • Old versions of code linger on. If the application can bundle its own version of a library, the incentive to port to newer versions of the library is reduced. This exacerbates the problems of security and bugfix issues. Instead of progressively porting to newer versions of a library as time goes on, porting to newer versions becomes a chore that has to be performed at the same time as addressing a security flaw. This puts time pressure on the project when the work could have been spread out over a longer period if only the porting had been done all along.

Licensing

Although licensing issues can crop up in any project, projects which bundle code from different sources together are a special source of concern. They make auditing for license issues a larger project.

When a Bundled Library is Discovered

Bundling of libraries is a serious problem. If a package in the distribution is discovered to have bundled libraries, we need to take care of it. First, open a bug report against the package. Then add the bug to the Duplicate libraries tracker. Once that's done, if help is needed fixing the bug, ask on the mailing list. Maintainers must leave the bug open until the situation is resolved. If a patch is supplied, the maintainer needs to evaluate it and actively work to apply it to the package. Note that unbundling is one of several areas in which we will locally patch to override upstreams.

Standard questions

You should consider these standard questions when you encounter a bundled library.

  • Has the library behaviour been modified? If so, how has it been modified? If the library has been modified in ways that change the API or behaviour, then there may be a case for copying. Note that fixing bugs is not grounds to copy. If the library has not been modified (i.e., it can be used verbatim in the distro), there's little reason to bundle.
    • Why haven't the changes been pushed to the upstream library? If no attempt has been made to push the changes upstream, we shouldn't be supporting people forking out of laziness.
    • Have the changes been proposed to the Fedora package maintainer for the library? In some cases it may make sense for our package to take the changes despite upstream not taking them (for instance, if upstream for the library is dead).
  • Could we make the forked version the canonical version within Fedora? For instance, if upstream for the library is dead, is the upstream of the bundling package willing to release its fork as a library that others can link against?
  • Are the changes useful to consumers other than the bundling application? If so, why aren't we proposing that the library be released as a fork of the upstream library?
  • Is upstream keeping the base library updated or are they continuously one or more versions behind the latest upstream release?
  • What is the attitude of upstream towards bundling? (Are they eager to remove the bundled version? Are they engaged with the upstream for the library? Do they have a history of bundling? Are they argumentative?)
  • What are the security ramifications of bundling? Provide an overview.
  • Does the maintainer of the Fedora package of the library being bundled have any comments about this?
  • Is there a plan for unbundling the library at a later time? Include things like what features would need to be added to the upstream library, a timeline for when those features would be merged, how we're helping to meet those goals, etc.

Treatment of Bundled Libraries

Here are the steps to be followed when a package bundles a library but can be built against the system version:

  • Bundled libraries (and/or their source code) must be explicitly deleted during %prep. Build scripts may need to be patched to deal with this situation. Whenever possible, the patching should be done in a way to conditionalize use of the bundled libraries, so that it can be sent upstream for consideration.
  • It is not necessary to remove bundled libraries from the source tarball unless there is a legal reason to do so.
  • Bundled libraries must NEVER end up in a package, even if they are not used.
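
The steps above might look like the following spec fragment. This is only a minimal sketch: the directory name zlib/ and the --with-system-zlib configure switch are assumptions about a hypothetical upstream build system, and real packages will often need a patch to the build scripts instead.

```spec
# Build against the system copy instead of the bundled one
BuildRequires:  zlib-devel

%prep
%setup -q
# Delete the bundled copy (hypothetical path) so it cannot be compiled in by accident
rm -rf zlib/

%build
# Hypothetical upstream switch that selects the system library
%configure --with-system-zlib
%make_build
```

Deleting the directory in %prep, rather than merely not using it, makes the build fail loudly if some part of the build system still refers to the bundled copy.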

Acceptable bundling

This section lists some cases where unbundling doesn't make sense or where its benefits do not justify its cost.

Kernel

The kernel is allowed to bundle libraries as it cannot use user-space libraries. Additionally, the kernel has a unique exemption from the requirement to note what code is being bundled with comments and virtual Provides. There are several reasons for this:

  • The code bundled by the kernel is stripped down to the bare minimum and sometimes modified in other ways so that it is significantly different from the upstream code.
  • The kernel code is scrutinized for security issues at a level that often exceeds that of the upstreams whose code it bundles.
  • Due to the modifications to the code that is bundled into the kernel, once the code is included it evolves separately from upstream.
  • Many people working on the kernel, within Fedora, Red Hat, and the community at large, are capable of and concerned with coding fixes for security issues.

The kernel maintainers are free to add virtual provides if they think it would be helpful to track security issues in the code in question but this is left to their discretion.

Note: Some things this case doesn't cover
This does not apply to the user-space tools that are built from the kernel srpm, nor does it cover external kernel modules. Fedora does not package external kernel modules, but some third parties use our guidelines. Whether a module is unable to use a user-space library should be one factor in evaluating third-party kernel module bundling, but the ability of upstream to fix the code that they're bundling should also be taken into account, even if that means some modules are prohibited.

Copylibs

The definition of a copylib is somewhat amorphous. At its most basic, the upstream for the library intends for you to copy the source code of the library into your program, modify it to suit your needs, and then release your software with continuous, forked modifications to that source. Just because you think you're dealing with a copylib does not make it automatically acceptable to bundle. In particular, the practice, common in some Java, Mono, and scripting language circles, of copying external libraries that come from a separate upstream into the program's source and distributing them together is not recommended. Bundling libraries whose upstream is dead and making bugfixes to the bundled copy is not recommended, either. As much as possible, we want to have a single copy of a library in the distribution which everyone links to, so it's still recommended to package the bundled copy with bugfixes separately (ideally, asking the new upstream to fork officially).

Some of the criteria to evaluate the copylib case are:

  • Does the upstream library make actual releases? If they do, then it is likely not a copylib.
  • Does upstream define what they put together as a library or as reusable code snippets that are to be modified and incorporated as source in individual packages? If the latter, it's more likely that the library is a copylib under this definition.

Bootstrapping

Packages which depend upon bundled libraries in order to bootstrap a build may retain them for use when the package is built in bootstrapping mode. When the package is built in a normal mode, the normal guidelines apply.

Packages which are built in such a bootstrapping mode must not be tagged for a final release (or pushed as an update for any stable release). FPC will track the progress of approved bootstrapping exceptions via the ticket requesting the bootstrap bundling exception.
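
One way to arrange this in a spec file is a build conditional. The sketch below assumes a hypothetical library libbar, bundled under third_party/libbar/ at version 1.0; none of these names come from a real package. %bcond_with makes bootstrap mode opt-in, so a normal build removes the bundled copy and uses the system package.

```spec
# Bootstrap mode is off by default; enable with: rpmbuild --with bootstrap
%bcond_with bootstrap

%if %{with bootstrap}
# Bootstrap build: the bundled copy (hypothetical name and version) is used
Provides:       bundled(libbar) = 1.0
%else
# Normal build: use the system library
BuildRequires:  libbar-devel
%endif

%prep
%setup -q
%if %{without bootstrap}
# Normal build: delete the bundled copy so the usual guidelines are met
rm -rf third_party/libbar/
%endif
```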

Conditionalized functionality

Packages which bundle specific subsets of third-party source code with the sole purpose of providing functionality that is not available in the system copy, and explicitly conditionalize that use in such a way that if the system copy provides that functionality, the bundled source code is not used, are exempt from the requirement to delete the bundled source code during %prep. Packages in this exception case MUST document this situation in a specfile comment, and verify that the functionality is properly conditionalized with each update.
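
The specfile comment required here could look something like the following. This is an illustrative sketch only: libfoo, its foo_bar() function, the path, and the version are all hypothetical placeholders.

```spec
# This package bundles a subset of libfoo (src/compat/foo_bar.c, hypothetical path).
# Upstream's build system compiles it only when the system libfoo does not
# provide foo_bar(); otherwise the system copy is used and this code is ignored.
# The bundled source is therefore not deleted in the prep section.
# Re-verify that this conditional still works on every update.
BuildRequires:  libfoo-devel
Provides:       bundled(libfoo) = 1.2.3
```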

Needing unreleased features

When an application needs unreleased features of a library, and that library has committed to those features (usually, the changes are checked into the trunk branch of the upstream's revision control system) but has not yet made a release containing them, it might be acceptable to bundle that library until the Fedora packages contain the necessary features. For this case you should definitely work through the standard questions. In particular, explain why the Fedora maintainer of the library feels we cannot include a backport of the feature or a pre-release snapshot in our package, and give the timeline for the change to be merged and for unbundling to occur, so that we can make sure this gets fixed should the package maintainer disappear or get busy with other things.

Note: Not for bugfixes
As noted earlier, bugfixes are rarely a good reason for this. Most of the time bugfixes should be backported to the current Fedora package rather than bundling a library.

Modified beyond a certain extent

Modification of a library should not be the only reason given to justify a bundled copy, because two questions then arise: why can't these changes go back to the upstream for the library? Why isn't this library forked and released in such a way that others can benefit from the changes as well?

  • Example: recoll bundles unac, but recoll changed the API of unac. Those changes were judged to be of use only to recoll, and thus the bundling was allowed.
  • Counterexample: rsync bundles zlib. However, the modified zlib is useful to others, as it is necessary in order to implement the rsync protocol. In particular, the program zsync needs a similarly modified zlib in order to be of use.

Reverse Bundling

Reverse bundling is when a portion of an upstream codebase has been forked into its own, separate library, which is then packaged separately for Fedora.

When done for purposes of adding a backwards compat API for use by other packages, this is okay (for instance, taking a module out of a newer version of the python stdlib and packaging it as a separate python module for use on older versions of python). Be sure to keep the forked code up to date with regard to the package it comes from. Apply to the FPC for a virtual Provide to use for tracking purposes.

Note: Only Applies To Libraries
Do not apply this case to code that is copied from another package and included as part of a larger application. This is only for code that is copied from another package into its own, separate library for the purposes of providing that API to other consumers.

When done for other purposes (for instance, splitting a useful utility function from a large monolithic upstream into its own package) this may be a factor in deciding if the bundling is acceptable but it may not be sufficient on its own.

Active upstream Security Team

  • Project is actively developed and has a responsive upstream, with new releases occurring at least yearly. Rationale: a) if a security issue does arise, we don't want to be left on our own; b) where projects have bundled code but are not fast-moving, the reward/work ratio of unbundling the code is higher.

AND

  • Project has an active security response team of its own and has demonstrated both the ability and the will to release timely security updates when issues are discovered in bundled code. Rationale: this reduces the burden on our security team, and does not put Fedora maintainers in the position of creating or carrying our own patches.

AND

  • The upstream project is actively working on unbundling.

Note that the criterion that upstream be working on unbundling means that you should periodically check that progress has been made.

Requirement if you bundle

  • If you bundle a library, you are required to add a virtual provide to your spec file to note that you are bundling. This allows us to search for packages that may be affected by bugs or security issues in older versions of the library. The notation should look like this:
Provides: bundled(zlib) = 1.1.14

bundled() denotes that this is a bundled library virtual provide rather than something that other packages would want to depend on. Inside the parentheses, list the binary package that provides the library (for instance, zlib, bind-libs, NetworkManager-glib, or libpng). The version notes which version of the library was bundled. If there has been a lot of incomplete backporting of changes from newer versions of the library, it can be hard to establish what version to use here. A very general rule of thumb is to use the oldest version that seems reasonable, because the purpose is to tell when a bundled library contains issues that have been fixed in newer upstream versions.

A list of known virtual provides for bundled libraries is maintained on a separate page.

Warning: No package should ever Require: a bundled() virtual dependency.

Note: Kernel Exemption
The kernel has an exemption from these requirements. See the Kernel section for full information.

Other distributions

As this is a place where we have to convince upstream that there's a problem, it's good to be able to point out that this is a problem for all distributions, not just Fedora. Here are links to other distributions' policies: