This page is a draft only
It is still under construction and content may change. Do not rely on the information on this page.

Problem Description

Package management in Fedora is currently focused on system level packaging. This is a very "operations" oriented view of the world, as it emphasises the ability to reproduce systems exactly, while still being able to delegate the task of monitoring for and responding to CVEs to the platform provider (rather than having to track dependencies for security updates directly).

However, for many application developers and data analysts, being able to bring in new dependencies quickly and easily to solve problems is an essential requirement. It's also essential to be able to install a dependency into a local testing or analysis environment *without* affecting the operation of the system itself in any way.

Unfortunately, the relative lack of support for this model at the platform level in Fedora and other Linux distributions has resulted in these communities routing around platform level package management systems, placing significant barriers in the way of effective collaboration between developers, analysts and system administrators.

Solution: endorse layered architectures

The model that has emerged to effectively manage these conflicting requirements is to separate user & developer level dependency management tools from the system level dependency management tools used to create an integrated platform.

The base platform, its dependency management system, and the packages it contains then become primarily the responsibility of system administrators. Security, stability, reliability, & compatibility are the focus at this layer. In a Fedora context, this perspective is best represented by the Base, Server and Cloud WGs (together with the CentOS and EPEL communities).

The Environments & Stacks WG and the individual language SIGs then primarily represent the perspective of application developers and data analysts looking to work on top of that stable foundation. This also includes taking into account that many developers and analysts are writing deliberately cross platform software, and even developers and analysts exclusively targeting Fedora/RHEL/CentOS production systems may prefer (or be required) to use other operating systems on their local systems (whether that's other Linux distributions, Mac OS X, or even Windows).

The Workstation WG takes this work in the other WGs and brings it all together to provide a more cohesive developer experience for the Fedora/RHEL/CentOS ecosystem.

Audiences to be considered

There are two main audiences to be considered when looking at user level package management in a software development context. The first group is software developers (the same target audience as the Workstation WG). For these users, the priority is to have access to the standard development environment management tools for their ecosystem. For example, Maven for Java developers, pip for Python developers, gems for Ruby developers, npm for Node.js developers. Many of these users will be comfortable with consuming software directly from the upstream community, while others would prefer to be able to delegate the initial level of review to the wider Fedora community.

The second group can be usefully categorised as "data analysts" (even though the actual category is much broader than that, encompassing mathematicians, engineers, scientists, and more). These are folks that are programming not because they want to publish applications or services for others to use, but because they want to *automate their own work*. Compared to professional developers, these folks are far more likely to be making use of components written in a variety of different languages, including Python, Julia, R, FORTRAN, MATLAB/Octave, Scala, etc.

Directions to be Explored

Two primary areas of exploration have been identified for user level package management:

  • Embracing the existing practice of using language ecosystem specific tooling to deploy applications to Linux systems
  • Potentially recommending particular language independent tooling for more complex dependency management scenarios

Embracing ecosystem specific tooling (near term)

Fully embracing language specific tooling is likely to be the best way to reach the professional developer audience. ~bkabrda has started setting up a [devpi] instance as a proof of concept for running a filtered mirror of an upstream package repository that only includes packages that have been reviewed and determined to at least meet Fedora's licensing guidelines and to not be obviously malicious.
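
As a rough illustration of the developer experience this aims to preserve, the sketch below installs a package from a curated index instead of the default upstream one. The index URL is purely a placeholder rather than an actual Fedora service, and pip is just one example of the ecosystem specific tools involved.

    # Sketch only: install from a curated index rather than upstream PyPI.
    # The index URL is a placeholder, not a real Fedora service.
    import subprocess

    CURATED_INDEX = "https://packages.example.fedoraproject.org/curated/+simple/"

    subprocess.run(
        ["python", "-m", "pip", "install", "--index-url", CURATED_INDEX, "requests"],
        check=True,
    )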

However, given the wide range of ecosystem specific packaging tools out there, it seems unlikely that this approach will scale in a sustainable way. To provide a more consistent management interface, it may be better to instead adopt a plugin-based solution like [Pulp]. Adding a new language to the "endorsed" list would then be a matter of writing a suitable Pulp plugin and integrating it with the build system, rather than setting up yet another completely new repo management service within Fedora's infrastructure.

Many language ecosystems don't support the notion of redistribution at all. In these cases, republishing under the same name from a Fedora specific repo would be problematic. The likely near term approach to these communities will be to just place them in the "too hard" basket, and focus on ecosystems where the tools are in place to handle curated redistribution without conflict.

For more details, refer to Env_and_Stacks/Projects/LanguageSpecificRepositories.

Recommending language independent tooling (longer term)

Even if we manage to integrate language specific tooling into the review and build toolchains, it's unlikely it will be feasible to integrate every such toolset into the system management utilities. The language specific tools also don't do a good job of managing arbitrary external dependencies with associated ABI compatibility requirements.

As such, it's worth considering the possibility of recommending a particular user level dependency management toolchain that allows software to be installed on a per-user basis, rather than as part of the integrated OS platform.

Like repackaging as RPMs for system level dependencies, the use of language independent tooling also makes it possible to handle ecosystems that don't include native support for redistribution by a system integrator.

There are three main possibilities that come to mind on that front:

Nix has many attractive properties (including support for custom build environments and per-user package installations without root access), but has the limitation of being POSIX specific. This makes it significantly less interesting to upstream communities that also encompass Windows based developers and data analysts.

By contrast, conda was created by Anaconda, Inc (formerly Continuum Analytics) specifically to tackle the problem of allowing scientists and data analysts to easily install the full Python analytical stack, including external C, C++ and FORTRAN dependencies.

Unlike nix and conda, which both rely on the conventional "source archive -> built artifact" model, conary blends more [software version control concepts] into its design. This makes it a potentially very good fit for the way software developers already think about their code, as well as providing a smoother transition between the worlds of software development and operational maintenance of production services.

Some initial research into these options was undertaken over the course of 2015, and there have been some [very preliminary investigations] into the feasibility of creating a Pulp plugin to host conda packages (given that conda includes a command to generate the relevant index metadata, this seems like it would be quite a plausible approach). There have also been some similarly preliminary discussions between the Pulp and Nix development communities regarding the possibility of a pulp-nix plugin.
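
As a rough sketch of why that approach looks plausible, the following shows the indexing step that conda-build provides, followed by an install from the resulting channel. The channel path and package name are placeholders, and the exact "conda index" invocation varies between conda-build versions; a service such as Pulp would host the indexed directory over HTTP rather than serving a local path.

    # Sketch only: generate conda channel metadata and install from the result.
    # Paths and package names are placeholders.
    import subprocess

    CHANNEL_DIR = "/srv/conda-channel"

    # Write the repodata.json index files for packages already in the channel
    subprocess.run(["conda", "index", CHANNEL_DIR], check=True)

    # Install a package from that channel rather than the default channels
    subprocess.run(
        ["conda", "install", "--yes", "--channel", "file://" + CHANNEL_DIR,
         "example-package"],
        check=True,
    )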

However, further investigation into any of these possibilities has been deferred until after the Fedora Modular Server design and development work has been completed.

Other Considerations

Software Collections

Software collections represent a hybrid model for layered architectures, where platform components are selectively upgraded within the context of the existing system level package management system.

This is a useful model, as it automatically integrates with all existing system level auditing tools. However, it is aimed primarily at folks that are already using system level packaging tools, and would like to selectively upgrade particular components. It is less interesting in the context of being able to provide consistent cross-platform dependency management instructions.

From the perspective of user level package management, a software collection becomes just another environment to target, just like targeting the default system environments directly.

Linux Containers and Docker

In a Docker context, the work of the Environments & Stacks WG largely applies to the way container images are built. From a development perspective, containers don't actually change all that much relative to any other system for dependency management - they simply shift the complexity from deployment time to image build time.

Where Docker helps dramatically is in managing the division of responsibility between folks with an operations focus, who can specialise in providing base images and the infrastructure to run containers, and those with a development focus, who can focus on the creation and deployment of full application containers.

User Level Packaging Tools

These are some preliminary notes on the different user level packaging tools being considered.

Why a different packaging technology?

It would technically be feasible to support "user level installation" directly from RPMs. From a user experience perspective, however, "system level installation" and "user level installation" serve quite different purposes. While it's an oversimplification, the distinction can be usefully summarised as "system packages are for system administrators, user packages are for developers and data analysts".

Using a different technology thus makes it possible to choose something specifically optimised to meet the needs of developers and data analysts across a range of scenarios, without increasing the complexity of the system administrator focused dnf/rpm experience.

In a container context, the split also makes it clear whether a package is considered ready for use on the host or in a super privileged container (i.e. it's available as an RPM), or whether it is only recommended for use in the development of pre-integrated services.

Options briefly explored

This section lists additional options that weren't covered in the first draft of the document but should be considered for future drafts:

  • Spack (Lawrence Livermore National Laboratory)
    • Mostly targets POSIX HPC environments, so Windows support is likely weak
    • Mostly targets HPC use cases, so focus is on source builds optimised for particular hardware, rather than pre-built binaries usable across a broad set of hardware
    • Does provide build caching support, but expects those caches to be managed on a per-deployment-target basis
    • Initial verdict: decent choice for lab environment managers, not a great default choice for Fedora

Technologies under investigation

As noted above, the language independent user level packaging tools being investigated are:

  • Nix
  • Conda
  • Conary

Key questions to be asked/answered

To help focus discussions at Flock 2015, we wanted to be able to answer the following questions for each of these technologies:

  • how is it used by end users?
  • how can I use it on my current Fedora machine?
  • what are its benefits and shortcomings for users compared to rpm/dnf?
  • would depending on this technology restrict a given project's ability to attract contributors running other operating systems?
  • what does it take to create and maintain a package?
  • what's involved in converting an existing rpm to the new format?
  • what are its benefits and shortcomings for package maintainers compared to rpm?
  • what are the benefits to Fedora in adopting this new technology?
  • what does the new format require in terms of additional Fedora infrastructure support?
  • what other costs are there to Fedora in adopting this new technology?
  • what is the focus of the existing community around this technology?
  • what are the major technical risks/concerns with this approach?

Nix

  • Actively investigating: ???
  • Previously explored: Nick Coghlan (stopped investigating due to lack of Windows support and problems with low impact security updates)

Nix can be used for both system level package management (NixOS) and user level package management. For user level package management, system administrators must still initially install the Nix package manager as there is a globally shared cache of packages, with user specific views into that cache.

Users are then able to install packages into their own per-user profiles without needing root access, with the shared cache ensuring that identical packages are only stored once.
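
A minimal sketch of that per-user workflow is shown below; the package attribute names are illustrative and assume the default "nixpkgs" channel name.

    # Sketch only: per-user Nix package management once the shared store exists.
    # Attribute names assume the default "nixpkgs" channel.
    import subprocess

    # Install GNU hello into the current user's profile (no root required)
    subprocess.run(["nix-env", "-iA", "nixpkgs.hello"], check=True)

    # List what this user's profile currently contains
    subprocess.run(["nix-env", "--query", "--installed"], check=True)

    # Roll the profile back to its previous generation if needed
    subprocess.run(["nix-env", "--rollback"], check=True)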

Relying on Nix may create barriers to cross-platform collaboration on a project, as it does not offer native Windows support.

The major focus of the community around this technology is mathematically exact reproducibility of system configurations.

The major technical risk/concern around this technology is the feasibility (or lack thereof) of providing low impact security updates to components low in the technology stack. While there is [some support] for the concept, it's clearly an addon feature - since Nix systems are immutable by design, updating components in place without rebuilding other components that depend on them needs to work around that essential immutability.

Other answers TBD

Conda

  • Investigated by: Nick Coghlan (stopped using conda personally since F22 caught up on the packages I want, but still consider it an excellent cross-platform option)

Conda is aimed specifically at handling user level package management. While it can handle installation and update of language runtimes as readily as it can handle installation and updates of individual runtime components, it isn't designed to support installation or update of even lower level components like operating system kernels. One scenario it is specifically designed to support is cases where users do *not* have local administrator access on their systems.
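
A minimal sketch of that user level workflow might look like the following; the environment and package names are illustrative.

    # Sketch only: a user level conda workflow. Everything lands under the
    # user's home directory, so no administrator access is required.
    import subprocess

    # Create an isolated environment with a chosen Python and analysis stack
    subprocess.run(
        ["conda", "create", "--yes", "--name", "analysis",
         "python=3.6", "numpy", "pandas"],
        check=True,
    )

    # Later, add another package to that same environment
    subprocess.run(
        ["conda", "install", "--yes", "--name", "analysis", "scipy"],
        check=True,
    )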

Relying on conda should not raise any barriers to cross-platform collaboration on a project, as it runs entirely in user space across Windows, Mac OS X and Linux.

The major focus of the community around this technology is data analysis using open source tools, as it was born out of the Scientific Python community. The related Python distribution, Anaconda, is also the distribution of choice for the Software Carpentry computational analysis workshops.

The major technical risk/concern around this technology is that it was developed primarily for a data analysis focused audience, which means there aren't a lot of existing examples of it being used effectively for other, more application and service development oriented use cases. There's also a potential problem around the fact that the original creators of conda, Continuum Analytics, publish a Python distribution under the name "Anaconda", conflicting with the name of the Fedora installer, and that a number of the conda recipes used to create the packages included in the Anaconda distribution are not yet open source (this likely isn't a big deal, as many of those packages are also packaged for Fedora, and even for those that aren't, Continuum generally just redistributes upstream packages unmodified).

Other answers TBD

Nick Coghlan's personal perspective: I consider conda my "default option" - I've used it myself, so I know it basically works, the packaging model is essentially consistent with that of RPM (albeit with an external build script rather than inline shell snippets), and it's expressly designed for user level package management without any aspirations for use in managing an entire Linux distro. The problem we have is specifically the problem Continuum Analytics built conda to solve. We might not want to adopt conda *itself*, but rather create a downstream derivative that uses hawkey for constraint resolution rather than pycosat; even so, I consider the conda user experience to be the baseline for what I'd like to achieve with Fedora's user level package management - any potential problems we might encounter with it would be backend engineering related, as we ensured it met our security expectations.

Resource links

Conary

  • Actively investigating: Nick Coghlan, ???

Example being investigated: running Kallithea locally (but still in the user's home directory) using Conary rather than upstream pip

Bootstrapping a conary environment is currently proving to be a "DIY adventure" as all the available documentation effectively assumes the use of either rPath Linux or Foresight Linux, which both also used conary as the system package manager. The docs also tend to assume you want a full conary+rMake+rBuilder setup, and finding a "minimum useful local setup" is interesting. I made some progress on getting the integration test suite running locally on Fedora 22 here: https://github.com/sassoftware/conary/issues/6

The main fundamental technical risk with Conary is the shift to an active server model for the repository. This offers both advantages and disadvantages, but at the very least, significantly increases the complexity of getting up and running with the packaging process. Judicious use of containerisation will help with that, but decoupling and simplifying the bootstrapping process will still need to be accounted for (perhaps through development of a "Conary Build Environment" role for Rolekit?). It also hasn't been actively developed for the past couple of years, and thus would require porting to Python 3.

Resource links