What is a Focus Document?
The Factory 2.0 team produces a confusing number of documents. The first round was about the Problem Statements we were trying to solve. Let’s retroactively call them Problem Documents. The Focus Documents (like this one) focus on some system or some aspect of our solutions that cut across different problems. The content here doesn’t fit cleanly in one problem statement document, which is why we broke it out.
Background on PkgDB
PkgDB is the central repository for information on all packages in Fedora. Its primary job is to manage the ACLs of a package’s dist-git branches, but it’s also an interface to request new packages in Fedora and view metadata about packages such as a summary of its purpose and external links to a package’s builds, updates, and source. It is a central part of Fedora’s workflow today.
PkgDB in general works well for the current workflow of a package’s lifecycle. Currently, a package has several branches that are tied to Fedora releases such as “f24” or “f25”. These branches have implied service level agreements (SLAs) and end of life (EOL) dates based on the Fedora release itself. There are also additional branches for Extra Packages For Enterprise Linux (EPEL), which implicitly share the EOL of its RHEL/CentOS counterpart. The SLA on EPEL branches are strict as they cannot contain any breaking changes; therefore, a package’s EPEL branches are only updated with bug fixes, security fixes, and features which do not alter the current behavior.
Although the implied SLAs and EOLs in these branches have worked well for Fedora in the past, it’s becoming increasingly difficult to juggle these different application lifecycles and their dependencies’ lifecycles under the umbrella of these limited number of SLAs and EOLs, especially when trying to keep up with upstream. Please read about the Fedora Modularity project for more argument on this position.
Why Make This Change?
Factory 2.0 is an enabler for the Modularity project and that project will change how a package’s SLA and EOL are defined. Since in a modular operating system (OS) a single package is no longer tied to the entire OS distribution, module packagers can now specify which specific packages should be included in the module. This allows for different branches with different SLAs and EOLs. For instance, you may have a bug and security fix only branch for a specific version of a package, but you may have another branch that is the same as the upstream master branch for the package. This added flexibility can allow for a package’s SLA and EOL to be defined to fulfill the needs of the module rather than just the OS release.
To further clarify this point, there are some graphics from Ralph Bean’s Factory 2.0 presentation from DevConf 2017 which show the proposed changes in branching. Below are two of those graphics; the first illustrates the current branching strategy and the one below illustrates the new branching strategy that will come with Modularity.
To further illustrate, below is a graphic of two versions of a django module and how it utilizes this new branching strategy.
- The 1.9 django module utilizes the 2.12 python-requests branch and the 1.9 python-django branch. The 2.12 python-requests branch in this case would most likely only be updated with bug fix releases (i.e. 2.12.x). The same would apply for the 1.9 python-django branch.
- On the other hand, the 1.10 django module utilizes the master branch of python-requests which means that whenever the module is built, it’ll just be using the latest packaged version available. This introduces a potential risk as their could in theory be breaking changes to python-requests, but it could be deemed that whatever benefits that the latest upstream provides are worth the risk of potential incompatibilities in the future.
These are all decisions that module packagers will make and they should be based on the SLA and EOL they specify for their module. Now whether the module packager made the correct decisions is out of the scope of this document and best practices around this should be written by the Modularity team. One thing that is not clear in this graphic is that although these branches are named after versions and it makes logical sense, they do not have to be and could be named anything that the packager of the RPM would like; therefore, modules should choose RPMs because of their EOLs and SLAs, not because of the name.
How To Make This Change
Since PkgDB is tied to the workflow of one branch per OS distribution, it needs to be heavily modified or entirely replaced to allow a packager to arbitrarily specify branch names for their package. Since these branches are no longer tied to the lifecycle of the Fedora release, they will need to have SLAs and EOLs defined by the packager for the given package. The SLAs should be defined through a variety machine readable parameters. The specific parameters will be decided later on and this should be decided upon by the Fedora community before the release of these changes. The Factory 2.0 team thought about modifying PkgDB but Pierre-Yves Chibon/pingou (the author of PkgDB) noted that with Pagure being deployed over dist-git in Fedora (as of this writing it is deployed in staging), it, alongside of PDC, could be a viable option to replace PkgDB.
With Pagure over dist-git and PDC (both with modifications), we could do away with PkgDB entirely for the new branching strategy and retrofit it for older branches (EPEL6/7 and F24/25/26) that we must support. This change would mean that ACLs would no longer be tied to branches but instead to the repository itself. One could either allow any packager with repository commit access to push to any branch, or not allow packagers to push to any branches and instead, only submit pull-requests which can then be merged by maintainers of the repository. To gain rights on the repository, you’d have to be given commit access the specific package repository.
There are other use cases for PkgDB other than ACL management. For instance, PkgDB is responsible for the implicit EOL and SLA information tied to the Fedora release branches (e.g f25), however, this new EOL and SLA information needs to be explicit and stored somewhere that is accessible programmatically. There are two approaches to solve this problem that have come up in discussions.
- The first option is to create a Product Definition Center (PDC) endpoint to contain the EOL and SLA information. These entries would most likely be requested through a ticket and upon approval, the entry in PDC would be created. Pagure would restrict the creation of new branches to those that have an EOL and SLA defined in PDC.
- The second option is to keep a separate git repository with a YAML file containing the EOL and SLA information of the package branch. This YAML file cannot be in the same repository as the package because it will then never be able to share the same codebase as another branch, which is a requirement for some packagers.
We decided that option one is the right approach, and here are video demos showing that functionality:
PkgDB also has an “Admin Actions” section which is basically a ticketing system that allows new packages to be requested and approved. This could be replaced with a Release Engineering (releng) controlled Pagure repository just for tickets. CLI tools will be provided that make and process these tickets. This method also opens the door for some of these tickets to be automatically processed by a microservice that listens for new ticket messages. This microservice is beyond the scope of this project, but it is worth mentioning as a future enhancement.
PkgDB also keeps track of orphaned packages which are just packages with no owners. Ideally this functionality would be replaced with a query to Pagure or PDC. Querying Pagure would just be an API call asking for all projects that are owned by the "orphan" user. Another feature of PkgDB is that it keeps track of retired packages. Retired packages are currently distinguished by a file named “DEAD.package” in their repository. We would continue to do this, but would also mark all of the package's branches as "not active" in PDC to allow this to be queryable programatically. PkgDB is also a source for determining the default assignee and carbon copy (CC) list for new Bugzilla bugs. The default assignee of a bug would likely be the owner of the Pagure repository, but this point is up for discussion. As to the second point, a new feature is being added to Pagure to allow a user to watch only issues and PRs, only commits, or both on a project (PR #2255). The users that are at least watching the issues in the Pagure repository would then be CC'd in Bugzilla bugs. PkgDB is also a source for generating “PackageNamefirstname.lastname@example.org” email aliases. The members to these aliases would be determined by the list of users that have commit access on a package repository in Pagure. PkgDB also determines who is allowed to edit Bodhi updates. This could be determined again by querying for the list of users with commit access in Pagure.
The drawback to these Pagure queries is that the queries could be relatively slow and would use up a lot of resources on the Pagure server(s). An idea to alleviate this is to programmatically query Pagure when package ownership changes or a package is retired. The results would then be stored in a JSON file that is accessible through the proxies or available in PDC instead. Further discussion on this topic is needed.
How This Affects RCM (WIP)
- Need a way to enforce that a module’s SLA and EOL is not greater than the lowest common denominator of it’s components
- “cvsadmins” currently process new package requests with the “pkgdb-admin” CLI tool. This will need to be done in Pagure with a new tool. We got the approval from limb on this.
- When package branches go EOL (on their own terms), RCM will need tooling to make the retirement of those branches happen. Things such as sending emails to the relevant people will need to happen.
- Does the mass branch process go away or do we need a new equivalent version of that process for modules?
Additional Tooling Changes
- PkgDB is often queried to determine what the latest active releases are. This data will need to be stored in PDC instead.
- Zodbot queries PkgDB to figure out the latest release for cookies. It will need to query PDC instead.
- Bodhi queries PkgDB to find the list of critpath packages in a collection. This will need to use PDC instead.
- Not really related to PkgDB, but to branch names: How will the branches be tied to Koji? Currently
fedpkguses the branch name to generate the correct dist tag, and checks if the build already exists based on the generated NVR. The mapping from branch names to dist tags in the tools would have to be updated.
- Internally dist-git is using branch names to select a commit policy (checking that commits reference bugs with correct flags). Are the arbitrary branches going to have no policy associated? Anyway there will likely be some work needed to update the hook.
General Work Plan
Below is an itemized list of big tasks that roughly need to happen in order. At what point do we get to decommission pkgdb? What subprojects are blocked on what other subprojects?
- (done at this publication) Have the Modularity team and/or Fedora Community devise a machine readable scale for SLAs with a series of parameters that include: bugs will be fixed, security issues will be fixed, how fast CVEs will be addressed, non-breaking features will be added, breaking changes will be added. These parameters will either be boolean values or time durations.
- (done at this publication) Add a PDC endpoint to allow storing the new EOL and SLA information (include ability to retire packages)
- Add the following features to Pagure
- (done at this publication) Query for repositories owners/admins
- Need a git hook to deny creating new branches via git if they don’t also appear in PDC. This needs to apply only to mainline dist-git branches. The “forked” dist-git branches under your personal username should allow arbitrary branches.
- Implement a strategy for retired and manually decommissioning branches
- Release a new version of Pagure that sits on top of dist-git
- Offload ticketing of new package requests to pagure.io/SOMEPROJECT
- Use a Pagure repo for this on https://pagure.io. Ask releng and “cvsadmins” if they want to use pagure.io/releng or if they want a brand new repo just for new package requests.
- Enforce that all tickets have an EOL and SLA for the specified package branch
- Automate the workflow of package requests from the chosen ticketing system
- Update the script to generate the “PackageNameemail@example.com” mail aliases from the list of package owners to use Pagure
- Restrict who can edit Bodhi updates based on the list of owners in Pagure
- Modify the bz sync script to set the owner/default cc list in Bugzilla based on pagure data
- Modify the Koji sync script that sets the packagelist owner in Koji based on Pagure data
- Lastly, decommission PkgDB