From Fedora Project Wiki

Revision as of 21:00, 30 December 2010 by Amitshah (talk | contribs) (→‎Documentation: Add link to blog post)

Yum Metadata in Git

Summary

Store the yum metadata in a git repository to speed up downloads and minimise server resource usage.

Owner

  • Email: <your email address so we can contact you, invite you to meetings, etc.>

Current status

  • Targeted release: [[Releases/<number> | Fedora <number> ]]
  • Last updated: (DATE)
  • Percentage of completion: 0%


Detailed Description

Why not let git handle yum metadata for us? The metadata is a text (or sqlite) file that lists package names, their dependencies, version numbers and so on. Since git handles text very well, fetching metadata updates from a git server should be cheap. At install time (or upgrade time), the metadata git repository for a particular Fedora version is cloned once; on each subsequent update, all yum has to do is invoke 'git pull' to get the latest metadata. The expected download is a few KB each day instead of a few MB.
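
The clone-once, pull-afterwards flow described above could be sketched roughly as follows. This is only an illustration, not yum code: the function names, the repository URL and the checkout directory are all hypothetical, and the only git commands used are 'git clone' and 'git pull --ff-only'.

```python
import os
import subprocess


def metadata_sync_command(repo_url, checkout_dir):
    """Return the git command that brings checkout_dir up to date.

    repo_url and checkout_dir are hypothetical values chosen by the caller.
    """
    if os.path.isdir(os.path.join(checkout_dir, ".git")):
        # An earlier clone exists: only the new commits need to be fetched.
        return ["git", "-C", checkout_dir, "pull", "--ff-only"]
    # First use (install or upgrade time): fetch the whole repository once.
    return ["git", "clone", repo_url, checkout_dir]


def sync_metadata(repo_url, checkout_dir):
    """Run the chosen command; raises CalledProcessError if git fails."""
    subprocess.run(metadata_sync_command(repo_url, checkout_dir), check=True)
```

Keeping the command selection separate from the subprocess call means the decision can be checked without a network connection or a real git server.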


The advantages are numerous:

   * Saves server bandwidth
   * Uses very few server resources when using the git protocol
   * Scales really well
   * Compresses really well
   * Makes yum faster for users
         o I think this is the biggest win -- not having to wait ages for a 'yum search' to finish every day should get anyone interested.  Makes old-time Debian users like me very happy.

There are a couple of things to consider, though:

   * Should the yum metadata be served by just one canonical git server, while the packages get served by mirrors?  Not every mirror may have the git protocol enabled, nor can the Fedora project ask each mirror to configure git on its server.
         o Doing this can leave slow-syncing mirrors unable to service download requests for the packages listed in the latest metadata
         o This can be mitigated by using git over HTTP on the mirrors
   * The metadata can keep growing
         o This can be mitigated by having a separate git repository for the metadata belonging to each release.  Multiple git repos can be set up easily for extra repositories (e.g., for external repos or for multiple version repos while doing an upgrade).
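
One way to picture the one-repository-per-release idea is a separate local checkout for every (repository, release) pair. The sketch below is an assumption about layout only: the cache root '/var/cache/yum-git' and the function name are hypothetical and not part of yum.

```python
import posixpath


def metadata_checkout_dir(cache_root, repo_id, release):
    """Return the checkout path for one repository of one release.

    Each (repo_id, release) pair gets its own directory, so the history
    of one release never grows the checkout of another.
    """
    return posixpath.join(cache_root, repo_id, str(release))


# During an upgrade, the metadata for both releases can sit side by side.
checkouts = [
    metadata_checkout_dir("/var/cache/yum-git", "updates", release)
    for release in (14, 15)
]
```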


Benefit to Fedora

  • Much faster yum experience
  • Lower server resource usage

Scope

  • Needs interaction with yum developers, Fedora admins, and possibly mirror maintainers
  • Changes might be needed to ensure that when a mirror is tried for downloading metadata, a 'git pull' is attempted on the mirror first. There should be an http backend for each mirror in case the mirror hasn't enabled git support.
  • The mirror list should include git repo names that can be added and probed via 'git remote'
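
The per-mirror ordering described above (native git protocol first, git over HTTP second) might look something like this. It is a sketch under assumptions: the mirror host, the repository path and all function names are made up for illustration, and 'git ls-remote' is used here as one possible way to probe a remote, which the proposal itself does not specify.

```python
import subprocess


def candidate_urls(mirror_host, repo_path):
    """List the transports to try for one mirror, git protocol first."""
    return [
        f"git://{mirror_host}/{repo_path}",   # native git protocol
        f"http://{mirror_host}/{repo_path}",  # git over HTTP as the fallback
    ]


def first_reachable(urls, probe):
    """Return the first URL for which probe(url) is true, else None."""
    for url in urls:
        if probe(url):
            return url
    return None


def git_probe(url, timeout=10):
    """Return True if 'git ls-remote' can list the branches at url."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the mirror did not answer in time.
        return False
    return result.returncode == 0
```

A caller would pass git_probe as the probe argument; taking the probe as a parameter keeps the ordering logic testable without contacting any mirror.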

How To Test

  • Test with mirrors that use the git protocol to serve yum metadata and with mirrors that don't (yum should fall back to git over HTTP)
  • Test if upgrades go fine (git clone a new repo)
  • Test if multiple git repos work fine on one system (e.g., repo information for multiple releases)
  • Test if yum falls back to downloading the entire metadata file in case the git method is not available or git is not installed locally
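
The last fallback case above can be expressed as a small decision function, which also makes it easy to test. This is a hypothetical sketch, not yum's actual logic: the function name and the "git" / "full-download" labels are invented for the example.

```python
import shutil


def choose_fetch_method(git_url, git_installed=None):
    """Pick how to fetch metadata: "git" or "full-download".

    git_url is the reachable git remote for this repository, or None if
    no mirror offers one. git_installed defaults to checking the PATH.
    """
    if git_installed is None:
        git_installed = shutil.which("git") is not None
    if git_installed and git_url is not None:
        return "git"
    # Either git is missing locally or no git remote is available:
    # download the complete compressed metadata file as yum does today.
    return "full-download"
```

Passing git_installed explicitly lets a test simulate a system without git, instead of having to uninstall it.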

User Experience

Much faster yum metadata downloads. For an F14 install, the compressed updates metadata file is more than 4MB in size. This 4MB of data is downloaded each day (or at least when the metadata is refreshed, which is quite often). No user-visible changes are needed for this to work, though.

Git will have to become a dependency of yum, meaning git will have to be installed on each Fedora system to take advantage of this feature. In case that's undesirable, yum can fall back to the older behaviour of downloading the compressed metadata file.

Dependencies

  • The Fedora admins and yum developers need to be involved in pushing this through.

Contingency Plan

yum continues downloading the compressed metadata as it does today in case this feature isn't ready on time.

Documentation

Release Notes

  • yum now has the ability to use git to fetch metadata. This results in a much faster user experience as well as reduced server resource usage.

Comments and Discussion