From Fedora Project Wiki


THIS PAGE IS OBSOLETE

IntelligentMirror Proposal

InstantMirror currently works well, but it requires a lot of configuration, which may discourage people from using it. This project is proposed following the discussions on fedora-devel, Warren's journal and the InstantMirror Wiki. IntelligentMirror will require minimal configuration to set up. The IntelligentMirror design is described below.

About IntelligentMirror

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say, a software package) from the Internet, it is stored in a cache on a local machine on your network, and subsequent downloads of that package are served from this local cache. This makes efficient use of bandwidth and reduces the average download time. IntelligentMirror can also prefetch RPM packages from Fedora repositories spread all over the world, and can pre-populate the local repo with popular packages like mplayer, vlc and gstreamer, which are normally requested immediately after a fresh install.

IntelligentMirror Design

If deployed properly in an organization or a university, IntelligentMirror can save a significant amount of bandwidth that is otherwise wasted re-downloading the same updates. This design is proposed following a discussion in #yum on freenode and is heavily inspired by the InstantMirror project. It will be implemented during the coming summer.

The most obvious use case for IntelligentMirror is serving updates to Yum. IntelligentMirror will have a helper package that can prefetch packages from a remote repo into the local repo, so that the local repository is always up to date. The advantage of the helper package is that whenever a user requests a package, they get the latest version almost immediately. The helper package will be optional and off by default. IntelligentMirror itself will be a single daemon that waits for package requests and forks itself to handle multiple concurrent requests.
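The single forking daemon described above could look roughly like the following Python sketch, built on the standard library's socketserver.ForkingMixIn (POSIX only). The class names, the one-URL-per-connection wire format and the ACK reply are hypothetical placeholders; a real daemon would run the cache logic inside handle().

```python
import socketserver


class PackageRequestHandler(socketserver.StreamRequestHandler):
    """Handles one client connection inside a forked child process."""

    def handle(self) -> None:
        # Hypothetical wire format: the client sends one URL per connection.
        url = self.rfile.readline().strip().decode("utf-8", "replace")
        # Placeholder reply; a real daemon would apply the cache logic here.
        self.wfile.write(f"ACK {url}\n".encode("utf-8"))


class MirrorDaemon(socketserver.ForkingMixIn, socketserver.TCPServer):
    """A single listening daemon that forks a child per request (POSIX only)."""

    # Allow the daemon to be restarted quickly on the same port.
    allow_reuse_address = True


def run(host: str = "127.0.0.1", port: int = 8888) -> None:
    # Blocks forever: accept a connection, fork, let the child handle it.
    with MirrorDaemon((host, port), PackageRequestHandler) as server:
        server.serve_forever()
```

Calling run() blocks and serves forever. Because each accepted connection is handled in its own child process, a slow upstream download for one client does not stall requests from the others.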

Method of Operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if the URL is not for an RPM or a metadata file
    • Then it's none of our business.
    • Let proxy handle it the normal way.
    • Done and exit.
  3. Error Check: if the remote host is not reachable
    • Check: if RPM/metadata is available in cache
      1. Stream the RPM/metadata from cache.
      2. Done and exit.
    • else
      1. Throw a "No route to host" error.
      2. Done and exit.
  4. Check: if RPM/metadata is available in cache
    • Check: if RPM/metadata in cache is older than upstream
      1. Delete RPM/metadata from cache.
      2. Download and stream.
      3. Done and exit.
    • Check: if RPM/metadata matches upstream or is newer than upstream
      1. Stream the RPM/metadata from cache.
      2. Done and exit.
    • Check: if RPM/metadata does not exist upstream
      1. Delete RPM/metadata from cache.
      2. Throw a "Not found" error.
      3. Done and exit.
  5. Check: if RPM/metadata is not available in cache
    • Download and stream.
    • Done and exit.
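The decision flow above can be condensed into a pure function. This is a minimal Python sketch, not the project's implementation: it assumes freshness is judged by comparing modification times, that RPMs are recognised by a .rpm suffix and metadata by a /repodata/ path component, and the names Action, is_rpm_or_metadata and decide are hypothetical.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit


class Action(Enum):
    PASS_THROUGH = "let the proxy handle it the normal way"
    SERVE_FROM_CACHE = "stream the RPM/metadata from cache"
    REFRESH = "delete cached copy, then download and stream"
    DOWNLOAD = "download and stream"
    NO_ROUTE = "throw 'No route to host'"
    NOT_FOUND = "delete cached copy, then throw 'Not found'"


def is_rpm_or_metadata(url: str) -> bool:
    # Assumption: RPMs end in ".rpm" and Yum metadata lives under "/repodata/".
    path = urlsplit(url).path
    return path.endswith(".rpm") or "/repodata/" in path


def decide(url: str,
           cached_mtime: Optional[float],
           upstream_reachable: bool,
           upstream_mtime: Optional[float]) -> Action:
    """Pick an action for one request.

    cached_mtime / upstream_mtime are modification times, or None when the
    file is absent from the cache / upstream (an assumed freshness measure).
    """
    # Step 2: anything other than RPMs and metadata is none of our business.
    if not is_rpm_or_metadata(url):
        return Action.PASS_THROUGH
    # Step 3: upstream unreachable, so fall back to the cache if possible.
    if not upstream_reachable:
        return Action.SERVE_FROM_CACHE if cached_mtime is not None else Action.NO_ROUTE
    # Step 4: the file is cached, so compare it with upstream.
    if cached_mtime is not None:
        if upstream_mtime is None:
            return Action.NOT_FOUND
        if cached_mtime < upstream_mtime:
            return Action.REFRESH
        return Action.SERVE_FROM_CACHE  # matches or is newer than upstream
    # Step 5: not cached yet.
    return Action.DOWNLOAD
```

Keeping the decision separate from the actual I/O makes each branch of the flow easy to test on its own.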

Download Process

In the operation above, everything is clear except the download process. If a file is already being downloaded from upstream and another client request comes in for the same file, there are two options for continuing the download:

  1. Download only via the first instance (master) and let the other instances (slaves) copy the partial content to their clients. The disadvantage is that the slaves will be throttled by the master's download speed.
  2. The other instance also starts downloading from upstream and appends data to the local file. The data is streamed to clients when the download is finished.
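Option 1 (one master download feeding several slaves) can be sketched with a condition variable. In this hypothetical Python sketch an in-memory list of chunks stands in for the partially written cache file; the master appends chunks as they arrive, and every client replays what is already there, then waits for more. It also shows the stated drawback: a slave can never get ahead of the master.

```python
import threading
from typing import Iterable, Iterator


class SharedDownload:
    """Hypothetical shared state for one in-progress upstream download."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []  # stands in for the partial cache file
        self._done = False
        self._cond = threading.Condition()

    def append(self, chunk: bytes) -> None:
        # Called by the master as data arrives from upstream.
        with self._cond:
            self._chunks.append(chunk)
            self._cond.notify_all()

    def finish(self) -> None:
        # Called by the master when the upstream transfer ends.
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def stream(self) -> Iterator[bytes]:
        # Each client replays downloaded chunks, then waits for new ones.
        index = 0
        while True:
            with self._cond:
                while index >= len(self._chunks) and not self._done:
                    self._cond.wait()
                if index >= len(self._chunks):
                    return  # download finished and everything was sent
                chunk = self._chunks[index]
            index += 1
            yield chunk  # yield outside the lock so the master is not blocked


def master_download(shared: SharedDownload, upstream_chunks: Iterable[bytes]) -> None:
    # The only instance that talks to upstream; always signals completion.
    try:
        for chunk in upstream_chunks:
            shared.append(chunk)
    finally:
        shared.finish()
```

A client that arrives after the download has finished simply reads all chunks without waiting.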


Types of IntelligentMirror

Depending on the number of users, there will be two types of IntelligentMirror.

IntelligentMirror for a small group

This type of IntelligentMirror is to be used by a small group of people. In this case we have to eliminate dependencies such as squid and Apache, because people running a small setup won't bother to configure them. Instead, we use an open source Python-based proxy server (e.g. http_replicator) and integrate IntelligentMirror with it in caching mode. This makes it easy to set up without squid or Apache.

IntelligentMirror for an organization

This type of IntelligentMirror is to be used by an organization. Because almost all organizations (particularly institutes and universities) use a common proxy server to access the Internet, IntelligentMirror should be integrated with squid. Setup will not be difficult, since these organizations already use squid (assuming squid is widely used in the Unix/Linux world) and know how to configure it. A Python proxy server can't be used here, because no organization would agree to replace squid with a stripped-down Python-based proxy server.
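One way to hook into squid is a URL rewriter registered with squid's url_rewrite_program directive. The Python sketch below is hypothetical: it assumes the line-based helper protocol of Squid 3.4 and later (the URL is the first field of each request line; the helper answers "OK rewrite-url=..." to rewrite or "ERR" to leave the URL alone), and an invented local IntelligentMirror service at LOCAL_MIRROR_BASE that applies the cache logic and fetches from upstream directly rather than through squid.

```python
import sys
from urllib.parse import urlsplit

# Hypothetical base URL of a local service that serves the cached packages.
# Assumption: that service fetches from upstream directly, not via squid.
LOCAL_MIRROR_BASE = "http://127.0.0.1:8080/cache"


def rewrite_line(line: str) -> str:
    """Return the helper's answer for one request line from squid."""
    fields = line.split()
    if not fields or fields[0].startswith(LOCAL_MIRROR_BASE):
        return "ERR"  # nothing to do, or already pointing at the local cache
    parts = urlsplit(fields[0])
    if parts.scheme != "http" or not parts.path.endswith(".rpm"):
        return "ERR"  # ERR: leave the URL unchanged
    # Map http://host/path/pkg.rpm to LOCAL_MIRROR_BASE/host/path/pkg.rpm.
    return f"OK rewrite-url={LOCAL_MIRROR_BASE}/{parts.netloc}{parts.path}"


def main() -> None:
    # squid writes one request per line and waits for each answer.
    for line in sys.stdin:
        sys.stdout.write(rewrite_line(line) + "\n")
        sys.stdout.flush()  # never buffer answers; squid would stall
```

A script using this would call main() at its entry point, and squid.conf would reference it with a line such as url_rewrite_program /usr/local/bin/intelligentmirror-rewrite.py (the path is illustrative).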

IntelligentMirror Helper

This package will be an optional part of IntelligentMirror. It will prefetch the most popular packages that are updated very frequently, such as the kernel. Popular packages will be identified by analyzing user requests and will be kept in sync with the remote repo. This package will also pre-populate the local repo with packages like mplayer, vlc and gstreamer, which users normally request after a fresh Fedora install. The idea is to minimize both the access time per package and the bandwidth usage. The helper package is relevant only for organizations with a large number of users.
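Choosing what the helper prefetches could start from simple request counts. This Python sketch is illustrative only: it assumes the helper sees a list of requested URLs and that RPM file names follow the usual name-version-release.arch.rpm pattern, and it always places the seed packages named above ahead of the most requested ones. Keeping the chosen packages in sync with the remote repo is a separate step not shown here; package_name and prefetch_list are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List

# Packages the proposal names as usually requested right after a fresh install.
SEED_PACKAGES = ("mplayer", "vlc", "gstreamer")


def package_name(url_or_filename: str) -> str:
    """Return the package name from an N-V-R.A.rpm file name (assumed format)."""
    stem = url_or_filename.rsplit("/", 1)[-1]
    if stem.endswith(".rpm"):
        stem = stem[: -len(".rpm")]
    # Drop the trailing "-version" and "-release.arch" fields.
    return stem.rsplit("-", 2)[0]


def prefetch_list(requested_urls: Iterable[str], top_n: int = 10) -> List[str]:
    """Seed packages first, then the top_n most requested RPMs, no duplicates."""
    counts = Counter(package_name(u) for u in requested_urls if u.endswith(".rpm"))
    popular = [name for name, _ in counts.most_common(top_n)]
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys([*SEED_PACKAGES, *popular]))
```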

Imagine a university with thousands of Linux users, each updating their system weekly. Gigabytes of bandwidth are wasted every week on repeated downloads of the same packages.

Advantages

  1. This will save a lot of bandwidth, which will especially help universities in developing countries where bandwidth is still a scarce resource.
  2. It will also save a lot of download time.

Task List

  1. Implement IntelligentMirror as a squid plugin that works in caching mode and caches packages at a specified location.
  2. Extend IntelligentMirror so that it can synchronize cached packages to another machine on the same local network, creating additional mirrors. This is not strictly needed if the proxy server has enough disk space, but it can be used to reduce the load on the proxy server.
  3. Implement the IntelligentMirror Helper, which will pre-populate the repo with popular packages and keep the local repo up to date.
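Task 2 amounts to a one-way sync of the cache directory. In practice a tool such as rsync would do this; the Python sketch below only illustrates the idea, assuming the alternate machine's cache is reachable as a local path (for example an NFS mount) and treating a file as changed when its size differs or its source copy is newer. sync_cache is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import List


def sync_cache(src: Path, dst: Path) -> List[Path]:
    """Copy cached files that are missing or out of date at dst; return them."""
    copied: List[Path] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        st = path.stat()
        if target.exists():
            tst = target.stat()
            # Up to date: same size and not older than the source copy.
            if tst.st_size == st.st_size and tst.st_mtime_ns >= st.st_mtime_ns:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied
```

Because copy2 preserves modification times, a second run right after the first copies nothing.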

Suggestions

If you have suggestions for improvements or want to comment on any part of the design, please feel free to leave a comment.

References