From Fedora Project Wiki
Revision as of 18:17, 23 April 2010

Problem

We currently provide our users with a very poor experience when updating their operating system.

Our current package-centric view of OS software updates is less than ideal. Most people are not interested in learning about updates to hundreds of tiny packages they have never heard of.

These myriad individual updates aren't even well coordinated. They arrive at the user's computer and pester her to install them in a random and seemingly continuous stream. This does not show respect for the user's time.

Because of this lack of coordination, many different combinations of packages are possible, and we don't actually test any of them. No set is canonical, or even the most likely.

So, basically, what we have today is:

  • Arcane
  • Uncoordinated and disruptive
  • Often broken, fragile, and largely untested

A very poor user experience.

Requirements

All releases

This section applies to both Rawhide and stable releases.

  • System components must be tested as a single unit before release
    • To avoid regressions
    • Users must have confidence in updates not breaking their system. Fear is not a good experience
  • System components must be presented to the user as a single install unit
    • Many users are not interested in details of arcane system packages; and those that are can still find out
    • The update was tested as a unit, so it makes sense to present it as one
    • Most people simply install all updates anyway
  • System updates must not break any application that we define as critical (Firefox, etc.)
    • Because applications are what users care about
  • Users should be able to detect whether updates are available from within the app they are using
    • This is just being nice and helpful
  • Application updates should appear to the user as a single object (i.e. not a set of loosely related packages)
    • Most users do not care how the app is broken down into individual packages
    • Showing all the individual packages makes it easier to end up in broken situations where some updates are installed, others not
    • Full details will still be available for users who care
  • Applications must not install updates on their own but should work in concert with a central update installer.
    • To limit the number of potential reboots after installs
    • To offer the option of opting out of certain types of updates
    • To not have to prompt the user repeatedly in different ways
    • So applications cannot assume they may do whatever they wish
    • So we can better test the combinations of software
    • To allow users to discover or be reminded of pending updates
    • To allow users to batch update disruptions
  • Release engineering should be free to withdraw or block a package if it breaks the integration tests
    • Otherwise, why test at all?
  • System must have a way to defer or queue updates to currently running software
    • To avoid, e.g., the browser misbehaving because an update running in the background has replaced the Firefox package underneath it
  • System must be able to download updates in the background using idle time and bandwidth
    • This is part of being respectful of the user's time
    • Having the updates already downloaded when the user sees them makes the update experience snappy
  • System must be capable of installing updates non-interactively (no prompting)
    • Otherwise we force the user to sit through a potentially lengthy update process, wasting his time
  • System must be capable of installing updates that require a restart at shutdown time
    • This is a natural break in the user's flow
    • The process can be left unattended and it will power off the machine when complete
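The deferral and shutdown-install requirements above amount to sorting each already-downloaded update into one of three buckets. The following is a minimal Python sketch of that decision, assuming a hypothetical `Update` record with a `needs_restart` flag and a set of package names known to be running; none of these names correspond to an existing Fedora, yum, or PackageKit API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    """A hypothetical pending update; the fields are illustrative only."""
    package: str
    needs_restart: bool = False  # e.g. a kernel update


@dataclass
class UpdatePlan:
    apply_now: list[str] = field(default_factory=list)
    defer_until_exit: list[str] = field(default_factory=list)
    apply_at_shutdown: list[str] = field(default_factory=list)


def plan_updates(pending, running_packages):
    """Sort downloaded updates into buckets that can be installed without prompting.

    - Updates that need a restart wait for shutdown, a natural break for the user.
    - Updates to software that is currently running are queued, so that e.g.
      Firefox is not replaced underneath an open browser session.
    - Everything else can be installed right away.
    """
    plan = UpdatePlan()
    for update in pending:
        if update.needs_restart:
            plan.apply_at_shutdown.append(update.package)
        elif update.package in running_packages:
            plan.defer_until_exit.append(update.package)
        else:
            plan.apply_now.append(update.package)
    return plan


pending = [Update("kernel", needs_restart=True), Update("firefox"), Update("gedit")]
print(plan_updates(pending, running_packages={"firefox"}))
# UpdatePlan(apply_now=['gedit'], defer_until_exit=['firefox'], apply_at_shutdown=['kernel'])
```

Checking restart-requiring updates first means a running package that also needs a restart waits for shutdown rather than for the application to exit; that ordering is a design assumption of this sketch.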

Stable releases

  • Batched updates must occur and appear to the user no more often than once per week
  • Tuesday is suggested as the update day
  • System updates should only:
    • fix critical bugs or security vulnerabilities
    • provide hardware enablement
  • System updates may only be deferred for a short time after which they will be installed automatically
  • System updates should be able to run autonomously when the user requests a shutdown (but not necessarily on reboot, since a reboot means that the user wants to resume work and is not done with the computer)
  • Application updates may be deferred or permanently ignored at the user's discretion
    • One reason for this is that the system is already working fine for the user, and changes may be known to break workflows, habits, or devices.
  • Application updates may add new features even in a stable release at the upstream brand's discretion as long as someone can be held responsible for fixing problems
  • "Zero-day" updates after a release should either receive more testing or be eliminated
    • Thorough testing of a release is less relevant if the system is never in that state due to zero-day updates.
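A weekly cadence with an exception for critical fixes can be expressed as a date calculation. This is a minimal Python sketch, assuming that an update submitted on a Tuesday still makes that day's batch and that the caller already knows whether an update is "critical"; the proposal leaves that definition open. The update names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbers Monday as 0, so Tuesday is 1


def next_batch_day(submitted: date, batch_weekday: int = TUESDAY) -> date:
    """Return the first batch day on or after the date an update was submitted."""
    days_ahead = (batch_weekday - submitted.weekday()) % 7
    return submitted + timedelta(days=days_ahead)


def release_date(submitted: date, critical: bool) -> date:
    """Critical updates go out immediately; everything else waits for the weekly batch."""
    return submitted if critical else next_batch_day(submitted)


def group_into_batches(submissions):
    """Group (name, submitted, critical) tuples by the day they would reach users."""
    batches = defaultdict(list)
    for name, submitted, critical in submissions:
        batches[release_date(submitted, critical)].append(name)
    return dict(batches)


# Hypothetical update names; 23 April 2010 was a Friday.
submissions = [
    ("feature-a", date(2010, 4, 23), False),
    ("security-fix", date(2010, 4, 23), True),
    ("feature-b", date(2010, 4, 26), False),
]
for day, names in sorted(group_into_batches(submissions).items()):
    print(day, names)
# 2010-04-23 ['security-fix']
# 2010-04-27 ['feature-a', 'feature-b']
```

Non-critical updates submitted anywhere from Wednesday through the following Tuesday land in the same batch, so users see at most one non-critical batch per week.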

Rawhide

  • Nightly composes should be tested to boot, start a graphical user session, and load a web page before being pushed out
  • There should be a weekly or bi-weekly snapshot stream/repo for early adopters who do not want the nightly changes
  • Weekly snapshots should receive more testing than the nightlies, and a broader category of functionality should be confirmed to work
  • Rawhide should always be dog-foodable and should never be known to be severely broken
  • We should take our Alpha, Beta, etc. releases more seriously
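The nightly and weekly streams differ mainly in how many checks must pass before a compose is pushed. This is a minimal Python sketch of such a push gate. The three nightly checks come from the text above. The weekly extras are hypothetical placeholders, because the proposal does not list them.

```python
# Checks every nightly compose must pass before it is pushed (from the text above).
NIGHTLY_CHECKS = frozenset({"boots", "graphical_session", "loads_web_page"})

# Hypothetical extra checks for the weekly snapshot; the proposal does not list them.
WEEKLY_CHECKS = NIGHTLY_CHECKS | {"suspend_resume", "printing", "software_install"}

REQUIRED_CHECKS = {"nightly": NIGHTLY_CHECKS, "weekly": WEEKLY_CHECKS}


def can_push(stream: str, results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Decide whether a compose may be pushed to the given stream.

    A check that failed or was never run counts as blocking, so a compose
    is only pushed when every required check is known to have passed.
    """
    blocking = sorted(c for c in REQUIRED_CHECKS[stream] if not results.get(c, False))
    return (not blocking, blocking)


results = {"boots": True, "graphical_session": True, "loads_web_page": True}
print(can_push("nightly", results))  # (True, [])
print(can_push("weekly", results))   # (False, ['printing', 'software_install', 'suspend_resume'])
```

Because the weekly set is a superset of the nightly set, every weekly snapshot automatically meets the nightly bar as well.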


Impact

Community Organization

Having well-defined and agreed-upon checkpoints is one of the only ways to manage large open source communities. We agree that using a time-based release process for stable releases is worthwhile, but we give up on it the moment the release goes out the door.

By elevating Rawhide to something that people can actually use we can bring the community back into the Fedora development process.

QA

The Quality Assurance team's job becomes tractable. The OS can be tested as a well-defined unit. If someone wants to deviate from that, they are in uncharted waters. But right now we are all in uncharted (untested) waters. And that is scary.

Release Quality

Giving QA a job they can actually do is only one part of this. Having a larger community testing and using (dog-fooding) Rawhide before a release is also essential. As it stands, most of the core Desktop design and development teams don't use Rawhide until shortly before the release.

Documentation

Documentation teams will have the opportunity to document release updates where required.

Increased visibility into the design and creation process (during Rawhide) should also be a big win.

Marketing

Obviously, being able to avoid embarrassing oversights is an important goal. So is being able to tell a story about why Fedora is good for you.

Certainly, having well-defined checkpoints allows a marketing team to schedule work more easily.

Having marketing folks be closer to the design and development phases of Fedora should facilitate building roadmaps for future releases and creating good stories about current ones.

Increasing visibility into the Rawhide cycle allows more viral marketing to occur and helps build anticipation for a release.

Software Developers

Being able to give (third-party) software developers some idea of what they can expect to use and depend on in Fedora would be helpful. Giving developers a chance to test their own apps with newer releases is important.

There is a lot more we can and should do here.

Upstream Contributors

Making Fedora the best choice for working on upstream software really requires making Rawhide more usable. This helps build the Fedora development ecosystem.

System Administrators

System administrators want to be able to test changes before they reach users. They want better documentation of changes, and they want to see what is coming in future releases before it arrives. They have a strong incentive to see that updates are applied automatically or in a timely fashion. They are overworked and would love to put things on autopilot, provided it never breaks anything. In many cases they wish or need to delegate some authority to users to install apps or perform updates. I hope that we can address each of these needs.

Users

Users basically just want things to work without disrupting or annoying them. That means they trust us, until they don't, and then they are gone.


Testing

NOTE: THIS PART IS SKETCHY

All snapshots (including nightly Rawhide) should ensure they don't break the following: the system boots and can reach a web page, so the user can get help with minor issues if needed.

In detail this means the following must work:

  • Grub
  • Kernel
    • process input
    • detect monitor
    • display output
    • support root filesystem
    • support home filesystem
  • Dracut
    • mount root filesystem
  • Plymouth
    • only for encrypted disks?
  • Upstart
  • Xorg
    • process input
    • detect monitor
    • display output
  • GDM
    • autologin user
  • gnome-session
    • since nothing starts without it
    • handles logout
  • gnome-settings-daemon
    • xrandr
  • gnome-panel
    • displays network applet
    • allows starting web browser
  • NetworkManager
    • connect to wireless
  • Firefox
    • load website

This is essentially what Critical_Path_Packages_Proposal is about.
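The checklist above is ordered roughly by boot sequence, and a failure early in the chain makes later checks meaningless. This is a minimal Python sketch that encodes the list as data and reports the first failing item. It assumes a hypothetical `check(component, item)` callback that runs the actual test, and it assumes that stopping at the first failure is the desired behaviour. A component with no listed sub-items gets a single bare check, with `item` set to `None`.

```python
from typing import Callable, Optional

# The critical path from the list above, in order, with its per-component checks.
CRITICAL_PATH: list[tuple[str, list[str]]] = [
    ("Grub", []),
    ("Kernel", ["process input", "detect monitor", "display output",
                "support root filesystem", "support home filesystem"]),
    ("Dracut", ["mount root filesystem"]),
    ("Plymouth", []),  # the proposal asks whether this matters only for encrypted disks
    ("Upstart", []),
    ("Xorg", ["process input", "detect monitor", "display output"]),
    ("GDM", ["autologin user"]),
    ("gnome-session", ["handles logout"]),
    ("gnome-settings-daemon", ["xrandr"]),
    ("gnome-panel", ["displays network applet", "allows starting web browser"]),
    ("NetworkManager", ["connect to wireless"]),
    ("Firefox", ["load website"]),
]

Check = Callable[[str, Optional[str]], bool]


def first_failure(check: Check) -> Optional[tuple[str, Optional[str]]]:
    """Walk the critical path in order and return the first (component, item) that fails.

    Returns None when the whole path works, i.e. the system boots and a web page loads.
    """
    for component, items in CRITICAL_PATH:
        for item in items or [None]:
            if not check(component, item):
                return (component, item)
    return None


def fake_check(component: str, item: Optional[str]) -> bool:
    # Simulate a compose in which Xorg cannot detect the monitor.
    return (component, item) != ("Xorg", "detect monitor")


print(first_failure(fake_check))  # ('Xorg', 'detect monitor')
```

Keeping the checklist as data rather than code makes it easy to amend as the list of critical path packages evolves.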

Discussion

Implementation

There are a lot of components to this problem. It would be prohibitively difficult to solve all of the issues at once. So, we may want to implement a solution in stages.

Phase I

Now that we have rough consensus that we should try to limit the volume of "pointless" updates, what is next?

Two things we may want to address right away:

1. Limit the frequency of non-critical updates to once per week in stable releases

2. Establish norms or rules that limit the types of changes in stable releases to ensure the releases remain stable

So, what are "critical" updates anyway? And what norms should we apply to stable releases?

Comments

Related Work