From Fedora Project Wiki
= swap on zram =


== Summary ==
<!-- A sentence or two summarizing what this change is and what it will do. This information is used for the overall changeset summary page for each release.
Note that motivation for the change should be in the Benefit to Fedora section below, and this part should answer the question "What?" rather than "Why?". -->
Swap is useful, except when it's slow. zram is a RAM drive that uses compression. This change creates a swap-on-zram device during startup, and no longer creates swap partitions by default.
 




== Owner ==
* Name: [[User:chrismurphy| Chris Murphy]]
* Email: bugzilla@colorremedies.com
<!--- UNCOMMENT only for Changes with assigned Shepherd (by FESCo)
* FESCo shepherd: [[User:FASAccountName| Shepherd name]] <email address>
* Responsible WG:
-->


== Current status ==
[[Category:ChangeAcceptedF33]]
<!-- When your change proposal page is completed and ready for review and announcement -->
<!-- remove Category:ChangePageIncomplete and change it to Category:ChangeReadyForWrangler -->
<!-- The Wrangler announces the Change to the devel-announce list and changes the category to Category:ChangeAnnounced (no action required) -->
<!-- After review, the Wrangler will move your page to Category:ChangeReadyForFesco... if it still needs more work it will move back to Category:ChangePageIncomplete-->
[[Category:SystemWideChange]]


* Targeted release: [[Releases/33 | Fedora 33 ]]
* Last updated: <!-- this is an automatic macro; you don't need to change this line -->  {{REVISIONYEAR}}-{{REVISIONMONTH}}-{{REVISIONDAY2}}
<!-- After the change proposal is accepted by FESCo, tracking bug is created in Bugzilla and linked to this page
Bugzilla states meaning as usual:
ASSIGNED -> accepted by FESCo with on going development
MODIFIED -> change is substantially done and testable
ON_QA -> change is code completed and could be tested in the Beta release (optionally by QA)
CLOSED as NEXTRELEASE -> change is completed and verified and will be delivered in next release under development
-->
* FESCo issue: [https://pagure.io/fesco/issue/2408 #2408]
* Tracker bug: [https://bugzilla.redhat.com/show_bug.cgi?id=1850218 #1850218]
* Release notes tracker: [https://pagure.io/fedora-docs/release-notes/issue/515 #515]


== Detailed Description ==


<!-- Expand on the summary, if appropriate. A couple sentences suffices to explain the goal, but the more details you can provide the better. -->
==== zram Basic function ====
 
The zram† device, typically <span style=color:brown>/dev/zram0</span>, has its size set when it is created during early boot by zram-generator†, per its configuration file. The memory is not preallocated; it is allocated and freed dynamically, on demand. Assuming a 2:1 compression ratio, a full <span style=color:brown>/dev/zram0</span> uses about half as much memory as its size.
 
<span style=color:brown>/dev/zram0</span> behaves like any other block device. It can be formatted with a file system or, as this change proposal intends, with mkswap.
 
The system uses RAM normally until it's full, and then starts paging out to swap-on-zram, the same as with conventional swap-on-drive. Due to compression, the zram driver allocates memory at roughly half the rate of page-outs. But there is no free lunch: swap-on-zram is not as effective at page eviction as swap-on-drive, because the net eviction rate is ~50% instead of 100%. It is, however, at least an order of magnitude faster than drive-based swap.
 
zram has about 0.1% overhead, or ~1 MiB per 1 GiB of device size. If the workload never touches swap, this overhead is the sole cost. In practice, when the device is not used at all, the feature owner has observed ~0.04% overhead.
 
Example: A system has 16 GiB RAM. Under the proposed defaults, the <span style=color:brown>/dev/zram0</span> device will be 4 GiB. Suppose the workload completely fills swap with 4 GiB of anonymous pages. The <span style=color:red>zramctl</span> command will display the true compression ratio. If 2:1 is really obtained, the 4 GiB of swapped data is compressed to 2 GiB. Therefore 2 GiB is the actual RAM usage, and it is also the net effective eviction: 4 GiB of anonymous pages are evicted, but are then compressed and pinned into 2 GiB of RAM, for a net memory savings of 2 GiB.
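
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name <code>zram_accounting</code> is hypothetical, and the 2:1 ratio is the assumption used in the example, not a guaranteed result.

```python
GIB = 1024 ** 3

def zram_accounting(swapped_bytes, compression_ratio):
    """Return (ram_used, net_freed) for data swapped out to zram.

    ram_used  : RAM the compressed pages still occupy
    net_freed : RAM actually made available by the eviction
    """
    ram_used = swapped_bytes / compression_ratio
    net_freed = swapped_bytes - ram_used
    return ram_used, net_freed

# The example from the text: a full 4 GiB device at an assumed 2:1 ratio.
ram_used, net_freed = zram_accounting(4 * GIB, 2.0)
print(f"RAM used by zram: {ram_used / GIB:.1f} GiB")
print(f"Net memory freed: {net_freed / GIB:.1f} GiB")
```

With a less compressible workload, the same function shows the net savings shrinking, which is the reason for the conservative default size.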
 
†<br />
[https://www.kernel.org/doc/Documentation/blockdev/zram.txt kernel.org zram.txt]
 
[https://github.com/systemd/zram-generator GitHub zram-generator project]
 
 
==== Overview of the Feature ====
 
Using swap is a good idea†, but no one likes it when it's slow. Anaconda and Fedora IoT have been using swap-on-zram by default for years. This builds on their prior effort.
 
 
There are three components to the change:
 
# Install the <code>zram-generator</code> package†. This does not enable swap-on-zram; it only makes the generator available.
# Install the <code>zram-generator-defaults</code> package, which provides a default configuration. When it is present, swap-on-zram is set up during startup.
# Do not create a swap partition or LV in default installations.
 
This proposal aims to apply all three, for all Fedora editions and spins, by default.
 
It further aims to apply the first two, for upgrades and custom installations.
 
It might be useful to only make the generator available (1), should an edition/spin wish to opt out, or as a fallback if applying the feature to upgrades fails to withstand scrutiny.
 
†<br />
There is a tl;dr section at the top. Highly recommend reading the whole article. [https://chrisdown.name/2018/01/02/in-defence-of-swap.html In defence of swap: common misconceptions]
 
==== Default zram device configuration ====
 
During startup, create a zram device <span style=color:brown>/dev/zram0</span> with a size equal to 50% of RAM, capped† at 4 GiB, and with a higher-than-typical swap priority†.
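
As a rough sketch of the sizing rule just described (50% of RAM, capped at 4 GiB), the following Python computes the device size for a few RAM sizes. The function name <code>default_zram_size</code> and its parameters are hypothetical and only mirror the proposed defaults; this is not the generator's actual code.

```python
GIB = 1024 ** 3

def default_zram_size(ram_bytes, fraction=0.5, cap_bytes=4 * GIB):
    """Device size under the proposed defaults: a fraction of RAM, capped."""
    return min(int(ram_bytes * fraction), cap_bytes)

# Below 8 GiB of RAM the 50% rule applies; from 8 GiB upward the cap applies.
for ram_gib in (2, 4, 8, 16, 32):
    size = default_zram_size(ram_gib * GIB)
    print(f"{ram_gib:>2} GiB RAM -> {size / GIB:.0f} GiB zram device")
```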
 
These values seem reasonably conservative, and are based on prior work in Fedora. Anaconda sets swap-on-drive sized to 50% RAM in the no hibernation case, common outside x86. Fedora IoT's implementation also sets swap-on-zram size to 50% RAM.
 
† <strike>[https://github.com/systemd/zram-generator/issues/10 RFE: should be able to set a cap on zram device size #10]</strike> (DONE)<br>
<strike>[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]</strike> (DONE)
 
==== Default installer behavior  ====
 
The installer is currently responsible for creating a swap-on-drive device. This will be dropped for default installations. The zram-generator + configuration file will trigger the setup and activation of swap-on-zram. Without drive-based swap there is nowhere to write a hibernation image, so hibernation isn't possible, even on systems that could otherwise support it.
 
Please see [https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md Supporting hibernation in Workstation edition] for much more detailed information, including why it's increasingly likely hibernation isn't possible anyway, and a path to improving hibernation support.
 
 
==== Custom/Advanced partitioning installer behavior ====
 
The user can add swap using Custom partitioning at install time. This is swap-on-drive, and the installer will also add the <span style=color:red> resume=UUID </span> kernel parameter for this swap device. There is no change in behavior here.
 
Since swap-on-zram is still enabled by default, there will be two swaps: swap-on-zram, and swap-on-drive. The swap-on-zram will have higher priority, thus being favored over drive based swap. The kernel is smart enough to know it can't hibernate to a zram device, and will instead use drive based swap.
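
To illustrate how priority decides which swap device is used first, here is a minimal Python sketch that parses text in the <code>/proc/swaps</code> format and orders the entries by priority. The sample content, including device names, sizes, and priority values, is made up for illustration and was not captured from a real system; <code>parse_swaps</code> is a hypothetical helper.

```python
# Illustrative /proc/swaps content; device names, sizes, and priorities are
# made-up sample values, not output captured from a real system.
SAMPLE_PROC_SWAPS = """\
Filename                                Type            Size            Used            Priority
/dev/sda3                               partition       8388604         0               -2
/dev/zram0                              partition       4194300         0               100
"""

def parse_swaps(text):
    """Return swap entries sorted so the device the kernel prefers comes first."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue
        name, kind, size_kib, used_kib, priority = fields
        entries.append({
            "name": name,
            "type": kind,
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    # The kernel fills higher-priority swap areas first.
    return sorted(entries, key=lambda e: e["priority"], reverse=True)

swaps = parse_swaps(SAMPLE_PROC_SWAPS)
for entry in swaps:
    print(f'{entry["name"]:<12} priority={entry["priority"]}')
```

On a running system, the same function could be given the contents of <code>/proc/swaps</code> instead of the sample string.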
 
 
==== How can it be disabled? ====
 
Immediately:<br />
<span style=color:red>sudo systemctl stop swap-create@zram0</span>
 
Permanently:<br />
<span style=color:red>sudo touch /etc/systemd/zram-generator.conf</span><br />
or<br />
<span style=color:red>sudo dnf remove zram-generator-defaults</span>


== Feedback ==
==== You're enabling it on upgrades? ====
That's the current plan. As a technical matter, feature owner is confident this feature will improve the experience of all users regardless of configuration. As a non-technical matter, it's recognized that (a) ''hey pal, you're messing with my customizations, not cool!'' and (b) ''swap always stinks, I don't care if it has a 'Z' in the name!'' may need more convincing.
There are possible risks.
* Workloads that expect full use of memory, and depend on 100% page eviction. These may run slower if they really need full use of memory, but some memory is used for the zram device instead. Such workloads might favor zswap.
* Workloads with low compressible pages. In the worst case, this means unnecessary work merely moving pages around.
* Workloads with memory full, and hibernation. Hibernation is already stressful to the memory-management subsystem and prone to bailing out in such cases. The swap-on-zram will be favored for evictions in the attempt to free memory to create the hibernation image. It could increase instances of hibernation entry failure. This isn't a crash; it just means the attempt doesn't succeed, and the system resumes operation instead of hibernating.
While possible, it's difficult to estimate their probability. But this is a significant consideration in the conservative default zram size. Users can easily increase zram size as needed for their use case, simply by editing <span style=color:red>/etc/systemd/zram-generator.conf</span> and the change takes effect at next boot.
==== Why systemd zram-generator? ====
It's the most upstream implementation to date, and it's fast and lightweight. The zram-generator uses existing systemd infrastructure to set up the zram block device, format it as swap, and run swapon, all during early boot. It's very similar in behavior to fstab-generator, gpt-auto-generator, and cryptsetup-generator†.

Converging on one implementation avoids user confusion. And while the alternatives are nice and work fine, a systemd generator is particularly well suited for this use case compared to a systemd service unit.†

Also, it's a reference implementation of a systemd generator written in Rust.

†<br />
[https://www.freedesktop.org/software/systemd/man/systemd.generator.html freedesktop.org About systemd generators.]<br />
[https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/ devel@ ''Re: swap-on-zram by default'' Zbigniew Jędrzejewski-Szmek, systemd zram-generator author/maintainer]
==== Why not a bigger zram device? ====
The main reason for being conservative is to address concerns about upgrades. It's possible some workloads will have less compressible data. Hence, <span style=color:brown>/dev/zram0</span> is not sized to 100% of RAM at this time. Even a <span style=color:brown>/dev/zram0</span> of 200% RAM is not unreasonable ''if'' the compression ratio is at least 2:1. However, it's possible for a system to get "stuck" in a kind of swap thrashing similar to conventional swap-on-drive, except that it's CPU and memory bound rather than IO bound. The feature owner thinks it's better to just OOM than to get overly aggressive with the zram device size.

Conversely, it's possible to be too conservative with the size, resulting in more instances of OOM kill. If applying the feature to upgrades is rejected, it's probably reasonable to increase the cap to ~8 GiB. Of course, more feedback and testing are needed, and will be taken into consideration.

Note that the kernel zram doc says an excessively sized zram device does come with overhead. Users can easily increase the size post-install, a capability they don't easily have with swap-on-drive. The goal for Fedora 33 is a default that's useful and safe for the vast majority of use cases.
==== Why not zswap? ====
Zswap† is a similar idea, speeding up swapping, but with a different implementation. It needs disk-based swap, and uses a compressed memory cache to hold onto recently used pages, while less recently used pages are evicted to swap.

Swap-on-zram depends only on volatile storage. This is simpler and more secure, whereas zswap eviction of pages into swap-on-drive can leak user data. Some workloads may do better with zswap, and it's a valid future feature for this generator. One idea is that the generator could favor setting up zswap when swap-on-drive already exists, and fall back to swap-on-zram otherwise.
===== What if I'm already using zswap? =====
The feature owner recommends disabling the swap-on-zram feature described in this proposal. [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/LORCBGD4I67LRFM3FCFYUHFQZJTR5K6A/ More information in the devel@ response.]

†<br />
[https://www.kernel.org/doc/Documentation/vm/zswap.txt kernel.org zswap.txt]


<!-- Summarize the feedback from the community and address why you chose not to accept proposed alternatives. This section is optional for all change proposals, but is strongly suggested. Incorporating feedback here as it is raised gives FESCo a clearer view of your proposal and leaves a good record for the future. If you get no feedback, that is useful to note in this section as well. For innovative or possibly controversial ideas, consider collecting feedback before you file the change proposal. -->


== Benefit to Fedora ==


* significantly improves system responsiveness, especially when swap is under pressure;
* more secure, user data leaks into swap are on volatile media;
* without swap-on-drive, there's better utilization of a limited resource: benefit of swap without the drive space consumption;
* complements on-going resource control work, including earlyoom;
* further reduces the time to out-of-memory kill, when workloads exceed limits;
* improves performance for both "no swap" and "existing swap" setups;

<!-- What is the benefit to the distribution?  Will the software we generate be improved? How will the process of creating Fedora releases be improved?

      Be sure to include the following areas if relevant:
      If this is a major capability update, what has changed?
          For example: This change introduces Python 5 that runs without the Global Interpreter Lock and is fully multithreaded.
      If this is a new functionality, what capabilities does it bring?
          For example: This change allows package upgrades to be performed automatically and rolled-back at will.
      Does this improve some specific package or set of packages?
          For example: This change modifies a package to use a different language stack that reduces install size by removing dependencies.
      Does this improve specific Spins or Editions?
          For example: This change modifies the default install of Fedora Workstation to be more in line with the base install of Fedora Server.
      Does this make the distribution more efficient?
          For example: This change replaces thousands of individual %post scriptlets in packages with one script that runs at the end.
      Is this an improvement to maintainer processes?
          For example: Gating Fedora packages on automatic QA tests will make rawhide more stable and allow changes to be implemented more smoothly.
      Is this an improvement targeted as specific contributors?
          For example: Ensuring that a minimal set of tools required for contribution to Fedora are installed by default eases the onboarding of new contributors.


    When a Change has multiple benefits, it's better to list them all.


    Consider these Change pages from previous editions as inspiration:
    https://fedoraproject.org/wiki/Changes/Annobin (low-level and technical, invisible to users)
    https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo (low-level, but visible to advanced users)
    https://fedoraproject.org/wiki/Changes/VirtualBox_Guest_Integration (primarily a UX change)
    https://fedoraproject.org/wiki/Changes/NoMoreAlpha (an improvement to distro processes)
    https://fedoraproject.org/wiki/Changes/perl5.26 (major upgrade to a popular software stack, visible to users of that stack)
-->


== Scope ==
* Proposal owners:
<!-- What work do the feature owners have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
** add zram-generator package to comps and kickstarts as appropriate
** obsolete zram package (used by Fedora IoT)
** means of per edition/spin configurations, if needed
** test day, see https://pagure.io/fedora-qa/issue/632

* Other developers:
<!-- What work do other developers have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
** The Anaconda team is agreeable to deprecating its built-in implementation in favor of swap-on-zram
** RFEs for zram-generator: users are not worse off if they don't happen. Open request for help, to make it possible. It's much appreciated.<br />
<strike>[https://github.com/systemd/zram-generator/issues/10 RFE: should be able to set a cap on zram device size #10]</strike> (DONE)<br />
<strike>[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]</strike> (DONE)


* Release engineering: [https://pagure.io/releng/issues/9495 #9495]
<!-- Does this feature require coordination with release engineering (e.g. changes to installer image generation or update package delivery)?  Is a mass rebuild required?  include a link to the releng issue.
The issue is required to be filed prior to feature submission, to ensure that someone is on board to do any process development work and testing, and that all changes make it into the pipeline; a bullet point in a change is not sufficient communication -->


* Policies and guidelines: N/A
<!-- Do the packaging guidelines or other documents need to be updated for this feature?  If so, does it need to happen before or after the implementation is done?  If a FPC ticket exists, add a link here. -->


* Trademark approval: N/A
<!-- If your Change may require trademark approval (for example, if it is a new Spin), file a ticket ( https://fedorahosted.org/council/ ) requesting trademark approval from the Fedora Council. This approval will be done via the Council's consensus-based process. -->


== Upgrade/compatibility impact ==
<!-- What happens to systems that have had a previous versions of Fedora installed and are updated to the version containing this change? Will anything require manual configuration or data migration? Will any existing functionality be no longer supported? -->


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
Add <code>Obsoletes: zram < 0.4-2</code> to <span style=color:blue>zram-generator-defaults</span>. This means only systems that have <span style=color:blue>zram</span> installed will get <span style=color:blue>zram-generator-defaults</span>, and they will have swap-on-zram enabled post-upgrade, whether or not it was previously enabled.
 
Fedora Workstation has included <span style=color:blue>zram</span> since July 2019 (Fedora 31) by default. Any clean installed systems from that point will automatically be upgraded to this feature.
 
Fedora IoT has included <span style=color:blue>zram</span> from the beginning. All systems will automatically get this feature upon upgrade.
 


== How To Test ==
<!-- This does not need to be a full-fledged document. Describe the dimensions of tests that this change implementation is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.

Remember that you are writing this how to for interested testers to use to check out your change implementation - documenting what you do for testing is OK, but it's much better to document what *I* can do to test your change.

A good "how to test" should answer these four questions:

0. What special hardware / data / etc. is needed (if any)?
1. How do I prepare my system to test this change? What packages
need to be installed, config files edited, etc.?
2. What specific actions do I perform to check that the change is
working like it's supposed to?
3. What are the expected results of those actions?
-->
Any hardware. Any version of Fedora.

# dnf install zram-generator zram-generator-defaults
# Reboot
# Check that swap is on a zram device: zramctl, swapon
# Detailed check: journalctl -b -o short-monotonic --grep 'swap|zram'
# Check that priority is higher than existing swap if two or more are listed.

Feel free to run your usual workloads more aggressively or in parallel. Suspend-to-RAM and suspend-to-drive are expected to continue to work too (or at least hit all the same bugs as without zram being used).

Also, you can see the actual compression ratio achieved with the following command:<br />
<span style=color:red> zramctl </span>
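
For testers who want a number rather than a table, the compression ratio can also be derived from the kernel's <code>mm_stat</code> file for the device, whose first three fields are <code>orig_data_size</code>, <code>compr_data_size</code>, and <code>mem_used_total</code>, all in bytes. The Python below is a minimal sketch: <code>zram_compression_ratio</code> is a hypothetical helper, and the sample line is made up rather than taken from a real system.

```python
def zram_compression_ratio(mm_stat_text):
    """Return (data_ratio, memory_ratio) from the text of a zram mm_stat file.

    data_ratio   : orig_data_size / compr_data_size
    memory_ratio : orig_data_size / mem_used_total (includes allocator overhead)
    Returns None if the device holds no data yet.
    """
    fields = [int(value) for value in mm_stat_text.split()]
    orig_data, compr_data, mem_used = fields[:3]
    if compr_data == 0 or mem_used == 0:
        return None
    return orig_data / compr_data, orig_data / mem_used

# Made-up sample: 4 GiB of pages stored, occupying 2 GiB of RAM in total.
SAMPLE_MM_STAT = "4294967296 2147483648 2147483648 0 2147483648 0 0"

print(zram_compression_ratio(SAMPLE_MM_STAT))
```

On a system with swap-on-zram active, the input would be the contents of <code>/sys/block/zram0/mm_stat</code>; the result should be comparable to the DATA, COMPR, and TOTAL columns that zramctl prints.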
==== Test Day ====


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
[https://fedoraproject.org/wiki/Test_Day:F33_SwapOnZRAM QA: SwapOnzram Test Day] to discover edge cases, and tweak the default configuration if necessary to establish a good one-size-fits-all approach.


== User Experience ==
<!-- If this change proposal is noticeable by users, how will their experiences change as a result?

This section partially overlaps with the Benefit to Fedora section above. This section should be primarily about the User Experience, written in a way that does not assume deep technical knowledge. More detailed technical description should be left for the Benefit to Fedora section.

Describe what Users will see or notice, for example:
  - Packages are compressed more efficiently, making downloads and upgrades faster by 10%.
  - Kerberos tickets can be renewed automatically. Users will now have to authenticate less and become more productive. Credential management improvements mean a user can start their work day with a single sign on and not have to pause for reauthentication during their entire day.
- Libreoffice is one of the most commonly installed applications on Fedora and it is now available by default to help users "hit the ground running".
- Green has been scientifically proven to be the most relaxing color. The move to a default background color of green with green text will result in Fedora users being the most relaxed users of any operating system.
-->
The user won't notice anything displeasing. If their usual workload causes them to dread swap thrashing, they'll be surprised that thrashing doesn't happen. The user might get curious if they don't find a swap entry in /etc/fstab. Or if they 'swapon' and see swap pointing to <span style=color:brown>/dev/zram0</span> instead of a drive partition or LV.


== Dependencies ==
<!-- What other packages (RPMs) depend on this package?  Are there changes outside the developers' control on which completion of this change depends?  In other words, completion of another change owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate?  Other upstream projects like the kernel (if this is not a kernel change)? -->


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A
 


== Contingency Plan ==


<!-- If you cannot complete your feature by the final development freeze, what is the backup plan?  This might be as simple as "Revert the shipped configuration".  Or it might not (e.g. rebuilding a number of dependent packages).  If you feature is not completed in time we want to assure others that other parts of Fedora will not be in jeopardy.  -->
* Contingency mechanism: Don't ship the generator = big hammer, but easy. Preferable to ship the generator, but only selectively ship configuration files = scalpel, pretty easy.
* Contingency deadline: Beta freeze
<!-- When is the last time the contingency mechanism can be put in place?  This will typically be the beta freeze. -->
* Blocks release? No.
<!-- Does finishing this feature block the release, or can we ship with the feature in incomplete state? -->
* Blocks product? No.


== Documentation ==
<!-- Is there upstream documentation on this change, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<code>man 8 zram-generator</code><br />
<code>man 5 zram-generator.conf</code>

(Check out the ASCII art!)


== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this change, indicate them here.  A link to upstream documentation will often satisfy this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release.

Release Notes are not required for initial draft of the Change Proposal but has to be completed by the Change Freeze.
-->
A swap partition is not created by default at installation time. Instead, a zram device is created, and swap is enabled on it during startup. zram is a RAM drive that uses compression. See <code>man zram-generator</code> for a brief overview of its function. The swap-on-zram feature can be disabled with <code>sudo touch /etc/systemd/zram-generator.conf</code>, re-enabled by removing this file, and customized by editing it. See <code>man zram-generator.conf</code> for configuration information, including a description of the default configuration plus ASCII art.

Latest revision as of 02:59, 13 October 2020

swap on zram

Summary

Swap is useful, except when it's slow. zram is a RAM drive that uses compression. Create a swap-on-zram during start-up. And no longer use swap partitions by default.


Owner


Current status

Detailed Description

zram Basic function

The zram† device, typically /dev/zram0, has a size set at create time during early boot, by zram-generator† per its configuration file. The memory used is not preallocated. It's dynamically allocated and deallocated, on demand. Due to compression, a full /dev/zram0 uses half as much memory as its size.

The /dev/zram0 behaves like any other block device. It can be formatted with a file system, or mkswap, which is the intention with this change proposal.

The system will use RAM normally up until it's full, and then start paging out to swap-on-zram, same as a conventional swap-on-drive. The zram driver starts to allocate memory at roughly 1/2 the rate of page outs, due to compression. But, there is no free lunch. This means swap-on-zram is not as effective at page eviction as swap-on-drive, the eviction rate is ~50% instead of 100%. But it is at least an order of magnitude faster than drive based swap.

zram has about 0.1% overhead or ~1MiB/1GiB. If the workload never touches swap, this overhead is the sole cost. In practice when not used at all, feature owner has experienced ~0.04% overhead.

Example: A system has 16 GiB RAM. The proposed defaults suggest the /dev/zram0 device will be 4 GiB. If the workload completely fills up swap with 4 GiB of anonymous pages, what's happened? The zramctl command will display the true compression ratio. If 2:1 is really obtained, it means 4GiB swap data is compressed to 2GiB. Therefore 2GiB is the actual RAM usage, and is also the net effective eviction. i.e. 4 GiB anonymous pages are evicted, but are then compressed and pinned into 2 GiB RAM, for a net memory savings of 2 GiB.


kernel.org zram.txt

Github zram-generator project


Overview of the Feature

Using swap is a good idea†, but no one likes it when it's slow. Anaconda and Fedora IoT have been using swap-on-zram by default for years. This builds on their prior effort.


There are three components to the change:

  1. Install zram-generator package†. This does not enable swap-on-zram, it only makes the generator available.
  2. Install zram-generator-defaults package, which provides a default configuration. When present, swap-on-zram is set-up during startup.
  3. Do not create swap partition/LV with default installations.

This proposal aims to apply all three, for all Fedora editions and spins, by default.

It further aims to apply the first two, for upgrades and custom installations.

It might be useful to only make the generator available (1), should an edition/spin wish to opt out, or as a fallback if applying the feature to upgrades fails to withstand scrutiny.


There is a tl;dr section at the top. Highly recommend reading the whole article. In defence of swap: common misconceptions

Default zram device configuration:

During startup, create a zram device /dev/zram0, with a size equal to 50% RAM, but capped† to 4 GiB, and with a higher than typical swap priority†.

These values seem reasonably conservative, and are based on prior work in Fedora. Anaconda sets swap-on-drive sized to 50% RAM in the no hibernation case, common outside x86. Fedora IoT's implementation also sets swap-on-zram size to 50% RAM.
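As a rough illustration of the sizing rule described above (50% of RAM, capped to 4 GiB), the following Python sketch computes the resulting device size for a few RAM sizes. The function and parameter names are hypothetical and do not reflect the generator's actual code or configuration keys.

```python
GIB = 1024 ** 3

def default_zram_size(ram_bytes, fraction=0.5, cap_bytes=4 * GIB):
    """Size of /dev/zram0 under the proposed defaults: 50% of RAM, capped to 4 GiB."""
    return min(int(ram_bytes * fraction), cap_bytes)

for ram_gib in (2, 4, 8, 16):
    size = default_zram_size(ram_gib * GIB)
    print(f"{ram_gib:>2} GiB RAM -> {size / GIB:.0f} GiB zram device")
```

The cap only takes effect above 8 GiB of RAM; below that, the device simply scales with memory.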

RFE: should be able to set a cap on zram device size #10 (DONE)
RFE: should set priority #8 (DONE)

Default installer behavior

The installer is currently responsible for creating a swap-on-drive device. This will be dropped. The zram-generator + configuration file will trigger the setup and activation of swap-on-zram. This means hibernation isn't possible, even on systems that could support it.

Please see Supporting hibernation in Workstation edition for much more detailed information, including why it's increasingly likely hibernation isn't possible anyway, and a path to improving hibernation support.


Custom/Advanced partitioning installer behavior

The user can add swap using Custom partitioning at install time. This is swap-on-drive. And the installer will also include the resume=UUID kernel parameter for this swap device. No change in behavior here.

Since swap-on-zram is still enabled by default, there will be two swaps: swap-on-zram, and swap-on-drive. The swap-on-zram will have higher priority, thus being favored over drive based swap. The kernel is smart enough to know it can't hibernate to a zram device, and will instead use drive based swap.
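To illustrate how priority decides which swap device is favored, here is a small Python sketch that parses text in the format of /proc/swaps and reports the device with the highest priority. The sample device names, sizes, and priority values are made up for illustration; real values depend on the system and configuration.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of (device, priority) tuples."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        entries.append((fields[0], int(fields[-1])))  # Priority is the last column
    return entries

def preferred_swap(entries):
    """Return the device with the highest priority, which is used first."""
    return max(entries, key=lambda entry: entry[1])[0]

# Hypothetical sample: one zram swap and one drive-based swap.
SAMPLE = """\
Filename Type Size Used Priority
/dev/zram0 partition 4194300 0 100
/dev/dm-1 partition 8388604 0 -2
"""

print(preferred_swap(parse_swaps(SAMPLE)))
```

On a live system the same functions could be fed the contents of /proc/swaps instead of the sample string.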


How can it be disabled?

Immediately:
sudo systemctl stop swap-create@zram0

Permanently:
sudo touch /etc/systemd/zram-generator.conf or sudo dnf remove zram-generator-defaults

Feedback

You're enabling it on upgrades?

That's the current plan. As a technical matter, the feature owner is confident this feature will improve the experience of all users regardless of configuration. As a non-technical matter, it's recognized that users whose reaction is (a) "hey pal, you're messing with my customizations, not cool!" or (b) "swap always stinks, I don't care if it has a 'Z' in the name!" may need more convincing.

There are possible risks.

  • Workloads that expect full use of memory, and depend on 100% page eviction. These may run slower if they really need full use of memory, but some memory is used for the zram device instead. Such workloads might favor zswap.
  • Workloads with low compressible pages. In the worst case, this means unnecessary work merely moving pages around.
  • Workloads with memory full, and hibernation. Hibernation is already stressful to memory-management subsystem and prone to bailing out in such cases. The swap-on-zram will be favored for evictions in the attempt to free memory to create the hibernation image. It could increase instances of hibernation entry failure. This isn't a crash, it just means the attempt doesn't succeed, and the system resumes operation instead of hibernating.

While possible, it's difficult to estimate their probability. But this is a significant consideration in the conservative default zram size. Users can easily increase zram size as needed for their use case, simply by editing /etc/systemd/zram-generator.conf and the change takes effect at next boot.

Why systemd zram-generator?

It's the most upstream implementation to date, and it is fast and lightweight. The zram-generator uses existing systemd infrastructure to set up the zram block device, format it as swap, and swapon - all during early boot. It's very similar in behavior to fstab-generator, gpt-auto-generator, and cryptsetup-generator†.

Converging on one implementation avoids user confusion. And while the alternatives are nice and work fine, a systemd generator is particularly well suited for this use case compared to a systemd service unit.†

Also, it's a reference implementation of a systemd generator written in Rust.


freedesktop.org About systemd generators.
devel@ Re: swap-on-zram by default Zbigniew Jędrzejewski-Szmek, systemd zram-generator author/maintainer


Why not a bigger zram device?

The main idea of being conservative is to address concerns about upgrades. It's possible some workloads will have less compressible data. Hence, not going with /dev/zram0 sized to 100% of RAM at this time. Even a /dev/zram0 of 200% RAM is not unreasonable *if* the compression ratio is at least 2:1. However, it's possible a system can get "stuck" in a kind of swap thrashing similar to conventional swap-on-drive, except it's CPU and memory bound, rather than IO bound. The feature owner thinks it's better to just OOM than to get overly aggressive with the zram device size.

Conversely it's possible to be too conservative with the size, and result in more instances of OOM kill. If applying the feature to upgrades is rejected, it's probably reasonable to increase the cap to ~8GiB. Of course more feedback and testing is needed, and it will be taken into consideration.

Note that the kernel zram doc says an excessively sized zram device does come with overhead. Users can easily increase the size post-install, a capability they don't easily have with swap-on-drive. The goal for Fedora 33 is a default that's useful and safe for the vast majority of use cases.


Why not zswap?

Zswap† has a similar goal, to speed up swapping, but with a different implementation. It needs disk-based swap, and uses a compressed memory cache to hold onto recently used pages, while less recently used pages are evicted to swap.

Swap-on-zram depends only on volatile storage, which is simpler and more secure; zswap eviction of pages into swap-on-drive can leak user data. Some workloads may do better with zswap, and it's a valid future feature for this generator. One idea is that the generator could favor setting up zswap when swap-on-drive already exists, and fall back to swap-on-zram otherwise.

What if I'm already using zswap?

The feature owner recommends disabling the swap-on-zram feature described in this proposal. More information is in the devel@ response.


kernel.org zswap.txt


Benefit to Fedora

  • significantly improves system responsiveness, especially when swap is under pressure;
  • more secure, user data leaks into swap are on volatile media;
  • without swap-on-drive, there's better utilization of a limited resource: benefit of swap without the drive space consumption;
  • complements on-going resource control work, including earlyoom;
  • further reduces the time to out-of-memory kill, when workloads exceed limits;
  • improves performance for both "no swap" and "existing swap" setups;


Scope

  • Proposal owners:
    • add zram-generator package to comps and kickstarts as appropriate
    • obsolete zram package (used by Fedora IoT)
    • means of per edition/spin configurations, if needed
    • test day, see https://pagure.io/fedora-qa/issue/632
  • Other developers:
    • Anaconda are agreeable to deprecating their built-in implementation in favor of swap-on-zram
    • RFEs for zram-generator: users are not worse off if they don't happen. This is an open request for help to make them possible. It's much appreciated.

RFE: should be able to set a cap on zram device size #10 (DONE)
RFE: should set priority #8 (DONE)

  • Release engineering: #9495
  • Policies and guidelines: N/A
  • Trademark approval: N/A

Upgrade/compatibility impact

Add Obsoletes: zram < 0.4-2 to zram-generator-defaults. This means only systems that have zram installed will get zram-generator-defaults, and they will have swap-on-zram enabled post-upgrade, whether or not it was previously enabled.

Fedora Workstation has included zram since July 2019 (Fedora 31) by default. Any clean installed systems from that point will automatically be upgraded to this feature.

Fedora IoT has included zram from the beginning. All systems will automatically get this feature upon upgrade.


How To Test

Any hardware. Any version of Fedora.

  1. dnf install zram-generator zram-generator-defaults
  2. Reboot
  3. Check that swap is on a zram device: zramctl, swapon
  4. Detailed check: journalctl -b -o short-monotonic --grep 'swap|zram'
  5. Check that priority is higher than existing swap if two or more are listed.

Feel free to run your usual workloads more aggressively or in parallel. Suspend-to-RAM and suspend-to-drive are expected to continue to work too (or at least hit all the same bugs as without zram being used).

Also, you can see the actual compression ratio achieved with the following command:
zramctl
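The same ratio can also be derived from the kernel's per-device statistics. The Python sketch below assumes the mm_stat format from the kernel zram documentation, where the first two fields are the original (uncompressed) data size and the compressed data size in bytes. The sample line is invented for illustration.

```python
def compression_ratio(mm_stat_text):
    """Return orig_data_size / compr_data_size from a zram mm_stat line.

    Returns None when nothing has been stored on the device yet.
    """
    fields = mm_stat_text.split()
    orig_data_size, compr_data_size = int(fields[0]), int(fields[1])
    if compr_data_size == 0:
        return None
    return orig_data_size / compr_data_size

# Hypothetical sample: 4 GiB of pages compressed to 2 GiB.
SAMPLE = "4294967296 2147483648 2200000000 0 2200000000 0 0 0"
print(compression_ratio(SAMPLE))
```

On a running system, the input would typically come from reading /sys/block/zram0/mm_stat; these two fields should correspond to the DATA and COMPR columns that zramctl prints.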


Test Day

QA: SwapOnzram Test Day to discover edge cases, and tweak the default configuration if necessary to establish a good one-size-fits-all approach.

User Experience

The user won't notice anything displeasing. If their usual workload causes them to dread swap thrashing, they'll be surprised that thrashing doesn't happen. The user might get curious if they don't find a swap entry in /etc/fstab, or if they run 'swapon' and see swap pointing to /dev/zram0 instead of a drive partition or LV.


Dependencies

N/A


Contingency Plan

  • Contingency mechanism: Don't ship the generator = big hammer, but easy. Preferable to ship the generator, but only selectively ship configuration files = scalpel, pretty easy.
  • Contingency deadline: Beta freeze
  • Blocks release? No.
  • Blocks product? No.


man 8 zram-generator
man 5 zram-generator.conf

(Check out the ASCII art!)

Release Notes

A swap partition is not created by default at installation time. Instead, a zram device is created, and swap enabled on it during start-up. zram is a RAM drive that uses compression. See man zram-generator for a brief overview of its function. The swap-on-zram feature can be disabled with sudo touch /etc/systemd/zram-generator.conf and reenabled by removing this file, and customized by editing it. See man zram-generator.conf for configuration information, including a description of the default configuration plus ASCII art.