From Fedora Project Wiki

Revision as of 05:39, 4 June 2020

swap on zram

Summary

Swap is useful† except when it's slow. zram† is a RAM drive that uses compression. Its size is assigned at creation time, but the memory it uses is allocated and deallocated dynamically, on demand. The zram device /dev/zram0 behaves like any other block device. It can be formatted with a file system or with mkswap; the latter is what this change proposal intends.

There are three components to the change:

  1. Install the systemd rust-zram-generator† package. This does not enable swap-on-zram; it only makes the generator available.
  2. Install a default zram-generator configuration. When present, swap-on-zram is set up during startup.
  3. Do not create swap partition/LV with default installations. This does not apply to upgrades or Custom partitioning.

The practical combinations of the above:

(1) only = generator present, user can enable by creating a configuration file. Not recommended, but logically valid to ship only the generator, expecting local configuration to enable it. e.g. Fedora CoreOS.

(1) + (2) = swap-on-zram is enabled, and with a higher priority than default for swap-on-drive. Both co-exist, but swap-on-zram is favored first. Hibernation is still possible if the swap-on-drive partition is big enough and all other requirements are met. Upgrades and custom installations that also create a swap-on-drive partition fit here.

(1) + (2) + (3) = swap-on-zram is enabled, no drive-based swap present. This applies to all Fedora editions and spins that use Anaconda, with the default automatic partitioning path.

NOTE: Anaconda and Fedora IoT have been using swap-on-zram by default for years. This builds on that prior effort.


In defence of swap: common misconceptions. There is a tl;dr section at the top, but reading the whole article is highly recommended.

kernel.org zram.txt

Github zram-generator project


Owner


Current status

  • Targeted release: Fedora 33
  • Last updated: 2020-06-04
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>


Detailed Description

Basic function

The system will use RAM normally until it's full, and then start paging out to swap-on-zram, the same as with conventional swap-on-drive. Because of compression, the zram driver allocates memory at roughly 1/2 the rate of page outs. But there is no free lunch: swap-on-zram is not as effective at page eviction as swap-on-drive, since the net eviction rate is ~50% instead of 100%. It is, however, orders of magnitude faster than drive-based swap.

zram has about 0.1% overhead, or ~1 MiB per 1 GiB of device size. If the workload never touches swap, this overhead is the sole cost. There is no preallocation of RAM for the zram device. In practice, when the device is not used at all, the feature owner has observed ~0.04% overhead.

Example: A system has 16 GiB RAM. With the proposed defaults, the /dev/zram0 device will be 4 GiB. Suppose the workload completely fills swap with 4 GiB of anonymous pages. The zramctl command will display the true compression ratio. If 2:1 is really obtained, the 4 GiB of swap data is compressed to 2 GiB. Therefore 2 GiB is the actual RAM usage, and is also the net effective eviction: 4 GiB of anonymous pages are evicted, but are then compressed and pinned into 2 GiB of RAM, for a net memory savings of 2 GiB.
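
The arithmetic in this example can be sketched as a small shell function. This is only an illustration: the name zram_net_savings_mib is hypothetical, and the compression ratio is an input you would read from zramctl, not something the function measures.

```shell
# Hypothetical helper: net RAM freed when SWAPPED_MIB of anonymous pages
# are evicted to swap-on-zram at a compression ratio of RATIO:1.
# Illustrative arithmetic only; not part of zram-generator.
zram_net_savings_mib() {
    swapped_mib=$1
    ratio=$2
    # The compressed pages still occupy swapped/ratio of RAM,
    # so the net savings is swapped - swapped/ratio.
    awk -v s="$swapped_mib" -v r="$ratio" 'BEGIN { printf "%d\n", s - s / r }'
}
```

For the case described above, zram_net_savings_mib 4096 2 prints 2048, i.e. 2 GiB. With a ratio of 1 (incompressible data) it prints 0, which matches the point that such evictions save nothing.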


Default zram device configuration:

During startup, create a zram device /dev/zram0, with a size equal to 50% RAM, but capped† to 4 GiB, and with a higher than typical swap priority†.

These values seem reasonably conservative, and are based on prior work in Fedora. Anaconda sizes swap-on-drive to 50% of RAM in the no-hibernation case, which is common outside x86. Fedora IoT's implementation also sets the swap-on-zram size to 50% of RAM.
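
As a rough sketch of the proposed sizing rule (50% of RAM, capped at 4 GiB), the calculation looks like the following. The function name and the optional second argument are hypothetical; this mirrors the text of the proposal and is not zram-generator's own code.

```shell
# Hypothetical helper illustrating the proposed default sizing rule:
# zram device size = 50% of RAM, capped at 4 GiB (4096 MiB).
# The cap can be overridden with a second argument, for experimentation.
zram_default_size_mib() {
    ram_mib=$1
    cap_mib=${2:-4096}
    size_mib=$((ram_mib / 2))
    if [ "$size_mib" -gt "$cap_mib" ]; then
        size_mib=$cap_mib
    fi
    echo "$size_mib"
}
```

A 16 GiB system (16384 MiB) hits the cap and gets 4096 MiB, while a 4 GiB system gets 2048 MiB. The RAM size in MiB could be taken from the MemTotal line of /proc/meminfo, which is reported in kB.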


RFE: should be able to set a cap on zram device size #10

RFE: should set priority #8


Default installer behavior

The installer is currently responsible for creating a swap-on-drive device. This will be dropped. The zram-generator + configuration file will trigger the setup and activation of swap-on-zram. This means hibernation isn't possible, even on systems that could support it.

Please see Supporting hibernation in Workstation edition for much more detailed information, including why it's increasingly likely hibernation isn't possible anyway, and a path to improving hibernation support.


Custom/Advanced partitioning installer behavior

The user can add swap using Custom partitioning at install time. This is swap-on-drive, and the installer will also include the resume=UUID kernel parameter for this swap device. No change in behavior here.

Since swap-on-zram is still enabled by default, there will be two swaps: swap-on-zram and swap-on-drive. The swap-on-zram will have higher priority, and so will be favored over drive-based swap. The kernel is smart enough to know it can't hibernate to a zram device, and will instead use drive-based swap.
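
To see which swap device the kernel will favor when two are active, one option is to compare the Priority column of /proc/swaps. The helper below is a minimal sketch with a hypothetical name; it assumes the usual five-column layout (Filename, Type, Size, Used, Priority) with a one-line header.

```shell
# Hypothetical helper: read /proc/swaps-formatted text on stdin and print
# the device with the highest priority, which is the one used first.
# Assumes five columns: Filename Type Size Used Priority.
highest_priority_swap() {
    awk 'NR > 1 && (best == "" || $5 + 0 > max) { max = $5 + 0; best = $1 }
         END { if (best != "") print best }'
}
```

Run as highest_priority_swap < /proc/swaps. On a system set up as described above, the expectation is that it prints /dev/zram0.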


How can it be disabled?

Immediately:
swapoff /dev/zram0

Permanently:
rm /etc/systemd/zram-generator.conf


Feedback

You're enabling it on upgrades?

That's the current plan. As a technical matter, the feature owner is confident this feature will improve the experience of all users regardless of configuration. As a non-technical matter, it's recognized that sentiments such as (a) "hey pal, you're messing with my customizations, not cool!" and (b) "swap always stinks, I don't care if it has a 'Z' in the name!" may need more convincing.

If the workload cannot be compressed at all, i.e. the data is functionally random, the result is some unnecessary work. Evictions just move pages from one part of RAM to another, and there are no savings. A workload optimized for and dependent on use of all memory, plus significant swap, would be deprived of some RAM, and that could make it run slower. But it's not likely to run out of memory: once the zram device is full, the swap-on-drive will be used. Hopefully we run into such cases during the test day. If we don't, we have to balance the benefit for most users against the risk of release notes being overlooked by users who have such workloads.


Why systemd zram-generator?

It's the most upstream implementation to date, and it is fast and lightweight. The zram-generator uses existing systemd infrastructure to set up the zram block device, format it as swap, and swapon, all during early boot. It's very similar in behavior to fstab-generator, gpt-auto-generator, and cryptsetup-generator†.

Converging on one implementation avoids user confusion. And while the alternatives are nice and work fine, a systemd generator is particularly well suited for this use case compared to a systemd service unit.†

Also, it's a reference implementation of a systemd generator written in Rust.


freedesktop.org About systemd generators.
devel@ Re: swap-on-zram by default Zbigniew Jędrzejewski-Szmek, systemd zram-generator author/maintainer


Why not a bigger zram device?

The main idea of being conservative is to address concerns about upgrades. It's possible some workloads will have less compressible data. Hence, not going with /dev/zram0 sized to 100% of RAM at this time. Even a /dev/zram0 of 200% RAM is not unreasonable *if* the compression ratio is at least 2:1. However, it's possible a system can get "stuck" in a kind of swap thrashing similar to conventional swap-on-drive, except that it's CPU and memory bound rather than IO bound. The feature owner thinks it's better to just OOM than to get overly aggressive with the zram device size.

Conversely it's possible to be too conservative with the size, and result in more instances of OOM kill. If applying the feature to upgrades is rejected, it's probably reasonable to increase the cap to ~8GiB. Of course more feedback and testing is needed, and it will be taken into consideration.

Note that the kernel zram doc says an excessively sized zram device does come with overhead. Users can increase the size easily post-install, a capability they don't easily have with swap-on-drive. The goal for Fedora 33 is a default that's useful and safe for the vast majority of use cases.


Why not zswap?

Zswap† is a similar idea, but with a totally different implementation. It is swap specific, uses a RAM cache, and requires that a conventional swap partition already exists. It might be true that certain workloads are better suited to zswap. But swap-on-zram depends only on volatile storage, which is simpler and more secure, whereas zswap "spills over" into swap-on-drive and will leak user data if that swap device isn't encrypted. Supporting zswap is a valid future feature for a new generator, or possibly an extension of zram-generator via the configuration file. Maybe the generator could favor zswap when swap-on-drive already exists, and fall back to swap-on-zram?


kernel.org zswap.txt


Benefit to Fedora

  • significantly improves system responsiveness, especially when swap is under pressure;
  • more secure, user data leaks into swap are on volatile media;
  • complements on-going resource control work, including earlyoom;
  • further reduces the time to out-of-memory kill, when workloads exceed limits;
  • improves performance for both "no swap" and "existing swap" setups;
  • without swap-on-drive, there's better utilization of a limited resource: benefit of swap without the drive space consumption;


Scope

  • Proposal owners:
    • add zram-generator package to comps for the editions/spins opting in
    • means of per edition/spin configurations, if needed
    • coordinate a test day
  • Other developers:
    • Anaconda is agreeable to deprecating their built-in implementation in favor of swap-on-zram
    • RFEs for zram-generator: users are not worse off if they don't happen. This is an open request for help to make them possible. It's much appreciated.

RFE: should be able to set a cap on zram device size #10
RFE: should set priority #8

  • Release engineering: #9495
  • Policies and guidelines: N/A
  • Trademark approval: N/A


Upgrade/compatibility impact

Add Supplements:fedora-release-common to zram-generator to pull it in on upgrades.

Existing systems without swap will have swap-on-zram enabled.

Existing systems with swap-on-drive, will also have swap-on-zram enabled (two swap devices), with higher priority for the zram device. Existing swap-on-drive will not be removed.

'zram' package, which contains zram-swap.service and associated bash scripts, will be obsoleted to avoid conflicting/competing swap-on-zram implementations.


How To Test

Any hardware. Any version of Fedora.

  1. dnf install zram-generator
  2. cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf
  3. Edit the configuration
  4. Reboot
  5. Check that swap is on a zram device: zramctl, swapon
  6. Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
  7. Check that priority is higher than existing swap if two or more are listed. (Enhancement is needed for this.)

Suggested configuration file values:
[zram0]
memory-limit = none
zram-fraction = 0.5
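
For repeated testing, the suggested values can be written out by a small helper rather than edited by hand. This is a sketch only: the function name is hypothetical, the keys are the ones listed above, and the default path is the one used in the test steps. Writing under /etc requires root.

```shell
# Hypothetical helper: write the suggested test configuration to a file.
# $1 = target path (defaults to the path used in the test steps)
# $2 = zram-fraction value (defaults to 0.5, as suggested above)
write_zram_test_config() {
    conf=${1:-/etc/systemd/zram-generator.conf}
    fraction=${2:-0.5}
    cat > "$conf" <<EOF
[zram0]
memory-limit = none
zram-fraction = $fraction
EOF
}
```

Passing a scratch path as the first argument makes it possible to inspect the result before copying it into place.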

Feel free to run your usual workloads more aggressively or in parallel. Suspend-to-RAM and suspend-to-drive are expected to continue to work too (or at least hit all the same bugs as without zram being used).

Also, you can see the actual compression ratio achieved with the following command:
zramctl
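
If a single number is wanted rather than the full table, the ratio can be computed from the DATA (uncompressed) and COMPR (compressed) columns. The helper below is a minimal sketch with a hypothetical name; it assumes a util-linux zramctl that supports the --bytes, --noheadings, --raw and --output options.

```shell
# Hypothetical helper: read "DATA COMPR" (uncompressed and compressed
# sizes in bytes) on stdin and print the compression ratio as N.NN.
zram_compression_ratio() {
    awk 'NF >= 2 {
             if ($2 + 0 > 0) printf "%.2f\n", $1 / $2
             else print "n/a"   # nothing has been swapped out yet
         }'
}
```

Assuming those options are available, a possible invocation is: zramctl --bytes --noheadings --raw --output DATA,COMPR /dev/zram0 | zram_compression_ratio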


Test Day

QA: SwapOnzram Test Day to discover edge cases, and tweak the default configuration if necessary to establish a good one-size-fits-all approach.


User Experience

The user won't notice anything displeasing. If their usual workload causes them to dread swap thrashing, they'll be surprised that thrashing doesn't happen. The user might get curious if they don't find a swap entry in /etc/fstab, or if they run 'swapon' and see swap pointing to /dev/zram0 instead of a drive partition or LV.


Dependencies

N/A


Contingency Plan

  • Contingency mechanism: Don't ship the generator = big hammer, but easy. Preferable to ship the generator, but only selectively ship configuration files = scalpel, pretty easy.
  • Contingency deadline: Beta freeze
  • Blocks release? No.
  • Blocks product? No.


Documentation

Consider adding a hint in an /etc/fstab comment? There is no man page for this, and the documentation is also minimal, besides what's in this feature proposal. It's an open question how the user should get more information on how to configure and tweak it. But then, they don't have that for swap today either. There's just institutional knowledge.

Hence, a strong test day, with a lot of people and press coverage of the feature, might help spread the word for institutional knowledge changes coming.

Ideas welcome.


Release Notes

Pending feedback and test day.