From Fedora Project Wiki

Enable EarlyOOM killing

Summary

Install the earlyoom package and enable it by default. This will cause the kernel OOM killer to trigger sooner, but will not affect which process it chooses to kill. The idea is to recover from out-of-memory situations sooner, rather than suffer the typical complete system hang in which the user has no choice but to force a power-off.


Owner

Current status

  • Targeted release: Fedora 32
  • Last updated: 2020-01-02
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

Fedora editions and spins enable the in-kernel OOM (out-of-memory) manager. The manager's concern is to keep the kernel itself functioning; it has no concern for user-space function or interactivity. This change attempts to improve the user experience, in the short term, by triggering the in-kernel process killing mechanism sooner. Instead of the system becoming completely unresponsive for tens of minutes, hours, or days, the expectation is that an offending process (determined by oom_score, same as now) will be killed off within seconds to minutes. This is an incremental improvement in user experience, but admittedly still suboptimal. Additional work is ongoing to improve the user experience further.

Concretely, the following preset will be modified:
https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset
# enable earlyoom by default on workstation
enable earlyoom.service
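
The description above notes that the process to be killed is determined by oom_score. As a rough illustration only, the following Python sketch lists the processes with the highest scores by reading /proc/<pid>/oom_score and /proc/<pid>/comm. The function name rank_by_oom_score and its parameters are hypothetical and are not part of earlyoom or the kernel.

```python
from pathlib import Path


def rank_by_oom_score(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest oom_score first.

    `proc_root` is a parameter only so the function can be pointed at a
    test directory; on a real system it is /proc.
    """
    ranked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            score = int((entry / "oom_score").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited or is unreadable; skip it
        ranked.append((score, int(entry.name), name))
    # Highest score first; ties are broken by the lower PID.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:limit]
```

Calling rank_by_oom_score() on a running system and printing the result gives a quick view of which processes are currently the most likely candidates.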


Background information on this complicated problem:
https://www.kernel.org/doc/gorman/html/understand/understand016.html
https://lwn.net/Articles/317814/

Recent discussion:
https://pagure.io/fedora-workstation/issue/98
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/XUZLHJ5O32OX24LG44R7UZ2TMN6NY47N/

Other in-progress solutions:
https://gitlab.freedesktop.org/hadess/low-memory-monitor

Benefit to Fedora

There are two major benefits to Fedora:

  • Improved user experience: the user regains control of the system more quickly, rather than having to force power off in low-memory situations with aggressive swapping. Once a system becomes unresponsive, it is completely reasonable for the user to assume the system is lost, but forcing power off carries a high potential for data loss.
  • Reducing forced power-offs as the main workaround will increase data collection, improving our understanding of low-memory situations and how to handle them better.


Scope

  • Proposal owners:

Include the earlyoom package and enable it by default, both for clean installs and upgrades.

  • Other developers:

Desktop spins may choose to opt out. Server, Cloud, and IoT may choose to opt in.

  • Release engineering: #9141 (a check of an impact with Release Engineering is needed)
  • Policies and guidelines: N/A
  • Trademark approval: N/A

Upgrade/compatibility impact

Upgrades from Fedora 30/31 to Fedora 32 will also have this service enabled.


How To Test

Fedora 30/31 users can test today, on any edition or spin:
sudo dnf install earlyoom
sudo systemctl enable --now earlyoom

Then attempt to cause an out-of-memory situation. An extreme example is building webkitgtk; if your system has few CPUs or a lot of RAM, you may need to sabotage the build by specifying an unreasonable number of jobs with the -j flag.
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/XUZLHJ5O32OX24LG44R7UZ2TMN6NY47N/

Fedora Workstation 32 (and Rawhide) users will see this service is already enabled, and can experiment with it enabled and disabled without rebooting.
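
While running such a test, it can help to watch how much memory and swap remain. The Python sketch below computes both as percentages from the contents of /proc/meminfo. It assumes that "available memory" means MemAvailable relative to MemTotal and that "free swap" means SwapFree relative to SwapTotal, and it treats a system with no swap as having 0% swap free; these are assumptions about what is being measured, not a copy of earlyoom's code. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip lines that are not "Name: value [unit]"
        fields[name.strip()] = int(rest.split()[0])
    return fields


def available_percentages(text):
    """Return (memory %, swap %) still available, per the stated assumptions."""
    info = parse_meminfo(text)
    mem_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # Assumption: with no swap configured, report swap as 0% free.
    swap_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct, swap_pct
```

On a live system, the text can be obtained with open("/proc/meminfo").read() and the function called in a loop while the build runs.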

User Experience

The most egregious instance this change is trying to mitigate:

  a. RAM is completely used
  b. Swap is completely used
  c. The system becomes unresponsive to the user as swap thrashing ensues

The outcome then depends on whether earlyoom is running:

  • earlyoom disabled: the user often gives up and forces power off (in my own testing this condition lasts >30 minutes with no kernel-triggered OOM killer and no recovery)
  • earlyoom enabled: the system likely still becomes unresponsive, but the OOM killer is triggered in much less time (seconds or a few minutes in my testing, once less than 10% of RAM and 10% of swap remain)

earlyoom starts sending SIGTERM once both memory and swap are below their respective PERCENT setting, default 10%. It sends SIGKILL once both are below their respective KILL_PERCENT setting, default 2%.
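
The two-threshold rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated here, not earlyoom's implementation: the function and parameter names are hypothetical, and "below" is read as a strict comparison.

```python
def earlyoom_action(mem_pct, swap_pct,
                    mem_percent=10.0, swap_percent=10.0,
                    mem_kill_percent=2.0, swap_kill_percent=2.0):
    """Return "SIGKILL", "SIGTERM", or None for the given available percentages.

    mem_pct and swap_pct are the percentages of memory and swap still
    available. The defaults mirror the 10% and 2% values quoted above.
    """
    # Both memory AND swap must be below the limit for a signal to be sent.
    if mem_pct < mem_kill_percent and swap_pct < swap_kill_percent:
        return "SIGKILL"
    if mem_pct < mem_percent and swap_pct < swap_percent:
        return "SIGTERM"
    return None
```

For example, with 8% of memory and 5% of swap available the sketch returns "SIGTERM"; with 8% of memory but 50% of swap available it returns None, because only one of the two limits has been crossed.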

The package includes the configuration file /etc/default/earlyoom, which sets the option -r 60, causing a memory report to be entered into the journal every minute.


Dependencies

The earlyoom package has no dependencies.

Contingency Plan

  • Contingency mechanism: Owner will revert all changes
  • Contingency deadline: Final freeze
  • Blocks release? No
  • Blocks product? No

Documentation

man earlyoom

https://www.kernel.org/doc/gorman/html/understand/understand016.html

Release Notes

The earlyoom service is enabled by default, which will cause the kernel OOM killer to trigger sooner. To revert to the previous behavior, run sudo systemctl disable earlyoom.service; to customize it, see man earlyoom.