From Fedora Project Wiki
Revision as of 18:33, 7 September 2020


Python: Optional Bytecode Cache

Summary

The Python standard library bytecode cache files (e.g. /usr/lib64/python3.9/.../__pycache__/*.pyc) will be moved from the python3-libs package to three new optional subpackages (split by optimization level). The non-optimized bytecode cache will be recommended by python3-libs and installed by default, but removable. The bytecode cache for optimization levels 1 and 2 will not be recommended (and hence will not be installed by default), but will be installable. The default SELinux policy will be adapted not to audit AVC denials when Python attempts to create the bytecode cache at runtime. This will save 8.89 MiB of disk space on default installations, or 17.12 MiB on minimal installations (by opting out of the recommended subpackage with the non-optimized bytecode cache). When all three new packages are installed, the size will increase slightly over the status quo (by 4.5 MiB).

Owner

Current status

  • Targeted release: Fedora 34
  • Last updated: 2020-09-07
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

What is the Python bytecode cache

When Python code is interpreted, it is compiled to Python bytecode. When a pure Python module is imported for the first time, the compiled bytecode is serialized and cached to a .pyc file located in the __pycache__ directory next to the .py source. Subsequent imports use the cache directly, until it is invalidated (for example when the .py source is edited and its mtime stamp is bumped) -- at that point, the cache is updated. This behavior is explained in detail in PEP 3147. The invalidation is described in PEP 552.
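The caching and invalidation described above can be observed directly. The following is a minimal sketch, assuming the timestamp-based invalidation mode from PEP 552, in which the 16-byte .pyc header holds a magic number, a flags field, the source mtime and the source size. The module name example.py and the helper names are made up for illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, size) from the 16-byte .pyc header."""
    with open(pyc_path, "rb") as f:
        return struct.unpack("<4sIII", f.read(16))


def compile_and_inspect(source_text):
    """Compile a throwaway module and compare its cache header to the source."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")  # hypothetical module name
        with open(src, "w") as f:
            f.write(source_text)
        # Force timestamp-based invalidation so the header stores mtime and size.
        pyc = py_compile.compile(
            src,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        magic, flags, mtime, size = read_pyc_header(pyc)
        stat = os.stat(src)
        return {
            "magic_matches": magic == importlib.util.MAGIC_NUMBER,
            "flags": flags,
            "mtime_matches": mtime == int(stat.st_mtime) & 0xFFFFFFFF,
            "size_matches": size == stat.st_size & 0xFFFFFFFF,
        }


result = compile_and_inspect("x = 1\n")
print(result)
```

When the source is edited, its mtime or size no longer matches the values stored in the header, which is what makes Python discard and rewrite the cache on the next import.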

Python can operate at 3 different optimization levels: 0, 1 and 2. By default, the optimization level is 0. When Python is invoked with the -O command line option, the level is set to 1; with -OO, it is set to 2. The bytecode cache for different optimization levels is saved under different filenames, as described in PEP 488.

As an example, a Python module located at /path/to/basename.py will have bytecode cache files for CPython 3.9 stored as:

  • /path/to/__pycache__/basename.cpython-39.pyc for the non-optimized bytecode
  • /path/to/__pycache__/basename.cpython-39.opt-1.pyc for optimization level 1
  • /path/to/__pycache__/basename.cpython-39.opt-2.pyc for optimization level 2
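The same names can be derived with importlib.util.cache_from_source from the standard library. This is a small sketch; the helper cache_names is made up for illustration, and the interpreter tag in the output (cpython-39 in the list above) depends on the Python version that runs it.

```python
import importlib.util
import os
import sys


def cache_names(source_path):
    """Map each optimization level to the cache file name Python would use."""
    return {
        level: os.path.basename(
            importlib.util.cache_from_source(source_path, optimization=opt)
        )
        # An empty string selects the non-optimized name without an opt- tag.
        for level, opt in ((0, ""), (1, 1), (2, 2))
    }


names = cache_names("/path/to/basename.py")
print("interpreter tag:", sys.implementation.cache_tag)
for level, name in names.items():
    print(level, name)
```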

Python bytecode cache in RPM packages (status quo)

Pure Python modules shipped in RPM packages (namely the ones shipped through the python3-libs package) are located at paths not writable by regular users, under /usr/lib(64)/python3.9/, hence the bytecode cache is also located in such locations. To work around this problem, the bytecode cache is pre-compiled when RPM packages are built, and python3-libs ships and owns the sources as well as the bytecode cache:

$ rpm -ql python3-libs
...
/usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
/usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
/usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
...
/usr/lib64/python3.9/ast.py
...

As a result, the package is quite big, essentially shipping all pure Python modules 4 times.

Depending on the module content, its bytecode cache files might be identical across optimization levels. In such cases, the files are hardlinked to reduce the bloat:

$ ls -1i /usr/lib64/python3.9/collections/__pycache__/abc.*pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-1.pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-2.pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.pyc

This is, however, not possible for all the modules from python3-libs:

$ ls -1i /usr/lib64/python3.9/__pycache__/ast.*pyc
8438 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
8440 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
8441 /usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
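Whether the levels differ comes down to what the module contains: level 1 drops assert statements and level 2 additionally drops docstrings. The sketch below approximates this by comparing in-memory code objects instead of the .pyc files themselves. The helper distinct_levels and the sample sources are made up for illustration.

```python
def distinct_levels(source):
    """Group optimization levels 0, 1, 2 by the code object they compile to."""
    groups = []
    for level in (0, 1, 2):
        code = compile(source, "<example>", "exec", optimize=level)
        for group_code, levels in groups:
            if group_code == code:
                # Same result as an earlier level: a hardlink candidate.
                levels.append(level)
                break
        else:
            groups.append((code, [level]))
    return [levels for _, levels in groups]


plain = "x = 1\n"
with_docstring = '"""Docstring."""\nx = 1\n'
with_assert = "x = 1\nassert x == 1\n"

for label, source in (
    ("plain", plain),
    ("docstring", with_docstring),
    ("assert", with_assert),
):
    print(label, distinct_levels(source))
```

A module with neither asserts nor docstrings ends up in a single group, as collections/abc.py does in the first listing above. A module with both ends up with three distinct results, as ast.py does in the second.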

What if the bytecode cache were not packaged

When the bytecode cache is not packaged, several things happen:

  1. When non-root users run Python, the imported modules are never cached. As a result, the startup time of Python apps might be slightly longer than necessary until root runs them.
  2. When root runs Python, the imported modules are cached. As a result, untracked .pyc files start to pop up in /usr/lib(64)/python3.9/. When the system is updated to a newer Python version, the untracked files remain on the filesystem until manually cleaned up.
  3. When root runs Python in an SELinux-restricted context, Python attempts to cache the imported modules, but SELinux does not allow that. The result is the same as (1), with a lot of noise from SELinux.
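Which of cases (1) and (2) applies to a given user can be estimated with a plain permission check. This is a rough sketch only: the helper would_write_cache is made up for illustration, and os.access knows nothing about SELinux, so case (3) cannot be detected this way.

```python
import importlib.util
import os
import sys
import tempfile


def would_write_cache(source_path):
    """Guess whether importing source_path could store a .pyc for this user."""
    if sys.dont_write_bytecode:
        # Set by python3 -B or PYTHONDONTWRITEBYTECODE; nothing is written.
        return False
    source_path = os.path.abspath(source_path)
    target = os.path.dirname(importlib.util.cache_from_source(source_path))
    # __pycache__ may not exist yet; fall back to the nearest existing parent.
    while not os.path.isdir(target):
        target = os.path.dirname(target)
    return os.access(target, os.W_OK)


# The standard library location is typically read-only for regular users.
print("stdlib:", would_write_cache(os.__file__))
print("tmp:", would_write_cache(os.path.join(tempfile.gettempdir(), "mod.py")))
```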

Packaging the bytecode cache into optional subpackages

To be able to save quite some disk space without disrupting the user experience, we propose to ship the pre-compiled bytecode cache previously included in python3-libs as follows:

In order to properly own any runtime-generated bytecode cache files, python3-libs will list all files listed in the three above-mentioned packages as %ghost.

python3-libs will not Require any of the bytecode cache packages, hence the packages will be (un)installable and fully optional.

Given that almost all Fedora Python packages invoke Python in the non-optimized mode¹, python3-libs will Recommend python3-libs-bytecode-opt-0 and hence the package will be installed by default together with Python; the user experience will remain the same for the vast majority of users and use cases.

Furthermore, maintainers of container images and other minimal systems may choose to exclude the python3-libs-bytecode-opt-0 package to save more disk space if desired.

Note that by splitting the three optimization levels to different RPM packages, files can no longer be hardlinked between each other. This results in a slight size increase when all three optimization levels are installed. The change owners consider the need for all three subpackages to be present simultaneously on one size-sensitive system unlikely and hence consider this a fair trade.

¹ No real data was collected to support this claim. This hypothesis is made by the Python maintainers based on their own experience.

SELinux policy changes

To suppress the otherwise omnipresent AVC denial messages about Python failing to write the bytecode cache, the Python maintainers have teamed up with Fedora's selinux-policy maintainers. The implementation details are available at:

When Python runs as root in an SELinux-restricted context, SELinux will still prevent it from writing the bytecode cache, but it will not clutter the audit log.

Size impact

Sizes calculated in mock on x86_64.

Situation                               Size of /usr/lib(64)/python3.9 (MiB)   Difference (MiB)
Status quo (before this change)         31.84
Default (non-optimized cache only)      22.96                                  -8.89
No cache                                14.72                                  -17.12
Levels 0 and 1                          29.71                                  -2.13
All optimization levels (like before)   36.35                                  +4.50
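A breakdown of this kind can be approximated by walking the directory and summing file sizes per cache flavor. This is a sketch under assumptions, not the method used for the table above: the helpers classify and size_by_kind are made up for illustration, and hardlinked files are counted once per name, which mirrors the split packages, where hardlinking across levels is no longer possible.

```python
import os
import tempfile
from collections import Counter


def classify(filename):
    """Assign a file to a cache flavor based on its PEP 488 file name."""
    if filename.endswith(".opt-1.pyc"):
        return "opt-1"
    if filename.endswith(".opt-2.pyc"):
        return "opt-2"
    if filename.endswith(".pyc"):
        return "opt-0"
    return "other"


def size_by_kind(root):
    """Sum apparent file sizes in bytes under root, grouped by cache flavor."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            totals[classify(name)] += os.lstat(os.path.join(dirpath, name)).st_size
    return dict(totals)


# Demonstration on a throwaway tree with made-up file sizes.
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "__pycache__")
    os.mkdir(cache)
    files = {
        os.path.join(tmp, "mod.py"): 100,
        os.path.join(cache, "mod.cpython-39.pyc"): 60,
        os.path.join(cache, "mod.cpython-39.opt-1.pyc"): 50,
        os.path.join(cache, "mod.cpython-39.opt-2.pyc"): 40,
    }
    for path, size in files.items():
        with open(path, "wb") as f:
            f.write(b"\0" * size)
    demo = size_by_kind(tmp)
print(demo)
```

On a real system, the function could be pointed at /usr/lib64/python3.9 to get a comparable per-level split.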

Speed impact

The presence or absence of the bytecode cache only impacts the speed of imports. It is most common that the imports happen while an application starts. Once the application is running, there is no speed difference.

A totally inappropriate and unscientific experiment:

$ du -a /usr/lib64/python3.9/ | grep py$ | sort -n -r | head -n 1
224	/usr/lib64/python3.9/_pydecimal.py

With caches:

$ time python3 -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	0m13.986s
user	0m13.554s
sys	0m0.365s
$ time python3 -O -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	0m13.594s
user	0m13.186s
sys	0m0.337s
$ time python3 -OO -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	0m13.225s
user	0m12.855s
sys	0m0.290s


Without caches:

$ time python3 -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	4m20.554s
user	4m14.600s
sys	0m4.850s
$ time python3 -O -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	4m14.291s
user	4m9.333s
sys	0m3.721s
$ time python3 -OO -c 'import importlib as i, _pydecimal as p; [i.reload(p) for _ in range(10000)]'

real	4m14.816s
user	4m11.035s
sys	0m2.400s

This suggests that an application that does 10000 module imports (of a rather large 224 KiB module) would start about 4 minutes slower without the cache. Obviously, such measurements depend on many factors, and doing 10000 imports is rather far-fetched. Still, in this experiment, importing without the cache was roughly 19 times slower at every optimization level.
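The gap comes from the work the cache avoids: without a .pyc file, the source has to be parsed and compiled on every import, while with it the code object is only deserialized. The following sketch isolates that one step with a synthetic module. The source text, the function name and the repeat count are arbitrary choices for illustration, and the absolute numbers will differ between machines.

```python
import marshal
import time


def time_compile_vs_load(source, repeat=20):
    """Time compiling source against unmarshalling its compiled code object."""
    payload = marshal.dumps(compile(source, "<example>", "exec"))

    start = time.perf_counter()
    for _ in range(repeat):
        compile(source, "<example>", "exec")  # what happens without a cache
    compile_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeat):
        marshal.loads(payload)  # what happens with a valid cache
    load_seconds = time.perf_counter() - start
    return compile_seconds, load_seconds


# A synthetic module with 500 small functions stands in for a large one.
synthetic = "\n".join(
    f"def f{i}(a, b):\n    return a * {i} + b" for i in range(500)
)
compile_seconds, load_seconds = time_compile_vs_load(synthetic)
print(f"compile: {compile_seconds:.4f}s  load: {load_seconds:.4f}s")
```

Executing the module body takes the same time in both cases and is not measured here.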

Deployments negatively impacted by this are advised to either install the appropriate bytecode subpackage or pre-compile the relevant modules ahead of time (e.g. when building a container image).
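Pre-compiling ahead of time can be done with the compileall module from the standard library. This is a minimal sketch, assuming a single optimization level per run. The helper precompile and the throwaway module are made up for illustration; in a container build, the directory would be wherever the relevant modules are installed.

```python
import compileall
import importlib.util
import os
import tempfile


def precompile(directory, level=0):
    """Byte-compile every .py file under directory at one optimization level."""
    # quiet=1 prints only errors; the return value is false if any file failed.
    return bool(compileall.compile_dir(directory, quiet=1, optimize=level))


# Demonstration on a throwaway directory instead of a real installation.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")  # hypothetical module
    with open(src, "w") as f:
        f.write("x = 1\n")
    ok = precompile(tmp, level=1)
    cached = os.path.exists(importlib.util.cache_from_source(src, optimization=1))
print(ok, cached)
```

The level passed here should match how the application is invoked (0 for plain python3, 1 for -O, 2 for -OO), since a cache for another level is not used.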

Rejected ideas

In this section, we briefly describe ideas that were presented by others or considered by the change owners, but rejected.

Stop shipping mandatory .py sources, ship only .pyc cache

Make Python not attempt to write bytecode cache into /usr/lib(64)/python3.9

Not realized ideas

In this section, we briefly describe ideas that were presented by others or considered by the change owners, but were not realized (e.g. for capacity reasons). Such ideas may be realized later.

Store bytecode cache in /var/cache and/or ~/.cache

Apply this change to all Python RPM packages

Feedback

Benefit to Fedora

Scope

  • Other developers: N/A (not a System Wide Change)
  • Release engineering: N/A (not a System Wide Change)
  • Policies and guidelines: N/A (not a System Wide Change)
  • Trademark approval: N/A (not needed for this Change)

Upgrade/compatibility impact

N/A (not a System Wide Change)

How To Test

N/A (not a System Wide Change)

User Experience

Dependencies

N/A (not a System Wide Change)

Contingency Plan

  • Contingency mechanism: (What to do? Who will do it?) N/A (not a System Wide Change)
  • Contingency deadline: N/A (not a System Wide Change)
  • Blocks release? N/A (not a System Wide Change), Yes/No
  • Blocks product? product

Documentation

N/A (not a System Wide Change)

Release Notes