From Fedora Project Wiki

Checkpoint/Restore

Summary

Add support for checkpointing and restoring processes. Checkpointing processes can be used for fault tolerance and/or load balancing.

Checkpointing a process at regular intervals makes it possible to restore the process after a crash and resume the calculation without losing much data. Providing this ability transparently at the OS level removes the need to implement it manually in every application.

Checkpointing a process on one system and restoring it on another can be used to migrate a process, process tree or container. This makes it possible to distribute load at runtime and to perform maintenance without service interruption, as is already possible with virtual machines.

Owner

  • Email: <adrian@lisas.de>

Current status

  • Targeted release: Fedora 19
  • Last updated: 2012-10-24
  • Percentage of completion: 0%

Detailed Description

Checkpoint/restore, as mentioned above, can be used for fault tolerance and load distribution.

Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). CRIU has been developed with the goal of being accepted upstream, and most of the necessary kernel patches have already been accepted (as of 2012-10-24). The current release (0.2) of the userspace tools (crtools) can checkpoint and restore containers, which also makes it possible to migrate them.

To offer the checkpoint/restore functionality, the crtools package has to be imported into Fedora and the following changes to the kernel RPM are necessary:

diff --git a/config-x86_64-generic b/config-x86_64-generic
index 342b862..c5f8cf9 100644
--- a/config-x86_64-generic
+++ b/config-x86_64-generic
@@ -1,5 +1,8 @@
 CONFIG_64BIT=y
 
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
 # CONFIG_X86_X32 is not set
 # CONFIG_MK8 is not set
 # CONFIG_MPSC is not set
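
Whether a given kernel build actually carries the three options added in the diff above can be checked against its config file. The following is a minimal sketch, not part of crtools: the function name check_cr_config is made up for illustration, and the path /boot/config-$(uname -r) in the usage comment is an assumption about where the config of the running kernel is installed.

```shell
#!/bin/sh
# Sketch: report which of the three options from the kernel config diff
# are set to "y" in a kernel config file.
# check_cr_config is a hypothetical helper name, not a crtools command.
check_cr_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_EXPERT CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES; do
        # Match only an exact "OPTION=y" line; "# OPTION is not set" does not count.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
            missing=1
        fi
    done
    # Non-zero return value if at least one option is not enabled.
    return "$missing"
}

# Usage (assumed config location, adjust if it differs on the system):
#   check_cr_config "/boot/config-$(uname -r)"
```

A non-zero return value means at least one of the options is not enabled in the inspected config file.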

Benefit to Fedora

Fedora offers the possibility to checkpoint and restore processes.

Scope

  • add the crtools package to Fedora
  • activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)

How To Test

User Experience

Users can easily checkpoint and restore processes with the crtools package:

  • crtools dump -D <destination-directory> -t <PID>
  • crtools restore -D <destination-directory> -t <PID>
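
The two invocations above can be wrapped in small helper functions that also take care of the image directory. This is only a sketch under stated assumptions: the function names and the CRTOOLS variable are hypothetical and exist only so that the wrapper can be tried out without crtools installed. The only crtools options used are the ones shown in the list above.

```shell
#!/bin/sh
# Sketch of thin wrappers around the two crtools invocations listed above.
# CRTOOLS is a hypothetical override for the binary to call (default: crtools).
CRTOOLS="${CRTOOLS:-crtools}"

# checkpoint_process <PID> <destination-directory>
# Creates the destination directory if needed, then dumps the process into it.
checkpoint_process() {
    pid="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    "$CRTOOLS" dump -D "$dest" -t "$pid"
}

# restore_process <PID> <destination-directory>
# Refuses to run if the image directory does not exist.
restore_process() {
    pid="$1"
    dest="$2"
    if [ ! -d "$dest" ]; then
        echo "no image directory: $dest" >&2
        return 1
    fi
    "$CRTOOLS" restore -D "$dest" -t "$pid"
}

# Usage (PID and directory are example values):
#   checkpoint_process 1234 /var/tmp/checkpoint-1234
#   restore_process 1234 /var/tmp/checkpoint-1234
```

Keeping one directory per checkpoint avoids mixing image files from different dumps.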

Dependencies

Contingency Plan

Documentation

Release Notes

Comments and Discussion