Modularity/Getting Started/Building modular things

Revision as of 16:25, 6 September 2016

And an attempt to get the vocabulary consistent

Author: Stephen Tweedie

Summary

We have some basic terminology confusion around modules. Is a container image the same thing as a module? Is a software collection a single module, or a group of modules? We can often get away with being vague, but for technical planning we need to be able to distinguish between all these concepts.

I propose we use these terms:

Package. Essentially, the same thing as an rpm. In the future it might be non-rpm content but should fit the same role.
Module. A set of packages tested and released together as a distinct unit, complete with the metadata needed to manage it as a unit. May depend on other modules.
Stack. A complete tree of modules. A stack can be thought of as a top-level module, with the understanding that we’re implicitly including all of that module’s dependencies in the stack.
Artifact or image. An actual set of bits built out of modules, in a format intended to be distributed or deployed in some way.

Generally, these serve distinct purposes. A module is a building block; a stack contains all the software for a complete solution; an artifact is a concrete object containing a stack (or stacks) for distribution to users.
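The relationships between these terms can be sketched as a small data model. This is only an illustration of the vocabulary, not a proposed metadata format: the class and field names, the package versions, and the format strings are all assumptions made for the example. Note that there is no separate class for a stack; it is simply a top-level module read together with its dependencies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    """A binary package, playing the role an rpm plays today."""
    name: str
    version: str


@dataclass
class Module:
    """Packages released together as a unit, plus metadata and dependencies."""
    name: str
    packages: list[Package] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    depends_on: list["Module"] = field(default_factory=list)


@dataclass
class Artifact:
    """A concrete deliverable containing one or more stacks (top-level modules)."""
    format: str  # illustrative values: "container-image", "yum-repo"
    stacks: list[Module]


# Hypothetical example: an httpd module that depends on a base runtime module.
base = Module("base-runtime", [Package("glibc", "2.24")])
httpd = Module("httpd", [Package("httpd", "2.4")], depends_on=[base])
image = Artifact("container-image", stacks=[httpd])
```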

We will also distinguish between:

A Build of a package: a process which involves compiling source code and creating a packaged output; and
A Compose of a module: a process which assembles pre-compiled packages into an organised module, but which includes no compilation step itself.

We will define these in more detail below.

The Package

We can start with the familiar package and subpackage, built as usual from a component (or source package).

We’re not changing anything at this level (yet… non-rpm packages are a possible topic for the future!)

It is important to distinguish the build of a package from the compose of a module; the build here is the step which includes compilation and creation of a reusable bundle of the compiled output. We might be building specifically to compile for a module’s contents; but the build step is still a distinct step.

The Module

We group these binary packages into modules. Note that a module can pick up just a subset of the subpackages from a particular build, as in a hypothetical base runtime module.

These are internal build groupings: basically repositories, but with additional metadata and semantics that allow them to play nicely with other modules and module tooling.

We attach metadata to these modules, on top of the normal metadata in a repository. That can be support metadata (SLAs or EOL dates for the module); metadata on how to use the module (e.g. its dependencies, default installed packages); identity (builder, vendor, version, etc.); or many other things that we’re still just starting to imagine.
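As a rough sketch of what such metadata might hold, the record below groups the three kinds mentioned above: identity, support, and usage. Every field name here is hypothetical and chosen only for illustration; the real schema is still an open question.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ModuleMetadata:
    # Identity
    name: str
    version: str
    vendor: str
    # Support: the end-of-life date, if one has been declared
    eol: Optional[date] = None
    # Usage: other modules required, and packages installed by default
    requires: list[str] = field(default_factory=list)
    default_packages: list[str] = field(default_factory=list)

    def is_supported(self, today: date) -> bool:
        """A module with no declared EOL is treated as still supported."""
        return self.eol is None or today <= self.eol


meta = ModuleMetadata(
    name="httpd",
    version="2.4",
    vendor="example-vendor",
    eol=date(2017, 6, 1),
    requires=["base-runtime"],
    default_packages=["httpd"],
)
```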

We should be able to update the module as a unit: updating and testing its component parts as necessary, but also testing the module as a whole before releasing an update.

We refer to the assembly of a module by a build tool as a compose. This step includes no compilation; it is merely the creation of a bundled set of packages and metadata comprising a single module. The closest analogue in today’s build system is the “puddle build”, where a custom repository is created for a particular purpose.

When we talk about a module compose, we are not talking about “the Compose” of a full distribution release. The two are similar, in that they both assemble existing compiled packages into bundled output; but the module compose is just one single targeted assembly, whereas the full distribution Compose typically creates multiple repositories and images as output in a single large job. Also note that we can in theory perform the build of the packages within a module, and the compose of the module itself, as a single step; we might term this a module build.
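To make the build/compose distinction concrete, here is a minimal sketch of a compose step. It assumes the builds have already happened and are described as a mapping from source component to the binary subpackages it produced; the function name, the dictionary layout, and the package names are all hypothetical. The point is only that the compose selects and bundles existing outputs and never compiles anything.

```python
def compose_module(name, builds, wanted, metadata):
    """Assemble already-built binary packages into a module; nothing is compiled."""
    # Everything that previous builds have produced, across all components.
    available = {pkg for subpackages in builds.values() for pkg in subpackages}
    missing = sorted(set(wanted) - available)
    if missing:
        # A compose cannot create packages; it can only pick from built ones.
        raise ValueError(f"packages not built yet: {missing}")
    return {"name": name, "packages": sorted(wanted), "metadata": dict(metadata)}


# Outputs of earlier package builds: component -> binary subpackages.
builds = {
    "glibc": ["glibc", "glibc-devel", "glibc-static"],
    "bash": ["bash", "bash-doc"],
}

# The module takes only a subset of the subpackages from each build.
base_runtime = compose_module(
    "base-runtime", builds, wanted=["glibc", "bash"], metadata={"version": "1"}
)
```

Asking for a package that no build has produced raises an error rather than triggering a build, which is exactly the line the terminology draws between the two steps.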

The Stack

We can then combine modules into stacks.

A stack should represent something distinct that the user wants. It may be a traditional developer stack (LAMP, Ruby on Rails, etc.); or it may be an application (but extended to include all the dependencies that the application needs to run); or it could be the set of modules needed to deliver something like Atomic Host or Cockpit.

The stack represents this full set of software. It doesn’t presume how we distribute it; we’re still just talking about the set of modules making up the stack.

A stack is still just a module here. It’s just a way of referring to the module plus all its implied dependencies as a single unit, to distinguish that from the individual modules within the stack. The stack content and metadata may have exactly the same format as module metadata (the metadata is the same colour here for a reason!). But it’s still important to make the distinction between a single module and a module plus all the external dependencies it relies on.
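Read this way, expanding a stack is just a transitive walk over module dependencies starting from the top-level module. The sketch below assumes dependencies are available as a plain mapping from module name to the names it depends on; the function name and the LAMP dependency graph are made up for the example.

```python
def expand_stack(top, depends_on):
    """Return the top-level module plus all of its transitive dependencies."""
    stack = set()
    pending = [top]
    while pending:
        module = pending.pop()
        if module in stack:
            continue  # already visited; shared dependencies appear only once
        stack.add(module)
        pending.extend(depends_on.get(module, []))
    return stack


# Hypothetical dependency graph: module -> modules it depends on.
depends_on = {
    "lamp": ["httpd", "mariadb", "php"],
    "php": ["httpd", "base-runtime"],
    "httpd": ["base-runtime"],
    "mariadb": ["base-runtime"],
}

lamp_stack = expand_stack("lamp", depends_on)
httpd_stack = expand_stack("httpd", depends_on)
```

The same function applied to "httpd" yields a smaller stack, which shows why the term is relative: any module can be treated as the top of its own stack.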

Importantly, we can take two modules with different lifecycles and combine them in a single stack. The definition of the stack gives us the way to plan and track the relationship or dependency between the modules.

The Image, or Artifact

We build stacks into images or other artifacts such as ostree trees. These are the formats in which we release a stack.

For example, we could take a single stack such as a LAMP stack, and build and release it both as SCL rpms in a yum repository, and as container images: indeed, we do so today. It’s the same stack in each artifact. We could add more such formats: virtual appliances, Vagrant boxes, Anaconda ISOs or ostree trees, and it would still be the same stack in each; only the artifact / format / image has changed.

The format of the content we deliver may be different in each case, but it’s the same binaries from the same stack inside, and the same stack metadata describing that content.
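The idea that only the format varies can be shown with a very small sketch. The function name, the dictionary layout, and the format strings below are assumptions for illustration; real artifact builds would of course involve format-specific tooling.

```python
def release_stack(stack, formats):
    """Describe one artifact per format; every artifact carries the same stack."""
    return [{"format": fmt, "stack": stack} for fmt in formats]


lamp = {"name": "lamp", "modules": ("httpd", "mariadb", "php")}
artifacts = release_stack(lamp, ["yum-repo", "container-image", "ostree-tree"])

# Only the format differs; the stack content and metadata are shared.
same_stack = all(artifact["stack"] == lamp for artifact in artifacts)
```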

So how do these parts all fit together?

Let’s look at a couple of examples to make it (hopefully!) clear what we’re talking about.

Example 1: Our Atomic Host stack

We already release content that is organised in much this way, just not with any formal definition of modules or stacks.

Atomic, for example, is composed from multiple sources. Some content comes from traditional Fedora with its usual 6-month lifecycle; and some comes from -Extras with its much shorter lifecycle and rapid rebasing to pick up new features.

And even within Fedora we have different levels of lifecycle. An obvious example is GNOME; the Anaconda installer stack that we use for Atomic installer ISO images depends on this faster-moving content, even when the installed Atomic Host does not include it.

But even though these distinct sets of components all have differing lifecycles, they still belong within Fedora and we still produce coherent output images or artifacts from them today, with planned releases that require alignment between all the relevant parts. In the language of this document, it might look like this:
