Summer Coding 2010 ideas - Universal Build-ID

Status: "Idea"

Summary of idea: Extend the Build-ID support to make it more universally usable.

Contacts: Mark Wielaard, Roland McGrath

Mentor(s): Mark Wielaard, Roland McGrath

Notes: This is not a completely worked out idea yet. A proposal should pick one or more scenarios and create a concrete implementation plan.

More information

The main page for Summer Coding 2010 ideas is Category:Summer Coding 2010 ideas.

Summary

Build-IDs are currently being put into binaries, shared libraries, core files and related debuginfo files to uniquely identify the build a user or developer is working with. There are a couple of conventions in place to use this information to identify "currently running" or "distro installed" builds. This helps tools identify what was being run and match it to the corresponding package, sources and debuginfo, so they can show the user what is going on (at the moment mostly when things break). We would like to extend this to a more universal approach that helps people identify historical, local, non- or cross-distro, or organisational builds, so that Build-IDs become useful outside the current "static" setup and retain information over time and across upgrades.

Build-ID background

Build-IDs are unique identifiers of "builds". A build is an executable, a shared library, the kernel, a module, etc. You can also find the build-id in a running process, a core file or a separate debuginfo file.

The main idea behind Build-IDs is to make ELF files "self-identifying". This means that a Build-ID should uniquely identify a final executable or shared library. The default Build-ID calculation (done through ld --build-id, see the ld manual) computes a SHA-1 hash (160 bits/20 bytes) based on all the ELF header bits and section contents in the file. As a result, it is unique among the set of meaningful contents for ELF files and identical when the output file would otherwise have been identical. GCC now passes --build-id to the linker by default.

When an executable or shared library is loaded into memory, the Build-ID is loaded into memory as well, so a core dump of a process also has the Build-IDs of the executable and the shared libraries embedded. When debuginfo is separated from the main executable or shared library into .debug files, the original Build-ID is copied over too. This makes it easy to match a core file or a running process to the original executable and shared library builds, and matching those against the debuginfo files that provide more information for introspection and debugging should be trivial.
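
The text above does not say where the ID lives inside the file. In the GNU toolchain it is stored as an ELF note with owner name "GNU" and type NT_GNU_BUILD_ID (3), normally in the .note.gnu.build-id section; because notes are part of the loaded image, the ID also shows up in process memory and core dumps. The following is a minimal sketch, not the elfutils implementation, of pulling the ID out of a raw note blob. The names find_build_id and make_note are made up for illustration, and locating the note data inside a real ELF file is left out.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type the GNU toolchain uses for Build-IDs


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def find_build_id(notes, little_endian=True):
    """Return the Build-ID bytes from a raw ELF note blob, or None if absent."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, offset)
        name_start = offset + 12
        desc_start = name_start + _align4(namesz)
        desc_end = desc_start + descsz
        if desc_end > len(notes):
            break  # truncated note; stop rather than read past the end
        name = notes[name_start:name_start + namesz]
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return notes[desc_start:desc_end]
        offset = desc_start + _align4(descsz)
    return None


def make_note(name, ntype, desc):
    """Build one little-endian note record (hypothetical helper for test input)."""
    padded_name = name + b"\x00" * (_align4(len(name)) - len(name))
    padded_desc = desc + b"\x00" * (_align4(len(desc)) - len(desc))
    header = struct.pack("<III", len(name), len(desc), ntype)
    return header + padded_name + padded_desc
```

Note that the descriptor length comes from the note header, so the sketch makes no assumption about the ID being 20 bytes long.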

Fedora has had full support for build-ids since Fedora 8: https://fedoraproject.org/wiki/Releases/FeatureBuildId

Getting Build-IDs

A simple way to get the build-id(s) is through eu-unstrip (part of elfutils).

build-id from an executable, shared library or separate debuginfo file:

$ eu-unstrip -n -e <exec|.sharedlib|.debug>

build-ids of an executable and all shared libraries from a core file:

$ eu-unstrip -n --core <corefile>

build-ids of an executable and all shared libraries of a running process:

$ eu-unstrip -n --pid <pid>

build-id of the running kernel and all loaded modules:

$ eu-unstrip -n -k

Build-IDs are the bits, not the hex-string

Although in the examples on this page the Build-ID is always represented as a 40-character hex string, this is just a representation. A Build-ID is any number of bytes, not fixed at 20 (160 bits) or any other number. Specs and formats should be open to varying sizes, though they may optimize for any given producer (vendor/distro, OS toolchain, etc.) using a single size for all its IDs.
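
To make the distinction concrete, here is a small sketch that converts between the raw bytes and the hex representation. The function names are hypothetical; the point is only that the hex string is twice as long as the ID and that nothing in the conversion depends on a fixed size.

```python
def build_id_to_hex(build_id: bytes) -> str:
    """Return the lowercase hex representation of a Build-ID of any length."""
    return build_id.hex()


def hex_to_build_id(text: str) -> bytes:
    """Return the raw Build-ID bytes for a hex representation."""
    return bytes.fromhex(text.strip())


# The bash Build-ID used elsewhere on this page: 40 hex digits, 20 bytes.
bash_id = hex_to_build_id("c7a002ba1eb1dbc7c609d2e5fb9a57f10861dbdd")
print(len(bash_id))               # 20
print(build_id_to_hex(bash_id))   # the same 40-character string again
```

A tool that stores or compares IDs as bytes with an explicit length, rather than as fixed-width strings, keeps working if a producer chooses a different size.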

Current conventions and usage

Build-IDs are as useful as the methods we build around them to look things up based on them.

The convention currently used by Fedora (and adopted by the upstream GNU toolchain, for example in GDB, to find files) is to include links in the debuginfo package that point to the ELF file and the debuginfo file under /usr/lib/debug/.build-id/XX/YYYY (where XX is the first two hex digits of the Build-ID and YYYY is all the others).

So for example the bash-debuginfo package has the following files/links:

/usr/lib/debug/.build-id/c7/a002ba1eb1dbc7c609d2e5fb9a57f10861dbdd
 -> ../../../../../bin/bash
/usr/lib/debug/.build-id/c7/a002ba1eb1dbc7c609d2e5fb9a57f10861dbdd.debug
 -> ../../bin/bash.debug

These files/links are added by the debugedit (/usr/lib/rpm/debugedit) and find-debuginfo.sh (/usr/lib/rpm/find-debuginfo.sh) programs, which make sure every executable and shared library (and the separate .debug debuginfo files) has a Build-ID embedded and that the links above are added under /usr/lib/debug/.build-id.

This makes it extremely easy to find the executable or shared library and the corresponding debuginfo given just the build-id, provided they are installed on your system.
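
The XX/YYYY convention can be sketched in a few lines. This is an illustration under the assumption that the layout is exactly as described above; the function names and the debug_root parameter are hypothetical and not part of any existing tool.

```python
import os


def build_id_link_paths(build_id_hex, debug_root="/usr/lib/debug"):
    """Return (elf_link, debug_link) for a Build-ID given as a hex string."""
    hex_id = build_id_hex.strip().lower()
    if len(hex_id) < 4:
        raise ValueError("need at least two bytes to form XX/YYYY")
    bytes.fromhex(hex_id)  # raises ValueError for non-hex or odd-length input
    base = f"{debug_root}/.build-id/{hex_id[:2]}/{hex_id[2:]}"
    return base, base + ".debug"


def find_installed(build_id_hex, debug_root="/usr/lib/debug"):
    """Return the resolved targets of whichever links exist on this machine."""
    paths = build_id_link_paths(build_id_hex, debug_root)
    return [os.path.realpath(p) for p in paths if os.path.exists(p)]


elf_link, debug_link = build_id_link_paths(
    "c7a002ba1eb1dbc7c609d2e5fb9a57f10861dbdd")
print(elf_link)
print(debug_link)
```

find_installed returns an empty list when the debuginfo package is not installed, which is exactly the case the rest of this page is concerned with.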

Since these are files included in the rpm package, it also makes it easy to find the package that provided the executable/library, that corresponds to the build id (gdb and systemtap will suggest the right debuginfo package to install based on the build-id they found for the program you wanted to introspect). You can ask yum to install it, or use repoquery to figure out the details of the package and binary involved.

For example, you find some core file and examine it with eu-unstrip -n --core, or a long-running process is spending a lot of time in some section of code and, by running eu-unstrip -n --pid, you find out that the Build-ID corresponding to that section of code is 84153a6428b291df6d62ce906b65ee9270ec6837. Now you can use yum (or repoquery) to figure out what that thing really is:

$ yum whatprovides \*/84/153a6428b291df6d62ce906b65ee9270ec6837
glibc-debuginfo-2.11.1-6.i686 : Debug information for package glibc
Repo        : updates-debuginfo
Matched from:
Filename    : /usr/lib/debug/.build-id/84/153a6428b291df6d62ce906b65ee9270ec6837

You install that package (yum install /usr/lib/debug/.build-id/84/153a6428b291df6d62ce906b65ee9270ec6837) and then you'll find:

$ ls -l /usr/lib/debug/.build-id/84/153a6428b291df6d62ce906b65ee9270ec6837
/usr/lib/debug/.build-id/84/153a6428b291df6d62ce906b65ee9270ec6837 -> ../../../../../lib/libutil-2.11.1.so

The debuginfo package will also contain the source code of libutil.so and so you can start debugging.

But this only works for the current, up-to-date repositories that the system has configured. There is no support for historical information, local builds, cross-distro lookups, etc. Extending the usefulness of having build-ids is what this idea is about.

How do we scale this up/down? The actual Universal Build-IDs idea

What we would like is that when you get a Build-ID for something you can easily map it to the original developer, "creator", package, distributor, executable, sources, debuginfo files, etc.

This Build-ID can come from anything really, an old executable, a core file once made but never fully investigated, some currently running process that needs to be introspected, etc. And for various reasons parts or all of the original package, the executable itself, the libraries it relied on, the debuginfo packages, etc. could all be missing on the machine.

With an old core file, it might be all you have. A system could have been upgraded since a process started running, so the executable or any of the libraries it is using might only be in memory at the moment. The debuginfo package might never have been installed.

One use case to keep in mind when reading the various examples of situations where we want Build-ID mappings to work is that of the "canonical backtrace". This is the backtrace of a process (or from a core file) as a pure list of Build-ID + canonpc pairs. A canonpc is the pc adjusted for module and prelink bias, so that it is relative to the original module. Such a "canonical backtrace" is useful for identifying similar crashes. It is also the minimal information you need to provide to someone with access to the full Build-ID artefacts (binaries plus debuginfo) to extract some useful information from the crash.

The "canonical backtrace" example is interesting in two ways. First, to generate a backtrace one needs access to the .eh_frame section of the executable or shared library (the .eh_frame contains the data that shows how to unwind from a particular address in a module). So, given an address and the corresponding Build-ID, one wants to look up the associated executable or shared library on the local machine. Secondly, it shows why one might have a Build-ID in "isolation": it was sent to you for examination, as the shortest way to convey which module was involved in some crash or backtrace. Since one might want to store this extracted information over a longer period of time to see if there are patterns in the crashes users report, you also need access to a historical Build-ID database for matching it.
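
The canonpc adjustment described above can be sketched as follows. This is a simplified reading, assuming each loaded module is known by its address range and a single bias (runtime address minus the address in the original file); the Module type and the function name are hypothetical, and real unwinding through .eh_frame is not shown.

```python
from typing import List, NamedTuple, Optional, Tuple


class Module(NamedTuple):
    build_id: str  # hex representation of the module's Build-ID
    start: int     # first runtime address covered by the module
    end: int       # one past the last runtime address
    bias: int      # runtime address minus address in the original file


def canonical_backtrace(
    pcs: List[int], modules: List[Module]
) -> List[Tuple[Optional[str], int]]:
    """Map raw pcs to (Build-ID, canonpc) pairs; unknown pcs keep a None ID."""
    frames = []
    for pc in pcs:
        module = next((m for m in modules if m.start <= pc < m.end), None)
        if module is None:
            frames.append((None, pc))
        else:
            frames.append((module.build_id, pc - module.bias))
    return frames
```

Two crashes in the same code then compare equal even if the library was loaded at different addresses in each process, which is what makes the list useful for spotting similar crashes.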

Up in Fedora, what about getting "historical" mappings?

Up towards other distributions (packagekit?)

Up towards a general build-id mapping universe (build-id.org).
- Generic registration, querying and mapping of build-ids

Down towards a local database for a lone developer.

Or a local shop that builds upon an existing distro, but also has (internal) apps in their organization.

To totally disorganized "installs" where people move around executables all the time (inotify/updatedb).

How do we "proxy" this information between the different layers, so tools can have one query mechanism that works for any build-id that they happen to come across?

Tie-in to packagekit, abrt, debuginfo-fs?
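
One possible shape for the single query mechanism asked about above is a chain of layers that is tried from the most local to the most global. The sketch below is purely hypothetical: the Layer class, the layer names and the record fields are invented for illustration, and no such service or API exists yet.

```python
class Layer:
    """One source of Build-ID records, e.g. a local database or a distro service."""

    def __init__(self, name, records):
        self.name = name
        self._records = dict(records)  # Build-ID hex string -> record

    def lookup(self, build_id):
        """Return the record for build_id, or None if this layer has none."""
        return self._records.get(build_id)


def resolve(build_id, layers):
    """Ask each layer in order; return (layer name, record) for the first hit."""
    for layer in layers:
        record = layer.lookup(build_id)
        if record is not None:
            return layer.name, record
    return None


# Hypothetical ordering: developer's own builds, then the organization, then the distro.
layers = [
    Layer("local", {}),
    Layer("organization", {}),
    Layer("distro", {
        "84153a6428b291df6d62ce906b65ee9270ec6837": {
            "package": "glibc-debuginfo-2.11.1-6.i686",
            "file": "/lib/libutil-2.11.1.so",
        },
    }),
]
print(resolve("84153a6428b291df6d62ce906b65ee9270ec6837", layers))
```

A tool would only call resolve and would not need to know which layer answered, while each layer stays free to use its own storage, from a per-user database up to a cross-distro registry.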
