From Fedora Project Wiki

Latest revision as of 22:29, 7 May 2010

Versioning

12:02 < dwmw2> I'm slightly confused about versioning
12:02 < dwmw2> does it cope if clients aren't completely up to date?

Yes. The debuginfo filenames are unique, even across different versions of the same package. So, there will be debuginfo files available for every version of every package available in the debuginfo repos.

--WillWoods 19:50, 5 February 2009 (UTC)

To clarify a bit: a properly-updated debuginfofs server should have debuginfo for all packages in all configured Fedora repos - that is, we'll have debuginfo for the base version, the updated version, and the updates-testing version of every package.

For packages that have left the repos (e.g. obsoleted updates) debuginfo isn't deleted immediately, but probably will be deleted after a short grace period - a week for stable releases, less for Rawhide.

--WillWoods 22:49, 6 March 2009 (UTC)

NFS

Have you folks analyzed the file access patterns of debuginfo consumers such as gdb to see whether plain NFS would be suitable?

--Fche 14:05, 5 July 2009 (UTC)

From what I've seen of GDB's access patterns, yes, it could work fairly efficiently for local sites. But it's not something I would recommend for a proper implementation of this feature.

NFS over the internet is not a good idea. I'm sure many sites firewall it - in either direction - and it's not something I'd want to ask the infrastructure team to set up and support.

WebDAV, on the other hand, is just extended HTTP. Very few places will firewall that off, there are well-known ways to implement proxies and caching, and it still supports seek() and downloading partial files.

--WillWoods 15:12, 6 July 2009 (UTC)
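As a rough illustration of the partial-download point above: over plain HTTP, a client can ask for a byte range with the standard Range header, which is how a seek() followed by a short read could be served. This is only a sketch; the URL and the helper name build_range_request are hypothetical and not part of debuginfofs.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # byte positions in a Range header are inclusive
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={offset}-{last}")
    return req


# Hypothetical server and path, for illustration only.
req = build_range_request(
    "http://debuginfofs.example.org/usr/lib/debug/foo.debug", 4096, 512
)
print(req.get_header("Range"))  # bytes=4096-4607
```

A server that honors the header answers with 206 Partial Content and only the requested bytes; passing the request to urllib.request.urlopen() would perform the actual fetch.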

Yet another daemon?

It sounds like this requires running yet another daemon all the time. Can it be started on demand instead? Vda 11:06, 17 August 2009 (UTC)

Security

There is a risk that the DebuginfoFS server will send malicious debuginfo back to the client. Currently, the debuginfo installed by yum on the client is signed with the Fedora project keys. The DebuginfoFSserver→client protocol content does not have such easy security signature.

The information sent from the client will probably be filed as a public bug entry on bugzilla.redhat.com, where the operator of the malicious DebuginfoFS server can read it back.

Debuginfo can contain arbitrary Turing-complete DWARF expressions, which GDB executes in its "virtual machine". The language is described in http://dwarf.freestandards.org/Dwarf3.pdf, section 2.5 "DWARF Expressions", page 14 (26/267).

Malicious debuginfo can contain a DWARF expression that encodes sensitive information ("password") retrieved from the core file, hiding from the user that the output may be sensitive:

#1 harmless (s=0x7fffffffd0e0 "\332\313\331\331\335\305\330\316") at x.c:13

Did you recognize that this is <code>"password" ^ 0xaa</code>?

Some further text in a mail thread.

--- Jkratoch
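The XOR disguise in the backtrace above can be checked by hand. The following sketch takes the exact bytes GDB printed and XORs each one with 0xaa; the helper name xor_bytes is made up for this example.

```python
# The string GDB printed in frame #1, written with the same octal escapes.
shown = b"\332\313\331\331\335\305\330\316"


def xor_bytes(data, key):
    """XOR every byte of `data` with the single-byte `key`."""
    return bytes(b ^ key for b in data)


decoded = xor_bytes(shown, 0xAA)
print(decoded.decode("ascii"))  # password
```

Because XOR with a constant is its own inverse, anyone who reads the public bug report and knows the key recovers the original string just as easily.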

"The DebuginfoFSserver→client protocol content does not have such easy security signature." - Actually WebDAV can easily be run over HTTPS, which would give us certificate-based trust assurance for all debuginfo data.

---WillWoods 16:02, 23 October 2009 (UTC)

I don't think HTTPS will scale. It has a significant constant-factor penalty compared to HTTP and can't be mirrored by untrusted parties. The right solution would be to sign each debuginfo file individually, but we would have to invent new tooling for that. Mattmccutchen 22:29, 7 May 2010 (UTC)
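Per-file signing would need the new tooling mentioned above, and nothing like it is described on this page. As a rough sketch of the client-side half only, the code below compares a downloaded file's SHA-256 digest with a trusted value. It is a plain digest comparison, not a signature scheme, and it assumes the list of expected digests reaches the client through some already-trusted channel (for example, signed repository metadata); that channel and the helper name digest_matches are assumptions made for illustration.

```python
import hashlib
import hmac


def digest_matches(data, expected_hex):
    """Return True if the SHA-256 digest of `data` equals `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.lower())


# Stand-in file content; the digest here plays the role of the trusted value.
payload = b"pretend this is a .debug file fetched from an untrusted mirror"
trusted = hashlib.sha256(payload).hexdigest()

print(digest_matches(payload, trusted))            # True
print(digest_matches(payload + b"x", trusted))     # False
```

With a check like this, the file contents could come over plain HTTP from any mirror, and only the small digest list would need to be authenticated.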

Implementing an RPM-backed virtual filesystem

What additional metadata would be needed to support an efficient RPM-backed virtual filesystem? Here's the current filelists.sqlite schema:

CREATE TABLE filelist (pkgKey INTEGER, dirname TEXT, filenames TEXT, filetypes TEXT);
CREATE INDEX dirnames ON filelist (dirname);
CREATE INDEX keyfile ON filelist (pkgKey);

To list a directory, one can select all rows with that dirname; fine. To determine the package that provides a file, one can do the same, but with an additional condition '/' || filenames || '/' LIKE '%/name/%'. This may be slow for extremely large directories; I haven't done detailed tests.
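The two lookups described above can be tried against an in-memory database with the same schema. This is a minimal sketch: the rows are made up, the helper names list_dir and providers are hypothetical, and it assumes that filenames holds the directory's entries joined with '/', as the LIKE condition implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE filelist (pkgKey INTEGER, dirname TEXT, filenames TEXT, filetypes TEXT);
CREATE INDEX dirnames ON filelist (dirname);
CREATE INDEX keyfile ON filelist (pkgKey);
""")

# Made-up sample rows; real data would come from the repository metadata.
conn.executemany("INSERT INTO filelist VALUES (?, ?, ?, ?)", [
    (1, "/usr/lib/debug/usr/bin", "foo.debug/bar.debug", "ff"),
    (2, "/usr/lib/debug/usr/bin", "baz.debug", "f"),
    (2, "/usr/lib/debug/usr/sbin", "foo.debug", "f"),
])


def list_dir(conn, dirname):
    """Return every file name recorded for `dirname`, across all packages."""
    rows = conn.execute(
        "SELECT filenames FROM filelist WHERE dirname = ?", (dirname,))
    names = []
    for (filenames,) in rows:
        names.extend(filenames.split("/"))
    return sorted(names)


def providers(conn, dirname, name):
    """Return the pkgKey values of packages that own `dirname`/`name`."""
    rows = conn.execute(
        "SELECT pkgKey FROM filelist WHERE dirname = ? "
        "AND '/' || filenames || '/' LIKE '%/' || ? || '/%'",
        (dirname, name))
    return [pkg for (pkg,) in rows]


print(list_dir(conn, "/usr/lib/debug/usr/bin"))
print(providers(conn, "/usr/lib/debug/usr/bin", "bar.debug"))
```

Two caveats with the LIKE approach: SQLite's LIKE is case-insensitive for ASCII by default, and any % or _ in the file name acts as a wildcard, so a real implementation would probably want to re-check the candidate rows in application code.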

The second problem is actually extracting an individual file from an RPM. An RPM contains a gzipped cpio archive, so extracting a file requires either decompressing all the preceding data (slow) or doing some serious magic with the gzip format. How slow? xulrunner-debuginfo decompresses to 235 MB, and kernel-debuginfo to a whopping 1.2 GB (though I'm not sure if that particular package is used).

We shouldn't rule out the possibility of preprocessing the debuginfo packages into a format that permits more efficient access but is still smaller than a complete uncompressed copy. One trivial solution would be to convert each RPM to a zip file. Zip files have individually compressed members, so individual extraction is fast. Mattmccutchen 22:21, 7 May 2010 (UTC)
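To illustrate the zip idea above: because each zip member is compressed on its own and located through the archive's central directory, a reader can pull out one file without decompressing the others. The sketch below uses Python's standard zipfile module on an in-memory archive; the member names and contents are made up, and the helpers build_archive and read_member are hypothetical.

```python
import io
import zipfile


def build_archive(members):
    """Pack a {name: bytes} mapping into a zip archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)  # each member is deflated independently
    return buf.getvalue()


def read_member(archive_bytes, name):
    """Extract a single member; other members are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


# Stand-ins for the contents of a converted debuginfo package.
archive = build_archive({
    "usr/lib/debug/usr/bin/foo.debug": b"A" * 100_000,
    "usr/lib/debug/usr/bin/bar.debug": b"small payload",
})

print(read_member(archive, "usr/lib/debug/usr/bin/bar.debug"))
```

The trade-off is compression ratio: members compressed separately cannot share redundancy the way a single gzip stream over the whole cpio archive can, so the converted files would likely be somewhat larger than the original RPM payload.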