From Fedora Project Wiki
== Responsible parties ==
* Responsible party for initial go-ahead: [[User:spot|Tom 'spot' Callaway]]
* Responsible party for final project sign-off: [[User:spot|Tom 'spot' Callaway]]

Revision as of 19:26, 26 March 2012

statistics++: Making Fedora Project data accessible
Ian Weller, Fedora Engineering, Red Hat, Inc.

Project overview

Fedora Infrastructure has made only a limited foray into statistics. The Statistics page on the Fedora Project Wiki contains limited information: the number of HTTP requests made to various infrastructure applications and the number of wiki edits made per month.

The statistics app in the first version of Fedora Community attempted to improve on the Statistics page, but ultimately failed because of the complexity of adding new and relevant automated queries to the platform and the limited amount of information Fedora's application servers could access.

Once the planned messaging infrastructure for infrastructure applications is in place, a statistics application can listen on the message bus, record activity, and store it in a database for later retrieval. This application will be called statistics++.

statistics++ consists of three services:

  1. datanommer, a server daemon that listens on the infrastructure message bus and records activity to a database
  2. datagrepper, an HTTP application that provides a RESTful web API for downloading data stored in the database based on a simple query syntax
  3. dataviewer, an HTTP application that produces automated data displays such as tables or charts
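
As a rough illustration of the datanommer role described above, the following Python sketch records messages to a database. It is not the project's implementation: the table layout, the message fields (topic, timestamp, body), and the function names are all assumptions made for this example, SQLite stands in for whatever database is eventually chosen, and any iterable stands in for the message bus.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) messages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT NOT NULL,"
        " timestamp REAL NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def record(conn, message):
    """Store one message; the body is kept as JSON text so that each
    application can send whatever fields it needs."""
    conn.execute(
        "INSERT INTO messages (topic, timestamp, body) VALUES (?, ?, ?)",
        (message["topic"], message["timestamp"], json.dumps(message["body"])),
    )
    conn.commit()


def consume(conn, bus):
    """Record every message yielded by `bus` (an iterable standing in
    for the real message bus) and return how many were stored."""
    count = 0
    for message in bus:
        record(conn, message)
        count += 1
    return count
```

Keeping the body as opaque JSON is one simple way to let each application define its own fields without a schema change; a real deployment might prefer a different storage model.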

Target audience

datanommer is targeted toward infrastructure application developers who wish to make their data available for use in datagrepper and dataviewer.

datagrepper is targeted toward software developers who wish to generate their own queries for personal use or for inclusion in dataviewer.

dataviewer is targeted toward any user interested in statistics about the Fedora Project, such as Fedora users and developers, Red Hat executives, and journalists.



This project aims to solve the following problems:

  • Data on the Statistics wiki page can only be generated and validated by those who have access to Fedora log servers.
  • Data on the Statistics wiki page requires a human to generate the data each week.
  • Data on the Statistics wiki page does not encompass all infrastructure applications.
  • Data on the Statistics wiki page can be modified by anybody who can edit the wiki.
  • To generate data for other infrastructure applications (such as FAS, Koji, and Bodhi), separate code has to be written for each application in order to download data.

To solve these problems, statistics++ will have the following functionality:

  • Open, read-only access to any anonymized data collected by infrastructure applications
  • A standard RESTful API for downloading data
  • Flexible schemas for storing and retrieving data from infrastructure applications
  • Live updates of statistical data from infrastructure applications
  • An interface for creating automated queries and representing data in tables or charts
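
To make the idea of a standard API with a simple query syntax more concrete, here is a minimal Python sketch of how a datagrepper-style query might be evaluated. The query parameters (topic, since, until), the record fields, and the function name run_query are hypothetical; the proposal does not define the actual syntax.

```python
import json
from urllib.parse import parse_qs


def run_query(records, query_string):
    """Filter stored records with a URL-style query and return JSON text.

    Assumed parameters (all optional):
      topic - exact topic to match
      since - earliest timestamp to include
      until - latest timestamp to include
    """
    params = parse_qs(query_string)
    topic = params.get("topic", [None])[0]
    since = float(params.get("since", ["-inf"])[0])
    until = float(params.get("until", ["inf"])[0])

    matches = [
        r for r in records
        if (topic is None or r["topic"] == topic)
        and since <= r["timestamp"] <= until
    ]
    # The response is read-only data: a count plus the matching messages.
    return json.dumps({"count": len(matches), "messages": matches}, sort_keys=True)
```

An HTTP layer would pass the query string of a GET request to a function like this and return the result as the response body; because the function only reads, it fits the open, read-only access goal listed above.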



Use cases

Relationship to other services



Schedule summary


For statistics++ to run on Fedora Infrastructure, a messaging bus must be in place.

For the inclusion of each infrastructure application in statistics++, that application must send messages over the messaging bus.
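
The second dependency can be pictured with a short Python sketch of what sending a message might look like from an application's side. The bus technology is not specified here, so a standard-library queue.Queue stands in for it; the publish function, the topic name, and the message fields are assumptions for illustration only.

```python
import json
import queue
import time


def publish(bus, topic, body, now=None):
    """Serialize an activity message and put it on `bus`.

    `bus` is any object with a put() method (a queue.Queue here, in
    place of the real message bus). `now` lets callers supply a fixed
    timestamp; otherwise the current time is used.
    """
    message = {
        "topic": topic,
        "timestamp": time.time() if now is None else now,
        "body": body,
    }
    bus.put(json.dumps(message))
    return message
```

For example, a wiki hook could call publish(bus, "wiki.edit", {"page": "Statistics"}) after each saved edit, and a listener such as datanommer would then read the serialized message from the other end of the bus.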

Open issues

Resources for information

Design overview

Responsible parties