Infrastructure/AppBestPractices: Difference between revisions

Revision as of 17:33, 18 November 2013

Coding style

Python code should be PEP8 compliant

Tools to improve style

You can run pylint over your code once in a while to help catch PEP8 violations. pyflakes and pep8 (the program) are fast enough that you could setup your editor to run them whenever you save a python file (but they don't catch nearly as many things as pylint does).

Minimum Python Version

Except in very specific cases (certain libraries that may need to be run on RHEL5 clients), python code should target compatibility with python-2.6 (RHEL6 and all Fedora releases.) You should accept patches for compatibility with the latest version of python3 but at this time we are not writing code that runs solely on python3 (bloating the systems with both python2 and python3; inability to run mod_wsgi for both python2 and python3 on a single machine, etc).

Common Programming Issues

Centralized logging

Most of our apps use the standard Python logging module, which usually ends up logging to /var/log/httpd/error_log on the app server.

To have your app logs shipped to our central logging server, you can configure the SysLogHandler to do so.

For config-file based logging setups, you can do something like the following:

[logging]                                                                                                                                                             
                                                                                                                                                                          
[[handlers]]                                                                                                                                                          
                                                                                                                                                                         
[[[syslog_out]]]                                                                                                                                                      
class='handlers.SysLogHandler'                                                                                                                                        
args="('/dev/log', handlers.SysLogHandler.LOG_LOCAL4, socket.SOCK_STREAM)"                                                                                                               
formatter='full_content'                                                                                                                                              
                                                                                                                                                                          
[[loggers]]                                                                                                                                                           
                                                                                                                                                                          
[[[bodhi]]]                                                                                                                                                           
level='DEBUG'                                                                                                                                                         
qualname='bodhi'                                                                                                                                                      
handlers=['syslog_out']                                                                                                                                               
propagate=0

Here is an example of doing it in pure Python:

import logging
import logging.handlers

syslogger = logging.getLogger('bodhi')
syslogger.setLevel(logging.DEBUG)
handler = logging.handlers.SysLogHandler(address='/dev/log', facility=logging.handlers.SysLogHandler.LOG_LOCAL4)
syslogger.addHandler(handler)
syslogger.info('Hello SysLog!')

The app logs will then appear in /var/log/hosts/<HOST>/<YEAR>/<MONTH>/<DAY>/apps.log as well as the merged log /var/log/merged/apps.log on our central rsyslog server.

Translations

Use transifex to manage translations and add the projects to the Fedora Project Team so the Fedora i18n team can translate.

Do not mark Exception messages for translation. We want those to remain untranslated for several reasons:

It's easier to do a web search for the exceptions if the messages aren't translated.
Translators don't know what all the technical information in exception messages are so the translations aren't always accurate.
The exception messages aren't intended for end users -- they're to help with debugging or other coders. So it's not very useful to translate them.

fedmsg

Think about how to add fedmsg hooks into every event that changes data. However, also make it so that fedmsg is optional (ie: the app doesn't fail if fedmsg is not available/installed). This makes it easier to test apps outside of infrastructure and makes us more robust in case fedmsg starts failing sometime.

Libraries to use

kitchen: A library of common code that we've needed in Fedora. Lots of code to deal with unicode issues and localization. A few other things as well. Docs yum install python-kitchen

Web Frameworks

Use one of Flask, Pyramid, or TurboGears2. Limiting the number of web frameworks we're responsible for will greatly help with our long term maintenance burden.

Version numbering

The version numbering for libraries and applications that provide an API (almost all of our web apps) needs to be much stricter than application version numbering for other applications but the following can be considered best practices for both.

Version numbers for releases should generally be three incrementing numbers separated by dots ("."). Each number has a name: MAJOR.MINOR.MICRO There is a fourth number that can be added after MICRO called PATCHLEVEL. We'll talk about that in the #Immature_versions section.

The MAJOR number should be incremented anytime there is a backwards incompatible change (examples: deleting methods, removing attributes from the return value of a method, sometimes changing the ordering of a list that is publically available). The MINOR number should be incremented whenever there's an addition to API or other change in behaviour that is compatible but especially noteworthy (For instance, adding a method. Adding an entry to a dict that's being returned.). The MICRO number should be incremented when a new release just changes internals in a way that isn't visible to the user (Bugfixes generally fall into this category).

One thing to note is that there's a small amount of flexibility in this scheme. Some bugfixes change API -- for instance if a field wasn't supposed to be public but was being returned as part of a dict by mistake. Removing this on the grounds that no one should be using it and only bumping the MINOR number (ordinarily, removals would bump the MAJOR number) may be okay. Err on the side of incrementing numbers as it's better for consumers of libraries to be warned by the version number bump to check their code and find that the change does not affect them than for them to not be warned of a change that breaks things for them.

Immature Versions

Much of our code can be considered immature. We release relatively frequently and may be updating the API often. For these pieces of software we should still be adhering to a versioning scheme to help consumers know that changes to the API they are consuming may be coming. The scheme for those is very similar to the MAJOR.MINOR.MICRO given above but shifted over one position. Start by setting the MAJOR number to zero and then using MINOR for the same purpose as MAJOR in the previous section. Releases, where there are not API incompatible changes can bump MICRO whether or not there are forwards compatible changes to the API or just bugfixes. In the special case where there is a problem with a newly released version (usually within a few hours to one day) is found to have a defect that must be fixed, add a fourth number, PATCHLEVEL to show that you are replacing the previous release with a new one. The version for one of these would look like this: 0.3.32.1

Prereleases

There are two ways to do pre-release versioning. I have a slight preference for the first but I'll mention both:

Make the releases use the version number of the previous release with a MINOR (or PATCHLEVEL if it's an #Immature_Version) of .90. The next prerelease would use .91. And so on. The advantage of this is that computers can properly parse the order of these compared to the previous release and the next release. The disadvantage is that we are reusing a field that we use for other purposes for this.

The other method is to append a1 for the first alpha release, a2 for the second alpha, b1 for the first beta, and so on. This method is not parsed correctly by computers without giving them knowledge of the pre-release numbering scheme, however, we do have well documented methods of dealing with them. For instance, the Packaging:NamingGuidelines#Non-Numeric_Version_in_Release Prerelease versioning scheme for Fedora rpms. Just be sure that you follow those guidelines when building rpms otherwise you won't be able to upgrade to your final release properly. If you use this strategy, you should also be sure that your numbers and letters comply with the meanings assigned in PEP 386 for versions. (See, if you'd just use the first method, you could avoid having to read all of these standards documents ;-)

Common Development Tasks

Use git flow

We would like to have our upstream repository reflect what's deployed in infrastructure as well as what we are working on at the moment. git flow's ideas of separate branches maps well to this vision. The master branch in git flow should be very close to the code we have deployed in production. It should contain the code from the latest release plus any hotfixes that we've applied over the top (in infrastructure we use a puppet module to deploy the hotfix. In the upstream repo, we can use git flow hotfixes to manage applying the hotfix to the master and develop branch).

Getting started

Quickstart to using git flow.

yum install gitflow.

Then get clone the repository and initialize it for use:

# First thing when you initially clone a repo
$ git clone $URL
$ git flow init
# This will ask you questions.  The defaults are almost always
# fine because we've setup the repositories to work with it

You should always work on the 'devel' or 'develop' branch if just doing small maintenance. Changes on develop will get pushed out as the next major release.

New major feature

# Working on a major feature
# git flow creates feature branches that are prefixed with feature/

$ git flow feature start my-new-feature
# make edits, commit

# Share that feature with others for review.  This pushes your branch up to github
$ git flow feature publish my-new-feature

Go to the github interface and open a pull request. (In the future, perhaps we could package http://defunkt.io/hub/ and get a more automated way to do this step).

Ping someone(s) to review the change for you and +1 on the pull request. We aren't very strict about how thorough a code review must be at this point. The code review is partly to catch errors in the code but it's also to expose new contributors to how our code works. So a perfunctory code review by someone who's new to this infrastructure project has value just as a review by someone who's a better coder than you. If you want the change to be reviewed by someone specific for specific things that you're unsure about, be sure to ping them specifically.

# When you're done.. you can merge it on github with a click, or if you'd like to do so from the CLI
$ git flow feature finish my-new-feature

Hotfix

hotfixes end up on both the master and develop branches. They branch off of master.

$ git flow hotfix start fix-traceback-on-login
# make some edits
$ git flow hotfix publish fix-traceback-on-login

Ping the #fedora-apps channel or specific people to review and +1 your change. Note that small hotfixes are especially good for getting people who aren't core contributors to this code base to start becoming familiar with it whereas core committers might be a little bored with looking at them. If we have new contributors wanting to join, this is a good way to introduce them to the project.

hotfixes sometimes have a time element so remember that we deploy hotfixes in infrastructure independently of when the code is merged upstream. If you're confident of the fix, you may want to commit the hotfix to our production (or staging) deployments in infrastructure's puppet repo and create the hotfix ticket in trac while you're waiting for the hotfix to be reviewed.

# When you're done.. you can merge it from the CLI like this:
$ git flow hotfix finish my-new-feature

# If you want to do it on github, you need to both merge the pull request to
# the develop and the master branches.

Making a release

# Releases end up on the master branch
$ git flow release start 0.3.32
$ # make some edits, bump version numbers, commit, release to pypi.
$ git flow release finish 0.3.32
$ git push --all
$ git push --tags

Using Vanilla git

If you're more comfortable using vanilla git without the gitflow addon, the things to be aware of are:

new commits should go to the devel branch (the default on the repos we set up on github).
master is used to make releases from and should reflect what's in production (or in an rpm, ready to be pushed to production) at any given time.
hotfixes have their own branches that need to be merged into both master (for release) and devel.

Removing git branches

When a feature, release, etc has been merged or abandoned we can clean up the old branches. Removing them will make it easier for contributors (both new and old) to see what branches are currently important for the project. The commits that make up the branches will still be around if the changes were merged to devel. It's just that the nice way of referencing them as a separate branch will be gone.

If you merge on the github interface, there will be a button on the pull request "delete this branch". Pushing that will remove the branch.

From the git cli, you can do git push origin :$NAME_OF_FEATURE_BRANCH For instance, git push origin :feature/timeout-fix.

Ordinarily, when you use git push, git figures out where you want to push to (usually origin for the github server). It then pushes the current local branch to a branch of the same name on the origin. If you were explicit about this, the command line syntax would be: git push origin develop:develop. When we delete, we're using this explicit syntax but we're telling git that instead of a local branch to the origin, we want to push nothing (thus, having nothing on the left side of the colon).

Security Best Practices

Make yourself familiar with the top ten Web Application vulnerabilities and take measures against them.
Always use prepared statements for SQL queries or a framework's abstraction that does this
- Only if no input from a user is added to a SQL query or if it is not possible to use prepared statements, build the query yourself. Then let someone review the query and add warnings about this to the source code
Use a template engine to build web pages that takes care of proper encoding of user input to avoid Cross-Site Scripting
- Handy ressources: Cross-Site Scripting Prevention Cheat Sheet, HTML5 security cheat sheet
- Protect authentication cookies with the flag HttpOnly
Enable CSRF protection in your framework
- Follow the REST model, do use GET requests to change state of the application, because GET requests might not be covered by CSRF protection
Always use HTTPS for web apps that allow to login
- Check that the certificate is valid for the intended/published hostname
- Do not load external JavaScript code
  - If you need to do this, do it via HTTPS
- Protect authentication cookies with the flag Secure
- If the service runs on a server where all web applications should only be accessed via HTTPS enable HTTP Strict Transport Security
Set the X-Frame-Options header unless your applications needs to be included in a frame
Whenever someone logs in, create a new session ID and authenticate only the new session ID
If someone logs out, invalidate the session ID on the server side

Search