= Development Tips =
== Oprofile ==
It is often mentioned that running oprofile is more complicated than using gprof, because it has to start a daemon and load a kernel module. But gprof requires recompiling the application and its dependent libraries with the -pg option, which can be worse if you also need to recompile the glib library.
Setting and using oprofile:
* [http://www.ua.kernel.org/pub/mirrors/centos.org/4.6/docs/html/rhel-sag-en-4/s1-oprofile-configuring.html configuring]


== Best practices ==
Every good course book mentions problems with memory allocation, the performance of specific functions, and so on.
The best thing to do is to buy a good book ;-) It doesn't make sense to agonize over every single line; optimize only the things which are performance bottlenecks.


Here is a short overview of techniques which are often problematic:
* threads
* Wake up only when necessary
* Don't use [f]sync() if not necessary
* Do not actively poll in programs or use short regular timeouts; rather, react to events
* If you wake up, do everything at once (race to idle) and as fast as possible
* Use large buffers to avoid frequent disk access; write one large block at a time (see the sketch after this list)
* Group timers across applications if possible (even across systems)
* excessive I/O, power consumption, or memory usage - watch for memory leaks
* Avoid unnecessary work/computation
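
As a rough sketch of the buffering advice above (the file name, record format, and buffer size are invented purely for illustration), the idea is to collect many small records in memory and touch the disk with a single large write():

<pre>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE (64 * 1024)    /* one large block instead of many small writes */

int main(void) {
  char *buf = malloc(BUF_SIZE);
  size_t used = 0;
  int i, fd;

  if (buf == NULL)
    return EXIT_FAILURE;

  /* collect many small records in memory ... */
  for (i = 0; i < 1000; i++)
    used += snprintf(buf + used, BUF_SIZE - used, "record %d\n", i);

  /* ... and hit the disk only once */
  fd = open("output.log", O_WRONLY|O_CREAT|O_TRUNC, 0644);
  if (fd < 0) {
    free(buf);
    return EXIT_FAILURE;
  }
  write(fd, buf, used);
  close(fd);
  free(buf);
  return EXIT_SUCCESS;
}
</pre>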


And now some examples:


=== Threads ===
It is widely believed that using threads makes an application perform better and faster, but that is not true in every case.
 
Python uses the [http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock Global Interpreter Lock], so threading is profitable only for bigger I/O operations. We can help ourselves by optimizing with [http://code.google.com/p/unladen-swallow/ unladen swallow] (still not in upstream).
 
Perl [http://perldoc.perl.org/threads.html threads] were originally created for applications running on systems without
fork (win32). In Perl threads, the data are copied for every single thread ([http://en.wikipedia.org/wiki/Copy-on-write Copy On Write]). The data are not shared by default, because the user should be able to define the level of data sharing. For data sharing, the [http://search.cpan.org/~jdhedden/threads-shared-1.32/shared.pm threads::shared] module has to be included; then the data are copied (Copy On Write) and the module additionally creates tied variables for them, which takes even more time and is even slower.
 
Reference: [http://www.perlmonks.org/?node_id=288022 performance of threads]
 
C threads share the same memory, each thread has its own stack, and the kernel doesn't have to create new file descriptors or allocate new memory space. C can really take advantage of multiple CPUs for multiple threads.
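
For illustration, here is a minimal sketch of that point (the worker function, thread count, and counter are made up, not taken from any real application): all threads see the same global variable, each one works on its own stack, and the shared data only needs a lock around the update.

<pre>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* shared by all threads - nothing is copied, unlike the Perl case */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
  long local = 0;               /* lives on this thread's own stack */
  int i;
  (void)arg;

  for (i = 0; i < 100000; i++)
    local++;

  pthread_mutex_lock(&lock);    /* shared memory needs locking */
  counter += local;
  pthread_mutex_unlock(&lock);
  return NULL;
}

int main(void) {
  pthread_t threads[NUM_THREADS];
  int i;

  for (i = 0; i < NUM_THREADS; i++)
    pthread_create(&threads[i], NULL, worker, NULL);
  for (i = 0; i < NUM_THREADS; i++)
    pthread_join(threads[i], NULL);

  printf("counter = %ld\n", counter);   /* 4 * 100000 */
  return 0;
}
</pre>

Compile with gcc -pthread.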
 
Therefore, if you want better performance from your threads, you should use a lower-level language like C/C++. If you are using a scripting language, it's possible to write a C binding. The poorly performing parts can be tracked down with profilers.
 
Reference: [http://people.redhat.com/drepper/lt2009.pdf improving performance of your application]
 
=== Wake-ups ===
Many applications scan configuration files for changes. In many cases this is done at a fixed interval, e.g. every minute. This can be a problem, because it forces the disk to wake up from spin-down. The best solution is to find a good interval and a good checking mechanism, or to check for changes with inotify, i.e. act on events.
Inotify can watch for a variety of changes on a file or a directory. The problem is that there is only a limited number of watches per system. The number can be obtained from
<pre>
/proc/sys/fs/inotify/max_user_watches
</pre>
and it can be changed, but that is not recommended.
 
Example:
 
<pre>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  int fd;
  int wd;
  int retval;
  struct timeval tv;

  fd = inotify_init();
  if (fd < 0) {
    perror("inotify_init()");
    return EXIT_FAILURE;
  }

  /* watch for modification of a file - writing into it */
  wd = inotify_add_watch(fd, "./myConfig", IN_MODIFY);
  if (wd < 0) {
    printf("inotify cannot be used\n");
    /* switch back to the previous checking method */
  }

  /* wait up to 5 seconds for an event instead of polling */
  fd_set rfds;
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  tv.tv_sec = 5;
  tv.tv_usec = 0;
  retval = select(fd + 1, &rfds, NULL, NULL, &tv);
  if (retval == -1)
    perror("select()");
  else if (retval) {
    /* a read() on fd would return the queued struct inotify_event */
    printf("file was modified\n");
  }
  else
    printf("timeout\n");

  return EXIT_SUCCESS;
}
</pre>
 
Pros:
* variety of checks
Cons:
* finite number of watches per system
* failure of inotify
 
In the case where inotify fails, the code has to fall back to a different checking method. That usually means a lot of "#ifdef" conditionals in the source code.
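
A minimal sketch of what such a fallback can look like (HAVE_SYS_INOTIFY_H stands in for whatever macro the build system actually defines, and the helper function is invented for illustration):

<pre>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

#ifdef HAVE_SYS_INOTIFY_H       /* normally defined by the build system */
#include <sys/inotify.h>

/* preferred variant: block on an inotify descriptor set up as in the
   example above and report a change when an event arrives */
static int config_changed(int inotify_fd, const char *path) {
  char buf[4096];
  (void)path;
  return read(inotify_fd, buf, sizeof(buf)) > 0;
}

#else

static time_t last_mtime;

/* fallback variant: compare modification times from stat() */
static int config_changed(int inotify_fd, const char *path) {
  struct stat st;
  (void)inotify_fd;
  if (stat(path, &st) != 0 || st.st_mtime == last_mtime)
    return 0;
  last_mtime = st.st_mtime;
  return 1;
}

#endif
</pre>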
 
Reference:
 
''man 7 inotify''
 
=== Fsync ===
Fsync is known as an expensive I/O operation, but according to the reference below that is not completely true. The article also has an interesting discussion showing several different opinions on whether to use fsync at all.
 
The typical examples are the Firefox freezes (with fsync) vs. empty files (without fsync). What happened in these cases?
 
Firefox used to call the sqlite library each time the user clicked on a link to go to a new page. Sqlite called fsync, and because of the file system settings (mainly ext3 in data=ordered mode) there was a long latency during which nothing was happening. This could take a long time (even 30 seconds) if a large file was being copied by another process at the same time.
 
In other cases fsync wasn't used at all, and there was no problem until the switch to the ext4 file system. Ext3 was set to data=ordered mode, which flushed memory every few seconds and saved it to disk. But with ext4 and laptop_mode the interval was longer, and the data might get lost when the system was unexpectedly switched off. Ext4 has since been patched, but we should still think about the design of our applications and use fsync carefully.
 
A simple example of reading/writing a configuration file shows how a backup of the file can be made, or how data can be lost.
 
Bad example:
<pre>
/* read the old configuration file, e.g. ~/.kde/myconfig */
fd = open("~/.kde/myconfig", O_RDONLY);
read(fd, buffer, sizeof(buffer));
close(fd);
...
/* rewrite the file in place; O_TRUNC throws the old contents away
   immediately, so a crash before the new data reaches the disk
   leaves an empty file */
fd = open("~/.kde/myconfig", O_WRONLY|O_TRUNC|O_CREAT, 0600);
write(fd, bufferOfNewData, sizeof(bufferOfNewData));
close(fd);
</pre>


Better example:
<pre>
/* read the old configuration file, e.g. ~/.kde/myconfig */
fd = open("~/.kde/myconfig", O_RDONLY);
read(fd, buffer, sizeof(buffer));
close(fd);
...
/* write the new data into a temporary file first */
fd = open("~/.kde/myconfig.suffix", O_WRONLY|O_TRUNC|O_CREAT, 0600);
write(fd, bufferOfNewData, sizeof(bufferOfNewData));
fsync(fd); /* paranoia - optional */
...
close(fd);
/* do_copy("~/.kde/myconfig", "~/.kde/myconfig~"); */ /* paranoia - optional */
rename("~/.kde/myconfig.suffix", "~/.kde/myconfig");
</pre>

Reference:
[http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ inside of fsync]
