
Development Tips

Oprofile

It is often mentioned that running oprofile is more complicated than using gprof, because a daemon has to be started and a kernel module loaded. But gprof requires recompiling the application and its dependent libraries with the -pg option, which can be worse if you also need to recompile the glib library. Setting up and using oprofile:
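
The exact commands depend on the oprofile version and the system; the listing below is only a sketch of the legacy opcontrol-based workflow (the application name is a placeholder, not taken from this page):

  # Load the oprofile kernel module and configure the daemon.
  opcontrol --init
  opcontrol --no-vmlinux        # or: opcontrol --vmlinux=/path/to/vmlinux to profile the kernel too

  # Start sampling, run the program to be profiled, then flush the samples.
  opcontrol --start
  ./your_application
  opcontrol --dump

  # Summarize the samples per symbol and stop the daemon.
  opreport --symbols ./your_application
  opcontrol --shutdown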

Best practices

Every good course book mentions problems with memory allocation, the performance of specific functions, and so on. The best thing to do is to buy a good book ;-)

Here is a short overview of techniques which are often problematic:

  • Excessive I/O, power consumption, or memory usage (memory leaks)
  • Threads
  • Avoid unnecessary work/computation
  • Wake up only when necessary
  • Do not actively poll in programs or use short regular timeouts, rather react to events
  • If you wake up, do everything at once (race to idle) and as fast as possible
  • Use large buffers to avoid frequent disk access; write one large block at a time (see the sketch below this list)
  • Don't use [f]sync() if not necessary
  • Group timers across applications (and even across systems) if possible
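
For example, the point about large buffers can be sketched in plain C with stdio (the file name, buffer size, and record below are made up for illustration): the program hands stdio one large buffer, so many small fwrite() calls reach the disk as a few big blocks and the buffer is flushed only once, when the file is closed.

  #include <stdio.h>
  #include <stdlib.h>

  #define BUF_SIZE (1024 * 1024)          /* 1 MiB stdio buffer - an arbitrary example size */

  int main(void)
  {
      FILE *f = fopen("output.dat", "w"); /* illustrative file name */
      if (!f) {
          perror("fopen");
          return EXIT_FAILURE;
      }

      /* Fully buffered stream with one large user-supplied buffer:
         many small writes become a few large disk accesses. */
      char *buf = malloc(BUF_SIZE);
      if (buf)
          setvbuf(f, buf, _IOFBF, BUF_SIZE);

      const char record[] = "some record\n";
      for (int i = 0; i < 1000000; i++)
          fwrite(record, 1, sizeof(record) - 1, f);

      fclose(f);                          /* the buffer is flushed here, once */
      free(buf);
      return EXIT_SUCCESS;
  }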

And now some examples:

Threads

It is widely believed that using threads makes our application perform better and faster. But that is not true every time.

Python uses the Global Interpreter Lock (http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock), so threading pays off only for bigger I/O operations. Some help may come from Unladen Swallow (http://code.google.com/p/unladen-swallow/), which is still not in upstream.

Perl threads (http://perldoc.perl.org/threads.html) were originally created for applications which run on systems without fork (win32). In Perl, data is copied for every thread (Copy On Write, http://en.wikipedia.org/wiki/Copy-on-write). Data is not shared by default, because the user should be able to define the level of data sharing. To share data, the threads::shared module (http://search.cpan.org/~jdhedden/threads-shared-1.32/shared.pm) can be included; the data is then still copied (Copy On Write) and the module additionally creates tied variables for it, which takes even more time, so it is even slower.

Reference: performance of threads (http://www.perlmonks.org/?node_id=288022)

In C, threads share the same memory, each thread has its own stack, and the kernel doesn't have to create new file descriptors or allocate new memory space. C can really take advantage of multiple CPUs when running multiple threads.
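
As a rough illustration (the worker function and the shared counter are invented for this sketch), a C program can start several POSIX threads that all see the same global data, while each thread gets only its own stack:

  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 4

  /* Shared by all threads: same address space, nothing is copied. */
  static long counter = 0;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg)
  {
      (void)arg;
      long local = 0;                  /* lives on this thread's own stack */
      for (int i = 0; i < 1000000; i++)
          local++;

      pthread_mutex_lock(&lock);       /* shared memory still needs locking */
      counter += local;
      pthread_mutex_unlock(&lock);
      return NULL;
  }

  int main(void)
  {
      pthread_t tid[NTHREADS];

      for (int i = 0; i < NTHREADS; i++)
          pthread_create(&tid[i], NULL, worker, NULL);
      for (int i = 0; i < NTHREADS; i++)
          pthread_join(tid[i], NULL);

      printf("counter = %ld\n", counter);  /* every thread updated the same variable */
      return 0;
  }

Compile with gcc -pthread; on a multi-core machine the workers can run on different CPUs at the same time.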

Therefore, if you want better performance from your threads, you should use a low-level language like C/C++. If you are using a scripting language, it is possible to write a binding in C. The poorly performing parts can be tracked down with profilers.
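
As a sketch of such a binding (the module name "fastloop" and its function are invented for this example, and it assumes the Python 3 C API headers are available), a hot loop can be moved out of the scripting language into C:

  /* fastloop.c - hypothetical extension module for this example. */
  #include <Python.h>

  /* The hot loop, implemented in C instead of the scripting language. */
  static PyObject *sum_squares(PyObject *self, PyObject *args)
  {
      unsigned long long n, i, total = 0;

      if (!PyArg_ParseTuple(args, "K", &n))
          return NULL;
      for (i = 0; i < n; i++)
          total += i * i;
      return PyLong_FromUnsignedLongLong(total);
  }

  static PyMethodDef methods[] = {
      {"sum_squares", sum_squares, METH_VARARGS, "Sum of squares computed in C."},
      {NULL, NULL, 0, NULL}
  };

  static struct PyModuleDef moduledef = {
      PyModuleDef_HEAD_INIT, "fastloop", NULL, -1, methods
  };

  PyMODINIT_FUNC PyInit_fastloop(void)
  {
      return PyModule_Create(&moduledef);
  }

Once compiled as a shared library, the script only needs to import fastloop and call fastloop.sum_squares(n), keeping the expensive loop in C.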

Reference: improving performance of your application (http://people.redhat.com/drepper/lt2009.pdf)