<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CUcARHw6eCp7ImA9WxBWF04.&quot;"><id>tag:blogger.com,1999:blog-6193377</id><updated>2010-02-09T17:30:45.210+01:00</updated><title>Rainer's Blog</title><subtitle type="html">This Blog is about many things Rainer is interested in. This happens to include syslog, astronomy and other fun things.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://blog.gerhards.net/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://blog.gerhards.net/" /><link rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>336</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/blogspot/cmfi" /><feedburner:info uri="blogspot/cmfi" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by-nc-sa/2.0/" 
/><logo>http://creativecommons.org/images/public/somerights20.gif</logo><entry gd:etag="W/&quot;Ck4CQXw9eyp7ImA9WxBWF04.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-2603504274384417345</id><published>2010-02-09T16:56:00.002+01:00</published><updated>2010-02-09T16:56:00.263+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-09T16:56:00.263+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="reliability" /><category scheme="http://www.blogger.com/atom/ns#" term="syslog" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><category scheme="http://www.blogger.com/atom/ns#" term="auditing" /><title>Some thoughts on reliability...</title><content type="html">When talking syslog, we often talk about audit or other important data. A frequent question I get is whether syslog (and &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt; in particular) can provide a reliable transport.&lt;br /&gt;&lt;br /&gt;When this happens, I first need to ask what level of reliability is needed. There are several flavors of reliability, and usually some message loss is acceptable at some level. &lt;br /&gt;&lt;br /&gt;For example, let's assume the process writes out log messages to a text file. Under (almost?) all modern operating systems and by default, this means the OS accepts the information to write, acks it, does NOT persist it to storage and lets the application continue. The actual data block is usually written a short while later. Obviously, this is not reliable: you can lose log data if an unrecoverable i/o error happens or something else goes fatally wrong.&lt;br /&gt;&lt;br /&gt;This can be solved by instructing the operating system to actually persist the information to durable storage before returning from the API call. You pay a big performance toll for that. 
This trade-off is also a frequent topic for syslog data: many operators do NOT sync and accept a small risk of message loss to avoid needing ten times as many servers as they have now.&lt;br /&gt;&lt;br /&gt;But even if writes are synchronous, how does the application react? For example: what shall the application do if log data cannot be written? If one really needs reliable logging, the only choice is to shut down the application when it can no longer log. I know of very few systems that actually do that, even though "reliability" is highly demanded. Here, the cost of shutting down the application may be so high (or even fatal) that the limited risk of log data loss is accepted.&lt;br /&gt;&lt;br /&gt;There is a myriad of things to consider when thinking about reliability. So I think it is important to define the level of reliability that is required by the solution and to do that in detail. To the best of my knowledge, this is also important for operators who are required by law to do "reliable" logging. If they have a risk matrix, they can define where it is "impossible" (for technical or financial reasons) to achieve full reliability and, as far as I understand, this is information auditors are looking for.&lt;br /&gt;&lt;br /&gt;So for all cases, I strongly recommend thinking about which level of reliability is needed. But to provide an answer for the rsyslog case: it can provide very high reliability and will most probably fulfil all needs you may have. 
But there is a toll in both performance and system uptime (as said above) to go to "full" reliability.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-2603504274384417345?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/QUBNGYJ3iNk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/2603504274384417345/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=2603504274384417345" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2603504274384417345?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2603504274384417345?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/QUBNGYJ3iNk/some-thoughts-on-reliability.html" title="Some thoughts on reliability..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2010/02/some-thoughts-on-reliability.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4CQXw7cCp7ImA9WxBWFkk.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-2449940997829516628</id><published>2010-02-08T16:46:00.003+01:00</published><updated>2010-02-08T16:46:00.208+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-08T16:46:00.208+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="syslog" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>The typical logging problem as viewed from syslog</title><content type="html">I run into different syslog use cases from time to time. So I thought it would be a good idea to express what I think the typical logging problem is. As I consider it the typical problem, syslog (and &lt;a href="http://www.winsyslog.com"&gt;WinSyslog&lt;/a&gt; and &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt; in particular) addresses most needs very well. 
What they leave out is the analysis and correlation part, but other members of the family (like our &lt;a href="http://www.phplogcon.org"&gt;log analyzer&lt;/a&gt;) and third parties take good care of that.&lt;br /&gt;&lt;br /&gt;So the typical logging problem, as seen from the syslog perspective, is:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;there exist events that need to be logged&lt;br /&gt;&lt;li&gt;a single "higher-level" event E may consist of a &lt;br /&gt;     number of fine-grained lower-level events e_i&lt;br /&gt;&lt;li&gt;each of the e_i's may be on different&lt;br /&gt;     systems / proxies&lt;br /&gt;&lt;li&gt;each e_i consists of a subset of properties&lt;br /&gt;     p_j from a set of all possible common properties P&lt;br /&gt;&lt;li&gt;in order to gain higher-level knowledge, the&lt;br /&gt;     high-level event E must be reconstructed from&lt;br /&gt;     e_i's obtained from *various* sources&lt;br /&gt;&lt;li&gt;a transport mechanism must exist to move event&lt;br /&gt;     e_i records from one system to another, e.g., to&lt;br /&gt;     a central correlator&lt;br /&gt;&lt;li&gt;systems from many different suppliers may be involved,&lt;br /&gt;     resulting in different syntax and semantics of&lt;br /&gt;     the higher-level objects&lt;br /&gt;&lt;li&gt;there is potentially a massive amount of events&lt;br /&gt;&lt;li&gt;events potentially need to be stored for&lt;br /&gt;     an extended period of time&lt;br /&gt;&lt;li&gt;quick review of at least the current event data&lt;br /&gt;     (today, past week) is often desired&lt;br /&gt;&lt;li&gt;there exists a lot of noise data&lt;br /&gt;&lt;li&gt;the data needs to be fed into backend processes,&lt;br /&gt;     like billing systems&lt;br /&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-2449940997829516628?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img 
src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/qGG54dH0Z6k" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/2449940997829516628/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=2449940997829516628" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2449940997829516628?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2449940997829516628?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/qGG54dH0Z6k/typical-logging-problem-as-viewed-from.html" title="The typical logging problem as viewed from syslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.gerhards.net/2010/02/typical-logging-problem-as-viewed-from.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkAGQHk-eip7ImA9WxBWE0U.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-6085038118010205062</id><published>2010-02-05T16:42:00.002+01:00</published><updated>2010-02-05T16:45:21.752+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-05T16:45:21.752+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="syslog" /><category scheme="http://www.blogger.com/atom/ns#" term="time" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>on leap seconds and syslog</title><content type="html">I was recently asked how syslog handles leap seconds. 
I thought it would be useful to reproduce my thoughts, initially expressed via private mail, here in the blog.&lt;br /&gt;&lt;br /&gt;RFC5424 specifically forbids leap seconds, as during our discussions we found many cases where leap seconds caused grief. I also think that the ITU is considering abolishing leap seconds for this reason as well. To the best of my knowledge, GPS also does not use leap seconds. The ultimate reason to abandon UTC leap seconds in syslog was that we failed to identify an operating system that would expose leap seconds to a user process. So a syslogd or any other syslog sender would not even be able to see that one was introduced. From the syslog perspective, a leap second is just like any other second, but time flows "somewhat slower". I guess we are in the same boat as many operating systems with this perspective.&lt;br /&gt;&lt;br /&gt;In RFC5424 we didn't explicitly state what time stamp should be written during a leap second - because we thought it could actually never happen (why? explained above!). But I would say that "Leap seconds MUST NOT be used" to me means that it should be expressed as the 59th second of said minute. But even if you bump the minute and use the 0 second, I cannot see how this should be problematic. On a single system, time should still evolve serially. For correlating events from multiple systems, the timestamp alone is insufficient in any case. You cannot synchronize the different real-time clocks closely enough. 
So you need a different mechanism (like Lamport clocks) for this in any case.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-6085038118010205062?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/rbK8hbtmWdw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/6085038118010205062/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=6085038118010205062" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6085038118010205062?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6085038118010205062?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/rbK8hbtmWdw/on-leap-seconds-and-syslog.html" title="on leap seconds and syslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.gerhards.net/2010/02/on-leap-seconds-and-syslog.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIBSX09cSp7ImA9WxBXFko.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-7489757320294803645</id><published>2010-01-28T12:14:00.003+01:00</published><updated>2010-01-28T12:19:18.369+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-28T12:19:18.369+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Helping to find major issues in rsyslog...</title><content type="html">Rsyslog has a very rapid 
development process, complex capabilities and now gradually gets more and more exposure. While we are happy about this, it also has some bad effects: some deployment scenarios have probably never been tested and it may be impossible for the development team to test them because of the resources needed. So while we try to avoid this, one may see a serious problem during deployments in demanding, non-standard environments (hopefully not with a stable version, but chances are good you'll run into trouble with the development versions).&lt;br /&gt;&lt;br /&gt;Active support from the user base is very important to help us track down those things. Most often, serious problems are the result of some memory misaddressing. During development, we routinely use valgrind, a very good and capable memory debugger. This helps us to create pretty clean code. But valgrind cannot detect everything; most importantly, it cannot check code paths that are never executed. So of most use for us is information about aborts and abort locations.&lt;br /&gt;&lt;br /&gt;Unfortunately, faults rooted in addressing errors typically show up only later, so the actual abort location is in an unrelated spot. To help track down the original spot, &lt;a href="http://www.gnu.org/software/hello/manual/libc/Heap-Consistency-Checking.html"&gt;libc later than 5.4.23 offers support&lt;/a&gt; for finding such errors, and possibly temporary relief from them, by means of the MALLOC_CHECK_ environment variable. Setting it to 2 is a useful troubleshooting aid for us. It will make the program abort as soon as the check routines detect anything suspicious (unfortunately, this may still not be the root cause, but hopefully closer to it). Setting it to 0 may even make some problems disappear (but it will NOT fix them!). With functionality comes cost, and so exporting MALLOC_CHECK_ without need comes at a performance penalty. However, we strongly recommend adding this instrumentation to your test environment should you see any serious problems. 
Chances are good it will help us interpret a dump better, and thus be able to craft a fix more quickly.&lt;br /&gt;&lt;br /&gt;In order to get useful information, we need some backtrace of the abort. First, you need to make sure that a core file is created. Under Fedora, for example, that means you need to have a "ulimit -c unlimited" in place.&lt;br /&gt;&lt;br /&gt;Now let's assume you got a core file (e.g. in /core.1234). So what to do next? Sending a core file to us is most often pointless - we need to have the exact same system configuration in order to interpret it correctly. Obviously, chances are extremely slim for this to be the case. So we would appreciate it if you could extract the most important information. This is done as follows:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;    &lt;li&gt;$gdb /path/to/rsyslogd /core.1234&lt;br /&gt;    &lt;li&gt;$info thread&lt;br /&gt;    &lt;li&gt;you'll see a number of threads (in the range 0 to n with n being&lt;br /&gt;      the highest number). For each of them, do the following (let's assume &lt;br /&gt;      that i is the thread number):&lt;br /&gt;          &lt;ul&gt;&lt;br /&gt;          &lt;li&gt;$ thread i (e.g. thread 0, thread 1, ...)&lt;br /&gt;          &lt;li&gt;$bt&lt;br /&gt;          &lt;/ul&gt; &lt;br /&gt;    &lt;li&gt;then you can quit gdb with "$q" &lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Then please send all information that gdb spit out to the development team. It is best to first ask on the forum or mailing list how to do that. The developers will keep in contact with you and, I fear, will probably ask for other things as well ;)&lt;br /&gt;&lt;br /&gt;Note that we strive for highest reliability of the engine even in unusual deployment scenarios. Unfortunately, this is hard to achieve, especially with limited resources. So we are depending on cooperation from users. 
This is your chance to make a big contribution to the project without the need to program or do anything else except get a problem solved ;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-7489757320294803645?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/xq1Rk_1c9ZA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/7489757320294803645/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=7489757320294803645" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/7489757320294803645?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/7489757320294803645?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/xq1Rk_1c9ZA/helping-to-find-major-issues-in-rsyslog.html" title="Helping to find major issues in rsyslog..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2010/01/helping-to-find-major-issues-in-rsyslog.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU8HRXk7cSp7ImA9WxBXFUo.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-1417090886068290575</id><published>2010-01-27T07:29:00.000+01:00</published><updated>2010-01-27T07:30:34.709+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-27T07:30:34.709+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Tools to detect stack addressing problems?</title><content type="html">Since I began to use the valgrind memory debugger routinely in &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt; development (some two years ago), the quality of the source has increased considerably. Unfortunately, however, valgrind is not able to detect problems related to misaddressing variables on the stack. The 5.3.6 bug I was hunting for almost a week is a good example of this. Valgrind also provides only limited support for global data, as far as I know (and see from testing results).&lt;br /&gt;&lt;br /&gt;This becomes an even more important restriction as I moved a lot of former heap memory use to the stack for performance reasons. I remember at least one more major bug hunting effort that was hard because the bug affected only stack space.&lt;br /&gt;&lt;br /&gt;So I am currently looking for tools that could complement valgrind by providing good stack checking capabilities. As one tool, mudflap was suggested to me. It sounds interesting, but gives me a very hard time [very hard to read debug output (no symbolic names for dynamically loaded modules), (false?) 
reports for areas where I cannot see anything wrong, as well as frequent (threading-related?) crashes when running under instrumentation]. Maybe I am just misinterpreting the output...&lt;br /&gt;&lt;br /&gt;In short: I would highly appreciate suggestions for tools that can help with debugging stack memory access (global data would be a plus) - and/or instructions on how to interpret mudflap, if that is considered to be *the* tool for that use case.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Rainer&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-1417090886068290575?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/MMyWH4SS13s" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/1417090886068290575/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=1417090886068290575" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1417090886068290575?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1417090886068290575?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/MMyWH4SS13s/tools-to-detect-stack-adressing.html" title="Tools to detect stack addressing problems?" 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2010/01/tools-to-detect-stack-adressing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MFQ3c5eip7ImA9WxBTE0U.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-5002840372499348754</id><published>2009-12-09T18:34:00.003+01:00</published><updated>2009-12-09T18:56:52.922+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-09T18:56:52.922+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog - feature "schedule"</title><content type="html">Every now and then somebody asks what "release schedule" I have in mind for future rsyslog releases. Today, it happened again, so I'll do my usual blog post ;). Long-time readers of this blog will know that this is a snapshot of what I have in mind - open source development is quite dynamic and so what I actually implement can be quite different - and has been so in the past. It may be a good time to read my blog post describing &lt;a href="http://blog.gerhards.net/2009/11/priorities-for-rsyslog-work.html"&gt;how I assign priorities to rsyslog work&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For the foreseeable future, I have two primary things in mind: one is a set of tools to gain insight into rsyslog's inner workings &lt;b&gt;while&lt;/b&gt; it is running. This includes statistics, but goes beyond (still, a recent forum post on &lt;a href="http://kb.monitorware.com/can-rsyslog-log-statistics-about-the-events-processed-t9862.html"&gt;rsyslog statistics&lt;/a&gt; may be an interesting read). 
This feature is of interest to the community at large, but it is also something that I need in order to do some in-depth performance analysis, plus it is a really great debugging aid. As such, I intend not only to provide the glue inside rsyslog, but also to create a full-blown GUI so that the power can actually be used. If nothing gets in the way, this is my top priority for new work (I intended to begin with it during summer time, but then more important things got in its way - but now it is becoming really pressing...).&lt;br /&gt;&lt;br /&gt;The next feature I have in mind is a change to the configuration language, which may also include some core changes. The community complains rightfully that rsyslog's configuration is a real pain. It is extremely hard to configure some of the most advanced features - even I need to think hard about how to create some desired results. This is a result of the growth of rsyslog. When the current config system was invented (some three years ago?), we had a handful of low-power commands. This has dramatically changed. For some time, I intended to replace the config language solely by a scripting language. This I no longer believe in. A full-blown scripting language would be a very desirable enhancement, but the base configuration must be done without it (this is also a performance issue). Redoing the config language includes untangling some of the inner workings and adding more flexibility. I have been working towards that goal for roughly two and a half months now and that part went well. Now I need to do the next step. I expect that a new config format requires at least a month, more realistically two, to materialize. But adding more features with the current config system is of limited use, because only "expert experts" could configure them. 
But while the config is important, it is in the second spot on my todo list, right after the GUI and diagnostics tools.&lt;br /&gt;&lt;br /&gt;For GUI and diagnostics, I expect at least another two months to get to something decent. Adding up these numbers, I really have not yet thought about what the next larger features could be that I intend to implement. If all goes well, I can think about this in spring.&lt;br /&gt;&lt;br /&gt;Also, I am currently quite busy with some other, paid, projects. So the time I can spend on rsyslog at the moment is limited. I devote much of this time to fixing bugs, with a primary goal being to get v5 finally ready for prime time (it looks good, but we are not yet fully there).&lt;br /&gt;&lt;br /&gt;Also, I notice that the adoption rate is increasing. I see that in a large growth in support requests on both the mailing list and the forum. This is good news, but the bad news is that there are only a few frequent contributors. So there are a lot of things that I need to take care of myself, and this needs increasingly more time - time that I obviously do not have for bug-fixing or developing new features. To get things a bit balanced, I have stopped responding to some questions: those that I think a quick Google search can answer and those that obviously have a primarily commercial background. I'd like to respond to everything - but unfortunately, I simply do not have the time (if I did, rsyslog development would be totally stalled).&lt;br /&gt;&lt;br /&gt;As I said, this is just a snapshot of how things look. Maybe tomorrow a sponsor shows up that changes my todo list considerably (we had only very few occurrences of such, but we thankfully had ;)). Even with a sponsor, I am tied up with work for the rest of this year, then I have a little vacation, some more paid work, so that I think I can begin working on larger features mid-January, maybe a bit later. 
Bottom line: don't take any "schedule" for granted, but I hope you get an overall idea of how things evolve. And: please continue to send in bug reports and feature requests. Feature requests are very important - I use them (and their occurrence rate!) to judge how much demand for a feature there is in the community.&lt;br /&gt;&lt;br /&gt;Happy syslogging!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-5002840372499348754?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/yn_okOJAQyI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/5002840372499348754/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=5002840372499348754" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/5002840372499348754?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/5002840372499348754?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/yn_okOJAQyI/rsyslog-feature-schedule.html" title="rsyslog - feature &quot;schedule&quot;" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/12/rsyslog-feature-schedule.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CU4FQ3g6eyp7ImA9WxNaEEU.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-8209472828764076132</id><published>2009-11-24T18:05:00.004+01:00</published><updated>2009-11-24T18:31:52.613+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-24T18:31:52.613+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="performance" /><category scheme="http://www.blogger.com/atom/ns#" term="syslog" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog multithreading</title><content type="html">From time to time, I receive questions on how many cores rsyslog can use on a highly parallel system. Rsyslog is massively multi-threaded, but that does not necessarily mean that each configuration, and even each use case, can actually benefit from it.&lt;br /&gt;&lt;br /&gt;The most important thing to gain a speedup from parallelism is the ability to break down the workload (this is called "partitioning") and distribute it to a set of threads, which then can work in parallel on each part.&lt;br /&gt;&lt;br /&gt;For the partitioning to work well, the workload, and configuration, must be "partitionable". Let me give a counter-example. If you have a single sender and a single action (yes, this sometimes is the case!), there cannot be much parallelism. Such a config looks like this (using imtcp as an example here):&lt;br /&gt;&lt;br /&gt;$TCPServerRun 10514&lt;br /&gt;*.* /path/to/file&lt;br /&gt;&lt;br /&gt;This cannot gain much, because we have one thread for the TCP receiver, one thread for the filtering and one for the output. With the queue engine, we can increase the number of threads that will work on filters in parallel, but these have almost nothing to do in any case. 
We can not, however, walk in parallel into the output action, because a) the output plugin interface guarantees that only one thread hits a plugin at one time and b) it wouldn't make much sense here in any case: what would it help if we had hit the output twice and then need to synchronize the file access? Not much...&lt;br /&gt;&lt;br /&gt;So the bottom line is that a configuration like the one above is highly sequential in nature and consequently there is almost no gain by running some of the tasks concurrently. So, out of the box, rsyslog gains speedup from parallel processing in more complex cases, with more complex rules and many of them.&lt;br /&gt;&lt;br /&gt;We are working to provide excellent speedup even for sequential configurations. But this is a long and complex road. For example, in v5 we have now de-coupled message parsing from the receiver thread, resulting in somewhat improved speedup for sequential configs like the one above. Also, we have added batching support in v5, which reduces some of the overhead involved with multiple threads (overhead that otherwise reduces the gain we could potentially have). And in late v4 builds we introduced the ability to do double-buffered block i/o for output files, which can considerably reduce i/o overhead for high end systems and also runs in pipeline mode, squeezing a bit more parallelism out of the sequential job.&lt;br /&gt;&lt;br /&gt;So with the newer engines, we have been able to apply a basic processing pipeline that looks like&lt;br /&gt;&lt;br /&gt;input -&gt; parse &amp; filter -&gt; generate file data -&gt; write&lt;br /&gt;&lt;br /&gt;which can be done in parallel. Of course, the file write is action-specific, but I guess you get the idea. What you need to do, however, is configure all that. And even then, you can not expect a 4-fold speedup on a quad core system. 
I'd say you can be happy if the speedup is around 2, depending on a lot of factors.&lt;br /&gt;&lt;br /&gt;To get to higher speedups, the job must be made more parallel. One idea is to spread the input, e.g. run it on four ports, then create four rulesets with ruleset queues for each of the inputs. Ideally, to solve the file bottleneck, these should write into four different files. While I did not have the opportunity to test this out in an actual deployment, that should gain a much larger speedup, because now we have four of these pipelines running in parallel, on partitioned data where there is no need to synchronize between them.&lt;br /&gt;&lt;br /&gt;Well, almost... The bad news is that the current code base (5.5.0 as of this writing) does unfortunately not yet provide the ability to run the input on more than one thread. So if you have 1000 tcp connections, all of these need to be processed by a single thread (even though they may use different ports, that doesn't matter...). It is not as bad as it sounds, because the input now is *very* quick (remember the parsing is done concurrently in a different thread [pool!]). But still it causes some loss of parallel processing where not strictly needed. My thinking is that we should either do a "one thread per connection" server (not any longer such a big problem on 64bit machines) or (better but even more effort) do a thread pool for pulling data from the connections. Unfortunately, I do not have time to tackle that beast, but maybe someone is interested in sponsoring that work (that would be *really* useful)?&lt;br /&gt;&lt;br /&gt;As you can see, full speedup by using multiple cores is perfectly doable, but going to the max requires a lot of careful thinking. And, of course, I have to admit that the best features are present in the newest releases (somewhat naturally...). 
Obviously, there is some stability risk involved with them, but on the other hand I had some very good success reports from some high-end sites, at least one of them has v5 already deployed in large-scale production.&lt;br /&gt;&lt;br /&gt;I could only touch on the issue here, but I hope the information is useful. For further reading, I recommend both the doc on queues, as well as &lt;a href="http://www.rsyslog.com/doc-queues_analogy.html"&gt;my explanation on how messages are processed in rsyslog&lt;/a&gt;. These documents are somewhat older and do not cover all details of pipeline processing (which simply did not exist at that time), but I think they will be very useful to read. And, yes, updating them is another thing on my too-long todo list...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-8209472828764076132?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/qiMW6dO1vG4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/8209472828764076132/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=8209472828764076132" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8209472828764076132?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8209472828764076132?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/qiMW6dO1vG4/rsyslog-multithreading.html" title="rsyslog multithreading" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/rsyslog-multithreading.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU4DSH4zfip7ImA9WxNbF00.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-5514937649461676181</id><published>2009-11-20T09:42:00.004+01:00</published><updated>2009-11-20T10:06:19.086+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-20T10:06:19.086+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog internal messages</title><content type="html">I had an interesting conversation with someone who runs multiple instances of &lt;a href="http://www.rsyslog.com"&gt;rsyslog &lt;/a&gt;on a machine for remote reception only and, for some reason, another syslogd for local messages. The question arose where rsyslog error messages are emitted to.&lt;br /&gt;&lt;br /&gt;It was expected that they showed up in the other syslogd. However, that is not the case, and for good reason. So I thought it is good to provide some general advice on how internal messages are emitted.&lt;br /&gt;&lt;br /&gt;First of all, internal messages are messages generated by rsyslog itself. The vast majority of them are error messages (like config error, resource error, unauthorized connect etc...), but there are also some status-like messages (like rsyslogd startup and shutdown, unexpectedly dropping tcp connection, ...). Traditionally, rsyslog does not make a distinction between status and error messages (we could change that over time, but so far nobody has asked, which means it is not worth the hassle).&lt;br /&gt;&lt;br /&gt;Rsyslogd is a syslogd, so all messages it emits internally are syslog messages. For obvious reasons, they use the "syslog" facility. And as all are flagged as error messages, the total priority is "syslog.err". 
The internal message source is implicitly bound to the default ruleset.&lt;br /&gt;&lt;br /&gt;It now depends on how that ruleset is defined where these messages show up. I strongly encourage everyone to include a rule that logs these messages. If there are, e.g., config issues, they can be easily solved by looking at the emitted error message. But if you do not have them, it can take you ages to sort out what is wrong.&lt;br /&gt;&lt;br /&gt;So you should always make sure that "syslog.err" (or probably better "syslog.*") is logged somewhere.&lt;br /&gt;&lt;br /&gt;If you now would like to use another syslogd to log these messages, but not rsyslog itself, you do what you usually do in this situation: first of all, make sure that no local rule logs syslog.* messages. Then, include a rule that forwards syslog.* to the recipient that you want to receive it. You have the full flexibility of the rule engine at hand to limit or reformat those messages. Note that an elegant solution to do both is including the following 2 lines at the top of rsyslog.conf (I assume you use UDP-forwarding to another syslogd running on the same host machine):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;syslog.* @127.0.0.1&lt;br /&gt;&amp; ~&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Note that the tilde character is the discard action.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-5514937649461676181?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/IhHL5Cz6EHY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/5514937649461676181/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=5514937649461676181" title="0 Comments" /><link rel="edit" type="application/atom+xml" 
href="http://www.blogger.com/feeds/6193377/posts/default/5514937649461676181?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/5514937649461676181?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/IhHL5Cz6EHY/rsyslog-internal-messages.html" title="rsyslog internal messages" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/rsyslog-internal-messages.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck4DRHk8fCp7ImA9WxNbFkg.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-889714488058239865</id><published>2009-11-19T18:08:00.002+01:00</published><updated>2009-11-19T18:16:15.774+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-19T18:16:15.774+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>disk assisted mode performance in rsyslog</title><content type="html">I would just like to clarify one thing: rsyslog supports disk assistance in which case messages are written to disk if the in-memory queue becomes full.&lt;br /&gt;&lt;br /&gt;However, it is generally bad if the system needs to go to the disk during normal operations. That is primarily meant for things like output targets going offline. If this happens during normal operations, one is probably lost. In the v3&amp;v4 engines, when disk mode is enabled, the in-memory worker threads are shut down. So all processing then takes place over the disk. That means processing will be slower than before. 
So if the system was incapable of handling the work load when running on a pure in-memory queue, it will definitely be incapable of handling it in disk mode.&lt;br /&gt;&lt;br /&gt;Note that things are different in recent v5 engines: starting with 5.3.5, the disk worker runs concurrently to the in-memory workers and as such the performance is similar to what it was in non-disk mode. Still, overall processing is probably slower, so going to disk is not a cure for a system that can not handle the overall workload. In v5, however, it may be a way to handle excess bursts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-889714488058239865?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/YDpfAX6emYw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/889714488058239865/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=889714488058239865" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/889714488058239865?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/889714488058239865?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/YDpfAX6emYw/disk-assisted-mode-performance-in.html" title="disk assisted mode performance in rsyslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/disk-assisted-mode-performance-in.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D08HRngyeSp7ImA9WxNbFk8.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-3742185388556600917</id><published>2009-11-19T10:41:00.005+01:00</published><updated>2009-11-19T11:17:17.691+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-19T11:17:17.691+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Priorities for rsyslog Work</title><content type="html">I receive requests for support and code additions to &lt;a href="http://www.rsyslog.com"&gt;rsyslog &lt;/a&gt;every day and I am grateful so many people express their interest and see rsyslog as a useful tool.&lt;br /&gt;&lt;br /&gt;The bottom line, unfortunately, is that I can not do everything and I also can not do many things as quickly as I would like to. Also, I have to admit, there are some things that I do not like to do, at least as a cost-free activity. The typical example is work that benefits only a single or small subset of commercial organizations.&lt;br /&gt;&lt;br /&gt;I suggest that you read a bit about my philosophy on how &lt;a href="http://blog.gerhards.net/2009/11/paying-for-open-source-projects.html"&gt;open source projects are paid&lt;/a&gt;. 
Note that "payment" includes far more things than money, for example good suggestions and bug reports.&lt;br /&gt;&lt;br /&gt;I tend to follow this priority scheme, with some variations:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;security-related issues&lt;br /&gt;&lt;li&gt;serious problems in the current release&lt;br /&gt;&lt;li&gt;serious problems in previous releases&lt;br /&gt;&lt;li&gt;paid work&lt;br /&gt;&lt;li&gt;things useful to the community at large&lt;br /&gt;&lt;li&gt;things useful to smaller parts of the community (with descending priority)&lt;br /&gt;&lt;li&gt;support for non-commercial folks&lt;br /&gt;&lt;li&gt;bugs in older releases already fixed in newer ones&lt;br /&gt;&lt;li&gt;activities aiding only commercial organizations&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;The term "things useful" is deliberately vague. Among others, it involves fixing bugs, adding new features and following up on support requests. However, support requests usually fall only in that category if either a bug is involved or I can gain some more insight into things that need to be changed (like better doc, general user needs, etc...).&lt;br /&gt;&lt;br /&gt;Note that, in line with my philosophy, I try to avoid doing work for free that only benefits a commercial party, but neither me personally nor the project. 
If you find this harsh, read &lt;a href="http://blog.gerhards.net/2009/11/paying-for-open-source-projects.html"&gt;my in-depth explanation&lt;/a&gt; of that philosophy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-3742185388556600917?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/1Xi9RMJFwHM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/3742185388556600917/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=3742185388556600917" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3742185388556600917?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3742185388556600917?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/1Xi9RMJFwHM/priorities-for-rsyslog-work.html" title="Priorities for rsyslog Work" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/priorities-for-rsyslog-work.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEMFQHg_cSp7ImA9WxNbF08.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-1611336488146268055</id><published>2009-11-19T10:38:00.005+01:00</published><updated>2009-11-20T15:13:31.649+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-20T15:13:31.649+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category 
scheme="http://www.blogger.com/atom/ns#" term="philosophy" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Paying for Open Source Projects...</title><content type="html">I selected the word "paying" in this post's title deliberately. Of course, open source software usually is (and should be) cost-free to all interested parties, but that does not mean there comes no price tag whatsoever with it.&lt;br /&gt;&lt;br /&gt;As an open source author I need to admit that it is virtually impossible to give away everything without any price. "Price", in my perception, does not necessarily mean "money". There are many benefits you may gain from working on software, and money is only one of them.&lt;br /&gt;&lt;br /&gt;But first of all, let me re-iterate the &lt;a href="http://www.gnu.org/philosophy/free-sw.html"&gt;FSF's "freedom vs. free beer"&lt;/a&gt; argument, in which I fully believe:&lt;br /&gt;&lt;blockquote&gt;"Free software" is a matter of liberty, not price. To understand the concept, you should think of "free" as in "free speech," not as in "free beer."&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;This is very true. In my personal mind, I would really love to give away any work I create to those that need it. But that thought involves some subtle issues. One prominent problem is that other people may think differently. For example, my landlord doesn't like this idea. Nor does my bakery. Not even the computer manufacturer, on whose system I develop my software! What a shame! So if I gave away everything for free, I would, thanks to the social security system, probably not die, but I would definitely not have a machine to create those things I would like to provide for free.&lt;br /&gt;&lt;br /&gt;So it looks like I need to make a compromise, give away certain things and charge for others. One approach would be to quit computing as a profession and become a gardener instead. 
In my spare time I could then program and give away everything for free. The bottom line is that I could program much less than I can do currently. Also, I prefer programming over gardening. So this does not look like a good approach - neither for me personally (the then-unhappy gardener) nor for society at large (who can no longer gain the full benefit of my work: believe me, I am far more productive as a programmer as opposed to a gardener...).&lt;br /&gt;&lt;br /&gt;So this seems to be the wrong approach. It naturally follows that I need to charge for some of the computing work I do. &lt;br /&gt;&lt;br /&gt;Then, look at my motivation as an open source developer. I'd like to make the world a little bit of a better place, providing useful tools. And, if I am honest, I may even like to get a little bit of fame as a recognized open source developer. I guess that motivates many, but few admit to it ;) This hits a sweet spot of "payment": being recognized feels good and thus it keeps me motivated. Seeing the project grow and spread also motivates me. Projects where there is no feedback and which do not grow are usually quickly abandoned. Why? Because not even the most basic "payment" is provided in exchange for the work done.&lt;br /&gt;&lt;br /&gt;So a very important form of "payment" to open source authors, at least in my point of view, are contributions to the project, help in spreading news about it, and, (very, very valuable) good bug reports. Everything that helps push a project and make it evolve. Of course contributions in any form are also happily accepted (be it software, hardware, books, ..., and of course money). Money is not evil. 
It pays the electricity to run my machine, among others.&lt;br /&gt;&lt;br /&gt;Taking the arguments together, there is no ideal world where I can give away everything and receive in exchange whatever I need (and, I barely remember, experiments in trying this failed miserably...).&lt;br /&gt;&lt;br /&gt;With that on my mind, I begin to divide the world in "friends" and "foes". Friends are those that provide me with some form of "payment", that is anything that is useful for me. Good examples are the folks that write the open source software *I* use (aha, this is cyclic!), folks that provide good bug reports and try out development versions etc. Any activity that I can also use to my benefit makes you my friend. &lt;br /&gt;&lt;br /&gt;Then, there are "foes". That word probably is too harsh and maybe should be re-phrased as "non-friends". But the term and the idea are well known.&lt;br /&gt;&lt;br /&gt;If you are not my friend, you do not contribute anything that I can use for my benefit. This doesn't mean you are a bad guy. If you and I do not have anything in common, why should you do something that benefits me? There are far more people that I never provided any benefit to than there are people where I did. I guess that is true for almost all of us except a few outstanding people (which then usually receive admiration as a sort of "payment").&lt;br /&gt;&lt;br /&gt;But if you are not my friend, you should not expect from me that I do anything for free for you. Envision a stranger comes to your home and asks you if you would like to help him build his home. I guess you will be astonished and probably ask "Why should I do that?". Now assume the sole answer is "Because that is good for me, the stranger, but you need to bring your own clothes and tools and need to pay the gas to come to my home". Would you be willing to help that guy out? 
I guess, the answer would be "no" in almost all cases.&lt;br /&gt;&lt;br /&gt;So why should I as an open source developer create software for or otherwise help a non-friend? Why am I supposed to say "yes, of course" if a stranger asks me "Can you implement this and that, but you need to pay for your own hardware and the electricity used and also for..."? The answer is: I am not! So don't silently expect me to do that.&lt;br /&gt;&lt;br /&gt;Of course, the question itself may have made you my friend. How come? Simple: the idea you propose may be a very useful idea for my project. If it gets implemented, it will help many of my currently existing friends and it will eventually help spread the project. So by providing the idea itself, you did me a big favor, which one may consider as a form of "payment". Consequently, I often implement things asked for by complete strangers. And I often try to help out complete strangers on the mailing list and on other support channels. Here, I often learn a great deal about what is good and bad about my projects. This is a very valuable form of "payment" for me.&lt;br /&gt;&lt;br /&gt;HOWEVER, and this is my personal limit, whenever I am asked to do something for free, I evaluate *my* benefit in doing so. Of course, this includes the benefit to the project and the benefit to the community at large, but this all goes into the picture of "my" benefit as the sum of all that.&lt;br /&gt;&lt;br /&gt;So if a complete stranger asks me to do something, I check for immediate benefits in doing that. Sometimes, there are cases where I can see a benefit, but only to that stranger. Usually, these are things corporate guys need, and they are very special and non-generic. If there is no benefit at all, I simply do not look any further. Of course, the proper solution here is that those folks can actually pay money to make me implement whatever they need. 
The logic behind this is that when they pay money, they help fund activities that also benefit the project at large. But if they are corporate guys, and they do not get any money approved for what they (think they) need, they don't really need it at all! Because if it were really useful for their corporation, they would have received the money grant (corporations are very good at making these trade-offs, though they occasionally fail ;)). So in short, the money is even a filter that prevents me from spending time on things that nobody really needs!&lt;br /&gt;&lt;br /&gt;If a friend comes along and asks me to do something, I still need to evaluate the request. But I am much more likely to implement the functionality requested (it's a game of "give and take"). Of course, I need to evaluate the overall priority for my project here, too. But friends definitely receive a priority boost if at all possible. And I think this is only fair.&lt;br /&gt;&lt;br /&gt;In general, I favor requests that are useful to the community at large over those that are only useful to a small subset of it. I tend not to implement without any form of "hard" payment (hardware, money, a nice vacation on Hawaii... ;)) anything that is only useful to a single commercial organization. For example, why should I provide free services to a company that expects me to pay, e.g. the utility bill? If you do not give your services to me for free, don't expect me to give my time for free to &lt;span style="font-weight:bold;"&gt;just &lt;/span&gt;your benefit (think about the "stranger asking for my help on building his home" analogy).&lt;br /&gt;&lt;br /&gt;My thoughts may sound very material, but in fact they just describe what I think is fair in the non-perfect world we live in. Remember that most non-profit organizations are my friends, because they offer useful service to "me" (as part of the community). 
And think about my thoughts in the intro of this blog post about my inability to do any useful work at all if I did NOT have a somewhat material point of view. So, honestly, I think my philosophy here is not actually "material" but rather a result of how life is...&lt;br /&gt;&lt;br /&gt;Edit: it may also be useful to have a look at my blog post "&lt;a href="http://blog.gerhards.net/2008/08/work-and-personality.html"&gt;work, friends and personality&lt;/a&gt;", which looks at a very similar issue from a slightly different angle.&lt;br /&gt;&lt;br /&gt;The philosophy also influences priority decisions in my open source projects, as outlined for example in "&lt;a href="http://blog.gerhards.net/2009/11/priorities-for-rsyslog-work.html"&gt;rsyslog work priorities&lt;/a&gt;".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-1611336488146268055?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/tYFl-ewN3XY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/1611336488146268055/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=1611336488146268055" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1611336488146268055?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1611336488146268055?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/tYFl-ewN3XY/paying-for-open-source-projects.html" title="Paying for Open Source Projects..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/paying-for-open-source-projects.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkUARXo8cSp7ImA9WxNbE0U.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-6198447018256814342</id><published>2009-11-16T15:01:00.001+01:00</published><updated>2009-11-16T15:04:04.479+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-16T15:04:04.479+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>ACLs, imudp and accepting messages</title><content type="html">I am working again on moving the DNS name resolution outside of the input thread of those sources where this is potentially time-consuming and affecting message acceptance rates. As it turned out, currently imudp seems to be the only case.&lt;br /&gt;&lt;br /&gt;While this is potentially easy to do, a problem is ACLs ($AllowedSender) which use system names rather than ip addresses. In order to check these ACLs, we need to do a DNS lookup. Especially in the case of UDP, such a lookup may actually cause message loss and thus may be abused by an attacker to cause a certain degree of denial of service (which also points out that these types of ACLs are not really a good idea, even though requested by practice).&lt;br /&gt;&lt;br /&gt;In the light of this, I will now do something that sounds strange at first: I will always accept messages that require DNS lookups and enqueue these into the main queue and do the name resolution AND the final name-based ACL check only on the queue consumer part. 
Please note that it will be done BEFORE message content is parsed, so there is no chance that buffer overflow attacks can be carried out from non-authenticated hosts. The core idea is to move the lengthy, potentially message-loss causing code, away from the input thread. The only questionable effect I can currently see is that queue space is potentially taken up by messages which will immediately be discarded and should not be there in the first place. At the extreme end, that could lead to loss of valid messages. But on the other hand valid messages are more likely to be lost by the DNS name query overhead if I do the ACL check directly in the input thread.&lt;br /&gt;&lt;br /&gt;If anyone has an argument against this approach please let me know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-6198447018256814342?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/tg8TcPiVkMA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/6198447018256814342/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=6198447018256814342" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6198447018256814342?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6198447018256814342?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/tg8TcPiVkMA/acls-imudp-and-accepting-messages.html" title="ACLs, imudp and accepting messages" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" 
/></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/acls-imudp-and-accepting-messages.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEUHQ34_eip7ImA9WxNUFU0.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-3401302533709068717</id><published>2009-11-06T12:02:00.005+01:00</published><updated>2009-11-06T12:17:12.042+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-06T12:17:12.042+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="syslog" /><category scheme="http://www.blogger.com/atom/ns#" term="logging" /><category scheme="http://www.blogger.com/atom/ns#" term="log analysis" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>A solution for invalid syslog message formats...</title><content type="html">&lt;span style="font-weight: bold;"&gt;In syslog, we traditionally have a &lt;/span&gt;&lt;a style="font-weight: bold;" href="http://www.monitorware.com/en/workinprogress/needle-in-haystack.php"&gt;myriad of message formats&lt;/a&gt;&lt;span style="font-weight: bold;"&gt;, causing &lt;/span&gt;&lt;a style="font-weight: bold;" href="http://www.rsyslog.com/doc-syslog_parsing.html"&gt;lots of trouble&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; in real-world deployments.&lt;/span&gt; There are a number of industry efforts underway trying to find a common format. To me, it currently does not look like one of them has received the necessary momentum to become "&lt;span style="font-weight: bold;"&gt;the&lt;/span&gt;" dominating standard, so it looks like we need to live with various presentations of the same information for some more time.&lt;br /&gt;&lt;br /&gt;The past two weeks, I have begun to make additions to &lt;a href="http://www.rsyslog.com/"&gt;rsyslog&lt;/a&gt; that hopefully will help solve this unfortunate situation. 
I know that I have no real cure to offer, but at least I can offer baby steps toward one. I have introduced so-called &lt;a href="http://www.rsyslog.com/doc-messageparser.html"&gt;message parsers&lt;/a&gt;, which can be used to convert malformed messages into rsyslog's well-formed internal structure.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why is it not a solution?&lt;/span&gt; Because what I really introduced is an interface that permits writing different parsers for the myriad of devices. I have not provided a generic solution to do that, so the individual parsers still need to be written. And secondly, I have not yet defined any more standard properties than those specified in the recent IETF syslog RFC series, most importantly &lt;a href="http://tools.ietf.org/html/rfc5424"&gt;RFC5424&lt;/a&gt;.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;So why do I hope this will lead to a long-term solution?&lt;/span&gt; First of all, there is some hope that the IETF effort will bring more standard items. Also, we could embed other specifications within the RFC5424 framework, so this could become the lingua franca of syslog message content over time. And secondly, I hope that rsyslog's popularity will help in getting parsers at least for core RFC5424 information objects, which would be the basis for everything else. Now we have the capability to add custom parsers, and we have an interface that third parties can develop against (and do so with relative ease).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;All in all, I think this development is a huge step in the right direction&lt;/span&gt;. The rest will be judged by history ;) To me, probably the most interesting question is whether we will actually attract third-party developers. 
If there are any, I will definitely help get them going with the rsyslog API.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-3401302533709068717?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/tybrAKQpOE0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/3401302533709068717/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=3401302533709068717" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3401302533709068717?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3401302533709068717?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/tybrAKQpOE0/solution-for-invalid-syslog-message.html" title="A solution for invalid syslog message formats..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/11/solution-for-invalid-syslog-message.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEUHQ3c_cCp7ImA9WxNVFks.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-6538338416383361024</id><published>2009-10-27T17:31:00.003+01:00</published><updated>2009-10-27T17:50:32.948+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-27T17:50:32.948+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>next round of performance enhancement in rsyslog</title><content type="html">Today, I made a very important change to rsyslog: &lt;a href="http://www.rsyslog.com/doc-rsconf1_rulesetcreatemainqueue.html"&gt;rulesets can now have their own "main" queue&lt;/a&gt;. This doesn't sound too exciting, but it can offer dramatic performance improvements.&lt;br /&gt;&lt;br /&gt;When rsyslog was initially created, it followed the idea that messages must be processed in the order they were received. To facilitate that, all inputs submitted messages to a single main message queue, from which processing took place. So messages stayed in reception order. ... Well, actually they stayed in "enqueued order", because it depended on the OS scheduler whether input modules could really enqueue in the order they received. If, for example, input A received two messages but was preempted by module B's message reception, B's data could hit the queue earlier than A's. As rsyslog supported more and more concurrency, the order of messages became ever less important. 
The real cure for ordered delivery is to look at high-precision timestamps and build the sort order based on them (in the external log analyzer/viewer).&lt;br /&gt;&lt;br /&gt;So, in essence, reception order has never worked well, and the requirement to try to keep it was dropped long ago. That also removed one important reason for the single main message queue. Still, it is convenient to have a single queue, as its parameters can be set once and for all.&lt;br /&gt;&lt;br /&gt;But a single queue limits concurrency. In the parallel processing world, we try to partition the input data as much as possible so that the processing elements can work independently on the data partitions. All data received by a single input is a natural data partition. But the single main queue merged all these partitions again and caused performance bottlenecks via lock contention. "Lock contention", in simple words, means that threads need to wait for exclusive access to the queue.&lt;br /&gt;&lt;br /&gt;This has now been solved. Today, I added the ability to create ruleset-specific queues. In rsyslog, the user can decide which ruleset is bound to which inputs. For a highly parallel setup, each input should have its own ruleset and each ruleset should define its own "main" queue. In that setting, inputs no longer block each other during queue access. On a busy system with many inputs, the results can be dramatic. And, more as a side effect, each ruleset is now processed by its own dedicated rule processing thread, totally independent of the others.&lt;br /&gt;&lt;br /&gt;This design offers a lot of flexibility. But that is not enough. The next step I plan is to add the ability to submit a message to a different ruleset during processing. That way, hierarchies of rulesets can be created, and these rulesets can even be executed via separate thread pools, with different queue parameters and in full concurrency. 
And the best part is that I currently think it will not be very hard to create the missing glue.&lt;br /&gt;&lt;br /&gt;The only really bad thing is that the current configuration language is really not well-suited to handle that complexity ("really not" is not a typo for "not really"...). But I have no alternative but to take this route again, until I finally find time to create a new config language. The only good thing is that I am getting a better and better understanding of what this new language must be able to do, and it looks like my initial thoughts were not up to what is now required...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-6538338416383361024?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/rn2lDsyO8dg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/6538338416383361024/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=6538338416383361024" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6538338416383361024?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6538338416383361024?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/rn2lDsyO8dg/next-round-of-performance-enhancement.html" title="next round of performance enhancement in rsyslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/10/next-round-of-performance-enhancement.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUUMQ307fCp7ImA9WxNWE0k.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-3290046591006565366</id><published>2009-10-12T12:27:00.000+02:00</published><updated>2009-10-12T12:28:02.304+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-12T12:28:02.304+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Canonical Paper on RSyslog</title><content type="html">I just found out that Canonical (the company behind Ubuntu) published a nice paper on rsyslog, which also explains why &lt;a href="http://www.ubuntu.com/system/files/CentralLogging-v4-20090901-03.pdf"&gt;Ubuntu chooses rsyslog as its default syslogd&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The paper is well-written and well-researched, but rsyslog has also evolved while the paper was being written. 
So in fact, it offers even more features than described in the paper.&lt;br /&gt;&lt;br /&gt;And, obviously, I am glad to see Ubuntu move to rsyslog as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-3290046591006565366?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/mXEx0k6dBpw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/3290046591006565366/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=3290046591006565366" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3290046591006565366?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/3290046591006565366?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/mXEx0k6dBpw/canonical-paper-on-rsyslog.html" title="Canonical Paper on RSyslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/10/canonical-paper-on-rsyslog.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUFSXs9fSp7ImA9WxNXGE8.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-8446796541258965199</id><published>2009-10-06T11:51:00.003+02:00</published><updated>2009-10-06T12:16:58.565+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-06T12:16:58.565+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="logging" /><category 
scheme="http://www.blogger.com/atom/ns#" term="windows event log" /><category scheme="http://www.blogger.com/atom/ns#" term="windows" /><title>Will Microsoft remove the Windows Software RAID?</title><content type="html">These days, hardware RAIDs are quite inexpensive. So everybody is moving towards them. However, all mainstream operating systems still support software RAIDs, maybe even for a good reason: an OS-controlled software RAID may be a bit easier to optimize under some circumstances. Anyhow, Microsoft seems to be moving away from that feature set:&lt;br /&gt;&lt;br /&gt;As you probably know, Adiscon provides premier &lt;a href="http://www.monitorware.com"&gt;Windows event log processing &lt;/a&gt;solutions. Some of our customers use the products, for example, to monitor whether their RAIDs break. And some of them use software RAIDs. So we wrote a nice article on how to monitor &lt;a href="http://www.eventreporter.com/common/en/articles/software_raid_monitoring_windows_2003.php"&gt;RAID health using the Windows Event Log&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Since the days of Windows NT 3.1 (or was it 3.5?), Windows has logged an error message if the RAID failed. Actually, I'd consider this necessary functionality for any working RAID solution. Why? Well, if the RAID solution works, you will not notice that a disk has died. So if nobody tells you, you'll continue to use the system as usual, not suspecting anything bad. So guess what - at some time the next disk fails and then (assuming the usual setup) you'll be "notified" by the disk system, with those nice unrecoverable i/o errors. So without any health alerts, a RAID system is virtually useless.&lt;br /&gt;&lt;br /&gt;We learned that Windows Server 2008's RAID system no longer issues these alerts! (aka "is useless" ;)). So a long while ago, we reported this to Microsoft. The bug went through several stages of escalation. A few minutes ago, my co-worker got a call from the frontline Microsoft tech. 
He told him that, regrettably, Microsoft won't fix this issue. According to him, Microsoft has confirmed this to be a bug, and the group responsible for ftdisk has confirmed that it should be fixed, but someone more powerful up in the hierarchy has opted not to do that. Boom. The tech tried to persuade us to switch to a hardware RAID, but actually that was not the point of the support call ;) &lt;br /&gt;&lt;br /&gt;What does that mean? To me, it looks like Microsoft is actually moving away from providing software RAID. How else can one explain that there is no interest in providing any error message at all if something goes wrong with the RAID? Given the wide availability of hardware RAIDs (which, btw, provide proper alerting), this step does not look illogical. But do they really want to leave Linux as the only widely deployed mainstream operating system that provides software RAID? Or do they intend to keep it on the feature sheet, but provide a dysfunctional solution like in Windows Server 2008?&lt;br /&gt;&lt;br /&gt;Let's stay tuned and see what the future brings...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-8446796541258965199?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/2Wcx8C16hWw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/8446796541258965199/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=8446796541258965199" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8446796541258965199?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8446796541258965199?v=2" /><link rel="alternate" type="text/html" 
href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/2Wcx8C16hWw/will-microsoft-remove-windows-software.html" title="Will Microsoft remove the Windows Software RAID?" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/10/will-microsoft-remove-windows-software.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4DRH45cSp7ImA9WxNXF04.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-6051991532157924422</id><published>2009-10-05T12:10:00.003+02:00</published><updated>2009-10-05T12:36:15.029+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-05T12:36:15.029+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Another note on hard-to-find-bugs...</title><content type="html">Before I began to write this blog post, I realized how long it has been since I last wrote anything! I promise to write in a more timely manner from now on, but the past weeks were merely a consolidation phase, ironing out bugs from the new releases.&lt;br /&gt;&lt;br /&gt;I'd like to elaborate on one of these, one that really drove me crazy over the past days. The problem was that omfile's directory creation mode was sometimes set to zero (well, almost always in some configurations). What began as a minor nit turned into a real nightmare ;)&lt;br /&gt;&lt;br /&gt;The issue was that the variable fDirCreateMode was always set to zero, except if it was written to at the start of module initialization or when it was simply displayed at the start of module initialization. That sounded strange, but even stranger was that moving the variable definition around in the source code (and thus presumably changing its memory location) changed nothing. 
So I came to a point where I used this code as a patch:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/omfile.c;h=c938f18c554f297bb4e9897a02785829f90ce26a;hb=36bfaf63485a444d58ca359191377b6694720a37#l769"&gt;omfile.c from rsyslog git&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Look at line 769. With that seemingly unrelated write, the variable stayed as expected. However, if I moved the write to a function, nothing worked again. Strange... After committing the patch, testing showed that the directory permissions now worked well BUT the file create mode now misbehaved in the same way.&lt;br /&gt;&lt;br /&gt;I was stunned - and speechless. What followed were intense debugging sessions. I ended up finding the commit that introduced the problem, but still could not see why that commit would affect anything. After hours of debugging, I ended up with a stripped-down and almost codeless omfile, which still had the same problem. And it appeared and disappeared almost at random as code lines were moved in and out.&lt;br /&gt;&lt;br /&gt;I once again checked the git history, and then I noticed that a few commits down the line, I had introduced another config variable for the io buffer size. Now I finally had the idea. The size-type config directives were introduced for file size restrictions. Thus, the regular 32-bit integer is not sufficiently large for them. Consequently, their handler needs a pointer to a 64-bit integer! But, of course, I had provided only a pointer to a 32-bit int, so the config handler overwrote another 32 bits that happened to be adjacent to the address I provided.&lt;br /&gt;&lt;br /&gt;This was clearly an error. But could it explain the behavior I saw? Not really... But the problem went away once I had corrected the issue. So I began to suspect that the compiler had reordered variable memory assignment in order to optimize access to the variables (maybe to get a better cache hit rate or whatever else). 
But hadn't I compiled with -O0, so that no optimization should take place? I checked, and I realized that due to a glitch in the lab setup, optimization actually was on, not turned off! So now I think I can explain the behavior, and theory and practice go hand in hand again. &lt;br /&gt;&lt;br /&gt;Really? What about the write and the debug print that made everything work? I guess these changes triggered some compiler optimization, and thus the memory assignment was changed and so the "extra 32 bits" pointed to some other variable. That also explains why the file creation mode was affected by my change, as well as why the bug reacted quite randomly to my testing code changes.&lt;br /&gt;&lt;br /&gt;So it looks like I finally resolved this issue.&lt;br /&gt;&lt;br /&gt;Lessons learned? Re-check your lab environment, even if it always worked well before. Be careful with assumptions about memory layout, as the optimizer seems to heavily reorder variables, and even single statements and statement sequences seem to make a big difference. I knew the compiler reorders things, but I did not have this clearly enough in mind to become skeptical of my lab setup.&lt;br /&gt;&lt;br /&gt;And, as always, some assumptions limit your ability to really diagnose what goes on... 
May this be a reminder not only for me (I wonder how long it will last) but for others as well (thus I thought a blog post makes sense ;)).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-6051991532157924422?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/b-i8Gh6DXYQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/6051991532157924422/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=6051991532157924422" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6051991532157924422?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/6051991532157924422?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/b-i8Gh6DXYQ/another-note-on-hard-to-find-bugs.html" title="Another note on hard-to-find-bugs..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/10/another-note-on-hard-to-find-bugs.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEEBR3o7fCp7ImA9WxJbFEk.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-4127155463620129785</id><published>2009-07-24T16:13:00.003+02:00</published><updated>2009-07-24T16:44:16.404+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-24T16:44:16.404+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="space" /><category scheme="http://www.blogger.com/atom/ns#" term="nasa" /><category scheme="http://www.blogger.com/atom/ns#" term="moon" /><category scheme="http://www.blogger.com/atom/ns#" term="computing" /><category scheme="http://www.blogger.com/atom/ns#" term="apollo" /><title>The code that put people onto the moon...</title><content type="html">... was just recently published by NASA and is now available via Google code. Google has a &lt;a href="http://googlecode.blogspot.com/2009/07/apollo-11-missions-40th-anniversary-one.html"&gt;nice blog post on it&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Of course, &lt;a href="http://code.google.com/p/virtualagc/source/browse/trunk/Comanche055/CM_BODY_ATTITUDE.s?r=258"&gt;reading the "old" assembly code&lt;/a&gt; is probably a bit hard even for today's programmers used to high-level languages, and even more so for non-programmers. 
I still think these are excellent documents and at least the comments speak to folks with technical interest (and some are &lt;a href="http://code.google.com/p/virtualagc/source/browse/trunk/Luminary099/LUNAR_LANDING_GUIDANCE_EQUATIONS.s?r=258#178"&gt;really explicit&lt;/a&gt; ;)).&lt;br /&gt;&lt;br /&gt;While digging through this material, I found a very interesting and insightful article on the &lt;a href="http://klabs.org/history/apollo_11_alarms/eyles_2004/eyles_2004.htm"&gt;Lunar Module Guidance Computer&lt;/a&gt; by Don Eyles, who was deeply involved with its programming. This is a long article, but it is a rewarding read. It offers a lot of insight into how challenging it was to fly with the hardware of those days (every cell phone has *far* more capability today, maybe even washing machines...). The article also explains, in plain words, some concepts that were created for Apollo and influence today's programs as well.&lt;br /&gt;&lt;br /&gt;Most importantly, I think that the Apollo program not only showed that mankind can leave Earth. It also is probably the first instance where computing machinery was absolutely vital to achieving a goal. In the Apollo days, some overrides were possible, and obviously needed. Today, we are betting our lives more and more on technology, and often without a real alternative. 
Having overrides would sometimes be useful, too, but we seem to partially forget that ;)&lt;br /&gt;&lt;br /&gt;But enough said: enjoy these documents!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-4127155463620129785?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/Rdlw9-TigNU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/4127155463620129785/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=4127155463620129785" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/4127155463620129785?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/4127155463620129785?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/Rdlw9-TigNU/code-that-put-people-on-moon.html" title="The code that put people onto the moon..." 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/07/code-that-put-people-on-moon.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYGRn45eCp7ImA9WxJUFEU.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-2076780744114837779</id><published>2009-07-13T12:42:00.005+02:00</published><updated>2009-07-13T14:12:07.020+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-13T14:12:07.020+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="phplogcon" /><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog - what's next?</title><content type="html">I haven't blogged much in recent weeks. I have had my nose deep down inside &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt; code, adding new features (like an automatic zip file writer or the ability to spoof/forge UDP sender addresses) and enhancing performance (where I think I scored some major points ;)). &lt;br /&gt;&lt;br /&gt;So it is time for an update. Where is rsyslog heading? With the many changes I made in the past two to three months, I think it is very important to let the code base stabilize. So I would prefer not to touch too much existing code for a while. Also, it is summer time and my summer vacation is not so far away. Another good argument that it is probably not the best time for big code changes (or do I like to break things before I go away...? ;)).&lt;br /&gt;&lt;br /&gt;So looking at what to do next, I would like to focus on improving the tool set that helps create rsyslog. That doesn't mean direct improvement to the actual syslogd, but rather to tools that help build and maintain it. 
The first major effort in this regard was adding an automated testbench. If you look at v3, I think it has around four automated tests (previous versions had none). With v5, we have over 20 subtests, each of which tests various cases, so in total we currently have around one hundred test cases automatically covered.&lt;br /&gt;&lt;br /&gt;When I started with this in v3, it was a major effort, even though the number of covered cases was rather small. But getting started with a testbench meant I needed to evaluate ways to automate the tests and create them in the best possible way for rsyslog (which also means convenient during the development process). At that point, I tried a lot of things and finally came up with the current set of tests. The initial testbench covered only a very limited set of use cases.&lt;br /&gt;&lt;br /&gt;Since then, it has greatly improved, but there are still a lot of uncovered areas. But I now regularly add new tests, most often when I implement new features, change existing ones or hunt bugs. The process is now well understood and many tests can be added with relative ease (but others not: I have some test cases in the queue that require notable extensions to my current system plus a bit ... of the different toolset I will be talking about soon...).&lt;br /&gt;&lt;br /&gt;Initially, I was rather skeptical whether the testbench would really pay off, especially after I saw the initial effort required (which I by far underestimated). But in the meantime I am convinced. Especially the past couple of months have shown that the automated tests increase both development productivity (by reducing the number of manual tests that need to be done and spotting regressions early) and code quality (by detecting regressions that otherwise would have been overlooked).&lt;br /&gt;&lt;br /&gt;Now I am in a similar situation in regard to performance testing as I was in regard to correctness testing: everything is done manually and with very low-level tools. 
Still, I was able to make good progress without tools. But I hope that tools will be as useful for performance testing as they were for bug hunting. Most importantly, my current performance improvement testing covers only limited (though highly relevant) scenarios: those where getting sufficiently reliable numbers is possible with the limited capabilities I have. In practice, this means that almost all testing so far has been done with plain tcp syslog. While this still makes it possible to check the core engine's performance, it does not offer a clear view of e.g. UDP performance (which I really do not have now). Also, the examples are artificial, and it would be useful to get more of a real-world performance benchmark.&lt;br /&gt;&lt;br /&gt;Finally, performance benchmarks stress the engine, especially its multi-threading capabilities. So performance testing is also a good way to uncover those nasty threading bugs that one otherwise only detects when systems fail in production (and nobody then knows why...). So I consider decent performance tests also to be a plus for code quality. I even consider them very important for stabilizing e.g. the v5 engine, which so far has received only limited attention in practice. It looks like almost nobody ever tried it. I know because the initial v5 release had such a big memory leak that any serious tester would have needed to come up and complain very quickly. A lack of test deployments makes it harder to mature the engine. I think that good stress tests (which all have a performance connotation) will help to somewhat mitigate this problem. As a side-note, I uncovered many of the bugs that I fixed during my manual performance testing. This seems to prove the point.&lt;br /&gt;&lt;br /&gt;So I am more or less convinced (if nothing more urgent shows up) to spend some time implementing performance tools and tests for rsyslog. 
I would also like to include a somewhat older idea of a "diagnostic front-end" that would be able to pull (and maybe modify) some of rsyslog's parameters. I'd expect that as a side-activity I'll also gather data on (at a minimum) or improve (preferably) performance in a couple of areas (UDP reception performance is on top of the list). But improvements will only come after the basic tools have been written. &lt;br /&gt;&lt;br /&gt;As with the testbench, that will mean that new features and enhancements will probably stall a bit in the coming weeks. Even more so as I do not intend to write the front-end in C (I personally do not consider C to be the language of choice for non-performance critical interactive programs, especially looking into some of the portability issues - but YMMV...). So I will try to approach this with a Java app. I have to admit that I learned Java 8 to 10 years ago and never programmed much in it, so that will probably mean I'll need to re-learn the language, but as I don't consider this GUI to be something extremely critical, I don't see any issues with me as a Java freshman doing it.&lt;br /&gt;&lt;br /&gt;As a side-note, I should probably mention that I am also involved in the &lt;a href="http://www.phplogcon.org"&gt;phpLogCon &lt;/a&gt;project. So far, I am only part of the design team, but I have a number of really cool visualization features on my personal wish-list. If I ever get time to work on that (I hope for next year), I probably will need to do that in Java, so it doesn't hurt to practice on a less demanding project. 
In that sense, I also hope to be able to set the stage for some future cool technology while I work on a current demand ;) It would also be interesting to visualize some of the performance counters, but that's another story ;)&lt;br /&gt;&lt;br /&gt;All in all, getting an interactive troubleshooting and analysis front-end has big potential, not only for testing but also for deployments and finding configuration bugs (which become more and more of an issue with the increasing complexity of the configuration). One could also envision that it could include a graphical configuration build ... as well as tools for setting up all those TLS certs. I don't think I can do all of this now or in the next quarter. But I think it is the right time now to begin working on a foundation that offers yet another big potential. Especially as it also helps with the urgent need to get better testing for the engine plus the desire to further improve its performance (my goal is no less than to provide the by-far fastest AND most reliable syslogd on this planet ;)).&lt;br /&gt;&lt;br /&gt;Well, that's it for now. I hope you like the idea of an additional performance-centric toolset (which of course also requires engine enhancements) and a GUI as much as I do. If you have comments or concerns, please let me know. 
I sincerely hope to begin a new round of capability enhancements with this move.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-2076780744114837779?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/L-d0H-OgHQM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/2076780744114837779/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=2076780744114837779" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2076780744114837779?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2076780744114837779?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/L-d0H-OgHQM/rsyslog-where-are-we.html" title="rsyslog - what's next?" 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/07/rsyslog-where-are-we.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0AMRno6eip7ImA9WxJWEEo.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-1491916093326901043</id><published>2009-06-15T16:33:00.007+02:00</published><updated>2009-06-15T17:29:47.412+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-06-15T17:29:47.412+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>high-performance, low-precision time API under Linux?</title><content type="html">This time, I raise a question in my blog. Suggestions, tips and full answers are very welcome.&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://www.rsyslog.com/"&gt;rsyslog&lt;/a&gt;, there are various situations where I only need low resolution timestamps. By low resolution, I mean precise to within a second. Of course, this is provided by the time() API. However, time() is very slow - far too slow for many things I do in rsyslog. So far, I have been able to work around this problem by doing a time() call only every n-th time where I run in tight loops and know that this will not bring me outside of my 1-second window (well, to be precise, this is at least very unlikely and thus acceptable).&lt;br /&gt;&lt;br /&gt;However, this approach does not work for all the work that I am doing. Now I am facing the challenge once again, but this time in an area where the "query only n-th time" approach does not work. I need the time in order to schedule asynchronous activities (like writing so far unwritten buffers to disk). 
With them, there is no tight loop that provides me with some sense of timing, and so I simply do not know if half a second or half an hour has elapsed between calls - except when I do one of these costly time() calls.&lt;br /&gt;&lt;br /&gt;A good work-around would be to define my own interval timer, awaking me e.g. every second. So I would not need absolute time but could do things based on these timer ticks. &lt;b&gt;However&lt;/b&gt;, there is a lot of evil in this approach, too. Most importantly, this means rsyslogd will be active whenever the system is up, and running on a tick will prevent the operating system from switching the CPU to power saving modes. So this option looks very dirty, too.&lt;br /&gt;&lt;br /&gt;So what to do now? Is there any (decently portable) way to get a second-resolution current timestamp (or a tick counter) &lt;b&gt;without&lt;/b&gt; actually running on a tick?&lt;br /&gt;&lt;br /&gt;If I don't find a better solution, I'll probably be forced to run rsyslogd on a tick, which would not be a good thing from a power consumption point of view.&lt;br /&gt;&lt;br /&gt;As I already said, feedback is greatly appreciated...&lt;br /&gt;&lt;br /&gt;Edit: in case my description was a bit unclear: it is not so important that the timestamp is of low resolution. Of course, I prefer higher resolution, but I would be OK with lower resolution if that is faster.&lt;br /&gt;&lt;br /&gt;The problem with time() and gettimeofday() is that they are quite slow. As an example, I can only do around 250,000 time()/gettimeofday() calls per second on my current development system. So each API call takes around 4 microseconds on that system. While this does not sound like much, it adds considerable runtime to each message being processed - especially if multiple calls are required due to the modular structure.&lt;br /&gt;&lt;br /&gt;I have also thought about a single "lowres system time getter" inside rsyslog. However, that brings up problems with multi-threading. 
If one would like to be on the safe side, its entry points need to be guarded by mutexes, another inherently slow operation (depending on circumstances, overhead can be even worse than time()). With atomic operations, things may improve. But even then, we run into the issue that we do not know if the last call was half a second or half an hour ago...&lt;br /&gt;&lt;br /&gt;Another edit:&lt;br /&gt;This is a recording from a basic test I did on one lab system:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[rgerhards@rf10up tests]# cat timecaller.c&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;time.h&amp;gt;&lt;br /&gt;#include &amp;lt;sys/time.h&amp;gt;&lt;br /&gt;&lt;br /&gt;int main(int argc, char* argv[])&lt;br /&gt;{&lt;br /&gt; time_t tt;&lt;br /&gt; struct timeval tp;&lt;br /&gt; int i;&lt;br /&gt;&lt;br /&gt; for(i = 0 ; i &amp;lt; atoi(argv[1]) ; ++i) {&lt;br /&gt; // time(&amp;tt);&lt;br /&gt;  gettimeofday(&amp;tp, NULL);&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;[rgerhards@rf10up tests]# cc timecaller.c&lt;br /&gt;[rgerhards@rf10up tests]# time ./a.out 100000&lt;br /&gt;&lt;br /&gt;real 0m0.309s&lt;br /&gt;user 0m0.004s&lt;br /&gt;sys 0m0.294s&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The runtime for the time() call is roughly equivalent (especially given the limited precision of the instrumentation). Please also note that we identified the slowness of the time() calls in autumn 2008, when doing performance optimization with the help of David Lang. David was the first to point to the time-consuming time() calls in strace. 
Reducing them made quite a difference.&lt;br /&gt;&lt;br /&gt;Since then, I have tried to avoid time() calls at all costs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-1491916093326901043?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/rZdBvxkgTxg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/1491916093326901043/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=1491916093326901043" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1491916093326901043?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1491916093326901043?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/rZdBvxkgTxg/high-peformance-low-precision-time-api.html" title="high-performance, low-precision time API under Linux?" 
/><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/06/high-peformance-low-precision-time-api.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0cEQn46fip7ImA9WxJQFUU.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-1866860941714760787</id><published>2009-05-29T10:16:00.002+02:00</published><updated>2009-05-29T11:23:23.016+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-29T11:23:23.016+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>introducing rsyslog v5</title><content type="html">&lt;span style="font-weight: bold;"&gt;A new v5 version of &lt;/span&gt;&lt;a style="font-weight: bold;" href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; will be released today.&lt;/span&gt; Originally, I did not plan to start the v5 version before the end of the year (2009). But then we received sponsorship to enhance queue performance. And then we saw that an audit-grade queue subsystem was needed (audit-grade means that no message is ever lost, not even in fatal failure cases like sudden power loss).&lt;br /&gt;&lt;br /&gt;The audit-grade queue subsystem in particular resulted in very large design changes to the queue engine. Their magnitude is so large that I assume we need some time to stabilize it. Thus, I have decided to start a new v5 branch, which will feature the redesigned queue engine.&lt;br /&gt;&lt;br /&gt;When we introduced the queue engine in early 2008 (in rsyslog v3), it took roughly three to five months until it got decently stable. With the magnitude of changes we have done now, it will probably take some time, again. 
It depends a bit on the actual feedback we receive from practice. Also, this time I have added lots of automated tests, so a lot of bugs should already have been caught. Also, during the next weeks I will focus on actual deployment scenarios, rather than things that theoretically may happen (the testbench covers many of those). So, all in all, I expect that the new queue engine will become production-ready faster than the v3 engine.&lt;br /&gt;&lt;br /&gt;Still, I think it is desirable to create a new major version branch for this change. So here we are, at v5. &lt;span style="font-weight: bold;"&gt;I will continue to develop functionality that does not necessarily need the new queue engine inside the v4-devel.&lt;/span&gt; That way, we will have this functionality available both with the proven queue engine as well as with the new experimental one. Note that I cannot do this with a stable branch: by definition, stable branches never receive enhancements (as that would potentially destabilize the branch). So, for the time being and probably a couple of months, &lt;span style="font-weight: bold;"&gt;we will have two development branches&lt;/span&gt;: the v4 as well as the v5 branch. 
With that v5 will focus on the new queue engine plus any other additions, which are done in v4.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-1866860941714760787?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/trtXbwOp_P0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/1866860941714760787/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=1866860941714760787" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1866860941714760787?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1866860941714760787?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/trtXbwOp_P0/introducing-rsyslog-v5.html" title="introducing rsyslog v5" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/05/introducing-rsyslog-v5.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QCSX09eyp7ImA9WxJRF0g.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-4829049384061517363</id><published>2009-05-19T17:38:00.007+02:00</published><updated>2009-05-19T18:42:48.363+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-19T18:42:48.363+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog queue enhancements  - status report</title><content type="html">I thought I post a few 
thoughts about how far the &lt;a href="http://www.rsyslog.com/"&gt;rsyslog&lt;/a&gt; queue enhancements have evolved.&lt;br /&gt;&lt;br /&gt;We started with the goal to increase performance, especially for database outputs. As part of that endeavor, we designed and implemented message batches as the new processing entity. This approach was suggested by David Lang, who also offered very valuable feedback, suggestions and review of the relevant papers (not to mention actual testing) during the process. Then, we came to the conclusion that we need to have a truly ultra-reliable queue. One that does not even lose messages in case of a sudden fatal failure (like a power fail without a UPS - or a failing UPS!). That led to further redesign and a lot of design work. All of this is very exciting.&lt;br /&gt;&lt;br /&gt;Since last Friday, I have worked on the actual code. I have now updated the queue, the queue storage drivers and action processing. Most importantly, the rsyslog testbench does once again successfully run, even in DA queue mode. There are still a couple of things that need to be looked at, but I think most of the bulk work is done. What now follows is careful looking at the open issues plus a LOT more testing.&lt;br /&gt;&lt;br /&gt;The testbench has improved much in the past three months, but it still is far from covering even the most important code areas. Especially the various queueing scenarios are not very well covered by it, mostly because it is rather complex to do so. Anyhow, I will now try not to do so many ad-hoc manual tests but rather see that I can create more automated tests. While this is a lot more work, even the current testbench has proven to be extremely valuable during this major code change effort (which, let me re-iterate, is far from being fully completed). Without it, it would have been much harder to find those bugs that came up during the testbench run. 
I think that the time I invested in it has already paid off.&lt;br /&gt;&lt;br /&gt;Let me end with a list of things I need to look at. That will at least help me keep focused and let you know what is extremely weak right now:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;more tests&lt;/li&gt;&lt;li&gt;so far, the last batch is not freed until at least one more message comes in (permit DeleteProcessedBatch() to be called de-coupled)&lt;/li&gt;&lt;li&gt;cancel processing cleanup, decision if we should still support cancel processing entry points&lt;/li&gt;&lt;li&gt;configured discarding of messages on queue-full condition [at least add extra nElem counter]&lt;br /&gt;&lt;/li&gt;&lt;li&gt;make output actions support message-permanent failures (at least PostgreSQL output plugin) [also think about test cases for this]&lt;/li&gt;&lt;li&gt;double-check of action and action unit state processing&lt;/li&gt;&lt;li&gt;persisting of messages from memory queues during shutdown (testing)&lt;/li&gt;&lt;li&gt;Think about a new way of handling iDeqSlowdown (maybe during batch processing?)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-4829049384061517363?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/nmReb2GR8e4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/4829049384061517363/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=4829049384061517363" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/4829049384061517363?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/4829049384061517363?v=2" /><link rel="alternate" 
type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/nmReb2GR8e4/rsyslog-queue-enhancements-status.html" title="rsyslog queue enhancements  - status report" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/05/rsyslog-queue-enhancements-status.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Dk4DR3k_eCp7ImA9WxJREk0.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-2175649153294986358</id><published>2009-05-13T10:48:00.002+02:00</published><updated>2009-05-13T10:56:16.740+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-13T10:56:16.740+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>ultra-reliable queueing in rsyslog</title><content type="html">As part of the ongoing mailing list discussion on ultra-reliable queueing in &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt;, I'd like to create another blogpost from discussion content (again, I hope this reference is handy in the future).&lt;br /&gt;&lt;br /&gt;The key point with ultra-reliable queues is that no message can be lost once it has been enqueued. In the current (v2,v3,v4 &lt;= 4.1.2) releases of rsyslog, this is ensured as long as the system is guarded against a sudden loss of power (or similar disaster), and even then all but the last messages dequeued are safe.&lt;br /&gt;&lt;br /&gt;To make queue operations ultra-reliable in that case, the queue needs to be run as a pure disk queue and a checkpoint interval of one needs to be used. This makes the queue reliable at the expense of performance. 
Note also that with a disk queue only a single queue worker is permitted.&lt;br /&gt;&lt;br /&gt;Now let's look at a simplified scenario:&lt;br /&gt;&lt;br /&gt;input -&gt; queue -&gt; output&lt;br /&gt;&lt;br /&gt;This is not correct in that inputs never connect directly to outputs, but this detail is irrelevant for what I intend to say (replace "input" by "producer" and "output" by "consumer" if you'd prefer to have a fully consistent version).&lt;br /&gt;&lt;br /&gt;Let's say the processing time is the cost we incur. If we look at it, the queue's cost dominates by far the combined cost of input and output. In most cases, it dominates input+output cost so much that you can express the total cost as just the cost of the queue operation, without looking at anything else.&lt;br /&gt;&lt;br /&gt;So the input needs to wait until the queue is ready to accept a new message. Once it has done so, the output is notified and immediately acquires the queue lock and begins the dequeue operation. At the same time, the input has already finished input processing (as I said, this happens in virtually "no time" compared to the queue operation). So it needs to wait for the queue lock. Once the dequeue operation is finished, the output releases the lock, and processes the message in virtually no time, too. The input acquires the queue lock, and the whole story begins right from the start.&lt;br /&gt;&lt;br /&gt;A small queue may build up depending on the OS scheduler, but I think most often, input and output will just wait for the queue to complete. In that sense, this mode is similar to DIRECT mode, except that a queue can build up when the action needs to be retried.&lt;br /&gt;&lt;br /&gt;So to optimize such a scenario, the best thing to do is to write a totally new queue storage driver for such cases. 
Sequential files do not really work well if we have multiple producers running.&lt;br /&gt;&lt;br /&gt;This is a major effort and even then we need to think about the implications I raised in regard to processing cost above.&lt;br /&gt;&lt;br /&gt;First of all, rsyslog was never designed for this use case (preserve every message EVEN in case of sudden power fail). When I introduced purely disk-based queues, this was done to support disk-assisted mode. I needed a queue type to permit me to store things on disk if we run out of memory. As a "side-effect", a pure disk mode was available also (I would never have implemented it for its own sake). As it was there, I decided to expose this mode and made it user-configurable. I thought (probably correctly) that it could solve some need - a need that I'd consider "very exotic" (think about the reliance on an audit-grade protocol for this to really make sense). And I added the checkpoint capability because it seemed useful, even with disk-based queues, which could be guarded from total loss of messages by using a reasonable checkpoint interval. Again, a checkpoint interval of one is permitted just because this capability came "for free" and could be handy in some use cases. &lt;br /&gt;&lt;br /&gt;The kiosk example we discussed in 2008 (?) on the mailing list looked like a good match for such an exotic environment. Sudden power loss was an option, and we had low traffic volume. Bingo, perfect match.&lt;br /&gt;&lt;br /&gt;However, I'd never thought about a reasonably high-volume system using disk-only queues. Think about the cost functions: such a system boils down to a DIRECT mode queue which just takes an exceptionally long time to process messages.&lt;br /&gt;&lt;br /&gt;So probably the best approach for this situation would be to run the queue actually in direct mode. That removes the overwhelming cost of queue operations. 
Direct mode also ensures that the input receives an ack from the output [but there may be subtle issues which I need to check to make sure this is always the case, so do not take this for granted - but if it is not yet so, this should not be too complex to change]. With this approach, we have two issues left:&lt;br /&gt;&lt;br /&gt;a) the output action may be so slow that it actually is the dominating cost factor and not the disk queue operation&lt;br /&gt;&lt;br /&gt;b) the output action may block for an extended period of time (e.g. during a retry)&lt;br /&gt;&lt;br /&gt;In case a), a disk-queue makes sense, because its cost is irrelevant in this scenario. Indeed, it is irrelevant under all circumstances. As such, we can configure a disk-only action queue in that case. Note that this implies a *very* slow output.&lt;br /&gt;&lt;br /&gt;Case b) is more complicated. We do NOT have any proper way to address it with current code. The solution IMHO is to introduce a new queue mode "Disk Queue on Delay" which starts an ultra-reliable disk queue (preferably with a faster queue store driver) if and only if the action indicates that it will need extended processing time. This requires some changes to action processing, but the action state machine should be capable of handling that with relatively slight modification [again, an educated guess, not a guarantee]. &lt;br /&gt;&lt;br /&gt;In that scenario, we run the action immediately whenever possible. Only if that is not possible do we take the (considerable) extra effort of buffering messages into a much-slower on disk queue. Note that such a mode only makes sense with audit-grade protocols and senders (which hold processing until the ACK has been received). As such, a busy system automatically slows down to the rate that the queue writer can handle. In this sense, the overall system (e.g. a financial trading system!) 
may be slowed down by the unavailability of a failing output (which in turn causes the extra and very high cost of disk queue operations). It needs to be considered if that is an acceptable price.&lt;br /&gt;&lt;br /&gt;The faster an ultra-reliable queue disk store driver performs, the more cases we can handle in the spirit of a) above. In theory, this can lead to elimination of b) cases. &lt;br /&gt;&lt;br /&gt;Nevertheless, I hope I have shown that re-designing the queue (drivers) to support high throughput AND ultra-reliable operations AT THE SAME TIME is far from being a trivial task. To do it right, it involves some other changes too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-2175649153294986358?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/i4BN-ye_ZTo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/2175649153294986358/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=2175649153294986358" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2175649153294986358?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/2175649153294986358?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/i4BN-ye_ZTo/ultra-reliable-queueing-in-rsyslog.html" title="ultra-reliable queueing in rsyslog" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/05/ultra-reliable-queueing-in-rsyslog.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIERnk8eCp7ImA9WxJREEg.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-8794538698384152330</id><published>2009-05-11T17:43:00.010+02:00</published><updated>2009-05-11T17:58:27.770+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-11T17:58:27.770+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>rsyslog configuration graphs</title><content type="html">&lt;b&gt;I worked today on adding a configuration graphing capability to &lt;a href="http://www.rsyslog.com"&gt;rsyslog&lt;/a&gt;.&lt;/b&gt; This was inspired by many discussions about how the rule engine works. From a high-level perspective, rsyslog is "just" a configurable message router that routes messages from a set of inputs to a set of outputs, potentially with transformations applied to the messages. Rsyslog does so via the rule set, which is the most important part of the configuration file. In that sense, rsyslog is a configurable state machine and the rule set is its configuration.&lt;br /&gt;&lt;br /&gt;While typical syslog configurations are rather simple and easy to understand, complex ones can be challenging. The graphing capability we now have provides a high-level, human-readable representation of rsyslogd's internal control structures. 
The beauty of this is that every user can create an exactly matching diagram from their own configuration.&lt;br /&gt;&lt;br /&gt;I hope this is a useful tool for documenting a system setup, but I also think it is a very valuable tool for learning to understand rsyslog as well as for troubleshooting problems with message processing.&lt;br /&gt;&lt;br /&gt;With that said, I now send you to the new &lt;a href="http://www.rsyslog.com/doc-rsconf1_generateconfiggraph.html"&gt;graphing feature manual page&lt;/a&gt;, which I hope provides sufficient insight into how this feature is used.&lt;br /&gt;&lt;br /&gt;But... here is a sample graph to whet your appetite:&lt;br /&gt;&lt;center&gt;&lt;img src="http://www.rsyslog.com/modules/Static_Docs/data/rsyslog_confgraph_complex.png"&gt;&lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-8794538698384152330?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/HElr6LGYFsw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/8794538698384152330/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=8794538698384152330" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8794538698384152330?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/8794538698384152330?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/HElr6LGYFsw/rsyslog-configuration-graphs.html" title="rsyslog configuration graphs" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" 
value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/05/rsyslog-configuration-graphs.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Dk4NRn48eCp7ImA9WxJSF0U.&quot;"><id>tag:blogger.com,1999:blog-6193377.post-1449341093625132791</id><published>2009-05-08T13:34:00.004+02:00</published><updated>2009-05-08T14:16:37.070+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-08T14:16:37.070+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rsyslog" /><title>Can "more reliable" actually mean "less reliable"?</title><content type="html">On the rsyslog mailing list, we currently have a discussion about how reliable rsyslog should be. It revolves around a small potential window of message loss in the case of sudden power failure. Rsyslog can be configured to put all messages into a disk queue (instead of main memory), so these messages survive such a power-failure condition. However, messages that have been dequeued and scheduled for processing when the power fails may be lost. &lt;br /&gt;&lt;br /&gt;I now consider a case where we have bursty UDP traffic and rsyslog is configured to use a disk-only queue (which obviously is much slower than an in-memory queue). In terms of processing speed, using an ultra-reliable queue limits the maximum burst rate. To avoid losing UDP messages, a second instance could be run that uses an in-memory queue and forwards received messages to the one in ultra-reliable mode (that is, with the disk-only queue). So that second instance queues in memory until the (slower) reliable rsyslogd can accept the message and put it into the reliable queue. Let's say that you have a burst of r messages and that of this burst only r/2 can be enqueued (because the ultra-reliable queue is so slow). 
So you lose r/2 messages.&lt;br /&gt;&lt;br /&gt;Now consider the case that you run rsyslog with just a reliable queue, one that is kept in memory but is not able to cover the power failure scenario. Obviously, all messages in that queue are lost when power fails (or almost all, to be precise). However, that system has a much broader bandwidth. So with it, there would never have been r messages inside the queue, because that system has a much higher sustained message rate (and thus the burst causes much less trouble). Let's say the system is just twice as fast in this setup (I guess it usually would be *much* faster). Then, it would be able to process all r records.&lt;br /&gt;&lt;br /&gt;In that scenario, the ultra-reliable system loses r/2 messages, whereas the somewhat more "unreliable" system loses none - by virtue of being able to process messages as they arrive. &lt;br /&gt;&lt;br /&gt;Now extend that picture to messages residing inside the OS buffers or even those that are still queued in their sources because a stream transport blocked sending them.&lt;br /&gt;&lt;br /&gt;I know that each detail of this picture can be argued about at length.&lt;br /&gt;&lt;br /&gt;However, my opinion is that there is no "ultra-reliable" system in life, only various probabilities of losing messages. These probabilities often depend on each other, which makes calculating them very hard to impossible. Still, if the components are treated as independent, the probability that a message survives the system at large is just the product of the survival probabilities of each of its components, and the probability of message loss is one minus that product. Reliability is simply the complement of that loss probability.&lt;br /&gt;&lt;br /&gt;This is where *I* conclude that it can make sense to permit a system to lose some messages under certain circumstances, if that influences the overall probability calculation towards the desired end result. 
In that sense, I tend to think that a fast, memory-queuing rsyslogd instance can be much more reliable than one that is configured as ultra-reliable, if that configuration badly affects the rest of the system at large (the scenario above).&lt;br /&gt;&lt;br /&gt;However, I also know that for regulatory requirements, you often seem to need to prove that a system cannot lose messages once it has received them, even at the cost of an overall increased probability of message loss.&lt;br /&gt;&lt;br /&gt;My view of reliability is much the same as my view of security: there is no such thing as "being totally secure"; you can only reduce the probability that something bad happens. The worst thing in security is someone who thinks he is "totally secure" and as such is no longer actively looking at potential issues.&lt;br /&gt;&lt;br /&gt;I see the same for reliability. There is no such thing as "being totally reliable" and it is a really bad idea to think you could ever be. Knowing this, one may begin to think about how to decrease the overall probability of message loss AND think about what rate is acceptable (and what to do with these cases, e.g. 
"how can they hurt").&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6193377-1449341093625132791?l=blog.gerhards.net' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/cmfi/~4/4Ic3DjRj_wU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.gerhards.net/feeds/1449341093625132791/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=6193377&amp;postID=1449341093625132791" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1449341093625132791?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6193377/posts/default/1449341093625132791?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/blogspot/cmfi/~3/4Ic3DjRj_wU/can-more-reliable-actually-mean-less.html" title="Can &quot;more reliable&quot; actually mean &quot;less reliable&quot;?" /><author><name>Rainer</name><uri>http://www.blogger.com/profile/12765720626924376847</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03130076873660943451" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.gerhards.net/2009/05/can-more-reliable-actually-mean-less.html</feedburner:origLink></entry></feed>
