<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-7764126179108633515</atom:id><lastBuildDate>Sat, 07 Mar 2026 13:39:10 +0000</lastBuildDate><category>programming</category><category>c++</category><category>perl</category><category>boost</category><category>database</category><category>documentation</category><category>key value</category><category>mongodb</category><category>shell</category><category>sql</category><category>threads</category><category>xml</category><category>acitvemq</category><category>bash</category><category>benchmark</category><category>c</category><category>csv</category><category>dba</category><category>erlang</category><category>html</category><category>indexing</category><category>java</category><category>ldap</category><category>libxml++</category><category>messaging</category><category>mysql</category><category>optimization</category><category>parser</category><category>php</category><category>puppet</category><category>scripting</category><category>security</category><category>stl</category><category>stomp</category><category>string</category><category>system administration</category><category>volatile</category><category>xerces c++</category><title>Ilya Martynov&#39;s blog</title><description></description><link>http://www.martynov.org/</link><managingEditor>noreply@blogger.com (Ilya Martynov)</managingEditor><generator>Blogger</generator><openSearch:totalResults>13</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid 
isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-1660585397194109947</guid><pubDate>Fri, 25 May 2012 15:04:00 +0000</pubDate><atom:updated>2012-05-28T14:28:47.226+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">database</category><category domain="http://www.blogger.com/atom/ns#">key value</category><category domain="http://www.blogger.com/atom/ns#">mongodb</category><title>MongoDB pain points</title><description>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Recently I was contacted by 10gen&#39;s account manager soliciting feedback on our use of &lt;a href=&quot;http://www.mongodb.org/&quot;&gt;MongoDB&lt;/a&gt; at the &lt;a href=&quot;http://www.iponweb.com/&quot;&gt;company&lt;/a&gt; I work at. I wrote a lengthy reply describing what we do with MongoDB and the problems we see with it, and never heard back. It would be a shame for all this feedback to go to waste, so I decided to repost it on my blog with minor edits. Here it is.&lt;br /&gt;
&lt;br /&gt;
We have been using MongoDB at IPONWEB for about two years now, on a 
number of high-load projects. Our company specializes in creating 
products for display advertising, and we mostly use MongoDB to keep 
track of user data in our adservers. The main reason we use MongoDB is 
raw performance. We mostly use it as a dumb NoSQL key-value database and 
try to keep the data fully cached in RAM. With rare exceptions we do not 
use any fancy features such as complex queries or map-reduce; we limit 
ourselves to queries by primary key. We do use sharding: because the 
whole database has to fit in RAM, we often have to split it across 
multiple servers. We are very sensitive to the cost of an installation, 
so we are always looking to reduce hardware costs for our databases. 
Given this background, the following limitations in the MongoDB 
implementation cause us the most grief:&lt;br /&gt;
&lt;br /&gt;
a) Lack of online database defragmentation. Currently the only way to 
compact a MongoDB database is to stop the database and run compact or 
repair. On our datasets this takes a considerable amount of time. We 
have to defragment the database fairly often to keep RAM usage under 
control. A fragmented database can easily be twice as big as a 
defragmented one, which in our case means twice the hardware cost.&lt;br /&gt;
&lt;br /&gt;
b) Realistically, for our use case we can reshard MongoDB only offline. 
Adserving is extremely sensitive to latency, so if we add more shards to 
an existing cluster we are more or less forced to take the application 
offline until resharding finishes.&lt;br /&gt;
&lt;br /&gt;
c) Lack of good SSD support. The way MongoDB works now, switching from 
more RAM with HDD as backing storage to less RAM with SSD as backing 
storage doesn&#39;t seem to be cost effective. Per GB, SSD is roughly two 
times cheaper than RAM, but if we place our data on SSD we have to 
reserve at least twice the space if we want to be able to run repair on 
the data (running repair requires twice the space of the data). The 
other reason we considered SSD instead of HDD as backing storage is 
write performance, which was a limitation in some applications. But in 
our limited benchmarking we found only a small performance difference: 
it looks like the single-threaded write lock in MongoDB becomes the 
bottleneck rather than the underlying storage.&lt;br /&gt;
&lt;br /&gt;
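The arithmetic behind point (c) can be sketched in a few lines of Python. The prices and the dataset size are made-up placeholders, not real quotes; only the two ratios from the text are used (SSD roughly two times cheaper per GB than RAM, and twice the space reserved so that repair can run). The cost of HDD backing storage is ignored.&lt;br /&gt;

```python
# Hypothetical unit prices: only the 2:1 RAM-to-SSD ratio comes from the text.
RAM_PRICE_PER_GB = 10.0                   # assumed placeholder price
SSD_PRICE_PER_GB = RAM_PRICE_PER_GB / 2   # "roughly two times cheaper"
REPAIR_SPACE_FACTOR = 2                   # repair needs twice the data size


def ram_cost(data_gb):
    """Cost of holding the whole dataset in RAM."""
    return data_gb * RAM_PRICE_PER_GB


def ssd_cost(data_gb):
    """Cost of SSD space for the dataset plus the headroom repair needs."""
    return data_gb * REPAIR_SPACE_FACTOR * SSD_PRICE_PER_GB


if __name__ == "__main__":
    data_gb = 100  # assumed dataset size
    print("RAM only:", ram_cost(data_gb))
    print("SSD with repair headroom:", ssd_cost(data_gb))
```

Under these assumptions the two totals come out equal, and that is before paying for whatever RAM the SSD setup still needs as a cache.&lt;br /&gt;
&lt;br /&gt;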
d) Minor point: the underlying network protocol could be more efficient 
with some optimizations. If you send many small queries and get small 
documents back, MongoDB creates a separate TCP packet for each request 
and response. Under high load, especially on virtualized hardware (e.g. 
EC2), this adds significant overhead. We have our own client driver 
which tries to pack multiple requests into a single TCP packet, and it 
makes a noticeable difference in performance on EC2. But this is only a 
partial solution, because responses from MongoDB and communication 
between mongos and mongod are still inefficient.&lt;br /&gt;
&lt;br /&gt;
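The batching idea from point (d) can be illustrated with a small Python sketch. This is not our driver and not the MongoDB wire protocol: the 4-byte length prefix and the function names (frame, send_batched, read_frames) are assumptions made up for the example. It only shows several small requests being joined and handed to the kernel in one call instead of one call per request; whether they end up in a single TCP packet still depends on their total size and on the network stack.&lt;br /&gt;

```python
import socket
import struct


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


def send_batched(sock, payloads):
    """Join all framed requests and pass them to the kernel in one call."""
    sock.sendall(b"".join(frame(p) for p in payloads))


def read_exact(sock, size):
    """Read exactly size bytes from the socket."""
    chunks = []
    while size != 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def read_frames(sock, count):
    """Read count length-prefixed frames back from the socket."""
    frames = []
    for _ in range(count):
        (length,) = struct.unpack("!I", read_exact(sock, 4))
        frames.append(read_exact(sock, length))
    return frames


if __name__ == "__main__":
    # A local socket pair stands in for the client-to-database connection.
    client, server = socket.socketpair()
    requests = [b"get user:1", b"get user:2", b"get user:3"]
    send_batched(client, requests)
    print(read_frames(server, len(requests)))
    client.close()
    server.close()
```

The receiving side needs the length prefix (or some other framing) to split the batch back into individual requests, which is why batching only on the client is a partial fix.&lt;br /&gt;
&lt;br /&gt;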
e) Another minor point: the BSON format is very wasteful in terms of 
memory usage. Given that we try to minimize database size to reduce 
hardware costs, the recent trend in our use of MongoDB is to serialize 
data to some more compact format instead of representing it as a BSON 
document, and to store it as one big binary blob. Simplified, our 
documents look like { _id = &#39;....&#39;, data = &#39;... serialized data ...&#39; }.&lt;br /&gt;
&lt;br /&gt;
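As a rough illustration of point (e), here is a Python sketch comparing a self-describing encoding with a fixed binary layout. JSON from the standard library is used only as a stand-in for a format that, like BSON, repeats the field names in every document; real BSON sizes will differ. The record, its field names and the layout are invented for the example and are not our actual format.&lt;br /&gt;

```python
import json
import struct

# Hypothetical user record; the fields are invented for illustration.
record = {"impressions": 1234, "clicks": 56, "last_seen": 1337958240}

# Fixed layout agreed on by the application: three unsigned 32-bit integers.
LAYOUT = struct.Struct("!III")
FIELDS = ("impressions", "clicks", "last_seen")


def pack_record(rec):
    """Serialize the record to a compact binary blob without field names."""
    return LAYOUT.pack(*(rec[name] for name in FIELDS))


def unpack_record(blob):
    """Restore the record from the blob using the agreed field order."""
    return dict(zip(FIELDS, LAYOUT.unpack(blob)))


if __name__ == "__main__":
    self_describing = json.dumps(record).encode("utf-8")
    blob = pack_record(record)
    print("self-describing bytes:", len(self_describing))
    print("packed blob bytes:", len(blob))
    # The document stored in the database would then look like this.
    document = {"_id": "user-42", "data": blob}
    print(unpack_record(document["data"]) == record)
```

The price of the compact form is that the database can no longer see inside the data field, so queries and updates on individual fields have to be done in the application.&lt;br /&gt;
&lt;br /&gt;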
By the way, at some point we evaluated switching to &lt;a href=&quot;http://citrusleaf/&quot;&gt;CitrusLeaf&lt;/a&gt;.
 This product supposedly addresses some of the above issues (mostly a, b
 and c), but it seems the expected savings in hardware costs would be offset by 
licensing costs, so at least for now we are not going to switch.&lt;/div&gt;</description><link>http://www.martynov.org/2012/05/mongodb-pain-points.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>16</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-258858507822695233</guid><pubDate>Wed, 10 Feb 2010 14:41:00 +0000</pubDate><atom:updated>2010-02-10T17:52:23.085+03:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">c</category><category domain="http://www.blogger.com/atom/ns#">c++</category><category domain="http://www.blogger.com/atom/ns#">database</category><category domain="http://www.blogger.com/atom/ns#">key value</category><category domain="http://www.blogger.com/atom/ns#">mongodb</category><category domain="http://www.blogger.com/atom/ns#">programming</category><title>MongoDB client library: C vs C++</title><description>&lt;p&gt;I&#39;ve been playing a bit with &lt;a href=&quot;http://www.mongodb.org/&quot;&gt;MongoDB&lt;/a&gt; recently. In particular, I&#39;ve looked into the source code of the client libraries, as I was interested in how hard it is to change the client API to support an async mode of operation. One thing I noticed is that the &lt;a href=&quot;http://github.com/mongodb/mongo-c-driver/blob/master/src/mongo.c&quot;&gt;C version&lt;/a&gt; of the client library is shorter and much easier to read than the &lt;a href=&quot;http://github.com/mongodb/mongo/blob/master/client/dbclient.cpp&quot;&gt;C++ version&lt;/a&gt;. 
I cannot shake off the feeling that sometimes C++ feels like a step backwards compared to C.&lt;/p&gt;</description><link>http://www.martynov.org/2010/02/mongodb-client-library-c-vs-c.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-6207022228775391451</guid><pubDate>Wed, 08 Apr 2009 12:50:00 +0000</pubDate><atom:updated>2009-04-08T17:34:54.430+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">puppet</category><category domain="http://www.blogger.com/atom/ns#">system administration</category><title>Running Puppet on big scale</title><description>This is a rehash of my &lt;a href=&quot;http://slashdot.org/comments.pl?sid=1154635&amp;cid=27132539&quot;&gt;comment&lt;/a&gt; in a Slashdot discussion and my &lt;a href=&quot;http://blog.kovyrin.net/2008/03/12/puppet-the-best-admins-friend/&quot;&gt;comment&lt;/a&gt; on Alexey Kovrygin&#39;s blog post.&lt;br /&gt;&lt;br /&gt;We run &lt;a href=&quot;http://reductivelabs.com/products/puppet/&quot;&gt;Puppet&lt;/a&gt; on hundreds of servers in two datacenters, and it was a pain to get it working right. Many issues show up here and there: memory leaks in both the client (puppetd) and the server (puppetmaster), periodic lock-ups and even file corruption. Besides, it is quite slow. These problems are slowly being fixed with each new release, but right now using Puppet for big installations is a source of constant problems. Unfortunately you do not notice these problems until you have many servers to manage; on smaller installations it seems to work without problems, or at least they happen too rarely to be noticeable. In our case the number of servers we manage grew slowly, so we fell into the trap: we now rely on Puppet too much and it is too late to change. 
In the end we managed to work around most of the Puppet issues we ran into, so combined with monitoring to catch problems it works well enough for us. On the other hand, if I were to start from scratch I would evaluate something different for the project. Perhaps I would use &lt;a href=&quot;http://www.cfengine.org/&quot;&gt;Cfengine&lt;/a&gt;. It is not as flexible and nice as Puppet, but it is probably more stable simply because it is much older. I talked to people who used Cfengine on a much bigger scale (thousands of servers) and they did not recall stability problems with it. In the long run Puppet will probably be OK too, as it is being actively developed, but right now I&#39;d consider it to be in a &quot;beta&quot; state. Or maybe even &quot;alpha&quot;.&lt;br /&gt;&lt;br /&gt;For anyone interested in how to get Puppet to work under a real workload, this is what we do:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;We run &lt;a href=&quot;http://reductivelabs.com/trac/puppet/wiki/UsingMongrel&quot;&gt;Puppet under Apache+Mongrel&lt;/a&gt;. By default it runs under WEBrick, which breaks easily under any moderate load, so we use Apache+Mongrel instead. Another benefit of using Apache is that you can run multiple backends. This helps if you have a multi-core server for puppetmaster, as by itself it can use only one core. Alternatively you can use Nginx+Mongrel, or any other web server with proxying capabilities, in front of Mongrel.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Because Puppet is slow, we load balance it across two boxes in each datacenter.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;We restart backends from time to time because they leak memory. We have a cron job that does this every 15 minutes (yes, it is that bad).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Puppetmaster has a cache which we have seen get corrupted sometimes. Our &quot;fix&quot; is to delete it before each restart. 
This might be fixed in a later version: I&#39;ve seen some closed bug reports which looked relevant, but we still keep this cache cleanup just in case.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;We do not run the Puppet client as a daemon. We run it as a &lt;a href=&quot;http://reductivelabs.com/trac/puppet/wiki/Recipes/cron&quot;&gt;cron job&lt;/a&gt;. When run as a daemon, the Puppet client leaks memory and gets stuck from time to time. In our cron job we add a random sleep before starting the client, to make sure requests do not all hit the server at the same time and overload it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;We never serve big files over Puppet using its fileserver. Puppet does a number of stupid things with big files, such as reading them into memory before serving them to the Puppet client. If you need to distribute big files, use other means (HTTP, FTP, NFS, etc.).&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;</description><link>http://www.martynov.org/2009/04/running-puppet-on-big-scale.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-1359025179540088839</guid><pubDate>Wed, 25 Mar 2009 15:07:00 +0000</pubDate><atom:updated>2009-03-31T16:49:45.676+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">acitvemq</category><category domain="http://www.blogger.com/atom/ns#">java</category><category domain="http://www.blogger.com/atom/ns#">messaging</category><category domain="http://www.blogger.com/atom/ns#">perl</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">stomp</category><title>Stomp messaging for non-Java programmers on top of Apache ActiveMQ</title><description>Recently I was researching available options for messaging between Perl programs. 
In the past I had quite a lot of experience with &lt;a href=&quot;http://www.spread.org/&quot;&gt;Spread&lt;/a&gt; and I don&#39;t want to repeat it. I hated Spread as it was buggy and unstable. So I looked into other alternatives: &lt;a href=&quot;http://xmpp.org/&quot;&gt;XMPP&lt;/a&gt;, &lt;a href=&quot;http://stomp.codehaus.org/&quot;&gt;Stomp&lt;/a&gt; and &lt;a href=&quot;http://www.rabbitmq.com/&quot;&gt;AMQP&lt;/a&gt;. AMQP has no Perl client, so it was out. Stomp and XMPP were a close tie in my view, but Stomp looked simpler, so I decided to go with Stomp. There is a very good Perl client library for Stomp: &lt;a href=&quot;http://search.cpan.org/dist/Net-Stomp/&quot;&gt;Net::Stomp&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Then there is the choice of server. This is quite an important choice, and here is why: Stomp is in theory a language-agnostic protocol, but in reality you are very likely to depend on the semantics of a specific Stomp server implementation. For example, as I mention below, the Stomp protocol doesn&#39;t really define any rules of message delivery.&lt;br /&gt;&lt;br /&gt;There are several servers which support Stomp, but &lt;a href=&quot;http://activemq.apache.org/&quot;&gt;Apache ActiveMQ&lt;/a&gt; looked to me like one of the most robust implementations. While Apache ActiveMQ supports a wide range of interfaces, its design is centered around &lt;a href=&quot;http://java.sun.com/products/jms/&quot;&gt;JMS&lt;/a&gt;, and it helps to understand the basic concepts of JMS even if you use only Stomp. This was a problem for me, as I don&#39;t really program in Java and all the JMS concepts were alien to me. Moreover, most of the documentation on Stomp and ActiveMQ takes for granted that you know the JMS basics.&lt;br /&gt;&lt;br /&gt;So I&#39;m recording all my findings on Stomp/ActiveMQ from the viewpoint of a non-Java programmer. I hope they might be helpful for other non-Java programmers. 
A word of warning: everything below might be specific to the Apache ActiveMQ implementation of a Stomp server. I didn&#39;t bother to check other Stomp servers.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Basic model&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As I mentioned earlier, the Stomp protocol by itself doesn&#39;t specify rules of message delivery. It is up to the Stomp server to define the rules. This is where the JMS API model becomes important, as the Stomp implementation is basically just a mapping of the JMS model to a protocol that is not specific to Java. Below is a short summary of the parts of the API model which are relevant to Stomp clients (this is mostly based on my reading of the &lt;a href=&quot;http://java.sun.com/products/jms/tutorial/1_3_1-fcs/doc/jms_tutorialTOC.html&quot;&gt;JMS tutorial&lt;/a&gt;, the &lt;a href=&quot;http://stomp.codehaus.org/Protocol&quot;&gt;Stomp protocol&lt;/a&gt; description and the description of &lt;a href=&quot;http://stomp.codehaus.org/Stomp+JMS&quot;&gt;JMS extensions to Stomp&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;There are two distinct ways to organize messaging:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Use queues. When a message gets into a queue, only one of the subscribers gets it. If there are no subscribers, the server stores the message until someone shows up.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Use topics. For each message sent to a topic, all active (i.e. connected) subscribers get a copy of it. Non-active subscribers can actually get a copy as well if they register their subscription as durable in advance. If there are no subscribers, the message is lost.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;How do you use queues and topics in a Stomp client? It is all controlled by the destination you specify when subscribing to messages or sending messages. Destinations like /queue/* act as queues. Destinations like /topic/* act as topics.&lt;br /&gt;&lt;br /&gt;There is also a concept of temporary queues and topics in JMS. 
The idea is that they are visible only to the connection which creates them, so that a client can have private queues and topics. I&#39;m not sure if this is exposed to Stomp clients at all. It might be; I haven&#39;t researched this as I don&#39;t need it in my application.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Control over reliability of messaging&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The JMS API gives you some control over the reliability of messaging, and at least some of it is exposed to the Stomp layer.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Message acknowledgement&lt;/span&gt;: on subscription, a Stomp client tells the server whether it acknowledges messages automatically or not. Automatic means that a message is considered delivered even if the subscriber doesn&#39;t actually read it. I guess there are cases when this makes sense, but I&#39;d argue that the default behavior should be the opposite, as for most applications it doesn&#39;t.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Message persistence&lt;/span&gt;: if the Stomp server dies, it either loses undelivered messages or rereads them from some permanent storage. Message persistence controls this.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Message priority&lt;/span&gt;: in theory the JMS provider tries to deliver higher-priority messages before lower-priority ones. In practice I have no idea; I didn&#39;t research how ActiveMQ implements this as it is not important for my application. Anyway, this bit is exposed in the Stomp protocol as well.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Message expiration&lt;/span&gt;: this defines how long the server keeps undelivered messages.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Transactions&lt;/span&gt;: not sure about this one. Both JMS and Stomp support the concept of transactions, but I&#39;m not sure what the exact overlap is. 
I might look into this later but for my application transactions are probably not important.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Configuring ActiveMQ as a Stomp server&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Latest version (5.2) seems to support Stomp out of box without need for any additional configuration. As a quick test you can run the following program. It is just a copy&amp;paste from Net::Stomp perldoc - I&#39;m adding it here in case they change perldoc later:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#&amp;nbsp;send&amp;nbsp;a&amp;nbsp;message&amp;nbsp;to&amp;nbsp;the&amp;nbsp;queue&amp;nbsp;&#39;foo&#39;&lt;br /&gt;use&amp;nbsp;Net::Stomp;&lt;br /&gt;my&amp;nbsp;$stomp&amp;nbsp;=&amp;nbsp;Net::Stomp-&gt;new(&amp;nbsp;{&amp;nbsp;hostname&amp;nbsp;=&gt;&amp;nbsp;&#39;localhost&#39;,&amp;nbsp;port&amp;nbsp;=&gt;&amp;nbsp;&#39;61613&#39;&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;$stomp-&gt;connect(&amp;nbsp;{&amp;nbsp;login&amp;nbsp;=&gt;&amp;nbsp;&#39;hello&#39;,&amp;nbsp;passcode&amp;nbsp;=&gt;&amp;nbsp;&#39;there&#39;&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;$stomp-&gt;send(&amp;nbsp;{&amp;nbsp;destination&amp;nbsp;=&gt;&amp;nbsp;&#39;/queue/foo&#39;,&amp;nbsp;body&amp;nbsp;=&gt;&amp;nbsp;&#39;test&amp;nbsp;message&#39;&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;$stomp-&gt;disconnect;&lt;br /&gt;&lt;br /&gt;#&amp;nbsp;subscribe&amp;nbsp;to&amp;nbsp;messages&amp;nbsp;from&amp;nbsp;the&amp;nbsp;queue&amp;nbsp;&#39;foo&#39;&lt;br /&gt;use&amp;nbsp;Net::Stomp;&lt;br /&gt;my&amp;nbsp;$stomp&amp;nbsp;=&amp;nbsp;Net::Stomp-&gt;new(&amp;nbsp;{&amp;nbsp;hostname&amp;nbsp;=&gt;&amp;nbsp;&#39;localhost&#39;,&amp;nbsp;port&amp;nbsp;=&gt;&amp;nbsp;&#39;61613&#39;&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;$stomp-&gt;connect(&amp;nbsp;{&amp;nbsp;login&amp;nbsp;=&gt;&amp;nbsp;&#39;hello&#39;,&amp;nbsp;passcode&amp;nbsp;=&gt;&amp;nbsp;&#39;there&#39;&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;$stomp-&gt;subscribe(&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&amp;nbsp;&amp;nbsp;&amp;nbsp;destination&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;=&gt;&amp;nbsp;&#39;/queue/foo&#39;,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&#39;ack&#39;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;=&gt;&amp;nbsp;&#39;client&#39;,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&#39;activemq.prefetchSize&#39;&amp;nbsp;=&gt;&amp;nbsp;1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;);&lt;br /&gt;while&amp;nbsp;(1)&amp;nbsp;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;my&amp;nbsp;$frame&amp;nbsp;=&amp;nbsp;$stomp-&gt;receive_frame;&lt;br /&gt;&amp;nbsp;&amp;nbsp;warn&amp;nbsp;$frame-&gt;body;&amp;nbsp;#&amp;nbsp;do&amp;nbsp;something&amp;nbsp;here&lt;br /&gt;&amp;nbsp;&amp;nbsp;$stomp-&gt;ack(&amp;nbsp;{&amp;nbsp;frame&amp;nbsp;=&gt;&amp;nbsp;$frame&amp;nbsp;}&amp;nbsp;);&lt;br /&gt;}&lt;br /&gt;$stomp-&gt;disconnect;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;Default installation doesn&#39;t seem to do any authorization so any login/passcode works.</description><link>http://www.martynov.org/2009/03/stomp-messaging-for-non-java.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>6</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-1983453835217536737</guid><pubDate>Mon, 17 Nov 2008 11:44:00 +0000</pubDate><atom:updated>2008-11-17T15:16:53.584+03:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">erlang</category><category domain="http://www.blogger.com/atom/ns#">programming</category><title>Erlang debugging tips</title><description>I&#39;ve just started playing with Erlang so I have a lot to 
discover, but so far I&#39;ve found several things which help me debug my programs:&lt;ol&gt;&lt;br /&gt;&lt;li&gt;I tried to write my programs using &lt;a href=&quot;http://www.erlang.org/doc/design_principles/part_frame.html&quot;&gt;OTP principles&lt;/a&gt;, but the problem for me was that by default this often causes Erlang to hide most of the problems. The faulty process just gets silently restarted by its supervisor or, even worse, the whole application just exits with an unclear &quot;shutdown temporary&quot; message. The solution is simple: start the &lt;a href=&quot;http://www.erlang.org/doc/apps/sasl/part_frame.html&quot;&gt;sasl&lt;/a&gt; application and it&#39;ll start logging all crashes. For development, starting the Erlang shell as &lt;span style=&quot;font-style:italic;&quot;&gt;erl -boot start_sasl&lt;/span&gt; does the trick.&lt;br /&gt;&lt;li&gt;If you compile your modules with the &lt;span style=&quot;font-style:italic;&quot;&gt;debug_info&lt;/span&gt; switch, you can use a quite nifty visual debugger to step through your program. Quick howto: open the debugger window with the Erlang console command &lt;span style=&quot;font-style:italic;&quot;&gt;im()&lt;/span&gt; and then add modules for inspection via the menu &lt;span style=&quot;font-style:italic;&quot;&gt;Module/Interpret&lt;/span&gt;. Then you can either add breakpoints manually or configure the debugger to auto-attach on one of several conditions (say, on first call). Instead of clicking through menus you can also use Erlang console commands to control the debugger. See &lt;span style=&quot;font-style:italic;&quot;&gt;i:help()&lt;/span&gt;.&lt;br /&gt;&lt;li&gt;With the command &lt;span style=&quot;font-style:italic;&quot;&gt;appmon:start()&lt;/span&gt; you can launch the &lt;a href=&quot;http://www.erlang.org/doc/apps/appmon/index.html&quot;&gt;visual application monitor&lt;/a&gt;, which shows all active applications. One particularly useful thing is the ability to click on an application, which shows the tree of processes it consists of. 
Then you can enable tracing of individual processes. When tracing is enabled, it appears to show the messages sent or received by the traced process.&lt;br /&gt;&lt;/ol&gt;</description><link>http://www.martynov.org/2008/11/erlang-debugging-tips.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-4573019703104503769</guid><pubDate>Thu, 06 Dec 2007 13:06:00 +0000</pubDate><atom:updated>2007-12-07T15:00:55.293+03:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">benchmark</category><category domain="http://www.blogger.com/atom/ns#">c++</category><category domain="http://www.blogger.com/atom/ns#">parser</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">stl</category><category domain="http://www.blogger.com/atom/ns#">string</category><title>STL strings vs C strings for parsing</title><description>I&#39;m working on a project where I need to build a custom high-performance HTTP server. One piece of this server is a parser for URLs in incoming requests. It is very simple, and at first glance it shouldn&#39;t be slow compared with other parts of the server. Yet according to the profiler it was taking quite a lot of CPU. The parser uses STL strings and basically makes several &lt;span style=&quot;font-style:italic;&quot;&gt;string::find()&lt;/span&gt; calls to find the parts of the URL. So I thought maybe &lt;span style=&quot;font-style:italic;&quot;&gt;string::find()&lt;/span&gt; is too slow and decided to benchmark it against &lt;span style=&quot;font-style:italic;&quot;&gt;strchr()&lt;/span&gt;. 
This is my benchmark code:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;string.h&gt;&lt;br /&gt;#include &amp;lt;string&gt;&lt;br /&gt;#include &amp;lt;time.h&gt;&lt;br /&gt;#include &amp;lt;iostream&gt;&lt;br /&gt;&lt;br /&gt;using std::string;&lt;br /&gt;using std::cout;&lt;br /&gt;&lt;br /&gt;int main() {&lt;br /&gt;    const char* str1 = &quot;                                      a              &quot;;&lt;br /&gt;    const string&amp; str2 = str1;&lt;br /&gt;&lt;br /&gt;    const unsigned long iterations = 500000000l;&lt;br /&gt;&lt;br /&gt;    {&lt;br /&gt;        clock_t start = clock();&lt;br /&gt;&lt;br /&gt;        for (unsigned long i = 0; i &lt; iterations; ++i) {&lt;br /&gt;            char* pos = strchr(str1, &#39;a&#39;);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        clock_t end = clock();&lt;br /&gt;        double totalTime = ((double) (end - start)) / CLOCKS_PER_SEC;&lt;br /&gt;        double iterTime = totalTime / iterations;&lt;br /&gt;        double rate = 1 / iterTime;&lt;br /&gt;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Total time:         &quot; &amp;lt;&amp;lt; totalTime &amp;lt;&amp;lt; &quot; sec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Iterations:         &quot; &amp;lt;&amp;lt; iterations &amp;lt;&amp;lt; &quot; it\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Time per iteration: &quot; &amp;lt;&amp;lt; iterTime * 1000 &amp;lt;&amp;lt; &quot; msec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Rate:               &quot; &amp;lt;&amp;lt; rate &amp;lt;&amp;lt; &quot; it/sec\n&quot;;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    {&lt;br /&gt;        clock_t start = clock();&lt;br /&gt;&lt;br /&gt;        for (unsigned long i = 0; i &lt; iterations; ++i) {&lt;br /&gt;            string::size_type pos = str2.find(&#39;a&#39;);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        clock_t end = clock();&lt;br /&gt;        double totalTime = ((double) (end - start)) / CLOCKS_PER_SEC;&lt;br /&gt;   
     double iterTime = totalTime / iterations;&lt;br /&gt;        double rate = 1 / iterTime;&lt;br /&gt;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Total time:         &quot; &amp;lt;&amp;lt; totalTime &amp;lt;&amp;lt; &quot; sec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Iterations:         &quot; &amp;lt;&amp;lt; iterations &amp;lt;&amp;lt; &quot; it\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Time per iteration: &quot; &amp;lt;&amp;lt; iterTime * 1000 &amp;lt;&amp;lt; &quot; msec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Rate:               &quot; &amp;lt;&amp;lt; rate &amp;lt;&amp;lt; &quot; it/sec\n&quot;;&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Turns out strchr is much faster as long as the benchmark code is compiled with optimizations on:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;ilya@denmark:~$ g++ -O3 test.cc &amp;&amp; ./a.out&lt;br /&gt;Total time:         0 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 0 msec&lt;br /&gt;Rate:               inf it/sec&lt;br /&gt;Total time:         15.5 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 3.1e-05 msec&lt;br /&gt;Rate:               3.22581e+07 it/sec&lt;br /&gt;&lt;br /&gt;ilya@denmark:~$ g++ -O2 test.cc &amp;&amp; ./a.out&lt;br /&gt;Total time:         0 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 0 msec&lt;br /&gt;Rate:               inf it/sec&lt;br /&gt;Total time:         15.76 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 3.152e-05 msec&lt;br /&gt;Rate:               3.17259e+07 it/sec&lt;br /&gt;&lt;br /&gt;ilya@denmark:~$ g++ -O1 test.cc &amp;&amp; ./a.out&lt;br /&gt;Total time:         0 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 0 msec&lt;br /&gt;Rate:               inf it/sec&lt;br /&gt;Total time:         19.23 sec&lt;br /&gt;Iterations:         500000000 it&lt;br 
/&gt;Time per iteration: 3.846e-05 msec&lt;br /&gt;Rate:               2.6001e+07 it/sec&lt;br /&gt;&lt;br /&gt;ilya@denmark:~$ g++ -O0 test.cc &amp;&amp; ./a.out&lt;br /&gt;Total time:         18.64 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 3.728e-05 msec&lt;br /&gt;Rate:               2.6824e+07 it/sec&lt;br /&gt;Total time:         16.89 sec&lt;br /&gt;Iterations:         500000000 it&lt;br /&gt;Time per iteration: 3.378e-05 msec&lt;br /&gt;Rate:               2.96033e+07 it/sec&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I checked the same code with callgrind, and from the call graph it looks like the &lt;span style=&quot;font-style:italic;&quot;&gt;strchr()&lt;/span&gt; call was inlined while &lt;span style=&quot;font-style:italic;&quot;&gt;string::find()&lt;/span&gt; wasn&#39;t. That could be the reason for the difference in performance. Maybe the compiler is even smarter and optimized the whole &lt;span style=&quot;font-style:italic;&quot;&gt;strchr()&lt;/span&gt; loop away. I&#39;m not sure the benchmark is completely fair. Anyway, one thing is certain: I should try rewriting my URL parser using strchr() and see if the real code is faster.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Update:&lt;/span&gt; As an anonymous commenter pointed out, it looks as though GCC is replacing the call to &lt;span style=&quot;font-style:italic;&quot;&gt;strchr()&lt;/span&gt; with a compiler builtin, noticing that str1 points to a literal, and doing the search at compile time. To make the benchmark fair, &lt;span style=&quot;font-style:italic;&quot;&gt;str1&lt;/span&gt; should be supplied at runtime to prevent the optimization. I tried that (passing the string via argv) and it does change the result. It seems that for short strings C strings are faster and for long strings STL strings are faster. 
Not quite sure why yet.&lt;br /&gt;&lt;br /&gt;For reference, here is the updated benchmark code:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;string.h&gt;&lt;br /&gt;#include &amp;lt;string&gt;&lt;br /&gt;#include &amp;lt;time.h&gt;&lt;br /&gt;#include &amp;lt;iostream&gt;&lt;br /&gt;&lt;br /&gt;using std::string;&lt;br /&gt;using std::cout;&lt;br /&gt;&lt;br /&gt;int main(int argc, char** argv) {&lt;br /&gt;    // The string to search is supplied at runtime so the compiler cannot precompute the result&lt;br /&gt;    if (argc != 2) return 1;&lt;br /&gt;    const char* str1 = argv[1];&lt;br /&gt;    const string str2 = argv[1];&lt;br /&gt;&lt;br /&gt;    const unsigned long iterations = 500000000UL;&lt;br /&gt;&lt;br /&gt;    {&lt;br /&gt;        // Time strchr() on the C string&lt;br /&gt;        clock_t start = clock();&lt;br /&gt;&lt;br /&gt;        for (unsigned long i = 0; i &amp;lt; iterations; ++i) {&lt;br /&gt;            const char* pos = strchr(str1, &#39;a&#39;);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        clock_t end = clock();&lt;br /&gt;        double totalTime = ((double) (end - start)) / CLOCKS_PER_SEC;&lt;br /&gt;        double iterTime = totalTime / iterations;&lt;br /&gt;        double rate = 1 / iterTime;&lt;br /&gt;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Total time:         &quot; &amp;lt;&amp;lt; totalTime &amp;lt;&amp;lt; &quot; sec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Iterations:         &quot; &amp;lt;&amp;lt; iterations &amp;lt;&amp;lt; &quot; it\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Time per iteration: &quot; &amp;lt;&amp;lt; iterTime * 1000 &amp;lt;&amp;lt; &quot; msec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Rate:               &quot; &amp;lt;&amp;lt; rate &amp;lt;&amp;lt; &quot; it/sec\n&quot;;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    {&lt;br /&gt;        // Time string::find() on the std::string&lt;br /&gt;        clock_t start = clock();&lt;br /&gt;&lt;br /&gt;        for (unsigned long i = 0; i &amp;lt; iterations; ++i) {&lt;br /&gt;            string::size_type pos = str2.find(&#39;a&#39;);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        clock_t end = clock();&lt;br /&gt;        double totalTime = ((double) (end - 
start)) / CLOCKS_PER_SEC;&lt;br /&gt;        double iterTime = totalTime / iterations;&lt;br /&gt;        double rate = 1 / iterTime;&lt;br /&gt;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Total time:         &quot; &amp;lt;&amp;lt; totalTime &amp;lt;&amp;lt; &quot; sec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Iterations:         &quot; &amp;lt;&amp;lt; iterations &amp;lt;&amp;lt; &quot; it\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Time per iteration: &quot; &amp;lt;&amp;lt; iterTime * 1000 &amp;lt;&amp;lt; &quot; msec\n&quot;;&lt;br /&gt;        cout &amp;lt;&amp;lt; &quot;Rate:               &quot; &amp;lt;&amp;lt; rate &amp;lt;&amp;lt; &quot; it/sec\n&quot;;&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Meanwhile I rewrote my URL parser using C strings and it is now 2x faster. I guess the speedup comes from the fact that with C strings I can minimize the number of memory allocations for temporary strings. With C strings, if you want to take a substring of another string, you can just take a pointer into the middle of the original and place &#39;\0&#39; where the substring should end, if you don&#39;t mind destroying the original string. 
With STL (OK, standard) strings you cannot do this and have to make copies.</description><link>http://www.martynov.org/2007/12/stl-strings-vs-c-strings-for-parsing.html</link><author>noreply@blogger.com (Ilya Martynov)</author><thr:total>6</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-7054031657507268091</guid><pubDate>Fri, 21 Sep 2007 22:55:00 +0000</pubDate><atom:updated>2007-09-26T00:59:47.689+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">csv</category><category domain="http://www.blogger.com/atom/ns#">html</category><category domain="http://www.blogger.com/atom/ns#">ldap</category><category domain="http://www.blogger.com/atom/ns#">perl</category><category domain="http://www.blogger.com/atom/ns#">php</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">security</category><category domain="http://www.blogger.com/atom/ns#">shell</category><category domain="http://www.blogger.com/atom/ns#">sql</category><category domain="http://www.blogger.com/atom/ns#">xml</category><title>Beyond XSS and SQL injections</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6EABosSan5KDIp_M2_wRGwhmMXDJ18UFa15eee8CzB_1lZ72Kl8CHS5406XYvtZ7JZ6YXVn6c2H8MSRGWRMdAxZgdw1afTsLfX7LmIAZAIFnOaNhqkEeT4Ve3r8jFG6CW2vTM_FqsUQ/s1600-h/IMG_7193.jpg&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6EABosSan5KDIp_M2_wRGwhmMXDJ18UFa15eee8CzB_1lZ72Kl8CHS5406XYvtZ7JZ6YXVn6c2H8MSRGWRMdAxZgdw1afTsLfX7LmIAZAIFnOaNhqkEeT4Ve3r8jFG6CW2vTM_FqsUQ/s200/IMG_7193.jpg&quot; border=&quot;0&quot; alt=&quot;&quot;id=&quot;BLOGGER_PHOTO_ID_5112783820455883314&quot; /&gt;&lt;/a&gt;What is common about HTML, XML and CSV files, 
SQL and LDAP queries, filenames and shell commands? All these things are text, and that text is often generated by programs. One commonly observed flaw in such programs is that encoding rules are not followed. These days many developers are aware of &lt;a href=&quot;http://en.wikipedia.org/wiki/SQL_injection&quot;&gt;SQL injection&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Cross-site_scripting&quot;&gt;XSS&lt;/a&gt; problems, as many books, online tutorials, blogs, coding standards, etc. talk about them. Yet I&#39;m not sure there is enough education for developers to use the correct methods to protect their code from these problems. Besides, there is a lack of awareness that it is not just SQL and HTML. Developers should definitely think more broadly: if you programmatically generate &lt;i&gt;any&lt;/i&gt; kind of text, you must think about properly encoding all data used in the generated text.&lt;br /&gt;&lt;br /&gt;Speaking of correct methods to secure code against text-encoding problems, one of my pet peeves is when people try to strip input data when they really should be thinking about protecting output. Nitesh Dhanjani covers this really well in his blog post &lt;a href=&quot;http://www.oreillynet.com/onlamp/blog/2005/10/repeat_after_me_lack_of__outpu.html&quot;&gt;&quot;Repeat After Me: Lack of Output Encoding Causes XSS Vulnerabilities&quot;&lt;/a&gt;. Quote:&lt;blockquote&gt;The most common mistake committed by developers (and many security experts, I might add) is to treat XSS as an input validation problem. Therefore, I frequently come across situations where developers fix XSS problems by attempting to filter out meta-characters (&amp;lt;, &gt;, /, “, ‘, etc). At times, if an exhaustive list of meta-characters is used, it does solve the problem, but it makes the application less friendly to the end user – a large set of characters are deemed forbidden. 
The correct approach to solving XSS problems is to ensure that every user supplied parameter is HTML Output Encoded&lt;/blockquote&gt;A good example of the wrong approach is PHP&#39;s invention called &lt;a href=&quot;http://php.net/magic_quotes&quot;&gt;magic quotes&lt;/a&gt;. I have mixed feelings about this thing. On one hand it was probably a good thing: so much web-based software is developed by dilettantes that overall we are living in a slightly better world, as &lt;i&gt;magic quotes&lt;/i&gt; do somewhat limit the damage from bad code. On the other hand it teaches bad habits while not fixing all the problems in bad code. It also causes everybody else to &lt;a href=&quot;http://www.sitepoint.com/blogs/2005/03/02/magic-quotes-headaches/&quot;&gt;suffer&lt;/a&gt;. The good news is that they are &lt;a href=&quot;http://mjsabby.com/2007/07/php6-magic-quotes-best-practice.php&quot;&gt;getting rid of this abomination in PHP6&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now let&#39;s take a look at some real-life examples of how not to generate text. I&#39;ll skip HTML and SQL as they are well covered elsewhere and look at the other things I mentioned at the beginning of this article.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;XML files&lt;/b&gt;: bad code which generates XML often shares the problems of bad code which generates HTML; after all, the two are closely related. But as XML is a more generic tool, it is used in many domains other than web development, where developers are not &quot;blessed&quot; with knowledge of XSS-like problems. Moreover, I have noticed that even web developers for some reason often consider XML to be something very different from HTML and suddenly forget they have to escape data. I&#39;m especially amused that many people are not aware that &lt;a href=&quot;http://www.xml.com/pub/98/07/binary/binary.html&quot;&gt;you cannot put arbitrary binary data in XML&lt;/a&gt;. 
You have to either encode it into text (base64 encoding is quite popular for this) or put it outside of the XML document.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;CSV files:&lt;/b&gt; this format is still quite popular for exchanging tabular data between programs. Guess what? I&#39;ve seen many naive CSV producers and parsers that ignore reserved characters and break later when they get real data. No, to write a CSV file you cannot just do &lt;pre&gt;print join &quot;,&quot;, @columns&lt;/pre&gt; What if one of the columns contains, say, &quot;,&quot; (a comma)?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;LDAP queries:&lt;/b&gt; being a text-based query language, it is subject to very similar problems as SQL. But while many developers are aware of the SQL injection problem, not many are aware that you have exactly the same problem with LDAP queries. It also doesn&#39;t help that while nearly all SQL libraries provide tools to escape data in SQL queries, this doesn&#39;t always seem to be the case for LDAP libraries. For example, &lt;a href=&quot;http://www.php.net/ldap&quot;&gt;PHP&#39;s LDAP extension&lt;/a&gt; has no API to escape data at all.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Using shell to execute commands:&lt;/b&gt; if you are running a command using system() in &lt;a href=&quot;http://linux.die.net/man/3/system&quot;&gt;C&lt;/a&gt;, &lt;a href=&quot;http://perldoc.perl.org/functions/system.html&quot;&gt;Perl&lt;/a&gt;, &lt;a href=&quot;http://www.php.net/system&quot;&gt;PHP&lt;/a&gt; or any other language and you are constructing the command from your data, you again should treat this as a problem of proper encoding. 
The example below is from &lt;a href=&quot;http://www.google.com/codesearch?hl=en&amp;q=+%22system(%22+show:4KS8rCF7Owc:s4I4VJk8XWs:7ttjSMLtQcQ&amp;sa=N&amp;cd=31&amp;ct=rc&amp;cs_p=http://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.8a4/src/mozilla-source-1.8a4.tar.bz2&amp;cs_f=mozilla/modules/plugin/samples/SanePlugin/nsSanePlugin.cpp#a0&quot;&gt;Mozilla&#39;s source code&lt;/a&gt;&lt;pre&gt;sprintf(cmd, &quot;cp %s %s&quot;, orig_filename, dest_filename);&lt;br /&gt;system(cmd);&lt;/pre&gt;Guess what happens if either of these filenames contains characters which are special to the shell and was not escaped?&lt;br /&gt;&lt;br /&gt;While I&#39;m at it, I&#39;d mention that it is probably a good idea to avoid APIs which use the shell to execute commands at all, simply because &lt;a href=&quot;http://ilyamart.blogspot.com/2007/09/perl-as-replacement-for-shell-scripting.html&quot;&gt;shell programming is too hard to get right&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It would help a lot if tools supported developers better in writing correct code which deals with text-based APIs. Sometimes it is just a lack of documentation on encoding rules. For example, a month ago I was learning &lt;a href=&quot;http://developers.facebook.com/&quot;&gt;Facebook APIs&lt;/a&gt;. One of the provided APIs is an API to execute so-called &lt;a href=&quot;http://developers.facebook.com/documentation.php?doc=fql&quot;&gt;FQL queries&lt;/a&gt;. This is an SQL-like query language, and naturally I&#39;d expect FQL injections to be covered in the documentation. They aren&#39;t; it is not even documented how to escape string data in FQL queries! I played with different queries in the FQL console and it seems that the standard SQL-like method (i.e. using &quot;\&quot; (backslash) as an escape character in strings) does work, but why do I have to find this out on my own? It is also a shame when libraries built around text APIs do not provide means to properly encode data for the text formats they use. 
I mentioned one such example above: PHP&#39;s LDAP extension provides no functions to escape data for LDAP queries. How hard is it to add this? If you are creating text based APIs or libraries around such APIs it is your duty to help developers who will be using them. So do document encoding rules and do provide tools to automatically encode data!</description><link>http://www.martynov.org/2007/09/beyound-xss-and-sql-injections.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6EABosSan5KDIp_M2_wRGwhmMXDJ18UFa15eee8CzB_1lZ72Kl8CHS5406XYvtZ7JZ6YXVn6c2H8MSRGWRMdAxZgdw1afTsLfX7LmIAZAIFnOaNhqkEeT4Ve3r8jFG6CW2vTM_FqsUQ/s72-c/IMG_7193.jpg" height="72" width="72"/><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-1292853529521022988</guid><pubDate>Thu, 06 Sep 2007 14:33:00 +0000</pubDate><atom:updated>2007-09-07T00:01:11.143+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">bash</category><category domain="http://www.blogger.com/atom/ns#">perl</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">scripting</category><category domain="http://www.blogger.com/atom/ns#">shell</category><title>Perl as replacement for shell scripting (Part I)</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2yitN_O_4VKrsdUbs1Beu9fhMf1l4t_3SEPndFvX0yXf3pNneC92pxdsv2StnjuBNnL-jy6iOz8hN6HVDsPyzL37sHU2Q3JPSo1pN5TG4Tgs5BOwKyXFDtPQlEkL4OFOssy6sPVZAzQ/s1600-h/IMG_6676.jpg&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2yitN_O_4VKrsdUbs1Beu9fhMf1l4t_3SEPndFvX0yXf3pNneC92pxdsv2StnjuBNnL-jy6iOz8hN6HVDsPyzL37sHU2Q3JPSo1pN5TG4Tgs5BOwKyXFDtPQlEkL4OFOssy6sPVZAzQ/s200/IMG_6676.jpg&quot; border=&quot;0&quot; alt=&quot;&quot;id=&quot;BLOGGER_PHOTO_ID_5107183623653744162&quot; /&gt;&lt;/a&gt;By shell scripting I mean bash as it is what most (all?) Linux distributions use. Bash can be used as a &lt;a href=&quot;http://tldp.org/LDP/abs/html/why-shell.html&quot;&gt;quite capable programming language&lt;/a&gt;. Bash allows a programmer to build rather complex scripts by using other programs as building blocks. The system comes with a number of such building blocks: &lt;a href=&quot;http://tldp.org/LDP/abs/html/part4.html&quot;&gt;find, grep, sed, awk and many others&lt;/a&gt;, and unsurprisingly there is a lot you can do with them. But it is often a challenge to write robust shell scripts which work, or at least fail gracefully, for any kind of input. The main reason is that historically shell scripts could use only one data type: string&lt;a href=&quot;#ref1&quot;&gt;*&lt;/a&gt;.  Those building blocks, the external programs you use in shell scripts, have a very restricted interface: program arguments which are strings, a stream of strings as input, a stream of strings as output, and an exit code.&lt;br /&gt;&lt;br /&gt;Even a simple concept like a list has to be emulated. For example, a list of file names is often passed as a string which contains these file names separated by whitespace. But what if one of these file names contains whitespace? You get a problem. To fix it you need to escape whitespace characters in the file name. And it is rather easy to miss places where you have to do escaping.  A somewhat contrived example:&lt;pre&gt;rm `ls`&lt;/pre&gt;This would delete all files in the current directory .. unless they have whitespace characters in their names. 
There are many similar cases where an unwary programmer can make a mistake in a shell script. Passing data from one process to another often requires a lot of care, and the simplest code is often wrong. Another problem is that you are very limited in how you can handle errors in shell scripts: you only have the process&#39;s exit code to tell you if it finished successfully. And usually it is just a &lt;a href=&quot;http://tldp.org/LDP/abs/html/exitcodes.html&quot;&gt;boolean value telling you whether there was an error or not&lt;/a&gt;. Quote from the linked document:&lt;blockquote&gt;However, many scripts use an exit 1 as a general bailout upon error. Since exit code 1 signifies so many possible errors, this probably would not be helpful in debugging.&lt;/blockquote&gt;If, say, &lt;i&gt;mkdir&lt;/i&gt; fails, your script cannot easily tell whether it is because a directory with the same name already exists or because you just don&#39;t have permissions for this operation.&lt;br /&gt;&lt;br /&gt;So, any solutions to this problem? As for myself, the moment I see my shell script getting longer than three lines of code I rewrite the whole thing in Perl. In Perl you don&#39;t need to use external programs nearly as often as you do in bash. Therefore you are not limited by their restrictive interfaces (remember, only strings and exit codes for input and output); native Perl APIs can be much more expressive when they need to be.&lt;br /&gt;&lt;br /&gt;There is a price though. Perl code is not always as compact as similar shell code for some scripting tasks. This is because shell scripting is optimized very well for handling the interaction of processes and Perl is less so. It is worth mentioning that many things which are taken for granted in shell scripting often require Perl modules, including non-standard CPAN modules. 
It is not a problem as such, except that not all Perl programmers know where to look for things if they are not covered by &lt;a href=&quot;http://perldoc.perl.org/perlfunc.html&quot;&gt;perlfunc&lt;/a&gt;. This is mainly a concern for newbie Perl programmers but it is still a real problem. Also, using CPAN modules is not always an option.&lt;br /&gt;&lt;br /&gt;Of course, in your Perl program you can fall back to using the same external programs you would use in a shell script, but then you lose the advantages of Perl over shell scripting. So .. don&#39;t do this if possible. An interesting example of this principle: Perl before version 5.6.0 would fall back to the shell to execute the &lt;a href=&quot;http://perldoc.perl.org/functions/glob.html&quot;&gt;glob&lt;/a&gt; operation. That caused various problems for Perl developers: for example, I saw Perl programs using &lt;i&gt;glob&lt;/i&gt; fail when run on one tightly secured web hosting server because the binary Perl was calling had simply been removed from the server for security reasons. In later versions of Perl the implementation of &lt;i&gt;glob&lt;/i&gt; was changed: it is now implemented inside Perl itself (via the standard File::Glob extension) and doesn&#39;t use external programs.&lt;br /&gt;&lt;br /&gt;To be continued in Part II: mapping between common shell operations and corresponding Perl modules.&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;a name=&quot;ref1&quot;&gt;[*]&lt;/a&gt; New versions of bash support &lt;a href=&quot;http://tldp.org/LDP/abs/html/arrays.html&quot;&gt;arrays&lt;/a&gt;. I&#39;d argue that the usefulness of arrays in bash is limited, as programs you call from shell scripts cannot use them to pass output data. You are still limited to string streams and exit codes. 
Not to mention that this is not very portable across different systems.</description><link>http://www.martynov.org/2007/09/perl-as-replacement-for-shell-scripting.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2yitN_O_4VKrsdUbs1Beu9fhMf1l4t_3SEPndFvX0yXf3pNneC92pxdsv2StnjuBNnL-jy6iOz8hN6HVDsPyzL37sHU2Q3JPSo1pN5TG4Tgs5BOwKyXFDtPQlEkL4OFOssy6sPVZAzQ/s72-c/IMG_6676.jpg" height="72" width="72"/><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-8317552320229065965</guid><pubDate>Thu, 23 Aug 2007 20:54:00 +0000</pubDate><atom:updated>2007-09-07T00:02:04.928+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">c++</category><category domain="http://www.blogger.com/atom/ns#">documentation</category><category domain="http://www.blogger.com/atom/ns#">libxml++</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">xerces c++</category><category domain="http://www.blogger.com/atom/ns#">xml</category><title>libxml++ vs xerces C++</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFVuCO9pstYzeOivHWy0-CZRV1g2JSqaoTQdmqRiGX1KJlils1FTLQ9OmZgdWoHicOn77AYnWNRR91Gu3vfZemA1c316G4NR8TbZbLfomPNmgMmfY1u2uZ2w6yThmSfp7ddaUQ1y0gAw/s1600-h/IMG_6778.jpg&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFVuCO9pstYzeOivHWy0-CZRV1g2JSqaoTQdmqRiGX1KJlils1FTLQ9OmZgdWoHicOn77AYnWNRR91Gu3vfZemA1c316G4NR8TbZbLfomPNmgMmfY1u2uZ2w6yThmSfp7ddaUQ1y0gAw/s200/IMG_6778.jpg&quot; border=&quot;0&quot; alt=&quot;&quot;id=&quot;BLOGGER_PHOTO_ID_5102005413513190930&quot; 
/&gt;&lt;/a&gt; When I was reading &lt;a href=&quot;http://www.acmqueue.com/modules.php?name=Content&amp;pa=showpage&amp;amp;pid=488&quot;&gt;&quot;API: Design Matters&quot;&lt;/a&gt; I recalled one example of a good API vs a bad API. Actually my example is more about good API documentation vs bad API documentation, but I suspect there is a correlation between these two things. It is definitely hard to write good documentation if your API sucks.&lt;br /&gt;&lt;br /&gt;So my story is that I had a task to read XML data in a C++ application. The XML data was small and the performance of this part of the application was not critical, so it looked like the simplest way to read this data was to load a DOM tree for the XML document and just use the &lt;a href=&quot;http://www.w3.org/DOM/&quot;&gt;DOM API&lt;/a&gt; and maybe a couple of simple &lt;a href=&quot;http://www.w3.org/TR/xpath&quot;&gt;XPath&lt;/a&gt; queries. It was the first time I needed to do this in C++; I had no previous experience with any C++ XML libraries. So I do a Google search (or maybe it was apt-cache search, I don&#39;t remember) and the first thing I find is &lt;a href=&quot;http://xml.apache.org/xerces-c/&quot;&gt;xerces C++&lt;/a&gt;. Quote from the project&#39;s website:&lt;br /&gt;&lt;blockquote&gt;Xerces-C++ makes it easy to give your application the ability to read and write XML data.&lt;/blockquote&gt;Sounds good, just what I need. So I dig into the &lt;a href=&quot;http://xml.apache.org/xerces-c/api.html&quot;&gt;documentation&lt;/a&gt; and find it to be completely unhelpful, as it is just Doxygen autogenerated &lt;a href=&quot;http://www.codinghorror.com/blog/archives/000451.html&quot;&gt;undocumentation&lt;/a&gt;. Fine, I can read code, let&#39;s check sample code then. 
I open the sample code and find that the shortest example of how to parse XML into a DOM tree and access data in the tree (&lt;a href=&quot;http://xml.apache.org/xerces-c/domcount.html&quot;&gt;DOMCount&lt;/a&gt;) consists of two files which are more than 600 lines long in total. Huh? I don&#39;t want to read 15 pages of code just to learn how to do two simple actions: parse XML into DOM and get data from DOM. Other examples are even worse. Several files, several classes just to read and print freaking XML (&lt;a href=&quot;http://xml.apache.org/xerces-c/domprint.html&quot;&gt;DOMPrint&lt;/a&gt;). You&#39;ve got to be kidding me. It cannot be that hard.&lt;br /&gt;&lt;br /&gt;I don&#39;t really want to waste hours learning an API I&#39;m unlikely to ever use again. After all, I don&#39;t write much C++ code and I definitely don&#39;t write much C++ code that needs XML. So, time to search further. The next hit is &lt;a href=&quot;http://libxmlplusplus.sourceforge.net/&quot;&gt;libxml++&lt;/a&gt;. It is a C++ wrapper over the popular C XML library &lt;a href=&quot;http://www.xmlsoft.org/&quot;&gt;libxml&lt;/a&gt;. This time there is actually some &lt;a href=&quot;http://libxmlplusplus.sourceforge.net/docs/manual/html/index.html&quot;&gt;documentation&lt;/a&gt; that does try to explain how to use the library. And this documentation contains an example which, while being just about 150 lines, manages to demonstrate most of the library&#39;s DOM API.&lt;br /&gt;&lt;br /&gt;End result: I finish my code to read my XML data in the next 30 minutes using libxml++. It is simple, short and it works.&lt;br /&gt;&lt;br /&gt;So what&#39;s wrong with xerces C++? There is no introduction-level documentation at all. The examples look too complex for the problem they are supposed to show a solution for. 
And the reason for this is that the API is just bad: it requires writing unnecessarily complex client code.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Update:&lt;/span&gt; &lt;a href=&quot;http://codesynthesis.com/~boris/blog/&quot;&gt;boris&lt;/a&gt; corrected me about the lack of introduction-level documentation in a comment to this blog post. Turns out I &lt;a href=&quot;http://xml.apache.org/xerces-c/program.html&quot;&gt;missed it&lt;/a&gt;. As a weak excuse I&#39;ll blame bad navigation on the project&#39;s site :)</description><link>http://www.martynov.org/2007/08/libxml-vs-xerces-c.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFVuCO9pstYzeOivHWy0-CZRV1g2JSqaoTQdmqRiGX1KJlils1FTLQ9OmZgdWoHicOn77AYnWNRR91Gu3vfZemA1c316G4NR8TbZbLfomPNmgMmfY1u2uZ2w6yThmSfp7ddaUQ1y0gAw/s72-c/IMG_6778.jpg" height="72" width="72"/><thr:total>12</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-239686291193840976</guid><pubDate>Thu, 16 Aug 2007 12:22:00 +0000</pubDate><atom:updated>2007-09-07T00:02:33.286+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">dba</category><category domain="http://www.blogger.com/atom/ns#">indexing</category><category domain="http://www.blogger.com/atom/ns#">mysql</category><category domain="http://www.blogger.com/atom/ns#">optimization</category><category domain="http://www.blogger.com/atom/ns#">sql</category><title>4 silly mistakes in use of MySQL indexes</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEickojf0uweRZksTEAMNdYAqtvrNmkmL8yefb6qAitu6EpAgNCNtmrjjIuh-Uwn7Bl5R1wjGqWuuV_YRM4XYUEnDYZLLbtXXHzXTYx6g13gz9RqjAShnUchARKndMcFMaZ1MAS8WirBUA/s1600-h/IMG_7362.jpg&quot;&gt;&lt;img 
style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEickojf0uweRZksTEAMNdYAqtvrNmkmL8yefb6qAitu6EpAgNCNtmrjjIuh-Uwn7Bl5R1wjGqWuuV_YRM4XYUEnDYZLLbtXXHzXTYx6g13gz9RqjAShnUchARKndMcFMaZ1MAS8WirBUA/s200/IMG_7362.jpg&quot; border=&quot;0&quot; alt=&quot;&quot;id=&quot;BLOGGER_PHOTO_ID_5102004064893459970&quot; /&gt;&lt;/a&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;1. Not learning how to use EXPLAIN SELECT&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I&#39;m really surprised how many developers use MySQL all the time yet do not know or understand how to use &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/explain.html&quot;&gt;EXPLAIN SELECT&lt;/a&gt;. Several times I&#39;ve seen developers propose serious architectural changes to their code to minimize, partition or cache data in their database when the actual solution was to spend 30 minutes thinking over the output of &lt;span style=&quot;font-style: italic;&quot;&gt;EXPLAIN SELECT&lt;/span&gt; and adding or changing a couple of indexes.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;2. Wasting space with redundant indexes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you have a multicolumn index, you don&#39;t need a separate index on a leftmost prefix of its columns. It is easier to explain with an example:&lt;pre&gt;CREATE TABLE table1 (&lt;br /&gt;   col1 INT,&lt;br /&gt;   col2 INT,&lt;br /&gt;   PRIMARY KEY (col1, col2),&lt;br /&gt;   KEY (col1)&lt;br /&gt;);&lt;/pre&gt;The index on &lt;span style=&quot;font-style: italic;&quot;&gt;col1&lt;/span&gt; is redundant, as any search on &lt;span style=&quot;font-style: italic;&quot;&gt;col1&lt;/span&gt; can use the primary key. It just wastes disk space and might make queries which change this table a bit slower.&lt;br /&gt;&lt;br /&gt;There is one &lt;span style=&quot;font-weight: bold;&quot;&gt;but&lt;/span&gt;! 
See below..&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;3. Incorrect order of columns in index&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The order of columns in a multicolumn index is &lt;span style=&quot;font-weight: bold;&quot;&gt;important&lt;/span&gt;. From &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html&quot;&gt;MySQL documentation&lt;/a&gt;: &lt;blockquote&gt;MySQL cannot use an index if the columns do not form a leftmost prefix of the index.&lt;/blockquote&gt;Example: &lt;pre&gt;CREATE TABLE table2 (&lt;br /&gt;   id INT PRIMARY KEY,&lt;br /&gt;   col1 INT,&lt;br /&gt;   col2 INT,&lt;br /&gt;   col3 INT,&lt;br /&gt;   KEY (col1, col2)&lt;br /&gt;);&lt;/pre&gt;MySQL won&#39;t use any indexes for a query like &lt;pre&gt;SELECT * FROM table2 WHERE col2=123&lt;/pre&gt;&lt;span style=&quot;font-style: italic;&quot;&gt;EXPLAIN SELECT&lt;/span&gt; shows this instantly. If you want to run this query faster, either change the order of columns in the index or add another one.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;4. Not using multicolumn indexes when you need to&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;MySQL usually uses only one index per table at a time, so if you query by several columns in the table you may need to add a multicolumn index. Example: &lt;pre&gt;CREATE TABLE table3 (&lt;br /&gt;   id INT PRIMARY KEY,&lt;br /&gt;   col1 INT,&lt;br /&gt;   col2 INT,&lt;br /&gt;   col3 INT,&lt;br /&gt;   KEY (col1)&lt;br /&gt;);&lt;/pre&gt;A query like &lt;pre&gt;SELECT * FROM table3 WHERE col1=123 AND col2=456&lt;/pre&gt; would use the index on &lt;span style=&quot;font-style: italic;&quot;&gt;col1&lt;/span&gt; to reduce the number of rows to check, but MySQL can do much better if you add a multicolumn index which covers both &lt;span style=&quot;font-style: italic;&quot;&gt;col1&lt;/span&gt; and &lt;span style=&quot;font-style: italic;&quot;&gt;col2&lt;/span&gt;. 
The effect of adding such an index is very easy to see with &lt;span style=&quot;font-style: italic;&quot;&gt;EXPLAIN SELECT&lt;/span&gt;.</description><link>http://www.martynov.org/2007/08/4-silly-mistakes-in-use-of-mysql.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEickojf0uweRZksTEAMNdYAqtvrNmkmL8yefb6qAitu6EpAgNCNtmrjjIuh-Uwn7Bl5R1wjGqWuuV_YRM4XYUEnDYZLLbtXXHzXTYx6g13gz9RqjAShnUchARKndMcFMaZ1MAS8WirBUA/s72-c/IMG_7362.jpg" height="72" width="72"/><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-4538232776422440397</guid><pubDate>Tue, 31 Jul 2007 14:58:00 +0000</pubDate><atom:updated>2007-09-07T00:02:57.542+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">boost</category><category domain="http://www.blogger.com/atom/ns#">c++</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">threads</category><category domain="http://www.blogger.com/atom/ns#">volatile</category><title>volatile and threading</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRM2ZbQLfEKAI-N4vSUWP0qFXWAvRbf7VKEUngTjg5CMvDF39mqLQqPwvtjF12C9musOh3RXNrpyNMojs8NLZG5KLqV4vVNRvW73suMCMS1GW_q2EC_38qG1dEoh_g9m4RU_yuCjsG7Q/s1600-h/IMG_7393.jpg&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRM2ZbQLfEKAI-N4vSUWP0qFXWAvRbf7VKEUngTjg5CMvDF39mqLQqPwvtjF12C9musOh3RXNrpyNMojs8NLZG5KLqV4vVNRvW73suMCMS1GW_q2EC_38qG1dEoh_g9m4RU_yuCjsG7Q/s200/IMG_7393.jpg&quot; border=&quot;0&quot; alt=&quot;&quot;id=&quot;BLOGGER_PHOTO_ID_5097456131194187442&quot; /&gt;&lt;/a&gt;Until 
recently I hadn&#39;t had much experience writing multi-threaded programs in C++, so when I tried I found that I was really confused about how multi-threaded programs mix with &lt;i&gt;volatile&lt;/i&gt; variables. So I did a little research and the quick summary is: this topic &lt;b&gt;is&lt;/b&gt; confusing. It looks like if you put locks around global variables shared between threads you don&#39;t need to care about the &lt;i&gt;volatile&lt;/i&gt; qualifier. This is definitely the case under POSIX threads and most likely with other threading libraries as well. If you don&#39;t use locks and rely on &lt;a href=&quot;http://en.wikipedia.org/wiki/Atomic_operations&quot;&gt;atomic operations&lt;/a&gt; instead, it seems that you have to declare shared global variables &lt;i&gt;volatile&lt;/i&gt;, but as far as portability is concerned this is a grey area.&lt;br /&gt;&lt;br /&gt;The longer story is below:&lt;br /&gt;&lt;br /&gt;Suppose we have a piece of code which waits for a certain external condition to happen. The code could look like this:&lt;br /&gt;&lt;pre&gt;bool gEvent = false;&lt;br /&gt;&lt;br /&gt;void waitLoop() {&lt;br /&gt;    while (!gEvent) {&lt;br /&gt;        sleep(1);&lt;br /&gt;    }&lt;br /&gt;    ...&lt;br /&gt;}&lt;/pre&gt;Let&#39;s assume that this is a single-threaded program and the external condition we are waiting for is a Unix signal. The signal handler is very simple: it just sets &lt;i&gt;gEvent&lt;/i&gt; to true:&lt;br /&gt;&lt;pre&gt;void wakeUp() {&lt;br /&gt;    gEvent = true;&lt;br /&gt;}&lt;/pre&gt;The problem with the code above is that the compiler may optimize out the check of the condition inside &lt;i&gt;waitLoop()&lt;/i&gt;, incorrectly assuming from local analysis of the code that &lt;i&gt;gEvent&lt;/i&gt; never changes. 
The fix is to declare &lt;i&gt;gEvent&lt;/i&gt; with the &lt;i&gt;volatile&lt;/i&gt; qualifier, which basically tells the compiler that the variable can change at any time and that it is unsafe to perform any optimization based on the analysis of local code:&lt;br /&gt;&lt;pre&gt;volatile bool gEvent = false;&lt;/pre&gt;Let&#39;s take another example. The code is the same but this time it is a multi-threaded program where one thread waits for another. So &lt;i&gt;waitLoop()&lt;/i&gt; runs inside one thread and &lt;i&gt;wakeUp()&lt;/i&gt; is eventually called from another. Is the code still correct? Probably yes, if we keep the &lt;i&gt;volatile&lt;/i&gt; qualifier and if operations which read or write the &lt;i&gt;gEvent&lt;/i&gt; variable can be considered atomic. The latter assumption seems to be correct for most (all?) platforms.&lt;br /&gt;&lt;br /&gt;But what if we cannot treat operations which read or write the &lt;i&gt;gEvent&lt;/i&gt; variable as atomic? For example, it might be an instance of a more complex type, such as a class which contains more information than just whether the event has happened or not:&lt;br /&gt;&lt;pre&gt;struct EventInfo {&lt;br /&gt;    EventInfo(bool happened = false, const string&amp; source = &quot;&quot;)&lt;br /&gt;        : fHappened(happened), fSource(source)&lt;br /&gt;    {}&lt;br /&gt;    bool fHappened;&lt;br /&gt;    string fSource;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;volatile EventInfo gEventInfo;&lt;br /&gt;&lt;br /&gt;void waitLoop() {&lt;br /&gt;    while (!gEventInfo.fHappened) {&lt;br /&gt;        sleep(1);&lt;br /&gt;    }&lt;br /&gt;    const string&amp; eventSource = gEventInfo.fSource;&lt;br /&gt;    ...&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void wakeUp() {&lt;br /&gt;    gEventInfo = EventInfo(true, &quot;wakeUp&quot;);&lt;br /&gt;}&lt;/pre&gt;This code is still OK for a single-threaded program where &lt;i&gt;wakeUp()&lt;/i&gt; is a signal handler but is unsafe for a multi-threaded program where 
&lt;i&gt;wakeUp()&lt;/i&gt; runs in a separate thread, because operations on &lt;i&gt;gEventInfo&lt;/i&gt; cannot be treated as atomic anymore.&lt;br /&gt;&lt;br /&gt;So how do we fix it? We should surround the places where the code reads or writes &lt;i&gt;gEventInfo&lt;/i&gt; with locks to make sure only one thread accesses &lt;i&gt;gEventInfo&lt;/i&gt; at a time. I&#39;ll use the &lt;a href=&quot;http://www.boost.org/doc/html/thread.html&quot;&gt;Boost thread&lt;/a&gt; library in the example.&lt;br /&gt;&lt;pre&gt;boost::mutex gMutex;&lt;br /&gt;&lt;br /&gt;void waitLoop() {&lt;br /&gt;    string eventSource;&lt;br /&gt;&lt;br /&gt;    for (bool eventHappened = false; !eventHappened; ) {&lt;br /&gt;        {&lt;br /&gt;            // copy the shared state while holding the lock&lt;br /&gt;            boost::mutex::scoped_lock lock(gMutex);&lt;br /&gt;            eventHappened = gEventInfo.fHappened;&lt;br /&gt;            eventSource = gEventInfo.fSource;&lt;br /&gt;        }&lt;br /&gt;        // sleep outside the lock so wakeUp() is not blocked meanwhile&lt;br /&gt;        sleep(1);&lt;br /&gt;    }&lt;br /&gt;    ...&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void wakeUp() {&lt;br /&gt;    boost::mutex::scoped_lock lock(gMutex);&lt;br /&gt;&lt;br /&gt;    gEventInfo = EventInfo(true, &quot;wakeUp&quot;);&lt;br /&gt;}&lt;/pre&gt;Comparing this code with the earlier examples, it looks like we still need to declare the &lt;span style=&quot;font-style: italic;&quot;&gt;gEventInfo&lt;/span&gt; variable as &lt;i&gt;volatile&lt;/i&gt;, but it turns out we don&#39;t really need to. 
A quote from &lt;a href=&quot;http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf&quot;&gt;Threads Cannot be Implemented as a Library [PDF]&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;In practice, C and C++ implementations that support Pthreads generally proceed as follows:&lt;ol&gt;&lt;li&gt;Functions such as &lt;span style=&quot;font-style: italic;&quot;&gt;pthread_mutex_lock()&lt;/span&gt; that are guaranteed by the standard to “synchronize memory” include hardware instructions (“memory barriers”) that prevent hardware reordering of memory operations around the call.&lt;/li&gt;&lt;li&gt;To prevent the compiler from moving memory operations around calls to functions such as &lt;span style=&quot;font-style: italic;&quot;&gt;pthread_mutex_lock()&lt;/span&gt;, they are essentially treated as calls to opaque functions, about which the compiler has no information. The compiler effectively assumes that &lt;span style=&quot;font-style: italic;&quot;&gt;pthread_mutex_lock()&lt;/span&gt; may read or write any global variable. Thus a memory reference cannot simply be moved across the call. This approach also ensures that transitive calls, e.g. a call to a function &lt;span style=&quot;font-style: italic;&quot;&gt;f()&lt;/span&gt; which then calls &lt;span style=&quot;font-style: italic;&quot;&gt;pthread_mutex_lock()&lt;/span&gt;, are handled in the same way more or less appropriately, i.e. memory operations are not moved across the call to &lt;span style=&quot;font-style: italic;&quot;&gt;f()&lt;/span&gt; either, whether or not the entire user program is being analyzed at once.&lt;/li&gt;&lt;/ol&gt;&lt;/blockquote&gt;So, at least if you are using POSIX threads (boost::threads under Linux uses them), your code is probably safe without &lt;i&gt;volatile&lt;/i&gt; as long as you use locks around global variables shared between several threads. 
Whether this example code is portable to other platforms is a good question; after all, boost::threads supports threading libraries other than POSIX, which may have other rules for mutexes and locks. I haven&#39;t researched this yet, as for now I don&#39;t really care about other platforms.&lt;br /&gt;&lt;br /&gt;Some interesting links on this topic:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/faq.html&quot;&gt;A Memory model for C++: FAQ&lt;/a&gt; - briefly mentions the reasons why the &lt;i&gt;volatile&lt;/i&gt; keyword is insufficient to ensure synchronization between threads and has links to papers for further reading.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;http://www.artima.com/cppsource/threads_meeting.html&quot;&gt;http://www.artima.com/cppsource/threads_meeting.html&lt;/a&gt; - Not much to read there but I love this quote: &lt;span style=&quot;font-style: italic;&quot;&gt;&quot;Not all the dragons were so easily defeated, unfortunately. Among the issues guaranteed to waste at least 20 minutes of group time with little or nothing to show ... What does volatile mean?&quot;&lt;/span&gt; (this is in the context of multi-threaded programs). If C++ experts cannot agree on this ...&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;http://groups.google.com/group/comp.programming.threads/browse_frm/thread/399797d84a5c37d5/bd2cb64e70c9d155?&amp;amp;hl=en&quot;&gt;Another person gets confused&lt;/a&gt; over the use of &lt;i&gt;volatile&lt;/i&gt; and threads. 
Interesting discussion on comp.programming.threads.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;</description><link>http://www.martynov.org/2007/07/volatile-and-threading.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRM2ZbQLfEKAI-N4vSUWP0qFXWAvRbf7VKEUngTjg5CMvDF39mqLQqPwvtjF12C9musOh3RXNrpyNMojs8NLZG5KLqV4vVNRvW73suMCMS1GW_q2EC_38qG1dEoh_g9m4RU_yuCjsG7Q/s72-c/IMG_7393.jpg" height="72" width="72"/><thr:total>10</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-1505494991516056298</guid><pubDate>Wed, 25 Jul 2007 16:35:00 +0000</pubDate><atom:updated>2007-08-03T00:49:34.584+04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">boost</category><category domain="http://www.blogger.com/atom/ns#">c++</category><category domain="http://www.blogger.com/atom/ns#">documentation</category><category domain="http://www.blogger.com/atom/ns#">programming</category><category domain="http://www.blogger.com/atom/ns#">threads</category><title>boost::thread and boost::mutex tutorial</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmRTmnTdZlX7b8v3cEdLx3VSUmbXX_LAW5fZd6HW-3KDQWmE5PZO5IBTDiW4F6M3tiQPRWiG3atXqVA74RyPLz4eLHCuTwzla2ewtQhE7pPqc3-dyZtTyn9uwgEv3_SuPQG07Ynmpq7w/s1600-h/IMG_6641.jpg&quot;&gt;&lt;img style=&quot;margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmRTmnTdZlX7b8v3cEdLx3VSUmbXX_LAW5fZd6HW-3KDQWmE5PZO5IBTDiW4F6M3tiQPRWiG3atXqVA74RyPLz4eLHCuTwzla2ewtQhE7pPqc3-dyZtTyn9uwgEv3_SuPQG07Ynmpq7w/s200/IMG_6641.jpg&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5093440091959291554&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;For most &lt;a 
href=&quot;http://www.boost.org/&quot;&gt;Boost&lt;/a&gt; libraries, the documentation requires you to read everything from start to end before you can write any code. Compare that with most &lt;a href=&quot;http://search.cpan.org/&quot;&gt;CPAN&lt;/a&gt; modules, where you can usually start using a module after quickly scanning the &lt;span style=&quot;font-style: italic;&quot;&gt;synopsis&lt;/span&gt; and maybe the &lt;span style=&quot;font-style: italic;&quot;&gt;description&lt;/span&gt; parts of its POD documentation. POD documentation as a rule has good examples right at the top of the page. Boost&#39;s documentation usually doesn&#39;t.&lt;br /&gt;&lt;br /&gt;So I was looking for basic usage examples for the boost::thread and boost::mutex classes, and initially I couldn&#39;t find any because I was using the wrong search keywords. In the end I figured out how to use the boost::thread and boost::mutex classes in my application the hard way, by reading the &lt;a href=&quot;http://www.boost.org/doc/html/thread.html&quot;&gt;Boost documentation&lt;/a&gt; without relying on any examples. But afterwards I did find a very good article on this topic with many simple examples: &lt;a href=&quot;http://www.drdobbs.com/dept/cpp/184401518&quot;&gt;The Boost.Threads Library&lt;/a&gt; on Dr. Dobb&#39;s. So I&#39;m posting this link here for Google. It is in the top 10 hits for some relevant keywords but not for others (for example for &lt;a href=&quot;http://www.google.com/search?q=boost+thread+mutex+tutorial&quot;&gt;boost thread mutex tutorial&lt;/a&gt;), and this is why I missed it initially. 
If my blog post helps any Boost.Threads newbie to get started then I would consider the time spent writing this post well spent.</description><link>http://www.martynov.org/2007/07/boostthreadboostmutex-tutorial.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmRTmnTdZlX7b8v3cEdLx3VSUmbXX_LAW5fZd6HW-3KDQWmE5PZO5IBTDiW4F6M3tiQPRWiG3atXqVA74RyPLz4eLHCuTwzla2ewtQhE7pPqc3-dyZtTyn9uwgEv3_SuPQG07Ynmpq7w/s72-c/IMG_6641.jpg" height="72" width="72"/><thr:total>57</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-7764126179108633515.post-3705498351940460209</guid><pubDate>Wed, 18 Jul 2007 11:46:00 +0000</pubDate><atom:updated>2007-09-26T01:00:17.921+04:00</atom:updated><title>Starting new blog</title><description>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1lzNZGsN5AQ2S8stUuRkIkQWbSlVxUETXK8PpA-GD3dgm0hft3G-VnrXXoscIluznVoz_e7WWlmsxZuLQNMMYDlhme6ZRZ-yi170l8nqsh1co3R5i0_M41KqQ13n4nvhI2kVIPh72Yg/s1600-h/IMG_6195.jpg&quot;&gt;&lt;img style=&quot;margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1lzNZGsN5AQ2S8stUuRkIkQWbSlVxUETXK8PpA-GD3dgm0hft3G-VnrXXoscIluznVoz_e7WWlmsxZuLQNMMYDlhme6ZRZ-yi170l8nqsh1co3R5i0_M41KqQ13n4nvhI2kVIPh72Yg/s200/IMG_6195.jpg&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5088510382512968418&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;I used to have a blog at &lt;a href=&quot;http://use.perl.org/%7EIlyaM/journal/&quot;&gt;use.perl.org&lt;/a&gt; but I was just too lazy to write often. One problem, it seems, is that you need a certain discipline to keep doing that. Also, I just didn&#39;t like that site&#39;s blogging engine. 
It just looked too simple and offered little control.&lt;br /&gt;&lt;br /&gt;At a certain point I tried to switch over to a blog on &lt;a href=&quot;http://martynov.org/&quot;&gt;my personal site&lt;/a&gt;, but instead of actually blogging I got carried away designing a &quot;perfect&quot; system for my blog. I spent hours evaluating different software for my blog, and I had very exotic requirements like being able to use SCM software to store my posts. That implied I needed blog software which uses raw files to store posts. I ended up hacking together something monstrous that was a combination of &lt;a href=&quot;http://blosxom.sourceforge.net/&quot;&gt;Blosxom&lt;/a&gt;, &lt;a href=&quot;http://darcs.net/&quot;&gt;darcs&lt;/a&gt; and make. And it wasn&#39;t convenient to use at all either. In the end I probably spent much more time setting all this up than actually blogging.&lt;br /&gt;&lt;br /&gt;So now I want to start from scratch: pick some blogging engine which doesn&#39;t get in the way and discipline myself to actually write periodically. From my experience learning new programming languages, you learn much faster when you have an actual project you are trying to implement in the new language. In a similar vein, I&#39;d expect it to be much easier to find new topics for my blog each day if I have a certain new fun project on my mind. And this new project is going to be teaching myself OCaml. 
Let&#39;s see how it goes.</description><link>http://www.martynov.org/2007/07/starting-new-blog.html</link><author>noreply@blogger.com (Ilya Martynov)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1lzNZGsN5AQ2S8stUuRkIkQWbSlVxUETXK8PpA-GD3dgm0hft3G-VnrXXoscIluznVoz_e7WWlmsxZuLQNMMYDlhme6ZRZ-yi170l8nqsh1co3R5i0_M41KqQ13n4nvhI2kVIPh72Yg/s72-c/IMG_6195.jpg" height="72" width="72"/><thr:total>0</thr:total></item></channel></rss>