<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Torsten's STDOUT</title>
    <author>
        <name>Torsten Becker</name>
    </author>
    <rights>Copyright 2009-2015 Torsten Becker</rights>
    <link type="text/html" href="https://torsten.io" rel="alternate"/>
    <link type="application/atom+xml" href="https://torsten.io/stdout.atom" rel="self"/>
    <generator>Jekyll</generator>
    <id>https://torsten.io/stdout.atom</id>
    <updated>2015-06-16T12:46:38-04:00</updated>

	
	    <entry>
            <id>https://torsten.io/stdout/how-to-profile-clojure-code</id>
            <link type="text/html" href="https://torsten.io/stdout/how-to-profile-clojure-code" rel="alternate"/>
            <updated>2015-05-22T00:00:00-04:00</updated>
			<title>How to Profile Clojure Code</title>
            <content type="html">&lt;p&gt;I had some trouble to find a straight answer on how to profile a Clojure application to find CPU hotspots. After some searching and tinkering, I ended up with these steps that worked for me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1&amp;#x2e;&lt;/strong&gt; Download, install, and launch &lt;a href=&quot;http://visualvm.java.net/download.html&quot;&gt;VisualVM&lt;/a&gt;. VisualVM is definitely &lt;em&gt;good enough&lt;/em&gt; for basic profiling. If you want more detail, you might want to give &lt;a href=&quot;https://www.yourkit.com/&quot;&gt;YourKit&lt;/a&gt; a try. You&amp;#39;ll have to pay for a YourKit license, though.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2&amp;#x2e;&lt;/strong&gt; Add the following to your Leiningen &lt;code&gt;project.clj&lt;/code&gt; file, either in a profile or at the top level:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;clojure language-clojure&quot; data-lang=&quot;clojure&quot;&gt;&lt;span class=&quot;ss&quot;&gt;:jvm-opts&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;-Dcom.sun.management.jmxremote&amp;quot;&lt;/span&gt;
           &lt;span class=&quot;s&quot;&gt;&amp;quot;-Dcom.sun.management.jmxremote.ssl=false&amp;quot;&lt;/span&gt;
           &lt;span class=&quot;s&quot;&gt;&amp;quot;-Dcom.sun.management.jmxremote.authenticate=false&amp;quot;&lt;/span&gt;
           &lt;span class=&quot;s&quot;&gt;&amp;quot;-Dcom.sun.management.jmxremote.port=43210&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This enables Java Management Extensions (JMX) for the Clojure JVM process, which is what VisualVM uses to profile it. The port &lt;code&gt;43210&lt;/code&gt; is arbitrary; pick whatever you like.&lt;/p&gt;
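&lt;p&gt;Before attaching VisualVM, it can be worth checking that something is actually listening on the JMX port. A minimal Python sketch (the host and port mirror the settings above; &lt;code&gt;jmx_port_open&lt;/code&gt; is a hypothetical helper, and the check is just a plain TCP connect):&lt;/p&gt;

```python
import socket

def jmx_port_open(host="127.0.0.1", port=43210, timeout=2.0):
    """Return True if something accepts TCP connections on the JMX port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(jmx_port_open())
```

&lt;p&gt;If this prints &lt;code&gt;False&lt;/code&gt;, the JVM was probably started without the &lt;code&gt;:jvm-opts&lt;/code&gt; from above.&lt;/p&gt;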

&lt;p&gt;&lt;strong&gt;3a&amp;#x2e;&lt;/strong&gt; If the Clojure application is running on your local host, &lt;a href=&quot;#step6&quot;&gt;continue with step 6&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3b&amp;#x2e;&lt;/strong&gt; If the Clojure application is running on a remote host:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4&amp;#x2e;&lt;/strong&gt; Connect to the remote host and &lt;a href=&quot;http://linux.die.net/man/1/ssh&quot;&gt;open a SOCKS proxy&lt;/a&gt; with ssh:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;bash language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh remote_host -D 9191
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This allows VisualVM to connect to JMX on the remote host. Strangely, just forwarding a single port (with &lt;code&gt;-L&lt;/code&gt;) did not work for me. Apparently VisualVM needs more than one connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5&amp;#x2e;&lt;/strong&gt; Enable the SOCKS proxy in VisualVM&lt;/p&gt;

&lt;p&gt;On Mac OS X, go to &lt;em&gt;VisualVM&lt;/em&gt; → &lt;em&gt;Preferences...&lt;/em&gt; and select the &lt;em&gt;Network&lt;/em&gt; tab.&lt;/p&gt;

&lt;p&gt;Enable &lt;em&gt;Manual proxy settings&lt;/em&gt;, then set the SOCKS Proxy to &lt;code&gt;localhost&lt;/code&gt; with port &lt;code&gt;9191&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Change the &lt;em&gt;No Proxy hosts&lt;/em&gt; to &lt;code&gt;local, *.local, 169.254/16, *.169.254/16, localhost, *.localhost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It is important that &lt;code&gt;127.0.0.1&lt;/code&gt; is not part of that list.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/profiling-clojure/settings.png&quot; alt=&quot;Preferences Screen Shot&quot;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&quot;step6&quot;&gt;&lt;strong&gt;6&amp;#x2e;&lt;/strong&gt;&lt;/a&gt; Attach to the running Clojure process:&lt;/p&gt;

&lt;p&gt;Go to &lt;em&gt;File&lt;/em&gt; → &lt;em&gt;Add JMX Connection...&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;Enter &lt;code&gt;127.0.0.1:43210&lt;/code&gt; as the &lt;em&gt;Connection&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/profiling-clojure/connect.png&quot; alt=&quot;Add JMX Connection Screen Shot&quot;&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7&amp;#x2e;&lt;/strong&gt; Start profiling&lt;/p&gt;

&lt;p&gt;Select the &lt;em&gt;Sampler&lt;/em&gt; tab and click the &lt;em&gt;CPU&lt;/em&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/profiling-clojure/profile.png&quot; alt=&quot;Profiling Screen Shot&quot;&gt;&lt;/p&gt;

&lt;p&gt;Happy optimizing!&lt;/p&gt;
</content>
        </entry>
	
	    <entry>
            <id>https://torsten.io/stdout/expanding-json-arrays-to-rows</id>
            <link type="text/html" href="https://torsten.io/stdout/expanding-json-arrays-to-rows" rel="alternate"/>
            <updated>2013-12-12T00:00:00-05:00</updated>
			<title>Expanding JSON arrays to rows with SQL on RedShift</title>
            <content type="html">&lt;p&gt;Amazon&amp;#39;s RedShift is a really neat product that solves a lot of our problems at work.  However, its SQL dialect has some limitations when compared to Hive or PostgresSQL. I hit a limit when I needed table-generating functions but found a work-around.&lt;/p&gt;

&lt;p&gt;Some of the data we store in RedShift contains JSON arrays. However, when running analytical queries, there is no out-of-the-box way to join on &amp;quot;nested data&amp;quot; inside of arrays, so until now this data was very hard to use.&lt;/p&gt;

&lt;p&gt;In October, RedShift added new &lt;a href=&quot;http://docs.aws.amazon.com/redshift/latest/dg/json-functions.html&quot;&gt;functions to work with JSON&lt;/a&gt;&lt;a id=&quot;fn-1&quot; title=&quot;How I used to parse JSON in plain SQL before the addition of JSON functions would be another blog post.&quot;&gt;&lt;sup&gt;&amp;thinsp;1&amp;thinsp;&lt;/sup&gt;&lt;/a&gt;, but the set is still missing something like &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-BuiltinTableGeneratingFunctions%28UDTF%29&quot;&gt;Hive&amp;#39;s explode()&lt;/a&gt; or &lt;a href=&quot;http://www.postgresql.org/docs/9.2/static/functions-array.html#ARRAY-FUNCTIONS-TABLE&quot;&gt;Postgres&amp;#39; unnest()&lt;/a&gt; to expand an array from one column into one row per element.&lt;/p&gt;

&lt;p&gt;As a work-around, I came up with a simple hack: Joining the JSON array with a predefined sequence of integers and then extracting the element at each index into a new relation through that join.&lt;/p&gt;
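&lt;p&gt;Outside of SQL, the same join logic can be sketched in a few lines of Python; this is only an illustration of the idea (the table contents match the example below), not RedShift code:&lt;/p&gt;

```python
import json

# Dummy rows mirroring the clusters table: (id, node_sizes JSON string).
clusters = [(1, "[1, 2]"), (2, "[5, 1, 3]"), (3, "[2]")]
seq = range(101)  # stands in for the seq_0_to_100 relation

# Cross join every row with the integer sequence, keeping only indexes
# that fall inside the array, like the JSON_ARRAY_LENGTH guard in SQL.
exploded = [
    (cid, json.loads(arr)[i])
    for cid, arr in clusters
    for i in seq
    if len(json.loads(arr)) > i
]
print(exploded)  # [(1, 1), (1, 2), (2, 5), (2, 1), (2, 3), (3, 2)]
```

&lt;p&gt;Each array element becomes its own &lt;code&gt;(id, element)&lt;/code&gt; row, which is exactly the shape the SQL version below produces.&lt;/p&gt;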

&lt;!--

-- Related blog post to this Gist:
-- https://torsten.io/stdout/expanding-json-arrays-to-rows

-- Run these commands on a interactive RedShift session:

CREATE TEMP TABLE clusters AS (
    SELECT 1 AS id, '[1, 2]' AS node_sizes UNION ALL
    SELECT 2 AS id, '[5, 1, 3]' AS node_sizes UNION ALL
    SELECT 3 AS id, '[2]' AS node_sizes
);


-- This is just a temp table to leave no trace after the session
-- In practice I am using a `CREATE VIEW` and more rows.

CREATE TEMP TABLE seq_0_to_100 AS (
    SELECT 0 AS i UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 5
    -- I am stopping here, you could easily generate this as a VIEW with 100 real rows...
);


-- To see the intermediate relation:

SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(node_sizes, seq.i) AS size
FROM clusters, seq_0_to_100 AS seq
WHERE seq.i &lt; JSON_ARRAY_LENGTH(node_sizes)
ORDER BY 1, 2;


-- To fetch the maximum size:

WITH exploded_array AS (
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(node_sizes, seq.i) AS size
    FROM clusters, seq_0_to_100 AS seq
    WHERE seq.i &lt; JSON_ARRAY_LENGTH(node_sizes)
  )
SELECT max(size)
FROM exploded_array;

--&gt;

&lt;p&gt;If you want to follow along with the queries and play with the data, I &lt;a href=&quot;https://gist.github.com/torsten/7935120&quot;&gt;created a Gist&lt;/a&gt; with all the queries to create the dummy tables and fill them with test&amp;nbsp;data.&lt;/p&gt;

&lt;p&gt;In this example, I am assuming a table &lt;code&gt;clusters&lt;/code&gt; where each row represents a cluster of &amp;quot;things&amp;quot; and each cluster consists of many nodes, modeled as a JSON array. Each node has its size stored in this array – so you could ask: &lt;em&gt;&amp;quot;What is the maximum node size over all clusters?&amp;quot;&lt;/em&gt;&lt;/p&gt;

&lt;table&gt;&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;node_sizes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&amp;#39;[1, 2]&amp;#39;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&amp;#39;[5, 1, 3]&amp;#39;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&amp;#39;[2]&amp;#39;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;Assuming the above data in the table &lt;code&gt;clusters&lt;/code&gt;, you can use the following SQL query in RedShift to extract the maximum node size from all arrays:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exploded_array&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON_EXTRACT_ARRAY_ELEMENT_TEXT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_sizes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;size&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clusters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq_0_to_100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON_ARRAY_LENGTH&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_sizes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exploded_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The example query above uses a &lt;a href=&quot;http://www.postgresql.org/docs/9.1/static/queries-with.html&quot;&gt;Common Table Expression&lt;/a&gt; to create an intermediate relation &lt;code&gt;exploded_array&lt;/code&gt; which looks like this:&lt;/p&gt;

&lt;table&gt;&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;As you can see, each array was expanded into many rows, but the &lt;code&gt;id&lt;/code&gt; is still the same for each element.&lt;/p&gt;

&lt;p&gt;So where does the magical &lt;code&gt;seq_0_to_100&lt;/code&gt; in the above queries come from?&lt;/p&gt;

&lt;p&gt;Since RedShift currently lacks any kind of sequence-generating function, I had to emulate that as well. For this I created a view like the following:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VIEW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq_0_to_100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- You get the idea...&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;99&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
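&lt;p&gt;Writing the 101 &lt;code&gt;SELECT ... UNION ALL&lt;/code&gt; lines by hand is tedious, so one option is to generate the statement with a small script. A sketch (&lt;code&gt;seq_view_sql&lt;/code&gt; is a hypothetical helper; only the view name comes from the queries above):&lt;/p&gt;

```python
def seq_view_sql(name="seq_0_to_100", upto=100):
    """Generate a CREATE VIEW statement with one row per integer 0..upto."""
    selects = ["SELECT 0 AS i"]
    for n in range(1, upto + 1):
        selects.append("SELECT %d" % n)
    # Join the single-row SELECTs into one big UNION ALL chain.
    return "CREATE VIEW %s AS (\n    %s\n);" % (name, " UNION ALL\n    ".join(selects))

print(seq_view_sql())
```

&lt;p&gt;Run the printed statement once in your RedShift session and the view is ready for the joins above.&lt;/p&gt;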
&lt;p&gt;One constraint of the presented technique is that the maximum length of the longest array has to be known upfront to generate the sequence of integers. In practice this has not been an obstacle for me, since the size can simply be queried in most cases:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;JSON_ARRAY_LENGTH&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_sizes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clusters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Of course, this example is not all you can do. Once the data is in the shape of &lt;code&gt;exploded_array&lt;/code&gt;, you can work with the resulting relation in any other way and join it with everything else in your data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;Thanks to &lt;a href=&quot;https://twitter.com/twojing&quot;&gt;Jenny&lt;/a&gt; and &lt;a href=&quot;https://twitter.com/i0rek&quot;&gt;Hans&lt;/a&gt; for reading drafts of this post!&lt;/small&gt;&lt;/p&gt;
</content>
        </entry>
	
	    <entry>
            <id>https://torsten.io/stdout/patching-pgbouncer</id>
            <link type="text/html" href="https://torsten.io/stdout/patching-pgbouncer" rel="alternate"/>
            <updated>2013-09-17T00:00:00-04:00</updated>
			<title>Patching PgBouncer to Drop Slow Queries</title>
            <content type="html">&lt;p&gt;The other day my friend and colleague Hans was troubleshooting problems with one of our production PostgreSQL databases at &lt;a href=&quot;http://6wunderkinder.com/&quot;&gt;6Wunderkinder&lt;/a&gt;.  The master was under high load and slowed down the whole system.  So we needed to figure out why this happened and fix the problem.&lt;/p&gt;

&lt;p&gt;Our setup includes one master PostgreSQL database and two slaves for reading. These databases are connected to a &lt;a href=&quot;http://wiki.postgresql.org/wiki/PgBouncer&quot;&gt;PgBouncer&lt;/a&gt;, which pools connections and is accessed by our app servers.&lt;/p&gt;

&lt;p&gt;The problem was that something was running a bad query on the master:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tasks&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* complex condition */&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query is very problematic for the master because it does not usually handle these kinds of reads with very complex &lt;code&gt;WHERE&lt;/code&gt; clauses; it caused high I/O wait times which slowed down the master noticeably.&lt;/p&gt;

&lt;p&gt;After searching the whole codebase and logs, we could not find any plausible line where this query could come from. Eventually we had to stop and concede that ActiveRecord’s ORM methods (un)fortunately hide most raw SQL queries.&lt;/p&gt;

&lt;p&gt;We tried getting more information from PostgreSQL by running:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;procpid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current_query&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_activity&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current_query&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;SELECT COUNT(*) FROM tasks%&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above query can help identify the machine that is running a matching query. However, since we have PgBouncer in our setup, &lt;code&gt;client_addr&lt;/code&gt; always points to the PgBouncer, not to the original client.&lt;/p&gt;

&lt;p&gt;So what’s the solution? Of course: Patching PgBouncer to kill all bad queries. This would trigger exceptions in the Ruby code and would then make the bad code visible in the logs.  Causing a few 500s would also not be a problem because all our clients have offline support.&lt;/p&gt;

&lt;p&gt;So we put on our C-heads, fired up gdb, and dug through the PgBouncer source for a couple of minutes. Eventually we found a neat spot: &lt;code&gt;safe_send()&lt;/code&gt;. This function was called for every packet being forwarded from the client to the PostgreSQL server.&lt;/p&gt;

&lt;p&gt;While this is not a very clean place to patch (it sits in the generic network library), we rewrote all task-count queries into invalid ones just to test our hack:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;safe_send&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flags&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;memcmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT COUNT(*) FROM tasks&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;memcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT * FROM 1337;--&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
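&lt;p&gt;The magic &lt;code&gt;buf + 5&lt;/code&gt; in the snippet skips the header of a simple-query message in the PostgreSQL wire protocol: one byte for the message type (&lt;code&gt;Q&lt;/code&gt;) plus four bytes for the length, after which the NUL-terminated query text begins. A Python sketch of such a packet (illustration only, not PgBouncer code):&lt;/p&gt;

```python
import struct

def query_packet(sql):
    """Build a PostgreSQL simple-query ('Q') frontend message."""
    payload = sql.encode() + b"\x00"
    # The length field counts itself (4 bytes) plus the payload,
    # but not the 1-byte message type.
    return b"Q" + struct.pack("!I", 4 + len(payload)) + payload

pkt = query_packet("SELECT COUNT(*) FROM tasks WHERE id = 1;")
print(pkt[5:31])  # prints b'SELECT COUNT(*) FROM tasks'
```

&lt;p&gt;Those 26 bytes starting at offset 5 are exactly what the &lt;code&gt;memcmp&lt;/code&gt; in the patch compares against.&lt;/p&gt;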
&lt;p&gt;And woohoo, this actually worked smooth as butter on our development machine! Rolling it out to production was also low-risk, since PgBouncer has a very useful &lt;a href=&quot;http://pgbouncer.projects.pgfoundry.org/doc/usage.html&quot;&gt;-R&amp;nbsp;flag&lt;/a&gt; to perform an online restart.&lt;/p&gt;

&lt;p&gt;So after some cleanup we ended up with a small patch for &lt;a href=&quot;https://github.com/i0rek/pgbouncer-dev/commit/af2f47035eff6ded27fad61b6db0f0c5eeeb7ca5&quot;&gt;PgBouncer&lt;/a&gt; and &lt;a href=&quot;https://github.com/i0rek/libusual/commit/e911be84a1539dc2aa6212cf2e04554bd42e9391&quot;&gt;LibUsual&lt;/a&gt;. The snippet above is the heart of the patch, but we added more: logging of the killed queries and a config option that allows turning the feature on and off without recompiling or even restarting PgBouncer.&lt;/p&gt;

&lt;p&gt;If you ever end up with untraceable slow queries in PostgreSQL, apply this patch without any guarantees and at your own risk.  And then grep the logs for &lt;code&gt;1337&lt;/code&gt;!&lt;/p&gt;
</content>
        </entry>
	
</feed>
