Jeremy Thomerson

Debugging MySQL Slow Queries With Many Joins

Jeremy Thomerson — Fri, 28 Jun 2013 00:42:14 +0000

This week I encountered an issue that I hadn’t seen in a while. The ORM in a CMS project that I work on automatically joins to many subclass tables, so a normal query to load an object ends up joining a dozen tables or so. When you then join to another object of the same type, you can easily be joining twenty or more tables.

We noticed a fairly simple query in the slow query log. Its where clause had straightforward criteria that were well indexed with good cardinality, and all of the joins were done on primary/foreign key combinations that were properly indexed. But this simple query was consistently taking over 1.5 seconds to run.

So, of course, I ran the query through "DESCRIBE <query>" in MySQL and saw that if I changed the order in which the tables were joined, MySQL altered the query plan slightly. Providing optimizer (index) hints also produced a more efficient query plan, again by changing the order in which the tables were joined. I even considered whether this might be one of those strange situations where a straight join is needed. But I didn’t want to rely on any of these hacks as the way to “fix” this problem, for two reasons:

- They are mostly non-deterministic: re-ordering the table joins isn’t guaranteed to keep producing similar results in production, or even over time as the dataset changes and the optimizer adapts.
- We don’t really have control over the order of the joins, since these queries are built by an ORM.

Then it dawned on me: even when I was running "DESCRIBE <query>" in MySQL, it was taking over 1.5 seconds, but when I ran "DESCRIBE " it was only taking 0.04 seconds. This meant that it wasn’t actually running the query that took so long; it was the time MySQL took to figure out the query plan! The fix? The MySQL optimizer_search_depth parameter.

Essentially, this setting tells MySQL how hard it should try to find the absolute most efficient query plan for a given query. The more joins you have, the more possible plans there are, and the longer MySQL spends sifting through them. And, to top it all off, the default is a horrid 62! By changing this value to zero, you tell MySQL “you decide how hard to try”, which seems to mean min($numberOfTables, 7) according to one source. For us, changing this value to zero resulted in the query running in 0.04 seconds or less. Essentially, MySQL will try a few query plans (instead of a ton) and then pick the best of those. Since all your queries should be using indexes anyway (your queries are all well indexed, right?), even if it’s not the absolute best query plan, it’s still much better than actually searching through every possible query plan!
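To get a feel for why planning time blows up with the number of joins, here is a rough sketch. It assumes the worst case of an exhaustive search that considers every possible ordering of the joined tables (n! orderings for n tables). It is not MySQL's actual algorithm, which prunes heavily, and the class and method names are made up for illustration.

```java
import java.math.BigInteger;

public class JoinOrderCount {

    // Number of distinct orderings of the given number of tables (n!).
    // Illustrative upper bound only; a real optimizer prunes most of these.
    static BigInteger joinOrders(int tables) {
        BigInteger count = BigInteger.ONE;
        for (int i = 2; i <= tables; i++) {
            count = count.multiply(BigInteger.valueOf(i));
        }
        return count;
    }

    public static void main(String[] args) {
        // Compare a small join, the "7 table" cap mentioned above, and a 20-table ORM query.
        for (int tables : new int[] {5, 7, 12, 20}) {
            System.out.println(tables + " tables -> " + joinOrders(tables) + " possible orderings");
        }
    }
}
```

Seven tables give 5,040 orderings, while twenty give roughly 2.4 quintillion, which is why capping the search depth makes such a difference for queries with many joins.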

For more info, see Peter’s great blog on the MySQL Performance Blog: http://www.mysqlperformanceblog.com/2012/04/20/joining-many-tables-in-mysql-optimizer_search_depth/

Always say “Thank You”

Jeremy Thomerson — Wed, 12 Jun 2013 02:03:36 +0000

I’m a big fan of O’Reilly tech books, which is partly why I was just perusing the site for Fluent Conf, which ended a little over a week ago (even while the rest of the nerdom is enthralled with WWDC). I came across this great eight minute video of a presentation by Nicholas Zakas. It’s worth your time to watch.

A ‘Thank You’ Can Change Your Life (click to watch video after going to the overview page)

URLEncoder Function for MySQL

Jeremy Thomerson — Thu, 30 May 2013 16:55:18 +0000

Today I needed to convert some URL segments that are stored in a MySQL database from raw UTF-8 strings (many of which contain multi-byte characters in several hundred languages) into URL-encoded strings. This is for some work I’m doing using the latest version of SilverStripe (see Making SiteTree Fields Longer in SilverStripe 3.X for more info). Since I could not find any good examples of MySQL functions already built to do this, I ended up writing my own. Code and explanations posted below the fold…

Note first of all that I did find this example on DZone. However, it was lacking in a few ways:

- The main thing: it does not handle multi-byte characters.
- It encodes characters that it doesn’t need to (i.e. hyphen, period, underscore, and tilde).
- It doesn’t encode spaces correctly (according to RFC 3986 they should be %20 and not the plus symbol like the old-school days).

Because of this, I needed to write my own. I tried starting with the one from DZone as a base, but ended up basically rewriting the entire thing. This should (hopefully) be compliant with RFC 3986, which defines percent-encoding of URL segments. You can also see the Wikipedia Percent Encoding article for a slightly easier-to-read definition.

Here are some interesting things to take note of regarding this implementation:

- I tested it against 127,465 URL segments I have in my database in several hundred languages to see if it returned the exact same results as PHP’s rawurlencode. It did.
- It’s probably worth mentioning that this obviously should not be used on entire URLs since it will encode reserved characters like slashes, etc. It can only be used on, for instance, individual segments (between slash separators) or individual query string values (not the entire query string or key/value pairs).
- There’s one obvious flaw that’s worth mentioning so I don’t get flamed for it: if you really do take in a string that is 4096 characters long and it contains even a single character that has to be encoded, your return value will end up botched. I suppose it will end up with a MySQL truncation warning, but I haven’t tested this. Since I’m encoding individual segments of a URL, none of them should ever be anywhere near that long. If your URLs are that long you are definitely going to have other issues unrelated to this, so I don’t feel this is a major flaw.

And now to the code itself:

DELIMITER ;

DROP FUNCTION IF EXISTS urlencode;

DELIMITER |

CREATE FUNCTION URLENCODE(str VARCHAR(4096) CHARSET utf8) RETURNS VARCHAR(4096) CHARSET utf8
DETERMINISTIC
CONTAINS SQL
BEGIN
   -- the individual character we are converting in our loop
   -- NOTE: must be VARCHAR even though it won't vary in length
   -- CHAR(1), when used with SUBSTRING, made spaces '' instead of ' '
   DECLARE sub VARCHAR(1) CHARSET utf8;
   -- the ordinal value of the character (i.e. ñ becomes 50097)
   DECLARE val BIGINT DEFAULT 0;
   -- the substring index we use in our loop (one-based)
   DECLARE ind INT DEFAULT 1;
   -- the integer value of the individual octet of a character being encoded
   -- (which is potentially multi-byte and must be encoded one byte at a time)
   DECLARE oct INT DEFAULT 0;
   -- the encoded return string that we build up during execution
   DECLARE ret VARCHAR(4096) DEFAULT '';
   -- our loop index for looping through each octet while encoding
   DECLARE octind INT DEFAULT 0;

   IF ISNULL(str) THEN
      RETURN NULL;
   ELSE
      SET ret = '';
      -- loop through the input string one character at a time - regardless
      -- of how many bytes a character consists of
      WHILE ind <= CHAR_LENGTH(str) DO
         SET sub = MID(str, ind, 1);
         SET val = ORD(sub);
         -- these values are ones that should not be converted
         -- see http://tools.ietf.org/html/rfc3986
         IF NOT (val BETWEEN 48 AND 57 OR     -- 48-57  = 0-9
                 val BETWEEN 65 AND 90 OR     -- 65-90  = A-Z
                 val BETWEEN 97 AND 122 OR    -- 97-122 = a-z
                 -- 45 = hyphen, 46 = period, 95 = underscore, 126 = tilde
                 val IN (45, 46, 95, 126)) THEN
            -- This is not an "unreserved" char and must be encoded:
            -- loop through each octet of the potentially multi-octet character
            -- and convert each into its hexadecimal value
            -- we start with the high octet because that is the order that ORD
            -- returns them in - they need to be encoded with the most significant
            -- byte first
            SET octind = OCTET_LENGTH(sub);
            WHILE octind > 0 DO
               -- get the actual value of this octet by shifting it to the right
               -- so that it is at the lowest byte position - in other words, make
               -- the octet/byte we are working on the entire number (or in even
               -- other words, oct will now be between zero and 255 inclusive)
               SET oct = (val >> (8 * (octind - 1)));
               -- we append this to our return string with a percent sign, and then
               -- a left-zero-padded (to two characters) string of the hexadecimal
               -- value of this octet
               SET ret = CONCAT(ret, '%', LPAD(HEX(oct), 2, '0'));
               -- now we need to reset val to essentially zero out the octet that we
               -- just encoded so that our number decreases and we are only left with
               -- the lower octets as part of our integer
               SET val = (val & (POWER(256, (octind - 1)) - 1));
               SET octind = (octind - 1);
            END WHILE;
         ELSE
            -- this character was not one that needed to be encoded and can simply be
            -- added to our return string as-is
            SET ret = CONCAT(ret, sub);
         END IF;
         SET ind = (ind + 1);
      END WHILE;
   END IF;
   RETURN ret;
END;
 
|

DELIMITER ;

The heart of this is the MySQL ORD function, which converts the multi-byte character into an integer representation. Then I just do some bitwise operations to get each respective byte as an integer, convert that to hexadecimal, and then concatenate the string. Note that while UTF-8 can have four-byte characters (or, for the nit-picky: pre-2003, it could have six-byte characters), in this case we will only ever encounter characters of up to three bytes since we are not using MySQL’s utf8mb4 encoding.
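If you want to cross-check the MySQL function from application code, the same percent-encoding rules can be sketched in a few lines of Java. This is only a reference sketch, not part of the original function: the class and method names are made up, and unlike the MySQL version it walks the UTF-8 bytes directly (so it also copes with four-byte characters).

```java
import java.nio.charset.StandardCharsets;

public class UrlSegmentEncoder {

    // RFC 3986 "unreserved" characters: letters, digits, hyphen, period, underscore, tilde.
    static boolean isUnreserved(int b) {
        return (b >= '0' && b <= '9')
                || (b >= 'A' && b <= 'Z')
                || (b >= 'a' && b <= 'z')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Percent-encodes a single URL segment (not an entire URL).
    static String encode(String segment) {
        if (segment == null) {
            return null;
        }
        StringBuilder ret = new StringBuilder();
        for (byte raw : segment.getBytes(StandardCharsets.UTF_8)) {
            int oct = raw & 0xFF; // treat the byte as unsigned, 0-255
            if (isUnreserved(oct)) {
                ret.append((char) oct);
            } else {
                // every other byte becomes %HH, two upper-case hex digits
                ret.append('%').append(String.format("%02X", oct));
            }
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("ma\u00F1ana caf\u00E9"));
    }
}
```

Because every byte of a multi-byte UTF-8 character is 0x80 or higher, none of them can be mistaken for an unreserved ASCII character, so checking byte by byte gives the same result as checking character by character.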

For you math or computer science majors out there – I’m sure you’ll know more efficient or elegant ways of doing the actual encoding – especially the bitwise operators for handling each byte of a multi-byte character. Anyway, if anyone is able to use this and it helps you, or if you notice an improvement, please leave a comment or email me. I’d be happy to get your feedback.

Making SiteTree Fields Longer in SilverStripe 3.X

Jeremy Thomerson — Thu, 30 May 2013 16:18:04 +0000

Aside: In my ongoing internal battle to try to write regularly on my blog, I have decided to try blogging about little things that I do in my day-to-day programming (and other tech endeavors) that may be helpful to others – especially things that seem like they are searched for a lot, but lack adequate answers. Here’s my first attempt at this new style of blogging:

I have been using SilverStripe 2.4.x for about eighteen months now on a couple of projects. As part of what the client wanted, I needed to make the SiteTree MenuTitle field longer. If I remember correctly I first tried just redefining this field in the $db array of my Page class since that extends SiteTree, and that didn’t work. So, I resorted to a bit of a hack in the _config.php file to override the static variable value:

SiteTree::$db['MenuTitle'] = 'VARCHAR(255)';

In SilverStripe 3.0 or 3.1 (I can’t remember off the top of my head), all of those static class variables that SilverStripe uses to define configuration of a class (like $db and others) have become private. So, the hack I previously had didn’t work. However, SilverStripe now has a very nice configuration system that is used all over the place. Of note for this post is that all those static variables are basically read into the config system and then the values in configuration are what’s used instead of accessing those static variables directly. This means that in our _config.php we can now do the following hack/workaround to make a field longer:

$fields = Config::inst()->get('SiteTree', 'db', Config::UNINHERITED);
$fields['MenuTitle'] = 'VARCHAR(255)';
Config::inst()->update('SiteTree', 'db', $fields);

This is also notable because in the 3.x line of SilverStripe they have started storing URL segments (the SiteTree.URLSegment field) in URL-encoded form. In 2.4.x they tried stripping non-ASCII characters out of the URLs, so it didn’t really support multi-byte characters, etc., out of the box. I had overridden it to do so where I was using it since we needed to support multiple languages. But now since it is built into SilverStripe, we can get this out of the box if we simply URL-encode those UTF-8 strings that we have stored in the SiteTree.URLSegment column. However, that means the data could easily be truncated. For instance, SilverStripe defines the URLSegment column as VARCHAR(255). That’s 255 characters, not bytes. So, in a worst-case scenario where all the characters were three-byte characters, you would end up with 255 characters * 3 bytes each * 3 characters (because each byte will be %HH where HH is a hexadecimal representation of that byte), which means your column would now need to be VARCHAR(2295)! Although you are unlikely to really want a single URL segment that long, you may need to use the tip above to make your URLSegment field longer. It would simply look like this:

// in _config.php
$fields = Config::inst()->get('SiteTree', 'db', Config::UNINHERITED);
$fields['URLSegment'] = 'VARCHAR(1024)';
Config::inst()->update('SiteTree', 'db', $fields);
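The worst-case arithmetic above is easy to double-check, and the same formula tells you how many worst-case characters a given column length can hold. The following is just a back-of-the-envelope sketch; the class and method names are hypothetical.

```java
public class EncodedLength {

    // Each encoded byte takes three characters: '%' plus two hex digits.
    static final int CHARS_PER_ENCODED_BYTE = 3;

    // Longest possible encoded string for maxChars characters of up to bytesPerChar bytes each.
    static int worstCaseEncodedLength(int maxChars, int bytesPerChar) {
        return maxChars * bytesPerChar * CHARS_PER_ENCODED_BYTE;
    }

    // How many worst-case characters are guaranteed to fit in a column of the given length.
    static int worstCaseCharsThatFit(int columnLength, int bytesPerChar) {
        return columnLength / (bytesPerChar * CHARS_PER_ENCODED_BYTE);
    }

    public static void main(String[] args) {
        System.out.println(worstCaseEncodedLength(255, 3));
        System.out.println(worstCaseCharsThatFit(1024, 3));
    }
}
```

So a VARCHAR(1024) column is guaranteed to hold an encoded segment of 113 three-byte characters, and considerably more when the segment is mostly ASCII.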

There may be a much better way of handling this. You might also be able to override these values in a YAML file, but I haven’t tried it. If you find a nicer (less hack-ish) way of doing this, please comment below and let us know!

Blog Moved

Jeremy Thomerson — Fri, 21 Dec 2012 04:11:49 +0000

I finally moved my blog off of my old server to my new one. Most users should not see any change (unless you notice I chopped www and /blog/ off the URL). Hopefully now I’ll start writing some more posts!

Catching all feedback messages that aren’t rendered by other feedback panels

Jeremy Thomerson — Tue, 04 Jan 2011 16:09:51 +0000

There are many times in writing our Wicket applications that we want to render feedback messages close to the component that registered the message – especially in forms. But, we also typically want to display all the other messages in one “catch-all” feedback panel near the top of the page. Sometimes this can be difficult to do, especially if your form component feedback panels are added by borders, etc. Here is one implementation that will allow you to have a single “catch-all” feedback panel on any page.

Every feedback panel can take a feedback message filter, so let’s start by showing a couple of feedback panels added to individual form components:

// create the form above
form.add(new FeedbackPanel("feedback", new ComponentFeedbackMessageFilter(form)));

final TextField name = new TextField("name", new PropertyModel(productModel, "name"));
name.setRequired(true);
form.add(new FeedbackPanel("nameFeedback", new ComponentFeedbackMessageFilter(name)));
form.add(name);

final TextField price = new TextField("price", new PropertyModel(productModel, "price"));
price.setRequired(true);
price.add(new MinimumValidator(0d));
form.add(new FeedbackPanel("priceFeedback", new ComponentFeedbackMessageFilter(price)));
form.add(price);

From this, you can see that we have a form, a name field, and a price field. Each has its own FeedbackPanel that shows the messages on just that component. So, the validators for the price field will register their messages on the price field, and the FeedbackPanel next to it will display those messages. If you call error(String) or info(String) on the form itself (for example, in onSubmit), those messages will show up in the form’s feedback panel. But now, how do we show all the other feedback messages that might otherwise be left unrendered? For instance, what if someone registers a feedback message directly on the session?

It’s not very difficult. We start by creating a generic feedback message filter that does the opposite of what a list of other filters does:

public class AllExceptFeedbackFilter implements IFeedbackMessageFilter {
	private static final long serialVersionUID = 1L;

	private IFeedbackMessageFilter[] filters = null;
	
	public AllExceptFeedbackFilter() {
		this(null);
	}

	public AllExceptFeedbackFilter(IFeedbackMessageFilter[] filters) {
		this.filters = filters;
	}

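	@Override
	public boolean accept(FeedbackMessage message) {
		IFeedbackMessageFilter[] localFilters = getFilters();
		if (localFilters == null) {
			// no other filters to exclude (e.g. the no-arg constructor was used)
			return true;
		}
		for (IFeedbackMessageFilter filter : localFilters) {
			// skip null entries rather than throwing a NullPointerException
			if (filter != null && filter.accept(message)) {
				return false;
			}
		}
		return true;
	}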
	
	protected IFeedbackMessageFilter[] getFilters() {
		return filters;
	}

}

You can see that if any of the filters returned by getFilters accepts a message, our feedback filter will not accept it. It only accepts ones that are not accepted by any other filter. Now, if we use this in a feedback panel that auto-discovers all other feedback panels, we can combine that to have a “catch-all” feedback panel. Here’s how:

final FeedbackPanel pageFeedback = new FeedbackPanel("pageFeedback");

pageFeedback.setFilter(new AllExceptFeedbackFilter() {
	private static final long serialVersionUID = 1L;

	@Override
	protected IFeedbackMessageFilter[] getFilters() {
		final List<IFeedbackMessageFilter> filters = new ArrayList<IFeedbackMessageFilter>();
		getPage().visitChildren(FeedbackPanel.class, new IVisitor<FeedbackPanel>() {
			@Override
			public Object component(FeedbackPanel component) {
				if (pageFeedback.equals(component)) {
					return CONTINUE_TRAVERSAL_BUT_DONT_GO_DEEPER;
				}
				filters.add(component.getFilter());
				return CONTINUE_TRAVERSAL;
			}
		});
		return filters.toArray(new IFeedbackMessageFilter[filters.size()]);
	}
});

add(pageFeedback);

As you can see, in our getFilters method, we traverse the component hierarchy and gather all of the filters for all other feedback panels on the page except ourselves. This way, our AllExceptFeedbackFilter will accept any message that isn’t accepted by some other filter on the page. You could turn this into a concrete class (rather than an anonymous inner class) and use it in all of your applications. One thing worth noting is that you can only have a single one of these on the page or else it won’t work – only one of them would show the messages.
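If you want to convince yourself of the “all except” logic without spinning up a Wicket page, here is a framework-free sketch. It is only an analogy: plain java.util.function.Predicate objects stand in for IFeedbackMessageFilter, strings stand in for FeedbackMessage, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CatchAllDemo {

    // Builds a filter that accepts a message only if none of the other filters accepts it.
    static <T> Predicate<T> allExcept(List<Predicate<T>> others) {
        return message -> {
            for (Predicate<T> other : others) {
                if (other != null && other.test(message)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // stand-ins for the filters of the per-component feedback panels
        List<Predicate<String>> panelFilters = new ArrayList<>();
        panelFilters.add(m -> m.startsWith("name:"));
        panelFilters.add(m -> m.startsWith("price:"));

        Predicate<String> catchAll = allExcept(panelFilters);

        // already claimed by the price panel, so the catch-all rejects it
        System.out.println(catchAll.test("price: must be at least 0"));
        // claimed by nobody, so the catch-all picks it up
        System.out.println(catchAll.test("session: you have been logged out"));
    }
}
```

The catch-all never needs to know what the other filters look for; it only asks each of them whether they would have shown the message.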

Neat, huh?

More great reasons to go to ApacheCon US 2009

Jeremy Thomerson — Wed, 14 Oct 2009 03:09:49 +0000

ApacheCon US 2009 is fast approaching. And for all you Wicket lovers out there, or anybody interested in getting started with Wicket, you know that I will be presenting a one day training as well as an Introduction to Wicket session. So if you haven’t done so already, go sign up!

More info on the Wicket training: http://www.jeremythomerson.com/blog/2009/07/wicket-training-at-apachecon-us-2009/

CLICK HERE TO REGISTER

But what about more great reasons to register? I’ve put together a list of the sessions that you might want to attend if you are interested in Wicket. Of course, there are many great sessions, and you may (like me) have a hard time choosing. You may be interested in the entire track on Lucene / Hadoop and the family. Or in the business track. But here are some that you may particularly like:

Wednesday – 11:00am – Tomcat Community Overview
Wednesday – 1:30pm – An Introduction to Apache Velocity 1.6
Wednesday – 2:30pm – Introduction to Wicket
Wednesday – 4:00pm – Securing your Tomcat installation
Wednesday – 5:00pm – mod_jk / mod_proxy and others
Thursday – 9:00am – Content Driven Portals with Jetspeed and Jackrabbit

Thursday – 10:00am – It’s a toss-up between:
Thursday – 2:00pm – Scalable Internet Architectures
Thursday – 4:30pm – Recent Developments in SSL and Browsers

Friday – 9:00am – Welcome to the Future! (httpd)
Friday – 10:00am – Deciphering mod_ssl: Using SSL with the Apache HTTP Server
Friday – 11:15am – Selling Open Source E-commerce and ERP
Friday – 2:00pm – Building Intelligent Search Applications with the Lucene Ecosystem
Friday – 3:00pm – Realtime Search

For those who are more business-focused, here are a few alternative sessions that you may be interested in:

Wednesday – 4:00pm – Open Source Business for Hackers
Wednesday – 5:00pm – Apache License as a Business Model – Challenges and Opportunities
Thursday – 9:00am – Making Sense of Open Source Licenses

So, what are you waiting for?
CLICK HERE TO REGISTER

And don’t forget to register for your Wicket training!

Wicket Training at ApacheCon US 2009

Jeremy Thomerson — Tue, 07 Jul 2009 14:38:50 +0000

I am very happy to announce that there will be a one day training course at the 2009 ApacheCon conference in Oakland, CA (USA). Before I tell you more about it, consider the following “top ten” list:

Top Ten Reasons You Should Attend ApacheCon US 2009:
10: Hacking is encouraged at the Apache Hackathon two day event.
9: Free beer! http://wiki.apache.org/apachecon/ApacheConUs2009Program
8: Meet members of your favorite projects (i.e. me last year getting Martijn to sign my copy of Wicket in Action: see Martijn signing my book)
7: Free two day BarCamp
6: Free meetups three nights of the week
5: It’s always a good time to visit California: http://oaklandcvb.com/
4: Support the tenth anniversary of the Apache Software Foundation and the many other great projects that will be there.
3: Did I mention FREE BEER?
2: Two attendees in the class will receive FREE copies of Wicket in Action
1: WICKET TRAINING!

More details will be coming soon, but if you are looking to get your feet wet with Wicket, you should certainly start making plans now to attend the 2009 US ApacheCon, and the Wicket training class that will be held. Those who register early get discounts, too!

The class will consist of fast-moving explanations of core design principles, Wicket components, and “The Wicket Way”, and each section will be followed by a coding practice where you can put into use what you just learned. We will focus on laying a foundation – how to use Wicket, create pages, organize your application, and create a Wicket application.

We will cover the following:
- The fundamentals of Wicket
- Handling data / working with objects and models
- Standard components provided by framework
- Containers / Application / Session / Page
- Effective code reuse strategies

ApacheCon site: http://www.us.apachecon.com

November 2-6, 2009 in Oakland, CA. Classes will be held on Monday and Tuesday. Wednesday through Friday will be for the conference sessions. The Wicket class will be held on Tuesday.

Follow ApacheCon on Twitter: http://twitter.com/apachecon

I’m now a Wicket core developer

Jeremy Thomerson — Sat, 07 Mar 2009 04:51:58 +0000

My intention is not to blow my own horn, but I was so excited to have been asked to join the Wicket development team that I knew it was time to dust off the old blog and start trying to write some articles again. And no more than an hour after it was announced, I was asked about when 1.5 would be released (a well-meaning joke)! I have a lot of respect for all of the Wicket developers who have brought the great framework this far, and I hope that I will do well in assisting to carry the torch further.

Thank you to everyone who has contributed countless hours to making Wicket a great product! Now it’s time to roll up my sleeves and get to work!

Wicket Stuff Reorganization

Jeremy Thomerson — Mon, 01 Dec 2008 20:05:39 +0000

So last week and this weekend I was swamped spending pretty much every night trying to get the Wicket Stuff project organized. When I started, the trunk of Wicket Stuff had over 85 folders (subprojects) in it. It was a mess. Tons of these have been abandoned over time, with no work done since the 1.3 release of Wicket was cut and trunk changed to 1.4 development. The biggest problem I wanted to address (and the community overwhelmingly agreed) was that there was no standard release pattern for nearly any WicketStuff project.

What we decided

It was decided to create a “core” project for WicketStuff where other projects would reside under it (using Maven modules). We would get this core building and releasing snapshots in the wicketstuff.org maven repo so that if you were developing against Wicket 1.4-SNAPSHOT, you could also do the same for the WicketStuff projects. Then, whenever a numbered release of Wicket came out (i.e. 1.4-rc2 soon), we would cut a release with the same number for WicketStuff. This should make it much easier to use the WicketStuff projects, many of which have never had any numbered releases (you always had to compile your own to use).

What was accomplished

Here’s a quick summary of what was accomplished:

21 projects were moved into the core (including “examples” projects) (UPDATE: more are being migrated – this number has grown)
32 folders were removed from trunk into the attic
All that in over 73 commits!
This left us with around 30-something folders left in trunk – and hopefully most of those will move into the core project.

Here’s a status page that I’ll be updating as more progress is made:

http://wicketstuff.org/confluence/display/STUFFWIKI/WicketStuff+Reorg+-+Status+and+List+of+Changes