<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Conscious Incompetence</title>
 <link href="http://systay.github.com/atom.xml" rel="self"/>
 <link href="http://systay.github.com/"/>
 <updated>2021-03-25T13:18:10+00:00</updated>
 <id>http://systay.github.com/</id>
 <author>
   <name>Andres Taylor</name>
   <email>andres@taylor.se</email>
 </author>
 
 
 <entry>
   <title>Code generation in Vitess</title>
   <link href="http://systay.github.com/blog/2021/03/23/code-generation-in-vitess"/>
   <updated>2021-03-23T00:00:00+00:00</updated>
   <id>http://systay.github.com/blog/2021/03/23/code-generation-in-vitess</id>
   <content type="html">&lt;h1 id=&quot;code-generation-in-vitess&quot;&gt;Code generation in Vitess&lt;/h1&gt;

&lt;p&gt;Golang is a wonderful language. It’s simple, and most of the time not confusing or surprising.
This makes it easy to jump into library code and start reading and quickly understand what’s going on.
On the other hand, coming from other languages, there are a few features that would make our lives easier.&lt;/p&gt;

&lt;p&gt;We are building Vitess using mostly golang, and most of us are happy with this choice.
However, because of missing features in the language, we’ve had to build some tooling manually.&lt;/p&gt;

&lt;p&gt;Here follows a list of how we are using meta programming in Vitess.&lt;/p&gt;

&lt;h3 id=&quot;grpc-messages-and-endpoints&quot;&gt;GRPC messages and endpoints&lt;/h3&gt;

&lt;p&gt;Everyone uses code generation for protobuf, so that’s very interesting to write about. Moving on.&lt;/p&gt;

&lt;h3 id=&quot;sql-parser&quot;&gt;SQL Parser&lt;/h3&gt;

&lt;p&gt;We use goyacc to build our parser.
Goyacc reads input in the form of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sql.y&lt;/code&gt; file, and it outputs the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sql.go&lt;/code&gt; parser we use.
Writing this manually would not really have been an option. The code would probably become slow and very difficult to maintain.
To speed up the parser to ludicrous speed, we forked the goyacc code and adapted it to our needs. You can read more about this work &lt;a href=&quot;https://github.com/vitessio/vitess/pull/7669&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;memory-usage-for-plans&quot;&gt;Memory usage for plans&lt;/h3&gt;

&lt;p&gt;Query planning is a resource intensive task, and to make sure we don’t have to do that more than necessary, we cache plans.
Whenever you are caching, you need to be concerned about the size of your caches - you don’t want the cache to eat too much memory.
To do that, you need information about how much memory plan tree consume.
And this is one of the shortcoming of golang - it’s very difficult to do this.
So, again, meta generation came to the rescue.&lt;/p&gt;

&lt;p&gt;Go comes with excellent parser and tokeniser tools to allow you to read Go code as a stream of AST objects.
Unfortunately, we need more than syntax when looking at our plans to make sense of them - we also need dependencies and type information.
To get this, we use golang.org/x/tools/go/packages.&lt;/p&gt;

&lt;p&gt;What we do is to first find the plan struct, and from that, we find all types that are used by the fields of the plan.&lt;/p&gt;
&lt;div class=&quot;language-go highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Plan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Type&lt;/span&gt;         &lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StatementType&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;Original&lt;/span&gt;     &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Instructions&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Primitive&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;BindVarNeeds&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BindVarNeeds&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Warnings&lt;/span&gt;     &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;querypb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryWarning&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For every type that we encounter, we create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CachedSize&lt;/code&gt; method that can calculate the memory size of an instance.
If the type happens to be an interface, we instead find all implementations and do the same exercise again.
This is done until we have a method for all types that might show up in a plan-tree. You can look at what the these functions look like &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/cached_size.go&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;ast-tooling&quot;&gt;AST tooling&lt;/h3&gt;

&lt;p&gt;After parsing, the tree structure that contains the original query has been turned into an abstract syntax tree, or an AST.
It contains the information we need from the query, with the uninteresting bits such as whitespaces removed. 
It’s a type safe version of the query the user sent us.&lt;/p&gt;

&lt;p&gt;Our AST is pretty large, and many developers work on it, adding new types and fields all the time.&lt;/p&gt;

&lt;p&gt;In order to plan queries, we needed a couple of utilities for our AST that Go does not provide for us.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;First of, we needed to be able to traverse the AST - a plain old visitor that could traverse the whole tree quickly. &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast_visit.go&quot;&gt;Check out the code&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We also need the ability to replace parts of the node. This is much like the visitor, but with a way to replace the node that is currently being visited. &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast_rewrite.go&quot;&gt;Check out the code&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We need a way of doing equality comparisons without using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reflect.DeepEqual&lt;/code&gt;.
Struct comparisons in Go work as expected, until you have references in your structs. Then all bets are off.
DeepEqual does what we need, but it does so slowly. Comparing our generated equality methods vs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reflect.DeepEqual&lt;/code&gt; gives:&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;name       old time/op  new time/op  delta
Equals-16   813µs ± 0%    11µs ± 1%  -98.64%  (p=0.000 n=9+9)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reflect.DeepEqual&lt;/code&gt; is &amp;gt;72 times slower than our generated comparator. &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast_equals.go&quot;&gt;Check out the code&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;We also need to be able to do a deep-clone of the AST. While exploring different alternative plans, we clone parts of the AST, so we can change it without changing the original. &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast_clone.go&quot;&gt;Check out the code&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Finally, our AST knows how to print itself. 
This is a little trickier than you might think, because we do precedence calculations for expressions to figure out where we need parenthesis. 
To make this easy to write, we use something that looks a lot like printf - this allows us developers to write nicely readable code.
Unfortunately, this is not very fast, since it basically means that we have to parse strings to be able to produce strings.
Again, using the go/packages library, we can read the astPrintf lines, and then output a faster form of the same.
The method that a developer would write would look something like:&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-golang highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ComparisonExpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrackedBuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astPrintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%l %s %r&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Left&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Operator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ToString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Right&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Escape&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astPrintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; escape %v&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Escape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After code generation, the output becomes:&lt;/p&gt;
&lt;div class=&quot;language-golang highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ComparisonExpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;formatFast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrackedBuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;printExpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Left&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteByte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Operator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ToString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteByte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;printExpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Right&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Escape&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot; escape &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;printExpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Escape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;how-to-make-it-work&quot;&gt;How to make it work&lt;/h3&gt;

&lt;p&gt;One learning we have had, is that it’s a good idea to hide the generated code behind an easy-to-use API.
That is the method that the rest of the code base will interact with, not directly with the generated code.
This gives us the chance to drastically change what the generated code looks like, but not have to change anything else in the code base.
We use this pattern for the &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/parser.go&quot;&gt;parser&lt;/a&gt;, for the &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/rewriter_api.go&quot;&gt;rewriter&lt;/a&gt;, and for the &lt;a href=&quot;https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast_funcs.go#L37&quot;&gt;visitor&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;
&lt;p&gt;We use code generation for two main reasons. 
We are easily bored people, so writing lots of repetitive code would be no fun. 
This code would be difficult to write correctly, and annoying to review.
Using meta programming, we can avoid the repetitive code that is hard to get right and easy to mess up.&lt;/p&gt;

&lt;p&gt;The second reason is that it’s just easier to write fast code this way. 
We benchmark and profile the generated code pretty hard, and make sure to squeeze as much juice as possible.
Then we change the generator, and wham! 642 rewriter methods have been updated. 
This would not really have been possible if we had to change those methods manually.&lt;/p&gt;

&lt;p&gt;Honorable mention:
Most of this code is either written by, or heavily influenced by the latest developer luminary to join the PlanetScale and Vitess ranks - &lt;a href=&quot;http://github.com/vmg&quot;&gt;@vmg&lt;/a&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Life of a Vitess Query - Query Graph</title>
   <link href="http://systay.github.com/blog/2021/02/08/life-of-a-query-3"/>
   <updated>2021-02-08T00:00:00+00:00</updated>
   <id>http://systay.github.com/blog/2021/02/08/life-of-a-query-3</id>
   <content type="html">&lt;h1 id=&quot;life-of-a-vitess-query---query-graph&quot;&gt;Life of a Vitess Query - Query Graph&lt;/h1&gt;

&lt;p&gt;(This is a post in a series. You should start here: &lt;a href=&quot;/blog/2021/02/03/life-of-a-query-1&quot;&gt;Parsing and Rewriting&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The process of plan building is all about evaluating in which order to execute joins find the cheapest join combination we can use.&lt;/p&gt;

&lt;p&gt;So, we need a better data structure than a tree to represent this, and the title of this post has probably spoiled which one that is.&lt;/p&gt;

&lt;p&gt;A graph is perfect for this problem.&lt;br /&gt;
Tables are the nodes, and the join predicates are the relationships between them.&lt;/p&gt;

&lt;p&gt;As a struct, what I ended up using looks something like this:&lt;/p&gt;

&lt;div class=&quot;language-go highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryGraph&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;     &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryTable&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;predicates&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;semantics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TableSet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Expr&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// queryTable is a single FROM table, including all predicates particular to this table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryTable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tableID&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;semantics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TableSet&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;alias&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AliasedTableExpr&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TableName&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;predicates&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Expr&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Sub queries complicate things, but I’m keeping it simple in these posts and not handling them.&lt;/p&gt;

&lt;p&gt;I think of the query graph as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; of the query. When doing join ordering, this is actually the only information we need - we don’t need anything else from the AST&lt;/p&gt;

&lt;p&gt;Let’s look at those structs a little more. 
A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;queryTable&lt;/code&gt; struct contains information about a table, and all predicates that only depend on this particular single table.&lt;/p&gt;

&lt;p&gt;To find join predicates between two tables, we just do a bit OR between the two TableSet, and use that as the key to search for join predicates between the two tables.&lt;/p&gt;

&lt;p&gt;The common solution would probably have been an adjacency matrix, but using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map[semantics.TableSet][]sqlparser.Expr&lt;/code&gt; makes it possible to deal with N-way joins, something that would mean extra logic using a matrix.&lt;/p&gt;

&lt;p&gt;Just to recap - we start with a query as a string, which is parsed into an AST, that is distilled into a query graph.&lt;/p&gt;

&lt;p&gt;This is what we need to do join ordering.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Life of a Vitess Query - Semantic Analysis</title>
   <link href="http://systay.github.com/blog/2021/02/04/life-of-a-query-2"/>
   <updated>2021-02-04T00:00:00+00:00</updated>
   <id>http://systay.github.com/blog/2021/02/04/life-of-a-query-2</id>
   <content type="html">&lt;h1 id=&quot;life-of-a-vitess-query---semantic-analysis&quot;&gt;Life of a Vitess Query - Semantic Analysis&lt;/h1&gt;

&lt;p&gt;(This is a post in a series. You should start here: &lt;a href=&quot;/blog/2021/02/03/life-of-a-query-1&quot;&gt;Parsing and Rewriting&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;By now we have a well formed tree structure representing the query the user sent. Next step is called semantic analysis.&lt;/p&gt;

&lt;p&gt;I’ll use the following query to illustrate how it’s done.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;otherTable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As a tree:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;   SELECT (S1)
    ├── Exprs
    │   ├── tbl.col
    │   └── SELECT (S2)
    │       ├── Exprs
    │       │   └── COUNT
    │       │       └── tbl.col
    │       └── FROM
    │           └── otherTable as tbl
    └── FROM
        └── tbl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The main problem we are trying to solve - make it easy for the planner to know which table is meant when it encounters &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tbl.col&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We start by figuring out which scopes exist, and which tables exist in each scope.&lt;/p&gt;

&lt;p&gt;Given this information, we can look at column expressions and resolve which tables are being referenced.&lt;/p&gt;

&lt;p&gt;Let’s go over what this would look like for this query.&lt;/p&gt;

&lt;p&gt;We start traversing the tree at the root, &lt;strong&gt;S1&lt;/strong&gt;. Since this is a SELECT struct, we push a new scope on to the scope stack, and visit the FROM clause next.&lt;/p&gt;

&lt;p&gt;The scope stack, after visiting &lt;strong&gt;S1&lt;/strong&gt; and its FROM:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(tbl)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next we visit the SELECT expressions of &lt;strong&gt;S1&lt;/strong&gt;. When we visit the first expression, we take note of the current scope, and move on.
Now we know which scope this expression lives in and what tables are available to it.&lt;/p&gt;

&lt;p&gt;Next, we’ll visit the subquery that contains &lt;strong&gt;S2&lt;/strong&gt;. When encountering &lt;strong&gt;S2&lt;/strong&gt;, we push another scope on the stack and add the tables &lt;strong&gt;S2&lt;/strong&gt; to this new scope.&lt;/p&gt;

&lt;p&gt;The scope stack, after visiting &lt;strong&gt;S2&lt;/strong&gt; and its FROM:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(otherTable as tbl)
(tbl)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When visiting the expressions of &lt;strong&gt;S2&lt;/strong&gt;, we note the current scope for the expressions.&lt;/p&gt;

&lt;p&gt;There is nothing left of &lt;strong&gt;S2&lt;/strong&gt; to visit, so we exit the sub query. When coming back up from &lt;strong&gt;S2&lt;/strong&gt;, we pop the top-most scope, and we are ready to do the binding process.&lt;/p&gt;

&lt;p&gt;Binding is visiting all the column expressions, and figuring out, given the scoping information, which table the column belongs to. If the user has not provided a qualifier, we need to look up column information for all available tables and check if we can uniquely identify which one the user means. This binding information is stored the semantic table - the output of this process.&lt;/p&gt;

&lt;p&gt;To do this as fast as possible, we do this in a single tree traversal, not in to separate steps. This way, we don’t have to remember scope information per expression - we just use the current scope stack, looking down the stack until we find the table.&lt;/p&gt;

&lt;p&gt;Also, instead of using strings to reference the dependencies, we use bitmasks, where each table is a bit in an uint64 value.
At the end of the semantic processing, we have the dependency information we need in for all expressions in the query.&lt;/p&gt;

&lt;p&gt;The devil is in the details, so let me spell it out:&lt;/p&gt;

&lt;div class=&quot;language-go highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TableSet&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint64&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tableName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alias&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SemTable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Tables&lt;/span&gt;           &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;exprDependencies&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqlparser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Expr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TableSet&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The way to get the TableSet for a table is to find the offset in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tables&lt;/code&gt; for the table we are looking for, and then left shift 1 that many steps.&lt;/p&gt;

&lt;div class=&quot;language-go highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;st&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SemTable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TableSetFor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TableSet&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;range&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;st&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tables&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This information is used to create the query graph, the next stop in the journey to a Vitess execution plan.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Life of a Vitess Query - Parsing and rewriting</title>
   <link href="http://systay.github.com/blog/2021/02/03/life-of-a-query-1"/>
   <updated>2021-02-03T00:00:00+00:00</updated>
   <id>http://systay.github.com/blog/2021/02/03/life-of-a-query-1</id>
   <content type="html">&lt;h1 id=&quot;life-of-a-vitess-query---parsing-and-rewriting&quot;&gt;Life of a Vitess Query - Parsing and rewriting&lt;/h1&gt;

&lt;p&gt;SQL is a declarative language, which means you tell it what you want done, and the database engine figures out how to do it for you.
When I started with databases, this seemed an almost magical process to me, and I have been fascinated by it for decades now.&lt;/p&gt;

&lt;p&gt;As my 20% project, I have been working on a new query planner for Vitess, and I wanted to share what the design looks like and what the important characters in this story are.
The project is being tracked &lt;a href=&quot;https://github.com/vitessio/vitess/issues/7280&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before I write more about the details, let me tell you about one driving force behind the design.
During plan building, the process of creating and evaluating differ access patterns is the most costly.
This is just because of the search space that has to be explored - the number of possible plans grows exponentially with the number of tables involved in the query.
So a lot of the steps and complications that come before are done so that this expensive evaluation can be done as efficiently as possible.&lt;/p&gt;

&lt;p&gt;Now to the first step during query execution&lt;/p&gt;

&lt;h3 id=&quot;step-1---parsing&quot;&gt;Step 1 - Parsing&lt;/h3&gt;

&lt;p&gt;When a query comes to the vtgate executor, it will first parse the query.
The input is your query string, and the output is a struct called the AST - abstract syntax tree. 
Like the name implies, this is a tree shaped struct that represents the interesting parts of query.&lt;/p&gt;

&lt;p&gt;Example query:&lt;/p&gt;
&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bar&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'dud'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This query becomes a tree that, simplified, looks something like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  SELECT
    ├── Exprs
    │   └── +
    │       ├── t.col
    │       └── s.col
    ├── FROM
    │   └── JOIN
    │       ├── t
    │       ├── s
    │       └── ON
    │           └── =
    │               ├── t.id
    │               └── s.t_id
    └── WHERE
        └── AND
            ├── =
            │   ├── t.foo
            │   └── 42
            └── =
                ├── s.bar
                └── &quot;dud&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We do this because working with strings is slow and clunky, compared to using type safe datastructures like this.
Instead of string matching, we can check the fields and types in structs, in order to understand what the user is asking for.
It also removes lots of unnecessary details. 
Parentheses, for example, are only needed because the query comes in as a one-dimensional string.
In the tree, the grouping and precedence of expressions is clearly visible in the structure of the tree, and so parenthesis do not exist in the AST.
When we need to turn the AST back into a string, we can figure out where we need to inject parens.&lt;/p&gt;

&lt;p&gt;I’m not going to go into details about our parsing or AST. 
It’s fast, used by lots of other projects, and built using yacc.&lt;/p&gt;

&lt;h3 id=&quot;step-2---ast-rewriting&quot;&gt;Step 2 - AST rewriting&lt;/h3&gt;

&lt;p&gt;Before passing on the AST, we rewrite it a little. 
We do this for three reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;We want queries to share the same plan when possible, so we remove literals.&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT 42&lt;/code&gt;&lt;/p&gt;

    &lt;p&gt;is turned into&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT :_vtlit1&lt;/code&gt;&lt;/p&gt;

    &lt;p&gt;If we later encounter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT 5&lt;/code&gt;, we can use the cached plan.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To minimize the work the planner needs to do, we normalize the AST, so the planner doesn’t have to understand two equivalent ways of expressing the same thing.&lt;/p&gt;

    &lt;p&gt;An example of this is that we make sure that columns are on the left-hand side of equality comparisons.&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;42 = t.col&lt;/code&gt;&lt;/p&gt;

    &lt;p&gt;is turned into&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t.col = 42&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;A very important part of what Vitess does is to create an illusion for the user.&lt;/p&gt;

    &lt;p&gt;It’s the illusion of a dedicated connection to a single database, when in reality, Vitess will do connection pooling for you, and spread out your query to sometimes hundreds of MySQL instances.
To create the illusion, we rewrite away variables, both system variables and user defined variables, and lots of function calls.
We also change queries against the information_schema to use the table and database names that are really used in MySQL.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end of these two steps, we are still working with the input query that the user sent, but it’s been massaged into a form that is easier for the planner to work with.&lt;/p&gt;

&lt;p&gt;Next up is &lt;a href=&quot;/blog/2021/02/04/life-of-a-query-2&quot;&gt;semantic analysis&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Cypher - A view from a recovering SQL DBA</title>
   <link href="http://systay.github.com/blog/2011/11/06/cypher-a-view-from-a-recovering-sql-dba"/>
   <updated>2011-11-06T00:00:00+00:00</updated>
   <id>http://systay.github.com/blog/2011/11/06/cypher---a-view-from-a-recovering-sql-dba</id>
   <content type="html">&lt;h1 id=&quot;cypher---a-view-from-a-recovering-sql-dba&quot;&gt;Cypher - A view from a recovering SQL DBA&lt;/h1&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;An SQL query walks into a bar and sees two tables.&lt;/em&gt;&lt;br /&gt;
&lt;em&gt;He walks up to them and asks ‘Can I join you?’&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;An SQL query walks into a NOSQL bar, and finds no tables.&lt;/em&gt;&lt;br /&gt;
&lt;em&gt;So he leaves.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;For many years, I worked with SQL databases. I got to know the relational model and various SQL implementations very well, but then the world changed with the advent of NOSQL, and I changed too when I became heavily involved with Neo4j.&lt;/p&gt;

&lt;p&gt;I understand that changing from the familiar SQL to the unfamiliar NOSQL query languages is hard: no schemas, JSON all over the place, and no joins? But I’ve made it through the learning curve and so can you. This guide is all about people like us - people who understand SQL. We can use that prior knowledge to quickly get going with Cypher and start exploring Neo4j.&lt;/p&gt;

&lt;h1 id=&quot;start&quot;&gt;START&lt;/h1&gt;

&lt;p&gt;SQL starts with the result you want - we SELECT what we want and then declare how to source it. In Cypher, the START clause is quite a different concept which specifies starting points in the graph from which the query will execute.&lt;/p&gt;

&lt;p&gt;From a SQL point of view, the identifiers in a START are like table names that point to a set of nodes or relationships. The set can be listed literally, come via parameters, or as I show in the following example, be defined by an index look-up.&lt;/p&gt;

&lt;p&gt;So in fact rather than being SELECT-like, the START clause is somewhere between the FROM and the WHERE clause in SQL.&lt;/p&gt;

&lt;h3 id=&quot;sql&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM  Person
WHERE firstName = &quot;Anakin&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START anakin=node:persons(firstName = &quot;Anakin&quot;)
RETURN anakin
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Cypher allows multiple start points. This should not be strange from a SQL perspective - every table in the FROM clause is another start point.&lt;/p&gt;

&lt;h1 id=&quot;match&quot;&gt;MATCH&lt;/h1&gt;

&lt;p&gt;Unlike SQL which operates on sets, Cypher predominantly works on subgraphs. The relational equivalent is the current set of tuples being evaluated during a SELECT query.&lt;/p&gt;

&lt;p&gt;The shape of the subgraph is specified in the MATCH clause. The MATCH clause is analogous to the JOIN in SQL. A normal a–&amp;gt;b relationship is an inner join between nodes a and b - both sides have to have at least one match, or nothing is returned.&lt;/p&gt;

&lt;p&gt;A simple example, where we find all nodes that are connected to node with id 101, through an incoming relationship.&lt;/p&gt;

&lt;h3 id=&quot;sql-1&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT bar.*
FROM foo 
JOIN bar ON foo.id = bar.foo_id
WHERE foo.id = 101
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-1&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START foo=node(101)
MATCH foo--&amp;gt;bar
RETURN bar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is no join table here, but if one is necessary writing the pattern relationship like so: -[foo_bar]-&amp;gt; will introduce (the equivalent of) a join table named foo_bar. In reality this is a named relationship in Cypher, so we’re saying “join foo to bar via foo_bar.”
To illustrate this, consider this image, comparing the SQL modell and Neo4j/Cypher.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://i.imgur.com/YkVFN.png&quot; alt=&quot;Join&quot; title=&quot;Join and patterns&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;sql-2&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT bar.*, foo_bar.*
FROM foo 
  JOIN foo_bar ON foo.id = foo_bar.foo_id 
  JOIN bar ON foo_bar.bar_id = bar.id
WHERE foo.id = 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-2&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START foo=node(1)
MATCH foo-[foo_bar]-&amp;gt;bar
RETURN bar, foo_bar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;An &lt;a href=&quot;http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html&quot;&gt;outer join&lt;/a&gt; is just as easy. Add a question mark -[?:KNOWS]-&amp;gt; and it’s an optional relationship between nodes - the outer join of Cypher.&lt;/p&gt;

&lt;p&gt;Whether it’s a left outer join, or a right outer join is defined by which side of the pattern has a starting point. This first example is a left outer join, because the bound node is on the left side:&lt;/p&gt;

&lt;h3 id=&quot;sql-3&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT bar.*
FROM foo 
LEFT JOIN bar ON foo.id = bar.foo_id
WHERE foo.id = 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-3&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START foo=node(1)
MATCH foo-[?]-&amp;gt;bar
RETURN bar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the right side is has the start point, it is a right outer join. And if both sides have starting points, it’s a full outer join, like this:&lt;/p&gt;

&lt;h3 id=&quot;sql-4&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT bar.*
FROM foo 
  FULL OUTER JOIN bar ON foo.id = bar.foo_id
WHERE foo.id = 1 and bar.id = 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-4&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START foo=node(1), bar=node(2)
MATCH foo-[r?]-&amp;gt;bar
RETURN r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Relationships in Neo4j are first class citizens - it’s like the SQL tables are pre-joined with each other. So, naturally, Cypher was designed to be able to handle highly connected data easily.&lt;/p&gt;

&lt;p&gt;One such domain is tree structures - anyone that has tried storing tree structures in SQL knows that you have to work hard to get around the limitations of the relational model. There are even books on the subject.&lt;/p&gt;

&lt;p&gt;To find all the groups and sub-groups that Anakin belongs to,  this query is enough in Cypher:&lt;/p&gt;

&lt;h3 id=&quot;cypher-5&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START user=node:person(name=&quot;Anakin&quot;)
MATCH group&amp;lt;-[:BELONGS_TO*]-user
RETURN group
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The * after the relationship type means that there can be multiple hops across BELONGS_TO relationships between group and user. Some SQL dialects have recursive abilities, that allow the expression of queries like this, but personally I’ve always had a hard time wrapping my head around those. Expressing something like this in SQL is hugely impractical if not practically impossible.&lt;/p&gt;

&lt;h1 id=&quot;where&quot;&gt;WHERE&lt;/h1&gt;

&lt;p&gt;This is the easiest thing to understand - it’s the same animal in both languages. It filters out result sets/subgraphs. Not all predicates have a equivalent in the other language, but the concept is the same.&lt;/p&gt;

&lt;h3 id=&quot;sql-5&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT person.*
FROM person
WHERE person.age &amp;gt;35 OR person.hair = &quot;blonde&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-6&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START person = node:persons(&quot;name:*&quot;)
WHERE person.age &amp;gt;35 OR person.hair = &quot;blonde&quot;
RETURN person
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;return&quot;&gt;RETURN&lt;/h1&gt;

&lt;p&gt;This is SQL’s SELECT. We just put it in the end because it felt better to have it there - you do a lot of matching and filtering, and finally, you return something.&lt;/p&gt;

&lt;p&gt;Aggregate queries work just like they do in SQL, apart from the fact that there is no explicit GROUP BY clause. Everything in the return clause that is not an aggregate function will be used as the grouping columns.&lt;/p&gt;

&lt;h3 id=&quot;sql-6&quot;&gt;SQL&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT person.name, count(*)
FROM Person
GROUP BY person.name
ORDER BY person.name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cypher-7&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START person=node:persons(&quot;name:*&quot;)
RETURN person.name, count(*)
ORDER BY person.name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Order by is the same in both languages - ORDER BY expression ASC/DESC. Nothing weird here.&lt;/p&gt;

&lt;h1 id=&quot;use-the-right-tool&quot;&gt;Use the right tool&lt;/h1&gt;

&lt;p&gt;No database is the silver bullet for data persistence and querying. That is what NOSQL means to us - look at your data and what you want to do with it, and then choose the appropriate tool for the job. Neo4j and Cypher are custom built for the challenges of heavily connected data. Compare the shortest path query &lt;a href=&quot;http://stackoverflow.com/questions/6873772/sql-postgres-shortest-path-in-graph-recursion/6900257#6900257&quot;&gt;here&lt;/a&gt; (all 43 lines of it) with what it looks like in Cypher:&lt;/p&gt;

&lt;h3 id=&quot;cypher-8&quot;&gt;Cypher&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;START lucy=node(1000), kevin=node(759)
MATCH p = shortestPath( lucy-[*]-kevin )
RETURN p
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap up&lt;/h1&gt;

&lt;p&gt;The performance characteristics are radically different when you move from a relational data store to Neo4j. Things that a SQL developer might fear because the performance bug has bitten there before, might not at all be expensive in a graph database.&lt;/p&gt;

&lt;p&gt;Relational databases have a different underlying model than graph databases, and so the query languages for them naturally have different design goals. Cypher was designed to make querying of complex, heavily interconnected data as natural as possible. It should not only make the querying possible, but we aim to have a query language that helps you think about your data query.&lt;/p&gt;

&lt;p&gt;If you know SQL well, you will quickly be productive with Cypher.&lt;/p&gt;
</content>
 </entry>
 
 
</feed>