<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[OK I GIVE UP]]></title><description><![CDATA[<a href="https://twitter.com/morphotactics" style="text-decoration:none;" class="icon-twitter"></a>]]></description><link>http://okigiveup.net/</link><generator>Ghost 0.7</generator><lastBuildDate>Thu, 01 Jul 2021 09:08:36 GMT</lastBuildDate><atom:link href="http://okigiveup.net/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Golang JSON Gotchas That Drove Me Crazy But I Have Learned to Deal With]]></title><description><![CDATA[<p><strong>Assumed reader level</strong>: Intermediate <br>
<strong>Content level</strong>: Advanced beginner</p>

<p>JSON is JSON, it's everywhere, and if you're working with Go you're most probably doing tons of JSON <a href="https://golang.org/pkg/encoding/json/">marshalling and unmarshalling</a>. Having experience in languages that have nearly identical built-in syntax for JSON (Javascript and Python), I repeatedly ran into certain issues,</p>]]></description><link>http://okigiveup.net/golang-json-gotchas-that-drove-me-crazy-but-i-have-learned-to-deal-with/</link><guid isPermaLink="false">41ea9f62-4c32-4a80-91e5-59db98ba70ef</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Sun, 28 Feb 2021 19:34:19 GMT</pubDate><content:encoded><![CDATA[<p><strong>Assumed reader level</strong>: Intermediate <br>
<strong>Content level</strong>: Advanced beginner</p>

<p>JSON is JSON, it's everywhere, and if you're working with Go you're most probably doing tons of JSON <a href="https://golang.org/pkg/encoding/json/">marshalling and unmarshalling</a>. Having experience in languages that have nearly identical built-in syntax for JSON (JavaScript and Python), I repeatedly ran into certain issues having to do with Go's idiosyncrasies and my deep-seated habits. Keep in mind that these points apply to other <a href="https://golang.org/pkg/encoding/">encodings in the Go standard library</a>, and generally to all packages that implement the same interfaces and patterns.</p>

<h3 id="onlypublicfieldsareunmarshalled">Only public fields are (un)marshalled</h3>

<p>This is the gotcha that annoyed me the most until I got it carved into my mind after spending countless minutes debugging it. Traditionally, JSON object keys start with lowercase letters, whereas Go uses capitalization to determine whether an identifier is exported (public) or unexported (private). When code accesses the fields of a struct within the same package, you will not get into trouble with unexported fields, as they are accessible anywhere within that package. <code>encoding/json</code> is a different package, however, and it can neither read nor write these fields. For example, the following will work:</p>

<pre><code class="language-go">var data struct {  
    Key string
}
jsonData := []byte(`{"Key": "Value"}`)  
json.Unmarshal(jsonData, &amp;data)  
fmt.Printf("%v\n", data)  
</code></pre>

<p>This should print <code>{Value}</code>. But should you forget that the field name has to be capitalized so that <code>encoding/json</code> can write to it, even setting the field tag will not help, as in the following:</p>

<pre><code class="language-go">var data struct {  
    key string `json:"the_key"`
}
jsonData := []byte(`{"the_key": "Value"}`)  
err := json.Unmarshal(jsonData, &amp;data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%v\n", err) // %v prints a nil error as &lt;nil&gt;
</code></pre>

<p>This will, of course, print <code>{}</code>, and a <code>nil</code> error. What <em>does</em> catch this error, however, is <code>go vet</code>, which prints the following friendly error message:</p>

<pre><code>./jsong.go:10:3: struct field key has json tag but is not exported
</code></pre>

<p>You will not get this message, though, if you don't have JSON tags. Long story short: Use tags even when keys match, and use <code>go vet</code>.</p>
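
<p>The same rule applies in the other direction. The following is a minimal sketch, assuming a hypothetical <code>payload</code> type and <code>encode</code> helper that are not part of the examples above; it shows that <code>json.Marshal</code> silently leaves out the unexported field as well:</p>

<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
)

// payload is a hypothetical type with one exported and one unexported field.
type payload struct {
    Key    string `json:"the_key"`
    hidden string // unexported, so encoding/json cannot see it
}

// encode marshals a payload in which both fields are set.
func encode() string {
    out, err := json.Marshal(payload{Key: "Value", hidden: "secret"})
    if err != nil {
        panic(err)
    }
    return string(out)
}

func main() {
    fmt.Println(encode()) // prints {"the_key":"Value"}
}
</code></pre>

<p>No error is reported here either; the value of <code>hidden</code> simply never reaches the output.</p>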

<h3 id="unmarshalingisnotforerrorchecking">Unmarshaling is not for error checking</h3>

<p>As <code>encoding/json</code> unmarshals a JSON-encoded byte slice into a struct, you would expect some kind of error checking to happen. Let's take the following example:</p>

<pre><code class="language-go">type Data struct {  
       IntField  int  `json:"intfield"`
       BoolField bool `json:"boolfield"`
}
jsonData := []byte(`{"intfield": "yolo", "boolfield": "ctulhu ftaghn (whatever the hell that means)"}`)  
var data Data  
err := json.Unmarshal(jsonData, &amp;data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%s\n", err)  
</code></pre>

<p>As you can see, we are packing all kinds of junk into the JSON values that correspond to the <code>Data</code> struct fields. Go deserializes as much as it can, skipping every value that does not fit its target field and leaving that field as it was beforehand. The returned error describes only the first field that could not be deserialized, resulting in the following output in the above case:</p>

<pre><code>{0 false}
json: cannot unmarshal string into Go struct field Data.intfield of type int
</code></pre>

<p>So keep in mind: Deserialization is not validation. For purposes of validation, you should use a library such as <a href="https://github.com/go-playground/validator/">https://github.com/go-playground/validator/</a>, or even better, something that validates the input JSON directly (which I haven't found a library for yet).</p>
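
<p>If you do want to react to a type mismatch in code, the returned error can be inspected. The sketch below assumes a hypothetical helper named <code>badField</code> and a slightly changed input in which only the first value is wrong; <code>json.UnmarshalTypeError</code> and its <code>Field</code> member are part of <code>encoding/json</code>. It also illustrates that fields after the offending one are still filled in:</p>

<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
)

type Data struct {
    IntField  int  `json:"intfield"`
    BoolField bool `json:"boolfield"`
}

// badField returns the JSON field named by a type-mismatch error,
// or an empty string if err is nil or of a different kind.
func badField(err error) string {
    if typeErr, ok := err.(*json.UnmarshalTypeError); ok {
        return typeErr.Field
    }
    return ""
}

// decode unmarshals input into a fresh Data value.
func decode(input string) (*Data, error) {
    data := new(Data)
    err := json.Unmarshal([]byte(input), data)
    return data, err
}

func main() {
    data, err := decode(`{"intfield": "yolo", "boolfield": true}`)
    // The mismatch on intfield is reported, yet boolfield was still decoded.
    fmt.Println(badField(err), data.BoolField) // prints intfield true
}
</code></pre>

<p>This only tells you which field had the wrong type; checks on the values themselves still need a validation step of their own.</p>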

<h3 id="structtagsarenoterrorcheckedinanymanner">Struct tags are not error-checked in any manner</h3>

<p>When logic is put into strings in a programming language, trouble is inevitable. Language capabilities go out the window, and you are left alone with your tired eyes and mind to catch errors. Go's struct tags are no exception. Since their contents are not code, any errors you make go straight through the Go compiler without any warnings. Let's have a look at this example:</p>

<pre><code class="language-go">type Data struct {  
    IntField int `json:"int_field or something`
}
jsonData := []byte(`{"int_field": 43}`)  
var data Data  
err := json.Unmarshal(jsonData, &amp;data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%s\n", err)  
</code></pre>

<p>This will print <code>{0}</code> and no error. One might think that struct tags are always simple, such as those for JSON deserialization, and that an average programmer <em>should</em> be able to get them right under normal circumstances. Unfortunately, tags are used by all kinds of libraries, which implement their own syntax embedded in the tag string. One example is the tag structure used by the validation library I linked to above, demonstrated in the following type definition:</p>

<pre><code class="language-go">type IntegrationInput struct {  
    IntegrationTypeID int32 `json:"integration_type_id" validate:"gte:1"`
}
</code></pre>

<p>Can you see the error here? The <code>validate</code> tag has to be <code>"gte=1"</code> and not <code>"gte:1"</code>. Things like this are difficult to get right and debug, especially when multiple tags are interacting, as in this example. As with unexported struct fields, <code>go vet</code> can help you with tags, generating the following error for the first example:</p>

<pre><code>./tags.go:10:3: struct field tag `json:"int_field or something` not compatible with
  reflect.StructTag.Get: bad syntax for struct tag value
</code></pre>

<p>But <code>go vet</code> cannot help you with the <code>validate</code> tag, because those tags have their own logic. So use <code>go vet</code> to catch typos in field tags, but also pay extra attention to the format of the more complex tags.</p>
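
<p>To see why the malformed tag is ignored without complaint, it helps to look at how tags are read at runtime. The following sketch uses <code>reflect.StructTag.Lookup</code> from the standard library; the type names <code>Broken</code> and <code>Fixed</code> and the helper <code>jsonTag</code> are made up for this illustration:</p>

<pre><code class="language-go">package main

import (
    "fmt"
    "reflect"
)

// Broken repeats the malformed tag from the first example (no closing quote).
type Broken struct {
    IntField int `json:"int_field or something`
}

// Fixed carries the tag that was presumably intended.
type Fixed struct {
    IntField int `json:"int_field"`
}

// jsonTag returns the json tag of the first field of v,
// and whether a well-formed json tag was found at all.
func jsonTag(v interface{}) (string, bool) {
    return reflect.TypeOf(v).Field(0).Tag.Lookup("json")
}

func main() {
    fmt.Println(jsonTag(Broken{})) // prints an empty string and false
    fmt.Println(jsonTag(Fixed{}))  // prints int_field true
}
</code></pre>

<p>A tag that cannot be parsed looks exactly like a missing tag to any library that reads it this way, which is why the field falls back to its default name.</p>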

<h3 id="bonusstructtagmatchingiscaseinsensitive"><em>Bonus</em>: struct tag matching is case-insensitive</h3>

<p><em>Thanks to <a href="https://www.reddit.com/r/golang/comments/lup3ib/golang_json_gotchas_that_drove_me_crazy_but_i/gpa2450?utm_source=share&amp;utm_medium=web2x&amp;context=3">procach</a> on /r/golang for this tip.</em></p>

<p>You would think that, if you use field tags to match JSON fields, you would be able to precisely match the case of fields in JSON data. This is not really the case, however. Even if you use tags, the match is case-insensitive, as the following example shows:</p>

<pre><code class="language-go">var data struct {
    Key string `json:"TheKey"`
}
jsonData := []byte(`{"thekey": "Value"}`)
err := json.Unmarshal(jsonData, &amp;data)
fmt.Printf("%v\n", data)
fmt.Printf("%s\n", err)
</code></pre>

<p>This will output <code>{Value}</code>. Even though <code>TheKey</code> and <code>thekey</code> are different strings, <code>encoding/json</code> will match the fields to each other. This is another thing to keep in the back of your head, in case unmarshalling behaves in unexpected ways.</p>
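
<p>One consequence worth spelling out is that a round trip through Go can change the spelling of keys, since marshalling always writes the tag exactly as given. A minimal sketch, in which the named type <code>data</code> and the helper <code>roundTrip</code> are assumptions made for this illustration:</p>

<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
)

type data struct {
    Key string `json:"TheKey"`
}

// roundTrip decodes the input and encodes the result again.
func roundTrip(input string) string {
    d := new(data)
    if err := json.Unmarshal([]byte(input), d); err != nil {
        panic(err)
    }
    out, err := json.Marshal(d)
    if err != nil {
        panic(err)
    }
    return string(out)
}

func main() {
    // The lowercase key is accepted on input but comes back as TheKey.
    fmt.Println(roundTrip(`{"thekey": "Value"}`)) // prints {"TheKey":"Value"}
}
</code></pre>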

<h3 id="conclusion">Conclusion</h3>

<p>I consider it a useful restriction-<em>cum</em>-feature that Go requires you to convert JSON data to native structures to manipulate them conveniently. Languages like Python, which have built-in syntax for similar structures, can lead to JSON-driven development, which I discussed in <a href="http://okigiveup.net/arguments-against-json-driven-development/">another blog post</a>. If you want to have a good time converting between JSON and Go, make sure you don't skip error checks, regularly use <code>go vet</code>, and pay attention to capitalization, and you should be all fine and dandy.</p>]]></content:encoded></item><item><title><![CDATA[Structured Debugging]]></title><description><![CDATA[<p>In this piece I would like to describe a practice I adopted a few years ago, after seeing how effectively colleagues were applying it. The idea of what I will call <em>structured debugging</em> is very simple: Document every step of the debugging effort of a complicated bug in an interactive</p>]]></description><link>http://okigiveup.net/structured-debugging/</link><guid isPermaLink="false">7aac5bb6-79d7-4c71-acd1-94be5853a661</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Mon, 10 Aug 2020 12:59:55 GMT</pubDate><content:encoded><![CDATA[<p>In this piece I would like to describe a practice I adopted a few years ago, after seeing how effectively colleagues were applying it. The idea of what I will call <em>structured debugging</em> is very simple: Document every step of the debugging effort of a complicated bug in an interactive environment in such a way that your thought process and deductions can be followed and verified. Despite its usefulness, simplicity and effectiveness, I rarely see structured debugging practiced. It has helped me with complex bugs on multiple occasions, especially in distributed applications, and also leads to artefacts that have value on their own, independent of the debugging.</p>

<p>The first step of structured debugging is figuring out where to document your progress. If you are happy working with the bug tracker that is at your disposal, you can document your work as comments on the bug ticket. If you prefer working in your editor, as I do, I would advise you to create a markdown file named after the ID of the bug. When you want to quickly switch to this buffer, you can use the bug ID, and once you are done, or want to notify others of your work-in-progress, you can copy-paste the contents. Every half-decent bug tracker out there accepts markdown input these days; by editing the report in your editor as markdown, you will have the best of both worlds, with local shortcuts and utilities, in addition to clickable links and decent formatting for code snippets etc.</p>

<p>Once you have picked out the documentation environment, you should start proceeding in a systematic way. This can be done in the standard debug loop (gather information, set up conjectures, test them, repeat). What you want to achieve iterating over this loop is a <em>reproducible narrative</em>: Document each of the steps in such a way that any other developer who is acquainted with the code base can open the ticket, follow through the comments and repeat any commands, arriving at the same conclusions as you did. When gathering information, it is common to make use of SQL queries, for example, or even simple scripts that join data from multiple sources. You should gather all of these in your report, together with the results at the point you ran them. One nice side effect of making this information available in a nice form is that you can take the time to make them as informative and simple as possible, for example by using joins instead of multiple queries in SQL. In order to gather information from your colleagues, you can tag them in the bug ticket, so that they can write their responses there, enriching the bug hunt.</p>

<p>The most relevant source of information in debugging live systems is logs. In the old days, web application logs were stored either as text files on servers, rotated and zipped regularly, or piped to syslog. Both made accessing these in a linkable form problematic. More recently, however, dedicated systems for log analysis have become more and more widely used. These all (or nearly all; CloudWatch doesn't allow linking to a single line) have means for linking to individual lines, time windows or the results of specific queries. Instead of just copy-pasting the relevant log lines, or in addition to that, consider using these links. Any reader can open these links and try alternative searches. Another important source of information, especially for complicated bugs, is diagrams that explain workflows, relationships or complex constellations. A timeline of events, for example, can be explained using sequence diagrams, which are much better than convoluted text. When the difficulty of the bug warrants it, these are a great addition to the narrative.</p>

<p>Structured debugging can cause significant extra work, but it has major benefits. Most importantly, the end result will leave little doubt as to whether anything was missed. The conclusions you derive will not be based on conjectures and assumptions, but on concrete data and tooling, open for everyone to read and verify. Furthermore, the artefacts resulting from structured debugging are valuable on their own. More than once I saw the tools used in such debugging actually getting incorporated into internal products, such as SQL queries turned into internal web pages &amp; reports, or Kibana searches that were added to dashboards as graphs. In case the bug proves tougher than you thought, or something more important comes in between, the report will prove invaluable: Once it's taken up again, you or anyone else can read it, and easily pick up where you left off. Last but not least, this method will make it visible to the team what gaps in visibility and diagnostics exist, gaps that make analyzing the system as a whole harder or leave the analysis incomplete.</p>]]></content:encoded></item><item><title><![CDATA[The How and Why of Go, Part 1: Tooling]]></title><description><![CDATA[<p>I'm one of those people surprised at the success of the Go programming language. Here is a language that prides itself in offering less than languages designed decades ago, unabashedly not OOP, and without a decent dependency management system (at least initially), but still wildly successful, with a number of</p>]]></description><link>http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/</link><guid isPermaLink="false">f1b528f2-288d-421b-8acc-87e5ba9fc070</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Tue, 30 Jun 2020 12:49:29 GMT</pubDate><content:encoded><![CDATA[<p>I'm one of those people surprised at the success of the Go programming language. 
Here is a language that prides itself on offering less than languages designed decades ago, unabashedly not OOP, and without a decent dependency management system (at least initially), but still wildly successful, with a number of significant open source projects written in it (e.g. Docker, <a href="https://github.com/hashicorp/terraform">Terraform</a> &amp; <a href="https://github.com/kubernetes/kubernetes">Kubernetes</a>). Another intriguing thing is that people who use Go as their primary language rarely complain about it (maybe a <em>generics would be nice</em> here and there), while those who first come into contact with it can't stop swearing, at least for a while (<em>mea culpa</em>). I used this gap in affinity as a chance to understand the intricacies of Go by diving into the platform, and writing down what I think is necessary knowledge for newcomers to become productive on it. The target readers are developers already proficient in one language and platform; as the text is already quite long, I didn't want to explain common programming terminology. Unavoidably, my perspective is skewed by the technologies I'm acquainted with, especially Python, with which I frequently compare Go, but it should be useful for all newcomers, even those without too much programming experience. I hope that those already working in Go can also find a useful tip here and there.</p>

<p>And now to the "Why" in the title. The design of Go is a bit curious, in that it leaves out most features of other popular programming languages, going for simplicity rather than recommending itself with more features. My aim was to make a proper attempt at understanding the context for this choice, by following the path from expectations from a language, to design principles, to language features, and finally to the embodiment of the language in terms of compiler, runtime and tooling. This process is never perfect for any technical product, as there are incidental turns taken at every step, but I think the knowledge of how different aspects of a language came to be the way they are, while depending on each other and the context, is very important and useful. I therefore attempted to start with an overview of the "intellectual" history of Go, connecting following discussions of features to this history.</p>

<p>This first part in what I intend to be a two-part series will concentrate on the Go toolchain, that is, the set of tools for writing, verifying, compiling and maintaining Go applications. As we will see, Go tooling has come a long way, and offers a first-class development environment for writing correct and performant applications. The second installment will deal with the most important features of the language, also in the light of Go design decisions. I would like both parts to stay as up-to-date and relevant as possible, so if you have suggestions for improvement, do leave a comment, and I will make sure to address it.</p>

<p>Since the text is rather long, here is a table of contents, in case you want to jump to a subsection:</p>

<ul>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org57d594f">Why is Go the way it is?</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orga61d15f">The Go toolchain</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org7f84510">Built-in Go tools</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orgd297668">Environment variables</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org35c3eab">Organizing your code in modules and packages</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orgfa3e47d">Other useful subcommands</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orgd64375b">Dependency management and the build system</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org73f03d5">Testing</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orgab9c2eb">Further Go tools</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org804d6c2">Debugging</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#org5afb9c3">In the next episode</a></li>
<li><a href="http://okigiveup.net/the-how-and-why-of-go-part-1-tooling/#orgeb8223a">Resources</a></li>
</ul>

<p><a id="org57d594f"></a></p>

<h3 id="whyisgothewayitis">Why is Go the way it is?</h3>

<p>In order to appreciate the design decisions that went into the Go language, it's important to understand where the language designers started from, and what problems they expected their language to solve. As Rob Pike explains in detail in <a href="https://talks.golang.org/2012/splash.article">this article</a> from 2012, Go was not designed to experiment with PLT concepts. The languages and technologies it intended to replace were those in daily use at Google (C++, Java and Python); Go was designed to solve the problems these platforms presented at Google scale. A rough three-part categorization of the problems Go is intended to address can be done as follows:</p>

<p><strong>Build issues</strong>: The problems that the C and C++ model of compilation presents are well-known. As explained in the linked article, compiling a moderately large C++ codebase can lead to gigabytes of IO. Go avoids this and similar issues by making unused imports an error, and replacing header files and includes with an inverted dependency model of compilation. Circular imports are also not allowed. One interesting side effect of this stress on dependency hygiene is that copy-paste is preferred to importing large packages:</p>

<blockquote>
  <p>Dependency hygiene trumps code reuse. One example of this in practice is that the (low-level) net package has its own integer-to-decimal conversion routine to avoid depending on the bigger and dependency-heavy formatted I/O package. Another is that the string conversion package strconv has a private implementation of the definition of 'printable' characters rather than pull in the large Unicode character class tables; that strconv honors the Unicode standard is verified by the package's tests.</p>
</blockquote>

<p><strong>Developer ergonomics</strong>: Go is a minimalist language that tries to do away with features of languages used at Google, such as C++, Java and Python, which are not coincidentally also quite popular in the rest of the dev community. In fact, as <a href="https://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html">this candid blog post</a> explains, the initial drive to develop Go came from unpleasant experiences developing concurrent code in C++, and the intention of the standards committee to make the language even more complex. While differentiating itself from popular languages, Go cannot stray too far away from them, as it is intended to be used in production at a company the size of Google. As such, it has to be easy to learn, and familiar to junior developers. This simplicity is at the service of solving modern programming challenges, foremost among them concurrency. Concurrency in Go is provided through <a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">communicating sequential processes (CSP)</a>, the advantage of which is that it is easy to integrate into a procedural language. Another modern feature that now meets a C-like language in Go is garbage collection. Due to the type system and memory allocation features of Go, however, garbage collection is different from the way it works in languages like Java or Python; we will discuss this in the follow-up post.</p>

<p><strong>Google-scale</strong>: The design of Go is optimized for unambiguity and parsability. The underlying reasons are the ease of writing external tools and avoiding discord in large developer teams. As an example, Pike <a href="https://talks.golang.org/2012/splash.article#TOC_4.">mentions</a> that having languages that are whitespace-sensitive, such as Python, is not an issue in itself, but Python embedded in SWIG declarations turns out to be a huge problem. In order to preclude such nuisances, Go has curly braces and clear formatting rules. Another example is the now famous auto-formatting tool that provides a standard through implementation. This, and similar tools like <code>gofix</code> which we will discuss later, are possible because the language is easy to parse and unambiguous (in comparison to e.g. C++, which can have <a href="https://en.wikipedia.org/wiki/Most_vexing_parse">statements that can be parsed multiple ways</a>). These tools enable standards to be set within large groups, and also systematic changes such as API changes to be made on large code bases. Generally, it can also be said that the rest of the design concerns gathered in the previous categories also contribute to scaling Go, especially those concerning concurrency primitives and strong standard library support. Another aspect of scale is the number of people working on a project. As Pike correctly observes, developers tend to stick to a subset they understand of a complex language with many features. As Go has a rather limited set of features, there is no subset to agree on.</p>

<p>Obviously, not all the design choices that went into Go can be explained through these points. There are quite a number of things that are put to good use in other languages, but are explicitly shunned in Go, such as OOP, exceptions and generics. In my opinion, there is one general thread that connects the dots, which is that in Go, things that are a pain in the large are not allowed in the small, either. Or in <a href="https://talks.golang.org/2012/splash.article#TOC_7.">Rob Pike's words</a>:</p>

<blockquote>
  <p>As with many of the design decisions in Go, it forces the programmer to think earlier about a larger-scale issue &#x2026; that if left until later may never be addressed satisfactorily.</p>
</blockquote>

<p>Another important aspect to keep in mind when reading Go documentation, and wondering at how <em>basic</em> it is, is that the priority given to keeping the implementation simple and feasible led the designers to omit features that one takes for granted in other languages but that lead to hidden complexity in the implementation. As stated in the <a href="https://golang.org/doc/faq#Why_doesnt_Go_have_feature_X">official FAQ</a>:</p>

<blockquote>
  <p>Go was designed with an eye on felicity of programming, speed of compilation, orthogonality of concepts, and the need to support features such as concurrency and garbage collection. Your favorite feature may be missing because it doesn't fit, because it affects compilation speed or clarity of design, or because it would make the fundamental system model too difficult.</p>
</blockquote>

<p><a id="orga61d15f"></a></p>

<h3 id="thegotoolchain">The Go toolchain</h3>

<p>The core of the Go toolchain is the <code>go</code> command line tool that bundles the most important components, including the compiler. In the rest of this text, Go refers to the language and ecosystem, whereas go (with lowercase g) refers to the command line tool. Go adheres to the recent pattern of delivering development tools where the entry point is one single command which accepts a first argument as an action (other examples could be git and kubectl). In daily work, when building Go code, one rarely has to deal directly with the actual compiler or linker, which are hidden somewhere in the Go distribution. The Go compiler is in fact written in Go itself, and thanks to this fact and the snappy compile times, bootstrapping the Go toolchain is one of the simplest ways to get an up-to-date version on your computer. You will first need to get the out-of-date but still useful version of go from system repositories, as in <code>apt install golang</code>. Afterwards, download the latest source package from the <a href="https://golang.org/dl/">official downloads page</a>. After unpacking it, run the command <code>./make.bash</code> in the <code>src/</code> subdirectory. This will compile the compiler, various other tools and the library. On my relatively outdated i7 2.40GHz computer it took 5 minutes in total. The compiler will now reside in the <code>bin/</code> directory at the root of the unpacked source tree; you can either use it by referencing it explicitly or, from that root directory, by setting the search path with <code>export PATH=`pwd`/bin:$PATH</code>. If you pack the following into the file <code>hello.go</code>, and compile it with <code>go build hello.go</code>, you should have the traditional Hello World:</p>

<pre><code class="language-go">package main

import "fmt"

func main() {  
    fmt.Println("Hello world")
}
</code></pre>

<p>The executable is created by default with the name of the file, i.e. <code>hello</code>, which means no more <code>a.out</code>. As we mentioned, the compiler and linker are being called in the background; you can find out how and where by running the same command with the <code>-x</code> option, i.e. <code>go build -x hello.go</code>. In this verbose output, you can see how go creates a temporary working directory, creates some files for specifying where the various object files are, and then brings everything together.</p>

<p>You can also run the Hello World file with <code>go run hello.go</code>; this will compile and execute the code without leaving an executable in the current directory. The argument to this command does not have to be a file; it can also be <a href="https://golang.org/doc/go1.11#run">a package or directory</a> (with a main package; more on this later).</p>

<h4 id="crosscompilinggo">Cross-compiling Go</h4>

<p>It is also possible to cross-compile Go code for another platform. This can be achieved using environment variables that specify the target operating system and architecture. In order to compile for Linux on the Raspberry Pi, which uses an ARM chip, for example, you would need to run the following:</p>

<pre><code>GOOS=linux GOARCH=arm go build hello.go
</code></pre>

<p>If you now look at what kind of a file the resulting binary is with <code>file hello</code>, you will see that it's an <em>ELF 32-bit LSB executable, ARM</em>.</p>
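
<p>The target platform is also visible from within the program. As a small sketch, the constants <code>runtime.GOOS</code> and <code>runtime.GOARCH</code> from the standard library hold the values the binary was compiled for; the helper name <code>target</code> is made up for this example:</p>

<pre><code class="language-go">package main

import (
    "fmt"
    "runtime"
)

// target reports the operating system and architecture
// that this binary was compiled for.
func target() string {
    return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
    fmt.Println("built for", target())
}
</code></pre>

<p>Built with the cross-compilation command above and run on the Raspberry Pi, this should report <code>linux/arm</code>; built without the variables, it reports the platform of the machine you compiled on.</p>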

<p><a id="org7f84510"></a></p>

<h3 id="builtingotools">Built-in Go tools</h3>

<p>As the Go language targets large teams of developers without too much experience, the toolchain contains a couple of tools that effectively set standards by implementing them. The best-known of these is the <code>gofmt</code> tool that automatically formats code. The aim is avoiding bikeshedding discussions by providing one correct, automated way of formatting Go code. The <code>gofmt</code> tool is delivered as a part of the Go codebase; if you built the code from source as explained above, you should have it lying next to the <code>go</code> tool. In daily usage, <code>gofmt</code> is called using the alias <code>go fmt</code>, which is simply <code>gofmt -l -w</code>. With these options, <code>gofmt</code> reformats the files in-place, and prints their names. This isn't all <code>gofmt</code> can do, however. It is also a useful tool for simple transformations using the <code>-r</code> option. Let's say that you modified a frequently called function <code>yarbl</code>, and changed the order of its arguments; the first and second arguments have to be switched. That is, instead of <code>yarbl(x, y, z)</code>, you need <code>yarbl(y, x, z)</code>. The following command will update all calls to <code>yarbl</code> in file <code>code.go</code> (we will see later how to refer to a module or package) and make them fit the new signature:</p>

<pre><code>gofmt -r 'yarbl(x, y, z) -&gt; yarbl(y, x, z)' -w code.go
</code></pre>

<p>In the pattern specification, you need to use single lowercase letters to match sub-expressions; anything else will be matched exactly. With the above pattern, the following code:  </p>

<pre><code class="language-go">yarbl(x, y, z)  
yarbl(foo, bar, zap)  
</code></pre>

<p>will be changed to the following:  </p>

<pre><code class="language-go">yarbl(y, x, z)  
yarbl(bar, foo, zap)  
</code></pre>

<p>This feature is rather useful for refactoring Go code, e.g. to rename a function or variable in order to export it. Another switch <code>gofmt</code> accepts is <code>-s</code>, which can be used to simplify your code, but the transformations carried out by this option are relatively limited, in complexity and in number.</p>
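
<p>To give an idea of what <code>-s</code> does, here is a small made-up program (the <code>point</code> type and the <code>tail</code> function are assumptions for this illustration) that contains the three kinds of constructs listed in the <code>gofmt</code> documentation as candidates for simplification. The comments describe what I would expect <code>gofmt -s</code> to rewrite:</p>

<pre><code class="language-go">package main

import "fmt"

type point struct{ x, y int }

// tail returns all points except the first one.
func tail() []point {
    // gofmt -s drops the repeated element type: []point{{1, 2}, {3, 4}}
    points := []point{point{1, 2}, point{3, 4}}
    // gofmt -s shortens the slice expression to points[1:]
    return points[1:len(points)]
}

func main() {
    rest := tail()
    // gofmt -s drops the blank identifier: for i := range rest
    for i, _ := range rest {
        fmt.Println(i, rest[i]) // prints 0 {3 4}
    }
}
</code></pre>

<p>The program behaves the same before and after the rewrite; the option only removes redundancy.</p>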

<p><a id="orgd297668"></a></p>

<h3 id="environmentvariables">Environment variables</h3>

<p>Before we go any further, I would like to explain the role of a couple of environment variables in the functioning of the <code>go</code> tool. We already saw above how the target platform and operating system can be passed into the Go compiler via environment variables. There are three more environment variables that determine the way Go looks for, stores and compiles code. In the order of importance, these are <code>GOPATH</code>, <code>GOBIN</code> and <code>GOROOT</code>. Other environment variables affect other functionality, as you can read in the <a href="https://golang.org/cmd/go/#hdr-Environment_variables">official documentation</a> (or on the commandline with <code>go help environment</code>), but they are not as significant. You can also print all the environment variables that Go consults by running <code>go env</code>, or <code>go env VARNAME</code> to get the value of a specific variable. These commands will also print the default values if the particular variables are not set.</p>

<p><a href="https://golang.org/cmd/go/#hdr-GOPATH_environment_variable"><code>GOPATH</code></a> used to have a very central role in how code under development was organized; you needed to place your code in a very specific place under <code>GOPATH</code> for the <code>go</code> tool to work, but this situation has changed with the new module system, which we discuss below. By setting <code>GOPATH</code>, you can determine where <code>go</code> downloads third party packages and source code. If not set, it defaults to the <code>go</code> directory in the user's home directory. You can set it to an arbitrary directory, for example the directory you are in with <code>export GOPATH=`pwd`</code>.</p>

<p><code>GOBIN</code> determines where Go saves executables that are compiled with the two other very frequently used go commands, <code>go install</code> and <code>go get</code>. These commands are used for compiling and putting executables to the <code>GOBIN</code> directory from local and remote code respectively. You can get a taste of the first command by running <code>go install hello.go</code> in the directory where the Hello World code resides. This should place the <code>hello</code> binary in the <code>GOBIN</code> directory. When not set, <code>GOBIN</code> defaults to <code>$GOPATH/bin</code>.</p>

<p><code>GOROOT</code> is the directory in which Go looks for the standard library. In normal usage, you don't need to set this yourself: <code>go</code> will figure this out by looking at where it's running.</p>
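<p>The fallback rule for <code>GOBIN</code> can be sketched in a few lines of Go. The function <code>installDir</code> is a hypothetical helper written for this post, not something the <code>go</code> tool exposes, and the handling of a <code>GOPATH</code> with several entries (taking the first one) is my assumption. The authoritative answer on your machine is always <code>go env GOBIN GOPATH</code>.</p>

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// installDir is a hypothetical helper that mirrors the rule described above:
// GOBIN wins if it is set, otherwise the bin directory under the first
// GOPATH entry is used (the first-entry choice is an assumption).
func installDir(gobin, gopath string) string {
	if gobin != "" {
		return gobin
	}
	entries := filepath.SplitList(gopath)
	if len(entries) == 0 {
		return ""
	}
	return filepath.Join(entries[0], "bin")
}

func main() {
	// build.Default.GOPATH already falls back to the go directory in the
	// user's home when the GOPATH environment variable is not set.
	fmt.Println(installDir(os.Getenv("GOBIN"), build.Default.GOPATH))
}
```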

<p><a id="org35c3eab"></a></p>

<h3 id="organizingyourcodeinmodulesandpackages">Organizing your code in modules and packages</h3>

<p>The Hello World example above had as its first line the declaration <code>package main</code>. Every Go code file needs such a line at the very top (optionally after some comments for documentation), telling the compiler which package the code in the file belongs to. In order to understand and use packages, we need to start at a higher level of abstraction, namely modules. Modules are the distribution units of Go code, be it libraries or executables. Technically, they are collections of packages that have common dependencies and compilation conditions. In the old way of doing things, modules were determined by the path in which Go files were located with respect to the <code>GOPATH</code>, but this is not necessary anymore; you can define modules with a single command, as we will see later, which creates a <code>go.mod</code> file in a certain format. Once defined this way, you can organize your code into packages, just like the main package that we used above. Before we continue with examples, I would like to point out that you can get rather detailed documentation on modules on the command line with the <code>go help modules</code> command (available online <a href="https://golang.org/cmd/go/#hdr-Modules__module_versions__and_more">here</a>). As per this documentation, the module-related behavior of the go tool can be controlled in detail using certain environment variables. Generally, however, you can assume that if you're in a module (i.e. there is a <code>go.mod</code> file in the current directory or one of its parents), you are in module mode, and the instructions here apply. We will later handle downloading and installing Go code without the use of modules.</p>

<p>We will create our module in an empty directory; the name of the directory is not important. Within this directory, run the command <code>go mod init myprinter</code>. This will create the aforementioned <code>go.mod</code>, which should have the following content:  </p>

<pre><code class="language-go">module myprinter

go 1.14  
</code></pre>

<p><strong>Side note</strong>: In the Go world, module names are connected to how they can be found on the internet; the conventional way of naming a module is prefixing it with its repository URL. We will deal with this topic later, in order not to complicate the matters at this point.</p>

<p>Obviously, these are the module name and the Go version with which it was created. Let's add some code to this module; add the following to the file <code>myprinter.go</code> right next to <code>go.mod</code>:  </p>

<pre><code class="language-go">package main

import (  
    "fmt"
    "os"
    "path/filepath"
)

func main() {  
    dir, _ := filepath.Abs(filepath.Dir(os.Args[0]))
    fmt.Println(dir)
}
</code></pre>

<p>This file has the package declaration <code>main</code>, but the filename is completely different, which go permits. You can in fact put code for the same package in different files in the same directory, with the restriction that there is only one package in a directory. The only exception to this single package rule is the test package; more on this later. Now within the same directory, run the command <code>go install myprinter</code>. Before doing so, however, make sure that you have set <code>GOBIN</code> to a practical location. You should end up with the executable <code>myprinter</code> in the <code>GOBIN</code> directory, and when you run it, its output should be the path to the executable you just ran. The <code>main</code> package has a special meaning in Go. When you ask Go to create an executable from a code directory, it will look for the <code>main</code> package within that directory and compile it to an executable, with the <code>main</code> function as the entry point. That is, you cannot create an executable out of an arbitrary file; it has to be a main package, even if it's a subpackage. For subpackages, the last part of the path specification will be taken as the name of the executable. If the <code>main</code> package is at the base, as with the toy example here, it will be the name of the module. Fittingly, you cannot create a <code>main</code> package and import code from it; Go will complain that the location you are trying to import from "is a program, not an importable package".</p>

<p>Now let's move the logic for finding the path of the current executable to a separate package. To add a new package to our module, create the subdirectory <code>pathfinder</code> and put the following in the file <code>pathfinder.go</code> in that directory:  </p>

<pre><code class="language-go">package pathfinder

import (  
    "os"
    "path/filepath"
)

func Find() string {  
    dir, _ := filepath.Abs(filepath.Dir(os.Args[0]))
    return dir
}
</code></pre>

<p>Here is something you should pay attention to: the <code>Find</code> function <em>has</em> to be capitalized. Otherwise Go will complain that it cannot be found when accessed in <code>myprinter.go</code>. This is an interesting feature of the Go language: Visibility is tied to capitalization. We will see more on this in the second installment, but you should keep it in mind in case you see an error. Also modify the <code>myprinter.go</code> file to look like this:  </p>

<pre><code class="language-go">package main

import (  
    "fmt"
    "myprinter/pathfinder"
)

func main() {  
    fmt.Println(pathfinder.Find())
}
</code></pre>

<p>As you can see, we are importing our new package as <code>myprinter/pathfinder</code>. Go does not have relative imports; every import path has to uniquely identify the package it is importing &#x2013; another feature through a lack of feature, making code analysis and refactoring easier. You can now run <code>go install myprinter</code>, and it should create a binary in the same location that does the same thing. The last argument to <code>go install</code> is optional; when omitted, Go will build and install the main package in the current directory. The command <code>go build</code> we saw earlier will do something very similar, simply dropping the compiled executable in the current directory instead of moving it to <code>$GOBIN</code>.</p>
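<p>The <code>Find</code> function above throws away the error returned by <code>filepath.Abs</code> and relies on <code>os.Args[0]</code>, which is fine for a toy. If you want something a bit more careful, the standard library offers <code>os.Executable</code>. The following is a minimal sketch of that variant; the name <code>FindDir</code> is made up here and is not part of the sample module.</p>

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// FindDir returns the directory containing the running executable.
// It is a hypothetical variant of Find that reports errors to the caller
// instead of discarding them.
func FindDir() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	return filepath.Dir(exe), nil
}

func main() {
	dir, err := FindDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```

<p>Note that the function name is still capitalized; if it lived in the <code>pathfinder</code> package, that is what would make it callable from the main package.</p>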

<p>You might be asking yourself, how can one check whether a package that is not an executable but simply a library is error-free and can be compiled? This can be done with both <code>go build</code> and <code>go install</code>. For non-main packages, both of these commands will compile the intermediate package binary, and then discard it (this behavior is new with modules; in the past, <code>go install</code> used to compile packages to <code>$GOPATH/pkg</code>).</p>

<p><a id="orgfa3e47d"></a></p>

<h3 id="otherusefulsubcommands">Other useful subcommands</h3>

<p>In addition to <code>fmt</code>, <code>build</code> and <code>install</code>, the base <code>go</code> tool has a number of very useful subcommands. You can list these by simply running <code>go</code>. Additional information on each subcommand can be printed by running e.g. <code>go help build</code>. I would strongly recommend reading these help pages every now and then; I found out about <code>go build -x</code> while going through the help page, for example. In this section, I would like to go into a bit more detail on two subcommands that are rather useful, <code>go list</code> and <code>go doc</code>. The <code>go list</code> subcommand prints information about the packages specified as arguments, or the package in the current directory if none are specified. We can list the packages under our <code>myprinter</code> module, for example, with the command <code>go list myprinter</code>. You have to do this while inside the directory, because otherwise module mode will not be activated, and the module will not be found. The output will simply be the name of the base package, <code>myprinter</code>. What if we want to refer to all the packages of a module recursively? Ellipsis, or three dots, is the operator you need for this purpose, as in <code>go list myprinter/...</code>. The <code>go</code> subcommands that take package arguments accept an argument with ellipsis; for example, to build all of the <code>myprinter</code> module, we could run <code>go build myprinter/...</code>, which would be totally useless in this case. We will see more useful applications of ellipsis later.</p>

<p>If we run <code>go list myprinter/...</code>, we will get the following list:</p>

<pre><code>myprinter
myprinter/pathfinder
</code></pre>

<p>This is all nice and dandy, but not that useful; the same could be achieved with some grep (well, someone else could do it, at least). The real power of <code>go list</code> is in the use of the template argument, documented in <code>go help list</code>. The template can be given as the argument <code>-f</code>, and can include statements that interpolate from the <code>Package</code> data structure (also documented in the help printout). For example, for each path, we can print the package path and the imports, as follows:</p>

<pre><code>$ go list -f "{{ .ImportPath }}: {{ .Imports }}" myprinter/...
myprinter: [fmt myprinter/pathfinder]
myprinter/pathfinder: [os path/filepath]
</code></pre>
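<p>The <code>-f</code> argument uses the syntax of Go's <code>text/template</code> package, so you can get a feeling for it without the <code>go</code> tool. In the sketch below, <code>pkgInfo</code> is a hypothetical stand-in with just two of the fields of the real <code>Package</code> struct, and <code>render</code> is a helper made up for this example.</p>

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// pkgInfo is a hypothetical stand-in for two fields of the Package
// struct documented in "go help list".
type pkgInfo struct {
	ImportPath string
	Imports    []string
}

// render applies a go-list-style format string to a single package record.
func render(format string, p pkgInfo) (string, error) {
	tmpl, err := template.New("list").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	p := pkgInfo{
		ImportPath: "myprinter",
		Imports:    []string{"fmt", "myprinter/pathfinder"},
	}
	out, err := render("{{ .ImportPath }}: {{ .Imports }}", p)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```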

<p>If you want to try out this command on a large module, you can try something from the standard library, such as <code>net/...</code>. Alternatively, you can also use the special argument <code>all</code>, which will print information on <em>all</em> "active" packages, meaning those that are depended on, including those in the standard library. <code>go list</code> can also be used to print information about modules, with the flag <code>-m</code>. With this flag, the struct that is used for interpolation is, as one would expect, <code>Module</code> instead of <code>Package</code>. For both packages and modules, there is a lot of extra information that can be printed out, which can be rather useful for automated analysis and overviews of large code bases.</p>

<p>Once you list out the packages in a module, you will probably want to get more information about what's in them. The command for this purpose is <code>go doc</code>. If you go ahead and try to print documentation on our toy module with <code>go doc myprinter</code>, you will see that an empty line is printed out; this is because there is no documentation. Let's add the following to the top of the file <code>myprinter.go</code>:</p>

<pre><code>// A module with an entry point that prints the path to the binary.
//
// This module is for demo purposes. It does not do anything useful.
// You can read the blog post at http://okigiveup.net.
</code></pre>

<p>If you now run <code>go doc myprinter</code>, you will see the above text. This is the <a href="https://blog.golang.org/godoc">convention for documenting Go packages</a>: a short description, and then a longer text, both as comments at the very top, and separated by a blank line. By default, <code>go doc</code> does not print any members from a main package. If we run it on the <code>pathfinder</code> subpackage, we will see that it prints information on the <code>Find</code> function:  </p>

<pre><code class="language-go">package pathfinder // import "myprinter/pathfinder"

func Find() string  
</code></pre>

<p>When given a single argument that is a package path, <code>go doc</code> will print the documentation for the package and list the exported symbols (as mentioned above, symbols are exported by capitalizing their names). As you might guess, we could get extra documentation on the <code>Find</code> function, but we don't have any. The <code>go doc</code> tool looks for a comment block right before a function as its documentation (the same is valid for constants, package variables etc); let's add the following to <code>pathfinder.go</code> right before <code>Find</code>:</p>

<pre><code>// Find finds and returns the path to the currently executing binary
</code></pre>

<p>Now, in order to get this documentation, we would need to refer to the <code>Find</code> function somehow. There are two ways of doing this: either with <code>myprinter/pathfinder.Find</code>, or by providing a second argument, as in <code>go doc myprinter/pathfinder Find</code>. Both should give you the following result:  </p>

<pre><code class="language-go">package pathfinder // import "myprinter/pathfinder"

func Find() string  
    Find finds and returns the path to the currently executing binary
</code></pre>
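<p>If you are curious how a tool can get at these comments, the standard library ships the <code>go/parser</code> and <code>go/doc</code> packages for this. The following is a rough sketch, and I am not claiming it is how <code>go doc</code> itself is implemented. The source text in <code>src</code>, including its package comment and the unexported <code>helper</code> function, as well as the <code>docFor</code> function, are made up for the example.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
	"os"
)

// src is a made-up version of pathfinder.go with an extra unexported function.
const src = `// Package pathfinder locates the running binary.
package pathfinder

// Find finds and returns the path to the currently executing binary
func Find() string { return "" }

// helper is not exported and therefore left out of the documentation.
func helper() {}
`

// docFor parses the given source text and extracts its documentation.
func docFor(source string) (*doc.Package, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "pathfinder.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	return doc.NewFromFiles(fset, []*ast.File{file}, "myprinter/pathfinder")
}

func main() {
	pkg, err := docFor(src)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("package %s: %s", pkg.Name, pkg.Doc)
	// Only exported functions are listed by default.
	for _, fn := range pkg.Funcs {
		fmt.Printf("func %s: %s", fn.Name, fn.Doc)
	}
}
```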

<p>Another built-in tool that is useful for checking the correctness of Go code is <a href="https://golang.org/cmd/vet/"><code>go vet</code></a>. There are certain kinds of errors that are possible in Go code which the compiler can't (or won't) find; for example, string interpolation arguments can be missing or invalid (a <code>%d</code> where a string is specified), or nil checks can be unnecessary because a value cannot be nil. <code>go vet</code> has a number of built-in checks that are all applied by default; you can see a list with <code>go doc vet</code> (or by following the above link). When you run a test using <code>go test</code> (details of this command will be discussed later), <code>go vet</code> is applied with a subset of these checks, such as the <code>printf</code> check, which concerns the aforementioned string interpolation. If you have a CI pipeline, it makes sense to add <code>go vet</code> to catch any subtle issues that might otherwise slip through.</p>

<p><a id="orgd64375b"></a></p>

<h3 id="dependencymanagementandthebuildsystem">Dependency management and the build system</h3>

<p>Dependency management in the Go world is a curious story. In earlier versions of Go, dependency management was, mildly put, quite difficult to get used to. It was essentially very close to a bash script that used <code>go list</code> to print out and clone all the git repositories referenced in the code. For a long time <a href="https://research.swtch.com/vgo-intro#versioning_and_api_stability">it wasn't even possible to pin versions of dependencies</a>. The recommended way to get reproducible builds for a project was to copy dependencies into the project repository (see the last paragraph of the previous link). You also had to put your own code in a very specific place, along with the dependencies, which had a weird feeling of propagating the dependencies <em>up</em> the code hierarchy, instead of down (i.e. in a subfolder like <code>node_modules</code>). Fortunately, the new module system, available <a href="https://golang.org/doc/go1.11#modules">since version 1.11</a>, frees developers from this sorry state of affairs. It is the result of a nearly two-year-long design discussion; you can read the various posts that explain the state of the design, together with extensive discussion in the comments section, <a href="https://research.swtch.com/vgo">here</a>. The resulting dependency management system is the new standard, and is miles better than the old way of doing things. Therefore I will not discuss the old GOPATH-based one, and concentrate solely on the module-based dependency management system here.</p>

<p>A very interesting decision Go has taken from the beginning is to combine the package system with code hosting. Above, we called our module <code>myprinter</code>; this is actually not the conventional way of naming modules. What we should have done is to name the module after the version control location where it would be hosted, i.e. something like <code>github.com/afroisalreadyinu/myprinter</code>. When you do so, Go can fetch and install these modules without any additional work on your or the community's part, like hosting a module index such as PyPI, the Python Package Index. The details of the remote import path specification can be found in the documentation with <code>go help importpath</code>. The gist of it is as follows. Certain well-known code hosting sites, such as Github and Bitbucket, have built-in support so that you can use them in package paths. You can also directly use VCS URLs, such as ones that end with <code>.git</code> for Git repos. The version control systems with which Go can work are not limited to git; you can also point to bazaar, fossil, mercurial and subversion repositories. A third, more general remote import mechanism is possible through the use of meta tags on HTML pages. If a page has a specifically formatted meta tag that points to a location that hosts a code repository, the URL of that page can be used in an import path. The details can be found in the importpath documentation mentioned above, or online in the <a href="https://golang.org/cmd/go/#hdr-Remote_import_paths"><code>go</code> command documentation</a>.</p>

<p>This relatively simple scheme does make the import strings longer than usual, but it is actually a nice solution to the perennial problem of specifying which package you are referring to in which import. Since the import path refers also to the location, you will not run into problems using libraries that share a name, and you can easily clone a repository to some other location, and use that version instead of the "canonical" one. Go faced some criticism that tying package names to code hosting sites would centralize package distribution, especially considering the dominance of Github in this space, but compared to other package hosting solutions, such as Python's PyPI and the npm registry of Node.js, Go's solution is actually more decentralized, since one can host a package on many different, easy to set up locations. Go also has a well thought out module proxy protocol; you can read about it in <code>go help goproxy</code>. This proxy protocol enables one to host dependencies without resorting to any public infrastructure with very little pain, as there are multiple independent implementations. You can read up on using a module proxy, and reasons you should host one, in <a href="https://arslan.io/2019/08/02/why-you-should-use-a-go-module-proxy/">this blog post</a>.</p>

<p>So how do you add a dependency to your project? By simply importing it. Let's say we would like to print our message to the terminal in color using <a href="https://github.com/fatih/color">github.com/fatih/color</a>. In order to do so, we first modify <code>myprinter.go</code> to import and use it:  </p>

<pre><code class="language-go">package main

import (  
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    color.Blue(pathfinder.Find())
}
</code></pre>

<p>If you now run <code>go install myprinter</code>, you should see go fetch the new dependency and place it in <code>$GOPATH/pkg/mod</code> directory, with subdirectories named in the same scheme as the URL module path. In addition, the <code>go.mod</code> file should get updated, and the following line added:</p>

<pre><code>require github.com/fatih/color v1.9.0
</code></pre>

<p>When you add a new dependency as we did right now, and then run a go subcommand (such as <code>build</code>, <code>install</code>, <code>test</code> or <code>list</code>) Go will pick the latest stable release version, based on semantic versioning, download it, and add it to <code>go.mod</code>. What Go won't do is to extract and add the dependencies of the new package to <code>go.mod</code>. If you look at the <a href="https://github.com/fatih/color/blob/master/go.mod"><code>go.mod</code> of the new dependency</a>, you will see that it depends on two other packages, but these are not in the updated <code>go.mod</code> of our module. This is in comparison to pip in Python, for example, where all dependencies will be spit out if you do a <code>pip freeze</code>. If you remove a dependency from your code, you can reliably remove it from <code>go.mod</code> by running <code>go mod tidy</code>. As we will see later, there is one more file that is edited when new dependencies are added, but before that we need to discover the <code>go get</code> command.</p>

<h4 id="installingandupdatingsoftwarewithgoget">Installing and updating software with go get</h4>

<p>We have seen how one can build locally available code with <code>go install</code> and <code>go build</code>. What if we want to install a command, such as <a href="https://github.com/rsc/goversion"><code>goversion</code></a>, which gives information on the compilation context of a binary? The command we need is <code>go get</code>. It accepts the same URL of the package that you would use in an <code>import</code> statement. Using <code>go get</code>, we can install <code>goversion</code> either from Github, or from the domain of its developer (rsc.io), which redirects to the correct repository. Let's opt for the latter:</p>

<pre><code>go get rsc.io/goversion
</code></pre>

<p>This will download, compile and install the executable as <code>$GOBIN/goversion</code>. If you run <code>go get</code> from within the module directory, you will see that it has been added to the <code>go.mod</code> file as a dependency, but with the comment <code>indirect</code> at the end of the line. An indirect dependency of a module is one that is not directly visible from the code. Using <code>go get</code> to install an executable is one way to get such a dependency; the other is updating a dependency-of-a-dependency (called a <a href="https://en.wikipedia.org/wiki/Transitive_dependency">transitive dependency</a>) manually, which is also done with <code>go get</code>. In semantic versioning, version numbers are specified in the format <code>MAJOR.MINOR.PATCH</code>. As we will see later, in Go, major version changes are never done automatically; they are pretty much treated as a different module. One can use Go tooling, however, to view and apply minor and patch updates. We saw above the <code>go list</code> command; we can use it to view all the dependencies of a module, with <code>go list -m all</code>. This will list <em>all</em> the dependencies, including the transitive ones. There is another very useful flag that adds update status information to this output; running <code>go list -m -u all</code> will list, for each dependency, the current version and the available update. What do you do if you find a dependency in that list and don't know how it got there? There is a command for that; <code>go mod why -m MODULE</code> will figure out the shortest direct path to that module through your dependencies and print it.</p>

<p>Our toy module depends on <code>github.com/fatih/color</code>, which has been frozen for a while and did not have its dependencies updated. When I run <code>go list -m -u all</code>, I can see that there are a number of dependencies with available updates. In such a situation, we have in principle three options: Update one single dependency, update all transitive dependencies stemming from a direct dependency, or update all dependencies. Go allows all of these. The first one, updating a single dependency, can be achieved with e.g. <code>go get github.com/mattn/go-isatty</code>; this will update to the highest version in the currently used major version (i.e. minor and patch updates). If you want to update to a specific version instead, you can do this by specifying the version explicitly, as in <code>go get github.com/mattn/go-isatty@v0.0.13</code>. Keep in mind that Go always expects that single-letter <code>v</code> prefix wherever a version has to be specified; <code>@0.0.12</code> will not work. The version here can also be provided as <code>@latest</code>, which will mean the highest version under the current major version.</p>
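<p>To make the versioning rules a bit more tangible, here is a deliberately simplified sketch. The functions <code>parseVersion</code> and <code>sameMajor</code> are hypothetical helpers written for this post; they are not what the <code>go</code> tool uses, and they ignore pre-release and build metadata suffixes. They only illustrate the mandatory <code>v</code> prefix, the <code>MAJOR.MINOR.PATCH</code> layout, and the idea that automatic updates stay within one major version.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical, simplified parser for versions of the
// form vMAJOR.MINOR.PATCH. It rejects versions without the v prefix.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	if !strings.HasPrefix(s, "v") {
		return out, fmt.Errorf("version %q must start with v", s)
	}
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q is not MAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q has an invalid part %q", s, p)
		}
		out[i] = n
	}
	return out, nil
}

// sameMajor reports whether two versions share a major version, which is
// the range within which minor and patch updates are applied.
func sameMajor(a, b string) bool {
	va, errA := parseVersion(a)
	vb, errB := parseVersion(b)
	return errA == nil && errB == nil && va[0] == vb[0]
}

func main() {
	fmt.Println(parseVersion("v0.0.13"))
	fmt.Println(parseVersion("0.0.12"))
	fmt.Println(sameMajor("v1.9.0", "v1.12.0"), sameMajor("v1.9.0", "v2.0.2"))
}
```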

<p>The second option, updating all transitive dependencies stemming from a direct dependency, can be achieved using the <code>-u</code> flag. If we run <code>go get -u github.com/fatih/</code>, go will fetch the next valid update version for all the dependencies of this one dependency, and update them. If you want to run only patch updates, you can use <code>-u=patch</code>. The last action, updating all dependencies, can be done by omitting all arguments, and running <code>go get -u</code> at the base of the module. With any of these update commands, if you also append <code>-t</code>, test dependencies will also be updated.</p>

<p>Whichever way we update indirect dependencies, the new versions will be tagged as indirect in <code>go.mod</code>. The next time <code>myprinter</code> is built, these new versions of the transitive dependencies will be used, overriding the dependencies in <code>github.com/fatih/color</code>. In case the latter is updated, however, obviating the need for the indirect dependency, the next go command will remove the indirect dependency from <code>go.mod</code>. If you want to do this explicitly, you can run <code>go mod tidy</code>.</p>

<h4 id="filehashchecking">File hash checking</h4>

<p>As already mentioned, there is another file in addition to <code>go.mod</code> that is changed when dependencies of a module change. This file is <code>go.sum</code>, which contains the cryptographic hash of each module, even the transitive dependencies that were not included in <code>go.mod</code>. An error will be raised if the contents of a module do not hash to this value that is first saved when the dependency is added. In fact, even if you haven't already installed a module before, <a href="https://golang.org/cmd/go/#hdr-Module_authentication_failures">Go will check its hash against a central database</a> to make sure the code has not been modified (or manipulated, if you are so inclined) since the version has been published. The URL of this service is stored in the <code>GOSUMDB</code> environment variable, with the default value <code>sum.golang.org</code>. If this environment variable's value is <code>off</code>, or if the go command is called with the <code>-insecure</code> flag (also turning off HTTPS certificate validation), checksum validation is skipped. The check is done lazily, only when a module is downloaded. If you want to make sure that the locally cached dependencies have not been tampered with, and have the same sum as when they were downloaded, you can run the command <code>go mod verify</code>.</p>
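<p>The entries in <code>go.sum</code> start with <code>h1:</code> followed by a base64 string. As far as I understand the scheme, this is a SHA-256 hash over a sorted list of per-file SHA-256 hashes. The sketch below follows that idea on an in-memory file list; <code>hashFiles</code> and the sample file contents are made up, and the exact line format is my assumption, so treat this as an illustration and not as a reimplementation you can compare against real <code>go.sum</code> entries.</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashFiles is a hypothetical helper that hashes a set of files, given as
// a map from file name to content, into a single h1-style string.
// The summary line format is an assumption made for this sketch.
func hashFiles(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	// Sorting makes the result independent of map iteration order.
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256([]byte(files[name]))
		fmt.Fprintf(summary, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	original := map[string]string{
		"go.mod":  "module demo\n",
		"demo.go": "package demo\n",
	}
	modified := map[string]string{
		"go.mod":  "module demo\n",
		"demo.go": "package demo // changed\n",
	}
	fmt.Println(hashFiles(original))
	fmt.Println(hashFiles(modified))
}
```

<p>Changing a single byte in any file changes the resulting string, which is what lets Go detect modified module contents.</p>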

<p>These correctness checks might sound a tad excessive &#x2013; they are definitely much more detailed than the ones I'm used to from other languages &#x2013; but they are direct results of the priorities set in the design discussion of the Go build system. These priorities are discussed in detail in the blog post <a href="https://research.swtch.com/vgo-repro">Reproducible, Verifiable, Verified Builds</a>, where it's explained that the Go build mechanism should provide builds that have the following three properties:</p>

<ul>
<li><p>Reproducible: When repeated, a <code>go install</code> or <code>go build</code> will create the same result.</p></li>
<li><p>Verifiable: A build artefact should record information on exactly how it was produced.</p></li>
<li><p>Verified: The build process should check that the expected source code packages are being used.</p></li>
</ul>

<p>The use of <code>go.mod</code> and <code>go.sum</code> as explained above enables reproducible and verified builds. In order to make build output verifiable, the Go compiler packs the necessary build information into its output. We can use the <code>goversion</code> tool that we installed above to print this information. By default, <code>goversion</code> only prints the Go version with which a binary has been built, but it can be made to print the complete build context. If we run it on our little executable with <code>$GOBIN/goversion -mh $GOBIN/myprinter</code>, we should get something similar to the following:</p>

<pre><code>/home/ulas/go/bin/myprinter go1.14
    path  myprinter
    mod   myprinter                      (devel)
    dep   github.com/fatih/color         v1.9.0                              h1:8xPHl4/q1VyqGIPif1F+1V3Y3lSmrq01EabUW3CoW5s=
    dep   github.com/mattn/go-colorable  v0.1.4                              h1:snbPLB8fVfU9iwbbo30TPtbLRzwWu6aJS6Xh4eaaviA=
    dep   github.com/mattn/go-isatty     v0.0.11                             h1:FxPOTFNqGkuDUGi3H/qkUbQO4ZiBa2brKq5r0l8TGeM=
    dep   golang.org/x/sys               v0.0.0-20191026070338-33540a1f6037  h1:YyJpGZS1sBuBCzLAR1VEpK193GlqGZbnPFnPV/5Rsb4=
</code></pre>

<p>Given a Go binary, a user has complete access to the build context. I find the design of the build system rather impressive, as it strictly adheres to clear principles without compromising on usability. Especially for mission-critical applications that need to be testable with different dependency configurations, and debuggable deep into the dependency tree, Go offers a very convincing toolchain without burdening the developer with too many tools and commands.</p>
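<p>A binary can also inspect its own embedded build information at runtime, through <code>debug.ReadBuildInfo</code> from the standard library's <code>runtime/debug</code> package. The following is a minimal sketch; <code>buildSummary</code> is a name I made up, and the output format only loosely imitates that of <code>goversion</code>. The information is only present in binaries built with module support, which is why the function has a fallback.</p>

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// buildSummary is a hypothetical helper that returns one line per piece
// of module information embedded in the running binary.
func buildSummary() []string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		// Happens when the binary was built without module support.
		return []string{"no module build information embedded"}
	}
	lines := []string{
		fmt.Sprintf("path %s", info.Path),
		fmt.Sprintf("mod  %s %s", info.Main.Path, info.Main.Version),
	}
	for _, dep := range info.Deps {
		lines = append(lines, fmt.Sprintf("dep  %s %s %s", dep.Path, dep.Version, dep.Sum))
	}
	return lines
}

func main() {
	for _, line := range buildSummary() {
		fmt.Println(line)
	}
}
```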

<h4 id="replacingpackageswithlocalcopies">Replacing packages with local copies</h4>

<p>One thing that I frequently do in Python is open the code of a dependency and edit it or add debug statements while developing my own code. If you use virtual environments, the Python tool for isolating dependency contexts, this is particularly easy, as it would affect only a single such environment. How would one go about doing this in Go? One could fiddle around with the code in the package cache, but this is not recommended practice, and it will break the hash validation. In fact, the source files of dependencies downloaded by go are not even editable on my computer. The supported way of doing this would be to use the replace feature of <code>go.mod</code>. One can tell the module system, through a line in the <code>go.mod</code> file, that a local directory should be used for satisfying a dependency instead of downloading it. Let's say that I checked out <code>github.com/fatih/color</code> locally to <code>/home/ulas/code/color</code>, made a couple of changes to it, and would like to make sure it works with our sample repo. I can tell go to use this local checkout with the following command:</p>

<pre><code>go mod edit -replace=github.com/fatih/color=/home/ulas/code/color
</code></pre>

<p>This will add the following line to <code>go.mod</code>:</p>

<pre><code>replace github.com/fatih/color =&gt; /home/ulas/code/color
</code></pre>

<p>One can of course add this line manually, instead of using a command. Now, when we build <code>myprinter</code>, the local code checkout will be used. This replacement can be removed either by removing the <code>replace</code> directive from <code>go.mod</code>, or with the following command:</p>

<pre><code>go mod edit -dropreplace=github.com/fatih/color
</code></pre>

<h4 id="importpathsandmajorversions">Import paths and major versions</h4>

<p>Go takes semantic versioning rather seriously. The idea behind the major version in semantic versioning is that it signifies backwards-incompatible changes. Go treats different major versions as different modules; you can import different major versions of a module, refer to them in the same package namespace, and have multiple references to different major versions in <code>go.mod</code>. This is called <a href="https://golang.org/cmd/go/#hdr-Module_compatibility_and_semantic_versioning">semantic import versioning</a>. In order to demonstrate this, I have forked <a href="https://github.com/syohex/gowsay">github.com/syohex/gowsay</a>, turned it into a library instead of an executable, and added two versions to it. Version <code>v1.0.0</code> is pretty straightforward: <code>gowsay.MakeCow</code> accepts a string to wrap and an options struct. Version <code>v2.0.2</code> (I had to up the version a couple of times because I didn't get things right) improves the interface by exporting enumerations for the cow types and accepting one as an argument. There are two things you have to pay attention to when writing a library for external use &#x2013; or rather, two things that I didn't pay attention to, which cost me time. The first is that the module name in <code>go.mod</code> must match the path by which the module is imported, i.e. the repository path. In the case of <code>gowsay</code> the <a href="https://github.com/afroisalreadyinu/gowsay/blob/master/go.mod#L1">module name</a> has to be <code>github.com/afroisalreadyinu/gowsay</code>. The other thing is that the version tag has to start with a <code>v</code>; otherwise go will not recognize it as a valid version, and will simply use the latest state of the repo. Now let's use gowsay in our <a href="https://github.com/afroisalreadyinu/sample-go-module">demo codebase</a>, by modifying <code>main.go</code> to look like this:  </p>

<pre><code class="language-go">package main

import (  
    "github.com/afroisalreadyinu/gowsay"
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    path := pathfinder.Find()
    message, err := gowsay.MakeCow(path, gowsay.Mooptions{})
    if err != nil {
        message = path
    }
    color.Blue(message)
}
</code></pre>

<p>We see an example of error handling the Go way here; <code>gowsay.MakeCow</code> has multiple return values, with the second one being an error. If this error is not nil, we print only the path, and not the cow-wrapped path. If you now do a <code>go install</code>, you should see the following new line in the <code>require</code> section of <code>go.mod</code>:</p>

<pre><code>github.com/afroisalreadyinu/gowsay v1.0.0
</code></pre>

<p>Although there are two versions, Go automatically picks version <code>v1.0.0</code>, and not the highest version. Conceptually, the basic module path <code>github.com/afroisalreadyinu/gowsay</code> always refers to version 1.</p>

<h4 id="updatingmajorversions">Updating major versions</h4>

<p>What if we want to use gowsay version 2? The solution the designers of Go came up with is to build the major version into the import path. That is, if we import gowsay as <code>github.com/afroisalreadyinu/gowsay/v2</code>, any following command such as <code>go install myprinter</code> will download and compile version <code>v2.0.2</code>. A subtle and important point when changing the major version of a library you are working on is that you have to make sure to change the module name in <code>go.mod</code>. For version <code>v1.0.0</code> of gowsay, for example, the first line of <code>go.mod</code> will simply be the following:</p>

<pre><code>module github.com/afroisalreadyinu/gowsay
</code></pre>

<p>When we tag and release the next major release, we have to change this line to the following:</p>

<pre><code>module github.com/afroisalreadyinu/gowsay/v2
</code></pre>

<p>Otherwise, go will complain with a message similar to the following:</p>

<pre><code>go get github.com/afroisalreadyinu/gowsay@v2.0.2:
github.com/afroisalreadyinu/gowsay@v2.0.2: invalid version: module contains a
go.mod file, so major version must be compatible: should be v0 or v1, not v2
</code></pre>

<p>Another way to update the <code>go.mod</code> file and use the next major version of a dependency is to use <code>go get</code> with a higher version. But you have to be careful here: If you simply run <code>go get github.com/afroisalreadyinu/gowsay@v2.0.2</code>, you will get the same error message as above. The reason is that <code>github.com/afroisalreadyinu/gowsay</code> refers to major version 1. Go will check out version <code>v2.0.2</code> and will look for the module name without the <code>v2</code>, and failing at this, issue an error message.</p>
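<p>The rule behind this error message can be sketched in a few lines of Go. This is only an illustration, not the code the go command runs: the function names <code>pathMajor</code>, <code>versionMajor</code> and <code>compatible</code> are made up for this example, and special cases (such as <code>gopkg.in</code> paths or modules without a <code>go.mod</code> file) are ignored.</p>

```go
package main

import (
	"strconv"
	"strings"
)

// pathMajor returns the major version encoded as a module path suffix:
// "v2" for ".../gowsay/v2", and "" when there is no /vN suffix with N >= 2.
// Hypothetical helper; the real go command handles more special cases.
func pathMajor(modulePath string) string {
	i := strings.LastIndex(modulePath, "/")
	if i == -1 {
		return ""
	}
	suffix := modulePath[i+1:]
	if !strings.HasPrefix(suffix, "v") {
		return ""
	}
	n, err := strconv.Atoi(suffix[1:])
	// Reject non-numbers, v0/v1, and odd spellings such as "v02" or "v+2".
	if err != nil || n < 2 || strconv.Itoa(n) != suffix[1:] {
		return ""
	}
	return suffix
}

// versionMajor returns the major component of a version tag,
// e.g. "v2" for "v2.0.2".
func versionMajor(version string) string {
	return strings.SplitN(version, ".", 2)[0]
}

// compatible reports whether a version tag fits a module path:
// a path without suffix stands for v0 or v1, a path ending in /vN for vN.
func compatible(modulePath, version string) bool {
	major := versionMajor(version)
	if suffix := pathMajor(modulePath); suffix != "" {
		return suffix == major
	}
	return major == "v0" || major == "v1"
}
```

<p>Under these assumptions, <code>compatible("github.com/afroisalreadyinu/gowsay", "v2.0.2")</code> is false, which corresponds to the "should be v0 or v1, not v2" complaint, while the same version combined with the path ending in <code>/v2</code> is accepted.</p>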

<p>Once you import version 2.0.2 of gowsay, you can use the new interface, referring to the new version using the same name as before:  </p>

<pre><code class="language-go">package main

import (  
    "github.com/afroisalreadyinu/gowsay/v2"
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    path := pathfinder.Find()
    message, err := gowsay.MakeCow(path, gowsay.BeavisZen, gowsay.Mooptions{})
    if err != nil {
        message = path
    }
    color.Blue(message)
}
</code></pre>

<p>It wouldn't be the case with our silly gowsay library, but if you felt the need to refer to different major versions of a library in the same go file, you can definitely do that. One of the imports, however, has to be prefixed with an alternative reference, so that the names do not clash, as in the following example:  </p>

<pre><code class="language-go">import (  
    "github.com/afroisalreadyinu/gowsay"
    gowsayTwo "github.com/afroisalreadyinu/gowsay/v2"
)
</code></pre>

<p>The Go module system has a number of other features that we will not go into here, such as vendoring, where the code of dependencies is stored in the repository itself. The best place to read up on these is the <a href="https://github.com/golang/go/wiki/Modules">Modules page of the Go wiki on Github</a>, which is exhaustive as far as I can judge. I would highly recommend at least skimming through that page, in order to have an idea of the tools that are available, and get a glimpse of the versatility Go offers.</p>

<p><a id="org73f03d5"></a></p>

<h3 id="testing">Testing</h3>

<p>Testing is an integral part of Go, as one would expect of a language of our times. Beyond built-in support at the language and library level for automated testing, there are multiple tools for putting tests to use in various ways. A good starting point for testing in Go is the output of <code>go help test</code>. We can demonstrate Go testing facilities by adding a relatively useless test to our <code>myprinter</code> module. In terms of where to put the test code, our options would be either a separate directory, which would make matching code to tests very difficult, or having test files next to the code they exercise. The latter would put us in a difficult situation, since test code would have to be in the same namespace as functional code due to the one namespace per directory rule. In fact, Go allows a separate namespace for tests through a built-in exception. Any file that matches the pattern <code>*_test.go</code> is considered a test file. These files are excluded when normal application code is built. When you run <code>go test</code>, however, test files are compiled and linked against the application code. Test files can also have the package name <code>package_test</code>, where <code>package</code> is the package name of the application code. We can demonstrate this by putting the following into a file named <code>pathfinder_test.go</code> in the <code>pathfinder</code> directory of our mini-project:  </p>

<pre><code class="language-go">package pathfinder_test

import "testing"

func TestFind(t *testing.T) {  
    t.Fail()
}
</code></pre>

<p>If you now switch to the <code>pathfinder</code> subdirectory and run <code>go test</code>, you should see a report like the following.</p>

<pre><code>--- FAIL: TestFind (0.00s)
FAIL
exit status 1
FAIL    myprinter/pathfinder    0.002s
</code></pre>

<p>As with many other commands, <code>go test</code> will use the package in the current directory if no argument is supplied. If we wanted to run the failing test from the base directory, we would need to call the command as <code>go test myprinter/pathfinder</code>. What if we want to run all the tests in a project? One might expect <code>go test myprinter</code> to work, but that refers only to the myprinter base package; the way one can refer to all subpackages of a package is by using the ellipsis, as in <code>go test myprinter/...</code>.</p>

<p>There is an interesting feature of the go test runner. If you run the same non-failing tests consecutively without modifying relevant code, the tests will actually not be run; you will see a <code>(cached)</code> in the output next to the test's name. This is a great feature that lets you run all the tests of a package without unnecessary overhead, but in case you want to override the cache, you can enforce running them by using the <code>-count</code> option, as in <code>go test myprinter/... -count 1</code>. This option enables setting the exact number of times a set of tests is run.</p>

<p>A test that fails without a decent output is of course quite useless; we need assertions that provide more information. Go doesn't come with an assertions library, interestingly, but there are excellent third party alternatives. One widely used open-source package is <a href="https://github.com/stretchr/testify"><code>github.com/stretchr/testify/assert</code></a>. This library has many useful tools for writing better tests; you should definitely have a look at the readme. We can improve our test by asserting that <code>pathfinder.Find</code> does not return an empty string, which might be the case if the underlying <code>filepath.Abs</code> call fails:  </p>

<pre><code class="language-go">package pathfinder_test

import (  
    "github.com/stretchr/testify/assert"
    "myprinter/pathfinder"
    "testing"
)

func TestFind(t *testing.T) {  
    assert.NotEqual(t, "", pathfinder.Find())
}
</code></pre>
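<p>If you would rather not add a dependency just for assertions, the same check can be written with the standard <code>testing</code> package alone. The following is a minimal sketch: <code>find</code> is a hypothetical stand-in for <code>pathfinder.Find</code> (assumed here to wrap <code>filepath.Abs</code>), included only so that the example fits into a single file. In a real project the test function would live in a <code>_test.go</code> file.</p>

```go
package main

import (
	"path/filepath"
	"testing"
)

// find is a hypothetical stand-in for pathfinder.Find: it returns the
// absolute path of the current directory, or "" if that cannot be determined.
func find() string {
	path, err := filepath.Abs(".")
	if err != nil {
		return ""
	}
	return path
}

// TestFindStdlib makes the same assertion as the testify version above,
// reporting a descriptive message through t.Errorf on failure.
func TestFindStdlib(t *testing.T) {
	if got := find(); got == "" {
		t.Errorf("find() = %q, want a non-empty path", got)
	}
}
```

<p>The trade-off is verbosity: every comparison needs its own <code>if</code> and message, which is exactly what assertion libraries take off your hands.</p>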

<h4 id="usefultestoptions">Useful test options</h4>

<p>The <code>go test</code> command has quite a few tricks up its sleeve, helping you get the most out of automated tests. The flags of <code>go test</code> are documented under <code>go help testflag</code>; don't be surprised if you can't find them under <code>go help test</code>. Among these, the <code>-count</code> option was already mentioned; it is very useful when you are trying to debug intermittently failing tests. If you want to run only a single test, you can use the <code>-run</code> option. This option accepts a regular expression and runs only the tests matching it. When you are running multiple tests, the test run will continue even if there are failing tests. You can override this behavior, and have the test run stop when a test fails, by supplying the <code>-failfast</code> option.</p>
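<p>The <code>-run</code> option becomes more useful once tests are split into named subtests with <code>t.Run</code>. The following sketch uses a made-up helper, <code>baseName</code>, purely for illustration; it is not part of the sample module.</p>

```go
package main

import (
	"path/filepath"
	"testing"
)

// baseName is a hypothetical helper that returns the last element of a path.
func baseName(path string) string {
	return filepath.Base(path)
}

// TestBaseName runs one subtest per table entry; each subtest gets a name
// of the form TestBaseName/<name>, which the -run option can select.
func TestBaseName(t *testing.T) {
	cases := []struct {
		name string
		path string
		want string
	}{
		{"file", "/home/ulas/code/main.go", "main.go"},
		{"directory", "/home/ulas/code/", "code"},
		{"empty", "", "."},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := baseName(tc.path); got != tc.want {
				t.Errorf("baseName(%q) = %q, want %q", tc.path, got, tc.want)
			}
		})
	}
}
```

<p>With this in a <code>_test.go</code> file, <code>go test -run TestBaseName/empty</code> should run only the last case, since the regular expression is matched per slash-separated element of the test name.</p>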

<p>Coverage analysis of tests is built into the test tool; you can enable it with the <code>-cover</code> flag. The coverage analysis tooling of Go is quite intricate and versatile; you can read the details in <a href="https://blog.golang.org/cover">this blog post</a> from the time of its release. Using only the <code>-cover</code> option will make go print the percentage of statements covered in the module targeted by a test file. If you want to get a detailed analysis of which lines were covered, you have to use the <code>-coverprofile</code> option to provide a filename in which coverage analysis will be saved. For example, we can do a coverage analysis of our pet project with the following command:</p>

<pre><code>go test ./... -coverprofile=cover.out
</code></pre>

<p>The resulting <code>cover.out</code> is a text file that can be turned into a nice HTML page using the command <code>go tool cover -html=cover.out</code>. Running this command will pop up a browser window with a colorful display. Lines covered will be in green, whereas lines skipped will be in red. You can also see the exact number of times a line was called by running the test with the <code>-covermode=count</code> option. When this option is used, the intensity of the green will actually change depending on how many times a line was executed; you can also see the exact count by hovering over a line. The default value of the <code>covermode</code> option is <code>set</code>, which records whether a line was run at all. The third and last option is <code>atomic</code>, which can be used in parallel tests, and which we will deal with in the second part of this tutorial.</p>
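<p>To see what the coverage report is telling you, it helps to try it on a deliberately incomplete test. In the sketch below, <code>describe</code> is a made-up function with two branches, and the test exercises only one of them. As before, the test function belongs in a <code>_test.go</code> file in a real project.</p>

```go
package main

import "testing"

// describe is a hypothetical function with two branches.
func describe(n int) string {
	if n < 0 {
		return "negative" // never reached by the test below
	}
	return "non-negative"
}

// TestDescribe only exercises the non-negative branch, so the statement
// returning "negative" stays uncovered.
func TestDescribe(t *testing.T) {
	if got := describe(3); got != "non-negative" {
		t.Errorf("describe(3) = %q, want %q", got, "non-negative")
	}
}
```

<p>Running this with <code>-cover</code> should report less than full coverage, and the HTML view should show the <code>return "negative"</code> line in red until a test case with a negative number is added.</p>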

<p>You might wonder how tests are run, considering that Go is a compiled language and all code that runs must be packed into an executable. This is what Go does behind the curtains; tests are compiled in a per-package manner into executables in temporary directories and executed. You can achieve the same thing with the <code>-c</code> flag; in our module, if you switch to the <code>pathfinder</code> subdirectory and run <code>go test -c</code>, you should end up with an executable named <code>pathfinder.test</code>. This is not only useful, but pretty much necessary if you want to use a debugger (more on these later) to debug your tests.</p>

<p>One last useful option to <code>go test</code> worth mentioning here is <code>-race</code> that enables the built-in race detector. We will look at the concurrency features of Go in the second part; this option will be covered when the topic comes up.</p>

<p>There are two more areas handled by Go's <a href="https://golang.org/pkg/testing/"><code>testing</code></a> module: Benchmarking and example code. We will not go into the details of these here, but keep in mind that there is extensive support for these in the standard tools, and you don't need to roll out your own.</p>
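<p>Just to give an impression of what these look like, here is a minimal sketch of a benchmark and an example function. <code>Shout</code> is a made-up function used only for illustration; the naming conventions (<code>BenchmarkXxx</code>, <code>ExampleXxx</code>) and the <code>// Output:</code> comment are what the <code>testing</code> package looks for.</p>

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Shout is a hypothetical function to benchmark and document.
func Shout(s string) string {
	return strings.ToUpper(s) + "!"
}

// BenchmarkShout is picked up by `go test -bench .`; the body has to
// run the measured code b.N times.
func BenchmarkShout(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Shout("moo")
	}
}

// ExampleShout is compiled and run by `go test`; what it prints is
// compared with the Output comment at the end of the function.
func ExampleShout() {
	fmt.Println(Shout("moo"))
	// Output: MOO!
}
```

<p>Both functions need to be in a <code>_test.go</code> file to be picked up by <code>go test</code>.</p>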

<p><a id="orgab9c2eb"></a></p>

<h3 id="furthergotools">Further Go tools</h3>

<p>It is possible to speak of three levels of Go tools: The ones that are first-order subcommands of the go command, those that are available under <code>go tool</code>, and those that need to be installed with <code>go get</code>. We have dealt with those in the first group, such as <code>fmt, build, list</code> above already. The second group comprises a set of tools directed to more fine-grained compilation, analysis and debugging of Go programs. You can get a list if you run <code>go tool</code>. Covering all of these subcommands is beyond the scope of this tutorial, but you can have a quick look at the documentation for a command <code>CMD</code> with <code>go doc cmd/CMD</code>. Most of them are relevant for more involved work with the Go compiler and the language; we saw one, <code>go tool cover</code>, which can be used to convert coverage report output to html. Another important <code>go tool</code> subcommand is <code>pprof</code>, which is used for displaying profiling output.</p>

<h4 id="viewingdocumentationinthebrowserwithgodoc">Viewing documentation in the browser with godoc</h4>

<p>Among the third-party tools for working with Go code, a couple are very useful for daily work. The first of these is the <a href="https://godoc.org/golang.org/x/tools/cmd/godoc">godoc tool</a>, not to be confused with <code>go doc</code>. Whereas <code>go doc</code> prints documentation, godoc runs a server with documentation for all the packages that can be found in the standard library and installed modules. After installing it with <code>go get golang.org/x/tools/cmd/godoc</code>, you can start it with <code>$GOBIN/godoc -http=:6060</code>, the argument providing the location to listen at. If you now go to <a href="http://localhost:6060/">http://localhost:6060/</a> on your computer, you should see a web page looking very similar to the official Go documentation. This is way better than trying to figure out the right path to a symbol on the command line. Another useful feature of godoc is the <code>-index</code> flag that makes the documentation searchable. When called with this argument, a search box will be available on the top right of the page.</p>

<h4 id="goimports">goimports</h4>

<p>One tool I find very useful is <a href="https://godoc.org/golang.org/x/tools/cmd/goimports">goimports</a>, which makes it much easier to work with Go import statements. Because unused imports are an error in Go, one frequently needs to add imports to and remove them from a file as one tries things out. Particularly annoying is adding print statements with <code>fmt.Printf</code>, having the compiler tell you that you need to import it, removing the same statement after resolving the issue, and then having the compiler tell you that you now have an unused import. goimports is a tool that solves this issue by adding and removing the respective imports. After installing it with <code>go get golang.org/x/tools/cmd/goimports</code>, you can use it as a <code>gofmt</code> replacement, since it takes care of the imports in addition to running gofmt. In case you are using Emacs, integrating it with the default Go mode is as simple as setting it as the formatter with <code>(setq gofmt-command "path/to/goimports")</code>.</p>

<p>One remarkable thing here is that a tool like goimports is possible because the language is so simple and strict. In Python, for example, in order to figure out what a file imports, you pretty much have to execute it, as an import statement can happen anywhere. It is actually common practice to do an import within a function to beat circular imports, which are prohibited in Go. In Go, imports are allowed only in the header; that's why one can automate handling them, or create dependency graphs and analysis. That is, it was the thinking that went into the design of Go that allows tools like these to be written.</p>

<h4 id="errcheck">errcheck</h4>

<p>We saw an example of error handling in Go above: A function can return multiple values, one of which can be an error. The calling code has the responsibility to read this error value, and handle it accordingly. What frequently happens is that either the second return value is not read at all, by binding the return to a single variable, or it is bound to the blank identifier <code>_</code>. Although this might make sense in some contexts, you want to avoid it as much as possible in production code. <a href="https://github.com/kisielk/errcheck">errcheck</a> is a Go tool for detecting cases of error return values not being handled. You can install it with <code>go get github.com/kisielk/errcheck</code>, and call it in the same manner as other go tools, e.g. with <code>errcheck ./...</code> at the base of a module to check all packages. By default, errcheck will report only on the cases where the return value of a function with an error is not matched at all; passing in the <code>-blank</code> flag will make it also report cases where error values are matched to the blank identifier.</p>
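<p>The three situations described above can be put side by side in a short sketch. The function names are made up for this example, and the comments on what errcheck reports follow the description of its default and <code>-blank</code> behavior above.</p>

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// removeQuietly ignores the error returned by os.Remove entirely;
// this is the kind of call errcheck should report by default.
func removeQuietly(path string) {
	os.Remove(path)
}

// parseOrZero assigns the error to the blank identifier; errcheck
// should report this only when called with the -blank flag.
func parseOrZero(s string) int {
	n, _ := strconv.Atoi(s)
	return n
}

// parse handles the error by wrapping it with context and passing it
// on to the caller, which leaves errcheck nothing to complain about.
func parse(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", s, err)
	}
	return n, nil
}
```

<p>Note that <code>parseOrZero</code> silently turns invalid input into <code>0</code>, which is the kind of behavior that tends to come back and bite in production.</p>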

<p><a id="org804d6c2"></a></p>

<h3 id="debugging">Debugging</h3>

<p>The last topic we will touch upon is debugging Go programs. A considerable subset of developers shun using debuggers, especially for compiled languages like Go or C, but mastering a debugger definitely pays back in reduced debugging time, even if you consider only the time spent adding new print statements and recompiling. Go does not include a built-in debugger, and opts for exporting debug symbols and providing lightweight support for <a href="https://www.gnu.org/software/gdb/">GDB, the GNU debugger</a>. Since the debug symbols are exported by default when building a go executable, you can start debugging our toy module with <code>gdb $GOBIN/myprinter</code>, once you have installed it. You will get a curious message when GDB starts; it will either tell you that a file named <code>runtime-gdb.py</code> has failed to load due to a configuration error, or that it has been loaded. This file, the only Python file in the Go source repository, is a <a href="https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html#Python">GDB extension</a> responsible for integrating Go types and concepts (such as goroutines) with GDB. If it could not be loaded, you can follow the directions in the initial output of GDB to enable it.</p>

<p>I will not go into the details of using GDB with Go; you can read up on it <a href="https://golang.org/doc/gdb">in the official Go documentation</a>. You will notice, however, that even this official document recommends the third-party alternative Delve, instead of GDB. <a href="https://github.com/go-delve/delve">Delve</a>, a Go debugger written in Go, is in fact much easier to use, as it is integrated into the Go toolchain, and more complete. First, install it with <code>go get github.com/go-delve/delve/cmd/dlv</code>. To debug a Go executable, simply navigate to the main package directory (in our toy module the base directory) and run <code>$GOBIN/dlv debug</code>. You can also debug your tests, by switching into the appropriate directory and running <code>dlv test</code>. Once the debugger is started, there are a number of commands available. All frequently used commands have two forms: a standard name and a short alias. The most useful commands and their aliases are <code>break</code> (<code>b</code>) to set a breakpoint, <code>continue</code> (<code>c</code>) to continue execution until a breakpoint or termination, <code>next</code> (<code>n</code>) to execute one source line, <code>print</code> (<code>p</code>) to print the value of a variable, and <code>list</code> (<code>ls</code> or <code>l</code>) to show code. When you start delve with <code>dlv debug</code>, you land in the initialization of the executable; <code>list</code> will show you some Go runtime code. You can land at the beginning of your program by setting a breakpoint there with <code>b main.main</code>, and continuing until the breakpoint with <code>c</code>. Delve will run the code until the start of the main function and print the surrounding code context. 
When debugging our toy module, you could for example enter <code>n</code> twice after the beginning of <code>main.main</code> is reached, landing at the line after <code>path</code> is set, and then print the value of this variable with <code>p path</code>. This is a very simple example of what delve can provide; I would highly recommend reading the <a href="https://github.com/go-delve/delve/blob/master/Documentation/cli/getting_started.md">getting started guide</a>, and going through the commands listed when you enter <code>help</code> at the delve console.</p>

<p>Both GDB and Delve are CLI debuggers. If you are more into visual debuggers, you can use one of the popular IDEs with Go integration, such as VSCode or Goland from JetBrains. Unfortunately, I'm not familiar with any of these, but a quick Google search shows that they can be used as debuggers for Go.</p>

<p><a id="org5afb9c3"></a></p>

<h2 id="inthenextepisode">In the next episode</h2>

<p>In the first part of this tutorial, we looked at the reasons Go was developed, the fundamental ideas behind its design, and the tooling packed mostly into the <code>go</code> command and some other third-party packages. You should now be ready to write, test, combine, debug and package Go code using these standard tools. In the next part of this tutorial, we will go into more detail of what kind of code to actually write.</p>

<p><a id="orgeb8223a"></a></p>

<h2 id="resources">Resources</h2>

<ul>
<li><p><a href="https://talks.golang.org/2012/splash.article">Go at Google: Language Design in the Service of Software Engineering</a> provides the reasoning behind the design of Go, the trade-offs and explicit non-goals. It's a great resource for understanding which problems the designers wanted to solve, and why they left certain things out. A similar, but less systematic text is <a href="https://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html">Less is exponentially more</a>, which details the very early driving guidelines and design decisions that went into Go.</p></li>
<li><p>The design of Go dependency management was discussed over a number of blog posts which are linked to on <a href="https://research.swtch.com/vgo">this page</a>; I would definitely recommend [??]. Once the design was finalized and implemented, it was announced over a couple of posts on the official Go blog; <a href="https://blog.golang.org/using-go-modules">the first post</a> has links to the further ones. These posts discuss practical aspects of working with modules and dependency management. If you want to read the intricate details and questions, these are discussed extensively on the <a href="https://github.com/golang/go/wiki/Modules">wiki of the go project</a> on Github.</p></li>
<li><p><a href="https://www.youtube.com/watch?v=uBjoTxosSys">Go tooling in action</a> is an excellent screencast on developing and improving Go code with standard and a couple of third-party tools, especially improving performance using pprof, go-torch and go-wrk. A nice display of how fast Go code, especially web services, can be made to perform.</p></li>
<li><p><a href="https://www.alexedwards.net/blog/an-overview-of-go-tooling">An Overview of Go's Tooling</a> is an excellent tutorial that covers a lot of similar ground to this post. I actually picked up quite a few tricks from it. It also covers a couple of topics such as compiler options and benchmarking that are not covered here.</p></li>
<li><p><a href="https://nullprogram.com/blog/2020/01/21/">Go's Tooling is an Undervalued Technology</a> is an enthusiastic look at a couple of aspects of Go tooling. It covers topics skipped in this post, such as vendoring.</p></li>
</ul>]]></content:encoded></item><item><title><![CDATA[Why Thinkpad X220 is the best laptop ever made]]></title><description><![CDATA[<p>I'm rather certain that you will come to the same conclusion, esteemed reader, as I did, namely that the Thinkpad X220 is the best laptop ever made, in consideration of the following points.</p>

<h4 id="1itcostme250euros">1 - It cost me 250 Euros</h4>

<p>You will have to accept that that's quite a bargain,</p>]]></description><link>http://okigiveup.net/why-thinkpad-x220-is-the-best-laptop-ever-made/</link><guid isPermaLink="false">c3fa4252-c8f8-4ebd-bc17-2d8633ee0687</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Sun, 10 May 2020 19:56:05 GMT</pubDate><content:encoded><![CDATA[<p>I'm rather certain that you will come to the same conclusion, esteemed reader, as I did, namely that the Thinkpad X220 is the best laptop ever made, in consideration of the following points.</p>

<h4 id="1itcostme250euros">1 - It cost me 250 Euros</h4>

<p>You will have to accept that that's quite a bargain, considering that a new laptop costs at least twice as much, and if you want something with a decent brand and maybe even a metal chassis, at least four times as much. Yes, I bought it used, but it has worked flawlessly for more than a year, it has what feels like a new keyboard, and with the addition of a 128 GB SSD that cost 20 bucks (how that is possible, I don't know), it actually freaking <em>does</em> stuff like run Emacs and a browser without melting down.</p>

<h4 id="2ithasthebestkeyboardintheworld">2 - It has the best keyboard in the world</h4>

<p>Have you ever typed on a X220 keyboard? If not, you should. This thing is heaven on earth. I stopped typing on my ergonomic mechanical clickity-clack keyboard just because of this thing. It has the <em>je ne sais quoi</em> of an old Thinkpad keyboard: robust, light, and I will have to admit, not clickity-clacky, but at least tippity-tap, in a rather pleasing way. It has a hardware volume up-down and mute button at the top, coupled with a blue, weird-looking ThinkVantage button that I mapped to Emacs. I'm pretty sure the designers intended it that way. If that is not enough, there is a big-ass escape key at the top left. This key is, like, the biggest escape I have ever seen, without exaggeration. Seriously. It's that big.</p>

<h4 id="3youcantwatchvideosonit">3 - You can't watch videos on it</h4>

<p>Twitter videos don't work at all, and any other video is way too taxing for the hardware. This is a feature. It's good. You won't get distracted.</p>

<h4 id="4youcanlistentomp3sonit">4 - You can listen to MP3s on it</h4>

<p>Now is the time to ditch Spotify, that hog of memory and CPU, and go back to the late 90s vibe of MP3s. You can actually play music from the disc, and not have your computer heat up to the sun's surface temperature, would you believe that? Just throw in some winampy goodness with <a href="https://audacious-media-player.org/">audacious</a> and copy over a couple of gigabytes of pirated music from your college years (which should fall under the statute of limitations, right?), and you're good to go. The loudspeakers are crap, and there is no bluetooth (again, a feature!), but that's what the expensive headphones you bought two years before the Airpods came out are for.</p>

<h4 id="5thereisatinyflashlightonthelid">5 - There is a tiny flashlight on the lid</h4>

<p>It comes with its own tiny cute flashlight to (I think) illuminate the keyboard, although it more illuminates the screen. You can turn it on and off with one keyboard shortcut, and it's the sweetest thing ever. When I'm bored, instead of watching videos, I turn the flashlight on and off. Much more entertaining.</p>

<h4 id="6ithasdifferentlyshapedports">6 - It has differently shaped ports</h4>

<p>These days, if you get a new laptop, on the one side there are 2 of a certain port, and on the other side there are 3 of the same. Not on this bad mofo. It actually has 8 ports that all look different. USB, HDMI, audio,  VGA, you name it. I can't make any promises in terms of your adapter requirements, but damn it feels good to have a laptop with an actual CAT5 port (I hope that's what those creepy internet-from-telephone sockets are called). And what is that switch under the big-ass emptiness that looks like a 1980s disk drive? An actual physical flight-mode switch? Sweet!</p>

<h4 id="conclusion">Conclusion</h4>

<p>Seriously though, this computer is the perfect compromise between a work device and distraction monster. The keyboard is great, and the form factor is optimal for sticking it under your arm and setting off. Emacs and all kinds of compilers and runtimes work perfectly well on it, but various other distracting things like web video and news sites don't. If you want one, hit me up, I know a very good dealer.</p>]]></content:encoded></item><item><title><![CDATA[Discovering AWS with the CLI Part 2: ECS and Fargate]]></title><description><![CDATA[<p>In the <a href="http://okigiveup.net/discovering-aws-with-cli-part-1-basics/">first part of this tutorial</a>, we looked at provisioning AWS EC2 resources using the CLI client, and delved into the details of how various networking components function. In this second part, we will look at using containers instead of virtual machines to deploy applications. In the recent years,</p>]]></description><link>http://okigiveup.net/discovering-aws-with-the-cli-part-2-ecs-and-fargate/</link><guid isPermaLink="false">a64e77d9-ab03-4ca0-81dd-4e21fa6b376d</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Fri, 25 Oct 2019 09:13:58 GMT</pubDate><content:encoded><![CDATA[<p>In the <a href="http://okigiveup.net/discovering-aws-with-cli-part-1-basics/">first part of this tutorial</a>, we looked at provisioning AWS EC2 resources using the CLI client, and delved into the details of how various networking components function. In this second part, we will look at using containers instead of virtual machines to deploy applications. In the recent years, containers have become the predominant form of delivering server-side software, due to their versatility and limited resource use. Especially Docker has made it possible to package services and online applications so that they can be distributed from a central repository, and replicated with very little effort. 
ECS (Elastic Container Service) is AWS's entry into the container orchestration space, where other alternatives are Kubernetes, Mesos and the like. There are two different ways to use ECS: The old way, where you have to provision the computing resources manually, and the new way, where AWS is responsible for running the infrastructure. We will use the latter method, which is named Fargate.</p>

<p>As in the first part of the tutorial, we will be using the AWS CLI; you should install it and set up the necessary credentials and environment variables using the tips from the first post. In order to build the container images that will be deployed on Fargate, you will need Docker; it can be installed by following the <a href="https://docs.docker.com/install/linux/docker-ce/ubuntu/">standard installation instructions</a>. The necessary files for the container images and the demo applications are in <a href="https://github.com/afroisalreadyinu/aws-containers">this sample repo</a>. Finally, as with the first part, you can find all the commands in this tutorial in a <a href="https://github.com/afroisalreadyinu/aws-containers/blob/master/checkpoints-part-2.sh">bash script</a> in the same repository.</p>

<h2 id="organizationoffargate">Organization of Fargate</h2>

<p>As mentioned above, Fargate is a <em>launch type</em>, i.e. a method of deploying containers on ECS, the Elastic Container Service. In ECS, applications are deployed as tasks, which are collections of containers working together, similar to pods in Kubernetes. Tasks run on clusters, which are groups of container and networking infrastructure that can span multiple AZs in a region (see <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html#welcome-features">here</a> for a diagram of how ECS clusters are organized). A set of tasks that are scheduled according to a scaling strategy, and on which load is distributed, is a service. There are two launch types on ECS. Fargate, the one we will use, offloads the management of computational resources to AWS, and leaves only the work of defining tasks and services, in addition to networking, to the user. The EC2 launch type, on the other hand, requires the user to create and manage the VMs on which the containers run.</p>

<p>An important component of ECS is the <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_agent.html">container agent</a>. This agent is installed on the EC2 instances on which the tasks run, and is responsible for pulling, running and stopping the containers. When using the EC2 launch type, it's the user's duty to install and run the agent, but Fargate absolves the user of this task by automating it. You nevertheless need to be aware that this agent is doing work for you in the background, as we will see later.</p>

<h3 id="preliminarycommands">Preliminary commands</h3>

<p>There are two things we need to take care of at the start. The first is picking a region. Many commands, such as creation of subnets or VPC endpoints, require the explicit specification of a region, which we would like to simplify by putting it into a variable, as in <code>REGION=eu-central-1</code>. The second preliminary is a bit more complicated. Services on ECS can be tagged only if they use the newer, longer format for ARNs; with the older, shorter format, tagging is not possible. The long format is an account setting that you need to opt in to. You can opt in either using the web console (the <em>Account Settings</em> tab in the ECS service view), or by running the following command:</p>

<pre><code>aws ecs put-account-setting-default --name serviceLongArnFormat --value enabled
</code></pre>

<p><strong>Warning</strong>: This will set the option for all the IAM users on an account. If you don't want to do this, you should change it on the web console for the specific IAM user and use the API keys for that account in the rest of the tutorial.</p>

<h2 id="creatingarepository">Creating a repository</h2>

<p>Containers are distributed by building an image, and uploading it to a <em>container repository</em> from which it can be downloaded. Repositories on AWS are provided by ECR, the Elastic Container Registry (not <em>Repository</em>, since a registry is a collection of repositories). Each AWS account has <a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html">a single registry</a>, which can house many repositories; you can't delete this registry, or add any new registries. If you want to push images for a service, you need to create a repository for it. Let's go ahead and create a repository for the <code>static-app</code> (which is just Nginx with an index page, but I named it app for some reason, and now it's too late to change) in the sample code repo:</p>

<pre><code>aws ecr create-repository --repository-name static-app \
  --tags Key=Environment,Value=Demo

STATICAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names static-app \
  --query "repositories[0].repositoryUri" --output text)
</code></pre>

<p>As you can see, we are sticking to the habit of setting the <code>Environment</code> tag to <code>Demo</code> for all our resources, as in the previous installment. The CLI also lets you log into the registry that hosts the repository you just created without having to deal with a complicated process, using the <code>ecr get-login</code> subcommand. The output of this command is itself a command you can use to log your docker client into the registry. You can avoid the extra copy-paste by executing the return value of this command, as follows:</p>

<pre><code>$(aws ecr get-login --region $REGION --no-include-email)
</code></pre>

<p>Be mindful of the <code>--no-include-email</code> option, as the command returned without it is not valid. Now it's time to build an image and push it to this repository. In the <code>static-app</code> directory of the sample repo, there is a Dockerfile that you can use to create an image. Once you have checked out the sample repository, navigate to its root directory (the build command refers to <code>static-app/</code> relative to it), and run the following commands:</p>

<pre><code>docker build -t $STATICAPPREPOURL:0.1 static-app/
docker push $STATICAPPREPOURL:0.1
</code></pre>
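<p>The repository URI stored in <code>STATICAPPREPOURL</code> has the form <code>account.dkr.ecr.region.amazonaws.com/name</code>. As a small sketch of how its parts can be pulled apart with shell parameter expansion, here is an example; the account ID is a made-up placeholder, and the variable names <code>REGISTRY</code>, <code>REPONAME</code>, <code>ACCOUNTID</code> and <code>ECRREGION</code> are my own choice for illustration, not something used in the sample repo:</p>

```shell
#!/bin/sh
# Sketch: split an ECR repository URI into its parts.
# The account ID below is a made-up placeholder, not a real account.
REPOURL="123456789012.dkr.ecr.eu-central-1.amazonaws.com/static-app"

REGISTRY="${REPOURL%%/*}"      # everything before the first slash
REPONAME="${REPOURL#*/}"       # everything after the first slash
ACCOUNTID="${REGISTRY%%.*}"    # first dot-separated field of the host
# The region is the fourth dot-separated field of the registry host
ECRREGION="$(echo "$REGISTRY" | cut -d. -f4)"

echo "registry:   $REGISTRY"
echo "repository: $REPONAME"
echo "account:    $ACCOUNTID"
echo "region:     $ECRREGION"
```

<p>The registry host is the address the docker client logs in to, while the full URI followed by a tag is what <code>docker build -t</code> and <code>docker push</code> expect.</p>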

<p>We now need to deploy this image on Fargate. The first resource we need to create is a cluster. We will use the name <code>demo-cluster</code> for our cluster:</p>

<pre><code>aws ecs create-cluster --cluster-name demo-cluster --tags key=Environment,value=Demo
</code></pre>

<p>If you now run <code>aws ecs list-clusters</code>, it should show your brand new cluster as the only entry.</p>

<h3 id="iamrolefortheecsagent">IAM role for the ECS agent</h3>

<p>The ECS agent mentioned above needs to carry out certain operations in order to orchestrate the task containers. Among these are checking for images in the registry, downloading these images, and creating and piping to log streams (see <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html">here</a> for details). In order to give it the right permissions, we need to create the appropriate IAM role, and give the ECS agent the permission to take on this role. Let's first create a role named <code>ecsTaskExecutionRole</code>, giving the ECS agent the right to take on this role:</p>

<pre><code>ROLEARN=$(aws iam create-role --role-name ecsTaskExecutionRole \
  --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"ecs-tasks.amazonaws.com\"]},\"Action\":[\"sts:AssumeRole\"]}]}" \
  --query "Role.Arn" --output text)
</code></pre>

<p>We will later use the ARN of this role, saved in <code>ROLEARN</code>, in our task definitions. We now need to attach the right policy to this role. Fortunately, we don't have to manually create the policy, or attach the individual permissions one by one, since there is a policy managed by AWS that contains all the individual permissions. We will now get the ARN of this policy, named <code>AmazonECSTaskExecutionRolePolicy</code>, and attach it to the role we just created:</p>

<pre><code>POLICYARN=$(aws iam list-policies \
  --query 'Policies[?PolicyName==`AmazonECSTaskExecutionRolePolicy`].{ARN:Arn}' \
  --output text)
aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn $POLICYARN
</code></pre>
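<p>The escaped trust policy passed to <code>create-role</code> above is hard to read and easy to break. One alternative, sketched here, is to keep the same JSON in a file and hand it to the CLI with the <code>file://</code> prefix. The file name <code>trust-policy.json</code> is an arbitrary choice for this example, and the <code>aws</code> call is only shown in a comment:</p>

```shell
#!/bin/sh
# Sketch: keep the trust policy in a file instead of an escaped string.
# The file name trust-policy.json is an arbitrary choice for this example.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": ["ecs-tasks.amazonaws.com"]},
      "Action": ["sts:AssumeRole"]
    }
  ]
}
EOF

# The file can then replace the inline document, e.g.:
#   aws iam create-role --role-name ecsTaskExecutionRole \
#     --assume-role-policy-document file://trust-policy.json \
#     --query "Role.Arn" --output text
echo "wrote trust-policy.json"
```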

<h2 id="registeringataskdefinition">Registering a task definition</h2>

<p>Having pushed a container image, created a cluster, and given the container agent the right permissions, what we need to do next is create a task definition. A task definition is a JSON file that specifies which containers have to be deployed together as a unit, and on which ports these containers are listening. Here is a template that will serve as the base of our task definition for the <code>static-app</code> container (the file <code>static-app/task-definition.json.tmpl</code> in the sample repository):</p>

<pre><code>{
  "family": "static-app",
  "networkMode": "awsvpc",
  "executionRoleArn": "$ROLEARN",
  "containerDefinitions": [
    {
      "name": "static-app",
      "image": "$STATICAPPREPOURL:0.1",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
</code></pre>

<p>In this template, you need to either manually replace <code>$ROLEARN</code> and <code>$STATICAPPREPOURL</code> with the actual values, or have them filled in automatically, by first exporting the necessary values with <code>export ROLEARN STATICAPPREPOURL</code> on the command line, and then substituting them with <code>envsubst &lt; static-app/task-definition.json.tmpl &gt; task-definition.json</code>. Now we are ready to create a task definition with the following command:</p>

<pre><code>TASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)
</code></pre>
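<p>A word of caution on the variables we keep collecting: <code>envsubst</code> replaces an unset variable with an empty string without complaining, and a CLI query that matches nothing prints <code>None</code> when used with <code>--output text</code>. A minimal sketch of a guard against both follows; <code>require_vars</code> is a hypothetical helper of my own, not part of the sample repository, and the values are placeholders:</p>

```shell
#!/bin/sh
# Sketch: fail early when a variable that a later step depends on is missing.
# require_vars is a hypothetical helper, not part of the sample repository.
# The function body runs in a subshell, so its variables do not leak out.
require_vars() (
  status=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "None" ]; then
      echo "variable $name is empty or None" >&2
      status=1
    fi
  done
  exit "$status"
)

# Placeholder values for demonstration only
ROLEARN="arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
STATICAPPREPOURL="123456789012.dkr.ecr.eu-central-1.amazonaws.com/static-app"

if require_vars ROLEARN STATICAPPREPOURL; then
  echo "all variables set"
fi
```

<p>Calling such a check right before <code>envsubst</code> or a <code>create-service</code> command turns a confusing API error into a clear message about which variable is missing.</p>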

<p>A couple of things worth pointing out in this task definition:</p>

<ul>
<li><p>The <code>networkMode</code> is <code>awsvpc</code>, which is an AWS-native implementation of container networking. <code>awsvpc</code> enables tasks to connect to the AWS networking infrastructure just like VMs over an elastic network interface (ENI), with the ability to give them private IPs and DNS entries. When using Fargate, the <code>networkMode</code> has to be specified as <code>awsvpc</code>.</p></li>
<li><p><code>containerPort</code> and the <code>hostPort</code> have to match because we are using awsvpc; see the section <em>Port mappings</em> in <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions">this part</a> of the documentation.</p></li>
<li><p>You can't use arbitrary values for <code>cpu</code> and <code>memory</code>. See <a href="https://docs.aws.amazon.com/AmazonECS/latest/userguide/task_definition_parameters.html#task_size">here</a> for the combinations of values that are allowed.</p></li>
<li><p>The <code>family</code> field is used to generate an index for the task definition versions. When the task definition is first created, it starts at version 1. Every request to register a task definition with the same family field will increment this number by one, and this version number can be used when a service is created or updated. This numbering is also the reason we are saving the new task revision in a variable, so that we don't accidentally deploy old versions of our tasks.</p></li>
</ul>
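<p>Since an invalid <code>cpu</code> and <code>memory</code> pair is only rejected once the task definition is registered, it can be handy to check the pair beforehand. The following is a minimal sketch; <code>valid_fargate_size</code> is a hypothetical helper, and the ranges encode the combinations as I understand them from the linked table (sizes added later are not covered), so treat them as an assumption and verify against the documentation:</p>

```shell
#!/bin/sh
# Sketch: check a Fargate cpu/memory pair (CPU units, memory in MiB).
# valid_fargate_size is a hypothetical helper; the ranges are an assumption
# based on the documented table and should be checked against it.
valid_fargate_size() (
  cpu=$1; mem=$2
  case "$cpu" in
    256)
      # 0.25 vCPU only allows three fixed memory sizes
      case "$mem" in 512|1024|2048) exit 0 ;; *) exit 1 ;; esac ;;
    512)  min=1024; max=4096 ;;
    1024) min=2048; max=8192 ;;
    2048) min=4096; max=16384 ;;
    4096) min=8192; max=30720 ;;
    *) exit 1 ;;
  esac
  # For the larger sizes, memory must lie in the range in 1 GiB steps
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $((mem % 1024)) -eq 0 ]
)

for pair in "256 512" "256 4096" "1024 3072"; do
  # Word splitting of $pair into two arguments is intended here
  if valid_fargate_size $pair; then
    echo "$pair: allowed"
  else
    echo "$pair: not allowed"
  fi
done
```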

<h2 id="creatingaservice">Creating a service</h2>

<p>A service is a group of tasks, managed by the container orchestration system (in our case Fargate). The tasks sit behind a common interface, and the incoming requests are distributed among them based on load and availability. Fargate, similar to other container orchestration systems, makes it easy to scale the number of tasks and dedicate resources. In order to turn our <code>static-app</code> into a service, we need to use the previously created task definition, specifying how to scale it, and route requests to it.</p>

<p>If you thought we would be able to navigate around the networking stuff from the first post, I'm sorry to disappoint you. The first thing we need to deal with to create an online service is networking infrastructure. Let's start with the VPC and its subnets:</p>

<pre><code>VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo

# We will need this later when we deploy services with DNS to our VPC
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support

SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.1.0/24 \
  --availability-zone "${REGION}b" \
  --query "Subnet.SubnetId" --output text)
SUBNET2ID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.2.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
PRIVATESUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.3.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $SUBNET2ID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $PRIVATESUBNETID --tags Key=Environment,Value=Demo
</code></pre>

<p>Here, we are laying down the networking infrastructure for the rest of the tutorial; this is the reason for creating three subnets, two public and one private. As we will see later, application load balancers require at least two subnets, hence the two public subnets. These two subnets also need to be in different availability zones for reliability; this is why their availability zone names differ in the letter appended to the region, as explained in the <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones">AWS documentation on regions and AZs</a>. The private subnet will be used to host an internal service. Now let's create a gateway, which we need for the communication between the services on the VPC and the rest of the Internet, as explained in the first part of this tutorial:</p>

<pre><code>GATEWAYID=$(aws ec2 create-internet-gateway --query "InternetGateway.InternetGatewayId" \
  --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID --internet-gateway-id $GATEWAYID
</code></pre>

<p>Once we have the gateway, we need to create a route table that sends outbound traffic through it, associate this table with the two public subnets, and allow ingress on port 80 in the default security group of the VPC:</p>

<pre><code>ROUTETABLEID=$(aws ec2 create-route-table --vpc-id $VPCID \
  --query "RouteTable.RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID
aws ec2 associate-route-table  --subnet-id $SUBNETID --route-table-id $ROUTETABLEID
aws ec2 associate-route-table  --subnet-id $SUBNET2ID --route-table-id $ROUTETABLEID
SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 80 --cidr 0.0.0.0/0
</code></pre>
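<p>As a sanity check on the address layout used above, the following sketch verifies that a subnet CIDR block lies inside the VPC block using only shell arithmetic. The helpers <code>ip_to_int</code> and <code>cidr_contains</code> are hypothetical names of my own and not part of the sample repository:</p>

```shell
#!/bin/sh
# Sketch: check that subnet CIDR blocks lie within the VPC CIDR block.
# ip_to_int and cidr_contains are hypothetical helpers for illustration.

# ip_to_int A.B.C.D: print the address as an integer
ip_to_int() (
  IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

# cidr_contains OUTER INNER: succeeds if INNER lies entirely within OUTER
cidr_contains() (
  outer_ip=${1%/*}; outer_len=${1#*/}
  inner_ip=${2%/*}; inner_len=${2#*/}
  # A block with a shorter prefix cannot fit into one with a longer prefix
  [ "$inner_len" -ge "$outer_len" ] || exit 1
  # Number of addresses in the outer block
  block=$(( 1 << (32 - outer_len) ))
  # Both addresses must fall into the same block of that size
  [ $(( $(ip_to_int "$outer_ip") / block )) -eq $(( $(ip_to_int "$inner_ip") / block )) ]
)

for subnet in 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24; do
  if cidr_contains 10.0.0.0/16 "$subnet"; then
    echo "$subnet is inside 10.0.0.0/16"
  fi
done
```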

<p>Now that we have the necessary networking elements and security rules, we can go ahead and create our first service, based on the <code>static-app</code> task definition:</p>

<pre><code>aws ecs create-service --cluster demo-cluster --service-name static-app-service \
  --task-definition static-app:$TASKREVISION --desired-count 1 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}'\
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$SUBNETID],securityGroups=[$SECURITYGROUPID],assignPublicIp=\"ENABLED\"}"

aws ecs wait services-stable --cluster demo-cluster --services static-app-service
</code></pre>

<p>Let's go through some of the arguments:</p>

<ul>
<li><p>The launch type is <code>FARGATE</code>, which we also specified as a required compatibility in the task definition.</p></li>
<li><p>The <code>scheduling-strategy</code> argument lets us specify <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html">how tasks are instantiated and maintained</a>. The <code>REPLICA</code> strategy tells Fargate to keep <code>desired-count</code> (another argument) instances of the task running. We can increase or decrease this number as need be, and Fargate will take care of starting, stopping and (together with a load balancer, which we will see later) routing traffic to these tasks.</p></li>
<li><p>An important aspect of a container orchestration platform is how new containers are deployed. The <code>deployment-controller</code> and <code>deployment-configuration</code> arguments are how we specify the deployment strategy. The <a href="https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_DeploymentController.html"><code>ECS</code> deployment controller</a> is used for <em>rolling deployments</em>, in which new containers are started, and depending on whether these reach running state, old ones are stopped after draining connections to them. The numbers in <code>deployment-configuration</code> set the bounds within which a deployment operates: <code>minimumHealthyPercent</code> is the lower limit on the number of running tasks during a deployment, and <code>maximumPercent</code> is the upper limit, both given as a percentage of the desired count. Refer to <a href="https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_DeploymentConfiguration.html">the documentation</a> for details.</p></li>
<li><p>The network configuration options, required for the <code>awsvpc</code> infrastructure, specify that the service should attach to one of the public subnets, run under the default security group, and receive a public IP.</p></li>
</ul>
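<p>To make the two percentages in <code>deployment-configuration</code> more concrete, here is a small sketch of the arithmetic. According to the linked API documentation, the lower limit is rounded up and the upper limit is rounded down to the nearest integer; <code>deployment_bounds</code> is a hypothetical helper of my own for illustration:</p>

```shell
#!/bin/sh
# Sketch: task count bounds during a rolling deployment.
# deployment_bounds DESIRED MINPCT MAXPCT prints "<min tasks> <max tasks>".
# deployment_bounds is a hypothetical helper for illustration.
deployment_bounds() (
  desired=$1; min_pct=$2; max_pct=$3
  min_tasks=$(( (desired * min_pct + 99) / 100 ))   # rounded up
  max_tasks=$(( desired * max_pct / 100 ))          # rounded down
  echo "$min_tasks $max_tasks"
)

deployment_bounds 1 100 200   # the values used for static-app-service
deployment_bounds 4 50 150
```

<p>With the values we used, a single task is always kept running, and at most two tasks exist at the same time while a new version is rolled out.</p>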

<p>Once the service is created, we use the wait command, which we previously used to wait for an EC2 instance (albeit with <code>ec2 wait</code> instead of <code>ecs wait</code>), to wait for the service to be stable, i.e. for the number of running tasks to be equal to the number of desired tasks. Once this command returns, we can fetch the IP address of the service task with the following command:</p>

<pre><code>aws ec2 describe-network-interfaces --filters "Name=subnet-id,Values=$SUBNETID" \
  --query 'NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp' --output text
</code></pre>

<p>You should now be able to access this task at the resulting IP address. We can't yet call it a day, however. The way we are using ECS is suboptimal for a number of reasons. Because each task gets a separate IP address, clients will need to know which task has which IP to make a request (assuming that our service does something useful, of course). Load balancing between multiple tasks of a service will be difficult, as the clients need to keep track of the IPs of the tasks. There is also a clear security risk, as all tasks would have public interfaces. We will address these issues in the next section.</p>

<h2 id="microservicesonfargate">Microservices on Fargate</h2>

<p>What we want to achieve in this section is being able to use Fargate as a microservices platform. This involves the following features that are missing from our primitive, one-public-IP-per-task setup:</p>

<ul>
<li><p>Ingress configuration: Based on the request path, we want to be able to route requests to different services.</p></li>
<li><p>Load balancing: Both for public and private services, we want to distribute requests between the tasks in a manner independent of the client.</p></li>
<li><p>Internal DNS to implement <a href="https://www.nginx.com/blog/service-discovery-in-a-microservices-architecture/">service discovery</a>.</p></li>
</ul>

<h3 id="ingressandloadbalancingwithelb">Ingress and load balancing with ELB</h3>

<p>Thanks to awsvpc networking, it is very easy to <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html">connect an ELB instance</a> to a subnet, and assign task containers to it. The kind of load balancer we will use is called an <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html">application load balancer (ALB)</a>, which allows only HTTP and HTTPS traffic. Let's first scale down our <code>static-app</code> service to zero tasks and delete it, as it is too basic for this demonstration:</p>

<pre><code>aws ecs update-service --service static-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service static-app-service --cluster demo-cluster
aws ecs wait services-inactive --service static-app-service --cluster demo-cluster
</code></pre>

<p>An ALB is configured through three entities: Load balancer, target group and listener. The load balancer is the point of contact for the clients, and the target group gathers the target units (in our case tasks) that receive the requests. Listeners connect these two to each other, and are used to specify which conditions are used to route requests to which target groups. Now let's create these:</p>

<pre><code>LBARN=$(aws elbv2 create-load-balancer --name demo-balancer \
  --type application --subnets $SUBNETID $SUBNET2ID --security-groups $SECURITYGROUPID \
  --tags Key=Environment,Value=Demo \
  --query "LoadBalancers[0].LoadBalancerArn" --output text)

TGARN=$(aws elbv2 create-target-group --name hostname-app-tg \
  --protocol HTTP --port 80 --target-type ip --vpc-id $VPCID \
  --query "TargetGroups[0].TargetGroupArn" --output text)

aws elbv2 add-tags --resource-arns $TGARN --tags Key=Environment,Value=Demo

LISTENERARN=$(aws elbv2 create-listener --load-balancer-arn $LBARN --protocol HTTP \
  --port 80 --default-actions Type=forward,TargetGroupArn=$TGARN \
  --query "Listeners[0].ListenerArn" --output text)
</code></pre>

<p>We are not adding tags to the listener, as this is not supported. As already mentioned, load balancers require at least two subnets from different zones on creation, for reasons of reliability; we are using the two subnets we created in different AZs here. The target group we create is empty, and will be populated later by a new service. We will be using a different service for demo purposes in this section; you can find it <a href="https://github.com/afroisalreadyinu/aws-containers/tree/master/hostname-app">in the samples repo</a>. This service is called <code>hostname-app</code> because it displays the value of the <code>HOSTNAME</code> environment variable; we will see why this is relevant later. Another thing we will need is a security group for internal services through which we can control traffic between various parts and the internet. We will allow traffic between this security group and any interfaces on the VPC network:</p>

<pre><code>PRIVATESECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name private-security-group --description "Private SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16
</code></pre>

<p>Finally, we need to create a new container repository for this service, push an image, and register a task definition:</p>

<pre><code>aws ecr create-repository --repository-name hostname-app \
  --tags Key=Environment,Value=Demo

HOSTNAMEAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names hostname-app \
  --query "repositories[0].repositoryUri" --output text)

docker build -t $HOSTNAMEAPPREPOURL:0.1 hostname-app/
docker push $HOSTNAMEAPPREPOURL:0.1
export ROLEARN HOSTNAMEAPPREPOURL
envsubst &lt; hostname-app/task-definition.json.tmpl &gt; task-definition.json

HNTASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)
</code></pre>

<h3 id="vpcendpoints">VPC Endpoints</h3>

<p>We can now create a service for the hostname app, which, unfortunately, is not going to be particularly successful. Let's go ahead and see why. Here is the command we need to create the service:</p>

<pre><code>aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo
</code></pre>

<p>We will go through the new arguments to the <code>create-service</code> command later, but first let's query the state of the task that is started by the Fargate agent for this service with the following commands:</p>

<pre><code>TASKARNS=$(aws ecs list-tasks --cluster demo-cluster \
  --service-name hostname-app-service --query "taskArns" --output text)
aws ecs describe-tasks --tasks $TASKARNS --cluster demo-cluster
</code></pre>

<p>If you do this a short time after the service is created, you will see an error message similar to the following in the field <code>tasks[0].containers[0].reason</code>:</p>

<pre><code>"CannotPullContainerError: Error response from daemon: Get https://$REPOID.ecr.eu-central-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)"
</code></pre>

<p>This error is caused by Fargate not being able to fetch the container images required for the task, because there is no network path to the ECR repository. When we deployed <code>static-app</code>, our tasks could communicate with the rest of the Internet in a straightforward manner, as they had public IPs. In the new layout, the tasks are on a private subnet, and can be contacted only through the load balancer. It is possible to solve this issue using a <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html">NAT (Network Address Translation) gateway</a>, but NAT gateways are relatively expensive, and require an elastic IP address. A better solution can be achieved using <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html">VPC endpoints</a>. What VPC endpoints essentially provide is that AWS services work as if they are part of a private subnet. There are two kinds of VPC endpoints: Interfaces and gateways. Interface endpoints function by creating an <em>endpoint network interface</em> in the specified subnets. Gateway endpoints, on the other hand, function by manipulating the route table of a VPC. Although there are two different kinds of endpoints, you as a user do not have much choice as to which to use for which service, since gateway endpoints have to be used for S3 and DynamoDB, and interfaces for the other services. We will therefore go ahead and create an interface VPC endpoint for ECR, and a gateway endpoint for S3, as container image layers are downloaded from S3. But first, let's delete the existing service:</p>

<pre><code>aws ecs update-service --service hostname-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service hostname-app-service --cluster demo-cluster
# This takes some time
aws ecs wait services-inactive --service hostname-app-service --cluster demo-cluster
</code></pre>

<p>In order to make sure we can isolate different pieces of our cluster security-wise, let's also create a separate security group for the endpoints, and authorize ingress and egress between the private security group and this new group:</p>

<pre><code>ENDPOINTSECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name endpoint-security-group --description "VPC Endpoint SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $ENDPOINTSECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $PRIVATESECURITYGROUPID

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $ENDPOINTSECURITYGROUPID
</code></pre>

<p>Through these rules, we are allowing requests into the endpoints from the private services (on all ports here, but allowing port 80 for HTTP and 443 for HTTPS should be enough).</p>

<p>And now let's create the ECR and S3 VPC endpoints:</p>

<pre><code>ECRENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Interface" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.ecr.dkr" \
  --security-group-ids $ENDPOINTSECURITYGROUPID --subnet-ids $PRIVATESUBNETID \
  --private-dns-enabled --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $ECRENDPOINTID --tags Key=Environment,Value=Demo

# The main route table of the VPC, which the private subnet uses implicitly.
# This lookup is only needed if DEFAULTRTID is not already set in your shell.
DEFAULTRTID=$(aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=$VPCID Name=association.main,Values=true \
  --query "RouteTables[0].RouteTableId" --output text)

S3ENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Gateway" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.s3" \
  --route-table-ids $DEFAULTRTID $ROUTETABLEID \
  --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $S3ENDPOINTID --tags Key=Environment,Value=Demo
</code></pre>

<p>The ECR endpoint accepts a security group ID argument, for which we use the endpoint security group we just created. The S3 endpoint, on the other hand, does not accept such an argument. The question now is, how do we specify that requests from our private subnet to S3 are allowed? We can't use IP addresses, as we don't know which IP address the S3 gateway is assigned. Security groups are not an option, as the gateway does not have one. The solution is to use what are called prefix lists, which specify the group of IP prefixes pointing to the S3 endpoints the gateway will choose among. In the following, we first get the ID of the prefix list we are interested in using <code>aws ec2 describe-prefix-lists</code>, and then we allow requests to these IP addresses from our services using the <code>--ip-permissions</code> option of the <code>authorize-security-group-egress</code> command:</p>

<pre><code>S3PREFIXLISTID=$(aws ec2 describe-prefix-lists --region $REGION \
  --query "PrefixLists[?PrefixListName == 'com.amazonaws.${REGION}.s3'].PrefixListId" \
  --output text)

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
    --ip-permissions IpProtocol=tcp,FromPort=0,ToPort=65535,PrefixListIds="[{Description=\"Why isnt this in the docs\",PrefixListId=${S3PREFIXLISTID}}]"
</code></pre>

<p>Afterwards, let's try to create the service once more, with the command repeated here for ease of reference:</p>

<pre><code>aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo

aws ecs wait services-stable --cluster demo-cluster --services hostname-app-service
</code></pre>

<p>Let's now go through the arguments to this command that differ from the previous one that created <code>static-app</code>:</p>

<ul>
<li><p>The desired count is this time 2. The service will create 2 tasks for us, and the incoming requests will be load balanced among these over the load balancer we created.</p></li>
<li><p>The network configuration this time around specifies that the network interface should be placed on the private subnet, and that public IP is disabled. Our container cannot make or receive requests to/from the rest of the internet, except for the AWS services for which we created VPC endpoints.</p></li>
<li><p>The additional argument <code>--load-balancers</code> specifies that the service bind to the load balancer target group created earlier. Here we are specifying that the containers named <code>hostname-app</code> (this should align with the <code>name</code> field in the task definition) should be contacted on port 8080, which is the port our app listens on.</p></li>
</ul>

<p>Once again, we are waiting for the service to reach a stable state where all tasks are running. Once this command has run through, we can fetch the URL of the load balancer, at which we can access the service, with the following command:</p>

<pre><code>aws elbv2 describe-load-balancers  --load-balancer-arns $LBARN \
  --query "LoadBalancers[0].DNSName" --output text
</code></pre>

<p>You should now see a page that displays the hostname of the task that responds to the request. If you reload the page, you should see the displayed hostname alternate between two values, as consecutive requests are rotated between the two targets as per the <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#request-routing">round robin algorithm</a>. We can scale our service by changing the number of tasks using the <code>aws ecs update-service</code> command. New tasks will be added to the service, or old ones removed, with the load balancer target group draining the connections from the removed tasks and routing requests to the new ones. Here is an example of reducing the number of tasks to one:</p>

<pre><code>aws ecs update-service --service hostname-app-service --cluster demo-cluster \
  --desired-count 1
</code></pre>
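
<p>Scaling takes a while to settle, so it can be handy to combine the update with the <code>wait</code> subcommand we used earlier and then print how many tasks are actually running. This is a rough sketch; the function name <code>scale_service</code> is an invention for this example.</p>

```shell
# Hypothetical helper: set the desired count of a service, block until the
# service is stable again, then print the number of running tasks.
scale_service() {
  cluster="$1"; service="$2"; count="$3"
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --desired-count "$count" --query "service.desiredCount" --output text
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
  aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query "services[0].runningCount" --output text
}
```

<p>For our demo, that would be <code>scale_service demo-cluster hostname-app-service 1</code>.</p>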

<h3 id="healthchecks">Health Checks</h3>

<p>One thing you have to pay attention to when creating the task and the load balancer is the health check option of the load balancer target group. Health checks are used by load balancers to determine which targets (in our case, containers, but it could also be VMs) are healthy and should have requests routed to them. The <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html">default health check</a> for ALBs is whether a <code>GET</code> request to the index (i.e. <code>/</code>) endpoint of the target returns a 200 response code. If your app does not respond to such a request in the expected manner, you can use the <a href="https://docs.aws.amazon.com/cli/latest/reference/elbv2/create-target-group.html">health check options</a> of the <code>create-target-group</code> subcommand to specify a more suitable one. A tricky issue to debug is when the app is configured to bind to localhost or <code>127.0.0.1</code> instead of <code>0.0.0.0</code>. When this is the case, the app will not respond to requests arriving on the address it is assigned by the Fargate agent, thus failing the health checks. New instances of the same task will be created in a loop, without the service reaching stable status. So make sure that your app binds to the general <code>0.0.0.0</code> interface instead of the loopback interface.</p>
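
<p>If the target group already exists, the health check can also be changed after the fact with <code>modify-target-group</code>, and <code>describe-target-health</code> shows what the load balancer currently thinks of each target. The two helpers below are a sketch and not part of the original scripts; the <code>/health</code> path in the usage example is an assumption about your app, not something <code>hostname-app</code> is known to provide.</p>

```shell
# Hypothetical helper: point the health check of a target group at another path.
# The path must be a route that your app answers with a 200.
set_health_check_path() {
  tg_arn="$1"; path="$2"
  aws elbv2 modify-target-group --target-group-arn "$tg_arn" \
    --health-check-path "$path" \
    --query "TargetGroups[0].HealthCheckPath" --output text
}

# Hypothetical helper: print the ID and health state of every registered target.
target_health() {
  aws elbv2 describe-target-health --target-group-arn "$1" \
    --query "TargetHealthDescriptions[].[Target.Id, TargetHealth.State]" \
    --output text
}
```

<p>For example, <code>set_health_check_path $TGARN /health</code> followed by <code>target_health $TGARN</code>. Targets that never leave the unhealthy state are a good hint that the app is bound to the loopback interface.</p>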

<h2 id="internaldnsandservicediscovery">Internal DNS and Service Discovery</h2>

<p>If we want to use Fargate as a microservice platform, we need a means to contact the tasks of a service on a private subnet under a single name for easy <a href="https://microservices.io/patterns/server-side-discovery.html">server-side service discovery</a>. To give an example, Kubernetes achieves this functionality by <a href="https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/">giving each service a DNS</a> that resolves to a cluster IP. This cluster IP is used to proxy connection requests to a service to one of the service pods <a href="https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies">at the node level</a>. The way to implement similar functionality on Fargate would be through the <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html">ECS service discovery API</a>, which uses Route 53 to create VPC-local DNS entries for services. In our demo of this functionality, we will use yet another app, the <a href="https://github.com/afroisalreadyinu/aws-containers/tree/master/random-quote-app"><code>random-quote-app</code></a>, which returns a random quote on programming as JSON. The <code>random-quote-app</code> will not have a public endpoint, in order to simulate microservices. The <code>hostname-app</code> service has the route <code>/random-quote</code> which queries the <code>random-quote-app</code> and displays the result.</p>

<p>The commands for creating the <code>random-quote-app</code> container registry and task definition are only marginally different from the previous two services, so I will not repeat them here, and will instead focus on service discovery. The resources we need for DNS-based service discovery on Fargate are a namespace and a "service discovery service", a terrible name for a straightforward concept. A service discovery service is an ECS service that should be represented in the service discovery mechanism with a name. This name, plus the namespace, are used to resolve DNS queries to the IP address of a task that belongs to the service. In the following, we are first creating the namespace, and then the service discovery service that attaches to it:</p>

<pre><code>OPERATIONID=$(aws servicediscovery create-private-dns-namespace --name "local" \
 --vpc $VPCID --region $REGION --query "OperationId" --output text)

NAMESPACEID=$(aws servicediscovery get-operation --operation-id $OPERATIONID \
  --query "Operation.Targets.NAMESPACE" --output text)

RQSERVICEID=$(aws servicediscovery create-service --name random-quote \
  --dns-config "NamespaceId=\"${NAMESPACEID}\",DnsRecords=[{Type=\"A\",TTL=\"300\"}]" \
  --health-check-custom-config FailureThreshold=1 --region $REGION \
  --query "Service.Id" --output text)
</code></pre>

<p>The <code>--name</code> argument we supply to the <code>aws servicediscovery create-private-dns-namespace</code> command will be the top-level domain of the cluster DNS. Once we fetch the ID of the namespace with the second command, we can use it to create DNS for our service with <code>aws servicediscovery create-service</code>. The <code>--name</code> argument to this command determines how to refer to the service. Once this last command has run, any DNS queries to <code>random-quote.local</code> from within the VPC will resolve to up to eight instances of <code>random-quote-app</code>. You should now be able to go to the <code>/random-quote/</code> path under the load balancer URL and see a random quote on programming. As you can see in the <a href="https://github.com/afroisalreadyinu/aws-containers/blob/master/hostname-app/app.py#L31">app code</a>, <code>hostname-app</code> uses the URL <code>http://random-quote.local:8080</code> to contact the <code>random-quote-app</code> and fetch the quote. The port has to be included in the request, because the task to which the DNS resolves is contacted directly, without a load balancer in between.</p>
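
<p>Two small lookups are useful when checking this setup. The sketch below is not from the sample repository and the function names are invented. It assumes that tasks register themselves with an <code>AWS_INSTANCE_IPV4</code> attribute, and that the ECS service was created with a <code>--service-registries registryArn=...</code> argument pointing at the service discovery service.</p>

```shell
# Hypothetical helper: print the ARN of a service discovery service, which is
# what the --service-registries argument of "aws ecs create-service" expects.
registry_arn() {
  aws servicediscovery get-service --id "$1" \
    --query "Service.Arn" --output text
}

# Hypothetical helper: list the task IPs currently registered under the name.
registered_ips() {
  aws servicediscovery list-instances --service-id "$1" \
    --query "Instances[].Attributes.AWS_INSTANCE_IPV4" --output text
}
```

<p>If <code>registered_ips $RQSERVICEID</code> prints nothing, DNS queries for <code>random-quote.local</code> have nothing to resolve to, and the registration of the ECS service is the first place to look.</p>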

<h2 id="conclusion">Conclusion</h2>

<p>As mentioned in the introduction to the first part of this tutorial, the command line client for AWS can be quite useful for discovering what AWS has to offer. Once the going gets tough, however, and numerous AWS services and complicated security and network resources are involved, it gets quite difficult to keep track of the various commands and the minute ways they differ from each other. In another context, I have had the opportunity to implement a very similar microservice architecture, using Terraform, a tool much better suited to provisioning dependent and highly-connected cloud resources. It was a much better experience, and I would say that beyond simple things, and the occasional tricky feature that cannot be implemented with another tool, the CLI should be limited only to discovery and prototyping. That said, I hope this tutorial helped you to understand Fargate and the other relevant AWS components better.</p>

<h2 id="resources">Resources</h2>

<ul>
<li><p><a href="https://dev.to/diogoaurelio/container-orchestration-in-aws-comparing-ecs-fargate-and-eks-56d1">This blog post</a> gives an overview of the advantages of Fargate over ECS.</p></li>
<li><p><a href="https://aws.amazon.com/blogs/compute/task-networking-in-aws-fargate/">This blog post</a> from the AWS team explains the nitty gritty details of container networking in Fargate.</p></li>
<li><p>Another <a href="https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/">blog post</a> from AWS, this one explaining how to create a service registry for a Fargate cluster.</p></li>
<li><p>A <a href="https://noise.getoto.net/2019/01/25/setting-up-aws-privatelink-for-aws-fargate-amazon-ecs-and-amazon-ecr/">detailed tutorial</a> on connecting ECR to Fargate using VPC endpoints.</p></li>
<li><p><a href="https://www.youtube.com/watch?v=IEvLkwdFgnU">Deep Dive into AWS Fargate</a> is a talk from 2018 that contains a nice overview of Fargate as compared to ECS and standard EC2, with a demo that uses CloudFormation.</p></li>
</ul>]]></content:encoded></item><item><title><![CDATA[Discovering AWS with the CLI Part 1: Networking and Virtual Machines]]></title><description><![CDATA[<p>Recently, I started working on moving an application that was deployed manually to an AWS EC2 instance to a more modern, infrastructure-as-code setup. This gave me the chance to dive deeper into AWS concepts, and play around with the various services. There are numerous ways to use the AWS API:</p>]]></description><link>http://okigiveup.net/discovering-aws-with-cli-part-1-basics/</link><guid isPermaLink="false">73bf58a8-6c83-4e4b-8b13-4ef7e6637f6b</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Tue, 27 Aug 2019 12:29:53 GMT</pubDate><content:encoded><![CDATA[<p>Recently, I started working on moving an application that was deployed manually to an AWS EC2 instance to a more modern, infrastructure-as-code setup. This gave me the chance to dive deeper into AWS concepts, and play around with the various services. There are numerous ways to use the AWS API: On top of the standard tools offered by Amazon, such as the web GUI, CLI client, client packages for a number of languages and CloudFormation, there are various third party tools, such as Terraform and Ansible. Pretty much every other tutorial or book on AWS is a click-through in the web UI, but neither the pedagogic effect nor the resulting programmatic output is optimal: It cannot be reproduced, and when you want to go over it, you need to recall where the hell you clicked, and which values had to be same or related to each other. I found the CLI client to be a much better alternative, because you can linearly follow what has to happen when, and how things connect to each other. You can also use the resulting code for actual productive orchestration work. This tutorial documents what I found out about getting the most out of the CLI client, and how one can use it to understand and discover AWS concepts.</p>

<p>If you don't want to copy-paste all the commands, you can check out the <a href="https://github.com/afroisalreadyinu/aws-containers">samples repository</a>, which I will refer to extensively in part 2, and use the <a href="https://github.com/afroisalreadyinu/aws-containers/blob/master/checkpoints-part-1.sh"><code>checkpoints-part-1.sh</code> file</a>, which bundles all examples into one script. This script notifies the user of the current location at the different checkpoints, and pauses the execution. You can then inspect the state on the AWS console, or run commands in another shell.</p>

<h3 id="installationandconfigurationofawscli">Installation and Configuration of awscli</h3>

<p>The AWS CLI client is delivered as a Python package named <a href="https://pypi.org/project/awscli/">awscli</a>. As such, the easiest way to install it is to use pip, with <code>pip install awscli</code>. Once you have installed it, you need to register your access keys, which can be done with the <code>aws configure</code> command. You can add additional profiles with the <code>--profile</code> argument, and you can also rerun the command if you want to change something, such as the default region. When you use the command and want to specify a certain profile, you can either use the <code>--profile</code> argument, or set the environment variable <code>AWS_DEFAULT_PROFILE</code>. The same thing is valid for region; you can either pass the argument <code>--region</code>, or export it as <code>AWS_DEFAULT_REGION</code>. If you are ever in doubt of who you are logged in as, you can simply issue the command <code>aws iam get-user</code>, which will show you your username and user ARN.</p>

<h3 id="generalusage">General usage</h3>

<p>The AWS CLI accepts combinations of commands, with the first command being something like the namespace. The default output format is JSON, and you can manipulate this output using <a href="http://jmespath.org/">JMESPath</a> notation. It is also possible to print the output as a table or in plain text; the former is rarely used, but the latter is necessary if you want to use the output as input for other commands. A really useful feature is autocompletion, which provides a quick means to search among the many namespaces and subcommands. In order to enable autocompletion, you need to specify that the command <code>aws_completer</code> needs to be used to complete the command <code>aws</code>, which can be done with the following:</p>

<pre><code>complete -C "$(which aws_completer)" aws
</code></pre>

<p>Now, tabbing should help you find stuff. More details can be found <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-completion.html">here</a>.</p>

<h4 id="uploadingansshkey">Uploading an SSH key</h4>

<p>We will be creating EC2 instances in the following, and you will need an SSH key to access them. The way this works on AWS is that you upload your public key with a name, and then specify, on creation, that a VM should be accessible with that key. Creating an SSH key is very easy, as famously documented on the <a href="https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent">Github documentation</a>. Once you have created one, you can add it to the available keys on AWS with the following command:</p>

<pre><code>aws ec2 import-key-pair --key-name brand-new-key \
    --public-key-material file://~/.ssh/id_rsa.pub
</code></pre>

<p>You can later refer to this key as <code>brand-new-key</code> and use it to SSH into your VMs.</p>

<h4 id="resourcegroupsandtags">Resource groups and tags</h4>

<p>It is possible to gather AWS resources under <em>resource groups</em>, which enables certain bulk features such as monitoring costs or gathering logs. Unfortunately, deleting resources is not among those features, at least not using the web console or the CLI. A resource group is created by specifying a query that will match resources based on tags. If we want a resource group to be based on the value of the <code>Environment</code> tag, for example (tag names are by convention capitalized), we can create the resource group <code>DemoEnvironment</code> with the following command:</p>

<pre><code>aws resource-groups create-group \
    --name DemoEnvironment \
    --resource-query '{"Type":"TAG_FILTERS_1_0", "Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Environment\", \"Values\":[\"Demo\"]}]}"}'
</code></pre>

<p>As you can see, the format is really god awful. The use of tags on the command line (and also on the web console, for that matter) is complicated considerably by the non-uniform application of tags to resources. Some resources (such as EC2 instances) accept a tag on creation, whereas others can be tagged only once they are created; you can see a detailed list <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-resources">here</a>. Resource groups are still rather useful, however, which is why we will tag all the resources we create. We will see later how to create resources as a part of the <code>DemoEnvironment</code>.</p>
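
<p>One way to tame the nested quoting of the resource query is to generate it with a small shell function. This is just a sketch: <code>resource_query</code> is a made-up name, and the function assumes that the tag key and value contain no quotes or backslashes.</p>

```shell
# Hypothetical helper: build the --resource-query JSON for a single tag filter.
# Assumes that key ($1) and value ($2) contain no quotes or backslashes.
resource_query() {
  # The inner query is itself a JSON document embedded as a string,
  # so its double quotes have to be emitted with a backslash in front.
  inner=$(printf '{\\"ResourceTypeFilters\\":[\\"AWS::AllSupported\\"],\\"TagFilters\\":[{\\"Key\\":\\"%s\\",\\"Values\\":[\\"%s\\"]}]}' "$1" "$2")
  printf '{"Type":"TAG_FILTERS_1_0","Query":"%s"}\n' "$inner"
}
```

<p>The group could then be created with <code>aws resource-groups create-group --name DemoEnvironment --resource-query "$(resource_query Environment Demo)"</code>.</p>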

<h3 id="vmandvpctheheartofaws">VM and VPC, the heart of AWS</h3>

<p>The heart of AWS is the EC2 service, the Elastic Compute Cloud. It provides the means to create, organize, access and interface to scalable computing infrastructure, the infamous EC2 instances. Under the covers, AWS uses EC2 to run the rest of its own services. EC2 instances are just a part of the puzzle, though. You will be dealing even more often with the networking components of EC2, especially with virtual private clouds (VPC). Nearly every resource on AWS is connected to a VPC and a subnet, either directly or with at most one hop. A VPC is a logically isolated network which separates your AWS resources from the rest of AWS, while subnets are tools for finer control of how these resources communicate with each other, and with the internet. Your account comes with a default VPC; if you don't supply the VPC argument, the resource will be created in this default VPC. Here is how to get the default VPC's ID:</p>

<pre><code>DEFAULTVPCID="$(aws ec2 describe-vpcs \
    --filter "Name=isDefault, Values=true" \
    --query "Vpcs[0].VpcId" --output text)"
</code></pre>

<p>As you can see, there is no separate namespace for VPC subcommands; they are in the EC2 namespace. Also, we used the <code>--query</code> argument, which can be added to any command to print a specific part of the response JSON. Here we use it to print the ID of the default VPC; we also pass the option <code>--output text</code> to get the ID as simple text instead of a JSON string. Talking of the default VPC is a bit misleading; it's more like the <em>default networking infrastructure</em>, as there are a couple of other things attached to this VPC that make it special. The first part of this structure is the subnets. We can print the subnets of the default VPC with the following query:</p>

<pre><code>aws ec2 describe-subnets --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"
</code></pre>

<p>This should print a number of subnets; in my case it's 3. One field of significance is the <code>AvailabilityZone</code>. It should be easy to see that each subnet has a different value, but they are all in the same region (for my region <code>eu-central-1</code>, the availability zones are <code>eu-central-1a</code> to <code>1c</code>). A VPC created in a region will logically span all the availability zones (AZ) in that region. A subnet, on the other hand, is specific to a single AZ. You can also list the network interfaces, which are the entities through which computational resources connect to the network, with the following command:</p>

<pre><code>aws ec2 describe-network-interfaces --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"
</code></pre>

<p>One situation where this command comes in handy is figuring out which resources to first delete when you are trying to delete a VPC. When it comes to the dependency graph, VPCs are pretty much at the top of the (top-down) tree. You cannot delete them unless all the other, non-default resources are also removed or detached.</p>

<h4 id="creatingconnectingandinstantiatingresourcesinvpcandsubnets">Creating, connecting and instantiating resources in VPC and subnets</h4>

<p>If you want to create multiple isolated resource groups, keep control over which resources can access which others, and generally understand how to connect various other AWS things like RDS databases, you will need to deal with VPCs. Let's begin this process with creating one such VPC:</p>

<pre><code>VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo
</code></pre>

<p>The required <code>--cidr-block</code> argument specifies which IP range will be valid within the VPC. This uses the CIDR format, with the suffix <code>/16</code> specifying how many bits <em>from the beginning</em> constitute the network mask; our VPC will be able to hand out and route between IPs from <code>10.0.0.0</code> to <code>10.0.255.255</code>, that is, <code>256*256 = 65536</code> IPs in total. Once this command runs, you should see two results in the output of <code>aws ec2 describe-vpcs</code>: The default VPC, and the new one you created just now. You can also see the new VPC in the list of resources for our new resource group with the command <code>aws resource-groups list-group-resources --group-name DemoEnvironment</code>. A VPC is not enough information for AWS to figure out the networking topology, however: We need a subnet. The subnet needs to have a CIDR block that's a subset of the VPC's. Now let's create one with the following command:</p>

<pre><code>SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID \
  --cidr-block 10.0.1.0/24 \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
</code></pre>
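
<p>The CIDR arithmetic for the VPC and subnet blocks can be double-checked in plain shell. The helpers below are a sketch written for this purpose; they are not AWS tooling, and their names are my own invention.</p>

```shell
# Number of addresses in a block with the given prefix length: 2^(32 - prefix).
cidr_size() {
  bits=$((32 - $1))
  size=1
  while [ "$bits" -gt 0 ]; do
    size=$((size * 2))
    bits=$((bits - 1))
  done
  echo "$size"
}

# Convert a dotted IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

# Succeeds if the address ($1) lies inside the block with base address ($2)
# and prefix length ($3). Two addresses are in the same block exactly when
# integer division by the block size gives the same result for both.
in_cidr() {
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "$2")
  size=$(cidr_size "$3")
  [ $((ip / size)) -eq $((base / size)) ]
}
```

<p>With these, <code>cidr_size 16</code> gives 65536 and <code>cidr_size 24</code> gives 256, and <code>in_cidr 10.0.1.255 10.0.0.0 16</code> succeeds, which confirms that the last address of the subnet is still within the VPC range. Keep in mind that AWS reserves five addresses in every subnet (the first four and the last one), so slightly fewer are available for your instances.</p>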

<p>As you can see in the <code>--cidr-block</code> argument, this subnet covers IPs in the range from <code>10.0.1.0</code> to <code>10.0.1.255</code>, which is a part of the IPs covered by the VPC. Once we have the subnet, we can go ahead and create our first EC2 instance attached to it. In order to do so, we first need the ID of a proper AMI. I used the following command to list the official Ubuntu AMIs, and picked the newest one:</p>

<pre><code>AMIID=$(aws ec2 describe-images \
  --filters "Name=root-device-type,Values=ebs" \
  "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*" \
  "Name=architecture,Values=x86_64" \
  --query "reverse(sort_by(Images, &amp;CreationDate)) | [?! ProductCodes] | [0].ImageId" \
  --output text)
</code></pre>

<p>The reason for the complicated query argument is that we don't want the AMIs that are in the <a href="https://docs.aws.amazon.com/marketplace/latest/userguide/ami-products.html">AMI marketplace</a>, which one needs to pay for or accept a license for. Now let's start an EC2 instance with the AMI the above command picked (as of 11.08.2019, this is <code>ami-0ac05733838eabc06</code>):</p>

<pre><code>aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' --subnet-id $SUBNETID
</code></pre>

<p>This instance gets loaded with the SSH key that we uploaded earlier, named <code>brand-new-key</code>. It also gets the same environment tag, but with the convenience of adding it in the creation command, making a second command unnecessary. The <code>--subnet-id</code> argument specifies which subnet the networking interface should connect to. If we hadn't specified this, a subnet in the default VPC would have been picked. We now have a functioning VM, whose status we can query by listing through the resource group, and querying for the instance ID:</p>

<pre><code>INSTANCEID=$(aws ec2 describe-instances \
  --filter "Name=tag:Environment,Values=Demo" \
  --query "reverse(sort_by(Reservations, &amp;Instances[0].LaunchTime)) | [0].Instances[0].InstanceId" \
  --output text)
</code></pre>

<p>The query part of this command is again relatively complicated. The reason is that, if you create a couple of VMs and terminate them, they will still appear in the list of VMs when searched by tag. That's the reason we pick the VM that was last launched. When you create an instance and would like to know when it is actually running, you can use the handy <code>wait</code> feature, as follows:</p>

<pre><code>aws ec2 wait instance-running --instance-ids $INSTANCEID
</code></pre>

<p>Now if we run the command to list group resources, we should see three entries: A VPC, a subnet and an instance. If you inspect the EC2 instance with <code>aws ec2 describe-instances --instance-ids $INSTANCEID</code>, you can see a couple of fields that are interesting. There's the ID of course, and <code>PrivateDnsName</code>, but peculiarly no public IP or DNS. This is because the subnet was not configured to give its instances a public IP address on launch; you can see that this is so in the <code>MapPublicIpOnLaunch</code> field of the subnet we created, which is false. The instance we created is in a vacuum, as far as we are concerned, and cannot be contacted from anywhere. You can also see this by right clicking on the instance in the web GUI, and clicking connect. AWS will ask you to pick a method out of SSH client, web SSH client, or Java SSH client. Interestingly, the first of these shows the private IP of this instance (something like <code>10.0.1.12</code>), which is in the reserved range and cannot be used for internetworking. If you pick the second option, you will see an error message telling you that the instance does not have a public IP.</p>

<h4 id="openingasubnettotheouterworld">Opening a subnet to the outer world</h4>

<p>We need to modify and extend our basic subnet in two ways in order for the instances connected to it to communicate with the internet. The first is a gateway. An <a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html">internet gateway</a> acts as a target for internet-routable traffic, and takes care of NAT (Network Address Translation). You should not confuse an internet gateway with a <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html">NAT gateway</a>: The latter is used to connect instances in private subnets to the internet, while they are still unavailable to traffic from the outside. The default VPC has an internet gateway, as you would expect:</p>

<pre><code>DEFAULTGATEWAY=$(aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=$DEFAULTVPCID" \
  --query "InternetGateways[0].InternetGatewayId" --output text)
echo $DEFAULTGATEWAY
</code></pre>

<p>This should print the ID of the gateway used by the default VPC. A gateway is not automatically created for a VPC, however. Our new VPC is lacking one, which we can see using the following command:</p>

<pre><code>aws ec2 describe-internet-gateways --filters \
  "Name=attachment.vpc-id,Values=$VPCID"
</code></pre>

<p>This should return an empty list. We can create a brand new gateway for our VPC with the following commands:</p>

<pre><code>GATEWAYID=$(aws ec2 create-internet-gateway --query \
  "InternetGateway.InternetGatewayId" --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID \
  --internet-gateway-id $GATEWAYID
</code></pre>

<p>Now we have a gateway that is attached to our VPC. The next thing we need is a means for the networking logic to route the requests that are meant for the internet through this gateway. This is the job of the route table. Every VPC comes with a default route table (see <a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html">here</a> for details). We can see how these rules look by first looking at the settings for the default VPC and its subnets:</p>

<pre><code>aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$DEFAULTVPCID"
</code></pre>

<p>In the <code>Routes</code> entry of the resulting output, you should be able to see two entries. The first of these has the field <code>DestinationCidrBlock</code> set to <code>172.31.0.0/16</code>, which is the CIDR of the VPC itself (you can verify this with the command <code>aws ec2 describe-vpcs --vpc-id $DEFAULTVPCID --query "Vpcs[0].CidrBlock"</code>). The <code>GatewayId</code> of this rule is <code>local</code>, meaning that it will route traffic locally. The second rule has <code>0.0.0.0/0</code> as <code>DestinationCidrBlock</code>, and its <code>GatewayId</code> is equal to the <code>DEFAULTGATEWAY</code>. Since the rules in a routing table take precedence in order of specificity, this second rule will be valid for all requests that are not meant for the VPC IP range. Since, as mentioned above, every VPC has a route table, we do not need to create a new one, and can instead modify the existing route table:</p>

<pre><code>ROUTETABLEID=$(aws ec2 describe-route-tables \
  --filter "Name=vpc-id,Values=$VPCID" \
  --query "RouteTables[0].RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID \
  --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID
</code></pre>
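
<p>To see why the new <code>0.0.0.0/0</code> route does not swallow the traffic meant for the VPC itself, here is a small simulation of picking the most specific matching route. It is a sketch of the general longest-prefix rule, not of how AWS implements routing internally; the function names and the gateway ID <code>igw-demo</code> are made up.</p>

```shell
# Convert a dotted IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

# Number of addresses in a block with the given prefix length.
cidr_size() {
  bits=$((32 - $1))
  size=1
  while [ "$bits" -gt 0 ]; do
    size=$((size * 2))
    bits=$((bits - 1))
  done
  echo "$size"
}

# Print the target of the most specific route that matches the destination.
# Usage: route_target DESTINATION ROUTE..., each route written as CIDR=TARGET.
route_target() {
  dest=$(ip_to_int "$1")
  shift
  best_prefix=-1
  best_target=none
  for route in "$@"; do
    cidr=${route%%=*}
    target=${route#*=}
    base=$(ip_to_int "${cidr%/*}")
    prefix=${cidr#*/}
    size=$(cidr_size "$prefix")
    # The route matches if destination and base fall into the same block;
    # among the matching routes, the longest prefix wins.
    if [ $((dest / size)) -eq $((base / size)) ]; then
      if [ "$prefix" -gt "$best_prefix" ]; then
        best_prefix=$prefix
        best_target=$target
      fi
    fi
  done
  echo "$best_target"
}
```

<p>With the two routes of our table, <code>route_target 10.0.1.12 10.0.0.0/16=local 0.0.0.0/0=igw-demo</code> prints <code>local</code>, whereas <code>route_target 8.8.8.8 10.0.0.0/16=local 0.0.0.0/0=igw-demo</code> prints <code>igw-demo</code>.</p>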

<p>With the last <code>create-route</code> command, we are telling the network to route requests that are not to an interface in the VPC to the gateway defined by the <code>GATEWAYID</code>. As we are modifying the default route table, there is no need to explicitly associate the route table with the subnets of the VPC which we want to make public, because in the absence of explicit associations, subnets use the default route table. This association is also not displayed in the result of <code>aws ec2 describe-route-tables</code>, which is the reason we cannot demo it for the default network. If it were the case that we were creating a new routing table, however, the following command would have been necessary for such an association:</p>

<pre><code>aws ec2 associate-route-table  --subnet-id $SUBNETID \
  --route-table-id $ROUTETABLEID
</code></pre>

<p>One last step is necessary to make sure that the instances we start in the subnet are getting public IPs. The following will modify the subnet to make sure that is the case:</p>

<pre><code>aws ec2 modify-subnet-attribute --subnet-id $SUBNETID \
  --map-public-ip-on-launch
</code></pre>

<p>Normally (as in, in most cases, and definitely for the default VPC), an instance that gets a public IP address is also given a public DNS; this public DNS of an instance can be queried through the <code>PublicDnsName</code> field. Sometimes, however (the <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames">documentation</a> is not clear on when and how), the relevant fields on the VPC are not set properly on creation. In order to make sure that your instance gets not only an IP address but also a DNS, you should set the proper configuration values with the following commands:</p>

<pre><code>aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support
</code></pre>
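
<p>Whether the two attributes are set as expected can be read back with <code>describe-vpc-attribute</code>, which accepts one attribute per call. The wrapper below is a sketch with an invented name.</p>

```shell
# Hypothetical helper: print the DNS hostnames and DNS support flags of a VPC,
# one value per line, in that order.
vpc_dns_flags() {
  aws ec2 describe-vpc-attribute --vpc-id "$1" --attribute enableDnsHostnames \
    --query "EnableDnsHostnames.Value" --output text
  aws ec2 describe-vpc-attribute --vpc-id "$1" --attribute enableDnsSupport \
    --query "EnableDnsSupport.Value" --output text
}
```

<p>After the two <code>modify-vpc-attribute</code> calls, <code>vpc_dns_flags $VPCID</code> should report both flags as enabled.</p>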

<p>As far as I can understand from <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html">the documentation</a>, it is not possible to attach a public IP to a running instance from the subnet pool. You can use an <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html">Elastic IP</a>, but that's out of scope for this post. Instead, we will simply delete the running instance, and create a new one:</p>

<pre><code>aws ec2 terminate-instances --instance-ids $INSTANCEID
INSTANCEID=$(aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' \
    --subnet-id $SUBNETID --query "Instances[0].InstanceId" --output text)
aws ec2 wait instance-running --instance-ids $INSTANCEID
</code></pre>

<p>Let's check whether our instance now has a public IP address and DNS:</p>

<pre><code>IPADDRESS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicIpAddress" --output text)
PUBLICDNS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicDnsName" --output text)
</code></pre>

<p><code>IPADDRESS</code> should now be a proper IP address, and <code>PUBLICDNS</code> should be a hostname that resolves to that IP address. Since we already waited for the instance to start, you can, at least in principle, contact it via SSH with <code>ssh ubuntu@$IPADDRESS</code> or <code>ssh ubuntu@$PUBLICDNS</code>. If you try this now, however, you will again face an empty line, without a response from the new server. The reason for this silence is that the default security rules do not allow inbound traffic to this instance. AWS security groups are a means of controlling the traffic between EC2 instances and the internet. A new VPC has a default security group, which also has default rules. These default rules allow all outgoing connections (and the incoming responses these cause), and all connections between instances in the same security group, but nothing else. Since we did not create a new security group (which we could have done with <code>aws ec2 create-security-group</code>), the new instance has been automatically connected to the default security group of the VPC. All is not lost, though: if we change the rules of the security group, the change is applied immediately to any new connections. Let's modify the security group rules, and allow TCP connections from all IP addresses on the default SSH port:</p>

<pre><code>SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
</code></pre>

<p>See <a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html">here</a> for more on security groups. Now you should be able to access the VM on the public IP address or DNS.</p>

<h3 id="cleanup">Cleanup</h3>

<p>Cleaning up is relatively straightforward if you have access to the shell session with the variables that store the resource IDs. Remove all the AWS resources we created with the following commands:</p>

<pre><code>aws ec2 terminate-instances --instance-ids $INSTANCEID
aws ec2 wait instance-terminated --instance-ids $INSTANCEID
aws ec2 delete-key-pair --key-name brand-new-key
aws ec2 detach-internet-gateway --internet-gateway-id $GATEWAYID \
  --vpc-id $VPCID
aws ec2 delete-internet-gateway --internet-gateway-id $GATEWAYID
aws ec2 delete-subnet --subnet-id $SUBNETID
aws ec2 delete-vpc --vpc-id $VPCID
aws resource-groups delete-group --group-name DemoEnvironment
</code></pre>

<p>You have to delete resources in this order, otherwise AWS will tell you that dependencies are being violated. If you don't have access to the IDs, you can either query the individual elements via the CLI using the <code>Environment</code> tag, or copy the IDs from the result of <code>aws resource-groups list-group-resources</code>. Unfortunately, as mentioned above, there is no easy command to delete all resources in a resource group. Even worse, there is no way to delete resources by ARN, which is the identifier output of this last command.</p>

<h3 id="conclusion">Conclusion</h3>

<p>The AWS CLI client is, as one would expect from the company that builds AWS, a solid piece of software. As you might have noticed from the command examples, there are some inconsistencies, such as differing names for the same kinds of arguments, or the issue with tags, but I think this is the least one would expect from a client that has to cover such a massive base of functionality. In the second part of this tutorial, we will be looking at creating a Fargate cluster using the CLI. The requirements will get more complicated as we try to create a scalable, decoupled application, and we will use many other AWS services to tackle them.</p>]]></content:encoded></item><item><title><![CDATA[An Introduction to Cython, the Secret Python Extension with Superpowers]]></title><description><![CDATA[<p>Cython is one of the best kept secrets of Python. It extends Python in a direction that addresses many of the shortcomings of the language and the platform, such as execution speed, GIL-free concurrency, absence of type checking and not creating an executable. It is a mature tool with a</p>]]></description><link>http://okigiveup.net/an-introduction-to-cython/</link><guid isPermaLink="false">e6734158-c05c-4da3-a14c-18a2dbfdb640</guid><category><![CDATA[python]]></category><category><![CDATA[cython]]></category><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Thu, 21 Feb 2019 14:09:55 GMT</pubDate><content:encoded><![CDATA[<p>Cython is one of the best kept secrets of Python. It extends Python in a direction that addresses many of the shortcomings of the language and the platform, such as execution speed, GIL-free concurrency, absence of type checking and not creating an executable. 
It is a mature tool with a number of widely used packages that are written in it, such as <a href="https://github.com/explosion/spaCy">spaCy</a>, <a href="https://github.com/MagicStack/uvloop">uvloop</a>, and significant parts of <a href="https://github.com/scikit-learn/scikit-learn">scikit-learn</a>, <a href="http://www.numpy.org/">NumPy</a> and <a href="https://pandas.pydata.org/">pandas</a>. It smoothly hooks into the latter two, giving you access to underlying data structures in a straightforward way. All these superpowers come with the baggage of certain parts of C, however, which makes the learning curve a bit steep for those who don't know C. In this tutorial, I will give an overview of working with Cython, focusing on the parts of C that are relevant.</p>

<p>The requirements for running the code samples in this tutorial are Python 3 (preferably 3.6), a C compiler (GCC or Clang should do), and virtualenv or pipenv. Once you have these, getting Cython is as easy as creating a pipenv or a virtualenv, and within that environment, running <code>pip install Cython</code>.</p>

<h3 id="whatiscythonandwhatisitnot">What is Cython, and what is it not?</h3>

<p>Cython is built on the fact that Python the language (or at least the CPython implementation of it) is built on top of a C API that is instrumented through an intermediate language. The way CPython works is by compiling Python code into a <a href="https://docs.python.org/3.6/glossary.html#term-bytecode">bytecode</a> representation, and then executing the result on the virtual machine at runtime. Individual instructions of this bytecode consist of an opcode and a reference to any arguments. You can see the existing opcodes in the file <a href="https://github.com/python/cpython/blob/v3.6.8/Include/opcode.h">opcode.h</a>. In the main evaluation loop of the runtime, the opcode of an instruction is used to determine what to do next, in the form of <a href="https://github.com/python/cpython/blob/v3.6.8/Python/ceval.c#L2570">one big switch statement</a>. For example, if the opcode is <code>BUILD_LIST</code>, specifying the construction of a new list, the <a href="https://github.com/python/cpython/blob/v3.6.8/Python/ceval.c#L2570"><code>PyList_New</code> function is called</a> with the appropriate arguments. Obviously, there is a great deal of work the Python runtime is doing around this evaluation loop, such as garbage collection and error handling. Another large piece of the runtime work pertains to the dynamic nature of Python, where the actual methods to run have to be figured out at runtime, based on the object structure available. For example, when you try to multiply two numbers, Python has to figure out whether these are floats, integers etc., or the multiplication operator has been overloaded as with the string type. In many other languages, explicit type information helps the compiler figure this out at compile time, leading to faster compiled code. This distinction is known as <a href="https://cython.readthedocs.io/en/latest/src/userguide/early_binding_for_speed.html">early vs. 
late binding</a>, and is the source of one of the major performance gains one can achieve by using Cython.</p>
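
<p>To get a feel for the bytecode mentioned above, you can inspect it with the standard library <code>dis</code> module. The following is a minimal sketch; <code>make_list</code> is a made-up function for illustration, and the exact list of instructions differs between Python versions, although <code>BUILD_LIST</code> should show up in all of them:</p>

<pre><code class="language-python">import dis

def make_list(a, b):
    # A list display like this one is compiled to a BUILD_LIST instruction
    return [a, b]

# Collect the names of the opcodes CPython generated for make_list
opnames = [instruction.opname for instruction in dis.get_instructions(make_list)]
print(opnames)
</code></pre>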

<p>Cython makes use of the architectural organization of Python by translating (or 'transpiling', as it is now called) a Python file into the C equivalent of what the Python runtime would be doing, and compiling this into machine code; the result can be a Python extension which can be dynamically loaded, or an actual executable. The resulting module makes calls to the Python runtime in order to deal with things like the dispatch mentioned above, which means that straightforward Python code will not be executing much faster than it would anyway. The speed difference becomes significant when you code using Cython-specific constructs that are transpiled directly to their C equivalents, thereby avoiding the Python runtime. We will see what these constructs are in a minute, but let's start with a simple example to get used to working with the Cython toolchain.</p>

<h3 id="startingoff">Starting off</h3>

<p>The first thing you need to do is to install Cython, obviously. Please refer to the <a href="https://cython.readthedocs.io/en/latest/src/quickstart/install.html">official Cython documentation</a> for the installation instructions. You will also need a C compiler; GCC on Linux and Clang on Mac should do. Once you have these two, we can start with the usual hello world. Save the following in a file named <code>hello_world.pyx</code> (or simply clone and use the <a href="https://github.com/afroisalreadyinu/cython-samples">sample code repository</a>):</p>

<pre><code class="language-python">def say_hello():  
    print("Hello world from Cython!")
</code></pre>

<p>And then execute the command <code>cython hello_world.pyx</code>. You should end up with a file named <code>hello_world.c</code>. This file can be compiled into a shared library with the following command:</p>

<pre><code class="language-bash">gcc -shared -fPIC `pkg-config --cflags python-3.6m` hello_world.c -o hello_world.so  
</code></pre>

<p>Your Python extension should now be ready, in the form of a file named <code>hello_world.so</code>. You should be able to drop into a Python shell in the same directory with this file, import it with a simple <code>import hello_world</code>, and then run <code>hello_world.say_hello()</code>, the output of which should be "Hello world from Cython!". Voila, your first Cython extension.</p>

<p>Now let's have a quick look at the <code>hello_world.c</code> file. The file is pretty big, and fortunately, you don't need to understand any of it to work with Cython. Still, it's interesting to have a look at what Cython did to your Python code. Go ahead and search for <code>Hello world from Cython</code> in this file (there is a much easier way to compare the generated code with the original, which we will see in a minute). You will see that Cython has generated C functions for the high-level <code>say_hello</code> and the inner <code>print</code> functions, and also annotated these by marking in comments what they correspond to in the original code. The call to <code>print</code> has been moved into a separate function (named along the lines of <code>__pyx_pf_11hello_world_say_hello</code>; the exact name can vary with the Cython version). It makes a call to <code>__Pyx_PrintOne</code>, which is a wrapper for the Python <code>print</code>, and deals with various error conditions and return values using C functions and macros.</p>

<h3 id="easiercompilationofcythonextensions">Easier compilation of Cython extensions</h3>

<p>There is a much easier way to turn Cython code into native modules, involving the distutils core module that is responsible for building and installing modules in Python. This will allow us to delegate the trans- &amp; compilation to Cython and distutils, and be able to import the file like a normal Python module. In order to do so, you need to create a <code>setup.py</code> file with the following contents in the same directory as <code>hello_world.pyx</code>:</p>

<pre><code class="language-python">from distutils.core import setup  
from Cython.Build import cythonize  
setup(ext_modules = cythonize("*.pyx", annotate=True))  
</code></pre>

<p>The role of the <code>annotate</code> argument will be explained in a bit. After running the command <code>python setup.py build_ext --inplace</code> to build the extension module, you should be able to import <code>hello_world</code> and call <code>hello_world.say_hello()</code>, getting the same result as above. From now on, whenever you make a change to a Cython file, you need to run the above command, which will build all the Cython files that have changed.</p>

<h3 id="pythonisvalidcython">Python is valid Cython</h3>

<p>As you can see with the hello world example, Cython accepts and correctly processes the Python you are used to writing every day. It was mentioned above that there is an easy way to view the C code generated by Cython from the input, and also that the <code>annotate</code> argument would be explained later. If you compiled <code>hello_world.pyx</code> using distutils as explained above, you should now see a file named <code>hello_world.html</code> in the same directory. This file contains a visual display of the Python code Cython has transformed, with the lines that require access to the Python interpreter colored in yellow. The darker the background yellow of a line, the more Python runtime interaction it contains. There is also a plus sign to the left of every yellow-tinged line, expanding to the C code that was generated by Cython as translation. Generally, the aim when converting Python to Cython for purposes of optimization is to decrease the number of yellow lines, or at least lighten the hue of their yellow. This will ensure that your code is as close as possible to a C version, and thus &#x2013;probably&#x2013; faster.</p>

<h3 id="thesimplethingsvariablesfunctionsloops">The simple things: Variables, Functions, Loops</h3>

<p>Now let's take some Python code that is somewhat slow, turn it into Cython, and make it faster by annotating it with Cython-specific statements, decreasing the amount of yellow in the annotation. We will use this approach to write a fast version of the <a href="https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</a>, with which one can find the prime numbers up to a certain limit. Here is an implementation in pure Python:</p>

<pre><code class="language-python">import math

def sieve(up_to):  
    primes = [True for _ in range(up_to+1)]
    primes[0] = primes[1] = False
    upper_limit = int(math.sqrt(up_to))
    for i in range(2, upper_limit+1):
        if not primes[i]:
            continue
        for j in range(2*i, up_to+1, i):
            primes[j] = False
    return [x for x in range(2, up_to+1) if primes[x]]
</code></pre>

<p>Let's find out how fast this code is, by saving it in the file <code>sieve_python.py</code>, and benchmarking it with the handy <code>timeit</code> module, as follows:</p>

<pre><code class="language-bash">python -m timeit "import sieve_python; sieve_python.sieve(200000)"  
</code></pre>

<p>On my computer, the output is <code>10 loops, best of 3: 27.9 msec per loop</code>. Admittedly, this does not look like a slow way of computing 17984 primes (computers are freaking fast), but it already gives us a good starting point.</p>

<p><strong>Small note on timeit</strong>: The timeit module will run the piece of code you are timing in loops that increase in number of steps, until <a href="https://github.com/python/cpython/blob/v3.6.8/Lib/timeit.py#L316">execution time exceeds 0.2 seconds</a>. So don't be surprised if different attempts lead to a different number of loops. From here on, I will omit the complete output, and report only the best time for each timing run. Also, in the following, benchmarking and timing will be used to mean the same thing, namely measuring the running time of a piece of code.</p>
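
<p>The same kind of measurement can also be made from within Python. Here is a small sketch using <code>timeit.Timer.autorange</code> (available since Python 3.6), which picks the number of loops in the way described above; the timed statement is just a placeholder, and the numbers will vary by machine:</p>

<pre><code class="language-python">import timeit

# Placeholder statement; replace it with the code you want to time
timer = timeit.Timer("sum(range(1000))")

# autorange keeps increasing the loop count until a run takes at least 0.2 seconds
loops, total_seconds = timer.autorange()
print(loops, "loops,", total_seconds / loops * 1000, "msec per loop")
</code></pre>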

<p><strong>Small note on benchmarking</strong>: Since building the extensions and benchmarking a module is an operation we will repeat frequently in the following, I have added a bash script named <code>benchmark.sh</code> that does this for you. It accepts the name of the module as its single argument, as in <code>./benchmark.sh sieve_python</code>.</p>

<h3 id="firstattemptatcythonizing">First attempt at Cythonizing</h3>

<p>Now let's copy the contents of <code>sieve_python.py</code> to a Cython file named e.g. <code>sieve_naive.pyx</code> (also available in the sample code repo), build it as an extension, and benchmark it with <code>./benchmark.sh sieve_naive</code>. On my computer, this leads to an average runtime of 17.2 msec, which is already an improvement over the pure Python version. Still, an improvement of 40% is not really worth our efforts. If we have a look at the annotation file in <code>sieve_naive.html</code>, we can see that there is Python runtime interaction pretty much on every line, except for the line with <code>continue</code>, which is a keyword in C, too. In order to optimize the naive Cython code, we need to convert all of these lines to Cython-specific code that would be translated to pure C. Without further ado, here is the properly cythonized version, available as <code>sieve_cython.pyx</code> in the sample repo (discussion of the changes will follow):</p>

<pre><code class="language-python">from libc.math cimport sqrt  
from libc.stdlib cimport malloc, free

def sieve(up_to):  
    cdef bint *primes = _sieve(up_to)
    response = [x for x in range(up_to+1) if primes[x]]
    free(primes)
    return response

cdef bint *_sieve(int up_to):  
    cdef int i, j
    cdef bint *primes = &lt;bint *&gt;malloc((up_to+1) * sizeof(bint))
    for i in range(up_to+1):
        primes[i] = 1
    primes[0] = primes[1] = False
    cdef int upper_limit = int(sqrt(up_to))
    for i in range(2, upper_limit+1):
        if not primes[i]:
            continue
        j = 2*i
        while j &lt; up_to + 1:
            primes[j] = False
            j += i
    return primes
</code></pre>

<p>When we benchmark this new version with <code>./benchmark.sh sieve_cython</code>, we achieve a considerable improvement in performance: 4.24 msec, which is 6.5 times faster than the Python version. Now let's dive into the Cython features we used to achieve this speed-up.</p>

<h6 id="1interfaceseparation">1. Interface separation</h6>

<p>Cython offers <a href="https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#python-functions-vs-c-functions">two fundamental ways of defining functions</a>: With the usual def (Python functions) vs. with cdef (C functions). The common pattern of organizing Cython code is separating the interface and computation functions, and then writing them as Python and C functions respectively. Python functions have the exact same facilities as in pure Python, and are accessible from the Python runtime. They are mostly responsible for any conversion of arguments &amp; return values to and from C types, in addition to Python-specific things like exception handling. You can nevertheless use type-annotated variables as arguments or within them (such as the <code>cdef bint *primes</code> above). C functions, on the other hand, which are declared using the Cython-specific <code>cdef</code> keyword, are transpiled directly to C functions. These functions (from here on called cdef functions) can be declared exactly the way Python ones are (optional arguments, Python objects as argument and return types etc.), but these features lead to runtime interaction. In order to circumvent this, cdef functions can be annotated with types in the signature and in the variables used. The <code>_sieve</code> function from the above code example, for example, does not accept, return or process any Python objects; all arguments and variables are annotated with C types, as well as the return value.</p>

<p>cdef functions have a couple of oddities you need to consider, however. The default return type of cdef functions is Python object, and they will convert whatever is returned into one. It therefore makes sense to declare some return type to avoid Python runtime interaction. Also, within them, rules of the C world are valid, meaning that integer division and overflow function differently. Especially if your code is doing intensive numeric calculation, you should watch out for these catches.</p>
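
<p>To illustrate these catches without leaving Python, the following sketch imitates a 32-bit C int with <code>ctypes.c_int32</code>, and C's truncating division with <code>int(a / b)</code>. It is only an imitation, not Cython code. Also note that, as far as I know, Cython keeps Python's rounding for <code>//</code> on C integers unless the <code>cdivision</code> directive is turned on, so it pays to check which semantics you are actually getting:</p>

<pre><code class="language-python">import ctypes

# Python integers grow as needed; a 32-bit C int wraps around on overflow
python_sum = 2147483647 + 1
c_sum = ctypes.c_int32(2147483647 + 1).value

# Python floors integer division; C truncates toward zero
python_quotient = -7 // 2
c_style_quotient = int(-7 / 2)

print(python_sum, c_sum)
print(python_quotient, c_style_quotient)
</code></pre>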

<p>A look at the annotation file <code>sieve_cython.html</code> is rather revealing. You can see pretty clearly that the <code>_sieve</code> function has been completely converted to C, with no yellow lines. The four-line <code>sieve</code> function, on the other hand, is yellow except for the call to <code>free</code>. If you expand the third line of this function (line 6 in the whole file), you can get a very good view of what Cython is doing in the background. The one-line Python statement that creates a new list from a comprehension over a <code>range</code> call has led to 58 lines of error handling, resource management and Python interaction. The above mentioned <code>PyList_New</code>, for example, is called to create a new list.</p>

<p>There is a third type of function you can define with <code>cpdef</code> instead of <code>cdef</code>, which is a hybrid between Python and C functions. In the background, Cython will actually define two functions, a Python and a C one. Calls from Python runtime will be directed to the Python function, whereas calls from within Cython-generated code will be directed to the C function. Therefore, you will be getting the benefit of C optimization when your code is called from other C functions, whereas the Python function will still be available as an interface.</p>

<h6 id="2typeannotations">2. Type annotations</h6>

<p>The type annotations used in Cython are <a href="http://docs.cython.org/en/latest/src/userguide/language_basics.html#c-variable-and-type-definitions">relatively straightforward</a>, especially if you are acquainted with C types. Any C type can be used as a valid type, including pointer types. Variables with C types have to be defined using the <code>cdef</code> keyword, which can be done in both kinds of functions we have seen. One special type used also in <code>sieve_cython.pyx</code> is <code>bint</code>, which is a normal <code>int</code> in C code, but is converted to the Python boolean when necessary. If a variable is defined in the usual Python way without type information, it's assumed to be a Python object. When you build a Python object out of C types, as on line 9 of <code>sieve_cython.pyx</code>, Cython will do certain kinds of type conversions on the fly, without asking for more information. Similar conversions are possible also the other way around, from Python to C. You can see which conversions are possible, and how they work in the documentation on <a href="https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#type-conversion">type conversions</a>.</p>

<h6 id="3clibraries">3. C libraries</h6>

<p>Instead of importing and using the Python math module, <code>sieve_cython.pyx</code> uses the <a href="https://en.wikibooks.org/wiki/C_Programming/math.h">C math library</a>, which makes type conversions unnecessary. Cython facilitates this step by providing the C libraries as Python-like imports, as in <code>from libc.math cimport sqrt</code>. The same is valid for the dynamic memory management routines malloc and free, which are discussed next.</p>

<h6 id="4memorymanagement">4. Memory management</h6>

<p>In C-land, memory demands much more of the programmer. The difficulty arises from the interaction of <a href="https://en.wikipedia.org/wiki/C_dynamic_memory_allocation">dynamic memory management</a> and the <a href="https://www.gribblelab.org/CBootCamp/7_Memory_Stack_vs_Heap.html">stack vs. heap distinction</a>. Data allocated on the stack is automatically managed, i.e. removed when the reference goes out of scope. If you want to keep a reference to a piece of data after its initial context is gone, however, you will need to allocate it using the notorious <code>malloc</code>, and then return the memory back with <code>free</code>. In our case, we need to keep a reference to the <code>primes</code> list after <code>_sieve</code> is done; this is the reason it is created using <code>malloc</code>, which takes as input the amount of memory that needs to be set aside, and returns a pointer to it. The topics of dynamic memory management and pointers are very C-specific, but they are relatively straightforward, so any decent book on C should provide enough information to set you up for their use in Cython.</p>
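
<p>If you want to try out the <code>malloc</code> and <code>free</code> pairing before writing any Cython, it can be imitated from plain Python with <code>ctypes</code>. This is only a sketch, assuming a POSIX system (Linux or Mac) where the C library can be loaded this way; the variable names are made up for the example:</p>

<pre><code class="language-python">import ctypes
import ctypes.util

# Load the C library; assumes a POSIX system such as Linux or Mac
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

# Ask for enough memory to hold ten C ints
count = 10
address = libc.malloc(count * ctypes.sizeof(ctypes.c_int))
if address is None:
    raise MemoryError("malloc failed")

# Treat the raw address as a pointer to int, just like the cast in _sieve
flags = ctypes.cast(address, ctypes.POINTER(ctypes.c_int))
for i in range(count):
    flags[i] = 1
flags[0] = flags[1] = 0
total = sum(flags[i] for i in range(count))

# The memory stays around until we hand it back explicitly
libc.free(address)
print(total)
</code></pre>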

<h6 id="5loops">5. Loops</h6>

<p>Python has numerous facilities for patterns that are handled using a single tool in C, the for loop. In the pure Python implementation of the sieve, we have three loops: A list comprehension (line 7 of <code>sieve_naive.pyx</code>) and two for-in loops, all three with a <code>range</code> call. In C, all of these have to be written using a for loop with auxiliary variables and all, and Cython does its best to convert them properly, both the loop and the <code>range</code> call. In the very first case, we don't really care that much about what the code is converted to, as it's an interface function, and a list comprehension is mixed with type conversion and range, but I would like to note that the call to <code>range</code> is of a simple form, with a start and an end index. The second call to <code>range</code> (on line 10) is also of the same form. This form can be <a href="http://docs.cython.org/en/latest/src/userguide/pyrex_differences.html#automatic-range-conversion">converted relatively easily</a> into a for loop by Cython. The last use of range, however, involves a step argument (on line 13), which means that it would require Python runtime interaction, which we want to avoid. For this reason, in the Cython-optimized version, this third call, together with the for loop, has been turned into a while loop with explicit loop counter increment to achieve the same functionality.</p>

<h3 id="definingandcollectingnewtypeswithcython">Defining and Collecting New Types with Cython</h3>

<p>The above example is relatively straightforward, since it uses only built-in
Python and C types. To delve into more complex use cases, I will use as an
example <a href="http://www.aiai.ed.ac.uk/~gwickler/eightpuzzle-uninf.html">the 8-puzzle</a>, a simple task used to demonstrate search
algorithms. The 8-puzzle involves integers from 1 to 8 placed on a grid, with
one empty cell. Moves can be made by sliding one number vertically or
horizontally to the empty cell. The board is considered "solved" if the numbers
are in order, and the last cell is the empty one. In the sample repository, you
can see the solutions I already wrote in Python and C. Both require a start
state as the single argument, for which there is a sample in the file
<code>state.txt</code>. In both implementations, the state of the puzzle board is represented
as a 2-dimensional array of integers, with 0 representing the empty cell.
Breadth-first search is used to find a sequence of moves that solves the puzzle,
printing this sequence in reverse. The set of board states that were already
seen is represented as a trie, as membership in this set is checked frequently,
and must be appropriately fast.</p>
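
<p>To make the search itself concrete, here is a condensed sketch of breadth-first search on the 8-puzzle. It is not the code from the sample repository: the names <code>children</code> and <code>solve</code> are made up for this illustration, boards are tuples of tuples, and a plain dict of parent boards stands in for the trie of seen states:</p>

<pre><code class="language-python">from collections import deque

GOAL = ((1, 2, 3), (4, 5, 6), (7, 8, 0))

def children(board):
    """Yield every board reachable by sliding one tile into the empty cell."""
    row, col = next((r, c) for r in range(3) for c in range(3)
                    if board[r][c] == 0)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        # Skip moves that would leave the 3x3 grid
        if new_row in range(3) and new_col in range(3):
            cells = [list(line) for line in board]
            cells[row][col] = cells[new_row][new_col]
            cells[new_row][new_col] = 0
            yield tuple(tuple(line) for line in cells)

def solve(start):
    """Breadth-first search; returns the boards from start to GOAL, or None."""
    parents = {start: None}  # doubles as the set of boards already seen
    queue = deque([start])
    while queue:
        board = queue.popleft()
        if board == GOAL:
            path = []
            while board is not None:
                path.append(board)
                board = parents[board]
            return path[::-1]
        for child in children(board):
            if child not in parents:
                parents[child] = board
                queue.append(child)
    return None

path = solve(((1, 2, 3), (4, 5, 6), (0, 7, 8)))
print(len(path) - 1, "moves")
</code></pre>

<p>Since breadth-first search visits boards in order of their distance from the start, the first path it finds is also a shortest one.</p>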

<p>Let's start off with benchmarking. As we are comparing an executable with a
Python script in this step, the <code>timeit</code> module won't cut it; we will have to
resort to a more general solution. The most comprehensive Linux tool for this
purpose is <code>perf</code>, but I should warn you that on Ubuntu it depends on kernel
tools specific to the kernel version, which caused quite some headache for me,
so proceed at your own risk. With that out of the way, here is how to benchmark
the Python script:</p>

<pre><code class="language-bash">perf stat -r 10 python eight.py start.txt &gt; /dev/null
</code></pre>

<p>On the last line, I can see that it took 0.201 seconds on average on my
computer. I also gave PyPy a try for completeness' sake, simply replacing
<code>python</code> with <code>pypy</code> in the above command, which resulted in 0.360 seconds on
average, surprisingly slower than Python itself. The C code delivers the
goodies, however: When benchmarked, the average runtime with the same input is 8
msec, nearly 30 times faster than Python. The question is now whether we can get
close to it using Cython.</p>

<h4 id="firstiterationofeightpuzzleincython">First iteration of eight puzzle in Cython</h4>

<p>As with the previous example, we will start by simply copying the Python solution to <code>eight_cython.pyx</code>, and using it as a starting point. In order to get this file compiled into an executable, we need to use the <code>cython</code> CLI command instead of the extension building mechanism, and pass it an extra argument, as in <code>cython eight_cython.pyx --embed</code>. Don't forget to add <code>--annotate</code> to the mix if you want the annotation file. The resulting <code>eight_cython.c</code> file can be compiled into an executable with the following command (given that you have Python 3.6 and the relevant development package installed):</p>

<pre><code class="language-bash">gcc `pkg-config --cflags python-3.6m` eight_cython.c -lpython3.6m -o eight_cython  
</code></pre>

<p>When benchmarked, this file already leads to a significant improvement, clocking at 200 msec, 23 msec less than letting the Python runtime execute the script. There is a lot to be done to speed this up, however, which I went ahead and did already. You can see the results in the file <code>eight_cython.pyx</code>. This version is considerably faster: With an average runtime of 43 msec, it is five times faster than the pure Python version. There were two major improvements that were used to achieve this speedup, explained in the following.</p>

<h6 id="1classvsstruct">1. Class vs Struct</h6>

<p>The main data structure in the Python version is the <code>State</code> class that wraps the integer array and provides the necessary methods for generating children, checking final status etc. Cython can definitely deal with this class, as we saw in the naive attempt, but it will be approximately as slow as the Python version, so we have to convert it to something else. We have two options for what this something can be: C structs and extension types. The former is the plain old C struct that can be declared in Cython using the <code>cdef struct</code> construct. Here is an example from <code>eight_cython.pyx</code>:</p>

<pre><code class="language-python">cdef struct BoardPosition:  
    int row
    int column
</code></pre>

<p><a href="https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#extension-types">Extension types</a>, on the other hand, are a Python-runtime related construct. Their instances are proper Python objects, managed by the Python garbage collector, and they can be subclassed in the usual way by other extension types or Python classes. You cannot add new attributes at runtime, however, as these are stored directly on the C struct that represents an instance object. The attributes are also not accessible to other classes or functions unless explicitly declared to be so. The way special methods of extension types function also differs in <a href="https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#special-methods">subtle ways</a>; you should check in the list whether your expectations are met before you implement one. In <code>eight_cython.pyx</code>, the <code>State</code> class is declared as an extension type using the <code>cdef</code> keyword, and it has the attributes <code>board</code>, <code>_zero_index</code> and <code>parent</code>:</p>

<pre><code class="language-python">cdef class State:  
    cdef int **board
    cdef BoardPosition _zero_index
    cdef public State parent
</code></pre>

<p>As you can see, the two-dimensional array <code>board</code> is declared exactly as it would be in C, but the reference to the parent <code>State</code> is not a pointer, as it would have been in C. Cython manages references to Python objects (extension types or normal classes) as pointers in the background; you don't have to declare them as such. The <code>State</code> extension type has methods defined with either <code>def</code> or <code>cdef</code>; the same conditions as above are valid, but you cannot declare any special methods (those wrapped in double underscores) with <code>cdef</code>. Also, it is not possible to turn <code>cdef</code> methods into properties with the usual <code>@property</code> decorator (as would have been convenient with the <code>zero_index</code> method).</p>

<p>One special method of <code>State</code>, <code>__cinit__</code>, deserves particular attention. This method is called once when an instance of an extension type is allocated, and it is the place where code that allocates any further C data structures belongs. In our case, the <code>board</code> array is allocated here. There is a corresponding <code>__dealloc__</code> special method where you can free resources allocated in <code>__cinit__</code>. This is also implemented in <code>State</code> for completeness' sake.</p>

<h6 id="2carraysversuslists">2. C arrays versus lists</h6>

<p>In the <code>State.children</code> method, we need to collect structures for representing the two following things:</p>

<ul>
<li><p>Possible ways of swapping positions on the eight board, depending on where the empty cell is (as <code>BoardPosition</code>)</p></li>
<li><p>Child states that result from these swaps (as <code>State</code>).</p></li>
</ul>
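
<p>To make the first item concrete, here is a minimal pure-Python sketch of how the possible swaps could be computed from the position of the empty cell. The function name <code>swap_positions</code> and the use of plain tuples are assumptions made for illustration; the actual <code>State.children</code> method works with <code>BoardPosition</code> structs instead.</p>

<pre><code class="language-python">SIZE = 3  # rows and columns on the eight board

def swap_positions(row, column):
    """Return the (row, column) cells the empty cell can be swapped with."""
    candidates = [
        (row - 1, column),  # up
        (row + 1, column),  # down
        (row, column - 1),  # left
        (row, column + 1),  # right
    ]
    # Keep only the candidates that are still on the board.
    return [(r, c) for r, c in candidates
            if r in range(SIZE) and c in range(SIZE)]

print(swap_positions(0, 0))  # a corner cell has two neighbours
print(swap_positions(1, 1))  # the centre cell has four
</code></pre>

<p>Each position returned this way corresponds to one child state, so a state has between two and four children.</p>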

<p>When it comes to storing such structures, the simplest option is using the tried and proven Python list. Conveniently, basic Python collection types (list, dict, tuple and set) can be used as a type in <code>cdef</code> functions. The problem with the list structure, however, is that it leads to Python runtime interaction, and is accordingly slow. Therefore, in performance-critical situations, it is advisable to use C arrays instead. It is the bread and butter of C programming to allocate arrays of structs and iterate over these in every which way possible, and it is not any more difficult in Cython to do so; you can see how it is done with the array of <code>BoardPosition</code> structs in the <code>State.children</code> method. The situation is more complicated with the child states, however, as these are represented with the <code>State</code> extension type. Since extension types are managed by the Python garbage collector, and they can be referred to only as pointers to Python objects, their storage in arrays as C pointers is rather complicated, involving casts and calls into the reference counting mechanism of Python. For this reason, in order not to complicate the code too much, I opted for the simpler list type in the first iteration. There will later be a more detailed discussion of storing pointers to extension types in arrays.</p>

<h6 id="3usingdeftodeclareaconstant">3. Using DEF to declare a constant</h6>

<p>Cython allows C-style constants with <a href="https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#conditional-compilation">the <code>DEF</code> directive</a>. As in C, any values defined this way are replaced within the code at compile time. You have to be careful, though, as only the basic types int, long, float, bytes and unicode can be declared as constants. The number of rows and columns on the eight board is declared in <code>eight_cython.pyx</code> as a constant with <code>DEF SIZE = 3</code>. Cython also allows conditional compilation with the <code>IF</code> directive.</p>

<h3 id="profilingcython">Profiling Cython</h3>

<p>Comparing the performance of the C version (9 msec) with <code>eight_cython.pyx</code> (43 msec), we can see that there is still room for improvement to reach C level performance. This is also obvious in the annotation file, which is deep yellow in many parts. To find out which bottleneck to tackle next, however, the annotation file is not enough, as it shows us only how much Python runtime interaction a line of code causes, and not how much it contributes to the total runtime. Profiling is what we need, and fortunately, Cython makes this extremely easy. You need to only add the following compiler directive to the top of a pyx file to make Cython generate profiling data which can be processed by the standard Python profilers:</p>

<pre><code># cython: profile=True
</code></pre>

<p>Another difference to normal operation is that you need to run the code under profiling as a module, and not from the command line, which means that the code has to be compiled as a module with the usual <code>python setup.py build_ext --inplace</code>. Now we are ready to profile our code by starting a Python shell and running the following:</p>

<pre><code class="language-python">import cProfile  
import eight_cython  
cProfile.run("eight_cython.main('start.txt')", sort='time')  
</code></pre>

<p>And here is the output I have received:</p>

<pre><code>     403088 function calls in 0.097 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    33453    0.018    0.000    0.022    0.000 eight_cython.pyx:59(add_or_get_child)
    44935    0.017    0.000    0.021    0.000 eight_cython.pyx:67(get_child)
        1    0.009    0.009    0.097    0.097 eight_cython.pyx:202(search)
     3717    0.008    0.000    0.012    0.000 eight_cython.pyx:163(children)
     6413    0.008    0.000    0.037    0.000 eight_cython.pyx:88(contains)
    44935    0.008    0.000    0.029    0.000 eight_cython.pyx:67(get_child (wrapper))
[..snip..]
</code></pre>

<p>The runtime increased a little due to the profiling overhead, but we now have ample information on which functions are causing the most execution overhead, namely the <code>get_child</code> and <code>add_or_get_child</code> methods of the <code>TrieNode</code> extension class. This is not surprising, as this code, used for checking whether a state has already been seen, is guaranteed to be executed for every evaluated state. On top of that, it uses the Python <code>list</code> type, leading to huge overhead in interaction with the runtime. This can also be seen in the annotation file in which the two <code>TrieNode</code> methods are a very deep shade of yellow. If we want to speed up our code, we need to tackle the way the board trie is built and used; we will now see how.</p>
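
<p>If you would rather collect the profile programmatically than read it off the shell, the standard library's <code>pstats</code> module can sort and trim the same data. The following is a minimal sketch that profiles a pure-Python stand-in function; <code>slow_sum</code> and <code>profile_call</code> are hypothetical names, and with the <code>profile=True</code> directive in place the same approach should work for a function from a compiled Cython module.</p>

<pre><code class="language-python">import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately naive stand-in for the code being profiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # 'tottime' gives the same ordering as sort='time': time spent in the
    # function itself, excluding the functions it calls.
    stats.sort_stats('tottime').print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(slow_sum, 100000)
print(report)
</code></pre>

<p>The argument to <code>print_stats</code> limits the report to the top entries, which is usually all you need when hunting for the next bottleneck.</p>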

<h3 id="replacingextensiontypeswithcstructs">Replacing extension types with C structs</h3>

<p>The difficulty we are facing in the optimization of the <code>TrieNode</code> class is that its methods need to return lists of <code>TrieNode</code> instances. As already mentioned, interaction with the Python list class requires a lot of runtime interaction, so the way to speed up list iteration is to get rid of it, and use C arrays instead. There are two ways of going about this. The first is turning <code>TrieNode</code> into a C struct, and storing pointers to instances of it in the array. This is the path we will take, and we will have a look at the alternative, using pointers to Python objects, later. The file <code>eight_cython_improved.pyx</code> contains the implementation where <code>TrieNode</code> is now a struct, defined as follows:</p>

<pre><code class="language-python">cdef struct TrieNode:  
    int value
    TrieNode **children
</code></pre>

<p>Code that used to be methods on the <code>TrieNode</code> cdef class is now packed into individual functions, all prefixed with <code>trie</code>, and accepting a pointer to a <code>TrieNode</code> as the first argument. This is a common pattern of organizing C code, somewhat similar to object oriented coding. If you compare this new trie code with the implementation in <code>eight.c</code>, you can see that the Cython version is pretty much a line-by-line translation, except for the semicolons, and with an extra cast of the malloc return value, which Cython needs for typechecking. Another C pattern that is used in both <code>eight.c</code> and this improved Cython version is marking the end of a list with a <code>NULL</code> value. The end of the list of children of a <code>TrieNode</code> is marked with <code>NULL</code>, a keyword in Cython that is synonymous with its meaning in C.</p>
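
<p>To show how these functions fit together, here is a pure-Python sketch of the trie logic. The function names are taken from the profiling output below, but the bodies are assumptions written for illustration and are not copied from <code>eight_cython_improved.pyx</code>; in particular, a Python list stands in for the <code>NULL</code>-terminated array of child pointers.</p>

<pre><code class="language-python">class TrieNode:
    """Pure-Python stand-in for the struct: a value plus child nodes."""
    __slots__ = ('value', 'children')

    def __init__(self, value):
        self.value = value
        self.children = []

def trie_get_child(node, value):
    """Return the child holding value, or None if there is none."""
    for child in node.children:
        if child.value == value:
            return child
    return None

def trie_add_or_get_child(node, value):
    """Return the child holding value, creating it first if needed."""
    child = trie_get_child(node, value)
    if child is None:
        child = TrieNode(value)
        node.children.append(child)
    return child

def trie_add_board(root, board):
    """Record a board, given as a flat sequence of cell values."""
    node = root
    for value in board:
        node = trie_add_or_get_child(node, value)

def trie_contains_board(root, board):
    """Check whether exactly this board has been recorded before."""
    node = root
    for value in board:
        node = trie_get_child(node, value)
        if node is None:
            return False
    return True

root = TrieNode(-1)  # the value stored in the root is arbitrary
trie_add_board(root, [1, 2, 3, 4, 0, 5, 6, 7, 8])
print(trie_contains_board(root, [1, 2, 3, 4, 0, 5, 6, 7, 8]))
print(trie_contains_board(root, [1, 2, 3, 4, 5, 0, 6, 7, 8]))
</code></pre>

<p>Since every board has the same number of cells, following a complete path through the trie is enough to decide whether a board has been seen; no end-of-word marker is needed.</p>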

<p>This improved version benchmarks at 33 msec, shaving off 10 more msec from the previous version. Is there any other big win we can score, as with the trie optimization? Once more, the answer lies in profiling. Here is the result of profiling <code>eight_cython_improved.pyx</code>:</p>

<pre><code>Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.009    0.009    0.054    0.054 eight_cython_improved.pyx:206(search)
  3717    0.008    0.000    0.013    0.000 eight_cython_improved.pyx:167(children)
  6413    0.006    0.000    0.011    0.000 eight_cython_improved.pyx:93(trie_contains_board)
 44935    0.005    0.000    0.005    0.000 eight_cython_improved.pyx:77(trie_get_child)
 33453    0.005    0.000    0.009    0.000 eight_cython_improved.pyx:64(trie_add_or_get_child)
  3717    0.005    0.000    0.013    0.000 eight_cython_improved.pyx:86(trie_add_board)
</code></pre>

<p>There are two more bottlenecks that promise significant improvements: the <code>search</code> function, and the <code>children</code> method of <code>State</code>. These two are connected, however, in that the <code>children</code> method returns a list, and the <code>search</code> function iterates over its result. According to the annotation file, this iteration is the most intensive interaction with the Python runtime in <code>search</code>. We might achieve further speedup by doing the same kind of refactoring we did with the <code>TrieNode</code> structure, namely by turning <code>State</code> into a struct and its methods into functions that accept a <code>State</code> as an argument. The template is already available in <code>eight.c</code>, but instead of this easy fix, I wanted to give the alternative a try, namely using a C array to collect pointers to Python objects. The result is in the file <code>eight_cython_improved.pyx</code>. As you can see there, a <code>ChildSet</code> struct is necessary to return the number of children and the actual pointers to the children:</p>

<pre><code class="language-python">cdef struct ChildSet:  
    PyObject **children
    int count
</code></pre>

<p>The <a href="https://github.com/python/cpython/blob/master/Include/object.h#L113"><code>PyObject</code></a> is the C struct that Python uses to keep track of objects; you can use it in Cython to refer to arbitrary objects. It is your responsibility, however, to manage reference counts when you create a reference to an object, as with the pointer in the <code>child_set.children</code> array on line 198. This is the reason for the call to <code>Py_XINCREF</code> on line 196. <code>PyObject</code>, <code>Py_XINCREF</code> and <code>Py_XDECREF</code> are imported from the same package with the following line at the top of the file:</p>

<pre><code class="language-python">from cpython.ref cimport PyObject, Py_XINCREF, Py_XDECREF  
</code></pre>

<p><code>Py_XDECREF</code> is later used to decrement the reference count, on line 233. Another thing explicitly done in a number of places is casting between the objects and pointers. When we are adding the child <code>State</code> object into the array on line 198, we are casting it to a <code>PyObject *</code>. Later, when this entity needs to be accessed as a <code>State</code> object again, it is cast back again on line 231. Building and timing the version with object pointers, I get an average runtime of 32 msec, which is practically the same as the version with extension types in Python lists. Apparently, the load of the casts and reference counting balances out the wins from avoiding the list, leading to practically zero improvement.</p>
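
<p>The bookkeeping that has to be done by hand with <code>Py_XINCREF</code> and <code>Py_XDECREF</code> is what a Python list does automatically for every object it holds. The following sketch makes this visible from plain Python using <code>sys.getrefcount</code>; it relies on CPython's reference counting, and the empty <code>State</code> class is only a stand-in for the extension type.</p>

<pre><code class="language-python">import sys

class State:
    """Minimal stand-in for the State extension type."""

state = State()
before = sys.getrefcount(state)

children = [state]  # the list now holds its own reference to the object
after_store = sys.getrefcount(state)

children.clear()    # dropping the reference lowers the count again
after_clear = sys.getrefcount(state)

print(after_store - before, after_clear - before)
</code></pre>

<p>On CPython, storing the object raises the count by one, and clearing the list brings it back to the starting value. With a C array of <code>PyObject *</code>, neither step happens unless you perform it yourself.</p>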

<h2 id="goingfurther">Going Further</h2>

<p>The material presented here covers only the basics of Cython. There are numerous other features, explained in detail in the official documentation and some other resources, that allow Cython to interact with other C or Python libraries, generate better optimized code, or let you tune the interaction with the Python runtime to your needs. Here are some resources that can help you get further in general Cython or in specific areas that interest you:</p>

<ul>
<li><p>The <a href="https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html">Basic Tutorial</a> in the Cython documentation is another good starting point, but it leaves off at a point where confusions arise if you try to write useful code.</p></li>
<li><p>The <a href="https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html">Language Basics</a> page is worth going through before embarking on any significant Cython project, as it touches on all Cython features, with links to further documentation.</p></li>
<li><p><a href="https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html">Typed Memoryviews</a> allow Cython code to interact with memory buffers of uniform data types, such as Numpy arrays or built-in Python array types. This feature is based on the <a href="https://docs.python.org/3/c-api/buffer.html">buffer protocol</a>, the C-level infrastructure that lays out the groundwork for shared data buffers in Python.</p></li>
<li><p>Cython also allows for easy and GIL-free parallelism using OpenMP with the <a href="https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html">cython.parallel</a> package.</p></li>
<li><p>The book <a href="http://shop.oreilly.com/product/0636920033431.do">Cython - A Guide for Python Programmers</a> is an in-depth discussion of Cython, with all the ins and outs and corner cases. I would highly recommend it if you are going to work extensively with Cython.</p></li>
<li><p><a href="https://www.youtube.com/watch?v=wsczq6j3_bA">The Day of the EXE Is Upon Us</a> is an excellent talk given by Brandon Rhodes at PyCon 2014. Among others, it touches upon the complicated distinctions between interpreted and compiled code (surprise: even x86 assembly is interpreted), why Python is slow, and why Cython is incredible.</p></li>
</ul>]]></content:encoded></item><item><title><![CDATA[A Tutorial Introduction to Kubernetes]]></title><description><![CDATA[<p>Kubernetes is the hottest kid on the block among container orchestration tools right now. I started writing this post when we decided to go with Kubernetes at <a href="https://www.twylahelps.com/">Twyla</a> a year ago, and since then, the developments in the ecosystem have been simply overwhelming. In my opinion, the attention Kubernetes gets</p>]]></description><link>http://okigiveup.net/a-tutorial-introduction-to-kubernetes/</link><guid isPermaLink="false">83dc4783-2b9f-4392-87f5-4d73a8c90b34</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Thu, 05 Jul 2018 09:54:16 GMT</pubDate><content:encoded><![CDATA[<p>Kubernetes is the hottest kid on the block among container orchestration tools right now. I started writing this post when we decided to go with Kubernetes at <a href="https://www.twylahelps.com/">Twyla</a> a year ago, and since then, the developments in the ecosystem have been simply overwhelming. In my opinion, the attention Kubernetes gets is completely deserved, due to the following reasons:</p>

<ul>
<li><p>It is a complete solution that is based on a fundamental set of ideas. These ideas are explained in the <a href="https://research.google.com/pubs/archive/44843.pdf">Borg, Omega and Kubernetes</a> article that compares the consecutive orchestration solutions developed at Google, and the lessons learned.</p></li>
<li><p>While it is container-native, Kubernetes is not limited to a single container platform, and the container platform is extended with e.g. networking and storage features.</p></li>
<li><p>It offers an open and well-designed API, in addition to various patterns that suit differing workflows. The wonderful thing is that there is a very well-governed community process whereby the API is constantly developed further. You have to spend effort keeping up, but regularly receive goodies in return.</p></li>
</ul>

<p>In this tutorial, I want to document my journey of learning Kubernetes, clear up some points that tripped me up as a beginner, and try to explain the most important concepts behind how it works. There is absolutely no claim of completeness; Kubernetes is way too big for a blog tutorial like this.</p>

<h2 id="startingoff">Starting off</h2>

<p>The easiest way to start using Kubernetes is Minikube. If you have an account with a cloud provider, and would like to first figure out the details of running a cluster on their platform, this tutorial will still work for you, as the commands work for any recent version of Kubernetes. See <a href="https://kubernetes.io/docs/getting-started-guides/minikube/">here</a> for details on how to get Minikube running on your computer. In order to manipulate the Kubernetes mini-cluster minikube runs, you need the official CLI client named kubectl, which can be installed following the instructions <a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/">on this page</a>. You will also need Docker to create and push container images. Install Docker on your computer following the instructions <a href="https://docs.docker.com/engine/installation/#supported-platforms">here</a>.</p>

<p>Once you have installed everything, make sure they are all available with the following commands:</p>

<pre><code>kubectl version
docker version
minikube version
</code></pre>

<p>You can check whether Minikube is running using the following command, which also tells you whether there is an update available:</p>

<pre><code>minikube status
</code></pre>

<p>If minikube is not already running, you can start it with <code>minikube start</code>. Normally, when you install minikube, it automatically configures kubectl to access it. You can check whether this is the case with <code>kubectl cluster-info</code>. Its output should be something like the following:</p>

<pre><code>Kubernetes master is running at https://192.168.99.100:8443
</code></pre>

<p>If the IP is not in the <code>192.168.*.*</code> range, or kubectl complains that configuration is invalid or the cluster cannot be contacted, you need to run <code>minikube update-context</code> to have minikube fix your configuration for you.</p>

<h2 id="howiskubectlconfigured">How is kubectl configured?</h2>

<p>I think it is a good idea to briefly mention how kubectl is configured. Which API endpoints and clusters kubectl accesses are defined in the <code>~/.kube/config</code> file by default. The file that is accessed can be changed with the <code>KUBECONFIG</code> environment variable, which should specify a list of paths, so if kubectl displays weird behavior which you suspect might be due to the configuration, don't forget to check whether this environment variable is set. The kubectl configuration file is in the YAML format, like many other things in Kubernetes. It has two top-level keys that are of immediate relevance: <code>contexts</code> and <code>clusters</code>. The clusters list contains endpoint and certificate information for the different clusters to which the user has access. A context combines one such cluster with the user and namespace values for accessing it. One of these contexts is the currently active one; you can find out which by either looking at the config file, or running <code>kubectl config current-context</code>. You can also run the <code>kubectl config view</code> command to show the complete configuration. You can limit the data shown by this command to the current context using the <code>--minify</code> option.</p>
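
<p>The relationship between these pieces can be sketched in a few lines of Python. This is a simplified illustration, not how kubectl is implemented: the function names are made up, the configuration is a hypothetical example that is assumed to have already been parsed from YAML, and the real client additionally merges all the files listed in <code>KUBECONFIG</code>.</p>

<pre><code class="language-python">import os

def kubeconfig_paths(environ):
    """Return the config files that would be consulted, in order."""
    value = environ.get('KUBECONFIG', '')
    paths = [p for p in value.split(os.pathsep) if p]
    return paths or [os.path.join(os.path.expanduser('~'), '.kube', 'config')]

def active_context(config):
    """Combine the current context with the cluster it points to."""
    name = config['current-context']
    context = next(c['context'] for c in config['contexts']
                   if c['name'] == name)
    cluster = next(c['cluster'] for c in config['clusters']
                   if c['name'] == context['cluster'])
    return {
        'name': name,
        'server': cluster['server'],
        'user': context['user'],
        'namespace': context.get('namespace', 'default'),
    }

# Hypothetical configuration, already parsed from YAML into Python objects.
config = {
    'current-context': 'minikube',
    'clusters': [
        {'name': 'minikube',
         'cluster': {'server': 'https://192.168.99.100:8443'}},
    ],
    'contexts': [
        {'name': 'minikube',
         'context': {'cluster': 'minikube', 'user': 'minikube'}},
    ],
}
print(kubeconfig_paths({'KUBECONFIG': os.pathsep.join(['a.yaml', 'b.yaml'])}))
print(active_context(config))
</code></pre>

<p>The paths in <code>KUBECONFIG</code> are separated with the platform's usual path list separator, which is why the sketch splits on <code>os.pathsep</code>.</p>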

<h2 id="nodesandnamespaces">Nodes and namespaces</h2>

<p>Two basic concepts that are relatively straightforward and can be explained without a lot of context are nodes and namespaces. Nodes are the individual units of a Kubernetes cluster, be it a VM or an actual computer. What makes such a unit a node is the <code>kubelet</code> process that runs on it. This process is responsible for communicating with the Kubernetes master, and running the right containers in the right way. You can get a list of the nodes with <code>kubectl get nodes</code>. If you are using Minikube, and didn't do anything fancy with the configuration, there will be a single node. Nodes are not particularly interesting. You as a Kubernetes user will not be doing anything fancy with them, and cloud provisioners all have means of automatically or manually scaling the nodes in a Kubernetes cluster.</p>

<p>Namespaces provide a means to separate subclusters conceptually from each other. If you are running different application stacks on the same cluster, for example, you can organize the resources per app by putting them in the same namespace. A resource created without a namespace specified is created in the <code>default</code> namespace. It's not necessary to use namespaces, but they make certain things much easier, by helping you avoid name clashes, <a href="https://kubernetes.io/docs/concepts/policy/resource-quotas/">limit resource allocation</a>, or manage permissions. In case you start working with namespaces, and get annoyed by having to provide the <code>--namespace</code> switch to every command, here is a handy command that will set the default namespace for the current context:</p>

<pre><code>kubectl config set-context $(kubectl config current-context) --namespace=my-namespace
</code></pre>

<h2 id="kubernetesdashboard">Kubernetes dashboard</h2>

<p>Kubernetes comes with a built-in dashboard in which you can click around and discover things. You can find out whether it is running by listing the system pods with the following command:</p>

<pre><code>kubectl get pods -n kube-system
</code></pre>

<p>If there is an entry beginning with <code>kubernetes-dashboard</code>, it's running. In order to view the dashboard, first run the command <code>kubectl proxy</code> to proxy to the Kubernetes API. The Kubernetes API should now be available at <a href="http://localhost:8001">http://localhost:8001</a>, and the dashboard at <a href="http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/">this rather complicated URL</a>. It used to be reachable at <a href="http://localhost:8001/ui">http://localhost:8001/ui</a>, but this has been changed due to what I gather are security reasons.</p>

<h2 id="usingalocallybuiltimagewithminikube">Using a locally built image with Minikube</h2>

<p>In the following tutorial, we will be deploying various container images in order to demonstrate Kubernetes features. Kubernetes uses Docker to retrieve and run container images, meaning that the usual rules of Docker container pull logic apply. That is, for a container image that is not available locally, Docker contacts the Docker Hub if only a name and a tag are provided, and otherwise the registry given in the image name. The aim of this tutorial is to get you playing around with services running within a Kubernetes cluster as quickly as possible. Hence, the method I would recommend for accessing the container images from minikube is directing your Docker client to the daemon running inside minikube, instead of the local one. Configuring Docker to do so is straightforward with <code>eval $(minikube docker-env)</code>. Now, any image that you create and tag will be available inside minikube. You can make sure that this is the case by running <code>docker ps</code>. If the output contains a list of containers running images from <code>gcr.io/google_containers</code>, you are doing it right. This proxy to the docker service in minikube will be valid only in the current shell; you will be back to using the local docker service when you switch to another shell.</p>
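
<p>The rule for picking the registry can be approximated in a few lines of Python. This is a simplified sketch of the convention and not Docker's actual parsing code; <code>registry_for</code> is a made-up helper, and the image names other than the tutorial's own are only examples.</p>

<pre><code class="language-python">DEFAULT_REGISTRY = 'docker.io'

def registry_for(image):
    """Guess which registry a Docker image reference points to."""
    first, slash, rest = image.partition('/')
    # The first path component counts as a registry host only if it looks
    # like one: it contains a dot or a port, or is the literal 'localhost'.
    if slash and ('.' in first or ':' in first or first == 'localhost'):
        return first
    return DEFAULT_REGISTRY

for image in ('kubetutorial/simple-python-app:v0.0.1',
              'gcr.io/google_containers/pause-amd64:3.0',
              'localhost:5000/simple-python-app:v0.0.1'):
    print(image, registry_for(image))
</code></pre>

<p>This is why an image tagged with the <code>kubetutorial</code> prefix would be looked up on the Docker Hub if it were not already present in the Docker daemon inside minikube.</p>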

<p>If you are not interested in modifying and building the sample services yourself, you can also pull the sample images <a href="https://hub.docker.com/r/afroisalreadyin/">from my Docker.io profile</a>. It should be enough to replace the <code>kubetutorial</code> prefix in the image tags with <code>afroisalreadyin</code>.</p>

<h2 id="runningaservice">Running a service</h2>

<p>Let's start off by running our first command to tell us whether there is anything running on the cluster. We will use the above mentioned kubectl client to do so, running the command <code>kubectl get pods</code>. What pods are will be explained in a second. As long as the client is configured correctly, as explained above, you should see only the message <em>No resources found</em>. What kubectl did was to access the Kubernetes cluster running within minikube as specified by the currently active context configuration and present the resulting information. kubectl is just one among many API clients; there are others, such as <a href="https://github.com/kubernetes-incubator/client-python">this Python client</a> which is the other officially supported one. You can view the API requests <code>kubectl</code> is making by increasing the verbosity of the logging with the <code>--v=7</code> argument, but careful, this will lead to a lot of textual output.</p>

<p>Kubernetes will not figure out for itself what we need to run, so let's go ahead and tell it to run a very simple application, namely the simple Python application from the Kubernetes demos repository. In order to do so, you need to first clone the repo, navigate to the subfolder <code>simple-python-app</code>, and create a container image by running the following command:</p>

<pre><code>docker build -t kubetutorial/simple-python-app:v0.0.1 .
</code></pre>

<p>Once the build runs, you should be able to see it in the list of available images in the result of running <code>docker images</code>. After making sure this is the case, we are finally ready to run our first Kubernetes command, which is the following:</p>

<pre><code>kubectl run simple-python-app \
     --image=kubetutorial/simple-python-app:v0.0.1 \
     --image-pull-policy=IfNotPresent \
     --port=8080
</code></pre>

<p>It should be obvious that this command somehow runs the container that we just created, since the tag of the image is passed in with the <code>--image</code> argument. The <code>--image-pull-policy=IfNotPresent</code> argument tells Kubernetes to use an existing local image instead of attempting to pull it. We are also specifying the port 8080 here as the port this deployment is exposing. This has to be the same port the application is binding to. Unless we provide this bit of information, Kubernetes has no way of knowing on which port to contact the application. Small side note: The demo service has to bind to this port on the general interface <code>0.0.0.0</code> and not on <code>localhost</code> or <code>127.0.0.1</code>.</p>

<p>How do we reach into Kubernetes to contact our service? This is the perfect time to introduce the most important abstraction in Kubernetes: <em><a href="https://kubernetes.io/docs/concepts/workloads/pods/pod/">The Pod</a></em>. As with the other abstractions, pods are resources on the Kubernetes API, and we can list and query them using kubectl. Let's see which pods are now running, with the same command that we ran earlier, <code>kubectl get pods</code>. The output should closely resemble the following:</p>

<pre><code>NAME                               READY     STATUS    RESTARTS   AGE
simple-python-app-68543294-vhj7g   1/1       Running   0          21s
</code></pre>

<p>Great, we have a pod running. But what is a pod, actually? A pod is the fundamental application unit in Kubernetes. It is a collection of containers that belong together. These containers are deployed on the same node, their lifetimes are managed together, and they share operating system namespaces, volumes, and an IP address. They can contact each other on <code>localhost</code> and use OS-level IPC mechanisms such as shared memory. The decision of what to include in a pod hinges on what serves as a single unit across the dimensions of deployment, horizontal scaling, and replication. For example, it would not make sense to put the data store <em>and</em> the application containers of a service into the same pod, because these scale and are replicated independently of each other. What <em>does</em> belong together with the application container is a container that hosts the log aggregation process, for example.</p>

<p>Now that we know what a pod is, and can figure out the name of our single pod running, we can query it using the kubectl proxy feature we already used above. Once the proxy is running, you can access the <code>simple-python-app</code> container on the port we specified in the previous command by querying the special URL that Kubernetes makes available for this purpose (don't forget changing the name of the pod at the end of the URL):</p>

<pre><code>curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/simple-python-app-68543294-vhj7g
</code></pre>

<p>We can also see the logs of our brand new pod with <code>kubectl logs simple-python-app-68543294-vhj7g</code>, which should show the stdout of our application. It is also possible to execute a command within the container, similar to the <code>docker exec</code> command, with <code>kubectl exec -ti simple-python-app-68543294-vhj7g CMD</code>. As with Docker, the <code>-ti</code> bit signals that a tty should be allocated, and the command should run interactively. The <code>kubectl exec</code> command allows you to pick which container to run the command in using the <code>-c</code> switch. When omitted, the default is the only container in the pod, if there is just one.</p>

<h3 id="whocreatedthepod">Who created the pod?</h3>

<p>It's nice that Kubernetes is running our container inside a pod, but we would still like to know where the pod actually comes from. We didn't tell Kubernetes to create any pods. In fact, pods are rarely created manually in Kubernetes. If that were the case, Kubernetes would not be offering anything new; the user would still be responsible for orchestrating the individual application units, and ensuring their availability. What the above <code>kubectl run</code> command did was to <em>create a Deployment</em>. This can be seen by listing the deployments:</p>

<pre><code>$ kubectl get deployments
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
simple-python-app   1         1         1            1           1s
</code></pre>

<p><a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Deployments</a> are one of the special kinds of resources in the Kubernetes world, in that they are responsible for managing the lifetime of application containers. These kinds of resources are called <em>controllers</em>, and they are central to the Kubernetes puzzle. You can get more detailed info about the new deployment with <code>kubectl describe deployments simple-python-app</code>. The <code>describe</code> subcommand is a very useful tool for getting detailed information on all resources. It also lists related resources, and events that concern the described resource. For this deployment, you can see a couple of things in the output of <code>kubectl describe</code>. First of all, there is talk of something called a <em>pod template</em>. This is what is used to create the pods when the deployment is being scaled, i.e. new pods are being created to meet the target.</p>

<p>What happens when we delete the pod? In order to view what is happening in real time, I would advise you to open a second terminal, and run the command <code>kubectl get pods -w</code> in it. The <code>-w</code> (watch) switch keeps the command running, and prints changes to the pod list as they happen. Now, delete the existing pod with <code>kubectl delete pod simple-python-app-68543294-vhj7g</code>. In the output of the pod listing terminal, you should temporarily see a state like the following:</p>

<pre><code>NAME                                 READY     STATUS        RESTARTS   AGE
simple-python-app-5c9ccf7f5d-8lbb2   1/1       Running       0          4s
simple-python-app-5c9ccf7f5d-kl77s   1/1       Terminating   0          43s
</code></pre>

<p>So as one pod is being deleted, another one has already been created (the status might also be <code>ContainerCreating</code> instead of <code>Running</code>). The responsibility for this recreation goes to <a href="https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/">Replica Sets</a>. You can see the replica sets that belong to a deployment using the above mentioned <code>kubectl describe</code> command; the Replica Sets will be listed at the bottom, before the events. You can see that there are two lists: <code>OldReplicaSets</code> and <code>NewReplicaSets</code>. The difference between the two will be explained later in the context of rollouts. You can also list the replica sets with the <code>kubectl get replicasets</code> command.</p>

<p>Looking at the replica set created by our deployment with <code>kubectl describe replicaset $REPLICA_SET_NAME</code>, we can see at a glance a number of relevant rows:</p>

<pre><code># ... snip
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       pod-template-hash=4035281104
                run=simple-python-app-2
  Containers:
   simple-python-app:
    Image:              kubetutorial/simple-python-app:v0.0.1
    Port:               8080/TCP
    Environment:        &lt;none&gt;
    Mounts:             &lt;none&gt;
  Volumes:              &lt;none&gt;
Events:                 &lt;none&gt;
</code></pre>

<p>This Replica Set is responsible for keeping one Pod with our <code>simple-python-app</code> container running, and it is doing that successfully, judging from the <code>1 current / 1 desired</code> row. But as with pods, replica sets are intended to be created by Deployments, so you shouldn't have to create or manipulate them manually.</p>

<h2 id="shortexcursiononnetworking">Short excursion on networking</h2>

<p>As nice and useful as replica sets are, they are not much help in terms of high availability. When a Pod goes down, another one is started, and it has a different name, a different IP address, and is possibly running on a completely different node. Also, what if we want to load balance these replicas? If Kubernetes were to offer service discovery only based on pod names, the clients of this service would need to do client-side load balancing, and keep an internal list of pods that need to be updated on every pod lifetime event. What about routing incoming traffic to services (ingress)? These are all pesky issues that need simplification. Kubernetes offers much easier mechanisms to achieve HA, load balancing and ingress. The basis for all this is the <a href="https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model">networking requirements Kubernetes imposes on the nodes and pods</a>. These are the following:</p>

<ul>
<li><p>All containers can communicate with all other containers without NAT (<a href="https://en.wikipedia.org/wiki/Network_address_translation">Network Address Translation</a>).</p></li>
<li><p>All nodes can communicate with all containers (and vice-versa) without NAT.</p></li>
<li><p>The IP that a container sees itself as is the same IP that others see it as.</p></li>
</ul>

<p>It is possible to use <a href="https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-achieve-this">any one of various networking options</a> that fit this model, with kubenet being the default. The above requirements sound relatively straightforward. One would think that each application container gets its own IP. That is not the case, however, as it is not the application containers, but the <em>Pods</em> that get the IP addresses. Or in the words of the documentation:</p>

<blockquote>
  <p>Until now this document has talked about containers. In reality, Kubernetes
  applies IP addresses at the Pod scope - containers within a Pod share their
  network namespaces - including their IP address.</p>
</blockquote>

<p>You can also verify that pods can be reached by IP on the exposed port by getting the private network IP address of the container with <code>kubectl get pods -o wide</code>. Afterwards, log on to the Minikube node with the command <code>minikube ssh</code>. From within this node, you can query the service with <code>curl $IP_ADDRESS:8080</code>, which should return the response we have already seen.</p>

<p>How are pods that belong to the same replica set organized, in order to provide high availability, load balancing and discovery? The answer to this question requires introducing another Kubernetes concept.</p>

<h2 id="services">Services</h2>

<p>I have been calling the tiny web application we have been using for demo purposes a service, but service has a totally different meaning in the Kubernetes world. A Kubernetes Service is an abstraction that allows loose coupling of pods to enable load balancing, discovery and routing. Through services, pods can be replaced and rotated without impacting the availability of an application. Let's start with a very simple example where we turn our simple Python application into a Service, which can be achieved with the following very simple command:</p>

<pre><code>kubectl expose deploy simple-python-app --port 8080
</code></pre>

<p>If you now run <code>kubectl get services</code>, you should see a list consisting of two entries: <code>kubernetes</code> and <code>simple-python-app</code>. The <code>kubernetes</code> service is a part of the infrastructure, and you shouldn't meddle with it. The other service is what we are looking for, especially the IP address, which is listed under the column <code>CLUSTER-IP</code>. We are interested in this IP address because it is something special. It's a <em>virtual IP</em> Kubernetes has reserved for the new service. In the same output, you can also see that the port 8080 is exposed. We can now log on to the minikube VM (which is a Kubernetes node) with <code>minikube ssh</code>, and query what is now truly a service with <code>curl $IP_ADDRESS:8080</code>, once more returning <code>Hello from the simple python app</code>. The network requirements mentioned above ensure the reachability of the service IP from the node.</p>

<p>Things get much more interesting when there are multiple pods in a replica set. In order to see the effect, let's use another service that provides more information in its response. This service is in the <code>kubernetes-repository</code> as <code>env-printer-app</code>. When the base path is called, it returns a print of the environment variables. Just like with the previous application, you can go ahead and build a container image with the following command:</p>

<pre><code>docker build -t kube-tutorial/env-printer-app:v0.0.1 .
</code></pre>

<p>We will start the Deployment with a replica count of 3, which will cause Kubernetes to start 3 pods right away. To do so, use the following command:</p>

<pre><code>kubectl run env-printer-app \
     --image=kube-tutorial/env-printer-app:v0.0.1 \
     --image-pull-policy=Never \
     --replicas=3 \
     --port=8080
</code></pre>

<p>Now let's create a Service by exposing this Deployment with the following command, which is a slight modification of the expose command we used earlier:</p>

<pre><code>kubectl expose deploy env-printer-app --port 8080
</code></pre>

<p>A new service <code>env-printer-app</code> should pop up in the output of <code>kubectl get services</code>. Note the IP address for this service under <code>CLUSTER-IP</code> as <code>$IP_ADDRESS</code>, and log on to minikube via ssh again. Afterwards, run the following command a couple of times:</p>

<pre><code>curl -s $IP_ADDRESS:8080 | grep HOSTNAME
</code></pre>

<p>This command makes a request to the service endpoint, and filters the <code>HOSTNAME</code> environment variable out of it. You should observe that the hostname alternates between the various pod names. Kubernetes is distributing the requests among the replica pods for us, giving us load balancing out of the box.</p>

<p>This very short demo of services leads to more questions than answers. How does the service know which pods to hit when a request comes in, for example? Why can we contact our service only from within the cluster? How can we enable external access to it? Before we can answer these questions, however, we need to have a look at a better way of specifying deployments, services and other resources.</p>

<h2 id="usingthecommandlineversusmanifestfiles">Using the command line versus manifest files</h2>

<p>Until now, we have been using the command line interface to Kubernetes via <code>kubectl</code>. It is possible to get quite far with <code>kubectl</code>, as it is pretty complete, but it can become difficult to read, share with others, and organize in a repository. A much better method for organizing Kubernetes resources which adheres to the <em>infrastructure as code</em> mantra is using manifest files. These are either YAML or JSON files (although YAML is preferred) that specify in a more structured format the resources to be created and actions to be undertaken. A manifest file takes the form of a list of resources of different <em>kinds</em>, together with <em>metadata</em> and a <em>spec</em>. It is also common and recommended practice to specify the version of the API that is targeted with each entry. The different entries must be separated with a triple dash separator, which signifies the start of a new document in YAML. This separator is mandatory; if you leave it out, only the first item in a list will be processed.</p>
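
<p>To make the triple dash business a bit more concrete, here is a minimal sketch of a manifest file with two entries. The names <code>manifest-demo</code> and <code>demo-service</code> and the <code>app: demo</code> label are made up for illustration; they are not part of the sample repository:</p>

<pre><code># First document: a Namespace, which needs nothing but metadata
apiVersion: v1
kind: Namespace
metadata:
  name: manifest-demo        # hypothetical name
---
# Second document: a Service that lives in that namespace
apiVersion: v1
kind: Service
metadata:
  name: demo-service         # hypothetical name
  namespace: manifest-demo
spec:
  selector:
    app: demo                # hypothetical label
  ports:
  - port: 8080
</code></pre>

<p>Each entry carries its own <code>apiVersion</code>, <code>kind</code> and <code>metadata</code>, and the line with the three dashes is the only thing that separates them.</p>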

<p>The resource specifications are documented in great detail in the <a href="https://kubernetes.io/docs/api-reference/v1.7/">Kubernetes API documentation</a>. What's even better, however, is that the <code>kubectl</code> command is self-documenting. To get documentation on pods, you can use the <code>kubectl explain pods</code> command. This command will print, prefixed by a short description, the various fields a pod manifest can contain. In order to go deeper in this tree, you can run commands such as <code>kubectl explain pod.metadata.labels</code>, which will give more detailed information on individual fields.</p>

<p>If you have a look at the entry for <a href="https://kubernetes.io/docs/api-reference/v1.7/#deployment-v1beta1-apps">deployment</a> in either the online or command line documentation, you will see that the metadata field is the same across all resources, and the name field is required. This field enables us to refer to resources in commands when we want to get detailed information or delete them, or cross-reference from other manifest files. The spec field is required to adhere to the <code>DeploymentSpec</code> configuration, which should have a <em>template</em> field that describes the pod to be deployed. This template, in turn, must have a metadata field itself, and a spec that should contain a list of containers. As per this specification, here is how to create the above deployment example for the <code>env-printer-app</code>, in YAML format:</p>

<pre><code>apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: env-printer-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: env-printer-app
    spec:
      containers:
      - image: twyla.io/env-printer-app:v.0.0.1
        imagePullPolicy: IfNotPresent
        name: env-printer-app
</code></pre>

<p>It is possible to see a common pattern of nested resources that all have metadata which is used to refer to each other, templates that tell Kubernetes what kind of resources to create, and various other kinds of auxiliary information, such as the <em>replicas</em> field. You can now go ahead and use this YAML file, saved into <code>deploy.yaml</code> in the <code>kubernetes-repository/env-printer-app</code> directory, to create a deployment by running <code>kubectl apply -f deploy.yaml</code>. It is also possible to create all resources in a directory by running <code>kubectl apply -f</code> with the directory path.</p>

<p>You can also use <code>kubectl get KIND NAME -o yaml</code> to get a detailed description of a resource in YAML format. This YAML document might include much more than the information you supplied when creating a resource, as the values for the defaults you omitted, and those calculated or set by Kubernetes are also included. Another really great feature that relies on the YAML representation capabilities of Kubernetes (one of my favorite features) is <em>editing</em> a resource with the command <code>kubectl edit KIND NAME</code>. This command will fetch the resource description in YAML, and load it in the editor defined by the <code>EDITOR</code> (or <code>KUBE_EDITOR</code>, if it's defined) environment variable. Once you save your changes and exit, the new resource description will be applied to the resource. This is a great way to try things out quickly without having to keep multiple versions of resource definitions.</p>

<h2 id="servicescontinued">Services, continued</h2>

<p>Alright, where were we? So we have a bunch of containers running in Pods, provisioned and kept alive through Deployments, bundled into a Service that puts them behind a common IP. And we can put all of these into one or more YAML files to recreate them arbitrarily. This is a good point to explain one very interesting and versatile feature of Kubernetes: Selectors. If you go ahead and get the details of the <code>env-printer-app</code> service we have created above with <code>kubectl describe service env-printer-app</code>, you should see a row that begins with <code>Selector:</code>. This selector configuration tells you how Kubernetes finds the pods it should collect behind the virtual IP of the service. If you didn't do anything funky in the meanwhile, the value of the selector row should be <code>run=env-printer-app</code>. If you describe the deployment targeted by this service with <code>kubectl describe deploy env-printer-app</code>, you will see exactly the same selector line. Services and deployments use the same mechanism to match the pods that they hit or control. Which pods are these? This question can be answered by filtering a search by label, as in the following command:</p>

<pre><code>kubectl get pods -l run=env-printer-app
</code></pre>

<p>Not surprisingly, these are the three pods created by the original deployment. This selector-based mechanism is used by many components in Kubernetes, and it is very versatile in that it allows custom labels. This opens up a whole lot of possibilities for different patterns, such as A/B deployments, rolling updates (which we will see later) and similar things.</p>

<p>What is thus happening is that a collection of pods, as picked by the <code>spec.selector</code> attribute, is exposed as a service on an IP. This is not the only way to expose a service, however: There are <a href="https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services---service-types">different kinds of Services</a> based on how this exposing happens. The default is the <strong>ClusterIP</strong> kind, which is what we have now. Other kinds are <code>NodePort</code>, where a service is exposed on the same port on all exposed nodes, <code>LoadBalancer</code> that uses a platform-native load balancer to expose a service to the outer world, and <code>ExternalName</code> which enables you to provide an <em>external</em> service on the local cluster as if it's an internal one.</p>
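
<p>Written as a manifest, a service like the one we created with <code>kubectl expose</code> might look roughly like the following. This is a sketch based on the selector and port seen above, not the exact resource Kubernetes generated; you can compare it with the real thing using <code>kubectl get service env-printer-app -o yaml</code>:</p>

<pre><code>apiVersion: v1
kind: Service
metadata:
  name: env-printer-app
spec:
  type: ClusterIP            # the default kind, spelled out for clarity
  selector:
    run: env-printer-app     # pods carrying this label end up behind the service IP
  ports:
  - port: 8080               # port exposed on the virtual IP
    targetPort: 8080         # port the containers listen on
    protocol: TCP
</code></pre>

<p>Changing <code>type</code> to <code>NodePort</code> or <code>LoadBalancer</code> is how you would switch to one of the other kinds of service.</p>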

<p>These all have their use cases, but the <code>ClusterIP</code> service is the one that covers the most use cases, so we will concentrate on it here. Having multiple pods behind a single IP solves many problems, since Kubernetes also takes care of things like load balancing (done by randomly routing requests; a <a href="https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs">new proxy mode</a> will introduce more options) or managing modifications in the target pod set. One thing it does not solve, however, is the problem of figuring out this IP in the first place. This is another point in which Kubernetes shines: Matching a name to an IP address is done using DNS on the internet, and Kubernetes builds on this common protocol by providing <a href="https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/">an internal DNS service</a> itself. By default, a DNS A record is created, pointing to the service IP, for each <code>ClusterIP</code> service. Hence, we should be able to refer to our <code>env-printer-app</code> under this exact name. To see that this is the case, run the following command to run bash on a container:</p>

<pre><code>kubectl run my-shell --rm -ti --image cfmanteiga/alpine-bash-curl-jq bash
</code></pre>

<p>There are quite a few arguments to this command, which need some explaining. The <code>--rm</code> switch tells kubectl to delete the deployment and the pod once the command is run, while <code>-ti</code> asks it to attach a tty to the container, and make it connect to the stdin of the container process. The <code>--image</code> argument specifies a lightweight alpine-based image with some debugging utilities, and the last argument is the command to use instead of the entry point of the container. In the shell that starts, you can now run <code>curl http://env-printer-app:8080</code> (the port is the one we exposed on the service), and enjoy the environment variable list delivered by the service.</p>

<h2 id="ingress">Ingress</h2>

<p>Our service is now humming in the cluster, accepting requests when we hit it at <a href="http://env-printer-app">http://env-printer-app</a>. In order to make it available to the outer world, we need to do one last thing: Tell Kubernetes to route HTTP requests that arrive from the outside at a certain location to this service. This process is called Ingress, and Kubernetes offers a <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">complete system</a> to handle it. There are two things you need to enable to route requests to the env-printer-app from the outside:</p>

<ul>
<li><p>An Ingress controller, essentially a reverse proxy running within Kubernetes that can be configured using Kubernetes-native resources. The two built-in solutions are GCE and Nginx-based. In order to use the Nginx-based ingress controller on Minikube, you have to enable the extension with <code>minikube addons enable ingress</code>.</p></li>
<li><p>Ingress specifications. These are resources just like Pods and Deployments, and contain information on how to map incoming requests to services, serving as configuration for the aforementioned ingress controller.</p></li>
</ul>

<p>An Ingress specification for the env-printer-app is included in the sample project repo as <code>ingress.yml</code>. After activating the minikube ingress plugin, you can run <code>kubectl apply -f ingress.yml</code> to create an ingress that maps requests to <a href="http://env-printer">http://env-printer</a> to the <code>env-printer-app</code> service. In order to test the ingress, you need to first figure out the IP of the minikube VM with <code>minikube ip</code>, and then edit <code>/etc/hosts</code> on your computer, adding the line <code>$IP_ADDRESS env-printer</code>. You should now be able to navigate to <a href="http://env-printer">http://env-printer</a> in your browser, and see the output of the <code>env-printer-app</code> service.</p>
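
<p>In case you don't have the repo at hand, an ingress specification along these lines might look like the following sketch. The resource name is my assumption, and the service port is taken from the <code>kubectl expose</code> command above; the actual <code>ingress.yml</code> may differ in details. Note also that this uses the <code>extensions/v1beta1</code> API like the deployment manifest earlier; newer clusters serve Ingress under <code>networking.k8s.io/v1</code>, where the backend is written differently:</p>

<pre><code>apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: env-printer              # assumed name
spec:
  rules:
  - host: env-printer            # the hostname we put into /etc/hosts
    http:
      paths:
      - path: /
        backend:
          serviceName: env-printer-app
          servicePort: 8080      # the port exposed by the service
</code></pre>

<p>The ingress controller picks up this resource, and routes requests whose <code>Host</code> header is <code>env-printer</code> to the service named in the backend.</p>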

<h2 id="rollingupdates">Rolling updates</h2>

<p>Once you have a deployment managing a set of pods, there are a couple of things you can do with it to adapt to new conditions. The first of these is scaling the set of containers to meet load conditions. One way of achieving this is using the <code>kubectl scale</code> command, as follows:</p>

<pre><code>kubectl scale deploy env-printer-app --replicas=4
</code></pre>

<p>Alternatively, you can use the <code>kubectl edit deploy env-printer-app</code> command to bring up an editor, and change the <code>spec.replicas</code> field to the required number. If you now run <code>kubectl describe deploy env-printer-app</code>, there should be a new scaling event in the Events section. When the number of replicas is changed, Kubernetes simply creates new pods, or terminates existing ones, without any further complications. It's a different situation when the container spec for a deployment is changed, however. Kubernetes, based on the strategy specified by the user, replaces the pods progressively, to enable a smooth transition from one set of pods to the other. This is called <em>rolling updates</em>.</p>

<p>In order to demo rolling updates, I added another project to the sample Kubernetes services repository, the <code>rollout-app</code>. You can go ahead and create the service by running <code>kubectl apply -f deploy.yml --record</code> in the app's directory, which will create the deployment, the service, and the ingress. The reason for the <code>--record</code> switch will be explained in a couple of paragraphs. If you edit your <code>/etc/hosts</code> file to add <a href="http://rollout-app">http://rollout-app</a> with the minikube IP, you should be able to navigate to this URL and see a big display of the pod's hostname.</p>

<p>If you open <code>rollout-app/application.py</code>, you can see two peculiar things there. One is the <code>/healthz</code> endpoint that returns a simple <em>OK</em> message and nothing else, and the other is a <code>time.sleep(5)</code> before the app starts. The purpose of the <code>/healthz</code> endpoint might become clearer if you also look at the <code>deploy.yml</code> in the same directory; this endpoint is registered as a <code>readinessProbe</code> on the deployment. The readiness probe is a part of the <a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/">pod lifecycle system</a> of Kubernetes. Until this probe succeeds (for HTTP probes, the response must have a status code of at least 200 and below 400), the new pod is not marked as "ready", and requests will not be routed to it. Due to the sleep of 5 seconds before our application is started, the pods of the <code>rollout-app</code> will not be ready for at least five seconds. Now let's have a look at how this delay interacts with the rolling updates feature of Kubernetes. Once you have deployed the application, change <code>application.py</code> in some minor way, such as adding a newline. Afterwards, build a new Docker image with a new tag with <code>docker build -t kubetutorial/rollout-app:v0.0.2 .</code>. Then go ahead and change the Docker image for the <code>rollout-app</code> deployment to the new version with the following command (again with the <code>--record</code> switch which will be explained later):</p>

<pre><code>kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record
</code></pre>

<p>Kubernetes gets to work right away, creating new pods and terminating the ones these are supposed to replace. You can see that this is the case by running <code>kubectl get pods</code>. One peculiar (or actually nice) thing is that Kubernetes does not just pull down the running pods, starting their replacements at the same time. <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment">A rollout process</a> is applied, whereby new pods are created as old ones are taken down. You can follow this process by running the command <code>kubectl rollout status deploy rollout-app</code>. This command will hang with a message like <em>Waiting for rollout to finish: 2 of 3 updated replicas are available&#x2026;</em>. So now the deployment is in the middle of a rollout process. We will see where these numbers come from later. A rollout is actually the process of moving from one replica set to another. You can see that this is the case by running the command <code>kubectl get replicaset</code> (or replace <code>replicaset</code> with <code>rs</code> to make the command shorter). You should see two replica sets whose names begin with <code>rollout-app</code>, one belonging to the old state, and the other belonging to the new state. The DESIRED, CURRENT and READY values of one should decrease, while those of the other go up and approach the required values.</p>

<p>One thing you can do is pause this rollout while it is in progress with <code>kubectl rollout pause deploy rollout-app</code>. This will leave the pod counts the way they are when you run the command, and give you the chance to run checks, to make sure everything is OK. Let's say that you start a rollout, pause it to run some checks, and discover that you made a mistake, and would like to revert to the previous version to fix the issue. This can be achieved by rolling <em>back</em> the rollout with <code>kubectl rollout undo deploy rollout-app</code>. But let's say that you want to move back even <em>further</em> in the deployment history. This is where the <code>--record</code> switch to the <code>kubectl apply</code> command comes into play. Thanks to this switch, we can now see the commands that caused a rollout on this deployment, and a version number that we can use to refer to that rollout. After you deploy version 0.0.2 of <code>rollout-app</code>, the output of the <code>kubectl rollout history deploy rollout-app</code> should be similar to the following:</p>

<pre><code>REVISION        CHANGE-CAUSE
1               kubectl apply --filename=deploy.yml --record=true
2               kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record=true
</code></pre>

<p>You can switch e.g. to revision 1 with the following command:</p>

<pre><code>kubectl rollout undo deploy rollout-app --to-revision=1
</code></pre>

<p>The rollout feature of Kubernetes is very well-designed and feature rich. Other things you can do are precisely control the number or percentage of pods that are replaced, or set conditions on failing rollouts so that they can be rolled back automatically by other tools.</p>
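
<p>To give an idea of how the pieces from this section fit together in a manifest, here is a sketch of a deployment with a readiness probe and an explicit rolling update strategy. It is not a copy of the <code>deploy.yml</code> from the repo: the container port, the probe period and the two strategy values are assumptions I picked for illustration, and the replica count is the one from the rollout status message above. It uses the same <code>extensions/v1beta1</code> API as the earlier manifest; the newer <code>apps/v1</code> API additionally requires an explicit <code>spec.selector</code>:</p>

<pre><code>apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rollout-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # assumed: at most one pod above the desired count
      maxUnavailable: 1      # assumed: at most one pod below the desired count
  template:
    metadata:
      labels:
        app: rollout-app
    spec:
      containers:
      - name: rollout-app
        image: kubetutorial/rollout-app:v0.0.1
        ports:
        - containerPort: 8080    # assumed port
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080           # assumed port
          periodSeconds: 2       # assumed probe interval
</code></pre>

<p>Both <code>maxSurge</code> and <code>maxUnavailable</code> also accept percentages such as <code>25%</code>. With values like the above and three replicas, a message such as <em>2 of 3 updated replicas are available</em> is what you would expect to see in the middle of a rollout, since new pods only count as available once their readiness probe succeeds.</p>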

<h2 id="goingfurther">Going further</h2>

<p>Until now, I have been singing Kubernetes' praise, but not everything about it is perfect, unfortunately. We have run into a couple of issues building a Kubernetes cluster. Kubernetes, despite being a relatively young project, is under heavy development, and <a href="https://gravitational.com/blog/kubernetes-release-cycle/#">keeping up with it is not a simple job</a>. The development process is very well-managed, but nevertheless it is a full-time responsibility to keep up with the changes. This situation is mirrored on the provider side of things, as cloud vendors are racing to provide the best hosted Kubernetes solution possible, which also leads to considerable trial-and-error. Azure, for example, started off with a feature called <a href="https://github.com/Azure/ACS">ACS</a>, which was supposed to be a generic container management solution, but quickly recognized how popular Kubernetes was becoming, and deprecated ACS in favor of <a href="https://github.com/Azure/AKS">AKS</a> which is directed solely towards Kubernetes, and has extra features such as redundant master nodes. Unfortunately, we are on ACS, and need to make the move to AKS at some point.</p>

<p>Another thing you have to keep in mind when running Kubernetes is that it has significant platform-dependent parts, and these are not uniform in terms of correctness and reliability. A short time after moving to Kubernetes on Azure, we found out that there was <a href="https://github.com/Azure/ACS/issues/12">a serious bug</a> with Kubernetes on ACS that makes the storage mounting feature of Kubernetes nearly unusable. Our solution is to rely as much as possible on the cloud offerings of Azure such as CosmosDB and managed PostgreSQL, but we will need to use local storage in a service at some point. Fortunately, <a href="https://github.com/kubernetes/kubernetes/pull/60183">the bug appears to be fixed</a> in Kubernetes 1.10.</p>

<p>As Kubernetes increases in feature set and complexity, tools built on Kubernetes to simplify workloads and provide more integrated workflows have also started popping up. <a href="https://twitter.com/kelseyhightower/status/969616896604581888">Kubernetes was never meant as the last application level</a>, meaning that there will be tools that build up on it for specific developer workflows, which is already happening. It looks like Helm is the most popular choice on this front, but there are other alternatives such as OpenShift. So be prepared to learn another tool that runs on top of Kubernetes in the near future.</p>

<h2 id="bonusshellhelpers">Bonus: Shell Helpers</h2>

<p>There are a couple of motions you repeat over and over when you are working on a Kubernetes cluster. One of these is getting the name of a pod. As the pod name is derived from the name of the deployment, you end up running <code>kubectl get pods</code> and either grepping it or searching it visually. In the case of single-pod deployments, fetching the name of the pod is very easy with the following bash function:</p>

<pre><code>function podname {
    # Print the names of all pods whose listing line matches the given pattern
    kubectl get pods | grep "$1" | awk '{print $1}'
}
</code></pre>

<p>If you want the name of the <code>simple-python-app</code> pod, for example, you would need to run something as simple as <code>podname simple</code>. You can also use this function as argument to other kubectl commands, e.g. to print the logs with <code>kubectl logs `podname simple`</code>.</p>

<p>Another handy snippet (written by my Bash Jedi Master friend Matthias Krull) is the following, which lets you switch between Kubernetes configurations like between Python virtual environments:</p>

<pre><code>function kubeon {
    if [ "${1}" ]; then
        local config_file="${1}"
    else
        echo "Usage: kubeon &lt;config|config_file&gt;"
        return 1
    fi

    if [ ! -f "${1}" ]; then
        config_file="${HOME}/.kube/${1}"
    fi

    if [ ! -f "${config_file}" ]; then
        echo "No config file found. Tried ${1} and ${config_file}"
        return 1
    fi

    export KUBECONFIG="${config_file}"
    export KUBEON_PROMPT="${1}"
    export KUBE_MASTER=$(kubectl config view|grep server:|cut -d/ -f3)

}
</code></pre>

<p>Using this function, you can set any one of the configuration files in your <code>~/.kube</code> directory as the current configuration with <code>kubeon filename</code>. Among the variables set are <code>KUBEON_PROMPT</code>, which you can use in your <code>PS1</code> to visualize the active Kubernetes configuration, and the <code>KUBE_MASTER</code> URL which might come in handy if you want to SSH to it.</p>]]></content:encoded></item><item><title><![CDATA[KubeCon Impressions]]></title><description><![CDATA[<p>I had the chance to attend KubeCon Europe in Copenhagen last week, and it was a total blast. The attendance was huge, with developers from all over the world, and I had great conversations with many different people. There were countless talks of all levels, and many of them (especially</p>]]></description><link>http://okigiveup.net/kubecon-impressions/</link><guid isPermaLink="false">f8f1cf9a-f3d6-4a2c-a962-f4a012ccbbb4</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Tue, 08 May 2018 13:33:47 GMT</pubDate><content:encoded><![CDATA[<p>I had the chance to attend KubeCon Europe in Copenhagen last week, and it was a total blast. The attendance was huge, with developers from all over the world, and I had great conversations with many different people. There were countless talks of all levels, and many of them (especially keynotes) by core committers to many projects from CNCF (Cloud Native Computing Foundation). In this post, I would like to gather my impressions on what I think were the main themes, and some tendencies and future directions that I think the CNCF, Kubernetes and the other projects will take. In case you wonder who I am, by the way, I'm the guy who walked around the whole conference in a Kramer hairdo, because his hair gel got confiscated at the airport.</p>

<p>The short name of the conference and the Twitter tag was KubeCon, but it was in fact an umbrella conference by the Cloud Native Computing Foundation. There will apparently now be one in Europe, one in the US and another in China each year. I think this is a great idea, because there are a great many people, such as myself, who either can't afford or don't want to go through the hassle of obtaining a visa. The future of Kubernetes was of course a major topic; Aparna Sinha gave <a href="https://www.youtube.com/watch?v%3D2eAOx8E6-5Q">a keynote on the state of Kubernetes</a>, especially regarding how it is hosted on GKE. Most of her talk was oriented around how enterprises are accepting Kubernetes, and what kind of developments they expect. Security was a huge topic, with enhancements to authorization, RBAC and pod permissions on the list. A new project from Google named <a href="https://github.com/google/gvisor">gVisor</a> was released just recently, bringing very simple sandboxed containers to Kubernetes (there was another talk later just on gVisor). On the application front, better support for stateful applications in the form of <em>application operators</em> was mentioned, but I didn't quite get what was new about this. There is already <a href="https://coreos.com/blog/introducing-operator-framework">the operator framework by CoreOS</a>, and it sounded like Sinha was talking about the exact same thing, with common features such as application lifecycle operations, backup, restore, monitoring etc. But maybe I missed something; do let me know in the comments if this is a new feature.</p>

<p>How the enterprise is discovering (or has discovered and is now getting more deeply involved with) Kubernetes, and how Kubernetes is also developing in that direction, was a topic that came up frequently in talks and chats with attendees. There was a very interesting presentation by two developers from a consultancy in China who talked about a project they did for the central Chinese banking authority (The Visa <em>and</em> MasterCard of China, as one presenter said). As one would expect from an organization of that size, they had to come up with a rather complicated setup for security and reliability; there were multiple checks for who could do what, and what could be deployed by whom. Security is obviously one of the things that early adopters may ignore but enterprises like these care a lot about, and as this talk showed, Kubernetes has made huge advances in this area.</p>

<p>All the big cloud vendors were at KubeCon, as one would expect, either advertising or actually revealing their hosted Kubernetes solutions. DigitalOcean announced <a href="https://www.digitalocean.com/products/kubernetes/">a hosted Kubernetes solution</a> on the second day of KubeCon; it is still in the early access stage, but will be available soon. The common thing about all these hosted solutions was that they promised to handle the major pains of hosting Kubernetes, such as updates. While the big cloud vendors were targeting the difficulties of <em>running</em> a Kubernetes cluster, other vendors were advertising super easy ways of running an application in a Kubernetes cluster. The presenters of <a href="https://www.youtube.com/watch?v%3DgDGT4Gf_4JM">one talk I attended</a> demoed a service called <a href="https://hasura.io/">hasura.io</a> where it was possible to simply push code to a Git repo, and have it deployed to a pod in a Kubernetes cluster. The description of the cluster is included as YAML files in a repo, and it is possible to attach these descriptions to a cluster using a CLI client. Once that is done, all git push events deploy to the cluster.</p>

<p>Which brings me to what I think is another trend that was very obvious at this KubeCon: GitOps. Alexis Richardson mentioned this in his keynote, and he came up with the name as far as I could understand. He also went into more depth in <a href="https://www.youtube.com/watch?v%3DVkKMf23ZokY">a separate talk on how to implement it on Istio</a>, which I missed and had to watch separately later. One half of GitOps is method-wise the same as the "infrastructure as code" part of devops, in that the system is described in declarative terms and stored in a shared repository. What's new is a much tighter connection between new code in a Git repository, and its availability in the cluster. The aforementioned hasura.io is a platform for achieving this connection. Weaveworks implemented their own internal version using operators, which were mentioned above. These operators listen to Git repositories, update services and deployments based on changes, and report the current state to observability tools. The originator of the push-is-deploy kind of flow is of course Heroku, which was mentioned every time the topic came up. It looks like GitOps will be the Kubernetes-based, more generic method of achieving the same workflow. The way I have explained it here is kind of an oversimplification; I would advise you to have a look at the presentation. I would also expect more tooling support to appear and also be standardized in the near future.</p>
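
<p>To give a rough idea of what such an operator does on each change, here is a minimal Python sketch of the comparison step. It is not how Weaveworks or hasura.io implement it; the function name <code>plan_changes</code>, the deployment names and the image tags are all made up for illustration, and a real operator would read the desired state from a Git checkout and the actual state from the Kubernetes API.</p>

<pre><code class="language-python">def plan_changes(desired, actual):
    """Compare the state declared in Git with the state seen in the cluster.

    Both arguments map a deployment name to an image tag. All names here are
    hypothetical; a real operator would talk to the Kubernetes API instead.
    """
    changes = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            changes.append(("create", name, image))
        elif actual[name] != image:
            changes.append(("update", name, image))
    for name in sorted(actual):
        if name not in desired:
            changes.append(("delete", name, actual[name]))
    return changes


# State as described in the repository after a git push (made-up values).
desired = {"web": "shop/web:1.4", "worker": "shop/worker:2.0"}
# State currently running in the cluster (made-up values).
actual = {"web": "shop/web:1.3", "cron": "shop/cron:0.9"}

for change in plan_changes(desired, actual):
    print(change)
</code></pre>

<p>Running the comparison again once the cluster matches the repository yields an empty list, which is the property that makes the declarative approach attractive: the repository is the single place where the intended state lives.</p>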

<p>Kubernetes offers a very straightforward pattern of component integration. Components can be deployed as pods managed by Kubernetes itself, accessing data from application pods, and changing cluster state based on specifications stored in the etcd data store. An interesting example of this pattern could be witnessed in a demo for Fluent Bit, where pods could be annotated according to the kind of log they output, and the output would be parsed accordingly. A core part of this integration pattern is Prometheus as the main source of observability. All pods make data available in the format Prometheus understands, from where it is posted mostly to Grafana for visibility and alarms. There is now also a slew of new applications that are, as per the name of the foundation, cloud-native and first-class citizens of Kubernetes. This means that they play well with the pod lifecycle elements of Kubernetes, are Prometheus-observable, and can cluster easily in a container network. Another common feature is that they are relatively simple, just like Prometheus itself, and concentrate on doing one job well. This point was very prominent in one talk I attended on <a href="https://nats.io/">Nats</a>, a new message queue whose developers refrained from implementing many standard features in other message queues (message headers, complex routing logic etc), opting instead for performance and reliability.</p>

<p>These various components make life easier and enable continuous scaling and growth of the cluster. Their interaction in a living and changing cluster can get rather complex, however, and minor mismatches can lead to serious issues. This point was driven home in <a href="https://www.youtube.com/watch?v%3DOUYTNywPk-s">one excellent keynote by Oliver Beattie</a>, the CTO of Monzo, an online bank. He explained an outage which took 1.5 hours to fix. <a href="https://twitter.com/obeattie/status/925100706473955328?lang%3Den">The post mortem</a> is pretty good reading, and shows how the interplay of various pieces of complex software can have unexpected error cases. In this case, one of the root causes was an incompatibility between specific versions of Linkerd and Kubernetes, stemming from the representation of an empty service being changed from an empty list to null in Kubernetes. On the one hand, this hits a pet peeve of mine, namely that the zero value of a type (an empty list in this case) should be preferred over null or none. On the other hand, more generally speaking, this is an issue that I think will become more and more acute in the near future. As the number of components used in a cluster and the frequency of updates to them grow, the chances of one or more of those components interacting together to cause issues will also increase.</p>
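
<p>To make the empty-list-versus-null point concrete, here is a small Python sketch. The payload shape and the function names are invented for illustration and are not taken from Linkerd or Kubernetes; the only thing it is meant to show is how a consumer that assumes a list breaks as soon as the producer starts sending null for the same "nothing here" case.</p>

<pre><code class="language-python">import json


def count_endpoints(payload):
    """Count endpoints the way a naive client might, assuming a list."""
    return len(json.loads(payload)["endpoints"])


def count_endpoints_defensively(payload):
    """Treat a missing or null value the same as an empty list."""
    return len(json.loads(payload).get("endpoints") or [])


# Two hypothetical ways of saying "this service has no endpoints".
empty_list = '{"service": "payments", "endpoints": []}'
null_value = '{"service": "payments", "endpoints": null}'

print(count_endpoints(empty_list))              # the list form just works
print(count_endpoints_defensively(null_value))  # so does the defensive client
try:
    count_endpoints(null_value)
except TypeError as error:
    print("naive client failed:", error)
</code></pre>

<p>Both payloads mean the same thing to a human reader, but only the empty list lets the naive client keep working without special cases.</p>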

<p>The solution to component combinatorial explosion might be another practice on which there were two talks by Sylvain Hellegouarch: Chaos Engineering. <a href="https://www.youtube.com/watch?v%3DnfY7BM9KO0g">The second of these talks</a> went into more detail on what I think will be a more and more accepted means of improving the understanding and reliability of complex clusters. <a href="http://principlesofchaos.org/">Chaos engineering</a> is "the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production". Sylvain explained the usage of the <a href="http://chaostoolkit.org/">chaos toolkit</a>, which can run pre-planned tests in which load conditions are created, the cluster is "mutilated", and then the reaction of the cluster is tested against the given criteria. The toolkit then creates a report, replete with graphs and detailed information on whether and how the cluster recovered. One point stressed by Sylvain was that chaos engineering is not an effort to simply break a cluster, but to probe it with knowledge of what <em>can</em> actually go wrong. The probing is done with a certain aim, such as the cluster repairing itself or alarms going off. The concrete aim is to unearth weaknesses, which are definitely there, to know what to do in critical situations, and instill more trust in the system by being prepared for difficult situations. Chaos engineering is a practice I definitely intend to introduce into our development team. I think it should be done instead of simple load testing, where optimum working conditions are usually taken as given. Instead, for such an in-depth test to deliver useful and relevant information, proper load conditions need to be combined with changes to system and service pods and failure conditions in various places.</p>

<p>Not all was nice and dandy at KubeCon. A major point of disagreement between presenters: how to pronounce "kubectl". To my horror, the majority pronounced it "kube-cuttle", which is just wrong. kubectl doesn't have anything to do with cuddling or cuttlefish; it's for controlling Kubernetes, ergo <em>kube control</em>. I guess I will have to wait until the next KubeCon to settle this point with a talk of my own.</p>

<p>One last note: I'm nearly done with an introductory Kubernetes tutorial, which should be published in a couple of days. <a href="https://twitter.com/morphotactics">Follow me on Twitter</a> to be informed when it's online.</p>]]></content:encoded></item><item><title><![CDATA[Big Software]]></title><description><![CDATA[<blockquote>
  <p>"The grateful moon has granted the city of Lalage a rarer privilege:
  to grow in lightness" - Italo Calvino</p>
</blockquote>

<p>A number of software projects I had the pleasure of working on were what I later came to think of as big software. They had common qualities that led the development</p>]]></description><link>http://okigiveup.net/big-software/</link><guid isPermaLink="false">f274966c-1531-4bf9-9533-51fdcd135fb8</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Tue, 11 Jul 2017 13:28:32 GMT</pubDate><content:encoded><![CDATA[<blockquote>
  <p>"The grateful moon has granted the city of Lalage a rarer privilege:
  to grow in lightness" - Italo Calvino</p>
</blockquote>

<p>A number of software projects I had the pleasure of working on were what I later came to think of as big software. They had common qualities that led the development team to work in a certain way, perpetuating these characteristics in a cycle. These common qualities should be thought of as umbrella terms; not all big software systems have each and every one of them, and none of them are strictly required. In the following, I would like to describe these qualities, and how they are related to each other.</p>

<p>There is an undeniable attractiveness, or at least a <strong>maze-like quality</strong>, to big software. If one did not have to change it to satisfy clients, having to keep the whole edifice running in the process, diving into big software could even be considered a fun and revealing exercise. Navigating the parallel paths, conflicts, bolted-on suburbs and dead ends, one could learn about the individual tendencies and social tensions that led to such a system. But this would be software archeology work, fundamentally different from maintenance and extension.</p>

<p>Changing big software is like writing as Philip Roth describes it: <a href="https://www.farnamstreetblog.com/2013/05/philip-roth-one-skill-that-every-writer-needs/">In most professions there's a beginning, a middle, and an end. With writing, it's always beginning again</a>. Every change opens a new can of worms, and closing it is temporary. The conflicts and tears are discovered only when change is attempted, and the change introduces new ones itself, because it pulls the software in yet another direction, in yet another manner. Reconciling the various demands on the code is impossible, as such chances are perpetually delayed. In professional programming work, there is little more satisfying than refactoring big software with proper (frequently archeological) knowledge, and enjoying the simplifications and dead code that result.</p>

<p>All sophisticated software is unpredictable, becoming a complex system in the limit. <strong>Big software turns complexity into an art form</strong>. Any factor &#x2013;code, operating environment, external systems&#x2013; can have cascading effects on the behavior of the system. The only way to reliably find out what the system will do is to run it on the live production system with real data &#x2013; and this not because production is particularly reliable, but because it's what matters for the users. Even then, the behavior will change from one moment to the other, because of subtle changes in the environment. Effects in big software are nonlocal and disproportionate. A developer can never be certain that what she thinks is the core location of a certain functionality is the only relevant place to look. Simple changes to the environment or code might cause ripple effects.</p>

<p>Delivering big software is a complicated process that depends on many other components of software, online resources, and special conditions. <strong>The effort of delivering changes to users bears no correlation to the size of the change</strong>. Since the effects of even minor changes are unforeseen, complex testing mechanisms that take a long time to run exercise all software, for every feature and regression. Many security checks that are themselves complicated due to their target are built into the delivery mechanism, which causes the build to be long and fragile. These mechanisms cannot be forfeited, however, because they are the last barrier to the application disintegrating on delivery, or at least they are perceived to be so. Even if the change to be deployed is tiny, it takes hours, if not days, to deliver, because the baseline for integration and deployment is big.</p>

<p>Once in operation, big software is difficult to observe. In order to navigate the immense complexity in operation, very detailed logs are emitted. Understanding and evaluating these logs becomes a domain of its own, with its own independent logic. There is a fallacy hiding here, similar to that of expecting badly written code to become more comprehensible through comments. It is the belief that a complex system can be understood through a large amount of logs. The same diffuse, uncoordinated approach is applied to error handling. In order to fight the reliability demons, code is written in a very defensive manner. Default values are used for missing data, errors are caught and handled in different ways in multiple frames. These practices serve to hide errors, in that they are not perceived unless brought up by the clients. In case one of these errors actually manages to surface, the core reason has to be assembled from a number of code locations. As per the <a href="https://en.wikipedia.org/wiki/Systemantics#System_failure">Fundamental Failure Mode Theorem</a>, complex systems usually operate in a failure mode; the reflection of this theorem in big software is that <strong>big software has ambiguous error conditions. Distinguishing correct functionality from incorrect is difficult</strong>.</p>

<p>Due to the reasons listed, integrating new code with big software is a royal pain. This leads to a mindset of not undoing work, of letting things chug along as long as there is no urgent reason to rip things out. After a while, it becomes practically impossible to remove things. Thus, it is difficult to scale big up, but down is even more difficult: <a href="https://arxiv.org/pdf/1603.01416.pdf">Big cannot scale down</a>. Regarding resources or scope, big software will not accept any limits. This is also the fundamental source of big's fragility. <strong>As big grows, the impact necessary to cause a failure becomes smaller compared to its size. As it cannot scale down, however, the impact threshold does not go down, even when the system is doing less, in terms of load or functionality.</strong> That is, even when the system is used less, for fewer functions, it will keep on breaking as often, and need the same amount of maintenance.</p>

<p>A second-degree quality of big software is a result of the way big software systems working with each other store data. Big software usually interfaces with multiple other big systems. These systems share a lot of representational data, things that are supposed to correspond to a shared reality out in the world. These "facts" frequently diverge from each other, however, not because the facts change due to transmission or storage errors, but because differences in representation lead, over time, to differences in content. Take e-commerce, for example. There are no two e-commerce systems in the world that represent an order in even remotely similar ways. Some store consumer data in independent tables, tying these to orders, whereas others store all such data on the order itself. The product information is either line- or item-based, and the primary reference for a product can be one of many different formats. In order to account for the mismatches between the different storage formats, logic is applied to transform data when it crosses boundaries. This logic is neither static nor lossless. It changes over time, and data transported from system <code>X</code> to system <code>Y</code> cannot be transported back into its original format. <strong>As data is shuffled from one big system to the other, their representations of reality become multi-faceted, rich, and correct and incorrect at the same time.</strong></p>

<p>Considering all the negative and weird things that accompany it, the surprising thing about big software is that there is a lot of it running and keeping customers moderately happy. It has also made a decent amount of money for some people. <strong>The inescapable conclusion is that big software still gets most of the job done, most of the time, and its clients are happy.</strong> For me, as a developer, the more relevant question is why we have to work with such systems. There is the fact that sometimes one simply <em>has</em> to work on big software. It might be legacy software that has to keep on running, maybe of one's own doing. It is not infrequent that a team outgrows its methods and tools, getting stuck in a system that served as a ladder used to climb to a deeper design understanding. There are also cases where a team (or more frequently in this case, individual developers) have no problem working on big software, despite recognizing the issues. There is a certain joy in working with big software, as alluded to in the beginning. It gives the programmer a sense of working with something big, complex, beyond the capabilities of others. The bug fixes and feature changes are as big as the software itself. Meeting this challenge provides its own satisfaction. What is forgotten in day-to-day efforts, however, is that it's not possible to grow in lightness in such big steps.</p>]]></content:encoded></item><item><title><![CDATA[Arguments against JSON-driven development]]></title><description><![CDATA[<p>I don't have to explain how popular JSON is. There are very few projects that don't need to work with JSON, even when they are not related to network programming. The ubiquity of JSON is causing some developers to rely on it a bit too much, however. 
I'm witnessing this</p>]]></description><link>http://okigiveup.net/arguments-against-json-driven-development/</link><guid isPermaLink="false">8638f3fd-f698-45db-883c-e30851c42290</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Tue, 23 Aug 2016 07:38:36 GMT</pubDate><content:encoded><![CDATA[<p>I don't have to explain how popular JSON is. There are very few projects that don't need to work with JSON, even when they are not related to network programming. The ubiquity of JSON is causing some developers to rely on it a bit too much, however. I'm witnessing this in the Python world, but would not be surprised to hear that it happens with other languages too. List, dictionary and primitive types have become the exclusive building blocks in many projects, to the detriment of code quality. These days, it's not unusual to see something like the following:</p>

<script src="https://gist.github.com/afroisalreadyinu/63d7f839d93555d9e94a9b409319ec89.js"></script>

<p>This function receives and returns dictionaries that have either primitives or lists as values. It builds a dictionary by checking for keys and iterating over values, which leads to item lookup and list iteration being strewn all over the place. This coding style (let's call it the JSON-driven style) has a number of serious disadvantages:</p>

<p><strong>It completely defeats object orientation.</strong> The above code is C without pointers. It offers nothing of the abstraction powers of object orientation. With dictionaries, there is no encapsulation. Let's say that you want to change the way the cell labels are accessed. You would have to touch the above function, although it's not strictly its business. I know that it's now en vogue to sneer at OO, but done right, it can be very powerful, especially in big and complex codebases. Use dictionaries, and you throw that out the window.</p>

<p><strong>It doesn't say what it's doing.</strong> This is mostly a result of the previous point regarding OO, but deserves its own discussion. The above code is filled with auxiliary logic that has nothing to do with what it actually tries to achieve. For example, the for loop matches books from the persistency to shops, but there is nothing there that even remotely signals that; you have to read the code in detail and build the idea yourself.</p>

<p><strong>It doesn't use the excellent built-in object system.</strong> Python's object system (or the protocol) is beautifully designed, and very powerful. It has features like properties, dynamic attribute lookup with <code>getattr</code>, and all kinds of metaprogramming magic. Anyone who has worked with one of the Python ORMs such as SQLAlchemy or the Django ORM will know how much can be achieved with these relatively straightforward tools. The above code completely skips that machinery, and gives the developer only loops and key lookup as tools. The resulting code is accordingly primitive.</p>
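
<p>As a small illustration of the two features named above, here is a sketch of a business object that uses properties and is queried with <code>getattr</code>. The <code>Shop</code> class and its data are hypothetical and unrelated to the gist; the point is only that the caller never touches the underlying dictionary.</p>

<pre><code class="language-python">class Shop:
    """A hypothetical business object wrapping what would otherwise be a dict."""

    def __init__(self, name, stock):
        self.name = name
        self._stock = dict(stock)  # maps a title to the number of copies

    @property
    def titles(self):
        # Computed on access; callers read it like a plain attribute.
        return sorted(self._stock)

    @property
    def is_empty(self):
        return sum(self._stock.values()) == 0

    def copies_of(self, title):
        # The "missing key" case is handled once, here.
        return self._stock.get(title, 0)


shop = Shop("Corner Books", {"Invisible Cities": 2, "Patterns of Software": 0})
print(shop.titles)
print(getattr(shop, "is_empty"))          # dynamic lookup of a property
print(getattr(shop, "owner", "unknown"))  # dynamic lookup with a default
</code></pre>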

<p><strong>The infestation is difficult to control.</strong> Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them. When you start working and thinking with dictionaries, you also use them even when you don't have to, or when you shouldn't. This will also inhibit discovery of more interesting features of Python which might actually improve your code.</p>

<p><strong>It's error-prone.</strong> Dictionaries and lists don't give you any guarantees about their contents. Each time you access something, either the exception case has to be checked or handled properly, or you have to live with various kinds of exceptions. No one in their right mind would do the first, which leaves the second. In the above code, for example, every key lookup could throw a <code>KeyError</code>. One could say that using objects is not much different, since accessing invalid attributes on an object also causes an exception, but the responsibility for setting and handling attributes is localized to the class in the case of objects, and you don't have to distribute it all over the codebase.</p>

<p><strong>It's ugly as sin.</strong> One of the distinguishing features of Python as a language is that good Python code also looks good in an editor. It's not zigzagged, there aren't any large or deep indentation blocks, and there is a rhythm to the size of the different scopes such as functions and classes. When you use lists and dictionaries, however, you are bound to frequently check for membership and existence of keys, which makes achieving this aesthetic virtually impossible.</p>

<h2 id="whattodo">What to do</h2>

<p>Here is what you should do: Take the popular advice relating to Unicode, and apply it to JSON. The fundamental advice on Unicode is <a href="http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-not-just-assume-utf-8-and-avoid-having-to-decode-at-system-boundaries">decode and encode on system boundaries</a>. That is, you should never be working on non-Unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into the system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation. Work with the business logic objects, which give you all the OO niceties plus Python's object protocol. Once you are done, encode these objects into JSON again, and send them to wherever they are needed.</p>
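
<p>Here is a minimal sketch of what that could look like, using only the standard library. The <code>Book</code> class, its fields and the <code>handle_request</code> function are assumptions made up for this example, not part of the gist above; in a real project the validation might well be delegated to a schema library.</p>

<pre><code class="language-python">import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """A hypothetical business object; the fields are made up for this sketch."""

    title: str
    price_cents: int

    @classmethod
    def from_dict(cls, data):
        # Validation lives here, once, instead of in every function.
        title = data.get("title")
        price = data.get("price_cents")
        if not isinstance(title, str) or not title:
            raise ValueError("book needs a non-empty title")
        if not isinstance(price, int):
            raise ValueError("book needs an integer price_cents")
        return cls(title, price)

    def to_dict(self):
        return {"title": self.title, "price_cents": self.price_cents}

    def discounted(self, percent):
        # Business logic lives on the object, not in the caller.
        return Book(self.title, self.price_cents * (100 - percent) // 100)


def handle_request(body):
    """Decode on entry, work with objects, encode on exit."""
    book = Book.from_dict(json.loads(body))
    return json.dumps(book.discounted(10).to_dict(), sort_keys=True)


print(handle_request('{"title": "Invisible Cities", "price_cents": 1200}'))
</code></pre>

<p>Invalid input such as <code>{"title": ""}</code> is rejected with a <code>ValueError</code> at the boundary, so nothing past <code>from_dict</code> ever has to check for missing keys.</p>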

<h4 id="update">Update</h4>

<p>I had a very interesting discussion with my colleague Mouad, and he pointed out two things. The first is the danger of creating <a href="http://www.martinfowler.com/bliki/AnemicDomainModel.html">anemic objects</a>, i.e. objects with only data fields and no behavior, and then using these in functions such as the above instead of dictionaries. This of course beats the purpose of having objects, since you are only delegating the dictionary business to the <code>__dict__</code> attribute of the objects. Real business objects encapsulate their logic. The other topic is the performance aspect. To be perfectly honest, I didn't think about performance at all when writing this. I usually stick to the adage of <em>Make it work, make it good, make it fast</em>. However, if you are working within strict performance bounds, and don't want to get into any monkey business such as compiling C extensions, which might complicate deployment more than necessary, it might make sense to use dictionaries and lists in the critical places, since they are highly optimized. </p>]]></content:encoded></item><item><title><![CDATA[Why I'm not a big fan of Scrum]]></title><description><![CDATA[<p>Scrum is now the default agile software development methodology. This management framework, which is "simple to understand but difficult to master", is used by <a href="https://www.scrumalliance.org/why-scrum/who-uses-scrum">66% of all agile companies</a>. After two extensive workshops, more than five years, and a couple hundreds of sprints working in Scrum, I have some points</p>]]></description><link>http://okigiveup.net/not-big-fan-of-scrum/</link><guid isPermaLink="false">c395a5d5-c066-4db7-bb79-2e452923a92b</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Mon, 11 Jul 2016 12:28:08 GMT</pubDate><content:encoded><![CDATA[<p>Scrum is now the default agile software development methodology. 
This management framework, which is "simple to understand but difficult to master", is used by <a href="https://www.scrumalliance.org/why-scrum/who-uses-scrum">66% of all agile companies</a>. After two extensive workshops, more than five years, and a couple hundred sprints working in Scrum, I have some points of criticism about it. I think it's not naturally conducive to good software, it requires too much planning effort on the part of the developers, and it inhibits real change and improvement. In the following, I will try to put these into more detail by organizing them around more concrete topics.</p>

<p>Before you go to the comments section to tell me that I have no idea what I'm talking about, please keep in mind a few things. First of all, this is not a rant against agile. I'm a big fan of agile, as it is explained in e.g. <a href="http://www.martinfowler.com/articles/newMethodology.html">The New Methodology</a>, and I believe that the potential of this concept has not been exhausted yet. Also, I'm not against every idea and practice in Scrum. For example, the principles of the whole team taking responsibility for the code base, or always having an integrated, working master are really awesome. Last but not least, the following points are directed against standard Scrum as described in the <a href="http://www.scrumguides.org/docs/scrumguide/v1/scrum-guide-us.pdf">official guide</a>. If you are doing something totally different but still calling it Scrum, this post is probably not so relevant for you.</p>

<p>One thing I would like to refrain from here is anecdotal evidence. My individual experiences, as far as they are not related to the proper Scrum entities, are not really relevant, since many of them are individual mistakes, and will therefore intentionally be left out from the following.</p>

<h3 id="obsessionwithpointsaidsec01namesec01a">Obsession with points<a id="sec-0-1" name="sec-0-1"></a></h3>

<p>The use of story points appears to be one of the defining features of Scrum. Each user story is given a certain count of story points, and a team is confronted with the number of points it has "achieved" at the end of a sprint. Some teams also follow their daily progress in terms of story points with a burndown chart that is consulted every day at stand-up. The points collected at the end of a sprint constitute the <em>velocity</em> of a team, and the team is expected to keep this velocity. During planning, the team takes on stories it thinks it can finish by the end of the sprint by looking at the velocity from previous sprints. The velocity of the teams serves to project an estimate of what can be achieved in the future, for the purpose of business planning.</p>

<p>There are many murky things about story points, but somehow the scrum masters and coaches will not abandon them. First of all, what are story points? Are they measures of the time it takes to complete a story? If yes, then why are they not in terms of time? Are they measures of complexity? If yes, why are we not talking about the complexity of stories, and how we can remove it, instead of how we can achieve as many points as possible? That is, shouldn't we be talking about doing as few points as possible? The best measure I have heard is effort. You can work three hours on half effort, but work hard for an hour and finish a task, which explains why it's not about time.</p>

<p>No matter how you define story points, the real issue with them doesn't go away. The main purpose of points is making planning more reliable, and providing a temporal perspective for business.  They never fail to take on a life of their own, however, with teams working to gather points instead of delivering good software. I don't understand why points are special compared to the oft-mocked cases of bug hunts or lines of code written. If devs are measured on points, they will optimize on points. Has the code base improved? Did it become more modular, simpler, <em>habitable</em> (see the section <em>Habitability and Piecemeal Growth</em> in <a href="https://www.dreamsongs.com/Files/PatternsOfSoftware.pdf">this book (pdf)</a> by Richard P. Gabriel)? None of these questions is of relevance. The points have to be gathered.  The spice has to flow. That's what counts.</p>

<p>One can of course counter that if you write stories for accomplishing these counter-examples, you would get points for them. But the point of stories is that they have acceptance criteria that can be tested for, and demo'ed at the end of a sprint (see the point on creating user value below). How can you demo that your code base has become more habitable? Will the acceptance criterion be that the code is "just, you know, nicer"? In practice, refactoring that aims to improve existing code is done as a part of new stories. Instead of simply adding more to the spaghetti code that exists, you try to "leave the grounds better than you found them", as per Pragmatic Programmer lore. This might well work for simple refactoring where you move code, reorganize classes, or rename things, but the really complicated cases that require rethinking of the base abstractions cannot be covered with this simple recipe.</p>

<p>I definitely understand the need for making software development plannable for business purposes. If the business people cannot rely on some kind of an estimate as to how much can be achieved in a certain time frame, they are navigating in the dark, and the developers are in danger of losing their jobs. Programmers also occasionally dig deeper when they have already dug themselves into a hole, so it makes sense to set limits to stories. But there are, <em>must</em> be better ways to make reliable estimations of how much effort stories require.</p>

<h3 id="meetingextravaganzaaidsec02namesec02a">Meeting extravaganza<a id="sec-0-2" name="sec-0-2"></a></h3>

<p>Scrum meetings (aka rituals) have been among the most miserable hours of my life, and this is coming from someone with a number of visits to the Berlin foreigners office. First of all, they are long. I don't care that the meetings take place every 2 weeks if I'm going to be sitting there for three hours. They have too many attendees. Most of the stuff presented is not relevant for most people, but everyone comes because there might be something relevant, and because they have to. The review meeting causes utterly unnecessary anxiety (<em>Oh my god, will my feature work?</em>). It's as if the whole work of the sprint will get evaluated then and there (which in some Scrum implementations actually is the case), and you either get the points or don't, no matter how much thought you put into a piece of work. The app is now faster? Who cares, I don't get the exact response that was expected, so <em>no points for you</em>. One implicit requirement of every story is thus "should be reliably demoable to a roomful of people", which requires much more work than you would imagine (think payments).</p>

<p>In the planning meeting, you get to discuss with others about whether something is two points or five, and then actually list the things you are going to do. I presented my gripes with story points above, but in the context of planning meetings a few more sentences are in order. Why estimate stories that you are going to break down anyway?  The breakdown will be a much more detailed analysis of stories, so doing that would provide a much more precise estimate. Another thing that outright astonishes me is how little attention is paid to whether estimations are correct. This is one area where teams can learn the most, because incorrect estimates point to misunderstanding of the code base and domain, but decent review of estimates is rarely, if ever done. Tracking and estimate reviews would also enable <a href="http://www.joelonsoftware.com/items/2007/10/26.html">Monte Carlo simulation of delivery dates</a>, which sounds awesome, but is, again, rarely done.</p>

<p>Next up is retrospective. Frequent feedback meetings (in which also the estimates are reviewed) are actually a great idea, because the best opportunity to learn from something is right after it happened, but in Scrum, the retro is explicitly supposed to be about the Scrum process itself, not about the codebase, the technology stack or development patterns. So you have this hour for the team, and you are supposed to use it to talk about Scrum itself. Blergh.</p>

<p>The daily standup deserves a blog post of its own. This religious ritual has become a staple of every team in the world. Ten minutes of staring into the void, talking about what you did while no one else listens, because they were in the middle of something five minutes ago and will go back to it in another five minutes, and waiting for everyone else to finish. I know this sounds cynical, but it is the end result of asking people to do it every freaking day. Nowadays devs are communicating on all kinds of channels (email, Slack, Github/Gitlab, ticketing system) and tracking detailed progress on some of these. What's the point in having them stand around for another ten minutes to repeat a few standard sentences? The daily standup is in my opinion a manifestation of a significant but unspoken component of Scrum: Control. The main goal of Scrum is to minimize risk and make sure the developers do not deviate from the plan. I will come back to "Scrum controlmania" later.</p>

<p>One problem Scrum meetings share with all other meetings is that they are synchronous. For teams working remotely, this can become a serious issue, because you have to sync across continents, ending up with people attending meetings at 7 in the morning on one side of the world, and at 4 in the afternoon on the other. This might sound like a simple scheduling problem, but synchronicity is more than that: It means cutting into people's daily routines to force information exchange that could as well be handled otherwise. As argued <a href="http://gilesbowkett.blogspot.de/2014/09/why-scrum-should-basically-just-die-in.html?m%3D1,">here</a>, the agile manifesto is complicit in this meeting obsession, due to its emphasis on face-to-face communication. What I have a hard time understanding is why the ancient, simple communication form of <em>text</em> is made to take a back seat. The truth of the matter is that, especially under the constraint of distributed teams, it's difficult to beat text. It is definitely true that writing well without offending others is not the simplest thing in the world, but why not educate the developers and stakeholders in this dark art? They will have to learn to communicate anyway, so you might as well target this asynchronous mode of communication, which is supported by all tools out there. Text is the best means of communication, and a team that masters it will have a huge advantage. Scrum, however, does not build on text, but on meetings.</p>

<h3 id="sprintuntilyourtongueishangingoutaidsec03namesec03a">Sprint until your tongue is hanging out<a id="sec-0-3" name="sec-0-3"></a></h3>

<p>Scrum is organized in units of sprints. A sprint is an iteration in which work is done, evaluated, and the process is adapted. The idea of the sprint is that the developers take on a certain amount of work, and do their best to finish it, as in, you know, they sprint. Nobody is allowed to change the acceptance criteria of the stories in the sprint, or add/remove stories. The sprint has its own backlog, which can be changed only in agreement with the team and the product owner. I find the idea that you should get somewhere by sprinting repeatedly rather weird. As any developer will tell you, software development is a marathon, not a series of sprints. But let's forget the semantic point for a moment, since it's a bit too obvious, and scrum proponents could claim it's just a convention that does not have to reflect the actual spirit.</p>

<p>But still, why the artificial two-week unit? In the above-mentioned guide, there is even talk of four weeks. Four weeks is a lot of time, and it is an ordinary occurrence that one or more stories become superfluous the way they were written, or other, more urgent things come into focus. If the aim is to be agile, why not accept this as the correct way to work in the first place? In my experience, two weeks is too long for review purposes, too: It's impossible to remember at the end of the sprint what bothered or satisfied you in the beginning. If you shorten it to one week, however, it feels like spending twice the time in the scrum rituals, although they might be shorter.</p>

<p>There is a more fundamental problem with the sprint idea, in my opinion. The reason software is so difficult to plan is that you discover new things about the problem at hand and your idea of a solution as you implement it. These discoveries affect not only the estimate, but also the actual path you are taking to the solution (as excellently described in <a href="https://www.quora.com/Why-are-software-development-task-estimations-regularly-off-by-a-factor-of-2-3/answer/Michael-Wolfe?srid%3DdVzo">this Quora answer</a>). The immediate work items, which constitute the head of the backlog, are the most affected by these discoveries. So essentially, a sprint is working on a frozen set of items that are most prone to change within that time frame. This is also relevant for the point made above, of assuming a too linear trajectory for software development.</p>

<h3 id="oversimplificationofdevelopmentprocessaidsec04namesec04a">Oversimplification of development process<a id="sec-0-4" name="sec-0-4"></a></h3>

<p>What's so difficult about software development? Write stories, put them on a board, split them into tasks, and then start processing them from the top to the bottom. Gather points by closing stories, pick a new story after closing one, and watch your burndown chart go down. There are a million complications with this approach, of course. How should the teams manage dependencies among each others' backlogs? Can I collaborate with someone, or make sure that I'm not stepping on someone else's toes? One of the most central questions of large-scale software development alluded to above is how to rearrange work in the face of new discoveries as you are actually working; <a href="http://okigiveup.net/the-wooden-boat-of-software/">how to rebuild the ship while you're sailing</a>, so to say. This does happen in Scrum within the sprint, and the results of the sprint flow into the next planning session, but it is not foreseen, or even taken to be possible, that the development team can rearrange work <em>while</em> it is making progress.</p>

<p>The Scrum coach will find fifty ways of attacking each and every one of these topics, but all of them will be in the form of <em>one more thing</em>. One more meeting, one more document, one more backlog, one more item in the definition of done. The development process of a scrum team resembles one of those overly-pimped cars after a while: There are so many fancy bits and pieces that the actual car is not recognizable underneath anymore. The development process starts to resemble the oft mocked enterprise development process, where devs are occupied with attending meetings and filling up some documents more than anything else. Talking about the code the team is writing, and how to improve the codebase, might just be one of the meetings among others, if it at all exists.</p>

<h3 id="creatingcustomervalueaidsec05namesec05a">Creating customer value<a id="sec-0-5" name="sec-0-5"></a></h3>

<p>Every story in scrum has to end in customer value. The acceptance criteria have to explicitly state what the customers will derive from the results of that story, in the well-known "As a &#x2026;" format. The idea sounds great in its simplicity, but leads to some really convoluted results when taken to the extreme (which Scrum masters have consistently told me should be done). The most obvious thing is refactoring, already mentioned above. If neither the behavior nor the performance changes, why even bother with refactoring? And one thing I would be ready to bet my career on is, if you want to develop quality software, you should always be refactoring. As an engineer, I care about many things that will not lead to more sales, or the customer going "It got better" in the very short run. Making the platform more reliable, understandable, aesthetically pleasing is worth spending time on, but none of this is easily expressible as delivering customer value. For that matter, is writing a blog post delivering customer value? Will I get points for it? "As a customer, I want to read Ulaş's blog post" just doesn't sound right. What about contributing to open-source software? Reading the code of an important external dependency, such as the web framework your team uses, and working on bugs or feature requests to get a better understanding was not part of any Scrum backlog I've ever seen.</p>

<p>One more note on refactoring, since this is a favorite topic of mine. Why is it that scrum coaches keep on saying "You should always be refactoring"? Because the assumption is that refactoring will be a few hours' work, or even shorter if it's renaming a class here and replacing a file there. These are only the most superficial cases of refactoring, however. The most difficult refactorings, incidentally also the ones that make the biggest difference, target balls of mud that need considerable effort and work to disentangle, and this is not happening "always". It is the ideal condition to be able to do mini-refactorings, and improve code little by little, but small steps bring you nowhere in the case of these hardened balls of mud.  It's of course well and dandy if you can somehow magically plan such a complicated refactoring and find a place for it in your backlog. If you can't, which is much more probable given that deep-reaching refactoring is difficult to foresee, good luck telling your product owner that you will be lost in the depths of your codebase for a while.</p>

<h3 id="scrumisnotnativetosoftwareaidsec06namesec06a">Scrum is not native to software<a id="sec-0-6" name="sec-0-6"></a></h3>

<p>Any team that builds something can work on Scrum. This is often touted as a selling point, but it is admission of a shortcoming, in my opinion.  Claiming that Scrum is generic is admitting that it is not cut for the specific nature of software development. What is the job of a software developer? Writing code? I don't think so. I think it's inventing and customizing machine-executable abstractions, and Scrum has no facilitating aspects specifically for this kind of work. Scrum does not tell you how to organize interdependent processes that mutate while they are in flux. It doesn't tell you how to match domains to common abstractions. It doesn't tell you how to distinguish important differences from superficial ones based on context.</p>

<p>Of course, one can claim that this is not the job of Scrum, which is a software <em>management</em> methodology, and not a software <em>engineering</em> methodology, that it's only concerned with organizing the teams' time and workload, and anything else is the business of an engineering methodology, such as XP. If that is the case, why the hell am <em>I</em>, the software engineer, doing most of the work &#x2013; apart from the product owner, whose job <em>description</em> is doing Scrum anyway? Isn't it by definition the job of the managers, and not of the developers, to be practicing Scrum?  Shouldn't I, as a developer, be spending that whole batch of time and energy on software engineering relevant things, instead of on demoing stories, discussing the ordering of stories, and debugging the process itself? Why are the developers practicing only Scrum, and not, let's say, XP with bits of Scrum thrown in?</p>

<p>Another sign of the software-distant nature of Scrum is how little talk there is of an agile <em>codebase</em> in Scrum organizations. It's a <em>non sequitur</em> to think that Scrum is agile, agile teams produce agile code, ergo Scrum teams produce agile code. Having and keeping an agile codebase is crucial to "being" agile, and is actually hard work that requires much more than only following Scrum. It is difficult to introduce processes to manage this work, however, because</p>

<ul>
<li><p>Scrum makes claims that it is enough for design to "emerge", and</p></li>
<li><p>Where there is Scrum, people are reluctant to introduce even more
rituals and documents.</p></li>
</ul>

<p>In short: Does scrum help you write good code? Does it help you achieve modularization, expression, complexity reduction? The simplest answer I have is a clear <em>no</em>.</p>

<h2 id="scruminhibitsdeepunderstandingandinnovationaidsec07namesec07a">Scrum inhibits deep understanding and innovation<a id="sec-0-7" name="sec-0-7"></a></h2>

<p>This is actually my biggest gripe about Scrum. As mentioned above, in Scrum, the gods of story points per sprint reign supreme. For anything that doesn't bring in points, you need to get the permission of the product owner or scrum master or someone who has a say over them. Refactoring, reading code, researching a topic in detail are all seen as "not working on actual story points, which is what you are paid to do". Specialization is frowned upon. Whatever technology you develop or introduce, you are not allowed to become an expert at it, because it is finishing a story that brings the points, not getting the gist of a technology or mastering an idea. These are all manifestations of the control mania of Scrum.</p>

<p>I recently read <em>Innovation: The Missing Dimension</em> (<a href="https://www.amazon.com/review/R2J2RJJ7DPSAF7/">my review of the book</a>), a book that focuses on an aspect of innovation that is invisible if you look at design only from a problem-solving perspective. An important part of solving a problem is finding the right problem to solve, and this cannot be treated as a problem itself. It rather requires a community (what the authors call an interpretive community) that can reformulate the given domain and create linguistic and technological tools that allow novelty. This idea is inherent to the original agile principles in the form of <a href="http://www.agilemanifesto.org/">individuals and interactions taking precedence over processes and tools</a>. Scrum, however, is much closer to the problem solving approach, where analysis (breaking down a problem, and reassembling the solution) is the organizational tool. In order for an interpretive community to emerge, an organization needs ambiguity, open-ended conversations, and alternative perceptions. All of this, Scrum leaves to something else, whatever it is. They are not the domain of Scrum, but where there is Scrum, there is very little time and energy left for anything else. What's more, the conditions necessary for the emergence of an interpretive community, and thus for innovation, are seen by Scrum as risk that has to be controlled and eliminated. You cannot Scrum innovation.</p>

<p>You might of course think that innovation is not necessary for you, or that it's overrated, and your company can survive without innovating. But keep in mind that the software industry is probably <em>the</em> most innovative one out there. There are new technologies every day, and the basic tools of software development go through revolutions every couple of years. If you're not innovating, someone else who does might knock on the doors of your customers at some point. Also, innovation, in the sense of reinventing, is what software developers love to do, and is a great incentive for keeping top talent.</p>

<h3 id="summaryideasforalternativesaidsec1namesec1a">Summary, Ideas for Alternatives<a id="sec-1" name="sec-1"></a></h3>

<p>So, in summary, Scrum</p>

<ul>
<li><p>wastes too much of the developers' time for management</p></li>
<li><p>does not lead to good quality code</p></li>
<li><p>is a control freak which does not leave room for new ideas and
innovation.</p></li>
</ul>

<p>Discussions of software methodologies are a bit like discussions of open-source software. The default answer to any substantial criticism is "What is your alternative?", which is pretty much the equivalent of "Why don't you submit a patch?". Unfortunately, software management lies in the intersection of many disciplines, and is a huge field itself. My priorities as a developer lie elsewhere, namely in algorithms, programming languages, computer networks etc. I cannot squeeze 500-page tomes on software management into my already crammed bookshelves.</p>

<p>Which won't hold me back from making probably ill-advised and rather general proposals, or at least a clarification of my expectations as a dev. First, estimations and forecasting. I don't think there is anything wrong with estimating individual stories in terms of time, and deriving a general estimate of how long a project will take from this. The problem here is that the way stories are split and estimated is orthogonal to the way devs are working. That is, the work I, as a dev, put into organizing the backlog is not helping me in processing the backlog. If it were possible to organize and study the backlog so that this process also helps the devs, they would do it much better and eagerly. One way to achieve this might be putting work items through what I would call an <em>algebra of complexity</em>, i.e. an analysis of the sources of complexity in a work item and how they combine to create delays. The team could then study the backlog to locate the compositions that cause the most work and stress, and solve these knots to improve the codebase. The backlog would then resemble a network of equations, instead of a list of items, where solving one equation would simplify the others by replacing unknowns with more precise values.</p>

<p>The other proposal I would have is to get rid of the review, planning and stand-up meetings. Like, just stop doing them. There is no reason to make them so grand and rigid. You can replace most of the synchronous communication with textual communication, and create ad hoc meetings to discuss specific work items. Instead of having sprints that are marked by these meetings, one could simply point to the backlog as a measure of work done and pending. The retrospective, on the other hand, is the only meeting in which I saw magic happen in Scrum, but it has to happen more frequently, and concentrate more on the code base, as mentioned above.</p>

<p>To make it short, my dream workflow would combine offline working, continuous analysis of the sources of complexity and errors, and detailed, open-ended discussion on the path on which the team is approaching the goal (or not). The correct way of building software should align the understanding that devs have of the problem and the complexity involved with the aims of the other parts of the company.</p>]]></content:encoded></item><item><title><![CDATA[PostgreSQL Vacuuming: An Introduction for Busy Devs]]></title><description><![CDATA[<p>If you have interacted with PostgreSQL at any point in your developer career, you have met it: The autovacuum daemon. It fires up every now and then, consumes resources, and disappears again, without telling you what it did, and why it ran in the first place. In this post, I</p>]]></description><link>http://okigiveup.net/postgresql-vacuuming-an-introduction-for-busy-devs/</link><guid isPermaLink="false">4825afad-2144-4888-9196-f5a3e777e638</guid><category><![CDATA[postgresql]]></category><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Wed, 20 Apr 2016 12:13:56 GMT</pubDate><content:encoded><![CDATA[<p>If you have interacted with PostgreSQL at any point in your developer career, you have met it: The autovacuum daemon. It fires up every now and then, consumes resources, and disappears again, without telling you what it did, and why it ran in the first place. In this post, I would like to give an idea of what vacuuming is, what the autovacuum daemon does, and how you can become friends with it.</p>

<h2 id="whatisvacuumingaidsec11namesec11a">What is vacuuming?<a id="sec-1-1" name="sec-1-1"></a></h2>

<p>The concept of vacuuming has to do with the way PostgreSQL implements certain RDBMS features. A modern RDBMS has to offer concurrency control for transactions. That is, different transactions have to be able to see different views of the data, depending on which statements they have already executed. This concept is called transaction isolation, and constitutes the I in ACID. Some rows might be edited by a transaction, changing certain fields, whereas others might be deleted in one while they are still available in others. Furthermore, each transaction can be rolled back, leading to undoing of the changes made by the transaction. The management of data state is complicated by the fact that an RDBMS has to keep the storage of the table and any indexes on a table intact while managing data visibility. It cannot just go ahead and modify data on the primary data structures; this would lead to an invalid state.</p>

<p>The solution implemented by PostgreSQL is called <a href="http://satya-dba.blogspot.de/2009/08/rollback-segments-in-oracle.html">Multi-version Concurrency Control (MVCC)</a>. The basic idea is to mark rows according to the transactions which are affecting them, and manage visibility accordingly. Each transaction gets an ID from a simple 32-bit integer sequence. Rows are then marked with the ID of the transaction that last modified or deleted them. These marks are stored in the <code>xmin</code> and <code>xmax</code> columns, which are normally hidden, but visible if explicitly queried for. Using the sample tables from <a href="http://okigiveup.net/what-postgresql-tells-you-about-its-performance/">the previous post on PostgreSQL performance</a>, we can insert some data, and then see the transaction IDs:</p>

<pre><code>BEGIN TRANSACTION;
SELECT txid_current(); -- prints the current transaction id
INSERT INTO person (first_name, last_name) VALUES ('Hercule', 'Poirot');
COMMIT TRANSACTION;
</code></pre>

<p>On my computer, the <code>SELECT txid_current();</code> statement prints out <code>156078</code>. When we query the columns <code>xmin</code> and <code>xmax</code>, we can see the following values:</p>

<pre><code>test=# SELECT xmin, xmax, first_name, last_name FROM person;
  xmin  | xmax | first_name | last_name
--------+------+------------+-----------
 156078 |    0 | Hercule    | Poirot
(1 row)
</code></pre>

<p>As you can see, the <code>xmin</code> column of the relevant row is set to the ID of the transaction in which it was committed. <code>xmin</code> can be interpreted as the lowest transaction ID that can see this row. Any transactions that have been started beforehand, and thus have a lower ID, cannot see this row. The meaning of the <code>xmax</code> column is the exact opposite. This column is set to the ID of the transaction that deletes this row; any transactions that come after it cannot see the row. An <code>xmax</code> of 0, as in the output above, means that no transaction has deleted the row yet. Essentially, for a transaction to see a row that has been deleted at some point, the relationship <code>xmin &lt; current_txid &lt; xmax</code> should hold. There are two more columns (<code>cmin</code> and <code>cmax</code>) that are used for tracking rows per cursor state, but the details are not relevant here. For details of the algorithm, have a look at <a href="https://momjian.us/main/writings/pgsql/mvcc.pdf">these slides</a>.</p>
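
<p>To make the <code>xmax</code> half of this a bit more tangible, here is a small sketch that uses two <code>psql</code> sessions and assumes the <code>person</code> table and the row inserted above. The <code>WHERE</code> clause is only for illustration, and the IDs you see will of course differ from mine:</p>

<pre><code>-- Session A: delete the row, but leave the transaction open
BEGIN TRANSACTION;
SELECT txid_current(); -- note this ID
DELETE FROM person WHERE last_name = 'Poirot';

-- Session B: the row is still visible here, but its xmax
-- should now carry the transaction ID of session A
SELECT xmin, xmax, first_name, last_name FROM person;

-- Session A: change of heart
ROLLBACK;
</code></pre>

<p>Since the deleting transaction never commits, the row stays visible to everyone. Nothing had to be restored for this; the row was never physically touched apart from the mark in <code>xmax</code>.</p>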

<p>MVCC is not the only method for RDBMS concurrency control; other databases use other mechanisms, such as rollback segments in <a href="http://www.enterprisedb.com/postgres-plus-edb-blog/amit-kapila/well-known-databases-use-different-approaches-mvcc">Oracle</a> or <a href="http://rhaas.blogspot.de/2011/02/mysql-vs-postgresql-part-2-vacuum-vs.html">MySQL</a>. These are like blocks of work which, when a transaction fails, are undone on rollback or in the next read that refers to those blocks. The advantage of MVCC compared to other methods is that rolling back a transaction has minimal cost. There is no cleanup that has to be done when a rollback happens; the memory and processor load is the same as the commit case.</p>

<h2 id="entervacuumaidsec12namesec12a">Enter VACUUM<a id="sec-1-2" name="sec-1-2"></a></h2>

<p>The <a href="http://www.postgresql.org/message-id/28353.1179983512@sss.pgh.pa.us">disadvantage of MVCC</a> is the topic of this post: The necessity of vacuuming. The primary purpose of vacuuming is garbage collection. Since PostgreSQL does not remove any rows from physical storage when they are updated or deleted, after some time (depending on the frequency of update and delete activity in the database), the database will be occupying a lot of essentially unused disk space. Garbage collection is not the only purpose of vacuuming, though. Two related things are the visibility map and transaction ID wraparound. Visibility maps are PostgreSQL's way of avoiding unnecessary trips to the heap, where the actual row data is stored. When a query finds rows in an index, PostgreSQL has to check whether these rows are visible, i.e. not already deleted and to be vacuumed, by fetching the data from the heap. This IO trip is avoided using visibility maps that record which pages on the heap have only visible data. If a page is on this map, PostgreSQL does not have to visit it to ensure visibility. As a side note, <a href="https://wiki.postgresql.org/wiki/Index-only_scans">the visibility map is the reason it took PostgreSQL longer to implement index-only scans</a>, which are possible only since version 9.2.</p>

<p>Transaction ID wraparound is the name given to the fact that since these IDs are 32-bit integers, they cannot be greater than 2<sup>32</sup>. When a database has processed more transactions than that, the transaction ID overflows, starting at 0 again. If no further action is undertaken, nearly all rows will suddenly become invisible, because their transaction IDs are now higher than the current one, which makes them look as if they came from the future. The solution implemented by PostgreSQL is setting the <code>xmin</code> of rows with sufficiently old <code>xmin</code> values to a special value, <code>FrozenTransactionId</code>, which is always considered to be lower (ergo older) than any transaction ID. This happens as a part of vacuuming, so if you do not vacuum your database for a long time, there is a real possibility that old data suddenly becomes invisible.</p>

<p><em>Edit</em>: As Peter pointed out in the comments, the transaction ID comparison is presented in a simplified manner here. The real comparison of IDs involves modulo-arithmetic, so that the space of IDs wraps around. That is to say, for any ID <code>x</code>, there are roughly 2<sup>31</sup> (about two billion) IDs that count as smaller than <code>x</code>, and just as many that count as greater. See <a href="http://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND">the documentation for details</a>.</p>
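
<p>If you are curious how far your own databases are from trouble, the system catalogs can tell you. The following is a minimal sketch built on the <code>age()</code> function and the <code>datfrozenxid</code> and <code>relfrozenxid</code> catalog columns, along the lines of the queries in the documentation linked above; the <code>LIMIT</code> is an arbitrary choice of mine:</p>

<pre><code>-- How many transactions old is the oldest unfrozen ID in each database?
SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- The same per table, for the database you are connected to
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
</code></pre>

<p>The closer these ages get to the two billion mark, the more urgently the tables in question need vacuuming.</p>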

<p>Manual vacuuming is as simple as running <code>VACUUM;</code> in <code>psql</code>, or rather <code>VACUUM VERBOSE;</code> if you want to actually see what is happening. These commands also accept the name of a table as an optional argument. If this argument is omitted, <code>VACUUM</code> is executed on the whole database. Running only <code>VACUUM</code> is what one could call the first level of vacuuming; it takes care of deleted rows and updates the visibility map. What it does <em>not</em> do is to return the storage space to the operating system, however, contrary to what I said above. It actually updates <a href="https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Is_PostgreSQL_remembering_what_you_vacuumed.3F">what's called the free space map (FSM)</a> to mark the pages that have free space due to deleted or updated rows. The next time a new row has to be written, PostgreSQL can consult this map and use the free space in the pages, instead of demanding more storage space from the OS. <a href="http://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY">If you want to reclaim all free space</a>, you need to run <code>VACUUM FULL;</code>, which might be necessary if you e.g. manually delete a lot of rows. Full vacuuming reprocesses table data, and rewrites a brand new version that is compacted and consumes exactly the space it needs. However, think twice before you run it: It locks the tables it is processing, and will block both read and write queries.</p>
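
<p>A rough way to watch the difference between the two levels is to look at the on-disk size of a table in between. This is only a sketch: it assumes the <code>person</code> table from above has enough rows for the effect to be visible, and the <code>DELETE</code> condition is made up for the purpose:</p>

<pre><code>-- Size of the table, including its indexes
SELECT pg_size_pretty(pg_total_relation_size('person'));

-- Create some garbage
DELETE FROM person WHERE last_name = 'Poirot';

-- First level: marks the space as reusable. The size mostly stays
-- the same; only empty pages at the very end can be given back.
VACUUM VERBOSE person;
SELECT pg_size_pretty(pg_total_relation_size('person'));

-- Second level: rewrites the table, holding an exclusive lock
VACUUM FULL VERBOSE person;
SELECT pg_size_pretty(pg_total_relation_size('person'));
</code></pre>

<p>Try this on a test database first; <code>VACUUM FULL</code> on a big production table during business hours is not something your colleagues will thank you for.</p>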

<h3 id="vacuumneanalyzeaidsec121namesec121a">Vacuum &ne; Analyze<a id="sec-1-2-1" name="sec-1-2-1"></a></h3>

<p>As I mentioned in my previous post, PostgreSQL relies on statistics of column value distributions to generate efficient query plans. Updating these statistics is not the job of <code>VACUUM</code>, and requires a separate command, namely <code>ANALYZE</code>. You can either run <code>ANALYZE;</code> standalone in <code>psql</code> (or <code>ANALYZE VERBOSE;</code> for more output), or run both maintenance commands together with <code>VACUUM ANALYZE;</code>. As with <code>VACUUM</code>, you can pass <code>[VACUUM] ANALYZE</code> the name of a single table. Fun note: Both <code>ANALYZE</code> and <code>ANALYSE</code> work, so go ahead and spell it the British way if you are keen to do so.</p>

<h2 id="theautovacuumdaemonaidsec13namesec13a">The autovacuum daemon<a id="sec-1-3" name="sec-1-3"></a></h2>

<p>In order to make the jobs of database users worldwide easier, PostgreSQL since 8.1 comes with a daemon that runs both <code>VACUUM</code> and <code>ANALYZE</code> at certain intervals: The famous autovacuum daemon. It runs as a separate daemon process, the presence of which you can check with a simple <code>ps aux | grep autovacuum</code>. If you don't have any running vacuum processes, you should only see a "launcher process", otherwise you might also see workers. The autovacuum daemon checks each database at regular intervals to see whether it needs vacuuming and/or analyzing. If the number of rows that were updated or deleted is above a certain threshold for a table, these processes are executed. The number of deleted and updated rows is read from the statistics views; we can see an approximation for the <code>person</code> table with the following query:</p>

<pre><code>test=# SELECT n_tup_del, n_tup_upd FROM pg_stat_all_tables WHERE relname = 'person';
 n_tup_del | n_tup_upd
-----------+-----------
         0 |         0
(1 row)
</code></pre>

<p>The threshold is calculated <a href="http://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#AUTOVACUUM">according to the following formula</a>:</p>

<pre><code>autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * pg_class.reltuples)
</code></pre>
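
<p>As a worked example of the formula, here is a minimal Python sketch. The function name is made up, and the row count is invented for illustration; 50 and 0.2 are the stock values of <code>autovacuum_vacuum_threshold</code> and <code>autovacuum_vacuum_scale_factor</code>, which your installation may well override.</p>

<pre><code>def autovacuum_trigger_point(base_threshold, scale_factor, reltuples):
    """Number of changed rows beyond which a table qualifies for vacuuming."""
    return base_threshold + scale_factor * reltuples


# Stock settings applied to a hypothetical table with 10,000 rows.
limit = autovacuum_trigger_point(50, 0.2, 10_000)
print(limit)  # 2050.0
</code></pre>

<p>In other words, with stock settings roughly a fifth of such a table has to change before autovacuum considers it.</p>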

<p>The constants starting with <code>autovacuum</code> in the above formula can be queried in psql from the <code>pg_settings</code> table. The last value can be obtained with <code>SELECT reltuples from pg_class WHERE relname='person';</code>. Bringing these together, we can write the following query as an approximation for what the autovacuum daemon does to decide whether to vacuum a table:</p>

<pre><code>SELECT
(pt.n_tup_del + pt.n_tup_upd) &gt; pgs_threshold.setting::int + (pgs_scale.setting::float * pc.reltuples)
AS should_vacuum
-- Join on the OID; relation names are not unique across schemas.
FROM pg_class pc JOIN pg_stat_all_tables pt ON pc.oid = pt.relid
                 CROSS JOIN pg_settings pgs_threshold
                 CROSS JOIN pg_settings pgs_scale
WHERE pt.relname='person'
AND pgs_threshold.name = 'autovacuum_vacuum_threshold'
AND pgs_scale.name = 'autovacuum_vacuum_scale_factor';
</code></pre>

<p>You have to keep in mind that the statistics we receive from <code>pg_stat_all_tables</code> are accumulated since the last statistics reset, which is recorded in <code>pg_stat_database.stats_reset</code>. In the documentation, there is no remark as to which exact statistics the autovacuum daemon uses, but I'm pretty certain that only the tuples updated and deleted since the last vacuum run are included. Otherwise, the autovacuum daemon would have to vacuum every table in every run in the limit. The autovacuum daemon does a similar calculation to decide whether to run analyze; details can be found in the <a href="http://www.postgresql.org/docs/9.5/static/routine-vacuuming.html">PostgreSQL documentation</a>.</p>

<h2 id="improvingvacuumingaidsec14namesec14a">Improving Vacuuming<a id="sec-1-4" name="sec-1-4"></a></h2>

<p>A frequent issue with the autovacuum daemon is that it gets to work at unexpected times of the day, maybe in the middle of a high load period, and causes deteriorated performance. Another symptom of an improper vacuuming regime is queries that are executed with suboptimal query plans. The primary reason for this is outdated table statistics, which can be alleviated by <code>ANALYZE</code> statements that run as a part of vacuuming. As you can see above, it's difficult to imitate the behavior of autovacuum. The ideal case would be to find out whether the statistics are out of date, but that's difficult to determine, and <a href="http://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#VACUUM-FOR-STATISTICS">not even autovacuum does that</a>:</p>

<blockquote>
  <p>The daemon schedules ANALYZE strictly as a function of the number of
  rows inserted or updated; it has no knowledge of whether that will
  lead to meaningful statistical changes.</p>
</blockquote>

<p>You are also strongly advised to never turn off autovacuuming, because of the risks it involves. Even if you do frequent manual vacuuming, there might be unexpected bouts of high activity that affect many rows. Also, autovacuum will do little work if it runs when you have manual vacuuming, so it makes sense to just leave it running. The most sensible thing to do is to adjust the settings so that large and active tables are vacuumed more frequently. Here is a query to find out which tables have the most rows:</p>

<pre><code>SELECT reltuples,relname FROM pg_class WHERE relkind='r' ORDER BY reltuples DESC;
</code></pre>

<p>You can either schedule a cron job to vacuum the largest tables regularly if you have periods of low load, or, as in our case, if the load on your application is continuous, you can adjust the parameters for these tables to run vacuuming more frequently. The vacuum parameters can be set separately for individual tables with the following statements:</p>

<pre><code>ALTER TABLE person SET (autovacuum_vacuum_scale_factor = 0.0);
ALTER TABLE person SET (autovacuum_vacuum_threshold = 4000);
</code></pre>

<p>As per the equation above, these settings would cause autovacuum to vacuum these tables every 4000 row updates or deletes, no matter how many rows are already in the table. This would lead to more frequent vacuuming of these tables, and shorter vacuum times in the runs where all tables are vacuumed, leading to better performance. The settings for individual tables can be queried from the <code>pg_class</code> table as follows:</p>
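
<p>To see how much difference the per-table settings make, the following Python sketch compares the trigger point under the stock settings (50 and 0.2) with the fixed threshold of 4000 for a few table sizes. The row counts and the function name are invented for illustration.</p>

<pre><code>def trigger_point(base_threshold, scale_factor, reltuples):
    """Changed rows needed before autovacuum picks up the table."""
    return base_threshold + scale_factor * reltuples


# Hypothetical table sizes.
for rows in (10_000, 1_000_000, 100_000_000):
    stock = trigger_point(50, 0.2, rows)
    tuned = trigger_point(4000, 0.0, rows)
    print(f"{rows:>11,} rows: stock {stock:>12,.0f}, tuned {tuned:,.0f}")
</code></pre>

<p>With the stock settings the trigger point grows with the table, so a very large table can accumulate millions of dead rows before it is vacuumed, whereas the tuned value stays constant.</p>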

<pre><code>SELECT relname, reloptions FROM pg_class WHERE relname='person';
</code></pre>

<p>A simple test using <code>pg_dump</code> and <code>pg_restore</code> has revealed that settings changed with the <code>ALTER</code> statement above are also preserved in the dump and restore process, so you don't have to run it for every new instance of your database if you're reading in dumps.</p>]]></content:encoded></item><item><title><![CDATA[Lessons from Legacy]]></title><description><![CDATA[<p>Last two years of my working life have been spent on an e-commerce application that is mainly occupied with coordinating inventory items, orders and shipments. The main user interface of this application is a REST API tied to an Angular frontend. The same API is also used by various middleware</p>]]></description><link>http://okigiveup.net/lessons-from-legacy/</link><guid isPermaLink="false">364876e4-0866-4e4f-8482-b12832cd4d98</guid><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Wed, 30 Mar 2016 14:58:07 GMT</pubDate><content:encoded><![CDATA[<p>Last two years of my working life have been spent on an e-commerce application that is mainly occupied with coordinating inventory items, orders and shipments. The main user interface of this application is a REST API tied to an Angular frontend. The same API is also used by various middleware applications that sync with other e-commerce applications. Since our company has moved on to a new product, but we still have a customer using this legacy system, I had to take on the duty of keeping it running. As I dug deeper into it to make improvements, a couple of decisions we made at the time stood out to me as having proven to be suboptimal. It is one thing to read about general software design principles in the abstract, and another to see them demonstrated on a live, growing system in which you have a stake. Here is my attempt at making my observations as concrete as possible.</p>

<h4 id="dontmakeastraitjacketoutofconstraintsaidsec101namesec101a">Don't make a straitjacket out of constraints<a id="sec-1-0-1" name="sec-1-0-1"></a></h4>

<p>When it comes to securing the integrity of your data, PostgreSQL can do wonders, with such features as constraints, triggers, enumerations, and normalization, the <em>sine qua non</em> of relational databases. Add SQLAlchemy to the mixture, and anything is possible. For example, with a feature called <a href="http://docs.sqlalchemy.org/en/latest/orm/join_conditions.html#composite-secondary-joins">composite secondary joins</a>, SQLAlchemy allows you to present a join across multiple tables as a field on a model, making complicated normalization schemes possible. On the Python end, you are accessing a simple entity attribute, while in the background PostgreSQL is jumping over foreign keys to make sure that, for example, the total available quantity for a certain product is the sum of all inventories for that product in different storage cells. This is all good and dandy as long as you are aware of the trade-offs. If you rely on the database too much, like we did, there are a number of dangers that lurk around the corner. First is performance. Accessing an instance attribute is a cheap operation in Python, and one does not think twice about it while coding. If there is a complicated join behind such an access, however, you might be creating a time sink that gets deeper with the size of the database. The ugly truth about slow database queries is that they rarely get caught in development or testing, because the database has little data and few connections. Production is where database performance is diagnosed, and avoiding that diagnosis in the first place is not a bad idea.</p>

<p>More impactful than performance, however, was how difficult it became to change things at the database level because of our excessive use of constraints. This is a textbook case of tight coupling: We propagated the constraints in the application into the database where they didn't really belong. Generally speaking, leaving room for action on the database gives you the freedom to circumvent otherwise complicated issues by modifying data. When you fixate everything with constraints, enumerations etc., you are giving this freedom away. For example, we used enumerations to limit the kinds of connectors (our name for the async jobs that were queued for execution). Since we already had these enumerations, we used them further to denote the sources for orders. I recently found out that removing connectors was made pretty much impossible by this dependency. I could purge the code for an unused connector, but the name in the enumeration had to stay, because there were orders created by this connector, and the source field had the name of the connector. Another affected area was deployment. Due to our excessive use of triggers, running migrations required dropping and then recreating all triggers. This made deployment longer and more error-prone, something I will touch upon later.</p>

<p>As general principles, I would venture to extract the following:</p>

<ul>
<li><p>Anything that changes regularly at the code level should be secured
only at that level, and not in the database.</p></li>
<li><p>Anything that requires locking complete tables when modifying is
unacceptable as a consistency feature.</p></li>
</ul>
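
<p>The first principle can be sketched in a few lines of Python, assuming the order source is stored as plain text in the database and checked only in application code. The connector names and function names below are invented for illustration and are not taken from the real application.</p>

<pre><code># Hypothetical connector names.
ACTIVE_CONNECTORS = frozenset({"shop_sync", "warehouse_export"})
# Retired connectors still show up as the source of old orders.
RETIRED_CONNECTORS = frozenset({"legacy_import"})


def validate_new_source(source):
    """Accept a source for a new order only if its connector is active."""
    if source not in ACTIVE_CONNECTORS:
        raise ValueError(f"unknown or retired connector: {source!r}")
    return source


def is_known_source(source):
    """Old rows may reference retired connectors; reading them is fine."""
    return source in ACTIVE_CONNECTORS | RETIRED_CONNECTORS


print(validate_new_source("shop_sync"))
print(is_known_source("legacy_import"))
</code></pre>

<p>Retiring a connector then becomes a one-line code change that moves a name from one set to the other, with no schema migration involved.</p>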

<p>Make sure that securing your data does not turn into an exercise in rigidity. DB constraints are OK when they are used sparingly, but you should always be conscious of the trade-offs you are making.</p>

<h4 id="haveastrategyforgrowingbusinesslogicaidsec102namesec102a">Have a strategy for growing business logic<a id="sec-1-0-2" name="sec-1-0-2"></a></h4>

<p>Just in case you haven't recognized it yet: Organizing code is difficult. A decent portion of a developer's time goes into figuring out where a certain line of code goes. One serious mistake that we made was distributing business logic around the code base, instead of centralizing it. The logic for routing orders to consumers, for example, was in the module that contained the queued jobs, whereas the code that was responsible for changing the state of orders was in the API module. Among the consequences of disorganized business logic were the following:</p>

<ul>
<li><p>We frequently had a really hard time deciding on the correct location for changing or adding a functionality. Alternatively, finding the location of code responsible for a feature was never straightforward.</p></li>
<li><p>Tests for the various units of code manipulating the same DB model were complicated. Since the logic for state changes and processes was spread all over the place, the DB state had to be created for each test. If we had abstracted away the business logic into code that didn't need database objects, but could work with plain Python instances, testing would have been easier, faster and less error-prone.</p></li>
<li><p>Mentally tracking a process was an exercise in compiling code in your mind. A module was never really complete without inserting code from some other module in your mind's eye. Actually, I would sometimes temporarily copy-paste code to make understanding things easier.</p></li>
</ul>
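
<p>What business logic on plain Python instances could look like is sketched below. The order states, the transition table and the names are all hypothetical; the point is only that such code can be tested without creating any database state.</p>

<pre><code>from dataclasses import dataclass

# Hypothetical order states and the transitions allowed between them.
ALLOWED_TRANSITIONS = {
    "new": {"routed", "cancelled"},
    "routed": {"shipped", "cancelled"},
    "shipped": set(),
    "cancelled": set(),
}


@dataclass(frozen=True)
class Order:
    id: int
    state: str


def transition(order, new_state):
    """Return a copy of the order in new_state, or raise if not allowed."""
    if new_state not in ALLOWED_TRANSITIONS[order.state]:
        raise ValueError(f"cannot go from {order.state} to {new_state}")
    return Order(id=order.id, state=new_state)


routed = transition(Order(id=1, state="new"), "routed")
print(routed)
</code></pre>

<p>A thin layer elsewhere would load the row, call <code>transition</code>, and write the result back, which keeps the rules in one place and the tests fast.</p>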

<p>As soon as you start building business logic, you should have a clear idea of how you want to keep it coherent, united and testable. These orientation points should also make your code open to discovery and modification.</p>

<h4 id="howyoubringcodetousersisasimportantashowyoubuilditaidsec103namesec103a">How you bring code to users is as important as how you build it<a id="sec-1-0-3" name="sec-1-0-3"></a></h4>

<p>Writing the code is half the work of creating a feature. The rest is actually bringing it to the user. This involves testing, merging into the main branch, and deploying to production. There were a lot of things we got right here; we were using a very simple integration approach with pull requests merged into master, and integrated testing. There were also some things we got wrong. The first of these was slow testing. This is something most of the industry is getting wrong, in my opinion. We as developers have gotten used to having test suites run in tens of minutes, and accept this as a necessary evil of large code bases. Our test suite was also rather long-running; some test runs took more than half an hour to complete. The result of this was distraction, prolonged deployment times, the occasional skipping of test runs, and overall developer dissatisfaction. A long-running test suite is also frequently broken due to reasons unrelated to the code, reasons more related to infrastructure or annoying race conditions. Such failures have led to the biggest frustrations of my life as a developer. As I'm trying to concentrate on a difficult issue, a one-liner commit seemingly breaks the test run. Such things sometimes cost me a whole day, and they usually turned out to be due to unrelated issues like the minor version of a CLI program having changed. With such breakage and long waiting times, the test suite cannot be used as a tool that gives feedback on correctness. It becomes a nuisance, a bureaucratic process that has to be fulfilled to deploy code. If I had a say for a new project, I would put a hard limit of 10 minutes for test runs, and work very hard to keep this limit.</p>

<p>The next step in delivering the code is deploying it to production. Again, we weren't doing badly in this area, but it could have been better. Deploying code did not take long, but it was prone to breaking. The fixes required better devops skills, which not enough people possessed at the time. I know that this is much more often said than done, but I have to join the chorus: Having a dead simple and robust deployment process should be the highest priority of a team that runs web applications to earn their daily bread.</p>

<h4 id="beveryverycarefulwithendtoendtestsaidsec104namesec104a">Be very very careful with end-to-end tests<a id="sec-1-0-4" name="sec-1-0-4"></a></h4>

<p>Keeping with the testing theme, one thing we really regretted was going overboard with end-to-end (e2e) tests. These were implemented using the test runner for Angular, and fired up Chrome to go through many frontend features. The Angular frontend was run against an actual backend. The e2e tests appeared as a real wonder weapon at the beginning. Look, you can actually run the whole application! Can you get a better guarantee that everything is OK and you can deploy? With this thinking, we wrote e2e tests for each corner case of a given functionality. Once we accumulated a decent number of such tests, however, they turned out to be rather problematic. We faced two principal difficulties in this context. First, one had to set up the exact environment through the database so that the frontend could find what it was looking for. Second, navigating through the interface was painful even with very precise setup, because one had to find the exact selector for the elements (buttons, inputs etc.) to be manipulated. These selectors also broke very easily when the HTML structure changed. Adding one new list to a page led to having to fix the selectors in all the tests for that page.</p>

<p>I know that frontend testing is a difficult area, and won't pretend that I have the solution to it. There was one lesson that surfaced in our review discussions, however: It would have been a better idea to abstract away some of the JS and unit test it without the DOM or the backend application. The browser has many complexities that are difficult to control in a completely automated environment, and it is best to avoid these when you can.</p>

<h4 id="dontleavetransparencytoyourtoolsaidsec105namesec105a">Don't leave transparency to your tools<a id="sec-1-0-5" name="sec-1-0-5"></a></h4>

<p>We had to do our fair share of firefighting, which went like this: Customer calls us, tells us that they are getting weird responses from our API, or maybe no responses at all, which puts us into scrambling mode. We go to the logs for details, but our application is not telling much, so we have to check the logs from the web server, message queue and the database. From the information spit out by them, we need to piece together what the issue is, and deploy a fix after figuring it out. The fact that we had to consult the infrastructure to find out the exact problem was the reason that our customer was calling us in the first place, instead of us getting alerted to the issue.</p>

<p>While discussing this topic with our current CTO, who has a lot of experience with large systems, she remarked that if you leave the job of reporting on your platform to the infrastructure components such as database or message queue, the developers will go to <em>them</em> to understand system behavior. This will make debugging and performance-monitoring an indirect process, where you have to interpret the data coming from these components to draw conclusions on what is happening in your own code. The more sensible thing to do is build in proper logging and monitoring from the beginning to make sure your system is transparent, i.e. it should be telling you what's going on during normal and faulty operation.</p>
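
<p>A minimal sketch of what reporting from your own code could look like, using only Python's standard <code>logging</code> module. The logger name, the function and the message fields are hypothetical; the idea is that every step carries the identifiers you will need when a customer calls.</p>

<pre><code>import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("orders")  # hypothetical logger name


def route_order(order_id, connector):
    """Pretend to route an order, reporting every step from our own code."""
    logger.info("routing order_id=%s via connector=%s", order_id, connector)
    if not connector:
        # Report the failure where it happens, with the context needed later.
        logger.error("routing failed order_id=%s reason=no_connector", order_id)
        return False
    logger.info("routed order_id=%s via connector=%s", order_id, connector)
    return True


route_order(42, "shop_sync")
route_order(43, "")
</code></pre>

<p>With lines like these in the application log, an alert on <code>ERROR</code> entries can fire before the customer notices, and nobody has to reconstruct the story from web server or database logs.</p>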

<h4 id="focusonyourcoretechnologyaidsec106namesec106a">Focus on your core technology<a id="sec-1-0-6" name="sec-1-0-6"></a></h4>

<p>Every large application relies on at least one component of the infrastructure as the core. Depending on the kind of application, this core can be different. Data-driven web applications nearly always have an RDBMS at their center, which was also the case with our app, specifically PostgreSQL. PostgreSQL is an incredible piece of technology. It is robust, feature-complete, fast, and somehow still evolving. Despite relying on it so heavily, I have to admit that our knowledge of PostgreSQL was still relatively limited, in the sense that we were not using many important features properly. We had proper replication in place, and were using triggers and constraints as explained above, but our use of indexing was really limited, for example. I discovered this when I took the time to examine the index usage of our database (which I documented in <a href="http://okigiveup.net/what-postgresql-tells-you-about-its-performance/">another blog post</a>), and created a number of indexes that improved the most frequent queries. The result was our customer representative telling me that the app was so fast, that he thought there was something wrong. Indexing is obviously not a specialty of PostgreSQL; it is <em>the</em> feature of RDBMS's. It is therefore database theory that we should have tried to understand a bit better, and you can do this much better when you are working on a data heavy application that should have been performing much better yesterday.</p>

<p>Other core technologies such as message queues or key-value stores all have their tricky corners which, when understood properly, can help you navigate difficult situations, and earn yourself some time when you desperately need it. Most important, though, is understanding the theory common to all such systems, and the common algorithms that exist to navigate the constraints. You should take the time to read the documentation a bit, and inform yourself about not only the specifics but also the general theory.</p>

<h4 id="nothingbeatsincrementallyimprovedsystemsaidsec107namesec107a">Nothing beats incrementally improved systems<a id="sec-1-0-7" name="sec-1-0-7"></a></h4>

<p>To finish on a positive note, I need to mention that our application has been improving continuously, and getting more robust and performant as we keep working on it incrementally, attacking one issue at a time. Continuous attention and incremental improvement beats rewriting a large codebase, given that it is not utter spaghetti code, or written without a sound understanding of best practices and the programming model supported by the programming language(s) used. This was one of the biggest advantages of our team: All developers were experienced and diligent students of Python and/or JS, the two languages our application was built with. Every developer strived to write clear and idiomatic Python and JS, and there was constant exchange on how to do the right thing. The end result is a codebase that can still evolve, even with a single developer left working on it.</p>]]></content:encoded></item><item><title><![CDATA[What PostgreSQL Tells You About Its Performance]]></title><description><![CDATA[<p>Recently at work I was tasked with improving our legacy application. It has been neglected for a while, and takes its revenge by causing frequent firefighting and overall crappy performance. The application is tightly coupled with a PostgreSQL database, and many things that are normally not the job of a</p>]]></description><link>http://okigiveup.net/what-postgresql-tells-you-about-its-performance/</link><guid isPermaLink="false">8a32c936-a597-4ec1-af1f-9417d28fe830</guid><category><![CDATA[postgresql]]></category><dc:creator><![CDATA[Ulaş Türkmen]]></dc:creator><pubDate>Mon, 29 Feb 2016 10:47:19 GMT</pubDate><content:encoded><![CDATA[<p>Recently at work I was tasked with improving our legacy application. It has been neglected for a while, and takes its revenge by causing frequent firefighting and overall crappy performance. 
The application is tightly coupled with a PostgreSQL database, and many things that are normally not the job of a database (such as keeping version history) are delegated to this single PostgreSQL instance. The result is a feedback loop where the database is under immense load for even the simplest things, causing frequent deadlocks and extremely long queries, which leads to decreased performance and long request times, which leads to even more load. To put an end to this spiral of endless firefighting, and improve my knowledge of Postgres, I decided to spend some time with the legacy application. The first step was analyzing the database performance, to find out which changes would give us the biggest advantage with comparatively small effort.</p>

<p>Generally speaking, the following are the factors that we need to focus on to judge how well a database cluster is performing:</p>

<ul>
<li><p>Index usage: The most important algorithmic fundamental of a relational database is the <a href="https://en.wikipedia.org/wiki/B-tree">B-tree index</a>. If a database is not properly configured, it will do sequential scans across frequently used tables (linear with table size) instead of using an index (logarithmic).</p></li>
<li><p>IO: PostgreSQL does its best not to read data from disk, either delaying reading as much as possible, or using the cache. Whether reading from disk can be avoided depends mostly on cache configuration.</p></li>
<li><p>Concurrent connections: Many parallel connections consume <a href="http://hans.io/blog/2014/02/19/postgresql_connection/">a lot of memory and CPU</a>. You should make sure that your database is not plagued with more connections than it can handle.</p></li>
<li><p>Deadlocks: These nasty buggers are the biggest killers in terms of performance, because they lead to long queries, blocked connections, and expensive transaction rollbacks. If you have a lot of deadlocks, your locking queries need a review.</p></li>
</ul>
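
<p>The difference between linear and logarithmic lookup is easy to underestimate, so here is a small Python sketch. It uses binary search on a sorted list as a stand-in for an index; a real B-tree is organized differently, but the growth behavior is comparable. The data and function names are made up for illustration.</p>

<pre><code>import bisect

rows = list(range(0, 2_000_000, 2))   # one million sorted keys (made-up data)
target = 1_999_998                    # the last key: worst case for a scan


def sequential_lookup(keys, wanted):
    """Walk the keys one by one, as a sequential scan would."""
    for steps, key in enumerate(keys, start=1):
        if key == wanted:
            return steps
    return None


def indexed_lookup(keys, wanted):
    """Binary search on sorted keys; returns the position or None."""
    pos = bisect.bisect_left(keys, wanted)
    if pos != len(keys) and keys[pos] == wanted:
        return pos
    return None


print(sequential_lookup(rows, target))   # 1000000 keys inspected
print(indexed_lookup(rows, target))      # 999999, found in about 20 halving steps
</code></pre>

<p>Doubling the table doubles the work of the first function, but adds only a single step to the second.</p>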

<h2 id="collectinggeneralperformancedata">Collecting General Performance Data</h2>

<p>Not surprisingly, there are a number of tables that PostgreSQL keeps within its own schema with an abundance of information on the above dimensions. These tables all start with either <code>pg_stat</code> or <code>pg_statio</code>, and are generally referred to as the stats tables. It is important to keep in mind that <a href="http://blog.pgaddict.com/posts/the-two-kinds-of-stats-in-postgresql">there are two kinds of statistics in PostgreSQL</a>. The first kind is for its own internal usage, such as deciding when to run <code>autovacuum</code>, and query planning. This data is kept in the <a href="http://www.postgresql.org/docs/current/static/catalog-pg-statistic.html"><code>pg_statistic</code> catalog</a>. As the documentation points out, this table should not be readable to ordinary users. A publicly readable view on this data that is also in a more human-friendly format is <code>pg_stats</code>.</p>

<p>The second kind of statistics is for monitoring, and these tables are the focus of this post. The monitoring stats tables can be divided into three groups: Database-specific, table-specific and query-specific. Let's start with <strong>database-specific statistics</strong>. The statistics for a single database are saved in the <strong><code>pg_stat_database</code></strong> table. In addition to the columns that are to be expected, such as database name and id (<code>datname</code> and <code>datid</code>), the following columns that are relevant to our interests are in this table:</p>

<ul>
<li><p><code>numbackends</code>: Number of backends currently connected to this database.</p></li>
<li><p><code>blks_read</code>, <code>blks_hit</code>: Number of times disk blocks were read vs. number of cache hits for these blocks.</p></li>
<li><p><code>xact_commit</code>, <code>xact_rollback</code>: Number of transactions committed and rolled back, respectively.</p></li>
<li><p><code>deadlocks</code>: Number of deadlocks since last reset. As mentioned above, very important for database performance.</p></li>
</ul>

<p><code>numbackends</code> is an important column, not only because too high a value can cause issues, as mentioned above, but also because the change in this number during normal operation gives us a hint about how long queries are taking. Combining the value of <code>numbackends</code> with the oldest running query from the <code>pg_stat_activity</code> table might also be informative, to make sure that there are no long-running connections that were not properly closed.</p>

<p>The ratio of cache hits to total reads can be determined with the following query:</p>

<pre><code>SELECT blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio
FROM pg_stat_database
WHERE datname=current_database();
</code></pre>

<p>This number is the most important metric for measuring IO performance; <a href="http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/">it should be very close to 1</a>. Otherwise you should consider changing <a href="http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-SHARED-BUFFERS">the <code>shared_buffers</code> configuration option</a>. A similar ratio of the number of committed transactions vs. all transactions is also important:</p>

<pre><code>SELECT xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio
FROM pg_stat_database
WHERE datname=current_database();
</code></pre>
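
<p>If you collect these counters from application code, it helps to guard against the case where both counters are still zero, for example right after a reset on an idle database, where a plain division would fail. A minimal Python sketch with invented counter values, shaped like one row of <code>pg_stat_database</code>:</p>

<pre><code>def ratio(part, other):
    """part / (part + other), or None when both counters are still zero."""
    total = part + other
    if total == 0:
        return None
    return part / total


# Made-up counter values.
row = {"blks_hit": 990_000, "blks_read": 10_000,
       "xact_commit": 4_950, "xact_rollback": 50}

print(ratio(row["blks_hit"], row["blks_read"]))          # 0.99
print(ratio(row["xact_commit"], row["xact_rollback"]))   # 0.99
</code></pre>

<p>Returning <code>None</code> instead of a number makes "no data yet" distinguishable from a genuinely bad ratio of 0.</p>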

<p>Except for <code>numbackends</code>, all these values are accumulated since the time they were reset. Resetting can be carried out by logging into the database and running <code>select pg_stat_reset();</code>. The last time this was done is stored in the <code>stats_reset</code> column. Resetting statistics affects only the monitoring tables; <code>pg_statistic</code> is populated by <code>ANALYZE</code>, and is not affected.</p>

<p>The most useful table-specific stats table is <code>pg_stat_all_tables</code>. Running a simple <code>\d pg_stat_all_tables</code> on this table reveals some very interesting columns:</p>

<ul>
<li><p><code>last_vacuum</code>, <code>last_analyze</code> : The last time vacuum and analyze have been executed manually on this table.</p></li>
<li><p><code>last_autovacuum</code>, <code>last_autoanalyze</code> : The last time this table has been vacuumed or analyzed by the autovacuum daemon.</p></li>
<li><p><code>idx_scan</code>, <code>idx_tup_fetch</code>: The number of times an index scan was made on this table, and the number of rows fetched this way.</p></li>
<li><p><code>seq_scan</code>, <code>seq_tup_read</code>: The number of times a sequential scan was made, and the number of rows read this way.</p></li>
<li><p><code>n_tup_ins</code>, <code>n_tup_upd</code>, <code>n_tup_del</code> : Number of rows inserted, updated and deleted.</p></li>
<li><p><code>n_live_tup</code>, <code>n_dead_tup</code> : Estimated number of live rows vs. dead rows.</p></li>
</ul>

<p>The most meaningful stats from a performance perspective are those related to index vs sequential scans. An index scan happens when the database can determine which rows to fetch using only an index, a data structure that is easy to traverse. A sequential scan happens, on the other hand, when a table has to be linearly processed in order to determine which rows belong in a set. Sequential scans are very costly operations for big tables. The reason for this is that reading rows is an expensive operation, as the actual table data is stored in an unordered heap. The aim of a database user therefore should be to tweak the index definitions so that the database does as few sequential scans as possible. I strongly recommend the book <a href="http://sql-performance-explained.com/">SQL Performance Explained</a> on the topic of indexes. The ratio of index scans to all scans for the whole database can be calculated as follows:</p>

<pre><code>SELECT sum(idx_scan)/(sum(idx_scan) + sum(seq_scan)) as idx_scan_ratio
FROM pg_stat_all_tables
WHERE schemaname='public';
</code></pre>

<p>The user has access to the tables in the current database plus some other system tables, such as the TOAST tables, which necessitates filtering these out by looking at only those in the <code>public</code> namespace. The value returned by the above query should be very close to 1, otherwise you have a serious problem. In order to see a more detailed report of how individual tables are faring in the same area, you can use the following query, which calculates the same ratio per table and puts them in ascending order:</p>

<pre><code>SELECT relname,idx_scan::float/(idx_scan+seq_scan+1) as idx_scan_ratio
FROM pg_stat_all_tables
WHERE schemaname='public'
ORDER BY idx_scan_ratio ASC;
</code></pre>

<p>As pointed out <a href="http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/">in this blog post</a>, it might be a good idea to pay special attention to index usage on tables with many rows, and make sure they are as highly optimized as possible.</p>
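<p>As a rough sketch of that advice, the following query lists the biggest tables together with their scan counters, so that large tables with many sequential scans stand out. It uses only columns of <code>pg_stat_all_tables</code>; the limit of 10 is an arbitrary choice, and <code>n_live_tup</code> is only an estimate of the row count:</p>

<pre><code>-- Biggest tables first, with the number of sequential and index scans on each.
-- idx_scan is NULL for tables that have no index at all.
SELECT relname, n_live_tup, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_all_tables
WHERE schemaname='public'
ORDER BY n_live_tup DESC
LIMIT 10;
</code></pre>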

<p>Running the query <code>select pg_stat_reset();</code> as superuser resets the counters in <code>pg_stat_all_tables</code> as well as those in <code>pg_stat_database</code>.</p>

<h2 id="triggerbehavioraidsec21namesec21a">Trigger behavior<a id="sec-2-1" name="sec-2-1"></a></h2>

<p>One question we had in mind was how the stats were related to queries running within trigger functions. PostgreSQL is known for doing the sensible thing, so we expected stats to be collected also within triggers, but it's best to make sure by running a simple test. Let's create an empty database with the following simple tables:</p>

<pre><code>CREATE TABLE person (
  id SERIAL PRIMARY KEY,
  last_name VARCHAR(255),
  first_name VARCHAR(255)
);

CREATE TABLE address (
  id SERIAL PRIMARY KEY,
  person_id integer REFERENCES person(id),
  fullname VARCHAR(255),
  street VARCHAR(255),
  city VARCHAR(255)
);
</code></pre>

<p>We can insert the following rows into the <code>person</code> and <code>address</code> tables:</p>

<pre><code>INSERT INTO person (first_name, last_name)
VALUES ('Hercule', 'Poirot');

INSERT INTO address (person_id, fullname, street, city)
VALUES (1, 'Hercule Poirot', 'Rue des Martyrs', 'Paris');
</code></pre>

<p>A quick check of <code>pg_stat_all_tables</code> after resetting the stats reveals the following:</p>

<pre><code>SELECT pg_stat_reset();
SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person';
SELECT * FROM person WHERE first_name='Hercule';
SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person';
</code></pre>

<p>The first <code>SELECT</code> query on <code>pg_stat_all_tables</code> returns <code>0, 0, 0</code>, whereas the second one returns <code>0, 1, 0</code>, as one would expect. In order to test whether these statistics take into account triggers, we can add a trigger to the <code>person</code> table with the following lines:</p>

<pre><code>CREATE OR REPLACE FUNCTION update_fullname() RETURNS TRIGGER AS $$
    BEGIN
        UPDATE address
          SET fullname = concat(NEW.first_name, ' ', NEW.last_name)
          WHERE person_id = NEW.id;
        RETURN NULL;
    END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS update_fullname_trigger ON person;
CREATE TRIGGER update_fullname_trigger
    AFTER UPDATE ON person
    FOR EACH ROW
    EXECUTE PROCEDURE update_fullname();
</code></pre>

<p>After installing the <code>update_fullname</code> trigger, which changes the fullname column in the <code>address</code> table when a <code>person</code> changes, we can reset the statistics and run a simple update to see what happens:</p>

<pre><code>SELECT pg_stat_reset();
UPDATE person SET first_name = 'Marcel' WHERE id=1;
SELECT idx_scan,seq_scan,n_tup_upd FROM pg_stat_all_tables
  WHERE schemaname='public' AND relname='address';
</code></pre>

<p>This should return <code>0, 1, 1</code>, meaning that the query run by the trigger was registered in the statistics.</p>
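<p>To look at the statement and its trigger side by side, a variation of the query above (a sketch, not part of our original test) can fetch the counters of both tables at once. Keep in mind that the statistics are reported with a short delay, so the numbers may take a moment to show up:</p>

<pre><code>-- Counters for the table that was updated directly (person)
-- and for the table that was updated by the trigger (address).
SELECT relname, idx_scan, seq_scan, n_tup_upd
FROM pg_stat_all_tables
WHERE schemaname='public' AND relname IN ('person', 'address')
ORDER BY relname;
</code></pre>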

<h2 id="monitoringqueryperformance">Monitoring Query Performance</h2>

<p>The tables mentioned so far give you a general overview of the performance characteristics of your database. When it comes to finding the reasons for these characteristics, you need to go one level deeper, to individual queries. The one table that has the most information on the performance of individual queries is <code>pg_stat_statements</code>. Unfortunately, this table is populated by a plugin that first has to be enabled, which requires a database restart. I would strongly encourage you to install the plugin though, since the data it records is impossible to derive or collect otherwise. Enabling the plugin is a matter of installing the package <code>postgresql-contrib-9.X</code> matching your version of PostgreSQL and your Unix distribution, and adding (or uncommenting) the following lines in <code>postgresql.conf</code>:</p>

<pre><code>shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
</code></pre>

<p>Afterwards, you should log in to the database of interest and run <code>CREATE EXTENSION pg_stat_statements;</code>. From now on, various statistics will be collected for each individual query, and stored in the <code>pg_stat_statements</code> table. The important identifier columns on this table are the following:</p>

<ul>
<li><p><code>dbid</code>: This column has the ID of the database on which the query was run. The corresponding column in the <code>pg_database</code> table is called <code>oid</code>, and is hidden. Statistics are collected for all databases on the server, so filter on this column if you are interested in only one of them.</p></li>
<li><p><code>queryid</code>: This is a hash of the internal representation of the query. The way this hash is calculated involves a number of subtleties. These will be discussed a few lines below.</p></li>
<li><p><code>query</code>: A representative text for what PostgreSQL considers to be the same query.</p></li>
</ul>
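<p>A minimal sketch for checking that the extension is in place and for getting a first look at these identifier columns could be the following. It joins <code>pg_stat_statements</code> to <code>pg_database</code> on the <code>oid</code> column mentioned above in order to show a database name instead of a numeric ID; the aliases <code>s</code>, <code>d</code> and <code>short_query</code>, as well as the limit, are arbitrary choices:</p>

<pre><code>-- Is the library preloaded, and is the extension created in this database?
SHOW shared_preload_libraries;
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';

-- Resolve dbid to a database name and show the identifier columns.
SELECT d.datname, s.queryid, left(s.query, 40) AS short_query
FROM pg_stat_statements s
JOIN pg_database d ON d.oid = s.dbid
ORDER BY d.datname
LIMIT 10;
</code></pre>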

<p><strong>Query hash generation</strong> takes as its input the representation that PostgreSQL generates after a query is parsed and matched to the relevant tables or indexes. The scalar values in the query are then stripped out for <em>plannable queries</em>, i.e. <code>SELECT, INSERT, UPDATE, DELETE</code>. The resulting internal representation is an abstract "summary" of the query. Textually different queries can thus map to the same <code>queryid</code>, for example when they differ only in the constants they contain or in their whitespace and formatting. See <a href="http://www.postgresql.org/docs/current/static/pgstatstatements.html">the PostgreSQL documentation on the topic</a> for further details.</p>
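<p>You can observe this on a test database with a small experiment. The sketch below assumes the <code>person</code> table from the trigger test above and uses the <code>pg_stat_statements_reset</code> function, which is described further down and by default requires superuser rights. The two lookups differ only in the constant, so they should end up in a single row with <code>calls</code> at 2, and the constant in the stored query text should be replaced by a placeholder:</p>

<pre><code>-- Start from a clean slate (do not do this on a server whose stats you want to keep).
SELECT pg_stat_statements_reset();

-- Two queries that differ only in a scalar value.
SELECT * FROM person WHERE id = 1;
SELECT * FROM person WHERE id = 2;

-- The stats query itself starts with different text, so it does not match the pattern.
SELECT queryid, calls, query
FROM pg_stat_statements
WHERE query LIKE 'SELECT * FROM person%';
</code></pre>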

<p>The columns in the <code>pg_stat_statements</code> table relevant for performance analysis are the following:</p>

<ul>
<li><p><code>calls</code>: Number of times executed</p></li>
<li><p><code>total_time</code>: Total time spent in this query, in milliseconds</p></li>
<li><p><code>min_time</code>, <code>max_time</code>, <code>mean_time</code>: The minimum, maximum and mean time of all runs of the query, in milliseconds.</p></li>
</ul>

<p>As with the above statistics tables, <code>pg_stat_statements</code> aggregates values between resets. This table requires a different function to reset, the aptly named <code>pg_stat_statements_reset</code>.</p>

<p>A simple test shows that the queries run through triggers are accounted for in the <code>pg_stat_statements</code> table, too. After creating the tables, registering the triggers, and resetting the statistics with <code>SELECT pg_stat_statements_reset()</code>, let's run the following simple query again:</p>

<pre><code>UPDATE person SET first_name = 'Marcel' WHERE id=1;
</code></pre>

<p>Asking for the statistics shows us that the <code>UPDATE</code> statements in the trigger have been registered properly:</p>

<pre><code>test=# select calls,total_time,left(query,30) from pg_stat_statements where dbid=874591
order by calls desc;
 calls | total_time |              left
-------+------------+--------------------------------
     2 |      0.201 | select calls,total_time,left(q
     1 |      0.019 | UPDATE address                +
       |            |           SET f
     1 |      8.898 | select pg_stat_statements_rese
     1 |      0.564 | UPDATE person SET first_name =
</code></pre>

<p>Once the <code>pg_stat_statements</code> extension is enabled, improving database performance in terms of query duration (the most important thing, as far as the users are concerned) is as simple as finding the longest-running queries ordered either by average or total time, finding sample values for the parameters, and running them with <code>EXPLAIN</code> or <code>EXPLAIN ANALYZE</code>. See <a href="https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT">this old but still relevant tutorial</a> for a quick introduction to using <code>EXPLAIN</code>.</p>
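<p>As a sketch of that workflow, the first query below lists the ten statements with the highest average duration; ordering by <code>total_time</code> instead gives the statements that cost the most overall. The mean is computed as <code>total_time / calls</code> so that the query also works on versions without the <code>mean_time</code> column; note that PostgreSQL 13 and later call the column <code>total_exec_time</code>. The second part reuses the <code>UPDATE</code> from the trigger test as a sample statement. Since <code>EXPLAIN ANALYZE</code> actually executes the statement, it is wrapped in a transaction that is rolled back:</p>

<pre><code>-- Slowest statements by average duration (milliseconds).
SELECT queryid, calls, total_time,
       total_time / calls AS mean_time,
       left(query, 60) AS short_query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Inspect one of them with sample values, without keeping its effects.
BEGIN;
EXPLAIN ANALYZE UPDATE person SET first_name = 'Marcel' WHERE id = 1;
ROLLBACK;
</code></pre>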

<p>One more thing we wanted to achieve was to regularly query our database instance for the above-mentioned pieces of information, and display them on our Kibana dashboard. Unfortunately, Logstash proved to be a roadblock with its weird parsing behavior and incomprehensible bugs (hence my current attempt to <a href="https://github.com/afroisalreadyinu/stashpy">rewrite it in Python</a>), but for the time being, here is a bash script which uses psql to query the PostgreSQL stats tables, and pipes everything to syslog:</p>

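<pre><code>#!/bin/bash
# Usage: pass either "database" or "statements" as the only argument.
# db_name is a placeholder; replace it with the name of your database.
set -e

case "$1" in
    database)
        # Connection count, cache hit ratio and commit ratio, flattened into one syslog line
        psql -x db_name -c "select numbackends,blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio,xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio from pg_stat_database where datname='db_name';" | grep -v RECORD | sed '/^$/d' | tr '\n' ' ' | logger
        ;;
    statements)
        # Top 10 statements by total time, one syslog line per statement
        psql -x db_name -c "select queryid, total_time, (total_time::float/calls) as mean_time, left(query,40) as short_query from pg_stat_statements order by total_time desc limit 10;" | tr '\n' ' ' | sed 's/-\[ RECORD [0-9]* \]-*/\n/g' | xargs -d '\n' -n 1 logger
        ;;
    *)
        exit 1
        ;;
esac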
</code></pre>]]></content:encoded></item></channel></rss>