OK I GIVE UP

Golang JSON Gotchas That Drove Me Crazy But I Have Learned to Deal With

Ulaş Türkmen — Sun, 28 Feb 2021 19:34:19 GMT

Assumed reader level: Intermediate
Content level: Advanced beginner

JSON is JSON, it's everywhere, and if you're working with Go you're most probably doing tons of JSON marshalling and unmarshalling. Having experience in languages that have nearly identical built-in syntax for JSON (JavaScript and Python), I repeatedly ran into certain issues having to do with Go's idiosyncrasies and my deep-seated habits. Keep in mind that these points also apply to other encodings in the Go standard library, and generally to all packages that implement the same interfaces and patterns.

Only public fields are (un)marshalled

This is the gotcha that annoyed me the most until I got it carved into my mind after spending countless minutes debugging it. Traditionally, JSON object keys start with lowercase letters, whereas Go uses capitalization to determine whether an identifier is exported (public) or unexported (private). When code accesses the fields of a struct within the same package, you will not get into trouble with unexported fields, as they are accessible throughout the package. encoding/json is a different package, however, and it will not be able to write to these unexported fields. For example, the following will work:

var data struct {  
    Key string
}
jsonData := []byte(`{"Key": "Value"}`)  
json.Unmarshal(jsonData, &data)  
fmt.Printf("%v\n", data)

This should print {Value}. But if you forget that Key has to be capitalized so that encoding/json can write to it, not even setting a field tag will help you, as in the following:

var data struct {  
    key string `json:"the_key"`
}
jsonData := []byte(`{"the_key": "Value"}`)  
err := json.Unmarshal(jsonData, &data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%s\n", err)

This will, of course, print {}, and a nil error. What does catch this error, however, is go vet, which prints the following friendly error message:

./jsong.go:10:3: struct field key has json tag but is not exported

You will not get this message, though, if you don't have JSON tags. Long story short: Use tags even when keys match, and use go vet.
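The same restriction applies in the other direction. Here is a minimal sketch (the Record type and the encodeRecord helper are made up for illustration) showing that json.Marshal silently drops an unexported field instead of reporting an error:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical type with one exported and one unexported field.
type Record struct {
	Key    string `json:"key"`
	secret string // unexported: encoding/json ignores it in both directions
}

// encodeRecord marshals a Record and returns the JSON as a string.
func encodeRecord(r Record) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := encodeRecord(Record{Key: "a", secret: "b"})
	fmt.Println(out, err) // {"key":"a"} <nil>
}
```

No error, no warning: the secret field simply never makes it into the output.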

Unmarshaling is not for error checking

As encoding/json unmarshals a JSON-encoded byte array to a struct, you would expect some kind of error checking to happen. Let's take the following example:

type Data struct {  
       IntField  int  `json:"intfield"`
       BoolField bool `json:"boolfield"`
}
jsonData := []byte(`{"intfield": "yolo", "boolfield": "ctulhu ftaghn (whatever the hell that means)"}`)  
var data Data  
err := json.Unmarshal(jsonData, &data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%s\n", err)

As you can see, we are packing all kinds of junk into the JSON values for the keys that correspond to the Data struct fields. Go deserializes this as far as it can: when a value does not fit the type of its target field, that field is skipped and left as it was beforehand, and decoding continues with the remaining fields. The returned error describes the earliest field that could not be deserialized, resulting in the following output in the above case:

{0 false}
json: cannot unmarshal string into Go struct field Data.intfield of type int

So keep in mind: Deserialization is not validation. For purposes of validation, you should use a library such as https://github.com/go-playground/validator/, or even better, something that validates the input JSON directly (which I haven't found a library for yet).
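If you want to know which field was rejected, the error can be inspected. The following is a minimal sketch (decodeData is a hypothetical helper) that uses errors.As to get at the *json.UnmarshalTypeError and its Field; note that this time boolfield holds a valid value and still gets decoded, even though intfield fails:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Data struct {
	IntField  int  `json:"intfield"`
	BoolField bool `json:"boolfield"`
}

// decodeData unmarshals raw into a Data value and reports which JSON field,
// if any, had a value of the wrong type.
func decodeData(raw []byte) (Data, string, error) {
	var data Data
	err := json.Unmarshal(raw, &data)
	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		// Unmarshal keeps going after a type mismatch, so data may be partially filled.
		return data, typeErr.Field, err
	}
	return data, "", err
}

func main() {
	data, field, err := decodeData([]byte(`{"intfield": "yolo", "boolfield": true}`))
	fmt.Println(data, field, err != nil) // {0 true} intfield true
}
```

This tells you what went wrong, but it is still not validation: a well-typed but nonsensical value passes without complaint.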

Struct tags are not error-checked in any manner

When logic is put into strings in a programming language, trouble is inevitable. Language capabilities go out the window, and you are left alone with your tired eyes and mind to catch errors. Go's struct tags are no exception. Since their contents are not code, any errors you make go straight through the Go compiler without any warnings. Let's have a look at this example:

type Data struct {  
    IntField int `json:"int_field or something`
}
jsonData := []byte(`{"int_field": 43}`)  
var data Data  
err := json.Unmarshal(jsonData, &data)  
fmt.Printf("%v\n", data)  
fmt.Printf("%s\n", err)

This will print {0} and no error. One might think that struct tags are always simple, like those for JSON deserialization, and that an average programmer should be able to get them right without much effort. Unfortunately, tags are used by all kinds of libraries, each of which implements its own syntax embedded in the tag string. One example is the tag syntax used by the validation library I linked to above, demonstrated in the following type definition:

type IntegrationInput struct {  
    IntegrationTypeID int32 `json:"integration_type_id" validate:"gte:1"`
}

Can you see the error here? The validate tag has to be "gte=1" and not "gte:1". Things like this are difficult to get right and debug, especially when multiple tags are interacting, as in this example. As with unexported struct fields, go vet can help you with tags, generating the following error for the first example:

./tags.go:10:3: struct field tag `json:"int_field or something` not compatible with
  reflect.StructTag.Get: bad syntax for struct tag value

But go vet cannot help you with the validate tag, because those tags have their own logic. So use go vet to catch malformed field tags, but also pay extra attention to the format of the more complex tags.
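Why is the malformed JSON tag silently ignored? encoding/json reads tags through the reflect package, and a malformed tag looks to reflect like no tag at all, so the decoder falls back to the field name IntField, which does not match int_field. The following sketch (tagValue is a hypothetical helper) uses reflect.StructTag.Lookup to show both cases: the broken JSON tag is rejected, while the wrong validate tag parses perfectly fine, because its problem only exists in the validator's own mini-language:

```go
package main

import (
	"fmt"
	"reflect"
)

// tagValue looks up key in a struct tag the way reflection-based libraries
// do; ok is false if the key is missing or the tag is malformed.
func tagValue(tag, key string) (value string, ok bool) {
	return reflect.StructTag(tag).Lookup(key)
}

func main() {
	// The malformed JSON tag from above: the closing quote is missing.
	v, ok := tagValue(`json:"int_field or something`, "json")
	fmt.Printf("%q %v\n", v, ok)

	// Syntactically valid, so neither reflect nor go vet can object, even
	// though the validator library expects "gte=1".
	v, ok = tagValue(`json:"integration_type_id" validate:"gte:1"`, "validate")
	fmt.Printf("%q %v\n", v, ok)
}
```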

Bonus: struct tag matching is case-insensitive

Thanks to procach on /r/golang for this tip.

You would think that, if you use field tags to match JSON fields, you would be able to precisely match the case of fields in JSON data. This is not really the case, however. Even if you use tags, the match is case-insensitive, as the following example shows:

var data struct {
    Key string `json:"TheKey"`
}
jsonData := []byte(`{"thekey": "Value"}`)
err := json.Unmarshal(jsonData, &data)
fmt.Printf("%v\n", data)
fmt.Printf("%s\n", err)

This will output {Value}. Even though TheKey and thekey are different strings, encoding/json will match them to each other. Another thing to keep in the back of your mind, in case unmarshalling behaves in unexpected ways.
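If you really need exact-case keys, one workaround is a separate check before unmarshalling into the struct. This is my own sketch, not something encoding/json offers (unknownKeys is a hypothetical helper, and it only looks at top-level keys): decode into a map[string]json.RawMessage, which keeps keys verbatim, and compare them against the expected names case-sensitively:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// unknownKeys returns the top-level keys in raw that are not exactly
// (case-sensitively) one of the expected names, sorted alphabetically.
func unknownKeys(raw []byte, expected ...string) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	allowed := make(map[string]bool, len(expected))
	for _, name := range expected {
		allowed[name] = true
	}
	var unknown []string
	for key := range obj {
		if !allowed[key] {
			unknown = append(unknown, key)
		}
	}
	sort.Strings(unknown)
	return unknown, nil
}

func main() {
	keys, err := unknownKeys([]byte(`{"thekey": "Value"}`), "TheKey")
	fmt.Println(keys, err) // [thekey] <nil>
}
```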

Conclusion

I consider it a useful restriction-cum-feature that Go requires you to convert JSON data to native structures to manipulate them conveniently. Languages like Python, which have built-in syntax for similar structures, can lead to JSON-driven development, which I discussed in another blog post. If you want to have a good time converting between JSON and Go, make sure you don't skip error checks, regularly use go vet, and pay attention to capitalization, and you should be all fine and dandy.

Structured Debugging

Ulaş Türkmen — Mon, 10 Aug 2020 12:59:55 GMT

In this piece I would like to describe a practice I adopted a few years ago, after seeing how effectively colleagues were applying it. The idea of what I will call structured debugging is very simple: Document every step of the debugging effort of a complicated bug in an interactive environment in such a way that your thought process and deductions can be followed and verified. Despite its usefulness, simplicity and effectiveness, I rarely see structured debugging practiced. It has helped me with complex bugs on multiple occasions, especially in distributed applications, and it also produces artefacts that have value on their own, independent of the debugging.

The first step of structured debugging is figuring out where to document your progress. If you are happy working with the bug tracker that is at your disposal, you can document your work as comments on the bug ticket. If you prefer working in your editor, as I do, I would advise you to create a markdown file named after the ID of the bug. When you want to quickly switch to this buffer, you can use the bug ID, and once you are done, or want to notify others of your work in progress, you can copy-paste the contents. Every half-decent bug tracker out there accepts markdown input these days; by editing the report in your editor as markdown, you get the best of both worlds: local shortcuts and utilities, in addition to clickable links and decent formatting for code snippets etc.

Once you have picked the documentation environment, you should proceed in a systematic way, following the standard debug loop (gather information, set up conjectures, test them, repeat). What you want to achieve by iterating over this loop is a reproducible narrative: Document each of the steps in such a way that any other developer familiar with the code base can open the ticket, follow the comments and repeat any commands, arriving at the same conclusions as you. When gathering information, it is common to make use of SQL queries, for example, or even simple scripts that join data from multiple sources. You should gather all of these in your report, together with their results at the time you ran them. One nice side effect of presenting this information in a readable form is that you take the time to make the queries as informative and simple as possible, for example by using joins instead of multiple queries in SQL. In order to gather information from your colleagues, you can tag them in the bug ticket, so that they can write their responses there, enriching the bug hunt.

The most relevant source of information when debugging live systems is logs. In the old days, web application logs were stored either as text files on servers, rotated and zipped regularly, or piped to syslog. Both made it hard to link to them. More recently, however, dedicated systems for log analysis have become more and more widely used. All of these (or nearly all; CloudWatch doesn't allow linking to a single line) provide means for linking to individual lines, time windows or the results of specific queries. Instead of just copy-pasting the relevant log lines, or in addition to doing so, consider using these links. Readers can open them and try alternative searches. Another important source of information for especially complicated bugs is diagrams explaining workflows, relationships or complex constellations. A timeline of events, for example, can be explained using a sequence diagram, which is much better than convoluted text. When the difficulty of the bug warrants it, these are a great addition to the narrative.

Structured debugging can cause significant extra work, but it has major benefits. Most importantly, the end result will leave little doubt as to whether anything was missed. The conclusions you derive will not be based on conjectures and assumptions, but on concrete data and tooling, open for everyone to read and verify. Furthermore, the artefacts resulting from structured debugging are valuable on their own. More than once, I saw the tools used in such debugging getting incorporated into internal products, such as SQL queries turned into internal web pages and reports, or Kibana searches added to dashboards as graphs. In case the bug proves tougher than you thought, or something more important comes in between, the report will prove invaluable: Once the work is taken up again, you or anyone else can read it and easily pick up where you left off. Last but not least, this method makes visible to the team which gaps in visibility and diagnostics exist, gaps that make analyzing and connecting the whole system harder or leave the picture incomplete.

The How and Why of Go, Part 1: Tooling

Ulaş Türkmen — Tue, 30 Jun 2020 12:49:29 GMT

I'm one of those people surprised at the success of the Go programming language. Here is a language that prides itself on offering less than languages designed decades ago, that is unabashedly not OOP, and that lacked a decent dependency management system (at least initially), but is still wildly successful, with a number of significant open source projects written in it (e.g. Docker, Terraform & Kubernetes). Another intriguing thing is that people who use Go as their primary language rarely complain about it (maybe generics would be nice here and there), while those who come into initial contact with it can't stop swearing, at least at first (mea culpa). I used this gap in affinity as a chance to understand the intricacies of Go by diving into the platform, and writing down what I think is necessary knowledge for newcomers to become productive on it. The target readers are developers already proficient in one language and platform; as the text is already quite long, I didn't want to explain common programming terminology. Unavoidably, my perspective is skewed by the technologies I'm acquainted with, especially Python, with which I frequently compare Go, but it should be useful for all newcomers, even those without too much programming experience. I hope that those already working in Go can also find a useful tip here and there.

And now to the "Why" in the title. The design of Go is a bit curious, in that it leaves out most features of other popular programming languages, going for simplicity rather than recommending itself with more features. My aim was to make a proper attempt at understanding the context for this choice, by following the path from expectations of a language, to design principles, to language features, and finally to the embodiment of the language in the compiler, runtime and tooling. This process is never perfect for any technical product, as incidental turns are taken at every step, but I think knowing how different aspects of a language came to be the way they are, while depending on each other and on the context, is very important and useful. I therefore start with an overview of the "intellectual" history of Go, and connect the subsequent discussions of features to this history.

This first part of what I intend to be a two-part series will concentrate on the Go toolchain, that is, the set of tools for writing, verifying, compiling and maintaining Go applications. As we will see, Go tooling has come a long way, and offers a first-class development environment for writing correct and performant applications. The second installment will deal with the most important features of the language, also in the light of Go's design decisions. I would like both parts to stay as up-to-date and relevant as possible, so if you have any suggestions for improvement, do leave a comment, and I will make sure to address it.


Why is Go the way it is?

In order to appreciate the design decisions that went into the Go language, it's important to understand where the language designers started from, and what problems they expected their language to solve. As Rob Pike explains in detail in this article from 2012, Go was not designed to experiment with programming language theory (PLT) concepts. The languages and technologies it was intended to replace were those in daily use at Google (C++, Java and Python); Go was designed to solve the problems these platforms presented at Google scale. The problems Go is intended to address can be roughly divided into three categories:

Build issues: The problems presented by the C and C++ compilation model are well known. As explained in the linked article, compiling a moderately large C++ codebase can lead to gigabytes of IO. Go avoids this and similar issues by making unused imports an error, and by replacing header files and includes with an inverted dependency model of compilation. Circular imports are also not allowed. One interesting side effect of this stress on dependency hygiene is that copy-paste is preferred to importing large packages:

Dependency hygiene trumps code reuse. One example of this in practice is that the (low-level) net package has its own integer-to-decimal conversion routine to avoid depending on the bigger and dependency-heavy formatted I/O package. Another is that the string conversion package strconv has a private implementation of the definition of 'printable' characters rather than pull in the large Unicode character class tables; that strconv honors the Unicode standard is verified by the package's tests.

Developer ergonomics: Go is a minimalist language that tries to do away with features of the languages used at Google, such as C++, Java and Python, which are, not coincidentally, also quite popular in the rest of the developer community. In fact, as this candid blog post explains, the initial drive to develop Go came from unpleasant experiences developing concurrent code in C++, and from the intention of the standards committee to make the language even more complex. While differentiating itself from popular languages, Go cannot stray too far from them, as it is intended to be used in production at a company the size of Google. As such, it has to be easy to learn, and familiar to junior developers. This simplicity is in the service of solving modern programming challenges, foremost among them concurrency. Concurrency in Go is provided through communicating sequential processes (CSP), which has the advantage of being easy to integrate into a procedural language. Another modern feature that Go brings to a C-like language is garbage collection. Due to the type system and memory allocation features of Go, however, garbage collection works differently than in languages like Java or Python; we will discuss this in the follow-up post.
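To give a rough idea of what CSP-style concurrency looks like before the follow-up post covers it properly, here is a minimal sketch (squares is a made-up example): a goroutine produces values and communicates them over a channel, instead of sharing memory with the consumer.

```go
package main

import "fmt"

// squares starts a goroutine that sends the square of each input on the
// returned channel, closing the channel once all values have been sent.
func squares(nums []int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n * n
		}
	}()
	return out
}

func main() {
	sum := 0
	// range over a channel receives until the sender closes it.
	for sq := range squares([]int{1, 2, 3}) {
		sum += sq
	}
	fmt.Println(sum) // 14
}
```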

Google-scale: The design of Go is optimized for unambiguity and ease of parsing. The underlying reasons are the ease of writing external tools and avoiding discord in large developer teams. As an example, Pike mentions that a whitespace-sensitive language such as Python is not an issue in itself, but Python embedded in SWIG declarations turns out to be a huge problem. In order to preclude such nuisances, Go has curly braces and clear formatting rules. Another example is the now famous auto-formatting tool gofmt, which provides a standard through implementation. This, and similar tools like gofix which we will discuss later, are possible because the language is easy to parse and unambiguous (in comparison to e.g. C++, which has statements that can be parsed in multiple ways). These tools enable standards to be set within large groups, and systematic changes such as API changes to be made on large code bases. More generally, the design concerns gathered in the previous categories also contribute to scaling Go, especially those concerning concurrency primitives and strong standard library support. Another aspect of scale is the number of people working on a project. As Pike correctly observes, developers tend to stick to the subset of a complex language that they understand. As Go has a rather limited set of features, there is no subset to agree on.

Obviously, not all the design choices that went into Go can be explained through these points. There are quite a number of things that are put to good use in other languages, but are explicitly shunned in Go, such as OOP, exceptions and generics. In my opinion, there is one general thread that connects the dots, which is that in Go, things that are a pain in the large are not allowed in the small, either. Or in Rob Pike's words:

As with many of the design decisions in Go, it forces the programmer to think earlier about a larger-scale issue … that if left until later may never be addressed satisfactorily.

Another important aspect to keep in mind when reading Go documentation and wondering at how basic it is: the priority of keeping the implementation simple and feasible led the designers to simply omit features that one takes for granted in other languages, but that lead to hidden complexity in the implementation. As stated in the official FAQ:

Go was designed with an eye on felicity of programming, speed of compilation, orthogonality of concepts, and the need to support features such as concurrency and garbage collection. Your favorite feature may be missing because it doesn't fit, because it affects compilation speed or clarity of design, or because it would make the fundamental system model too difficult.

The Go toolchain

The core of the Go toolchain is the go command line tool, which bundles the most important components, including the compiler. In the rest of this text, Go refers to the language and ecosystem, whereas go (with a lowercase g) refers to the command line tool. Go follows the recent pattern of delivering development tools through one single entry-point command that accepts an action as its first argument (other examples are git and kubectl). In daily work, when building Go code, one rarely has to deal directly with the actual compiler or linker, which are hidden somewhere in the Go distribution. The Go compiler is in fact written in Go itself, and thanks to this and the snappy compile times, bootstrapping the Go toolchain is one of the simplest ways to get an up-to-date version on your computer. You will first need the out-of-date but still useful version of go from your system repositories, e.g. via apt install golang. Afterwards, download the latest source package from the official downloads page. After unpacking it, run the command ./make.bash in the src/ subdirectory. This will compile the compiler, various other tools and the standard library. On my relatively outdated i7 2.40GHz computer it took 5 minutes in total. The go tool will now reside in the bin/ directory at the root of the unpacked source; you can either use it by referencing it explicitly or by adding it to the search path, running export PATH=`pwd`/bin:$PATH from that root directory. If you put the following into the file hello.go and compile it with go build hello.go, you should have the traditional Hello World:

package main

import "fmt"

func main() {  
    fmt.Println("Hello world")
}

The executable is created by default with the name of the file, i.e. hello, which means no more a.out. As we mentioned, the compiler and linker are called in the background; you can find out how and where by running the same command with the -x option, i.e. go build -x hello.go. In this verbose output, you can see how go creates a temporary working directory, creates some files specifying where the various object files are, and then brings everything together.

You can also run the Hello World file with go run hello.go; this will directly execute the code without creating an executable. The argument to this command does not have to be a file; it can also be a package or directory (with a main package; more on this later).

Cross-compiling Go

It is also possible to cross-compile Go code for another platform. This can be achieved using environment variables that specify the target operating system and architecture. In order to compile for Linux on the Raspberry Pi, which uses an ARM chip, for example, you would need to run the following:

GOOS=linux GOARCH=arm go build hello.go

If you now look at what kind of a file the resulting binary is with file hello, you will see that it's an ELF 32-bit LSB executable, ARM.

Built-in Go tools

As the Go language targets large teams of developers without too much experience, the toolchain contains a couple of tools that effectively set standards by implementing them. The best-known of these is the gofmt tool that automatically formats code. The aim is avoiding bikeshedding discussions by providing one correct, automated way of formatting Go code. The gofmt tool is delivered as a part of the Go codebase; if you built the code from source as explained above, you should have it lying next to the go tool. In daily usage, gofmt is called using the alias go fmt, which is simply gofmt -l -w. With these options, gofmt reformats the files in-place, and prints their names. This isn't all gofmt can do, however. It is also a useful tool for simple transformations using the -r option. Let's say that you modified a frequently called function yarbl, and changed the order of its arguments; the first and second arguments have to be switched. That is, instead of yarbl(x, y, z), you need yarbl(y, x, z). The following command will update all calls to yarbl in file code.go (we will see later how to refer to a module or package) and make them fit the new signature:

gofmt -r 'yarbl(x, y, z) -> yarbl(y, x, z)' -w code.go

In the pattern specification, you need to use single lowercase letters to match sub-expressions; anything else will be matched exactly. With the above pattern, the following code:

yarbl(x, y, z)  
yarbl(foo, bar, zap)

will be changed to the following:

yarbl(y, x, z)  
yarbl(bar, foo, zap)

This feature is rather useful for refactoring Go code, e.g. for renaming a function or variable in order to export it. Another switch gofmt accepts is -s, which can be used to simplify your code, but the transformations carried out by this option are relatively limited, both in complexity and in number.
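To illustrate the kind of rewrites -s performs, here is a small sketch (Point and simplified are hypothetical names) written in the simplified forms; the comments show what the code would have looked like before running gofmt -s, based on the simplifications listed in the gofmt documentation:

```go
package main

import "fmt"

type Point struct{ X, Y int }

// simplified returns values written in the forms gofmt -s produces.
func simplified() ([]Point, string) {
	// Before gofmt -s: []Point{Point{1, 2}, Point{3, 4}}
	// The element type inside the slice literal is redundant.
	points := []Point{{1, 2}, {3, 4}}

	s := "hello"
	// Before gofmt -s: s[1:len(s)]
	return points, s[1:]
}

func main() {
	points, tail := simplified()
	fmt.Println(points, tail) // [{1 2} {3 4}] ello
}
```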

Environment variables

Before we go any further, I would like to explain the role of a couple of environment variables in the functioning of the go tool. We already saw above how the target operating system and architecture can be passed to the Go compiler via environment variables. There are three more environment variables that determine how Go finds, stores and compiles code. In order of importance, these are GOPATH, GOBIN and GOROOT. Other environment variables affect other functionality, as you can read in the official documentation (or on the command line with go help environment), but they are not as significant. You can also print all the environment variables that Go consults by running go env, or go env VARNAME to get the value of a specific variable. These commands also print the default values of variables that are not set.

GOPATH used to have a very central role in how code under development was organized; you needed to place your code in a very specific place under GOPATH for the go tool to work, but this has changed with the new module system, which we discuss below. By setting GOPATH, you can determine where go downloads third-party packages and source code. If it is not set, it defaults to the go directory in your home directory. You can set it to an arbitrary directory, for example the current directory with export GOPATH=`pwd`.

GOBIN determines where Go saves executables compiled with two other very frequently used go commands, go install and go get. These commands compile executables from local and remote code, respectively, and put them into the GOBIN directory. You can get a taste of the first command by running go install hello.go in the directory where the Hello World code resides. This should place the hello binary in the GOBIN directory. When GOBIN is not set, it defaults to $GOPATH/bin.

GOROOT is the directory in which Go looks for the standard library. In normal usage, you don't need to set this yourself: go will figure this out by looking at where it's running.

Organizing your code in modules and packages

The Hello World example above had as its first line the declaration package main. Every Go code file needs such a line at the very top (optionally preceded by comments for documentation), telling the compiler to which package the code in the file belongs. In order to understand and use packages, we need to start at a higher level of abstraction, namely modules. Modules are the distribution units of Go code, be it libraries or executables. Technically, they are collections of packages that share common dependencies and compilation conditions. In the old way of doing things, this grouping was determined by the path in which Go files were located with respect to GOPATH, but this is not necessary anymore; as we will see shortly, you can define a module with a single command, which creates a go.mod file in a certain format. Once defined this way, you can organize your code into packages, just like the main package that we used above. Before we continue with examples, I would like to point out that you can get rather detailed documentation on modules on the command line with the go help modules command (available online here). As per this documentation, the module-related behavior of the go tool can be controlled in detail using certain environment variables. Generally, however, you can assume that if you're in a module (i.e. there is a go.mod file in the current or a parent directory), you are in module mode, and the instructions here apply. We will later handle downloading and installing Go code without the use of modules.

We will create our module in an empty directory; the name of the directory is not important. Within this directory, run the command go mod init myprinter. This will create the aforementioned go.mod, which should have the following content:

module myprinter

go 1.14

Side note: In the Go world, module names are connected to how they can be found on the internet; the conventional way of naming a module is prefixing it with its repository URL. We will deal with this topic later, in order not to complicate the matters at this point.

The two lines in go.mod are the module name and the Go version with which it was created. Let's add some code to this module; add the following to the file myprinter.go right next to go.mod:

package main

import (  
    "fmt"
    "os"
    "path/filepath"
)

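func main() {
    // os.Executable reports the path of the running binary even when it was
    // started via $PATH, where os.Args[0] may be just the command name.
    exe, _ := os.Executable()
    fmt.Println(filepath.Dir(exe))
}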

This file has the package declaration main, but the filename is completely different, which go permits. You can in fact put code for the same package in different files in the same directory, with the restriction that a directory can contain only one package. The only exception to this single-package rule is the test package; more on this later. Now, within the same directory, run the command go install myprinter. Before doing so, however, make sure that you have set GOBIN to a practical location. You should end up with the executable myprinter in the GOBIN directory, and when you run it, its output should be the directory containing the executable you just ran.

The main package has a special meaning in Go. When you ask Go to create an executable from a code directory, it will look for the main package within that directory and compile it to an executable, with the main function as the entry point. That is, you cannot create an executable out of an arbitrary file; it has to be a main package, even if it's a subpackage. For subpackages, the last element of the path is taken as the name of the executable. If the main package is at the base, as with the toy example here, the executable gets the name of the module. Fittingly, you cannot import code from a main package; Go will complain that the location you are trying to import from "is a program, not an importable package".

Now let's move the logic for finding the path of the current executable to a separate package. To add a new package to our module, create the subdirectory pathfinder and put the following in the file pathfinder.go in that directory:

package pathfinder

import (  
    "os"
    "path/filepath"
)

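func Find() string {
    // os.Executable is more reliable than os.Args[0], which may hold only
    // the command name when the binary is started via $PATH.
    exe, _ := os.Executable()
    return filepath.Dir(exe)
}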

Here is something you should pay attention to: the Find function has to be capitalized. Otherwise Go will complain that it cannot be found when accessed in myprinter.go. This is an interesting feature of the Go language: Visibility is tied to capitalization. We will see more on this in the second installment, but you should keep it in mind in case you see an error. Also modify the myprinter.go file to look like this:

package main

import (  
    "fmt"
    "myprinter/pathfinder"
)

func main() {  
    fmt.Println(pathfinder.Find())
}

As you can see, we are importing our new package as myprinter/pathfinder. Go does not have relative imports; every import path has to uniquely identify the package it is importing, another feature through the lack of a feature that makes code analysis and refactoring easier. You can now run go install myprinter, and it should create a binary in the same location that does the same thing. The last argument to go install is optional; when omitted, Go will build and install the main package in the current directory. The go build command we saw earlier does something very similar, simply dropping the compiled executable in the current directory instead of moving it to $GOBIN.

You might be asking yourself, how can one check whether a package that is not an executable but simply a library is error-free and can be compiled? This can be done with both go build and go install. For non-main packages, both of these commands will compile the intermediate package binary, and then discard it (this behavior is new with modules; in the past, go install used to compile packages to $GOPATH/pkg).

Other useful subcommands

In addition to fmt, build and install, the base go tool has a number of very useful subcommands. You can list them by simply running go. More information on each subcommand can be printed by running e.g. go help build. I would strongly recommend reading these help pages every now and then; I found out about go build -x while going through the help page, for example. In this section, I would like to go into a bit more detail on two rather useful subcommands, go list and go doc.

The go list subcommand prints information about the packages specified as arguments, or about the package in the current directory if none are specified. We can list the packages under our myprinter module, for example, with the command go list myprinter. You have to do this while inside the module directory, because otherwise module mode will not be activated, and the module will not be found. The output will simply be the name of the base package, myprinter. What if we want to refer to all the packages of a module recursively? The ellipsis, or three dots, is the operator you need for this purpose, as in go list myprinter/.... Subcommands that take package arguments accept such ellipsis patterns; for example, to build all of the myprinter module, we could run go build myprinter/..., which would be totally useless in this case. We will see more useful applications of the ellipsis later.

If we run go list myprinter/..., we will get the following list:

myprinter
myprinter/pathfinder

This is all nice and dandy, but not that useful; the same could be achieved with some grep (well, someone else could do it, at least). The real power of go list is in the use of the template argument, documented in go help list. The template is passed with the -f flag, and can include actions that interpolate fields from the Package data structure (also documented in the help output). For example, for each package we can print the import path and the imports, as follows:

$ go list -f "{{ .ImportPath }}: {{ .Imports }}" myprinter/...
myprinter: [fmt myprinter/pathfinder]
myprinter/pathfinder: [os path/filepath]
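The -f argument uses the syntax of Go's text/template package, and go help list notes that a join function (strings.Join) is available in these templates. To get a feel for what the interpolation does, here is a sketch that applies the same template to a hand-built stand-in for the Package struct; the pkg type and its two fields are simplified assumptions, since the real struct has many more fields.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// pkg is a simplified stand-in for the Package struct that go list
// exposes to -f templates; only two of its many fields are mirrored here.
type pkg struct {
	ImportPath string
	Imports    []string
}

// render executes a go-list-style template against a single package.
// The "join" function mirrors the one go list provides (strings.Join).
func render(tmpl string, p pkg) (string, error) {
	t, err := template.New("list").Funcs(template.FuncMap{"join": strings.Join}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	p := pkg{ImportPath: "myprinter", Imports: []string{"fmt", "myprinter/pathfinder"}}
	for _, tmpl := range []string{
		"{{ .ImportPath }}: {{ .Imports }}",           // slice printed as [a b]
		`{{ .ImportPath }}: {{ join .Imports ", " }}`, // custom separator
	} {
		out, err := render(tmpl, p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(out)
	}
}
```

The first template reproduces the myprinter line of the go list output above, because a []string is printed the same way fmt.Print would print it.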

If you want to try out this command on a large module, you can try something from the standard library, such as net/.... Alternatively, you can also use the special argument all, which will print information on all "active" packages, meaning those that are depended on, including those in the standard library. go list can also be used to print information about modules, with the flag -m. With this flag, the struct that is used for interpolation is, as one would expect, Module instead of Package. For both packages and modules, there is a lot of extra information that can be printed out, which can be rather useful for automated analysis and overviews of large code bases.

Once you list the packages in a module, you will probably want more information about what's in them. The command for this purpose is go doc. If you try to print documentation on our toy module with go doc myprinter, you will see that an empty line is printed; this is because there is no documentation. Let's add the following comment directly above the package main line of main.go, with no blank line in between:

// A module with an entry point that prints the path to the binary.
//
// This module is for demo purposes. It does not do anything useful.
// You can read the blog post at http://okigiveup.net.

If you now run go doc myprinter, you will see the above text. This is the convention for documenting Go packages: a short summary, then a longer text separated from it by an empty comment line, all placed directly above the package clause. By default, go doc does not print any members of a main package. If we run it on the pathfinder subpackage, we will see that it prints information on the Find function:

package pathfinder // import "myprinter/pathfinder"

func Find() string

When given a single argument that is a package path, go doc prints the documentation for the package and lists its exported symbols (as mentioned above, a symbol is exported by capitalizing its name). As you might guess, we could get extra documentation on the Find function, but we haven't written any. The go doc tool uses the comment block directly preceding a function as its documentation (the same applies to constants, package-level variables, types and so on); let's add the following to pathfinder.go right before Find:

// Find finds and returns the path to the currently executing binary

Now, in order to get this documentation, we need to refer to the Find function somehow. There are two ways of doing this: either as a single argument, as in go doc myprinter/pathfinder.Find, or by giving the symbol as a second argument, as in go doc myprinter/pathfinder Find. Both should give you the following result:

package pathfinder // import "myprinter/pathfinder"

func Find() string  
    Find finds and returns the path to the currently executing binary

Another built-in tool that is useful for checking the correctness of Go code is go vet. There are certain kinds of errors which the compiler can't (or won't) catch; for example, the arguments to a format string in a call like fmt.Printf can be missing or of the wrong type (a %d verb given a string argument), or a function can be compared against nil, a comparison that is always false. go vet has a number of built-in checks that are all applied by default; you can see a list with go doc cmd/vet. When you run tests using go test (details of this command will be discussed later), go vet is applied with a subset of these checks, such as the printf check, which concerns the aforementioned format strings. If you have a CI pipeline, it makes sense to add go vet to catch subtle issues that might otherwise slip through.

Dependency management and the build system

Dependency management in the Go world is a curious story. In earlier versions of Go, dependency management was, mildly put, quite difficult to get used to. It was essentially very close to a bash script that used go list to print out and clone all the Git repositories referenced in the code. For a long time it wasn't even possible to pin versions of dependencies; the recommended way to get reproducible builds was to copy dependencies into the project repository. You also had to put your own code in a very specific place, along with the dependencies, which had the weird feeling of propagating dependencies up the code hierarchy instead of down (i.e. into a subfolder like node_modules). Fortunately, the module system, available since version 1.11, frees developers from this sorry state of affairs. It is the result of a nearly two-year-long design discussion, documented in a series of posts explaining the state of the design, with extensive discussion in their comment sections. The resulting dependency management system is the new standard, and is miles better than the old way of doing things. Therefore I will not discuss the old GOPATH-based system, and will concentrate solely on module-based dependency management here.

A very interesting decision Go has taken from the beginning is to combine the package system with code hosting. Above, we called our module myprinter; this is actually not the conventional way of naming modules. What we should have done is name the module after the version control location where it would be hosted, i.e. something like github.com/afroisalreadyinu/myprinter. When you do so, Go can fetch and install the module without any additional work on your or the community's part, such as hosting a module index like PyPI, the Python Package Index. The details of the remote import path specification can be found with go help importpath. The gist of it is as follows. Certain well-known code hosting sites, such as Github and Bitbucket, have built-in support, so that you can use them in import paths directly. You can also use VCS URLs directly, such as ones that end in .git for Git repositories. Go is not limited to Git; it can also work with Bazaar, Fossil, Mercurial and Subversion repositories. A third, more general remote import mechanism works through meta tags on HTML pages: if a page has a specifically formatted meta tag that points to a code repository, the URL of that page can be used in an import path. The details can be found in the importpath documentation mentioned above, or online in the go command documentation.

This relatively simple scheme does make import strings longer than usual, but it is actually a nice solution to the perennial problem of specifying which package you are referring to in an import. Since the import path also refers to the location, you will not run into problems using libraries that share a name, and you can easily clone a repository to some other location and use that version instead of the "canonical" one. Go faced some criticism that tying package names to code hosting sites would centralize package distribution, especially considering the dominance of Github in this space, but compared to other package hosting solutions, such as Python's PyPI and the npm registry for Node, Go's solution is actually more decentralized, since one can host a package at many different, easy-to-set-up locations. Go also has a well thought out module proxy protocol; you can read about it in go help goproxy. This protocol makes it possible to host dependencies without resorting to any public infrastructure, with very little pain, as there are multiple independent implementations. You can read up on using a module proxy, and the reasons you should host one, in this blog post.

So how do you add a dependency to your project? By simply importing it. Let's say we would like to print our message to the terminal in color using github.com/fatih/color. In order to do so, we first modify main.go to import and use it:

package main

import (
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    color.Blue(pathfinder.Find())
}

If you now run go install myprinter, you should see Go fetch the new dependency and place it in the $GOPATH/pkg/mod directory, in subdirectories named after the module path. In addition, the go.mod file should get updated, with the following line added:

require github.com/fatih/color v1.9.0

When you add a new dependency as we just did, and then run a go subcommand (such as build, install, test or list), Go will pick the latest stable release version, based on semantic versioning, download it, and add it to go.mod. What Go won't do is add the dependencies of the new package to go.mod. If you look at the go.mod of the new dependency, you will see that it depends on two other packages, but these are not in the updated go.mod of our module. This is in contrast to pip in Python, for example, where all dependencies, direct or not, are listed when you do a pip freeze. If you remove a dependency from your code, you can reliably remove it from go.mod by running go mod tidy. As we will see later, there is one more file that is edited when new dependencies are added, but before that we need to look at the go get command.

Installing and updating software with go get

We have seen how one can build locally available code with go install and go build. What if we want to install a command, such as goversion, which prints information on the compilation context of a binary? The command we need is go get. It accepts the same path that you would use to import the package. Using go get, we can install goversion either from Github, or from its developer's domain (rsc.io), which redirects to the correct repository. Let's opt for the latter:

go get rsc.io/goversion

This will download, compile and install the executable as $GOBIN/goversion. If you run go get from within the module directory, you will see that it has been added to the go.mod file as a dependency, but with the comment indirect at the end of the line. An indirect dependency of a module is one that is not directly visible from the code. Using go get to install an executable is one way to get such a dependency; the other is manually updating a dependency of a dependency (called a transitive dependency), which is also done with go get.

In semantic versioning, version numbers are specified in the format MAJOR.MINOR.PATCH. As we will see later, in Go, major version changes are never done automatically; a new major version is pretty much treated as a different module. One can use Go tooling, however, to view and apply minor and patch updates. We saw the go list command above; we can use it to view all the dependencies of a module, including the transitive ones, with go list -m all. Another very useful flag adds update status information to this output: go list -m -u all lists, for each dependency, the current version and the available update, if any. And what if the list contains a dependency and you don't know how it got there? There is a command for that, too: go mod why -m MODULE finds the shortest path from your module to that module through the import graph and prints it.

Our toy module depends on github.com/fatih/color, which has been frozen for a while and has not had its dependencies updated. When I run go list -m -u all, I can see that there are a number of dependencies with available updates. In such a situation, we have three options in principle: update one single dependency, update all transitive dependencies stemming from a direct dependency, or update all dependencies. Go allows all of these. The first one, updating a single dependency, can be achieved with e.g. go get github.com/mattn/go-isatty; this will update to the highest version within the currently used major version (i.e. minor and patch updates). If you want to update to a specific version instead, you can specify the version explicitly, as in go get github.com/mattn/go-isatty@v0.0.13. Keep in mind that Go always expects the single-letter v prefix wherever a version has to be specified; @0.0.12 will not work. The version can also be given as @latest, which means the highest version under the current major version.

The second option, updating all transitive dependencies stemming from a direct dependency, can be achieved with the -u flag. If we run go get -u github.com/fatih/, go will fetch the latest minor or patch version of each dependency of this one dependency, and update them. If you want to restrict this to patch updates, you can use -u=patch. The last option, updating all dependencies, can be done by omitting the package argument and running go get -u at the base of the module. With any of these update commands, if you also add -t, the dependencies of tests are updated as well.
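To make the "highest version within the current major version" rule concrete, here is a deliberately simplified sketch of picking an upgrade from a list of tags. It is not how the go command implements version selection (that also handles pre-releases, pseudo-versions and +incompatible tags); parseVersion, pathGroup, newer and latestSameMajor are hypothetical helpers, and the tags are made up. Two details it does reflect: versions without the leading v are rejected, and v0 and v1 share the unsuffixed module path, so an upgrade from v0.x to v1.x stays within the same module path.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion accepts only plain "vMAJOR.MINOR.PATCH" strings. Like the
// go command, it rejects versions that lack the leading "v".
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	if !strings.HasPrefix(v, "v") {
		return out, false
	}
	parts := strings.Split(v[1:], ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// pathGroup maps a major version to the module path it lives under:
// v0 and v1 share the unsuffixed path, v2+ each get their own "/vN" path.
func pathGroup(major int) int {
	if major < 2 {
		return 1
	}
	return major
}

// newer reports whether a is a higher version than b, comparing
// numerically so that v1.10.0 sorts after v1.9.1.
func newer(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// latestSameMajor returns the highest valid version in available that
// lives under the same module path as current, or current itself if
// nothing newer qualifies.
func latestSameMajor(current string, available []string) string {
	cur, ok := parseVersion(current)
	if !ok {
		return current
	}
	best, bestStr := cur, current
	for _, v := range available {
		pv, ok := parseVersion(v)
		if !ok || pathGroup(pv[0]) != pathGroup(cur[0]) {
			continue // malformed tag, or a different major version path
		}
		if newer(pv, best) {
			best, bestStr = pv, v
		}
	}
	return bestStr
}

func main() {
	// Made-up tags: "1.11.0" lacks the v prefix, v2.0.0 is a new major.
	tags := []string{"v1.9.1", "v1.10.0", "1.11.0", "v2.0.0"}
	fmt.Println(latestSameMajor("v1.9.0", tags))
}
```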

Whichever way we update transitive dependencies, the new versions will be marked as indirect in go.mod. The next time myprinter is built, these new versions will be used, overriding the versions required by github.com/fatih/color. If the latter is updated, however, making the indirect entry unnecessary, the next go command will remove it from go.mod. If you want to do this explicitly, you can run go mod tidy.

File hash checking

As already mentioned, there is another file in addition to go.mod that changes when the dependencies of a module change. This file is go.sum, which contains the cryptographic hash of each module, including the transitive dependencies that are not listed in go.mod. An error will be raised if the contents of a module do not hash to the value that was saved when the dependency was first added. In fact, even when a module is downloaded for the first time, Go will check its hash against a central database to make sure the code has not been modified (or manipulated, if you are so inclined) since the version was published. The URL of this service is stored in the GOSUMDB environment variable, with the default value sum.golang.org. If this environment variable is set to off, or if the go command is called with the -insecure flag (which also turns off HTTPS certificate validation), checksum validation is skipped. The check is done lazily, only when a module is downloaded. If you want to make sure that the locally cached dependencies have not been tampered with, and still have the same hashes as when they were downloaded, you can run the command go mod verify.
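The h1: prefix in go.sum names the hashing scheme. As far as I understand it from the golang.org/x/mod/sumdb/dirhash package, h1 hashes each file with SHA-256, writes one line per file (the hex digest, two spaces, the file name) into a summary in sorted name order, and base64-encodes the SHA-256 of that summary. The sketch below follows that description with an in-memory map of files; hash1 is my own name, the module is hypothetical, and since the real tool decides which files belong to a module and how they are named, don't expect it to reproduce the sums in your go.sum.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
	"strings"
)

// hash1 computes an "h1:"-style digest over a set of files given as
// name -> contents: SHA-256 each file, write one "<hex>  <name>" line per
// file in sorted name order into a summary, then SHA-256 the summary
// and base64-encode it.
func hash1(files map[string]string) (string, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		if strings.Contains(name, "\n") {
			return "", fmt.Errorf("file name %q contains a newline", name)
		}
		names = append(names, name)
	}
	// Map iteration order is random; sorting makes the digest deterministic.
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256([]byte(files[name]))
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil)), nil
}

func main() {
	// A hypothetical two-file module, before and after a one-line edit.
	original := map[string]string{
		"example.com/m@v1.0.0/go.mod": "module example.com/m\n",
		"example.com/m@v1.0.0/m.go":   "package m\n",
	}
	tampered := map[string]string{
		"example.com/m@v1.0.0/go.mod": "module example.com/m\n",
		"example.com/m@v1.0.0/m.go":   "package m // edited\n",
	}
	for _, files := range []map[string]string{original, tampered} {
		h, err := hash1(files)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(h)
	}
}
```

Any change to any file changes the final digest, which is what lets go mod verify and the checksum database detect modified module contents.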

These correctness checks might sound a tad excessive (they are definitely much more detailed than the ones I'm used to from other languages), but they are direct results of the priorities set in the design discussion of the Go build system. These priorities are discussed in detail in the blog post Reproducible, Verifiable, Verified Builds, which explains that the Go build mechanism should provide builds with the following three properties:

Reproducible: When repeated, a go install or go build will create the same result
Verifiable: A build artefact should record information on how it was exactly produced.
Verified: Build process should check that the expected source code packages are being used.

The use of go.mod and go.sum as explained above enables reproducible and verified builds. In order to make build output verifiable, the Go toolchain packs the necessary build information into its output. We can use the goversion tool that we installed above to print this information. By default, goversion only prints the Go version with which a binary was built, but it can be made to print the complete build context. If you run it on our little executable with $GOBIN/goversion -mh $GOBIN/myprinter, you should get something similar to the following:

/home/ulas/go/bin/myprinter go1.14
    path  myprinter
    mod   myprinter                      (devel)
    dep   github.com/fatih/color         v1.9.0                              h1:8xPHl4/q1VyqGIPif1F+1V3Y3lSmrq01EabUW3CoW5s=
    dep   github.com/mattn/go-colorable  v0.1.4                              h1:snbPLB8fVfU9iwbbo30TPtbLRzwWu6aJS6Xh4eaaviA=
    dep   github.com/mattn/go-isatty     v0.0.11                             h1:FxPOTFNqGkuDUGi3H/qkUbQO4ZiBa2brKq5r0l8TGeM=
    dep   golang.org/x/sys               v0.0.0-20191026070338-33540a1f6037  h1:YyJpGZS1sBuBCzLAR1VEpK193GlqGZbnPFnPV/5Rsb4=

Given a Go binary, a user has complete access to the build context. I find the design of the build system rather impressive, as it strictly adheres to clear principles without compromising on usability. Especially for mission-critical applications that need to be testable with different dependency configurations, and debuggable deep into the dependency tree, Go offers a very convincing toolchain without burdening the developer with too many tools and commands.

Replacing packages with local copies

One thing that I frequently do in Python is open the code of a dependency and edit it or add debug statements while developing my own code. If you use virtual environments, the Python tool for isolating dependency contexts, this is particularly easy, as it would affect only a single such environment. How would one go about doing this in Go? One could fiddle around with the code in the package cache, but this is not recommended practice, and it will break the hash validation. In fact, the source files of dependencies downloaded by go are not even editable on my computer. The supported way of doing this would be to use the replace feature of go.mod. One can tell the module system, through a line in the go.mod file, that a local directory should be used for satisfying a dependency instead of downloading it. Let's say that I checked out github.com/fatih/color locally to /home/ulas/code/color, made a couple of changes to it, and would like to make sure it works with our sample repo. I can tell go to use this local checkout with the following command:

go mod edit -replace=github.com/fatih/color=/home/ulas/code/color

This will add the following line to go.mod:

replace github.com/fatih/color => /home/ulas/code/color

One can of course add this line manually, instead of using a command. Now, when we build myprinter, the local code checkout will be used. This replacement can be removed either by removing the replace directive from go.mod, or with the following command:

go mod edit -dropreplace=github.com/fatih/color

Import paths and major versions

Go takes semantic versioning rather seriously. The idea behind the major version in semantic versioning is that it signifies backwards-incompatible changes. Go treats different major versions as different modules; you can import different major versions of a module, use them side by side in the same package, and have multiple references to different major versions in go.mod. This is called semantic import versioning. In order to demonstrate this, I have forked github.com/syohex/gowsay, turned it into a library instead of an executable, and released two versions of it. Version v1.0.0 is pretty straightforward: gowsay.MakeCow accepts a string to wrap and an options struct. Version v2.0.2 (I had to bump the version a couple of times because I didn't get things right) improves the interface by exporting enumerations for the cow types and accepting one as an argument.

There are two things you have to pay attention to when writing a library for external use (or rather, two things I didn't pay attention to, which cost me time). The first is that the module name in go.mod should be the same as the path used to import it, i.e. the repository path. In the case of gowsay, the module name has to be github.com/afroisalreadyinu/gowsay. The second is that the version tag has to start with a v; otherwise Go will not recognize it as a valid version, and will simply use the latest state of the repository. Now let's use gowsay in our demo codebase, by modifying main.go to look like this:

package main

import (  
    "github.com/afroisalreadyinu/gowsay"
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    path := pathfinder.Find()
    message, err := gowsay.MakeCow(path, gowsay.Mooptions{})
    if err != nil {
        message = path
    }
    color.Blue(message)
}

We see an example of error handling the Go way here; gowsay.MakeCow has multiple return values, with the second one being an error. If this error is not nil, we print only the path, and not the cow-wrapped path. If you now do a go install, you should see the following new line in the require section of go.mod:

github.com/afroisalreadyinu/gowsay v1.0.0

Although there are two versions, Go automatically picks version v1.0.0, and not the highest version. Conceptually, the basic module path github.com/afroisalreadyinu/gowsay always refers to version 1.

Updating major versions

What if we want to use gowsay version 2? The solution the designers of Go came up with is to build the major version into the import path. That is, if we import gowsay as github.com/afroisalreadyinu/gowsay/v2, any following command such as go install myprinter will download and compile version v2.0.2. A subtle but important point when changing the major version of a library you are working on is that you have to change the module name in go.mod. For version v1.0.0 of gowsay, for example, the first line of go.mod is simply the following:

module github.com/afroisalreadyinu/gowsay

When we tag and release the next major release, we have to change this line to the following:

module github.com/afroisalreadyinu/gowsay/v2

Otherwise, go will complain with a message similar to the following:

go get github.com/afroisalreadyinu/gowsay@v2.0.2:
github.com/afroisalreadyinu/gowsay@v2.0.2: invalid version: module contains a
go.mod file, so major version must be compatible: should be v0 or v1, not v2

Another way to update the go.mod file and use the next major version of a dependency is to use go get with a higher version. But you have to be careful here: if you simply run go get github.com/afroisalreadyinu/gowsay@v2.0.2, you will get the same error message as above. The reason is that github.com/afroisalreadyinu/gowsay refers to major version 1. Go will fetch version v2.0.2, see that it contains a go.mod file, and reject it, since a module path without the /v2 suffix may only carry v0 or v1 versions. The suffix has to be part of the path here as well, as in go get github.com/afroisalreadyinu/gowsay/v2@v2.0.2.
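The rule behind that error can be stated compactly: a module at major version v2 or higher must declare, and be imported under, its base path plus /vN, while v0 and v1 use the base path. Here is a small sketch of that rule; modulePathFor is a hypothetical helper, and it ignores special cases such as gopkg.in paths, which encode the major version as .vN.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modulePathFor returns the module path a dependency must declare (and be
// imported under) for a given semantic version: v0 and v1 use the base
// path, v2 and above append "/vN".
func modulePathFor(base, version string) (string, error) {
	if !strings.HasPrefix(version, "v") {
		return "", fmt.Errorf("version %q lacks the v prefix", version)
	}
	majorStr := strings.SplitN(version[1:], ".", 2)[0]
	major, err := strconv.Atoi(majorStr)
	if err != nil || major < 0 {
		return "", fmt.Errorf("version %q has no valid major number", version)
	}
	if major < 2 {
		return base, nil
	}
	return base + "/v" + strconv.Itoa(major), nil
}

func main() {
	for _, v := range []string{"v1.0.0", "v2.0.2"} {
		path, err := modulePathFor("github.com/afroisalreadyinu/gowsay", v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> go get %s@%s\n", v, path, v)
	}
}
```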

Once you import version 2.0.2 of gowsay, you can use the new interface, referring to the package by the same name as before, since the package name is still gowsay; the /v2 suffix only appears in the import path:

package main

import (  
    "github.com/afroisalreadyinu/gowsay/v2"
    "github.com/fatih/color"
    "myprinter/pathfinder"
)

func main() {  
    path := pathfinder.Find()
    message, err := gowsay.MakeCow(path, gowsay.BeavisZen, gowsay.Mooptions{})
    if err != nil {
        message = path
    }
    color.Blue(message)
}

It wouldn't be necessary with our silly gowsay library, but if you ever feel the need to refer to different major versions of a library in the same Go file, you can definitely do that. One of the imports, however, has to be given an alias, so that the names do not clash, as in the following example:

import (  
    "github.com/afroisalreadyinu/gowsay"
    gowsayTwo "github.com/afroisalreadyinu/gowsay/v2"
)

The Go module system has a number of other features that we will not go into here, such as vendoring, where the code of dependencies is stored in the repository. The best place to read up on these is the Modules page of the Go wiki on Github, which is exhaustive as far as I can judge. I would highly recommend at least skimming through that page, in order to have an idea of the tools that are available, and get a glimpse of the versatility Go offers.

Testing

Testing is an integral part of Go, as one would expect of a language of our times. Beyond built-in support at the language and library level for automated testing, there are multiple tools for putting tests to use in various ways. A good starting point for testing in Go is the output of go help test. We can demonstrate Go's testing facilities by adding a relatively useless test to our myprinter module.

In terms of where to put the test code, our options would be either a separate directory, which would make matching code to tests very difficult, or test files next to the code they exercise. The latter would put us in a difficult situation, since test code would have to be in the same package as the functional code due to the one-package-per-directory rule. However, Go provides a built-in exception for tests. Any file that matches the pattern *_test.go is considered a test file. These files are excluded when normal application code is built. When you run go test, however, test files are compiled and linked against the application code. Test files can also declare the package name pkgname_test, where pkgname is the package name of the application code; such an external test package can only use the exported identifiers of the package under test. We can demonstrate this by putting the following into a file named pathfinder_test.go in the pathfinder directory of our mini-project:

package pathfinder_test

import "testing"

func TestFind(t *testing.T) {  
    t.Fail()
}

If you now switch to the pathfinder subdirectory and run go test, you should see a report like the following.

--- FAIL: TestFind (0.00s)
FAIL
exit status 1
FAIL    myprinter/pathfinder    0.002s

As with many other commands, go test will use the package in the current directory if no argument is supplied. If we wanted to run the failing test from the base directory, we would need to call the command as go test myprinter/pathfinder. What if we want to run all the tests in a project? One might expect go test myprinter to work, but that refers only to the myprinter base package; the way one can refer to all subpackages of a package is by using the ellipsis, as in go test myprinter/....

There is an interesting feature of the go test runner: if you run the same passing tests again without modifying the relevant code, the tests will not actually be run; instead, you will see (cached) in the output next to the package name. This is a great feature that lets you run all the tests of a module without unnecessary overhead, but in case you want to bypass the cache, you can force the tests to run with the -count option, as in go test myprinter/... -count 1. This option sets the exact number of times each test is run.

A test that fails without a decent output is of course quite useless; we need assertions that provide more information. Go doesn't come with an assertions library, interestingly, but there are excellent third party alternatives. One widely used open-source package is github.com/stretchr/testify/assert. This library has many useful tools for writing better tests; you should definitely have a look at the readme. We can improve our test by asserting that pathfinder.Find does not return an empty string, which might be the case if the underlying filepath.Abs call fails:

package pathfinder_test

import (  
    "github.com/stretchr/testify/assert"
    "myprinter/pathfinder"
    "testing"
)

func TestFind(t *testing.T) {  
    assert.NotEqual(t, "", pathfinder.Find())
}

Useful test options

The go test command has quite a few tricks up its sleeve to help you get the most out of automated tests. The flags of go test are documented under go help testflag; don't be surprised if you can't find them under go help test. Among these, the -count flag was already mentioned; it is very useful when you are trying to debug intermittently failing tests. If you want to run only a single test, you can use the -run option, which accepts a regular expression and runs only the tests matching it. When you are running multiple tests, the test run continues even if some tests fail. You can override this behavior, and have the run stop at the first failing test, by supplying the -failfast flag.

Coverage analysis is built into the test tool; you can enable it with the -cover flag. Go's coverage tooling is quite intricate and versatile; you can read the details in this blog post from the time of its release. Using only the -cover option will make go test print the percentage of statements covered in each package under test. If you want a detailed analysis of which lines were covered, you have to use the -coverprofile option to provide a filename in which the coverage profile will be saved. For example, we can do a coverage analysis of our pet project with the following command:

go test ./... -coverprofile=cover.out

The resulting cover.out is a text file that can be turned into a nice HTML page using the command go tool cover -html=cover.out. Running this command will open a browser window with a colorful display: covered lines are green, whereas lines that were not executed are red. You can also record how many times each line was executed by running the tests with the -covermode=count option. When this option is used, the intensity of the green changes depending on how many times a line was executed, and you can see the exact count by hovering over a line. The default value of the covermode option is set, which records only whether a line was run at all. The third and last option is atomic, which works like count but stays correct in concurrent code; we will deal with it in the second part of this tutorial.
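The profile itself is plain text: a mode: line followed by one line per code block, in the form file:startLine.startCol,endLine.endCol numStatements count. The command go tool cover -func=cover.out already summarizes it per function; as a sketch of where the -cover percentage comes from (statements in executed blocks divided by all statements), here is a small parser. coveragePercent is a hypothetical helper, the profile is made up, and unlike the real tooling it does not merge duplicate entries for the same block.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// coveragePercent returns the percentage of statements executed at least
// once, given the contents of a profile written by go test -coverprofile.
// Each block line ends with "<numStatements> <count>"; the file position
// in front of them is ignored here.
func coveragePercent(profile string) (float64, error) {
	var total, covered int
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "mode:") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 3 {
			return 0, fmt.Errorf("malformed profile line %q", line)
		}
		// Take the last two fields so file names with spaces still work.
		stmts, err := strconv.Atoi(fields[len(fields)-2])
		if err != nil {
			return 0, err
		}
		count, err := strconv.Atoi(fields[len(fields)-1])
		if err != nil {
			return 0, err
		}
		total += stmts
		if count > 0 {
			covered += stmts
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if total == 0 {
		return 0, nil
	}
	return 100 * float64(covered) / float64(total), nil
}

func main() {
	// A made-up profile: one block of 2 statements run, one of 1 skipped.
	profile := "mode: set\n" +
		"myprinter/pathfinder/pathfinder.go:8.20,11.2 2 1\n" +
		"myprinter/pathfinder/pathfinder.go:13.2,13.12 1 0\n"
	p, err := coveragePercent(profile)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("coverage: %.1f%% of statements\n", p)
}
```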

You might wonder how tests are run, considering that Go is a compiled language and all code that runs must be packed into an executable. This is what Go does behind the curtains; tests are compiled in a per-package manner into executables in temporary directories and executed. You can achieve the same thing with the -c flag; in our module, if you switch to the pathfinder subdirectory and run go test -c, you should end up with an executable named pathfinder.test. This is not only useful, but pretty much necessary if you want to use a debugger (more on these later) to debug your tests.

One last useful option to go test worth mentioning here is -race, which enables the built-in race detector. We will look at the concurrency features of Go in the second part, and this option will be covered in more detail when the topic comes up.
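To give an idea of what -race is for, here is a small hypothetical example of the kind of code it checks. Several goroutines increment a shared counter, and a mutex keeps the increments safe. If you delete the Lock and Unlock calls, running the program with go run -race (or a test that calls the function with go test -race) should report a data race. Without the flag, the bug would most likely go unnoticed and only show up as occasional wrong counts.

```go
package main

import (
	"fmt"
	"sync"
)

// countConcurrently starts n goroutines that each increment a shared
// counter once. The mutex serializes the increments; removing it turns
// counter++ into a data race that the race detector flags.
func countConcurrently(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait() // wait until every goroutine has finished
	return counter
}

func main() {
	fmt.Println(countConcurrently(100)) // prints 100
}
```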

There are two more areas handled by Go's testing package: benchmarking and example code. We will not go into the details of these here, but keep in mind that there is extensive support for both in the standard tools, so you don't need to roll your own.

Further Go tools

It is possible to speak of three levels of Go tools: those that are first-order subcommands of the go command, those that are available under go tool, and those that need to be installed with go get. We have already dealt with the first group (fmt, build, list and so on) above. The second group comprises a set of tools aimed at more fine-grained compilation, analysis and debugging of Go programs; you can get a list by running go tool. Covering all of these subcommands is beyond the scope of this tutorial, but you can have a quick look at the documentation for a command CMD with go doc cmd/CMD. Most of them are relevant for more involved work with the Go compiler and the language. We saw one of them, go tool cover, which converts coverage profiles to HTML. Another important go tool subcommand is pprof, which is used for analyzing and displaying profiling output.
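Profiling output for go tool pprof has to come from somewhere. For tests, go test -cpuprofile cpu.out writes it. For a regular program, the runtime/pprof package can write a CPU profile directly. The following is a minimal sketch that uses a hypothetical busyWork function as the code being profiled and writes the profile to cpu.prof.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork is a stand-in for real work so the profile has something to show.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close() // runs after StopCPUProfile has flushed the profile

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	result := busyWork(50_000_000)
	pprof.StopCPUProfile()

	fmt.Println("result:", result)
}
```

After running it, go tool pprof cpu.prof opens an interactive prompt where top lists the functions that used the most CPU time.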

Viewing documentation in the browser with godoc

Among the tools for working with Go code that have to be installed separately, a couple are very useful for daily work. The first of these is godoc, not to be confused with go doc. Whereas go doc prints documentation on the command line, godoc runs a web server with documentation for all the packages found in the standard library and installed modules. After installing it with go get golang.org/x/tools/cmd/godoc, you can start it with $GOBIN/godoc -http=:6060, the argument specifying the address to listen on. If you now go to http://localhost:6060/ on your computer, you should see a web page that looks very similar to the official Go documentation. This is way better than trying to figure out the right path to a symbol on the command line. Another useful feature of godoc is the -index flag, which makes the documentation searchable: when godoc is started with this flag, a search box is available at the top right of the page.

goimports

One tool I find very useful is goimports, which makes it much easier to work with Go import statements. Because unused imports are an error in Go, one frequently needs to add and remove imports as one tries things out. Particularly annoying is adding a print statement with fmt.Printf, having the compiler tell you that you need to import fmt, removing the statement after resolving the issue, and then having the compiler tell you that you now have an unused import. goimports solves this issue by automatically adding missing imports and removing unused ones. After installing it with go get golang.org/x/tools/cmd/goimports, you can use it as a gofmt replacement, since it formats code just like gofmt in addition to taking care of the imports. In case you are using Emacs, integrating it with the default Go mode is as simple as setting it as the formatter with (setq gofmt-command "path/to/goimports").

One remarkable thing here is that a tool like goimports is possible because the language is so simple and strict. In Python, for example, in order to figure out what a file imports, you pretty much have to execute it, as an import statement can appear anywhere. It is actually common practice to import inside a function to work around circular imports, which Go prohibits altogether. In Go, imports are allowed only at the top of a file, directly after the package clause; that's why one can automate handling them, or build dependency graphs and analyses on top of them. In other words, it is the thinking that went into the design of Go that makes tools like these possible.
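The standard library shows how cheap this is in practice. Because imports can only appear at the top of a file, the go/parser package has an ImportsOnly mode that stops parsing right after the import declarations. The following minimal sketch lists the imports of a Go source string without looking at the rest of the file. It is not how goimports itself is implemented; it only illustrates the point made above.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsOf returns the import paths of a Go source file. With
// parser.ImportsOnly the parser stops after the import declarations,
// which works because Go allows imports only at the top of a file.
func importsOf(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// spec.Path.Value is the quoted literal, e.g. "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := `package demo

import (
	"fmt"
	str "strings"
)

func Hello() { fmt.Println(str.ToUpper("hi")) }
`
	paths, err := importsOf(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // prints [fmt strings]
}
```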

errcheck

We saw an example of error handling in Go above: a function can return multiple values, one of which can be an error. The calling code is responsible for checking this error value and handling it accordingly. What frequently happens, however, is that the error gets ignored. Either the return values are not read at all, because the function is called as a plain statement (f.Close() is a classic example), or the error is assigned to the blank identifier _. Although this might make sense in some contexts, you want to avoid it as much as possible in production code. errcheck is a Go tool for detecting unhandled error return values. You can install it with go get github.com/kisielk/errcheck, and call it in the same manner as other Go tools, e.g. with errcheck ./... at the base of a module to check all packages. By default, errcheck reports only the cases where an error return value is discarded entirely; passing the -blank flag makes it also report cases where errors are assigned to the blank identifier.
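The following small, hypothetical program puts the cases side by side. The comments describe what errcheck should report with and without -blank. The parsePort helper is made up for illustration and shows the handled case. The exact output of errcheck depends on its version and its built-in exclude list, so treat the comments as a rough guide.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parsePort checks and wraps the error from strconv.Atoi, so errcheck has
// nothing to report here.
func parsePort(s string) (int, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return p, nil
}

func main() {
	// Reported by errcheck: the error returned by os.Remove is dropped entirely.
	os.Remove("scratch.tmp")

	// Reported only with errcheck -blank: the error is assigned to _.
	n, _ := strconv.Atoi("not a number")
	fmt.Println(n)

	// Not reported: the error is checked.
	if _, err := parsePort("80a"); err != nil {
		fmt.Println(err)
	}
}
```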

Debugging

The last topic we will touch upon is debugging Go programs. A considerable subset of developers shun debuggers, especially for compiled languages like Go or C, but mastering a debugger definitely pays off in reduced debugging time, even if you count only the time saved on adding print statements and recompiling. Go does not include a built-in debugger; instead, it emits debug symbols and provides lightweight support for GDB, the GNU debugger. Since debug symbols are included by default when building a Go executable, you can start debugging our toy module with gdb $GOBIN/myprinter once you have installed it. You will get a curious message when GDB starts: it will either tell you that a file named runtime-gdb.py has failed to load due to a configuration error, or that it has been loaded. This file, the only Python file in the Go source repository, is a GDB extension responsible for integrating Go types and concepts (such as goroutines) with GDB. If it could not be loaded, you can follow the directions in the initial output of GDB to enable it.

I will not go into the details of using GDB with Go; you can read up on it in the official Go documentation. You will notice, however, that even this official page recommends the third-party alternative Delve instead of GDB. Delve, a Go debugger written in Go, is in fact much easier to use and more complete, as it understands Go's runtime and data structures natively and works hand in hand with the go tool. First, install it with go get github.com/go-delve/delve/cmd/dlv. To debug a Go executable, simply navigate to the main package directory (in our toy module, the base directory) and run $GOBIN/dlv debug. You can also debug your tests by switching into the appropriate directory and running dlv test.

Once the debugger is started, there are a number of commands available. Most frequently used commands have two forms: a full name and a short alias. The most useful commands and their aliases are break (b) to set a breakpoint, continue (c) to continue execution until a breakpoint or termination, next (n) to execute one source line, print (p) to print the value of a variable, and list (ls or l) to show code.

When you start Delve with dlv debug, you land in the initialization of the executable, so list will show you some Go runtime startup code. You can get to the beginning of your program by setting a breakpoint there with b main.main, and continuing until the breakpoint with c. Delve will run the code until the start of the main function and print the surrounding code context. When debugging our toy module, you could for example enter n twice after the beginning of main.main is reached, landing at the line after path is set, and then print the value of this variable with p path. This is a very simple example of what Delve can do; I would highly recommend reading the getting started guide, and going through the commands listed when you enter help at the Delve console.

Both GDB and Delve are CLI debuggers. If you prefer visual debuggers, you can use one of the popular IDEs with Go integration, such as VS Code or GoLand from JetBrains. Unfortunately, I'm not familiar with either of these, but a quick Google search shows that both can be used to debug Go code.

In the next episode

In the first part of this tutorial, we looked at the reasons Go was developed, the fundamental ideas behind its design, and the tooling packed mostly into the go command, along with some third-party packages. You should now be ready to write, test, combine, debug and package Go code using these standard tools. In the next part of this tutorial, we will go into more detail on what kind of code to actually write.

Resources

Go at Google: Language Design in the Service of Software Engineering provides the reasoning behind the design of Go, the trade-offs and explicit non-goals. It's a great resource for understanding which problems the designers wanted to solve, and why they left certain things out. A similar, but less systematic text is Less is exponentially more, which details the very early driving guidelines and design decisions that went into Go.
The design of Go dependency management was discussed over a number of blog posts which are linked to on this page; I would definitely recommend [??]. Once the design was finalized and implemented, it was announced over a couple of posts on the official Go blog; the first post has links to the further ones. These posts discuss practical aspects of working with modules and dependency management. If you want to read the intricate details and questions, these are discussed extensively on the wiki of the Go project on GitHub.
Go tooling in action is an excellent screencast on developing and improving Go code with standard and a couple of third-party tools, especially improving performance using pprof, go-torch and go-wrk. It is a nice display of how fast Go code, especially web services, can be made to perform.
An Overview of Go's Tooling is an excellent tutorial that covers a lot of similar ground to this post. I actually picked up quite a few tricks from it. It also covers a couple of topics such as compiler options and benchmarking that are not covered here.
Go's Tooling is an Undervalued Technology is an enthusiastic look at a couple of aspects of Go tooling. It covers topics skipped in this post, such as vendoring.

Why Thinkpad X220 is the best laptop ever made

Ulaş Türkmen — Sun, 10 May 2020 19:56:05 GMT

I'm rather certain that you will come to the same conclusion, esteemed reader, as I did, namely that the Thinkpad X220 is the best laptop ever made, in consideration of the following points.

1 - It cost me 250 Euros

You will have to accept that that's quite a bargain, considering that a new laptop costs at least twice as much, and if you want something with a decent brand and maybe even a metal chassis, at least four times as much. Yes, I bought it used, but it has worked flawlessly for more than a year, it has what feels like a new keyboard, and with the addition of a 128 GB SSD that cost 20 bucks (how that is possible, I don't know), it actually freaking does stuff like run Emacs and a browser without melting down.

2 - It has the best keyboard in the world

Have you ever typed on an X220 keyboard? If not, you should. This thing is heaven on earth. I stopped typing on my ergonomic mechanical clickity-clack keyboard just because of this thing. It has the je ne sais quoi of an old Thinkpad keyboard: robust, light, and I will have to admit, not clickity-clacky, but at least tippity-tap, in a rather pleasing way. It has a hardware volume up-down and mute button at the top, coupled with a blue, weird-looking ThinkVantage button that I mapped to Emacs. I'm pretty sure the designers intended it that way. If that is not enough, there is a big-ass escape key at the top left. This key is, like, the biggest escape I have ever seen, without exaggeration. Seriously. It's that big.

3 - You can't watch videos on it

Twitter videos don't work at all, and any other video is way too taxing for the hardware. This is a feature. It's good. You won't get distracted.

4 - You can listen to MP3s on it

Now is the time to ditch Spotify, that hog of memory and CPU, and go back to the late-90s vibe of MP3s. You can actually play music from disk without your computer heating up to the temperature of the sun's surface, would you believe that? Just throw in some winampy goodness with Audacious and copy over a couple of gigabytes of pirated music from your college years (which should fall under the statute of limitations, right?), and you're good to go. The loudspeakers are crap, and there is no Bluetooth (again, a feature!), but that's what the expensive headphones you bought two years before the AirPods came out are for.

5 - There is a tiny flashlight on the lid

It comes with its own tiny, cute flashlight to (I think) illuminate the keyboard, although it illuminates the screen more. You can turn it on and off with a single keyboard shortcut, and it's the sweetest thing ever. When I'm bored, instead of watching videos, I turn the flashlight on and off. Much more entertaining.

6 - It has differently shaped ports

These days, if you get a new laptop, on the one side there are 2 of a certain port, and on the other side there are 3 of the same. Not on this bad mofo. It actually has 8 ports that all look different. USB, HDMI, audio, VGA, you name it. I can't make any promises in terms of your adapter requirements, but damn it feels good to have a laptop with an actual CAT5 port (I hope that's what those creepy internet-from-telephone sockets are called). And what is that switch under the big-ass emptiness that looks like a 1980s disk drive? An actual physical flight-mode switch? Sweet!

Conclusion

Seriously though, this computer is the perfect compromise between a work device and a distraction monster. The keyboard is great, and the form factor is optimal for sticking it under your arm and setting off. Emacs and all kinds of compilers and runtimes work perfectly well on it, but various other distracting things like web video and news sites don't. If you want one, hit me up, I know a very good dealer.

Discovering AWS with the CLI Part 2: ECS and Fargate

Ulaş Türkmen — Fri, 25 Oct 2019 09:13:58 GMT

In the first part of this tutorial, we looked at provisioning AWS EC2 resources using the CLI client, and delved into the details of how various networking components function. In this second part, we will look at using containers instead of virtual machines to deploy applications. In recent years, containers have become the predominant form of delivering server-side software, due to their versatility and limited resource use. Docker in particular has made it possible to package services and online applications so that they can be distributed from a central repository, and replicated with very little effort. ECS (Elastic Container Service) is AWS's entry into the container orchestration space, where other alternatives are Kubernetes, Mesos and the like. There are two different ways to use ECS: the old way, where you have to provision the computing resources manually, and the new way, where AWS is responsible for running the infrastructure. We will use the latter method, which is named Fargate.

As in the first part of the tutorial, we will be using the AWS CLI; you should install it and set up the necessary credentials and environment variables using the tips from the first post. In order to build the container images that will be deployed on Fargate, you will need Docker; it can be installed by following the standard installation instructions. The necessary files for the container images and the demo applications are in this sample repo. Finally, as with the first part, you can find all the commands in this tutorial in a bash script in the same repository.

Organization of Fargate

As mentioned above, Fargate is a launch type, i.e. a method of deploying containers on ECS, the Elastic Container Service. In ECS, applications are deployed as tasks, which are collections of containers working together, similar to pods in Kubernetes. Tasks run on clusters, groups of container and networking infrastructure that can span multiple AZs in a region (see here for a diagram of how ECS clusters are organized). A set of tasks that are scheduled according to a scaling strategy, and among which load is distributed, is a service. There are two launch types on ECS. Fargate, the one we will use, offloads the management of computational resources to AWS, and leaves only the work of defining tasks and services, in addition to networking, to the user. The EC2 launch type, on the other hand, requires the user to create and manage the VMs on which the containers run.

An important component of ECS is the container agent. This agent is installed on the EC2 instances on which the tasks run, and is responsible for pulling, running and stopping the containers. When using the EC2 launch type, it's the user's duty to install and run the agent, but Fargate automates this task. You nevertheless need to be aware that this agent is doing work for you in the background, as we will see later.

Preliminary commands

There are two things we need to take care of at the start. The first is picking a region. Many commands, such as the creation of subnets or VPC endpoints, require the explicit specification of a region, which we would like to simplify by putting it into a variable, as in REGION=eu-central-1. The second preliminary is a bit more complicated. ECS services can only be tagged if they use a newer, longer ARN format, and for some reason you need to opt in to this format explicitly. You can opt in either using the web console (the Account Settings tab in the ECS service view), or by running the following command:

aws ecs put-account-setting-default --name serviceLongArnFormat --value enabled

Warning: This will set the default for all IAM users on the account. If you don't want this, you should change the setting on the web console for a specific IAM user instead, and use the API keys of that user in the rest of the tutorial.

Creating a repository

Containers are distributed by building an image, and uploading it to a container repository from which it can be downloaded. Repositories on AWS are provided by ECR, the Elastic Container Registry (not Repository, since a registry is a collection of repositories). Each AWS account has a single registry per region, which can house many repositories; you can't delete this registry, or add any new registries. If you want to push images for a service, you need to create a repository for it. Let's go ahead and create a repository for the static-app in the sample code repo (which is just Nginx with an index page, but I named it app for some reason, and now it's too late to change):

aws ecr create-repository --repository-name static-app \
  --tags Key=Environment,Value=Demo

STATICAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names static-app \
  --query "repositories[0].repositoryUri" --output text)

As you can see, we are sticking to the habit of setting the Environment tag to Demo for all our resources, as in the previous installment. The CLI also lets you log into the repository you just created without a complicated process, using the ecr get-login subcommand (available in version 1 of the AWS CLI; version 2 replaced it with get-login-password). The output of this command is itself a command that logs your docker client into the registry. You can avoid the extra copy-paste by executing the output of this command directly, as follows:

$(aws ecr get-login --region $REGION --no-include-email)

Be mindful of the --no-include-email option, as without it the returned command includes an email flag that recent Docker versions no longer accept. Now it's time to build an image and push it to this repository. The static-app directory contains a Dockerfile that you can use to create the image. Once you have checked out the sample repository, run the following commands from its root directory:

docker build -t $STATICAPPREPOURL:0.1 static-app/
docker push $STATICAPPREPOURL:0.1

We now need to deploy this image on Fargate. The first resource we need to create is a cluster. We will use the name demo-cluster for our cluster:

aws ecs create-cluster --cluster-name demo-cluster --tags key=Environment,value=Demo

If you now run aws ecs list-clusters, it should show your brand new cluster as the only entry.

IAM role for the ECS agent

The ECS agent mentioned above needs to carry out certain operations in order to orchestrate the task containers. Among these are checking for images in the registry, downloading these images, and creating log streams and writing to them (see here for details). In order to give it the right permissions, we need to create the appropriate IAM role, and allow the ECS agent to assume this role. Let's first create a role named ecsTaskExecutionRole, giving the ECS agent the right to assume it:

ROLEARN=$(aws iam create-role --role-name ecsTaskExecutionRole \
  --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"ecs-tasks.amazonaws.com\"]},\"Action\":[\"sts:AssumeRole\"]}]}" \
  --query "Role.Arn" --output text)
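The escaped JSON string passed to --assume-role-policy-document is hard to read and easy to break. The following is a minimal sketch that builds the same trust policy from Go structs with encoding/json. It produces exactly the string used inline above. The type names are made up for illustration. The fields are exported, because encoding/json only sees exported fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement is one entry in an IAM policy document. The struct tags pin
// the exact key names IAM expects.
type statement struct {
	Effect    string              `json:"Effect"`
	Principal map[string][]string `json:"Principal"`
	Action    []string            `json:"Action"`
}

type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// trustPolicy builds the assume-role policy that lets the ECS tasks
// service principal assume the role.
func trustPolicy() (string, error) {
	doc := policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:    "Allow",
			Principal: map[string][]string{"Service": {"ecs-tasks.amazonaws.com"}},
			Action:    []string{"sts:AssumeRole"},
		}},
	}
	b, err := json.Marshal(doc)
	return string(b), err
}

func main() {
	p, err := trustPolicy()
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```

Writing the output to a file, say trust.json, lets you pass it as --assume-role-policy-document file://trust.json instead of an escaped inline string.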

We will later use this role's ARN in our task definition. We now need to attach the right policy to this role. Fortunately, we don't have to create the policy manually or attach the individual permissions one by one, since there is a policy managed by AWS that contains all of them. We will now get the ARN of this policy, named AmazonECSTaskExecutionRolePolicy, and attach it to the role we just created:

POLICYARN=$(aws iam list-policies \
  --query 'Policies[?PolicyName==`AmazonECSTaskExecutionRolePolicy`].{ARN:Arn}' \
  --output text)
aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn $POLICYARN

Registering a task definition

Having pushed a container image, created a cluster, and given the cluster agent the right permissions, what we need to do next is create a task definition. A task definition is a JSON file that specifies which containers have to be deployed together as a unit, and on which ports these containers are listening. Here is a template that will serve as the base of our task definition for the static-app container (the file static-app/task-definition.json.tmpl in the sample repository):

{
  "family": "static-app",
  "networkMode": "awsvpc",
  "executionRoleArn": "$ROLEARN",
  "containerDefinitions": [
    {
      "name": "static-app",
      "image": "$STATICAPPREPOURL:0.1",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}

In this template, you need to either replace $ROLEARN and $STATICAPPREPOURL with the actual values manually, or use this file as a template by first exporting the necessary values with export ROLEARN STATICAPPREPOURL on the command line, and then substituting them with envsubst < static-app/task-definition.json.tmpl > task-definition.json. Now we are ready to register the task definition with the following command:

TASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)
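If envsubst is not at hand, Go's os.Expand can do roughly the same substitution. It replaces $VAR and ${VAR} references through a lookup function, and unknown variables become empty strings. The following minimal sketch also checks that the result is still valid JSON before it is handed to the CLI. The template in main is a shortened stand-in for static-app/task-definition.json.tmpl, and the account ID and ARNs in it are made up. With the real file you would read it with os.ReadFile and pass os.Getenv as the lookup.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// renderTemplate substitutes $VAR and ${VAR} references using lookup,
// roughly like envsubst, and rejects output that is not valid JSON.
func renderTemplate(tmpl string, lookup func(string) string) (string, error) {
	out := os.Expand(tmpl, lookup)
	if !json.Valid([]byte(out)) {
		return "", errors.New("rendered template is not valid JSON")
	}
	return out, nil
}

func main() {
	// Hypothetical values; in practice use os.Getenv after
	// export ROLEARN STATICAPPREPOURL.
	vals := map[string]string{
		"ROLEARN":          "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
		"STATICAPPREPOURL": "123456789012.dkr.ecr.eu-central-1.amazonaws.com/static-app",
	}
	tmpl := `{"executionRoleArn": "$ROLEARN", "image": "$STATICAPPREPOURL:0.1"}`

	out, err := renderTemplate(tmpl, func(k string) string { return vals[k] })
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```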

A couple of things worth pointing out in this task definition:

The networkMode is awsvpc, which is an AWS-native implementation of container networking. awsvpc enables tasks to connect to the AWS networking infrastructure just like VMs, over an elastic network interface (ENI), with the ability to give them private IPs and DNS entries. When using Fargate, the networkMode has to be awsvpc.
containerPort and the hostPort have to match because we are using awsvpc; see the section Port mappings in this part of the documentation.
You can't use arbitrary values for cpu and memory. See here for the combinations of values that are allowed.
The family field is used to number the revisions of a task definition. When the task definition is first registered, it starts at revision 1. Every request to register a task definition with the same family increments this number by one, and the revision number can be used when a service is created or updated. This numbering is also the reason we save the new revision in a variable, so that we don't accidentally deploy old versions of our tasks.
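As a quick sanity check before registering a task definition, the cpu/memory pair can be validated locally. The following minimal sketch encodes the Fargate combinations for 0.25 to 4 vCPU as documented around the time of writing. CPU is given in units where 1024 is one vCPU, and memory in MiB. AWS has since added larger sizes that this sketch does not cover, so check the documentation linked above rather than trusting this table.

```go
package main

import "fmt"

// inSteps reports whether m lies in [lo, hi] and is a multiple of
// 1024 MiB, i.e. whole 1 GB steps.
func inSteps(m, lo, hi int) bool {
	return m >= lo && m <= hi && m%1024 == 0
}

// fargateSizeOK reports whether a task-level cpu/memory pair is one of the
// Fargate combinations for 0.25 to 4 vCPU.
func fargateSizeOK(cpu, memory int) bool {
	switch cpu {
	case 256: // 0.25 vCPU
		return memory == 512 || memory == 1024 || memory == 2048
	case 512: // 0.5 vCPU
		return inSteps(memory, 1024, 4096)
	case 1024: // 1 vCPU
		return inSteps(memory, 2048, 8192)
	case 2048: // 2 vCPU
		return inSteps(memory, 4096, 16384)
	case 4096: // 4 vCPU
		return inSteps(memory, 8192, 30720)
	}
	return false
}

func main() {
	// The values from our task definition, plus two invalid pairs.
	for _, c := range [][2]int{{256, 512}, {256, 4096}, {1024, 1024}} {
		fmt.Printf("cpu=%d memory=%d valid=%v\n", c[0], c[1], fargateSizeOK(c[0], c[1]))
	}
}
```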

Creating a service

A service is a group of tasks managed by the container orchestration system (in our case Fargate). The tasks sit behind a common interface, and incoming requests are distributed among them based on load and availability. Fargate, similar to other container orchestration systems, makes it easy to scale the number of tasks and dedicate resources to them. In order to turn our static-app into a service, we need to take the previously created task definition, specify how to scale it, and route requests to it.

If you thought we would be able to get around the networking stuff from the first post, I'm sorry to disappoint you. The first thing we need to deal with to create an online service is the networking infrastructure. Let's start with the VPC and its subnets:

VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo

# We will need this later when we deploy services with DNS to our VPC
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support

SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.1.0/24 \
  --availability-zone "${REGION}b" \
  --query "Subnet.SubnetId" --output text)
SUBNET2ID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.2.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
PRIVATESUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.3.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $SUBNET2ID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $PRIVATESUBNETID --tags Key=Environment,Value=Demo

Here, we are laying down the networking infrastructure for the rest of the tutorial, which is why we create three subnets. As we will see later, application load balancers require at least two subnets, hence the two public subnets. These two subnets also need to be in different availability zones for reliability. AZ names consist of the region name followed by a single letter, as explained in the AWS documentation on regions and AZs, which is why appending b or c to $REGION is enough to pick two different zones. The private subnet will be used to host an internal service. Now let's create an internet gateway, which we need for the communication between the services on the VPC and the rest of the Internet, as explained in the first part of this tutorial:

GATEWAYID=$(aws ec2 create-internet-gateway --query "InternetGateway.InternetGatewayId" \
  --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID --internet-gateway-id $GATEWAYID

Once we have the gateway, we need a route table that sends outbound traffic through it, associated with the two public subnets. We also need to allow ingress on port 80 through the VPC's default security group:

ROUTETABLEID=$(aws ec2 create-route-table --vpc-id $VPCID \
  --query "RouteTable.RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID
aws ec2 associate-route-table  --subnet-id $SUBNETID --route-table-id $ROUTETABLEID
aws ec2 associate-route-table  --subnet-id $SUBNET2ID --route-table-id $ROUTETABLEID
SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

Now that we have the necessary networking elements and security rules, we can go ahead and create our first service, based on the static-app task definition:

aws ecs create-service --cluster demo-cluster --service-name static-app-service \
  --task-definition static-app:$TASKREVISION --desired-count 1 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$SUBNETID],securityGroups=[$SECURITYGROUPID],assignPublicIp=\"ENABLED\"}"

aws ecs wait services-stable --cluster demo-cluster --services static-app-service

Let's go through some of the arguments:

The launch type is FARGATE, which we also specified as a required compatibility in the task definition.
The scheduling-strategy argument lets us specify how tasks are instantiated and maintained. The REPLICA strategy tells Fargate to keep desired-count (another argument) instances of the task running. We can increase or decrease this number as need be, and Fargate will take care of starting, stopping and (together with a load balancer, which we will see later) routing traffic to these tasks.
An important aspect of a container orchestration platform is how new containers are deployed. The deployment-controller and deployment-configuration arguments specify the deployment strategy. The ECS deployment controller performs rolling deployments. New tasks are started, and once they reach the running state, old ones are stopped after connections to them have been drained. The two numbers in deployment-configuration are percentages of the desired count: minimumHealthyPercent is the lower limit on the number of running tasks during a deployment, and maximumPercent is the upper limit. With 100 and 200, ECS never drops below the desired count and can start a full set of new tasks before stopping the old ones. Refer to the documentation for details.
The network configuration options, required for the awsvpc infrastructure, specify that the service should attach to one of the public subnets, run under the default security group, and receive a public IP.
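To make the two deployment percentages concrete, here is a minimal sketch that turns them into task counts. It follows the rounding described in the ECS documentation: the lower limit is rounded up and the upper limit is rounded down. The desired counts other than 1 are made-up examples.

```go
package main

import "fmt"

// deploymentBounds returns the lower and upper limits on running tasks
// during a rolling deployment, given the desired count and the
// minimumHealthyPercent / maximumPercent settings.
func deploymentBounds(desired, minHealthyPct, maxPct int) (lower, upper int) {
	lower = (desired*minHealthyPct + 99) / 100 // integer ceiling
	upper = desired * maxPct / 100             // integer floor
	return lower, upper
}

func main() {
	cases := []struct{ desired, minPct, maxPct int }{
		{1, 100, 200}, // the values used in create-service above
		{4, 50, 200},
		{3, 50, 150},
	}
	for _, c := range cases {
		lo, hi := deploymentBounds(c.desired, c.minPct, c.maxPct)
		fmt.Printf("desired=%d min=%d%% max=%d%% -> between %d and %d running tasks\n",
			c.desired, c.minPct, c.maxPct, lo, hi)
	}
}
```

For our single-task service, this means ECS may briefly run two tasks during a deployment, but never zero.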

Once the service is created, we use the wait command, which we previously used to wait for an EC2 instance (albeit with ec2 wait instead of ecs wait), to wait for the service to be stable, i.e. for the number of running tasks to be equal to the number of desired tasks. Once this command returns, we can fetch the IP address of the service task with the following command:

aws ec2 describe-network-interfaces --filters "Name=subnet-id,Values=$SUBNETID" \
  --query 'NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp' --output text
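The --query expression walks the JSON that describe-network-interfaces returns. It takes the first interface in the subnet, the first private address on that interface, and the public IP associated with that address. The following minimal sketch does the same in Go on the output of the command run with --output json and without --query. It declares only the fields it needs, and the sample response uses made-up addresses. Unlike the [0] indexes, it collects the public IPs of all interfaces. The indexes only work as long as the task's ENI is the only interface in the subnet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the fields needed here; the real response has many more. Field
// names are exported so that encoding/json can fill them in.
type association struct {
	PublicIp string
}

type privateIPAddress struct {
	PrivateIpAddress string
	Association      *association // nil when no public IP is attached
}

type networkInterface struct {
	PrivateIpAddresses []privateIPAddress
}

type describeNetworkInterfacesOutput struct {
	NetworkInterfaces []networkInterface
}

// publicIPs returns every public IP associated with the interfaces' private addresses.
func publicIPs(raw []byte) ([]string, error) {
	var out describeNetworkInterfacesOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	var ips []string
	for _, ni := range out.NetworkInterfaces {
		for _, addr := range ni.PrivateIpAddresses {
			if addr.Association != nil {
				ips = append(ips, addr.Association.PublicIp)
			}
		}
	}
	return ips, nil
}

func main() {
	sample := []byte(`{"NetworkInterfaces": [{"PrivateIpAddresses": [
		{"PrivateIpAddress": "10.0.1.25", "Association": {"PublicIp": "203.0.113.10"}}]}]}`)
	ips, err := publicIPs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // prints [203.0.113.10]
}
```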

You should now be able to access this task at the resulting IP address. We can't call it a day yet, however. The way we are using ECS is suboptimal for a number of reasons. Because each task gets a separate IP address, clients need to know which task has which IP in order to make a request (assuming that our service does something useful, of course). Load balancing between multiple tasks of a service will be difficult, as the clients need to keep track of the IPs of the tasks. There is also a clear security risk, as all tasks would have public interfaces. We will address these issues in the next section.

Microservices on Fargate

What we want to achieve in this section is being able to use Fargate as a microservices platform. This involves the following features that are missing from our primitive, one-public-IP-per-task setup:

Ingress configuration: Based on the request path, we want to be able to route requests to different services.
Load balancing: Both for public and private services, we want to distribute requests between the tasks in a manner independent of the client.
Service discovery: We want services to find each other by name through internal DNS.

Ingress and load balancing with ELB

Thanks to awsvpc networking, it is very easy to connect an ELB instance to a subnet, and assign task containers to it. The kind of load balancer we will use is called an application load balancer (ALB), which allows only HTTP and HTTPS traffic. Let's first scale down our static-app service to zero tasks and delete it, as it is too basic for this demonstration:

aws ecs update-service --service static-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service static-app-service --cluster demo-cluster
aws ecs wait services-inactive --service static-app-service --cluster demo-cluster

An ALB is configured through three entities: Load balancer, target group and listener. The load balancer is the point of contact for the clients, and the target group gathers the target units (in our case tasks) that receive the requests. Listeners connect these two to each other, and are used to specify which conditions are used to route requests to which target groups. Now let's create these:

LBARN=$(aws elbv2 create-load-balancer --name demo-balancer \
  --type application --subnets $SUBNETID $SUBNET2ID --security-groups $SECURITYGROUPID \
  --tags Key=Environment,Value=Demo \
  --query "LoadBalancers[0].LoadBalancerArn" --output text)

TGARN=$(aws elbv2 create-target-group --name hostname-app-tg \
  --protocol HTTP --port 80 --target-type ip --vpc-id $VPCID \
  --query "TargetGroups[0].TargetGroupArn" --output text)

aws elbv2 add-tags --resource-arns $TGARN --tags Key=Environment,Value=Demo

LISTENERARN=$(aws elbv2 create-listener --load-balancer-arn $LBARN --protocol HTTP \
  --port 80 --default-actions Type=forward,TargetGroupArn=$TGARN \
  --query "Listeners[0].ListenerArn" --output text)

We are not adding tags to the listener, as this is not supported. As already mentioned, load balancers require at least two subnets from different zones on creation, for reasons of reliability; we are using the two subnets we created in different AZs here. The target group we create is empty, and will be populated later by a new service. We will be using a different service for demo purposes in this section; you can find it in the samples repo. This service is called hostname-app because it displays the value of the HOSTNAME environment variable; we will see why this is relevant later. Another thing we will need is a security group for internal services through which we can control traffic between various parts and the internet. We will allow traffic between this security group and any interfaces on the VPC network:

PRIVATESECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name private-security-group --description "Private SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16
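Conceptually, each of these rules is a simple predicate: the protocol must be TCP, the port must be in 0-65535, and the peer address must fall inside 10.0.0.0/16. A minimal Go model of that check follows; the sgRule type is hypothetical, and it ignores the fact that real security groups are stateful and are evaluated by AWS, not by your code.

```go
package main

import (
	"fmt"
	"net"
)

// sgRule is a hypothetical model of a single CIDR-based security group rule.
type sgRule struct {
	Protocol string
	FromPort int
	ToPort   int
	CIDR     string
}

// allows reports whether traffic with the given protocol, port and peer IP
// matches the rule.
func (r sgRule) allows(protocol string, port int, peer string) (bool, error) {
	_, network, err := net.ParseCIDR(r.CIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(peer)
	if ip == nil {
		return false, fmt.Errorf("invalid IP %q", peer)
	}
	return protocol == r.Protocol && port >= r.FromPort && port <= r.ToPort && network.Contains(ip), nil
}

func main() {
	// Mirrors the private security group rules created above.
	private := sgRule{Protocol: "tcp", FromPort: 0, ToPort: 65535, CIDR: "10.0.0.0/16"}
	for _, peer := range []string{"10.0.1.37", "203.0.113.5"} {
		ok, err := private.allows("tcp", 8080, peer)
		fmt.Println(peer, ok, err)
	}
}
```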

Finally, we need to create a new container repository for this service, push an image, and create a task definition:

aws ecr create-repository --repository-name hostname-app \
  --tags Key=Environment,Value=Demo

HOSTNAMEAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names hostname-app \
  --query "repositories[0].repositoryUri" --output text)

docker build -t $HOSTNAMEAPPREPOURL:0.1 hostname-app/
docker push $HOSTNAMEAPPREPOURL:0.1
export ROLEARN HOSTNAMEAPPREPOURL
envsubst < hostname-app/task-definition.json.tmpl > task-definition.json

HNTASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)
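envsubst replaces ${VAR} references in the template with the values of the exported environment variables, which is why ROLEARN and HOSTNAMEAPPREPOURL are exported before the call. If you would rather render the template from Go, os.Expand does much the same job; the template string below is a made-up fragment, not the actual contents of hostname-app/task-definition.json.tmpl.

```go
package main

import (
	"fmt"
	"os"
)

// renderTemplate substitutes ${VAR} and $VAR references with values from
// lookup, similar to what envsubst does with the process environment.
// Unknown variables expand to whatever lookup returns (os.Getenv: "").
func renderTemplate(tmpl string, lookup func(string) string) string {
	return os.Expand(tmpl, lookup)
}

func main() {
	// Hypothetical fragment of a task definition template.
	tmpl := `{"executionRoleArn": "${ROLEARN}", "image": "${HOSTNAMEAPPREPOURL}:0.1"}`
	// os.Getenv reads the exported variables, like envsubst does.
	fmt.Println(renderTemplate(tmpl, os.Getenv))
}
```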

VPC Endpoints

We can now create a service for the hostname app, which, unfortunately, is not going to be particularly successful. Let's go ahead and see why. Here is the command we need to create the service:

aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo

We will go through the new arguments to the create-service command later, but first let's query the state of the task that is started by the Fargate agent for this service with the following commands:

TASKARNS=$(aws ecs list-tasks --cluster demo-cluster \
  --service-name hostname-app-service --query "taskArns" --output text)
aws ecs describe-tasks --tasks $TASKARNS --cluster demo-cluster

If you do this a short time after the service is created, you will see an error message similar to the following in the field tasks[0].containers[0].reason:

"CannotPullContainerError: Error response from daemon: Get https://$REPOID.ecr.eu-central-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

This error is caused by Fargate not being able to fetch the container images required for the task, because there is no network path to the ECR repository. When we deployed static-app, our tasks could communicate with the rest of the Internet in a straightforward manner, as they had public IPs. In the new layout, the tasks are on a private subnet, and can be contacted only through the load balancer. It is possible to solve this issue using a NAT (Network Address Translation) gateway, but NAT gateways are relatively expensive, and require an elastic IP address. A better solution is to use VPC endpoints, which essentially make AWS services reachable as if they were part of a private subnet. There are two kinds of VPC endpoints: interfaces and gateways. Interface endpoints function by creating an endpoint network interface in the specified subnets. Gateway endpoints, on the other hand, function by manipulating the route table of a VPC. Although there are two different kinds of endpoints, you as a user do not have much choice as to which to use for which service, since gateway endpoints have to be used for S3 and DynamoDB, and interface endpoints for the other services. We will therefore go ahead and create an interface VPC endpoint for ECR, and a gateway endpoint for S3, as container image layers are downloaded from S3. But first, let's delete the existing service:

aws ecs update-service --service hostname-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service hostname-app-service --cluster demo-cluster
# This takes some time
aws ecs wait services-inactive --service hostname-app-service --cluster demo-cluster

In order to make sure we can isolate different pieces of our cluster security-wise, let's also create a separate security group for the endpoints, and authorize ingress and egress between the private security group and this new group:

ENDPOINTSECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name endpoint-security-group --description "VPC Endpoint SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $ENDPOINTSECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $PRIVATESECURITYGROUPID

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $ENDPOINTSECURITYGROUPID

Through these rules, we are allowing requests into the endpoints from the private services (on all ports here, but allowing port 80 for HTTP and 443 for HTTPS should be enough).

And now let's create the ECR and S3 VPC endpoints:

ECRENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Interface" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.ecr.dkr" \
  --security-group-ids $ENDPOINTSECURITYGROUPID --subnet-ids $PRIVATESUBNETID \
  --private-dns-enabled --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $ECRENDPOINTID --tags Key=Environment,Value=Demo

S3ENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Gateway" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.s3" \
  --route-table-ids $DEFAULTRTID $ROUTETABLEID \
  --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $S3ENDPOINTID --tags Key=Environment,Value=Demo

The ECR endpoint accepts a security group ID argument, for which we use the endpoint security group we just created. The S3 endpoint, on the other hand, does not accept such an argument. The question now is: how do we specify that requests from our private subnet to S3 are allowed? We can't use IP addresses, as we don't know the private IP assigned to the S3 gateway. Security groups are not an option either, as the gateway does not have one. The solution is to use what are called prefix lists, which specify a group of IP prefixes covering the S3 endpoints the gateway will choose among. In the following, we first get the ID of the prefix list we are interested in using aws ec2 describe-prefix-lists, and then allow requests from our services to these IP addresses using the --ip-permissions option of the authorize-security-group-egress subcommand:

S3PREFIXLISTID=$(aws ec2 describe-prefix-lists --region $REGION \
  --query "PrefixLists[?PrefixListName == 'com.amazonaws.${REGION}.s3'].PrefixListId" \
  --output text)

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
    --ip-permissions IpProtocol=tcp,FromPort=0,ToPort=65535,PrefixListIds="[{Description=\"Why isnt this in the docs\",PrefixListId=${S3PREFIXLISTID}}]"
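A prefix list is essentially a named set of CIDR blocks, and an egress rule that references it permits traffic to any address inside any of those blocks. The following Go sketch models that membership test; the prefixes are RFC 5737 documentation ranges used as placeholders, not the real S3 prefixes, which you can inspect with aws ec2 describe-prefix-lists.

```go
package main

import (
	"fmt"
	"net"
)

// inPrefixList reports whether ip falls inside any of the CIDR blocks.
func inPrefixList(ip string, cidrs []string) (bool, error) {
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	for _, c := range cidrs {
		_, network, err := net.ParseCIDR(c)
		if err != nil {
			return false, err
		}
		if network.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Placeholder prefixes (documentation ranges), standing in for the
	// entries of the com.amazonaws.<region>.s3 prefix list.
	s3Prefixes := []string{"192.0.2.0/24", "198.51.100.0/24"}
	ok, err := inPrefixList("198.51.100.17", s3Prefixes)
	fmt.Println(ok, err)
}
```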

Afterwards, let's try to create the service once more, with the command repeated here for ease of reference:

aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo

aws ecs wait services-stable --cluster demo-cluster --services hostname-app-service

Let's now go through the arguments to this command that differ from the previous one that created static-app:

The desired count is 2 this time. The service will create 2 tasks for us, and incoming requests will be load balanced among them by the load balancer we created.
The network configuration this time around specifies that the network interface should be placed on the private subnet, and that public IP is disabled. Our container cannot make or receive requests to/from the rest of the internet, except for the AWS services for which we created VPC endpoints.
The additional argument --load-balancers specifies that the service bind to the load balancer target group created earlier. Here we are specifying that the containers named hostname-app (this should align with the name field in the task definition) should be contacted on port 8080, which is the port our app listens on.

Once again, we are waiting for the service to reach a stable state where all tasks are running. Once this command has run through, we can fetch the URL of the load balancer, at which we can access the service, with the following command:

aws elbv2 describe-load-balancers --load-balancer-arns $LBARN \
  --query "LoadBalancers[0].DNSName" --output text

You should now see a page that displays the hostname of the task that responds to the request. If you reload the page, you should see the displayed hostname alternate between two values, as subsequent requests are rotated between the two targets according to the round robin algorithm. We can scale our service by changing the number of tasks with the aws ecs update-service command. New tasks will be added to the service, or old ones removed, with the load balancer target group draining the connections from the removed tasks and routing new requests to the remaining ones. Here is an example for reducing the number of tasks to one:

aws ecs update-service --service hostname-app-service --cluster demo-cluster \
  --desired-count 1
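Round robin hands each new request to the next target in the list, wrapping around at the end. A minimal, concurrency-safe Go sketch of that selection is below; the target addresses are hypothetical, and the actual ALB additionally skips unhealthy or draining targets.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin cycles through a fixed, non-empty list of targets.
type roundRobin struct {
	mu      sync.Mutex
	targets []string
	next    int
}

// pick returns the next target, wrapping around at the end of the list.
// It panics if the target list is empty.
func (r *roundRobin) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	t := r.targets[r.next]
	r.next = (r.next + 1) % len(r.targets)
	return t
}

func main() {
	// Hypothetical private IPs of the two hostname-app tasks.
	lb := &roundRobin{targets: []string{"10.0.2.11:8080", "10.0.2.54:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.pick())
	}
}
```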

Health Checks

One thing you have to pay attention to when creating the task and the load balancer is the health check configuration of the load balancer target group. Health checks are used by load balancers to determine which targets (in our case containers, but they could also be VMs) are healthy and should therefore receive requests. The default health check for ALBs is whether a GET request to the index (i.e. /) endpoint of the target returns a 200 response code. If your app does not respond to such a request in the expected manner, you can use the health check options of the create-target-group subcommand to specify a more suitable one. A tricky issue to debug is when the app is configured to bind to localhost or 127.0.0.1 instead of 0.0.0.0. When this is the case, the app will not respond to requests arriving at the IP address assigned to the task's network interface, and will thus fail the health checks. New instances of the same task will then be created in a loop, without the service ever reaching a stable state. So make sure that your app binds to the general 0.0.0.0 interface instead of the loopback interface.

Internal DNS and Service Discovery

If we want to use Fargate as a microservice platform, we need a means to contact the tasks of a service on a private subnet under a single name, for easy server-side service discovery. To give an example, Kubernetes achieves this by giving each service a DNS name that resolves to a cluster IP. This cluster IP is used to proxy connection requests to a service to one of the service pods at the node level. The way to implement similar functionality on Fargate is through the ECS service discovery API, which uses Route 53 to create VPC-local DNS entries for services. In our demo of this functionality, we will use yet another app, the random-quote-app, which returns a random quote on programming as JSON. The random-quote-app will not have a public endpoint, in order to simulate an internal microservice. The hostname-app service has the route /random-quote, which queries the random-quote-app and displays the result.

The commands for creating the random-quote-app container repository and task definition differ only marginally from those for the previous two services, so I will not repeat them here, and will instead focus on service discovery. The resources we need for DNS-based service discovery on Fargate are a namespace and a "service discovery service", a terrible name for a straightforward concept. A service discovery service represents an ECS service in the service discovery mechanism under a name. This name, together with the namespace, is used to resolve DNS queries to the IP addresses of the tasks that belong to the service. In the following, we first create the namespace, and then the service discovery service that attaches to it:

OPERATIONID=$(aws servicediscovery create-private-dns-namespace --name "local" \
 --vpc $VPCID --region $REGION --query "OperationId" --output text)

NAMESPACEID=$(aws servicediscovery get-operation --operation-id $OPERATIONID \
  --query "Operation.Targets.NAMESPACE" --output text)

RQSERVICEID=$(aws servicediscovery create-service --name random-quote \
  --dns-config "NamespaceId=\"${NAMESPACEID}\",DnsRecords=[{Type=\"A\",TTL=\"300\"}]" \
  --health-check-custom-config FailureThreshold=1 --region $REGION \
  --query "Service.Id" --output text)

The --name argument we supply to the aws servicediscovery create-private-dns-namespace command will be the top-level domain of the cluster DNS. Once we fetch the ID of the namespace with the second command, we can use it to create a DNS entry for our service with aws servicediscovery create-service. The --name argument to this command determines how the service is referred to. Once this command has run, and the random-quote-app ECS service has been created with its --service-registries argument pointing to this service discovery service, DNS queries to random-quote.local from within the VPC will resolve to up to eight instances of random-quote-app. You should now be able to open the load balancer URL with the path /random-quote/ and see a random quote on programming. As you can see in the app code, hostname-app uses the URL http://random-quote.local:8080 to contact the random-quote-app and fetch the quote. The port has to be included in the request, because the task to which the DNS name resolves is contacted directly, without a load balancer in between.

Conclusion

As mentioned in the introduction to the first part of this tutorial, the command line client for AWS can be quite useful for discovering what AWS has to offer. Once the going gets tough, however, and numerous AWS services and complicated security and network resources are involved, it gets quite difficult to keep track of the various commands and the minute ways they differ from each other. In another context, I have had the opportunity to implement a very similar microservice architecture, using Terraform, a tool much better suited to provisioning dependent and highly-connected cloud resources. It was a much better experience, and I would say that beyond simple things, and the occasional tricky feature that cannot be implemented with another tool, the CLI should be limited only to discovery and prototyping. That said, I hope this tutorial helped you to understand Fargate and the other relevant AWS components better.

Resources

This blog post gives an overview of the advantages of Fargate over ECS.
This blog post from the AWS team explains the nitty gritty details of container networking in Fargate.
Another blog post from AWS, this one explaining how to create a service registry for a Fargate cluster.
A detailed tutorial on connecting ECR to Fargate using VPC endpoints.
Deep Dive into AWS Fargate is a talk from 2018 that contains a nice overview of Fargate as compared to ECS and standard EC2, with a demo that uses CloudFormation.

Discovering AWS with the CLI Part 1: Networking and Virtual Machines

Ulaş Türkmen — Tue, 27 Aug 2019 12:29:53 GMT

Recently, I started working on moving an application that was deployed manually to an AWS EC2 instance to a more modern, infrastructure-as-code setup. This gave me the chance to dive deeper into AWS concepts, and play around with the various services. There are numerous ways to use the AWS API: On top of the standard tools offered by Amazon, such as the web GUI, CLI client, client packages for a number of languages and CloudFormation, there are various third party tools, such as Terraform and Ansible. Pretty much every other tutorial or book on AWS is a click-through in the web UI, but neither the pedagogic effect nor the resulting programmatic output is optimal: It cannot be reproduced, and when you want to go over it, you need to recall where the hell you clicked, and which values had to be same or related to each other. I found the CLI client to be a much better alternative, because you can linearly follow what has to happen when, and how things connect to each other. You can also use the resulting code for actual productive orchestration work. This tutorial documents what I found out about getting the most out of the CLI client, and how one can use it to understand and discover AWS concepts.

If you don't want to copy-paste all the commands, you can check out the samples repository, which I will refer to extensively in part 2, and use the checkpoints-part-1.sh file, which bundles all examples into one script. This script notifies the user of the current location at the different checkpoints, and pauses execution. You can then inspect the state on the AWS console, or run commands in another shell.

Installation and Configuration of awscli

The AWS CLI client is delivered as a Python package named awscli. As such, the easiest way to install it is to use pip, with pip install awscli. Once you have installed it, you need to register your access keys, which can be done with the aws configure command. You can add additional profiles with the --profile argument, and you can also rerun the command if you want to change something, such as the default region. When you use the command and want to specify a certain profile, you can either use the --profile argument, or set the environment variable AWS_DEFAULT_PROFILE. The same is valid for the region; you can either pass the argument --region, or export it as AWS_DEFAULT_REGION. If you are ever in doubt about who you are logged in as, you can simply issue the command aws iam get-user, which will show you your username and user ARN.

General usage

The AWS CLI accepts combinations of commands, with the first command being something like the namespace. The default output format is JSON, and you can manipulate this output using JMESPath notation. It is also possible to print the output as a table or in plain text; the former is rarely used, but the latter is necessary if you want to use the output as input for other commands. A really useful feature is autocompletion, which provides a quick means to search among the many namespaces and subcommands. In order to enable autocompletion, you need to specify that the command aws_completer needs to be used to complete the command aws, which can be done with the following:

complete -C "$(which aws_completer)" aws

Now, tabbing should help you find stuff. More details can be found here.

Uploading an SSH key

We will be creating EC2 instances in the following, and you will need an SSH key to access them. The way this works on AWS is that you upload your public key with a name, and then specify, on creation, that a VM should be accessible with that key. Creating an SSH key is very easy, as famously documented in GitHub's documentation. Once you have created one, you can add it to the available keys on AWS with the following command:

aws ec2 import-key-pair --key-name brand-new-key \
    --public-key-material file://~/.ssh/id_rsa.pub

You can later refer to this key as brand-new-key and use it to SSH into your VMs.

Resource groups and tags

It is possible to gather AWS resources under resource groups, which enables certain bulk features such as monitoring costs or gathering logs. Unfortunately, deleting resources is not among those features, at least not using the web console or the CLI. A resource group is created by specifying a query that will match resources based on tags. If we want resource groups to be based on the value of the Environment tag, for example (tag names are by convention capitalized), we need to create the resource group DemoEnvironment with the following command:

aws resource-groups create-group \
    --name DemoEnvironment \
    --resource-query '{"Type":"TAG_FILTERS_1_0", "Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Environment\", \"Values\":[\"Demo\"]}]}"}'

As you can see, the format is really god awful. The use of tags on the command line (and also on the web console, for that matter) is complicated considerably by the non-uniform application of tags to resources. Some resources (such as EC2 instances) accept tags on creation, whereas others can be tagged only once they are created; you can see a detailed list here. Resource groups are still rather useful, however, which is why we will tag all the resources we create. We will see later how to create resources as a part of the DemoEnvironment.
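Part of the awkwardness is that the Query field of --resource-query is itself a JSON document serialized into a string, so it has to be escaped inside the outer JSON. If you generate this argument programmatically, letting encoding/json do the escaping avoids mistakes. In the Go sketch below, the struct types are my own and only mirror the field names used in the command above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below mirror the field names of the resource query; fields
// must be exported for encoding/json to serialize them.
type tagFilter struct {
	Key    string   `json:"Key"`
	Values []string `json:"Values"`
}

type tagQuery struct {
	ResourceTypeFilters []string    `json:"ResourceTypeFilters"`
	TagFilters          []tagFilter `json:"TagFilters"`
}

type resourceQuery struct {
	Type  string `json:"Type"`
	Query string `json:"Query"` // the inner query, serialized as a string
}

// buildResourceQuery returns a value for --resource-query that matches
// all supported resources tagged key=value.
func buildResourceQuery(key, value string) (string, error) {
	inner, err := json.Marshal(tagQuery{
		ResourceTypeFilters: []string{"AWS::AllSupported"},
		TagFilters:          []tagFilter{{Key: key, Values: []string{value}}},
	})
	if err != nil {
		return "", err
	}
	outer, err := json.Marshal(resourceQuery{Type: "TAG_FILTERS_1_0", Query: string(inner)})
	return string(outer), err
}

func main() {
	q, err := buildResourceQuery("Environment", "Demo")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```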

VM and VPC, the heart of AWS

The heart of AWS is EC2 service, the Elastic Compute Cloud. It provides the means to create, organize, access and interface to scalable computing infrastructure, the infamous EC2 instances. Under the covers, AWS uses EC2 to run the rest of its own services. EC2 instances are just a part of the puzzle, though. You will be dealing even more often with the networking components of EC2, especially with virtual private clouds (VPC). Nearly every resource on AWS is connected to a VPC and a subnet, either directly or with at most one hop. A VPC is a logically isolated network which separates your AWS resources from the rest of AWS, while subnets are tools for finer control of how these resources communicate with each other, and with the internet. Your account comes with a default VPC; if you don't supply the VPC argument, the resource will be created in this default VPC. Here is how to get the default VPC's ID:

DEFAULTVPCID="$(aws ec2 describe-vpcs \
    --filter "Name=isDefault,Values=true" \
    --query "Vpcs[0].VpcId" --output text)"

As you can see, there is no separate namespace for VPC subcommands; they are in the EC2 namespace. Also, we used the --query argument, which can be added to any command to print a specific part of the response JSON. Here we use it to print the ID of the default VPC; we also pass the option --output text to get the ID as plain text instead of a JSON string. Calling it the default VPC is a bit misleading; it's more like the default networking infrastructure, as there are a couple of other things attached to this VPC that make it special. The first part of this structure is the subnets. We can print the subnets of the default VPC with the following query:

aws ec2 describe-subnets --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"

This should print a number of subnets; in my case it's 3. One field of significance is the AvailabilityZone. It should be easy to see that each subnet has a different value, but they are all in the same region (for my region eu-central-1, the availability zones are eu-central-1a to 1c). A VPC created in a region will logically span all the availability zones (AZ) in that region. A subnet, on the other hand, is specific to a single AZ. You can also list the network interfaces, which are the entities through which computational resources connect to the network, with the following command:

aws ec2 describe-network-interfaces --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"

One situation where this command comes in handy is figuring out which resources to first delete when you are trying to delete a VPC. When it comes to the dependency graph, VPCs are pretty much at the top of the (top-down) tree. You cannot delete them unless all the other, non-default resources are also removed or detached.

Creating, connecting and instantiating resources in VPC and subnets

If you want to create multiple isolated resource groups, keep control over which resources can access which others, and generally understand how to connect various other AWS things like RDS databases, you will need to deal with VPCs. Let's begin this process with creating one such VPC:

VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo

The required --cidr-block argument specifies which IP range will be valid within the VPC. This uses the CIDR format, with the suffix /16 specifying how many leading bits constitute the network prefix; our VPC will be able to hand out and route between IPs from 10.0.0.0 to 10.0.255.255, that is, 256*256 = 65536 IPs in total. Once this command runs, you should see two results in the output of aws ec2 describe-vpcs: the default VPC, and the new one you just created. You can also see the new VPC in the list of resources for our new resource group with the command aws resource-groups list-group-resources --group-name DemoEnvironment. A VPC alone is not enough information for AWS to figure out the networking topology, however: we need a subnet, whose CIDR block has to be a subset of the VPC's. Let's create one with the following command:

SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID \
  --cidr-block 10.0.1.0/24 \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
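The CIDR arithmetic behind the VPC and subnet blocks is easy to check with Go's net package: a /16 leaves 32 - 16 = 16 host bits, i.e. 2^16 = 65536 addresses, while a /24 leaves 8 bits, i.e. 256 addresses. The sketch below computes the first and last address of an IPv4 block and checks that the subnet lies inside the VPC range; the function names are my own.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// cidrRange returns the first and last address of an IPv4 CIDR block and
// the number of addresses it contains.
func cidrRange(cidr string) (first, last net.IP, size uint64, err error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, nil, 0, err
	}
	ones, bits := network.Mask.Size()
	if bits != 32 {
		return nil, nil, 0, fmt.Errorf("%s is not an IPv4 block", cidr)
	}
	size = 1 << uint(bits-ones) // 2^(host bits)
	start := binary.BigEndian.Uint32(network.IP.To4())
	end := start + uint32(size-1)
	first = make(net.IP, 4)
	last = make(net.IP, 4)
	binary.BigEndian.PutUint32(first, start)
	binary.BigEndian.PutUint32(last, end)
	return first, last, size, nil
}

// subnetOf reports whether the child block lies entirely inside the parent.
func subnetOf(child, parent string) (bool, error) {
	_, p, err := net.ParseCIDR(parent)
	if err != nil {
		return false, err
	}
	first, last, _, err := cidrRange(child)
	if err != nil {
		return false, err
	}
	return p.Contains(first) && p.Contains(last), nil
}

func main() {
	for _, c := range []string{"10.0.0.0/16", "10.0.1.0/24"} {
		first, last, size, err := cidrRange(c)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s - %s (%d addresses)\n", c, first, last, size)
	}
	ok, _ := subnetOf("10.0.1.0/24", "10.0.0.0/16")
	fmt.Println("subnet inside VPC:", ok)
}
```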

As you can see in the --cidr-block argument, this subnet covers the IPs from 10.0.1.0 to 10.0.1.255, which is a part of the range covered by the VPC. Once we have the subnet, we can go ahead and create our first EC2 instance attached to it. In order to do so, we first need the ID of a suitable AMI. I used the following command to list the official Ubuntu AMIs, and picked the newest one:

AMIID=$(aws ec2 describe-images \
  --filters "Name=root-device-type,Values=ebs" \
  "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*" \
  "Name=architecture,Values=x86_64" \
  --query "reverse(sort_by(Images, &CreationDate)) | [?! ProductCodes] | [0].ImageId" \
  --output text)
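Read from left to right, the JMESPath expression sorts the images by CreationDate, reverses the result so that the newest image comes first, drops every image that has ProductCodes, and takes the first one that remains. Roughly the same logic in Go, over a hypothetical, trimmed-down image struct (in the real API, ProductCodes is a list of objects rather than strings):

```go
package main

import (
	"fmt"
	"sort"
)

// image is a trimmed-down, hypothetical version of an EC2 image description.
type image struct {
	ImageID      string
	CreationDate string   // ISO 8601, so lexical order equals chronological order
	ProductCodes []string // non-empty for Marketplace images
}

// newestFreeImage roughly mirrors
// reverse(sort_by(Images, &CreationDate)) | [?! ProductCodes] | [0].ImageId
func newestFreeImage(images []image) (string, bool) {
	sorted := make([]image, len(images))
	copy(sorted, images)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreationDate > sorted[j].CreationDate // newest first
	})
	for _, img := range sorted {
		// JMESPath treats both null and an empty list as falsy.
		if len(img.ProductCodes) == 0 {
			return img.ImageID, true
		}
	}
	return "", false
}

func main() {
	images := []image{
		{ImageID: "ami-aaa", CreationDate: "2019-07-01T10:00:00.000Z"},
		{ImageID: "ami-bbb", CreationDate: "2019-08-01T10:00:00.000Z", ProductCodes: []string{"marketplace-code"}},
		{ImageID: "ami-ccc", CreationDate: "2019-07-20T10:00:00.000Z"},
	}
	if id, ok := newestFreeImage(images); ok {
		fmt.Println(id) // prints ami-ccc
	}
}
```

Sorting the CreationDate strings works here because they are ISO 8601 timestamps in a uniform format, so lexical and chronological order coincide.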

The reason for the complicated query argument is that we don't want AMIs from the AMI marketplace, which one needs to pay for or agree to a license for. Now let's start an EC2 instance with the AMI the above command picked (as of 11.08.2019, this is ami-0ac05733838eabc06):

aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' --subnet-id $SUBNETID

This instance gets loaded with the SSH key that we uploaded earlier, named brand-new-key. It also gets the same environment tag, but with the convenience of adding it in the creation command, making a second command unnecessary. The --subnet-id argument specifies which subnet the network interface should connect to. If we hadn't specified this, a subnet in the default VPC would have been picked. We now have a functioning VM, whose instance ID we can query by filtering the instances by their Environment tag:

INSTANCEID=$(aws ec2 describe-instances \
  --filter "Name=tag:Environment,Values=Demo" \
  --query "reverse(sort_by(Reservations, &Instances[0].LaunchTime)) | [0].Instances[0].InstanceId" \
  --output text)

The query part of this command is again relatively complicated. The reason is that, if you create a couple of VMs and terminate them, they will still appear in the list of VMs when searched by tag. That's the reason we pick the VM that was last launched. When you create an instance and would like to know when it is actually running, you can use the handy wait feature, as follows:

aws ec2 wait instance-running --instance-ids $INSTANCEID

Now if we run the command to list group resources, we should see three entries: a VPC, a subnet and an instance. If you inspect the EC2 instance with aws ec2 describe-instances --instance-ids $INSTANCEID, you can see a couple of interesting fields. There's the ID, of course, and PrivateDnsName, but peculiarly no public IP or DNS name. This is because the subnet was not configured to give its instances a public IP address on launch; you can see this in the MapPublicIpOnLaunch field of the subnet we created, which is false. As far as we are concerned, the instance we created is in a vacuum, and cannot be contacted from anywhere. You can also see this by right clicking on the instance in the web GUI, and clicking Connect. AWS will ask you to pick a method out of SSH client, web SSH client, or Java SSH client. Interestingly, the first of these shows the private IP of this instance (something like 10.0.1.12), which is in a private address range and cannot be routed on the internet. If you pick the second option, you will see an error message telling you that the instance does not have a public IP.

Opening a subnet to the outer world

We need to modify and extend our basic subnet in two ways in order for the instances connected to it to communicate with the internet. The first is a gateway. An internet gateway acts as a target for internet-routable traffic, and takes care of NAT (Network Address Translation). You should not confuse an internet gateway with a NAT gateway: The latter is used to connect instances in private subnets to the internet, while they are still unavailable to traffic from the outside. The default VPC has an internet gateway, as you would expect:

DEFAULTGATEWAY=$(aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=$DEFAULTVPCID" \
  --query "InternetGateways[0].InternetGatewayId" --output text)
echo $DEFAULTGATEWAY

This should print the ID of the gateway used by the default VPC. A gateway is not automatically created for a VPC, however. Our new VPC is lacking one, which we can see using the following command:

aws ec2 describe-internet-gateways --filters \
  "Name=attachment.vpc-id,Values=$VPCID"

This should return an empty list. We can create a brand new gateway for our VPC with the following commands:

GATEWAYID=$(aws ec2 create-internet-gateway --query \
  "InternetGateway.InternetGatewayId" --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID \
  --internet-gateway-id $GATEWAYID

Now we have a gateway that is attached to our VPC. The next thing we need is a way for the network to route traffic meant for the internet through this gateway. This is the job of the route table. Every VPC comes with a main (default) route table (see the AWS documentation on route tables for details). We can see what its rules look like by first looking at the settings for the default VPC and its subnets:

aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$DEFAULTVPCID"

In the Routes entry of the resulting output, you should see two entries. The first of these has the field DestinationCidrBlock set to 172.31.0.0/16, which is the CIDR of the VPC itself (you can verify this with the command aws ec2 describe-vpcs --vpc-ids $DEFAULTVPCID --query "Vpcs[0].CidrBlock"). The GatewayId of this rule is local, meaning that traffic to these addresses is routed within the VPC. The second rule has 0.0.0.0/0 as DestinationCidrBlock, and its GatewayId is equal to $DEFAULTGATEWAY. Since the most specific matching rule (the one with the longest prefix) takes precedence, this second rule applies to all traffic that is not destined for the VPC's own IP range. Since, as mentioned above, every VPC has a route table, we do not need to create a new one, and can instead modify the existing one:

ROUTETABLEID=$(aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPCID" \
  --query "RouteTables[0].RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID \
  --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID

With the last create-route command, we are telling the network to route traffic that is not addressed to an interface in the VPC to the gateway identified by GATEWAYID. As we are modifying the main route table, there is no need to explicitly associate it with the subnets we want to make public: in the absence of an explicit association, subnets use the main route table. These implicit associations are also not listed in the output of aws ec2 describe-route-tables, which is why we cannot demonstrate them for the default VPC. If we were creating a new route table, however, the following command would be necessary to associate it with the subnet:

aws ec2 associate-route-table --subnet-id $SUBNETID \
  --route-table-id $ROUTETABLEID
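The precedence rule for route tables (the most specific matching route wins) is a longest-prefix match. The following is a minimal Python sketch of that selection logic, not anything AWS exposes: the routes mirror the default VPC's table described above, and "igw-example" is a hypothetical stand-in for the ID stored in $DEFAULTGATEWAY.

```python
import ipaddress

# Routes modelled on the default VPC's route table described above;
# "igw-example" is a hypothetical placeholder for the real gateway ID.
DEFAULT_VPC_ROUTES = [
    ("172.31.0.0/16", "local"),
    ("0.0.0.0/0", "igw-example"),
]


def route_target(destination, routes=DEFAULT_VPC_ROUTES):
    """Return the target of the most specific route matching destination,
    or None if no route matches (the traffic would be dropped)."""
    ip = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None
    # The longest prefix is the most specific route, which takes precedence
    _, target = max(candidates, key=lambda candidate: candidate[0].prefixlen)
    return target


print(route_target("172.31.20.7"))    # local
print(route_target("93.184.216.34"))  # igw-example
```

Without the 0.0.0.0/0 route, which is exactly the situation of our new VPC before create-route, internet-bound traffic matches nothing and has nowhere to go.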

One last step is necessary to make sure that the instances we start in the subnet are getting public IPs. The following will modify the subnet to make sure that is the case:

aws ec2 modify-subnet-attribute --subnet-id $SUBNETID \
  --map-public-ip-on-launch

Normally (that is, in most cases, and definitely for the default VPC), an instance that gets a public IP address is also given a public DNS name, which can be queried through the PublicDnsName field. Sometimes, however (the documentation is not clear on when and how), the relevant attributes of the VPC are not set properly on creation. To make sure that your instance gets not only an IP address but also a DNS name, you should set the proper configuration values with the following commands:

aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support

As far as I can understand from the documentation, it is not possible to attach a public IP to a running instance from the subnet pool. You can use an Elastic IP, but that's out of scope for this post. Instead, we will simply delete the running instance, and create a new one:

aws ec2 terminate-instances --instance-ids $INSTANCEID
INSTANCEID=$(aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' \
    --subnet-id $SUBNETID --query "Instances[0].InstanceId" --output text)
aws ec2 wait instance-running --instance-ids $INSTANCEID

Let's check whether our instance now has a public IP address and DNS:

IPADDRESS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicIpAddress" --output text)
PUBLICDNS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicDnsName" --output text)

IPADDRESS should now be a proper IP address, and PUBLICDNS should be a hostname that resolves to that IP address. Since we already waited for the instance to start, you can, at least in principle, contact it via SSH with ssh ubuntu@$IPADDRESS or ssh ubuntu@$PUBLICDNS. If you try this now, however, the command will simply hang without a response from the new server. The reason for this silence is that the default security rules do not allow inbound traffic to this instance. AWS security groups act as virtual firewalls that control the inbound and outbound traffic of EC2 instances. A new VPC has a default security group with default rules: these allow all outgoing connections (and the incoming responses to them), and all connections between instances in the same security group, but nothing else. Since we did not create a new security group (which we could have done with aws ec2 create-security-group), the new instance has automatically been placed in the default security group of the VPC. All is not lost, though: changes to the rules of a security group take effect immediately, also for running instances. Let's modify the security group rules, and allow TCP connections from all IP addresses on the default SSH port:

SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 22 --cidr 0.0.0.0/0

See the AWS documentation on security groups for more details. Now you should be able to access the VM on the public IP address or DNS name.
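The behaviour described above, where nothing gets in unless a rule explicitly allows it, can be captured in a few lines. This is a deliberately simplified, hypothetical model for illustration only: real rules take port ranges, can reference other security groups as sources, and are evaluated by AWS, not by code like this.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str  # e.g. "tcp"
    port: int      # real rules take a port range; one port keeps it simple
    cidr: str      # allowed source range, e.g. "0.0.0.0/0"


def inbound_allowed(rules, protocol, port, source_ip, from_same_group=False):
    """Simplified model: security groups only contain allow rules, so
    inbound traffic passes only if some rule explicitly matches it."""
    if from_same_group:
        # Stands in for the default group's rule admitting its own members
        return True
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and ip in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


rules = []  # default behaviour: no explicit ingress rules
print(inbound_allowed(rules, "tcp", 22, "203.0.113.5"))  # False: SSH hangs
# Equivalent of authorize-security-group-ingress --port 22 --cidr 0.0.0.0/0
rules.append(IngressRule("tcp", 22, "0.0.0.0/0"))
print(inbound_allowed(rules, "tcp", 22, "203.0.113.5"))  # True
print(inbound_allowed(rules, "tcp", 80, "203.0.113.5"))  # False: only 22 is open
```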

Cleanup

Cleaning up is relatively straightforward if you have access to the shell session with the variables that store the resource IDs. Remove all the AWS resources we created with the following commands:

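aws ec2 terminate-instances --instance-ids $INSTANCEID
# the gateway and the subnet cannot be removed while the instance still exists
aws ec2 wait instance-terminated --instance-ids $INSTANCEID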
aws ec2 delete-key-pair --key-name brand-new-key
aws ec2 detach-internet-gateway --internet-gateway-id $GATEWAYID \
  --vpc-id $VPCID
aws ec2 delete-internet-gateway --internet-gateway-id $GATEWAYID
aws ec2 delete-subnet --subnet-id $SUBNETID
aws ec2 delete-vpc --vpc-id $VPCID
aws resource-groups delete-group --group-name DemoEnvironment

You have to delete resources in this order, otherwise AWS will tell you that dependencies are being violated. If you don't have access to the IDs, you can either query the individual elements via the CLI using the Environment tag, or copy the IDs from the result of aws resource-groups list-group-resources. Unfortunately, as mentioned above, there is no easy command to delete all resources in a resource group. Even worse, there is no way to delete resources by ARN, which is the identifier output of this last command.
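The required order is a topological sort of the dependencies between resources, reversed so that dependents go first. The sketch below uses the standard library's graphlib (Python 3.9+) on a hand-written, simplified dependency graph of the demo resources; the graph is an assumption for illustration, and the key pair and resource group are left out because nothing depends on them in this model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Simplified, hand-written dependency graph of the demo resources:
# each key can only be deleted after all resources in its value set are gone.
must_be_deleted_first = {
    "subnet": {"instance"},              # subnets with network interfaces can't go
    "gateway-attachment": {"instance"},  # no detaching while public IPs are mapped
    "internet-gateway": {"gateway-attachment"},
    "vpc": {"subnet", "gateway-attachment"},
}

# static_order() yields every node only after all of its predecessors
deletion_order = list(TopologicalSorter(must_be_deleted_first).static_order())
print(deletion_order)
```

Any order this produces is valid; the commands above are one such order, with detaching the gateway as the "gateway-attachment" step.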

Conclusion

The AWS CLI client is, as one would expect from the company that builds AWS, a solid piece of software. As you might have noticed from the command examples, there are some inconsistencies, such as differing names for the same kinds of arguments, or the issue with tags, but I think this is to be expected from a client that has to cover such a massive range of functionality. In the second part of this tutorial, we will be looking at creating a Fargate cluster using the CLI. The requirements will get more complicated as we try to create a scalable, decoupled application, and we will use many other AWS services to tackle them.

An Introduction to Cython, the Secret Python Extension with Superpowers

Ulaş Türkmen — Thu, 21 Feb 2019 14:09:55 GMT

Cython is one of the best kept secrets of Python. It extends Python in a direction that addresses many of the shortcomings of the language and the platform, such as execution speed, GIL-free concurrency, the absence of type checking and the inability to create an executable. It is a mature tool, and a number of widely used packages are written in it, such as spaCy, uvloop, and significant parts of scikit-learn, NumPy and pandas. It hooks smoothly into the latter two, giving you straightforward access to their underlying data structures. All these superpowers come with the baggage of certain parts of C, however, which makes the learning curve a bit steep for those who don't know C. In this tutorial, I will give an overview of working with Cython, focusing on the parts of C that are relevant.

The requirements for running the code samples in this tutorial are Python 3 (preferably 3.6), a C compiler (GCC or Clang should do), and virtualenv or pipenv. Once you have these, getting Cython is as easy as creating a pipenv or a virtualenv, and within that environment, running pip install Cython.

What is Cython, and what is it not?

Cython builds on the fact that Python the language (or at least its CPython implementation) runs on top of a C API, with an intermediate bytecode language in between. CPython works by compiling Python code into a bytecode representation, and then executing the result on its virtual machine at runtime. Each instruction of this bytecode consists of an opcode and a reference to its arguments, if any. You can see the existing opcodes in the file opcode.h. In the main evaluation loop of the runtime, the opcode of an instruction is used to determine what to do next, in the form of one big switch statement. For example, if the opcode is BUILD_LIST, specifying the construction of a new list, the PyList_New function is called with the appropriate arguments. Obviously, the Python runtime does a great deal of work around this evaluation loop, such as garbage collection and error handling. Another large piece of the runtime work pertains to the dynamic nature of Python, where the actual methods to run have to be figured out at runtime, based on the objects involved. For example, when you multiply two values, Python has to figure out whether these are floats, integers etc., or whether the multiplication operator has been overloaded, as with the string type. In many other languages, explicit type information lets the compiler figure this out at compile time, leading to faster compiled code. This distinction is known as early vs. late binding, and avoiding late binding is the source of one of the major performance gains one can achieve by using Cython.
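You can watch both mechanisms from plain Python with the standard dis module. In the sketch below, make_pair and multiply are made-up example functions: the first compiles to bytecode containing a BUILD_LIST instruction, and the second contains a single multiplication opcode whose actual implementation is only chosen at runtime, depending on the operand types. The exact opcode list differs between CPython versions, so the sketch only prints it.

```python
import dis


def make_pair(a, b):
    return [a, b]


def multiply(x, y):
    return x * y


# The opcodes CPython compiled make_pair into; the exact list varies
# between CPython versions, but it includes BUILD_LIST.
print([instr.opname for instr in dis.get_instructions(make_pair)])

# One multiplication opcode in the bytecode, but which implementation
# runs is decided at runtime from the operand types (late binding):
print(multiply(6, 7))     # 42, integer multiplication
print(multiply(1.5, 2))   # 3.0, float multiplication
print(multiply("ab", 3))  # ababab, str overloads *
```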

Cython makes use of this architecture by translating (or 'transpiling', as it is now called) a Python file into the C equivalent of what the Python runtime would be doing, and compiling this into machine code. The result can be a Python extension module that is loaded dynamically, or an actual executable. The resulting module makes calls to the Python runtime in order to deal with things like the above-mentioned dispatch, which means that straightforward Python code will not run much faster than it would anyway. The speed difference becomes significant when you use Cython-specific constructs that are transpiled directly to their C equivalents, thereby avoiding the Python runtime. We will see what these constructs are in a minute, but let's start with a simple example to get used to working with the Cython toolchain.

Starting off

The first thing you need to do is to install Cython, obviously. Please refer to the official Cython documentation for the installation instructions. You will also need a C compiler; GCC on Linux and Clang on Mac should do. Once you have these two, we can start with the usual hello world. Save the following in a file named hello_world.pyx (or simply clone and use the sample code repository):

def say_hello():  
    print("Hello world from Cython!")

And then execute the command cython hello_world.pyx. You should end up with a file named hello_world.c. This file can be compiled into a shared library with the following command:

gcc -shared -fPIC `pkg-config --cflags python-3.6m` hello_world.c -o hello_world.so

Your Python extension should now be ready, in the form of a file named hello_world.so. You should be able to start a Python shell in the same directory as this file, import it with a simple import hello_world, and then run hello_world.say_hello(), which should print "Hello world from Cython!". Voila, your first Cython extension.

Now let's have a quick look at the hello_world.c file. The file is pretty big, and fortunately, you don't need to understand any of it to work with Cython. Still, it's interesting to see what Cython did to your Python code. Go ahead and search for Hello world from Cython in this file (there is a much easier way to compare the generated code with the original, which we will see in a minute). You will see that Cython has generated C code for the say_hello function and the print call inside it, and annotated it with comments that show which lines of the original code each part corresponds to. The body of say_hello has been moved into a separate C function with a mangled name (along the lines of __pyx_pf_11hello_world_say_hello). It makes a call to __Pyx_PrintOne, which is a wrapper for the Python print, and deals with various error conditions and return values using C functions and macros.

Easier compilation of Cython extensions

There is a much easier way to turn Cython code into native modules, using the distutils module that is responsible for building and installing modules in Python. This lets us delegate transpilation and compilation to Cython and distutils, and import the file like a normal Python module. To do so, create a setup.py file with the following contents in the same directory as hello_world.pyx:

from distutils.core import setup  
from Cython.Build import cythonize  
setup(ext_modules = cythonize("*.pyx", annotate=True))

The role of the annotate argument will be explained in a bit. After running the command python setup.py build_ext --inplace to build the extension module, you should be able to import hello_world and call hello_world.say_hello(), getting the same result as above. From now on, whenever you make a change to a Cython file, you need to rerun this command, which rebuilds all the Cython files that have changed.

Python is valid Cython

As you can see with the hello world example, Cython accepts and correctly processes the Python you are used to writing every day. It was mentioned above that there is an easy way to view the C code Cython generates from its input, and also that the annotate argument would be explained later. If you compiled hello_world.pyx using distutils as explained above, you should now see a file named hello_world.html in the same directory. This file contains a visual display of the Python code Cython has transformed, with the lines that require access to the Python interpreter colored in yellow. The darker the yellow background of a line, the more Python runtime interaction it contains. There is also a plus sign to the left of every yellow-tinged line, which expands to show the C code Cython generated for it. Generally, the aim when converting Python to Cython for purposes of optimization is to decrease the number of yellow lines, or at least lighten their hue. This will ensure that your code is as close as possible to a C version, and thus (probably) faster.

The simple things: Variables, Functions, Loops

Now let's take some Python code that is somewhat slow, turn it into Cython, and make it faster by annotating it with Cython-specific statements, decreasing the amount of yellow in the annotation. We will use this approach to write a fast version of the Sieve of Eratosthenes, which finds all prime numbers up to a certain limit. Here is an implementation in pure Python:

import math

def sieve(up_to):  
    primes = [True for _ in range(up_to+1)]
    primes[0] = primes[1] = False
    upper_limit = int(math.sqrt(up_to))
    for i in range(2, upper_limit+1):
        if not primes[i]:
            continue
        for j in range(2*i, up_to+1, i):
            primes[j] = False
    return [x for x in range(2, up_to+1) if primes[x]]

Let's find out how fast this code is, by saving it in the file sieve_python.py, and benchmarking it with the handy timeit module, as follows:

python -m timeit "import sieve_python; sieve_python.sieve(200000)"

On my computer, the output is 10 loops, best of 3: 27.9 msec per loop. Admittedly, this does not look like a slow way of computing 17984 primes (computers are freaking fast), but it already gives us a good starting point.

Small note on timeit: The timeit module will run the piece of code you are timing in loops that increase in number of steps, until execution time exceeds 0.2 seconds. So don't be surprised if different attempts lead to different number of loops. From here on, I will omit the complete output, and report only the best time for each timing run. Also, in the following, benchmarking and timing will be used to mean the same thing, namely measuring the running time of a piece of code.

Small note on benchmarking: Since building the extensions and benchmarking a module is an operation we will repeat frequently in the following, I have added a bash script named benchmark.sh that does this for you. It accepts the name of the module as its single argument, as in ./benchmark.sh sieve_python.
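The loop calibration described in the note on timeit is also available from the module's Python API via Timer.autorange (Python 3.6+). The sketch below repeats the pure-Python sieve so that it is self-contained, and shows only the calibration step. The command-line tool additionally repeats the whole measurement several times and reports the best result, which this sketch does not do.

```python
import math
import timeit


def sieve(up_to):
    # Same pure-Python sieve as in sieve_python.py
    primes = [True for _ in range(up_to+1)]
    primes[0] = primes[1] = False
    upper_limit = int(math.sqrt(up_to))
    for i in range(2, upper_limit+1):
        if not primes[i]:
            continue
        for j in range(2*i, up_to+1, i):
            primes[j] = False
    return [x for x in range(2, up_to+1) if primes[x]]


timer = timeit.Timer("sieve(200000)", globals={"sieve": sieve})
# autorange() tries 1, 2, 5, 10, 20, 50, ... loops until a single
# measurement takes at least 0.2 seconds, and returns (loops, seconds)
number, total = timer.autorange()
print(f"{number} loops, {total / number * 1000:.1f} msec per loop")
print(len(sieve(200000)))  # 17984
```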

First attempt at Cythonizing

Now let's copy the contents of sieve_python.py to a Cython file named e.g. sieve_naive.pyx (also available in the sample code repo), build it as an extension, and benchmark it with ./benchmark.sh sieve_naive. On my computer, this leads to an average runtime of 17.2 msec, which is already an improvement over the pure Python version. Still, an improvement of 40% is not really worth our efforts. If we have a look at the annotation file in sieve_naive.html, we can see that there is Python runtime interaction pretty much on every line, except for the line with continue, which is a keyword in C, too. In order to optimize the naive Cython code, we need to convert all of these lines to Cython-specific code that would be translated to pure C. Without further ado, here is the properly cythonized version, available as sieve_cython.pyx in the sample repo (discussion of the changes will follow):

from libc.math cimport sqrt  
from libc.stdlib cimport malloc, free

def sieve(up_to):  
    cdef bint *primes = _sieve(up_to)
    response = [x for x in range(up_to+1) if primes[x]]
    free(primes)
    return response

cdef bint *_sieve(int up_to):  
    cdef int i, j
    # malloc returns void *, so Cython needs an explicit cast to bint *
    cdef bint *primes = <bint *> malloc((up_to+1) * sizeof(bint))
    for i in range(up_to+1):
        primes[i] = 1
    primes[0] = primes[1] = False
    cdef int upper_limit = int(sqrt(up_to))
    for i in range(2, upper_limit+1):
        if not primes[i]:
            continue
        j = 2*i
        while j < up_to + 1:
            primes[j] = False
            j += i
    return primes

When we benchmark this new version with ./benchmark.sh sieve_cython, we achieve a considerable improvement in performance: 4.24 msec, which is 6.5 times faster than the Python version. Now let's dive into the Cython features we used to achieve this speed-up.

1. Interface separation

Cython offers two fundamental ways of defining functions: with the usual def (Python functions) or with cdef (C functions). The common pattern for organizing Cython code is to separate the interface and computation functions, and to write them as Python and C functions respectively. Python functions have the exact same facilities as in pure Python, and are accessible from the Python runtime. They are mostly responsible for converting arguments and return values to and from C types, in addition to Python-specific things like exception handling. You can nevertheless use type-annotated variables as arguments or within them (such as cdef bint *primes above). C functions, on the other hand, which are declared using the Cython-specific cdef keyword, are transpiled directly to C functions. These functions (from here on called cdef functions) can be declared exactly the way Python ones are (optional arguments, Python objects as argument and return types etc.), but this leads to runtime interaction. To circumvent this, cdef functions can be annotated with types in the signature and in the variables used. The _sieve function from the above code example, for instance, does not accept, return or process any Python objects; all arguments and variables, as well as the return value, are annotated with C types.

cdef functions have a couple of oddities you need to consider, however. The default return type of cdef functions is Python object, and they will convert whatever is returned into one. It therefore makes sense to declare a C return type to avoid Python runtime interaction. Also, C-typed variables follow the rules of the C world, meaning that integer overflow behaves differently than in Python, and integer division can too, depending on the cdivision compiler directive. Especially if your code does intensive numeric calculations, you should watch out for these catches.
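To make these differences concrete, here is a small Python sketch that emulates C integer semantics next to Python's own. The helper names are made up for illustration. Note that, according to the Cython documentation on compiler directives, Cython emulates Python's // and % semantics for C integers by default (cdivision=False) and switches to C semantics with cdivision=True. Overflow wrapping is emulated with ctypes, which does no overflow checking; in C itself, signed overflow is undefined behaviour, and wrapping is merely what common hardware does.

```python
import ctypes


def c_int_div(a, b):
    """Integer division as in C: the quotient is truncated toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def c_int_mod(a, b):
    """Remainder as in C: it takes the sign of the dividend."""
    return a - b * c_int_div(a, b)


def as_c_int32(value):
    """Squeeze a Python int into a 32-bit signed integer, as ctypes does."""
    return ctypes.c_int32(value).value


print(-7 // 2, c_int_div(-7, 2))  # -4 -3
print(-7 % 2, c_int_mod(-7, 2))   # 1 -1
print(2**31, as_c_int32(2**31))   # 2147483648 -2147483648
```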

The annotation file sieve_cython.html is rather revealing. You can see pretty clearly that the _sieve function has been completely converted to C, with no yellow lines. The four-line sieve function, on the other hand, is yellow except for the call to free. If you expand the third line of this function (line 6 in the whole file), you get a very good view of what Cython is doing in the background. The single line that builds a new list from a comprehension over a range call has turned into 58 lines of error handling, resource management and Python interaction. The above-mentioned PyList_New, for example, is called to create the new list.

There is a third type of function you can define with cpdef instead of cdef, which is a hybrid between Python and C functions. In the background, Cython will actually define two functions, a Python one and a C one. Calls from the Python runtime will be directed to the Python function, whereas calls from within Cython-generated code will be directed to the C function. You therefore get the benefit of C optimization when your code is called from other C functions, while the Python function remains available as an interface.

2. Type annotations

The type annotations used in Cython are relatively straightforward, especially if you are acquainted with C types. Any C type can be used as a valid type, including pointer types. Variables with C types have to be defined using the cdef keyword, which can be done in both kinds of functions we have seen. One special type used in sieve_cython.pyx is bint, which is a normal int in C code, but is converted to a Python boolean when necessary. If a variable is defined in the usual Python way without type information, it's assumed to be a Python object. When you build a Python object out of C types, as on line 9 of sieve_cython.pyx, Cython will do certain kinds of type conversions on the fly, without asking for more information. Similar conversions are also possible the other way around, from Python to C. You can see which conversions are possible, and how they work, in the Cython documentation on type conversions.

3. C libraries

Instead of importing and using the Python math module, sieve_cython.pyx uses the C math library, which makes type conversions unnecessary. Cython facilitates this step by providing the C libraries as Python-like imports, as in from libc.math cimport sqrt. The same is valid for the dynamic memory management routines malloc and free, which are discussed next.

4. Memory management

In C-land, memory management demands much more of the programmer. The difficulty arises from the interaction of dynamic memory management and the stack vs. heap distinction. Data allocated on the stack is managed automatically, i.e. it is released when the function that allocated it returns. If you want to keep a reference to a piece of data after its initial context is gone, however, you need to allocate it on the heap using the notorious malloc, and later hand the memory back with free. In our case, we need to keep a reference to the primes array after _sieve is done; this is the reason it is created using malloc, which takes as input the number of bytes to set aside, and returns a pointer to them. It is also why the sieve function calls free once it has built the Python list. The topics of dynamic memory management and pointers are very C-specific, but they are relatively straightforward, so any decent book on C should provide enough information to set you up for their use in Cython.

5. Loops

Python has numerous facilities for patterns that are handled in C with a single tool, the for loop. In the pure Python implementation of the sieve, we have three loops: a list comprehension (line 7 of sieve_naive.pyx) and two for-in loops, all three with a range call. In C, all of these have to be written as for loops with auxiliary variables, and Cython does its best to convert them properly, both the loop and the range call. In the first case, we don't really care much about what the code is converted to, as it's an interface function and the list comprehension is mixed with type conversion anyway, but note that the call to range is of a simple form, with a start and an end index. The second call to range (on line 10) has the same form, which Cython can convert into a C for loop relatively easily. The last use of range, however, involves a step argument (on line 13), which would require Python runtime interaction, something we want to avoid. For this reason, in the Cython-optimized version, this third call, together with its for loop, has been turned into a while loop with an explicit loop counter increment.
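The rewrite of the stepped range into a while loop is purely mechanical, and its equivalence is easy to check in plain Python. The two hypothetical helpers below collect the indices the inner loop of the sieve marks as non-prime for a given i, once with range and once with the explicit counter used in _sieve.

```python
def marks_with_range(i, up_to):
    # Pure-Python form: range with a step argument
    return list(range(2*i, up_to+1, i))


def marks_with_while(i, up_to):
    # Form used in _sieve: explicit counter with manual increment
    marks = []
    j = 2*i
    while j < up_to + 1:
        marks.append(j)
        j += i
    return marks


print(marks_with_while(3, 20))  # [6, 9, 12, 15, 18]
```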

Defining and Collecting New Types with Cython

The above example is relatively straightforward, since it uses only built-in Python and C types. To delve into more complex use cases, I will use as an example the 8-puzzle, a simple task often used to demonstrate search algorithms. The 8-puzzle consists of the integers 1 to 8 placed on a 3x3 grid, with one empty cell. A move consists of sliding a number vertically or horizontally into the empty cell. The board is considered "solved" if the numbers are in order and the last cell is the empty one. In the sample repository, you can see the solutions I already wrote in Python and C. Both require a start state as the single argument, for which there is a sample in the file state.txt. In both implementations, the state of the puzzle board is represented as a 2-dimensional array of integers, with 0 representing the empty cell. Breadth-first search is used to find a sequence of moves that solves the puzzle, and this sequence is printed in reverse. The set of board states that were already seen is represented as a trie, as membership in this set is checked frequently and must be appropriately fast.
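For readers who want the algorithm in front of them before the benchmarks, here is a compact Python sketch of breadth-first search with a trie-based seen set. It is not the eight.py from the sample repository: as a simplifying assumption it stores the board as a flat 9-tuple instead of a 2-dimensional array, and the trie is built from nested dicts, one level per cell.

```python
from collections import deque

# Flat tuple, row by row; 0 marks the empty cell (an assumption of this sketch)
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


class StateTrie:
    """Set of board states stored as a trie of nested dicts, one level per
    cell. All states have the same length, so reaching the end of a path
    means the state is in the set; no end marker is needed."""

    def __init__(self):
        self.root = {}

    def add(self, state):
        node = self.root
        for value in state:
            node = node.setdefault(value, {})

    def __contains__(self, state):
        node = self.root
        for value in state:
            if value not in node:
                return False
            node = node[value]
        return True


def children(state):
    """Yield every state reachable by sliding a tile into the empty cell."""
    zero = state.index(0)
    row, col = divmod(zero, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            board = list(state)
            other = r * 3 + c
            board[zero], board[other] = board[other], board[zero]
            yield tuple(board)


def solve(start):
    """Breadth-first search. Returns the states on the solution path from
    the goal back to the start (i.e. in reverse), or None if unsolvable."""
    seen = StateTrie()
    seen.add(start)
    queue = deque([(start, None)])  # nodes are (state, parent_node) pairs
    while queue:
        node = queue.popleft()
        if node[0] == GOAL:
            path = []
            while node is not None:  # follow parent links back to the start
                path.append(node[0])
                node = node[1]
            return path
        for child in children(node[0]):
            if child not in seen:
                seen.add(child)
                queue.append((child, node))
    return None


start = (1, 2, 3, 4, 5, 6, 0, 7, 8)  # two moves away from GOAL
for state in solve(start):
    print(state)
```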

Let's start off with benchmarking. As we are comparing an executable with a Python script in this step, the timeit module won't cut it; we will have to resort to a more general solution. The most comprehensive Linux tool for this purpose is perf, but I should warn you that on Ubuntu it depends on kernel tools specific to the kernel version, which caused me quite a headache, so proceed at your own risk. With that out of the way, here is how to benchmark the Python script:

    perf stat -r 10 python eight.py state.txt > /dev/null

On the last line of the output, I can see that it took 0.201 seconds on average on my computer. For completeness' sake, I also gave PyPy a try, simply replacing python with pypy in the above command, which resulted in 0.360 seconds on average, surprisingly slower than CPython itself. The C code delivers the goodies, however: when benchmarked with the same input, its average runtime is 8 msec, roughly 25 times faster than Python. The question now is whether we can get close to it using Cython.

First iteration of eight puzzle in Cython

As with the previous example, we will start by simply copying the Python solution to eight_cython.pyx and using it as a starting point. To get this file compiled into an executable, we need to use the cython CLI command instead of the extension building mechanism, and pass it an extra argument, as in cython eight_cython.pyx --embed. Don't forget to add --annotate to the mix if you want the annotation file. The resulting eight_cython.c file can be compiled into an executable with the following command (assuming you have Python 3.6 and the relevant development package installed):

gcc `pkg-config --cflags python-3.6m` eight_cython.c -lpython3.6m -o eight_cython

When benchmarked, this file already leads to a significant improvement, clocking at 200 msec, 23 msec less than letting the Python runtime execute the script. There is a lot to be done to speed this up, however, which I went ahead and did already. You can see the results in the file eight_cython.pyx. This version is considerably faster: With an average runtime of 43 msec, it is five times faster than the pure Python version. There were two major improvements that were used to achieve this speedup, explained in the following.

1. Class vs Struct

The main data structure in the Python version is the State class, which wraps the integer array and provides the necessary methods for generating children, checking whether the board is solved, etc. Cython can definitely deal with this class, as we saw in the naive attempt, but it will be approximately as slow as the Python version, so we have to convert it to something else. There are two options for what this something can be: C structs and extension types. The former is the plain old C struct, which can be declared in Cython using the cdef struct construct. Here is an example from eight_cython.pyx:

cdef struct BoardPosition:  
    int row
    int column

Extension types, on the other hand, are a construct tied to the Python runtime. Their instances are proper Python objects, managed by the Python garbage collector, and they can be subclassed in the usual way by other extension types or Python classes. You cannot add new attributes at runtime, however, as attributes are stored directly in the C struct that represents an instance. The attributes are also not accessible from Python code unless explicitly declared to be so (as public or readonly). Special methods of extension types also behave differently in subtle ways; before implementing one, you should check the list of special methods in the Cython documentation to see whether your expectations are met. In eight_cython.pyx, the State class is declared as an extension type using cdef class, and it has the attributes board, _zero_index and parent:

cdef class State:  
    cdef int **board
    cdef BoardPosition _zero_index
    cdef public State parent

As you can see, the 2-dimensional board array is declared exactly as it would be in C, but the reference to the parent State is not a pointer, as it would have been in C. Cython manages references to Python objects (extension types or normal classes) as pointers in the background; you don't have to declare them as such. The State extension type has methods defined with either def or cdef; the same conditions as above apply, but you cannot declare special methods (those wrapped in double underscores) with cdef. Also, it is not possible to turn cdef methods into properties with the usual @property decorator (as would have been convenient with the zero_index method).
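If you want to poke at these two ideas without a compiler, the standard library offers rough analogues. This is only an analogy under stated assumptions, not how Cython implements anything: a ctypes.Structure has the same memory layout as the BoardPosition struct, and a class with __slots__ refuses new attributes at runtime, loosely like an extension type whose attributes live in a fixed C struct. The State class here is a hypothetical stand-in, not the one from eight_cython.pyx.

```python
import ctypes


class BoardPosition(ctypes.Structure):
    # Same memory layout as the Cython declaration
    #   cdef struct BoardPosition: int row; int column
    _fields_ = [("row", ctypes.c_int), ("column", ctypes.c_int)]


class State:
    # __slots__ fixes the set of attributes, loosely like an extension
    # type whose attributes live in a fixed C struct
    __slots__ = ("board", "_zero_index", "parent")

    def __init__(self, board, parent=None):
        self.board = board
        row = next(r for r, line in enumerate(board) if 0 in line)
        self._zero_index = BoardPosition(row, board[row].index(0))
        self.parent = parent


state = State([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
print(state._zero_index.row, state._zero_index.column)  # 2 2
print(ctypes.sizeof(BoardPosition))  # two C ints, typically 8 bytes

try:
    state.extra = 1  # not declared in __slots__
    added = True
except AttributeError:
    added = False
print("extra attribute added:", added)  # False
```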

One special method of State, __cinit__, deserves particular attention. This method is called exactly once, when an instance of the extension type is allocated, and it is the place for code that allocates any further C data structures. In our case, the board array is allocated here. There is a corresponding __dealloc__ special method where you can free resources allocated in __cinit__. State implements it as well, for completeness' sake.

2. C arrays versus lists

In the State.children method, we need to collect structures representing the following two things:

Possible ways of swapping positions on the eight board, depending on where the empty cell is (as BoardPosition).
Child states that result from these swaps (as State).

When it comes to storing such structures, the simplest option is the tried and proven Python list. Conveniently, the basic Python collection types (list, dict, tuple and set) can be used as types in cdef functions. The problem with the list, however, is that it leads to interaction with the Python runtime, and is accordingly slow. Therefore, in performance-critical situations, it is advisable to use C arrays instead. Allocating arrays of structs and iterating over them in every which way possible is the bread and butter of C programming, and it is not any more difficult in Cython. You can see how it is done with the array of BoardPosition structs in the State.children method. The situation is more complicated with the child states, however, as these are represented with the State extension type. Extension types are managed by the Python garbage collector and can be referred to only as pointers to Python objects. Storing them in arrays as C pointers is therefore rather complicated, involving casts and calls into Python's reference counting mechanism. For this reason, in order not to complicate the code too much, I opted for the simpler list type in the first iteration. Storing pointers to extension types in arrays will be discussed in more detail later.
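The Cython code itself needs a compile step, but Python's standard ctypes module offers a rough, runnable analogue of the idea. Below is a fixed-size, contiguous C array of structs plus an explicit count, as opposed to a list of Python objects. This is a sketch under that substitution, not the code in eight_cython.pyx. The helper swap_candidates and its neighbour logic are hypothetical.

```python
import ctypes

SIZE = 3  # rows and columns of the eight board, like DEF SIZE = 3


class BoardPosition(ctypes.Structure):
    # Same layout as the Cython declaration: two C ints, row and column
    _fields_ = [("row", ctypes.c_int), ("column", ctypes.c_int)]


def swap_candidates(zero_row, zero_column):
    """Hypothetical helper: collect the cells that can swap with the empty cell.

    Returns the array together with the number of filled entries,
    because a C array does not know its own length.
    """
    positions = (BoardPosition * 4)()  # room for at most four neighbours, zero-filled
    count = 0
    for d_row, d_column in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        row, column = zero_row + d_row, zero_column + d_column
        if 0 <= row < SIZE and 0 <= column < SIZE:
            positions[count].row = row  # writes straight into the array's memory
            positions[count].column = column
            count += 1
    return positions, count


positions, count = swap_candidates(0, 0)
for i in range(count):
    print(positions[i].row, positions[i].column)  # prints "1 0", then "0 1"
```

The array is allocated once with its maximum size, and the count tells the caller how many entries are valid. This is the same bookkeeping a C or Cython version needs.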

3. Using DEF to declare a constant

Cython allows C-style constants with the DEF directive. As with C, any values defined this way are substituted into the code at compile time. You have to be careful, though, as only values of the basic types int, long, float, bytes and unicode can be declared as constants. The number of rows and columns on the eight board is declared in eight_cython.pyx as a constant with DEF SIZE = 3. Cython also allows conditional compilation with the IF directive.

Profiling Cython

Comparing the performance of the C version (9 msec) with eight_cython.pyx (43 msec), we can see that there is still room for improvement before we reach C-level performance. This is also obvious in the annotation file, which is deep yellow in many parts. To find out which bottleneck to tackle next, however, the annotation file is not enough. It shows us only how much Python runtime interaction a line of code causes, and not how much that line contributes to the total runtime. Profiling is what we need, and fortunately, Cython makes this extremely easy. You only need to add the following compiler directive to the top of a pyx file to make Cython generate profiling data that can be processed by the standard Python profilers:

# cython: profile=True

Another difference from normal operation is that the code under profiling has to be imported and run as a module, and not started from the command line. This means that it has to be compiled as an extension module with the usual python setup.py build_ext --inplace. Now we are ready to profile our code by starting a Python shell and running the following:

import cProfile  
import eight_cython  
cProfile.run("eight_cython.main('start.txt')", sort='time')

And here is the output I have received:

     403088 function calls in 0.097 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    33453    0.018    0.000    0.022    0.000 eight_cython.pyx:59(add_or_get_child)
    44935    0.017    0.000    0.021    0.000 eight_cython.pyx:67(get_child)
        1    0.009    0.009    0.097    0.097 eight_cython.pyx:202(search)
     3717    0.008    0.000    0.012    0.000 eight_cython.pyx:163(children)
     6413    0.008    0.000    0.037    0.000 eight_cython.pyx:88(contains)
    44935    0.008    0.000    0.029    0.000 eight_cython.pyx:67(get_child (wrapper))
[..snip..]

The runtime increased a little due to the profiling overhead, but we now have ample information on which functions are causing the most execution overhead, namely the get_child and add_or_get_child methods of the TrieNode extension class. This is not surprising, as this code, used for checking whether a state has already been seen, is guaranteed to be executed for every evaluated state. On top of that, it uses the Python list type, leading to huge overhead in interaction with the runtime. This can also be seen in the annotation file in which the two TrieNode methods are a very deep shade of yellow. If we want to speed up our code, we need to tackle the way the board trie is built and used; we will now see how.
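When the listing gets long, the standard pstats module can sort and trim the collected data. The sketch below profiles a pure-Python stand-in rather than the compiled eight_cython module, so that it runs anywhere. The functions search and get_child here are hypothetical, named only to echo the listing above. With the real module, you would profile eight_cython.main('start.txt') instead.

```python
import cProfile
import io
import pstats


def get_child(children, value):
    # Hypothetical stand-in for a trie lookup: a linear scan over a Python list
    for child in children:
        if child[0] == value:
            return child
    return None


def search(n):
    # Hypothetical driver that calls get_child many times
    children = [(v, []) for v in range(9)]
    hits = 0
    for i in range(n):
        if get_child(children, i % 9) is not None:
            hits += 1
    return hits


profiler = cProfile.Profile()
profiler.enable()
result = search(10_000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("tottime").print_stats(5)  # only the five most expensive entries
print(out.getvalue())
```

Sorting by tottime gives the same "Ordered by: internal time" view as above. The integer passed to print_stats limits the report to the top entries.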

Replacing extension types with C structs

The difficulty we are facing in the optimization of the TrieNode class is that its methods need to return lists of TrieNode instances. As already mentioned, interaction with the Python list class requires a lot of runtime interaction, so the way to speed up list iteration is to get rid of it, and use C arrays instead. There are two ways of going about this. The first is turning TrieNode into a C struct, and storing pointers to instances of it in the array. This is the path we will take, and we will have a look at the alternative, using pointers to Python objects, later. The file eight_cython_improved.pyx contains the implementation where TrieNode is now a struct, defined as follows:

cdef struct TrieNode:  
    int value
    TrieNode **children

Code that used to be methods on the TrieNode cdef class is now packed into individual functions, all prefixed with trie, and accepting a pointer to a TrieNode as the first argument. This is a common pattern of organizing C code, somewhat similar to object oriented coding. If you compare this new trie code with the implementation in eight.c, you can see that the Cython version is pretty much a line-by-line translation, except for the semicolons, and with an extra cast of the malloc return value, which Cython needs for typechecking. Another C pattern that is used in both eight.c and this improved Cython version is marking the end of a list with a NULL value. The end of the list of children of a TrieNode is marked with NULL, a keyword in Cython that is synonymous with its meaning in C.
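To show the pattern without a compiler, here is a plain Python sketch of a record type plus free functions prefixed with trie, using a terminator-marked children array. It is illustrative only and not a translation of eight_cython_improved.pyx. The function names echo those in that file's profiling output, but their signatures and bodies are assumptions. None plays the role of NULL, boards are assumed to be flattened into nine values from 0 to 8, and a fixed-capacity list stands in for the malloc'd array.

```python
MAX_CHILDREN = 9  # assumption: board values 0..8, so at most nine distinct children


class TrieNode:
    """Plain record standing in for the C struct: a value and a children array."""

    __slots__ = ("value", "children")

    def __init__(self, value):
        self.value = value
        # Fixed-size array; the first None marks the end, like a NULL terminator.
        # The extra slot guarantees that a terminator is always present.
        self.children = [None] * (MAX_CHILDREN + 1)


def trie_get_child(node, value):
    # Walk the children until the terminator is reached.
    i = 0
    while node.children[i] is not None:
        if node.children[i].value == value:
            return node.children[i]
        i += 1
    return None


def trie_add_or_get_child(node, value):
    child = trie_get_child(node, value)
    if child is not None:
        return child
    i = 0
    while node.children[i] is not None:
        i += 1
    child = TrieNode(value)
    node.children[i] = child  # slot i + 1 is still None, so the array stays terminated
    return child


def trie_add_board(root, board):
    node = root
    for value in board:
        node = trie_add_or_get_child(node, value)


def trie_contains_board(root, board):
    node = root
    for value in board:
        node = trie_get_child(node, value)
        if node is None:
            return False
    return True


root = TrieNode(-1)  # hypothetical root holding a dummy value
trie_add_board(root, [1, 2, 3, 4, 5, 6, 7, 8, 0])
print(trie_contains_board(root, [1, 2, 3, 4, 5, 6, 7, 8, 0]))  # True
print(trie_contains_board(root, [1, 2, 3, 4, 5, 6, 7, 0, 8]))  # False
```

Every function takes the node it operates on as its first argument. This is what lets the same code map onto a C struct with no methods at all.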

This improved version benchmarks at 33 msec, shaving off 10 more msec from the previous version. Is there any other big win we can score, as with the trie optimization? Once more, the answer lies in profiling. Here is the result of profiling eight_cython_improved.pyx:

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.009    0.009    0.054    0.054 eight_cython_improved.pyx:206(search)
  3717    0.008    0.000    0.013    0.000 eight_cython_improved.pyx:167(children)
  6413    0.006    0.000    0.011    0.000 eight_cython_improved.pyx:93(trie_contains_board)
 44935    0.005    0.000    0.005    0.000 eight_cython_improved.pyx:77(trie_get_child)
 33453    0.005    0.000    0.009    0.000 eight_cython_improved.pyx:64(trie_add_or_get_child)
  3717    0.005    0.000    0.013    0.000 eight_cython_improved.pyx:86(trie_add_board)

There are two more bottlenecks that promise significant improvements: the search function, and the children method of State. These two are connected, however, in that the children method returns a list, and the search function iterates over its result. According to the annotation file, this iteration is the most intensive interaction with the Python runtime in search. We might achieve further speedup by doing the same kind of refactoring we did with the TrieNode structure, namely by turning State into a struct and its methods into functions that accept a State as an argument. The template is already available in eight.c, but instead of this easy fix, I wanted to give the alternative a try, namely using a C array to collect pointers to Python objects. The result is in the file eight_cython_improved.pyx. As you can see there, a ChildSet struct is necessary to return the number of children and the actual pointers to the children:

cdef struct ChildSet:  
    PyObject **children
    int count

PyObject is the C struct that Python uses to keep track of objects, and you can use it in Cython to refer to arbitrary objects. It is your responsibility, however, to manage reference counts when you create a reference to an object, as with the pointer stored in the child_set.children array on line 198. This is the reason for the call to Py_XINCREF on line 196. PyObject, Py_XINCREF and Py_XDECREF are all imported from the cpython.ref module with the following line at the top of the file:

from cpython.ref cimport PyObject, Py_XINCREF, Py_XDECREF

Py_XDECREF is later used to decrement the reference count, on line 233. Another thing done explicitly in a number of places is casting between objects and pointers. When we add the child State object to the array on line 198, we cast it to a PyObject *. Later, when this entity needs to be accessed as a State object again, it is cast back on line 231. Building and timing the version with object pointers, I get an average runtime of 32 msec. This is essentially the same as the version that keeps extension types in Python lists. Apparently, the cost of the casts and the reference counting cancels out the gains from avoiding the list, leading to no real improvement.
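The cast, incref, cast back, decref cycle can be reproduced from pure Python through ctypes.pythonapi, which exposes the CPython C API. This is a CPython-specific illustration of the bookkeeping, not the Cython code from the file. It relies on id() returning the object's address (a CPython implementation detail). A ctypes array of c_void_p plays the role of PyObject **children, and Py_IncRef/Py_DecRef are the function forms of Py_XINCREF/Py_XDECREF. The State class here is a hypothetical stand-in.

```python
import ctypes
import sys

# C signatures: void Py_IncRef(PyObject *o) and void Py_DecRef(PyObject *o)
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.py_object]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.py_object]
Py_DecRef.restype = None


class State:
    """Hypothetical stand-in for the State extension type."""

    def __init__(self, name):
        self.name = name


def make_child_set(states):
    # Raw addresses in a C array: the array itself owns no references.
    children = (ctypes.c_void_p * len(states))()
    for i, state in enumerate(states):
        Py_IncRef(state)         # take a reference, like Py_XINCREF
        children[i] = id(state)  # store the bare pointer, like <PyObject *>child
    return children, len(states)


def release_child_set(children, count):
    for i in range(count):
        state = ctypes.cast(children[i], ctypes.py_object).value  # cast back to an object
        Py_DecRef(state)  # drop the reference taken in make_child_set, like Py_XDECREF


child = State("child")
before = sys.getrefcount(child)
children, count = make_child_set([child])
print(sys.getrefcount(child) - before)  # 1: the reference owned through the array
release_child_set(children, count)
print(sys.getrefcount(child) - before)  # 0: balanced again
```

Forgetting the increment would leave a dangling address in the array once the last real reference disappears. Forgetting the decrement would leak the object. This is the same trade-off the Cython version has to manage by hand.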

Going Further

The material presented here covers only the basics of Cython. There are numerous other features, explained in detail in the official documentation and some other resources, that allow Cython to interact with other C or Python libraries, generate better optimized code, or let you tune the interaction with the Python runtime to your needs. Here are some resources that can help you get further in general Cython or in specific areas that interest you:

The Basic Tutorial in the Cython documentation is another good starting point, but it stops at a point where confusion tends to arise once you try to write useful code.
The Language Basics page is worth going through before embarking on any significant Cython project, as it touches on all Cython features, with links to further documentation.
Typed Memoryviews allow Cython code to interact with memory buffers of uniform data types, such as Numpy arrays or built-in Python array types. This feature is based on the buffer protocol, the C-level infrastructure that lays out the groundwork for shared data buffers in Python.
Cython also allows for easy and GIL-free parallelism using OpenMP with the cython.parallel package.
The book Cython - A Guide for Python Programmers is an in-depth discussion of Cython, with all the ins and outs and corner cases. I would highly recommend it if you are going to work extensively with Cython.
The Day of the EXE Is Upon Us is an excellent talk given by Brandon Rhodes at PyCon 2014. Among others, it touches upon the complicated distinctions between interpreted and compiled code (surprise: even x86 assembly is interpreted), why Python is slow, and why Cython is incredible.

A Tutorial Introduction to Kubernetes

Ulaş Türkmen — Thu, 05 Jul 2018 09:54:16 GMT

Kubernetes is the hottest kid on the block among container orchestration tools right now. I started writing this post when we decided to go with Kubernetes at Twyla a year ago, and since then, the developments in the ecosystem have been simply overwhelming. In my opinion, the attention Kubernetes gets is completely deserved, due to the following reasons:

It is a complete solution that is based on a fundamental set of ideas. These ideas are explained in the Borg, Omega and Kubernetes article that compares the consecutive orchestration solutions developed at Google, and the lessons learned.
While it is container-native, Kubernetes is not limited to a single container platform, and the container platform is extended with e.g. networking and storage features.
It offers an open and well-designed API, in addition to various patterns that suit differing workflows. The wonderful thing is that there is a very well-governed community process whereby the API is constantly developed further. You have to spend effort keeping up, but regularly receive goodies in return.

In this tutorial, I want to document my journey of learning Kubernetes, clear up some points that tripped me up as a beginner, and try to explain the most important concepts behind how it works. There is absolutely no claim of completeness; Kubernetes is way too big for a blog tutorial like this.

Starting off

The easiest way to start using Kubernetes is Minikube. If you have an account with a cloud provider, and would like to first figure out the details of running a cluster on their platform, this tutorial will still work for you, as the commands work for any recent version of Kubernetes. See here for details on how to get Minikube running on your computer. In order to manipulate the Kubernetes mini-cluster minikube runs, you need the official CLI client named kubectl, which can be installed following the instructions on this page. You will also need Docker to create and push container images. Install Docker on your computer following the instructions here.

Once you have installed everything, make sure they are all available with the following commands:

kubectl version
docker version
minikube version

You can check whether Minikube is running using the following command, which also tells you whether there is an update available:

minikube status

If minikube is not already running, you can start it with minikube start. Normally, when you install minikube, it automatically configures kubectl to access it. You can check whether this is the case with kubectl cluster-info. Its output should be something like the following:

Kubernetes master is running at https://192.168.99.100:8443

If the IP is not in the 192.168.*.* range, or kubectl complains that configuration is invalid or the cluster cannot be contacted, you need to run minikube update-context to have minikube fix your configuration for you.

How is kubectl configured?

I think it is a good idea to briefly mention how kubectl is configured. By default, the API endpoints and clusters kubectl accesses are defined in the ~/.kube/config file. The file that is read can be changed with the KUBECONFIG environment variable, which should specify a list of paths. So if kubectl displays weird behavior which you suspect might be due to the configuration, don't forget to check whether this environment variable is set. The kubectl configuration file is in the YAML format, like many other things in Kubernetes. It has two top-level keys that are of immediate relevance: contexts and clusters. The clusters list contains endpoint and certificate information for the different clusters to which the user has access. A context combines one such cluster with the user and namespace values for accessing it. One of these contexts is the currently active one. You can find out which by either looking at the config file, or running kubectl config current-context. You can also run the kubectl config view command to show the complete configuration, and limit the output to the current context with the --minify option.
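To make the structure concrete, here is a sketch of how the active context resolves to a cluster endpoint and namespace, roughly what kubectl config view --minify narrows the output down to. The configuration is written as a Python dict mirroring the YAML keys (clusters, contexts, users, current-context), so no YAML library is needed. The cluster, user and namespace names are made up, and the server address is the Minikube one shown above. The KUBECONFIG handling assumes the usual path-list convention: entries separated by the platform's path separator, a colon on Linux and macOS.

```python
import os

config = {
    "current-context": "minikube",
    "clusters": [
        {"name": "minikube", "cluster": {"server": "https://192.168.99.100:8443"}},
        {"name": "staging", "cluster": {"server": "https://staging.example.com"}},
    ],
    "contexts": [
        {"name": "minikube", "context": {"cluster": "minikube", "user": "minikube"}},
        {"name": "staging",
         "context": {"cluster": "staging", "user": "deployer", "namespace": "my-namespace"}},
    ],
    "users": [{"name": "minikube", "user": {}}, {"name": "deployer", "user": {}}],
}


def kubeconfig_paths(environ):
    """Files kubectl would read: the KUBECONFIG path list, else ~/.kube/config."""
    value = environ.get("KUBECONFIG")
    if value:
        return [path for path in value.split(os.pathsep) if path]
    return [os.path.join(os.path.expanduser("~"), ".kube", "config")]


def resolve_current_context(cfg):
    """Return (context name, server URL, namespace) for the active context."""
    name = cfg["current-context"]
    context = next(c["context"] for c in cfg["contexts"] if c["name"] == name)
    cluster = next(c["cluster"] for c in cfg["clusters"] if c["name"] == context["cluster"])
    # A context without an explicit namespace falls back to "default".
    return name, cluster["server"], context.get("namespace", "default")


print(resolve_current_context(config))
# ('minikube', 'https://192.168.99.100:8443', 'default')
```

Switching the current-context value is all it takes to point every subsequent kubectl command at a different cluster, user and namespace combination.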

Nodes and namespaces

Two basic concepts that are relatively straightforward and can be explained without a lot of context are nodes and namespaces. Nodes are the individual units of a Kubernetes cluster, be it a VM or a physical machine. What makes such a unit a node is the kubelet process running on it. This process is responsible for communicating with the Kubernetes master, and running the right containers in the right way. You can get a list of the nodes with kubectl get nodes. If you are using Minikube, and didn't do anything fancy with the configuration, there will be a single node. Nodes are not particularly interesting. You as a Kubernetes user will not be doing anything fancy with them, and cloud providers all have means of automatically or manually scaling the nodes in a Kubernetes cluster.

Namespaces provide a means to conceptually separate subclusters from each other. If you are running different application stacks on the same cluster, for example, you can organize the resources per app by putting them in the same namespace. A resource created without a namespace specified is created in the default namespace. It's not necessary to use namespaces, but they make certain things much easier by helping you avoid name clashes, limit resource allocation, or manage permissions. In case you start working with namespaces, and get annoyed by having to provide the --namespace switch to every command, here is a handy command that will set the default namespace for the current context:

kubectl config set-context $(kubectl config current-context) --namespace=my-namespace

Kubernetes dashboard

Kubernetes comes with a built-in dashboard in which you can click around and discover things. You can find out whether it is running by listing the system pods with the following command:

kubectl get pods -n kube-system

If there is an entry beginning with `kubernetes-dashboard`, it's running. In order to view the dashboard, first run the command kubectl proxy to proxy to the Kubernetes API. The Kubernetes API should now be available at http://localhost:8001, and the dashboard at this rather complicated URL. It used to be reachable at http://localhost:8001/ui, but this has been changed due to what I gather are security reasons.

Using a locally built image with Minikube

In the following tutorial, we will be deploying various container images in order to demonstrate Kubernetes features. Kubernetes uses Docker to retrieve and run container images, meaning that the usual rules of Docker's pull logic apply. That is, if an image is not available locally and only a name and a tag are provided, Docker pulls it from Docker Hub. If the image name includes a registry, that registry is contacted instead. The aim of this tutorial is to get you playing around with services running within a Kubernetes cluster as quickly as possible. Hence, the method I would recommend for making container images available to minikube is directing your Docker client to the daemon running inside minikube, instead of the local one. Configuring Docker to do so is straightforward with eval $(minikube docker-env). Now, any image that you build and tag will be available inside minikube. You can make sure that this is the case by running docker ps. If the output lists containers running images from gcr.io/google_containers, you are doing it right. This redirection to the Docker daemon in minikube is valid only in the current shell; you will be back to using the local Docker daemon when you switch to another shell.
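The pull logic above hinges on how Docker reads an image name. The sketch below is a simplified approximation of that normalization, written only to illustrate the rule. It ignores digests (@sha256:...) and other edge cases, and the function name is hypothetical. The first path component is treated as a registry host if it contains a dot or a colon, or is localhost. Otherwise the image comes from Docker Hub, where single-component names live under library/.

```python
DOCKER_HUB = "docker.io"


def parse_image_reference(reference):
    """Split an image reference into (registry, repository, tag).

    Simplified sketch: digests are not handled, and the tag defaults to "latest".
    """
    name, tag = reference, "latest"
    last_component = reference.rsplit("/", 1)[-1]
    if ":" in last_component:  # a colon in the last component separates the tag
        name, tag = reference.rsplit(":", 1)

    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag  # explicit registry, e.g. gcr.io/...
    if not sep:
        return DOCKER_HUB, "library/" + name, tag  # official images such as "python"
    return DOCKER_HUB, name, tag


print(parse_image_reference("kubetutorial/simple-python-app:v0.0.1"))
# ('docker.io', 'kubetutorial/simple-python-app', 'v0.0.1')
```

Because images like kubetutorial/simple-python-app have no registry host, Docker would try Docker Hub if they were missing locally. Building them directly against minikube's Docker daemon sidesteps that.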

If you are not interested in modifying and building the sample services yourself, you can also pull the sample images from my Docker.io profile. It should be enough to replace the kubetutorial prefix in the image tags with afroisalreadyin.

Running a service

Let's start off by running our first command to tell us whether there is anything running on the cluster. We will use the above-mentioned kubectl client to do so, running the command kubectl get pods. What pods are will be explained in a second. As long as the client is configured correctly, as explained above, you should see only the message No resources found. What kubectl did was to access the Kubernetes cluster running within minikube, as specified by the currently active context, and present the resulting information. kubectl is just one among many API clients. There are others, such as this Python client, which is the other officially supported one. You can view the API requests kubectl is making by increasing the verbosity of the logging with the --v=7 argument, but be careful, as this will lead to a lot of textual output.

Kubernetes will not figure out for itself what we need to run, so let's go ahead and tell it to run a very simple application, namely the simple Python application from the Kubernetes demos repository. In order to do so, you need to first clone the repo, navigate to the subfolder simple-python-app, and create a container image by running the following command:

docker build -t kubetutorial/simple-python-app:v0.0.1 .

Once the build finishes, you should be able to see the image in the output of docker images. After making sure this is the case, we are finally ready to run our first Kubernetes command, which is the following:

kubectl run simple-python-app \
     --image=kubetutorial/simple-python-app:v0.0.1 \
     --image-pull-policy=IfNotPresent \
     --port=8080

It should be obvious that this command somehow runs the container image that we just built, since the tag of the image is passed in with the --image argument. The --image-pull-policy=IfNotPresent argument tells Kubernetes to use an existing local image instead of attempting to pull it. We are also specifying port 8080 here as the port this deployment exposes. This has to be the same port the application is binding to; unless we provide this bit of information, Kubernetes has no way of knowing on which port to contact the application. Small side note: the demo service has to bind to this port on the general interface 0.0.0.0, and not on localhost or 127.0.0.1.
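For illustration, here is a minimal stand-in for such a service using only Python's standard library. It is not the code of simple-python-app. The handler, the helper names and the response text are assumptions; the text mirrors the response quoted later in this tutorial. The key detail is the address passed to the server. "0.0.0.0" listens on all interfaces, so traffic arriving on the pod's IP reaches the process. "127.0.0.1" would only accept connections from inside the pod itself.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from the simple python app"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    # Bind to all interfaces, not to localhost, so the pod IP is reachable.
    return http.server.HTTPServer(("0.0.0.0", port), HelloHandler)


def smoke_test():
    """Start the server on a free port, request it once over loopback, then stop it."""
    server = make_server(port=0)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        try:
            conn.request("GET", "/")
            return conn.getresponse().read()
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
```

Inside a container image, the entry point would simply call make_server().serve_forever(), matching the --port=8080 given to kubectl run.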

How do we reach into Kubernetes to contact our service? This is the perfect time to introduce the most important abstraction in Kubernetes: The Pod. As with the other abstractions, pods are resources on the Kubernetes API, and we can list and query them using kubectl. Let's see which pods are now running, with the same command that we ran earlier, kubectl get pods. The output should closely resemble the following:

NAME                               READY     STATUS    RESTARTS   AGE
simple-python-app-68543294-vhj7g   1/1       Running   0          21s

Great, we have a pod running. But what is a pod, actually? A pod is the fundamental application unit in Kubernetes: a collection of containers that belong together. These containers are deployed on the same node, their lifetimes are managed together, and they share operating system namespaces, volumes, and an IP address. They can contact each other on localhost and use OS-level IPC mechanisms such as shared memory. The decision of what to include in a pod hinges on what serves as a single unit across the dimensions of deployment, horizontal scaling, and replication. For example, it would not make sense to put the data store and the application containers of a service into the same pod, because these are scaled and replicated independently of each other. What does belong together with the application container is, for example, a container that hosts the log aggregation process.

Now that we know what a pod is, and can figure out the name of our single running pod, we can query it using the kubectl proxy feature we already used above. Once the proxy is running, you can access the simple-python-app container on the port we specified in the previous command. Query the special URL that Kubernetes makes available for this purpose (don't forget to change the name of the pod at the end of the URL):

curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/simple-python-app-68543294-vhj7g

We can also see the logs of our brand new pod with kubectl logs simple-python-app-68543294-vhj7g, which should show the stdout of our application. It is also possible to execute a command within the container, similar to the docker exec command, with kubectl exec -ti simple-python-app-68543294-vhj7g CMD. As with Docker, the -ti bit signals that a tty should be allocated and the command should run interactively. The kubectl exec command lets you pick the container in which to run the command with the -c switch. When the switch is omitted, kubectl uses the first container in the pod's definition, which is the only one in our single-container pod.

Who created the pod?

It's nice that Kubernetes is running our container inside a pod, but we would still like to know where the pod actually comes from. We didn't tell Kubernetes to create any pods. In fact, pods are rarely created manually in Kubernetes. If that were the case, Kubernetes would not be offering anything new; the user would still be responsible for orchestrating the individual application units, and ensuring their availability. What the above kubectl run command did was to create a Deployment. This can be seen by listing the deployments:

$ kubectl get deployments
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
simple-python-app   1         1         1            1           1s

Deployments are one of the special kinds of resources in the Kubernetes world, in that they are responsible for managing the lifetime of application containers. These kinds of resources are called controllers, and they are central to the Kubernetes puzzle. You can get more detailed info about the new deployment with kubectl describe deployments simple-python-app. The describe subcommand is a very useful tool for getting detailed information on all resources. It also lists related resources, and events that concern the described resource. For this deployment, you can see a couple of things in the output of kubectl describe. First of all, there is talk of something called a pod template. This is what is used to create the pods when the deployment is being scaled, i.e. new pods are being created to meet the target.

What happens when we delete the pod? In order to view what is happening in real time, I would advise you to open a second terminal, and run the command kubectl get pods -w in it. The -w (watch) switch keeps the command running and prints a new line whenever a pod changes. Now, delete the existing pod with kubectl delete pod simple-python-app-68543294-vhj7g. In the output of the pod listing terminal, you should temporarily see a state like the following:

NAME                                 READY     STATUS        RESTARTS   AGE
simple-python-app-5c9ccf7f5d-8lbb2   1/1       Running       0          4s
simple-python-app-5c9ccf7f5d-kl77s   1/1       Terminating   0          43s

So as one pod is being deleted, another one has already been created (the status might also be ContainerCreating instead of Running). The responsibility for this recreation lies with Replica Sets. You can see the replica sets that belong to a deployment using the above-mentioned kubectl describe command; they are listed at the bottom, before the events. You can see that there are two lists: OldReplicaSets and NewReplicaSets. The difference between the two will be explained later in the context of rollouts. You can also list the replica sets with the kubectl get replicasets command.

Looking at the replica set created by our deployment with kubectl describe replicaset $REPLICA_SET_NAME, we can see a number of relevant rows at a glance:

# ... snip
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       pod-template-hash=4035281104
                run=simple-python-app-2
  Containers:
   simple-python-app:
    Image:              kubetutorial/simple-python-app:v0.0.1
    Port:               8080/TCP
    Environment:        
    Mounts:             
  Volumes:              
Events:

This Replica Set is responsible for keeping one Pod with our simple-python-app container running, and it is doing that successfully, judging from the 1 current / 1 desired row. But as with pods, replica sets are intended to be created by Deployments, so you shouldn't have to create or manipulate them manually.
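The behaviour observed above, where a replacement pod appears as soon as one is deleted, follows the controller pattern. The controller compares the desired state with the observed state and acts on the difference. Here is a deliberately simplified sketch of one reconciliation step for a replica set. It is not Kubernetes code: pods are plain name strings, and the random five-character suffix only imitates the generated names seen in the listings. Real controllers watch the API server for changes instead of being handed a list.

```python
import random
import string


def new_pod_name(replica_set_name, rng):
    # Hypothetical naming scheme resembling e.g. simple-python-app-5c9ccf7f5d-8lbb2
    suffix = "".join(rng.choice(string.ascii_lowercase + string.digits) for _ in range(5))
    return f"{replica_set_name}-{suffix}"


def reconcile(replica_set_name, desired, running, rng=random):
    """One reconciliation step: return (pods_to_create, pods_to_delete)."""
    missing = desired - len(running)
    if missing > 0:
        return [new_pod_name(replica_set_name, rng) for _ in range(missing)], []
    if missing < 0:
        # Too many pods: remove the surplus (here simply the last ones in the list).
        return [], running[missing:]
    return [], []


replica_set = "simple-python-app-5c9ccf7f5d"
to_create, to_delete = reconcile(replica_set, desired=1, running=[])  # the pod was just deleted
print(to_create, to_delete)  # one freshly named pod to create, nothing to delete
```

Running such a step over and over is what keeps the "1 current / 1 desired" row true, no matter which pod disappears or why.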

Short excursion on networking

As nice and useful as replica sets are, they are not much help in terms of high availability. When a Pod goes down, another one is started, and it has a different name, a different IP address, and is possibly running on a completely different node. Also, what if we want to load balance these replicas? If Kubernetes were to offer service discovery only based on pod names, the clients of this service would need to do client-side load balancing. They would also have to keep an internal list of pods that is updated on every pod lifetime event. What about routing incoming traffic to services (ingress)? These are all pesky issues that need simplification. Kubernetes offers much easier mechanisms to achieve HA, load balancing and ingress. The basis for all this is the networking requirements Kubernetes imposes on the nodes and pods. These are the following:

All containers can communicate with all other containers without NAT (Network Address Translation).
All nodes can communicate with all containers (and vice-versa) without NAT.
The IP that a container sees itself as is the same IP that others see it as.

It is possible to use any one of various networking options that fit this model, with kubenet being the default. The above requirements sound relatively straightforward, and one would think that each application container gets its own IP. That is not the case, however, as it is not the application containers but the Pods that get the IP addresses. Or in the words of the documentation:

Until now this document has talked about containers. In reality, Kubernetes applies IP addresses at the Pod scope - containers within a Pod share their network namespaces - including their IP address.

You can also verify that pods can be reached by IP on the exposed port. First, get the private network IP address of the pod with kubectl get pods -o wide. Afterwards, log on to the Minikube node with the command minikube ssh. From within this node, you can query the service with curl $IP_ADDRESS:8080, which should return the response we have already seen.

How are pods that belong to the same replica set organized in order to provide high availability, load balancing and discovery? Answering this question requires introducing another Kubernetes concept.

Services

I have been calling the tiny web application we have been using for demo purposes a service, but service has a totally different meaning in the Kubernetes world. A Kubernetes Service is an abstraction that allows loose coupling of pods to enable load balancing, discovery and routing. Through services, pods can be replaced and rotated without impacting the availability of an application. Let's start with a very simple example where we turn our simple Python application into a Service, which can be achieved with the following very simple command:

kubectl expose deploy simple-python-app --port 8080

If you now run kubectl get services, you should see a list consisting of two entries: kubernetes and simple-python-app. The kubernetes service is a part of the infrastructure, and you shouldn't meddle with it. The other service is what we are looking for, especially the IP address, which is listed under the column CLUSTER-IP. We are interested in this IP address because it is something special. It's a virtual IP Kubernetes has reserved for the new service. In the same output, you can also see that the port 8080 is exposed. We can now log on to the minikube VM (which is a Kubernetes node) with minikube ssh, and query what is now truly a service with curl $IP_ADDRESS:8080, once more returning Hello from the simple python app. The network requirements mentioned above ensure the reachability of the service IP from the node.

Things get much more interesting when there are multiple pods in a replica set. In order to see the effect, let's use another service that provides more information in its response. This service is in the kubernetes-repository as env-printer-app. When its base path is requested, it returns a listing of its environment variables. Just like with the previous application, you can go ahead and build a container image with the following command:

docker build -t kube-tutorial/env-printer-app:v0.0.1 .

We will start the Deployment with a replica count of 3, which will cause Kubernetes to start 3 pods right away. To do so, use the following command:

kubectl run env-printer-app \
     --image=kube-tutorial/env-printer-app:v0.0.1 \
     --image-pull-policy=Never \
     --replicas=3 \
     --port=8080

Now let's create a Service by exposing this Deployment with the following command, which is a slight modification of the expose command we used earlier:

kubectl expose deploy env-printer-app --port 8080

A new service env-printer-app should pop up in the output of kubectl get services. Note the IP address for this service under CLUSTER-IP as $IP_ADDRESS, and log on to minikube via ssh again. Afterwards, run the following command a couple of times:

curl -s $IP_ADDRESS:8080 | grep HOSTNAME

This command sends a request to the service endpoint and filters the HOSTNAME environment variable out of the response. You should see the hostname change between the names of the different pods. Kubernetes distributes the requests among the replica pods for us, giving us load balancing out of the box.
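Where does this spread come from? As far as I know, kube-proxy in its iptables mode (the default at the time of writing) installs one rule per pod, where rule i out of n matches with probability 1/(n - i), and the last rule always matches. The following Python sketch emulates such a chain to show why it distributes connections evenly. It is a model of the idea, not kube-proxy's code; pick_endpoint, distribution and the pod names are made up for illustration. Note that kube-proxy balances connections, not individual HTTP requests; each curl call opens a new connection.

```python
import random
from collections import Counter


def pick_endpoint(endpoints, rng=random):
    """Emulate a chain of rules where rule i (of n) matches with probability 1/(n - i)."""
    n = len(endpoints)
    if n == 0:
        raise ValueError("the service has no ready endpoints")
    for i, endpoint in enumerate(endpoints):
        # The first rule fires with probability 1/n; if it doesn't, the next one
        # fires with 1/(n-1), and so on. The last rule has probability 1.
        if rng.random() < 1.0 / (n - i):
            return endpoint
    return endpoints[-1]  # unreachable: the last rule always matches


def distribution(endpoints, connections=30000, seed=0):
    """Count how many simulated connections each endpoint receives."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(endpoints, rng) for _ in range(connections))


# Hypothetical pod names standing in for the three env-printer-app replicas.
print(distribution(["env-printer-app-a", "env-printer-app-b", "env-printer-app-c"]))
```

Each pod ends up with roughly a third of the connections: the probability of reaching rule i and then matching it works out to exactly 1/n for every i.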

This very short demo of services leads to more questions than answers. How does the service know which pods to hit when a request comes in, for example? Why can we contact our service only from within the cluster? How can we enable external access to it? Before we can answer these questions, however, we need to have a look at a better way of specifying deployments, services and other resources.

Using the command line versus manifest files

Until now, we have been working with Kubernetes through kubectl on the command line. You can get quite far this way, as kubectl is pretty complete, but long command lines are hard to read, share with others and keep organized in a repository. A much better way of organizing Kubernetes resources, and one that follows the infrastructure-as-code mantra, is to use manifest files. These are YAML or JSON files (YAML is preferred) that describe, in a structured format, the resources to be created. A manifest file is a list of resources of different kinds, each with metadata and a spec. It is also common and recommended practice to specify the API version targeted by each entry. The entries must be separated by a triple-dash line (---), which marks the start of a new document in YAML. This separator is mandatory: if you leave it out, the entries run together into a single document and will not be processed as separate resources.
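To make the separator rule concrete, here is a small stdlib-only Python sketch that splits a manifest on --- lines and flags any document containing more than one top-level kind: key, which is what a forgotten separator typically looks like. It deliberately does not parse YAML (a real tool would use a YAML library), and split_documents and check_manifest are hypothetical helpers, not part of kubectl.

```python
def split_documents(stream):
    """Split a multi-document YAML stream on '---' separator lines."""
    docs, current = [], []
    for line in stream.splitlines():
        if line.strip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    # Drop empty documents, e.g. the one before a leading '---'.
    return [doc for doc in docs if doc.strip()]


def top_level_kinds(doc):
    """Values of unindented 'kind:' keys in a single document."""
    return [line.split(":", 1)[1].strip() for line in doc.splitlines()
            if line.startswith("kind:")]


def check_manifest(stream):
    """Return the kind of each document; raise if a separator seems to be missing."""
    kinds = []
    for index, doc in enumerate(split_documents(stream)):
        found = top_level_kinds(doc)
        if len(found) != 1:
            raise ValueError(
                f"document {index} has {len(found)} top-level kind keys; missing '---'?")
        kinds.append(found[0])
    return kinds
```

Running check_manifest on a file with a Deployment and a Service separated by --- returns ["Deployment", "Service"]; deleting the separator makes it raise.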

The resource specifications are documented in great detail in the Kubernetes API documentation. What's even better, however, is that the kubectl command is self-documenting. To get documentation on pods, you can use the kubectl explain pods command. This command will print, prefixed by a short description, the various fields a pod manifest can contain. In order to go deeper in this tree, you can run commands such as kubectl explain pod.metadata.labels, which will give more detailed information on individual fields.

If you look at the entry for Deployment in either the online or the command line documentation, you will see that the metadata field is the same across all resources, and that its name field is required. This field lets us refer to a resource in commands, for example to get detailed information on it or to delete it, and to cross-reference it from other manifest files. The spec field must adhere to the DeploymentSpec configuration, which has a template field describing the pods to be deployed. This template, in turn, has its own metadata field and a spec containing a list of containers. Following this specification, here is how to create the deployment for the env-printer-app from above in YAML format:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: env-printer-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: env-printer-app
  template:
    metadata:
      labels:
        app: env-printer-app
    spec:
      containers:
      - image: kube-tutorial/env-printer-app:v0.0.1
        imagePullPolicy: IfNotPresent
        name: env-printer-app
        ports:
        - containerPort: 8080

You can see a common pattern here: nested resources that carry metadata used to refer to each other, templates that tell Kubernetes which resources to create, and auxiliary settings such as the replicas field. You can now save this YAML as deploy.yaml in the kubernetes-repository/env-printer-app directory and create the deployment by running kubectl apply -f deploy.yaml. To create all resources defined in a directory, pass the directory path to kubectl apply -f instead.

You can also use kubectl get KIND NAME -o yaml to get a detailed description of a resource in YAML format. This document usually contains much more than what you supplied when creating the resource, since it also includes the default values for the fields you omitted and the values calculated or set by Kubernetes. Another great feature built on the YAML representation of resources (one of my favorites) is editing a resource with kubectl edit KIND NAME. This command fetches the resource description as YAML and opens it in the editor named by the KUBE_EDITOR environment variable, or by EDITOR if KUBE_EDITOR is not set. Once you save your changes and exit, the new description is applied to the resource. This is a great way to try things out quickly without having to keep multiple versions of resource definitions around.
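The editor lookup can be sketched in a few lines of Python. To the best of my knowledge, kubectl checks KUBE_EDITOR, then EDITOR, and otherwise falls back to vi (notepad on Windows); resolve_editor is a hypothetical helper that mimics this order, not kubectl's actual code.

```python
import os
import shlex
import sys


def resolve_editor(env=None, platform=None):
    """Pick an editor command in the order kubectl edit uses (approximately)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    for var in ("KUBE_EDITOR", "EDITOR"):
        value = env.get(var, "").strip()
        if value:
            # The variable may carry arguments, e.g. "code --wait".
            return shlex.split(value)
    return ["notepad"] if platform.startswith("win") else ["vi"]
```

This is why export KUBE_EDITOR=nano lets you use a different editor for Kubernetes resources than for everything else.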

Services, continued

Alright, where were we? We have a bunch of containers running in Pods, provisioned and kept alive through Deployments, and bundled into a Service that puts them behind a common IP. And we can put all of these into one or more YAML files to recreate them at will. This is a good point to explain one very interesting and versatile feature of Kubernetes: Selectors. If you get the details of the env-printer-app service we created above with kubectl describe service env-printer-app, you should see a row labeled Selector. This selector tells Kubernetes which pods it should collect behind the virtual IP of the service. If you didn't do anything funky in the meantime, the value of the selector should be run=env-printer-app. If you describe the deployment targeted by this service with kubectl describe deploy env-printer-app, you will see exactly the same selector line. Services and deployments use the same mechanism to match the pods they route to or control. Which pods are these? You can answer this by filtering pods by label, as in the following command:

kubectl get pods -l run=env-printer-app

Not surprisingly, these are the three pods created by the original deployment. This selector-based mechanism is used by many components in Kubernetes, and it is very versatile in that it allows custom labels. This opens up a whole lot of possibilities for different patterns, such as A/B deployments, rolling updates (which we will see later) and similar things.
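The equality-based matching behind a selector such as run=env-printer-app boils down to a subset check: a pod matches if it carries every key/value pair in the selector, and it may have other labels besides. The following Python sketch models this; the pod names and the extra pod-template-hash label are made up for illustration, and set-based selectors (in, notin, exists) are left out.

```python
def parse_selector(text):
    """Parse a comma-separated 'key=value' selector such as 'run=env-printer-app'."""
    pairs = (item.split("=", 1) for item in text.split(",") if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}


def matches(selector, labels):
    """Every key=value in the selector must be present in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods, selector_text):
    """Names of the pods (name -> labels) that the selector picks."""
    selector = parse_selector(selector_text)
    return [name for name, labels in pods.items() if matches(selector, labels)]


pods = {
    "env-printer-app-a": {"run": "env-printer-app", "pod-template-hash": "x"},
    "env-printer-app-b": {"run": "env-printer-app"},
    "simple-python-app-c": {"run": "simple-python-app"},
}
print(select(pods, "run=env-printer-app"))  # ['env-printer-app-a', 'env-printer-app-b']
```

Because extra labels never prevent a match, you can add labels such as a version or a track to pods without taking them out of a service, which is what makes patterns like A/B deployments possible.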

So what is happening is that a collection of pods, as picked by the spec.selector attribute, is exposed as a service on an IP. This is not the only way to expose a service, however: Services come in different types, depending on how the exposing happens. The default is the ClusterIP type, which is what we have now. The other types are NodePort, where a service is exposed on the same port on every node; LoadBalancer, which uses a platform-native load balancer to expose a service to the outside world; and ExternalName, which maps a service to an external DNS name, so that an external service can be addressed from inside the cluster as if it were an internal one.

These all have their use cases, but ClusterIP covers the most, so we will concentrate on it here. Having multiple pods behind a single IP solves many problems, since Kubernetes also takes care of load balancing (currently done by routing requests randomly; a new proxy mode will introduce more options) and of handling changes in the set of target pods. One thing it does not solve, however, is figuring out this IP in the first place. This is another point where Kubernetes shines: on the internet, names are matched to IP addresses using DNS, and Kubernetes builds on this common protocol by running an internal DNS service of its own. By default, a DNS A record pointing to the service IP is created for each ClusterIP service. Hence, we should be able to refer to our env-printer-app by exactly this name. To see that this is the case, run the following command to start a bash shell in a temporary container:

kubectl run my-shell --rm -ti --image cfmanteiga/alpine-bash-curl-jq bash

This command takes quite a few arguments, which need some explanation. The --rm switch tells kubectl to delete the deployment and the pod once the command exits, while -ti allocates a TTY and attaches your terminal to the stdin of the container process. The --image argument specifies a lightweight Alpine-based image with some debugging utilities, and the last argument is the command to run instead of the container's default one. In the shell that starts, you can now run curl http://env-printer-app:8080 and enjoy the list of environment variables delivered by the service.
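Why does the short name env-printer-app resolve inside the container? The pod's /etc/resolv.conf lists search domains, and the resolver tries the short name with each of them appended. The following Python sketch models this, assuming the default namespace, the usual cluster.local cluster domain and the ndots:5 option that Kubernetes normally writes. It is a simplified model of stub resolver behavior (musl, used by Alpine, differs in some details), and the function names are made up.

```python
def pod_search_domains(namespace="default", cluster_domain="cluster.local"):
    """Search list typically found in a pod's /etc/resolv.conf."""
    return [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]


def service_dns_names(service, namespace="default", cluster_domain="cluster.local"):
    """Names that reach a ClusterIP service, from shortest to fully qualified."""
    return [
        service,                                        # same namespace only
        f"{service}.{namespace}",                       # from any namespace
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified
    ]


def candidate_queries(name, search_domains, ndots=5):
    """Order in which a resolver tries names for a lookup (simplified)."""
    if name.endswith("."):
        return [name]  # absolute name: no search list
    expanded = [f"{name}.{domain}" for domain in search_domains]
    # Names with fewer than ndots dots are tried with the search list first.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


print(candidate_queries("env-printer-app", pod_search_domains())[0])
# env-printer-app.default.svc.cluster.local
```

The first candidate is the fully qualified service name, which is why curl finds the service without any configuration.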

Ingress

Our service is now humming in the cluster, accepting requests at http://env-printer-app:8080. To make it available to the outside world, we need to do one last thing: tell Kubernetes to route HTTP requests that arrive from outside the cluster for a certain host or path to this service. This is called Ingress, and Kubernetes offers a complete system to handle it. There are two things you need in order to route requests to the env-printer-app from the outside:

An Ingress controller, essentially a reverse proxy running within Kubernetes that is configured using Kubernetes-native resources. The two controllers maintained by the Kubernetes project are based on GCE and Nginx. To use the Nginx-based ingress controller on Minikube, enable the addon with minikube addons enable ingress.
Ingress specifications. These are resources just like Pods and Deployments, and contain information on how to map incoming requests to services, serving as configuration for the aforementioned ingress controller.

An Ingress specification for the env-printer-app is included in the sample project repo as ingress.yml. After enabling the minikube ingress addon, you can run kubectl apply -f ingress.yml to create an ingress that maps requests for http://env-printer to the env-printer-app service. To test the ingress, first find the IP of the minikube VM with minikube ip, and then edit /etc/hosts on your computer, adding the line $IP_ADDRESS env-printer, where $IP_ADDRESS is that IP. You should now be able to navigate to http://env-printer in your browser and see the output of the env-printer-app service.
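To illustrate what the ingress controller does with these specifications, here is a toy Python model of host- and path-based routing. It assumes a rule mapping host env-printer and path / to env-printer-app (my guess at what ingress.yml contains), plus a hypothetical second rule to show prefix matching. Prefixes are compared path element by path element and the longest match wins, as described in the Ingress documentation for the Prefix path type; real controllers such as Nginx handle many more cases.

```python
def _segments(path):
    return [part for part in path.split("/") if part]


def prefix_match(prefix, path):
    """Element-wise prefix match: '/api' matches '/api/v1' but not '/apix'."""
    wanted = _segments(prefix)
    return _segments(path)[:len(wanted)] == wanted


def route(rules, host, path):
    """Backend service for host+path, preferring the longest matching prefix."""
    best, best_len = None, -1
    for rule in rules:
        if rule["host"] != host or not prefix_match(rule["path"], path):
            continue
        length = len(_segments(rule["path"]))
        if length > best_len:
            best, best_len = rule["service"], length
    return best  # None: the controller's default backend would answer


rules = [
    {"host": "env-printer", "path": "/", "service": "env-printer-app"},
    {"host": "env-printer", "path": "/api", "service": "hypothetical-api"},  # illustration only
]
print(route(rules, "env-printer", "/"))  # env-printer-app
```

Routing on the Host header is also why the /etc/hosts entry matters: the browser has to send env-printer as the host name for the rule to apply.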

Rolling updates

Once you have a deployment managing a set of pods, there are a couple of things you can do with it to adapt to new conditions. The first of these is scaling the number of pods to meet the load. One way of achieving this is the kubectl scale command:

kubectl scale deploy env-printer-app --replicas=4

Alternatively, you can use the kubectl edit deploy env-printer-app command to bring up an editor and change the spec.replicas field to the required number. If you now run kubectl describe deploy env-printer-app, there should be a new scaling event in the Events section. When the number of replicas changes, Kubernetes simply creates new pods or terminates existing ones, without any further complications. The situation is different when the container spec of a deployment changes, however, for example when a new image is used. Kubernetes then replaces the pods progressively, according to the strategy specified by the user, to enable a smooth transition from one set of pods to the other. This is called a rolling update.

To demo rolling updates, I added another project to the sample Kubernetes services repository: the rollout-app. You can create the service by running kubectl apply -f deploy.yml --record in the app's directory, which creates the deployment, the service and the ingress. The reason for the --record switch will be explained in a couple of paragraphs. If you add an entry for rollout-app pointing to the minikube IP to your /etc/hosts file, you should be able to navigate to http://rollout-app and see a big display of the pod's hostname.

If you open rollout-app/application.py, you can see two peculiar things. One is the /healthz endpoint, which returns a simple OK message and nothing else, and the other is a time.sleep(5) before the app starts. The purpose of the /healthz endpoint becomes clearer if you also look at the deploy.yml in the same directory: this endpoint is registered as a readinessProbe on the deployment. The readiness probe is part of the pod lifecycle system of Kubernetes. Until the probe succeeds (an HTTP probe succeeds when it receives a status code of at least 200 and below 400), the new pod is not marked as "ready", and requests are not routed to it. Because of the 5-second sleep before the application starts, the pods of the rollout-app will not be ready for at least five seconds. Now let's look at how this delay interacts with the rolling updates feature of Kubernetes. Once you have deployed the application, change application.py in some minor way, such as adding a newline. Afterwards, build a new Docker image with a new tag by running docker build -t kubetutorial/rollout-app:v0.0.2 . (note the trailing dot). Then change the Docker image of the rollout-app deployment to the new version with the following command (again with the --record switch, which will be explained later):

kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record

Kubernetes gets to work right away, creating new pods and terminating the ones they are supposed to replace. You can see this by running kubectl get pods. One peculiar (or rather nice) thing is that Kubernetes does not simply take down all the running pods and start their replacements at the same time. Instead, it applies a rollout process in which new pods are created as old ones are taken down. You can follow this process with kubectl rollout status deploy rollout-app. This command will hang with a message like Waiting for rollout to finish: 2 of 3 updated replicas are available…, which means the deployment is in the middle of a rollout. We will see where these numbers come from later. A rollout is actually the process of moving from one replica set to another. You can see this by running kubectl get replicaset (or rs, to make the command shorter). You should see two replica sets whose names begin with rollout-app, one belonging to the old state and the other to the new state. The DESIRED, CURRENT and READY values of one should go down, while those of the other go up and approach the required values.
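The readiness gate is what keeps this rollout from rushing: a new pod is only counted once its /healthz endpoint answers successfully. I haven't reproduced the actual application.py here; the following is a stdlib-only Python stand-in with a /healthz endpoint and a startup delay, together with the success rule for HTTP probes (a status code from 200 up to, but not including, 400). The names serve, check_health and probe_succeeds are hypothetical.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

STARTUP_DELAY_SECONDS = 5  # mirrors the time.sleep(5) described for rollout-app


class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"OK"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def probe_succeeds(status):
    """An HTTP probe counts as successful for 200 <= status < 400."""
    return 200 <= status < 400


def serve(port=8080, startup_delay=STARTUP_DELAY_SECONDS):
    """Sleep, then serve in a background thread; until then, connections are refused."""
    time.sleep(startup_delay)
    server = http.server.HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def check_health(url, timeout=2.0):
    """One probe attempt: True if the endpoint answers with a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return probe_succeeds(resp.status)
    except urllib.error.HTTPError as err:
        return probe_succeeds(err.code)
    except (urllib.error.URLError, OSError):
        return False  # nothing listening yet, i.e. not ready
```

In the cluster, the kubelet plays the role of check_health, calling /healthz periodically as configured in the readinessProbe section of deploy.yml.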

One thing you can do is pause a rollout while it is in progress with kubectl rollout pause deploy rollout-app. This leaves the pod counts as they are when you run the command and gives you the chance to run checks and make sure everything is OK. Let's say that you start a rollout, pause it to run some checks, and discover that you made a mistake and would like to revert to the previous version to fix the issue. You can do this by undoing the rollout with kubectl rollout undo deploy rollout-app (a paused deployment has to be resumed with kubectl rollout resume deploy rollout-app before it can be rolled back). But what if you want to move back even further in the deployment history? This is where the --record switch comes into play. Thanks to this switch, we can see the commands that caused each rollout of this deployment, together with a revision number that we can use to refer to it. After you deploy version 0.0.2 of rollout-app, the output of kubectl rollout history deploy rollout-app should be similar to the following:

REVISION        CHANGE-CAUSE
1               kubectl apply --filename=deploy.yml --record=true
2               kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record=true

To switch back to revision 1, for example, use the following command:

kubectl rollout undo deploy rollout-app --to-revision=1

The rollout feature of Kubernetes is very well designed and feature-rich. Among other things, you can precisely control the number or percentage of pods that are replaced at a time, or set conditions on failing rollouts so that other tools can roll them back automatically.
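The numbers in the rollout status come from these limits. The default RollingUpdate strategy allows a surge of 25% extra pods and 25% unavailable pods; as far as I know, the surge is rounded up, the unavailable count is rounded down, and if both end up as 0, maxUnavailable is bumped to 1. For our 3 replicas, that means at most 4 pods in total and at least 3 available ones. The Python sketch below computes these bounds and walks through a simplified rollout. It assumes every new pod becomes ready immediately, while the real controller waits for the readiness probe, which is exactly what the 5-second sleep in rollout-app makes visible. The function names are made up for illustration.

```python
def resolve_bound(value, replicas, round_up):
    """Turn an int or a percentage string such as '25%' into a number of pods."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic keeps the rounding exact.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (surge, unavailable) as absolute pod counts."""
    surge = resolve_bound(max_surge, replicas, round_up=True)
    unavailable = resolve_bound(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1  # otherwise the rollout could never make progress
    return surge, unavailable


def simulate_rollout(replicas, max_surge="25%", max_unavailable="25%"):
    """(old, new) pod counts after each step, assuming new pods are ready instantly."""
    surge, unavailable = rollout_bounds(replicas, max_surge, max_unavailable)
    min_available = replicas - unavailable
    old, new, steps = replicas, 0, []
    while old > 0 or new < replicas:
        # Scale up the new replica set as far as the surge budget allows.
        room = replicas + surge - (old + new)
        new += max(0, min(room, replicas - new))
        # Scale down the old replica set while keeping enough pods available.
        old -= max(0, min(old, old + new - min_available))
        steps.append((old, new))
    return steps


print(rollout_bounds(3))    # (1, 0)
print(simulate_rollout(3))  # [(2, 1), (1, 2), (0, 3)]
```

With one surge pod and no unavailable pods allowed, the controller replaces pods one at a time, and every step has to wait until the new pod passes its readiness probe.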

Going further

Until now, I have been singing Kubernetes' praises, but unfortunately, not everything about it is perfect. We have run into a couple of issues building a Kubernetes cluster. Kubernetes is a relatively young project under heavy development, and keeping up with it is not a simple job. The development process is very well managed, but keeping up with the changes is nevertheless a full-time responsibility. This situation is mirrored on the provider side, as cloud vendors race to provide the best hosted Kubernetes solution possible, which also leads to considerable trial and error. Azure, for example, started off with a service called ACS, which was supposed to be a generic container management solution, but quickly recognized how popular Kubernetes was becoming and deprecated ACS in favor of AKS, which is directed solely at Kubernetes and has extra features such as redundant master nodes. Unfortunately, we are on ACS and need to make the move to AKS at some point.

Another thing to keep in mind when running Kubernetes is that it has significant platform-dependent parts, and these are not uniform in terms of correctness and reliability. A short time after moving to Kubernetes on Azure, we found out that a serious bug in Kubernetes on ACS made its storage mounting feature nearly unusable. Our solution is to rely as much as possible on Azure's managed offerings such as CosmosDB and managed PostgreSQL, but we will need to use local storage in a service at some point. Fortunately, the bug appears to be fixed in Kubernetes 1.10.

As Kubernetes grows in features and complexity, tools that build on it to simplify workloads and provide more integrated workflows have started popping up. Kubernetes was never meant to be the final application layer, so tools that build on it for specific developer workflows were to be expected, and they are already appearing. Helm looks like the most popular choice on this front, but there are alternatives such as OpenShift. So be prepared to learn another tool that runs on top of Kubernetes in the near future.

Bonus: Shell Helpers

There are a couple of actions you repeat over and over when working on a Kubernetes cluster. One of these is getting the name of a pod. As the pod name is derived from the name of the deployment, you end up running kubectl get pods and either grepping for the name or searching for it visually. For single-pod deployments, fetching the name of the pod is very easy with the following bash function:

function podname {
    # --no-headers keeps the NAME column header out of the result
    kubectl get pods --no-headers | grep "$1" | awk '{print $1}'
}

If you want the name of the simple-python-app pod, for example, all you need to run is podname simple. You can also pass the output of this function as an argument to other kubectl commands, e.g. to print the logs with kubectl logs $(podname simple).
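If you would rather script this in Python, the same lookup might look like the sketch below. Unlike grep, it only matches against the NAME column. pod_names and podname are hypothetical helpers, and podname assumes kubectl is on your PATH and configured for the right cluster.

```python
import subprocess


def pod_names(get_pods_output, fragment):
    """Pod names from `kubectl get pods` table output that contain fragment."""
    names = []
    for line in get_pods_output.splitlines()[1:]:  # skip the NAME READY STATUS header
        fields = line.split()
        if fields and fragment in fields[0]:
            names.append(fields[0])
    return names


def podname(fragment):
    """Run kubectl and return the matching pod names."""
    result = subprocess.run(["kubectl", "get", "pods"],
                            capture_output=True, text=True, check=True)
    return pod_names(result.stdout, fragment)
```

Returning a list rather than a single name makes it obvious when a fragment matches more than one pod, which the shell version silently passes on.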

Another handy snippet (written by my Bash Jedi Master friend Matthias Krull) is the following, which lets you switch between Kubernetes configurations like between Python virtual environments:

function kubeon {
    if [ "${1}" ]; then
        local config_file="${1}"
    else
        echo "Usage: kubeon CONFIG_FILE"
        return 1
    fi

    if [ ! -f "${1}" ]; then
        config_file="${HOME}/.kube/${1}"
    fi

    if [ ! -f "${config_file}" ]; then
        echo "No config file found. Tried ${1} and ${config_file}"
        return 1
    fi

    export KUBECONFIG="${config_file}"
    export KUBEON_PROMPT="${1}"
    export KUBE_MASTER=$(kubectl config view --minify | grep server: | cut -d/ -f3)

}

Using this function, you can make any of the configuration files in your ~/.kube directory the current configuration with kubeon filename. Among the variables it sets are KUBEON_PROMPT, which you can use in your PS1 to show the active Kubernetes configuration, and KUBE_MASTER, the host and port of the API server, which might come in handy if you want to SSH to it.
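The lookup logic of kubeon can also be expressed in Python, which makes it easy to test. A Python process cannot export variables into its parent shell, so this sketch only computes the values kubeon would export. The server URL is passed in rather than read from the config file (which would require a YAML parser), and all names here are hypothetical.

```python
import os
from urllib.parse import urlsplit


def resolve_kubeconfig(arg, home=None, exists=os.path.isfile):
    """Use arg if it is a file, otherwise look for it in ~/.kube."""
    home = os.path.expanduser("~") if home is None else home
    fallback = os.path.join(home, ".kube", arg)
    for path in (arg, fallback):
        if exists(path):
            return path
    raise FileNotFoundError(f"No config file found. Tried {arg} and {fallback}")


def master_host(server_url):
    """host:port of the API server, like `cut -d/ -f3` in the shell version."""
    return urlsplit(server_url).netloc


def kubeon_env(arg, server_url, home=None, exists=os.path.isfile):
    """The variables kubeon would export for the given config."""
    return {
        "KUBECONFIG": resolve_kubeconfig(arg, home, exists),  # the resolved path
        "KUBEON_PROMPT": arg,
        "KUBE_MASTER": master_host(server_url),
    }
```

The exists parameter is only there so the lookup order can be checked without creating files.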

KubeCon Impressions

Ulaş Türkmen — Tue, 08 May 2018 13:33:47 GMT

I had the chance to attend KubeCon Europe in Copenhagen last week, and it was a total blast. Attendance was huge, with developers from all over the world, and I had great conversations with many different people. There were countless talks at all levels, many of them (especially the keynotes) given by core committers of projects from the CNCF (Cloud Native Computing Foundation). In this post, I would like to gather my impressions of what I think were the main themes, and of some tendencies and future directions that I expect the CNCF, Kubernetes and the other projects to take. In case you wonder who I am, by the way, I'm the guy who walked around the whole conference with a Kramer hairdo, because his hair gel got confiscated at the airport.

The short name of the conference and its Twitter tag was KubeCon, but it was in fact an umbrella conference by the Cloud Native Computing Foundation. There will apparently now be one in Europe, one in the US and another in China each year. I think this is a great idea, because a great many people, myself included, either can't afford or don't want to go through the hassle of obtaining a visa. The future of Kubernetes was of course a major topic; Aparna Sinha gave a keynote on the state of Kubernetes, especially regarding how it is hosted on GKE. Most of her talk was about how enterprises are adopting Kubernetes, and what kind of developments they expect. Security was a huge topic, with enhancements to authorization, RBAC and pod permissions on the list. Google had just released a new project named gVisor, which brings sandboxed containers to Kubernetes in a very simple way (there was a separate talk on gVisor later). On the application front, better support for stateful applications in the form of application operators was mentioned, but I didn't quite get what was new about this. There is already the operator framework by CoreOS, and it sounded like Sinha was talking about exactly the same thing, with common features such as application lifecycle operations, backup, restore and monitoring. But maybe I missed something; do let me know in the comments if this is a new feature.

How enterprises are discovering Kubernetes (or, having discovered it, are getting more deeply involved with it), and how Kubernetes is developing in that direction, was a topic that came up frequently in talks and in chats with attendees. There was a very interesting presentation by two developers from a consultancy in China about a project they did for the central Chinese banking authority (the Visa and MasterCard of China, as one presenter put it). As one would expect from an organization of that size, they had to come up with a rather complicated setup for security and reliability: there were multiple checks on who could do what, and on what could be deployed by whom. Security is one of the things that early adopters may ignore but enterprises like these care a lot about, and as this talk showed, Kubernetes has made huge advances in this area.

All the big cloud vendors were at KubeCon, as one would expect, either advertising or revealing their hosted Kubernetes solutions. DigitalOcean announced a hosted Kubernetes solution on the second day of KubeCon; it is still in early access, but will be available soon. What all these hosted solutions had in common was the promise to handle the major pains of hosting Kubernetes, such as updates. While the big cloud vendors were targeting the difficulties of running a Kubernetes cluster, other vendors were advertising super easy ways of running an application in one. The presenters of one talk I attended demoed a service called hasura.io, where you could simply push code to a Git repo and have it deployed to a pod in a Kubernetes cluster. The description of the cluster is kept as YAML files in a repo, and these descriptions can be attached to a cluster using a CLI client. Once that is done, every git push deploys to the cluster.

Which brings me to what I think was another very obvious trend at this KubeCon: GitOps. Alexis Richardson mentioned it in his keynote, and as far as I could understand, he came up with the name. He also went into more depth on how to implement it on Istio in a separate talk, which I missed and watched later. One half of GitOps is methodologically the same as the "infrastructure as code" part of DevOps, in that the system is described in declarative terms and stored in a shared repository. What's new is a much tighter connection between new code in a Git repository and its availability in the cluster. The aforementioned hasura.io is a platform for achieving this connection. Weaveworks implemented their own internal version using operators, which were mentioned above. These operators listen to Git repositories, update services and deployments based on changes, and report the current state to observability tools. The originator of this push-is-deploy kind of flow is of course Heroku, which was mentioned every time the topic came up. It looks like GitOps will be the Kubernetes-based, more generic way of achieving the same workflow. The way I have explained it here is an oversimplification; I would advise you to have a look at the presentation. I also expect more tooling support to appear and be standardized in the near future.

Kubernetes offers a very straightforward pattern for integrating components. Components can be deployed as pods managed by Kubernetes itself, reading data from application pods and changing the cluster state based on specifications stored in the etcd data store. An interesting example of this pattern could be seen in a demo of Fluent Bit, where pods could be annotated according to the kind of log they output, and the output would be parsed accordingly. A core part of this integration pattern is Prometheus as the main source of observability. Pods expose their data in a format Prometheus understands, Prometheus collects it, and from there it mostly feeds Grafana for visualization and alarms. There is now also a slew of new applications that are, as per the name of the foundation, cloud-native and first-class citizens of Kubernetes. This means that they play well with the pod lifecycle elements of Kubernetes, are observable with Prometheus, and can easily form clusters in a container network. Another common feature is that they are relatively simple, just like Prometheus itself, and concentrate on doing one job well. This point was very prominent in a talk I attended on NATS, a message queue whose developers refrained from implementing many features that are standard in other message queues (message headers, complex routing logic etc.), opting instead for performance and reliability.
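For a feel of what "a format Prometheus understands" looks like: Prometheus scrapes a plain-text exposition format over HTTP, with # HELP and # TYPE lines followed by one sample per line. Here is a minimal Python renderer for a counter. The metric name and labels are made up for illustration, and in practice you would use an official client library rather than format the text yourself.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render (labels, value) pairs for one counter in the text exposition format."""
    help_text = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{escape_label_value(val)}"'
                             for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_counter("http_requests_total", "Total HTTP requests.",
                     [({"method": "GET", "path": "/healthz"}, 3)]), end="")
```

A pod that serves this text on an HTTP endpoint can be scraped by Prometheus without any further integration, which is a large part of why the pattern spreads so easily.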

These various components make life easier and enable continuous scaling and growth of the cluster. Their interaction in a living and changing cluster can get rather complex, however, and minor mismatches can lead to serious issues. This point was driven home in an excellent keynote by Oliver Beattie, the CTO of Monzo, an online bank. He explained an outage that took 1.5 hours to fix. The post mortem is a very good read, and shows how the interplay of various pieces of complex software can produce unexpected error cases. In this case, one of the root causes was an incompatibility between specific versions of Linkerd and Kubernetes, stemming from a change in how Kubernetes represented an empty service: from an empty list to null. On the one hand, this touches on a pet peeve of mine, namely preferring the empty value of a type (an empty list in this case) to null or None. On the other hand, and more generally speaking, this is an issue that I think will become more and more acute in the near future. As the number of components used in a cluster and the frequency of updates to them grow, so will the chance that some of those components interact in ways that cause issues.
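The null-versus-empty-list problem is easy to demonstrate. The following Python sketch uses a made-up JSON shape (not the actual Kubernetes or Linkerd schema): a consumer that assumes the field is always a list breaks as soon as the producer starts sending null, while one that normalizes a missing or null field to an empty list handles both encodings.

```python
import json


def endpoint_addresses_naive(payload):
    """Assumes 'endpoints' is always a list; fails on null."""
    return [ep["address"] for ep in json.loads(payload)["endpoints"]]


def endpoint_addresses(payload):
    """Treat a missing or null 'endpoints' field as an empty list."""
    data = json.loads(payload)
    return [ep["address"] for ep in (data.get("endpoints") or [])]


print(endpoint_addresses('{"endpoints": null}'))  # []
```

Calling endpoint_addresses_naive on the same payload raises a TypeError, because None is not iterable. Whether the producer or every consumer should take care of this is exactly the kind of contract that gets lost between component versions.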

The solution to this combinatorial explosion of components might be another practice, on which Sylvain Hellegouarch gave two talks: Chaos Engineering. The second of these talks went into more detail on what I think will become an increasingly accepted means of improving the understanding and reliability of complex clusters. Chaos engineering is "the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production". Sylvain explained the usage of the Chaos Toolkit, which can run pre-planned experiments in which load conditions are created, the cluster is "mutilated", and the reaction of the cluster is then checked against given criteria. The toolkit then creates a report, replete with graphs and detailed information on whether and how the cluster recovered. A couple of points stressed by Sylvain were that chaos engineering is not an effort to simply break a cluster, but to probe it with knowledge of what can actually go wrong, and that the probing is done with a certain aim, such as the cluster repairing itself or alarms going off. The concrete goals are to unearth weaknesses, which are definitely there, to know what to do in critical situations, and to instill more trust in the system by being prepared for difficult situations. Chaos engineering is a practice I definitely intend to introduce into our development team. I think it should replace simple load testing, where optimal working conditions are usually taken for granted. For such an in-depth test to deliver useful and relevant information, realistic load conditions need to be combined with changes to system and service pods and with failures in various places.

Not everything was fine and dandy at KubeCon, though. A major point of disagreement among presenters: how to pronounce "kubectl". To my horror, the majority pronounced it "kube-cuttle", which is just wrong. kubectl doesn't have anything to do with cuddling or cuttlefish; it's for controlling Kubernetes, ergo kube control. I guess I will have to wait until the next KubeCon to settle this point with a talk of my own.

One last note: I'm nearly done with an introductory Kubernetes tutorial, which should be published in a couple of days. Follow me on Twitter to be informed when it's online.

Big Software

Ulaş Türkmen — Tue, 11 Jul 2017 13:28:32 GMT

"The grateful moon has granted the city of Lalage a rarer privilege: to grow in lightness" - Italo Calvino

A number of software projects I had the pleasure of working on were what I later came to think of as big software. They had common qualities that led the development team to work in a certain way, perpetuating these characteristics in a cycle. These common qualities should be thought of as umbrella terms; not all big software systems have each and every one of them, and none of them are strictly required. In the following, I would like to describe these qualities, and how they are related to each other.

There is an undeniable attraction, or at least a maze-like quality, to big software. If one did not have to change it to satisfy clients, keeping the whole edifice running in the process, diving into big software could even be considered a fun and revealing exercise. Navigating the parallel paths, conflicts, bolted-on suburbs and dead ends, one could learn about the individual tendencies and social tensions that led to such a system. But this would be software archeology, fundamentally different from maintaining and extending it.

Changing big software is like writing, as Philip Roth describes it: "In most professions there's a beginning, a middle, and an end. With writing, it's always beginning again." Every change opens a new can of worms, and closing it is temporary. The conflicts and tears are discovered only when change is attempted, and the change introduces new ones itself, because it pulls the software in yet another direction, in yet another manner. Reconciling the various demands on the code is impossible, as such changes are perpetually delayed. In professional programming work, there is little more satisfying than refactoring big software with proper (frequently archeological) knowledge, and enjoying the simplifications and the removal of dead code that result.

All sophisticated software is unpredictable, becoming a complex system in the limit. Big software turns complexity into an art form. Any factor (code, operating environment, external systems) can have cascading effects on the behavior of the system. The only way to reliably find out what the system will do is to run it on the live production system with real data, and this not because production is particularly reliable, but because it's what matters for the users. Even then, the behavior will change from one moment to the next, because of subtle changes in the environment. Effects in big software are nonlocal and disproportionate. A developer can never be certain that what she thinks is the core location of a certain functionality is the only relevant place to look. Simple changes to the environment or code might cause ripple effects.

Delivering big software is a complicated process that depends on many other software components, online resources, and special conditions. The effort of delivering a change to users bears no correlation to the size of the change. Since the effects of even minor changes cannot be foreseen, complex test suites that take a long time to run exercise the whole software for every feature and regression. Many security checks, themselves complicated because of what they have to check, are built into the delivery mechanism, which makes the build long and fragile. These mechanisms cannot be dropped, however, because they are the last barrier against the application disintegrating on delivery, or at least they are perceived to be so. Even if the change to be deployed is tiny, it takes hours, if not days, to deliver, because the baseline for integration and deployment is big.

Once in operation, big software is difficult to observe. In order to navigate the immense complexity in operation, very detailed logs are emitted. Understanding and evaluating these logs becomes a domain of its own, with its own independent logic. There is a fallacy hiding here, similar to that of expecting badly written code to become more comprehensible through comments. It is the belief that a complex system can be understood through a large amount of logs. The same diffuse, uncoordinated approach is applied to error handling. In order to fight the reliability demons, code is written in a very defensive manner. Default values are used for missing data, errors are caught and handled in different ways in multiple frames. These practices serve to hide errors, in that they are not perceived unless brought up by the clients. In case one of these errors actually manages to surface, the core reason has to be assembled from a number of code locations. As per the Fundamental Failure Mode Theorem, complex systems usually operate in a failure mode; the reflection of this theorem in big software is that big software has ambiguous error conditions. Distinguishing correct functionality from incorrect is difficult.
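A small, hypothetical Python example shows how a default value hides an error that a strict check would surface at its source. The pricing rule, the `weight_kg` field and the order record are invented for illustration.

```python
from typing import Any, Dict


def shipping_cost_defensive(order: Dict[str, Any]) -> float:
    # Defensive style: a missing weight silently becomes 0.0, so the order is
    # charged only the base fee, and nobody notices until a client complains.
    weight = order.get("weight_kg", 0.0)
    return 5.0 + 2.0 * weight


def shipping_cost_strict(order: Dict[str, Any]) -> float:
    # Fail-fast style: missing data raises right where the assumption is made,
    # so the cause does not have to be pieced together from several places.
    if "weight_kg" not in order:
        raise ValueError(f"order {order.get('id')!r} has no weight")
    return 5.0 + 2.0 * order["weight_kg"]


broken_order = {"id": "A-17"}  # hypothetical record with a missing field
print(shipping_cost_defensive(broken_order))  # 5.0, plausible-looking and wrong
try:
    shipping_cost_strict(broken_order)
except ValueError as exc:
    print(exc)  # order 'A-17' has no weight
```

Multiply the defensive version by a few hundred call sites, each with its own default, and the result is the ambiguous error condition described above: the system keeps running, and nobody can say for sure whether it is working correctly.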

Due to the reasons listed, integrating new code with big software is a royal pain. This leads to a mindset of not undoing work, of letting things chug along as long as there is no urgent reason to rip things out. After a while, it becomes practically impossible to remove things. Thus, it is difficult to scale big up, but down is even more difficult: Big cannot scale down. Regarding resources or scope, big software will not accept any limits. This is also the fundamental source of big's fragility. As big grows, the impact necessary to cause a failure becomes smaller compared to its size. As it cannot scale down, however, the impact threshold does not go down, even when the system is doing less, in terms of load or functionality. That is, even when the system is used less, for fewer functions, it will keep on breaking as often, and need the same amount of maintenance.

A second-degree quality of big software is a result of the way big software systems working with each other store data. Big software usually interfaces with multiple other big systems. These systems share a lot of representational data, things that are supposed to correspond to a shared reality out in the world. These "facts" frequently diverge from each other, however, not because the facts change due to transmission or storage errors, but because differences in representation lead, over time, to differences in content. Take e-commerce, for example. There are no two e-commerce systems in the world that represent an order in even remotely similar ways. Some store consumer data in independent tables, tying these to orders, whereas others store all such data on the order itself. The product information is either line- or item-based, and the primary reference for a product can be one of many different formats. In order to account for the mismatches between the different storage formats, logic is applied to transform data when it crosses system boundaries. This logic is neither static nor lossless. It changes over time, and data transported from system X to system Y cannot be transported back into its original format. As data is shuffled from one big system to the other, their representations of reality become multi-faceted, rich, and correct and incorrect at the same time.
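A toy Python example shows why such transformations cannot simply be reversed. The two formats are hypothetical: system X stores first and last name separately and keeps item quantities plus a per-item gift flag, while system Y stores a single customer name and one line per unit, with no gift flag. Mapping X to Y works; the way back has to guess.

```python
def x_to_y(order_x: dict) -> dict:
    """Map a hypothetical system-X order into system Y's format."""
    return {
        "customer": f"{order_x['first_name']} {order_x['last_name']}",
        # Y has one line per unit and no place for X's gift flag, so it is lost.
        "lines": [item["sku"] for item in order_x["items"] for _ in range(item["qty"])],
    }


def y_to_x(order_y: dict) -> dict:
    """Best-effort reverse mapping: it has to guess wherever X had facts."""
    # Guess: everything before the first space is the first name.
    first, _, last = order_y["customer"].partition(" ")
    quantities = {}
    for sku in order_y["lines"]:
        quantities[sku] = quantities.get(sku, 0) + 1
    return {
        "first_name": first,
        "last_name": last,
        # Guess: nothing was a gift, since Y never knew.
        "items": [{"sku": sku, "qty": qty, "gift": False} for sku, qty in quantities.items()],
    }


original = {
    "first_name": "Anna Maria",
    "last_name": "Schmidt",
    "items": [{"sku": "B-1", "qty": 2, "gift": True}],
}
round_trip = y_to_x(x_to_y(original))
print(round_trip == original)  # False
print(round_trip["first_name"], "|", round_trip["last_name"])  # Anna | Maria Schmidt
```

Each function is reasonable on its own; the loss only becomes visible when data travels back and forth, and real systems add many more such mappings that change independently over time.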

Considering all the negative and weird things that accompany it, the surprising thing about big software is that there is a lot of it running and keeping customers moderately happy. It has also made a decent amount of money for some people. The inescapable conclusion is that big software still gets most of the job done, most of the time, and its clients are happy. For me, as a developer, the more relevant question is why we have to work with such systems. There is the fact that sometimes one simply has to work on big software. It might be legacy software that has to keep on running, maybe of one's own doing. It is not infrequent that a team outgrows its methods and tools, getting stuck in a system that served as a ladder used to climb to a deeper design understanding. There are also cases where a team (or more frequently in this case, individual developers) have no problem working on big software, despite recognizing the issues. There is a certain joy in working with big software, as alluded to in the beginning. It gives the programmer a sense of working with something big, complex, beyond the capabilities of others. The bug fixes and feature changes are as big as the software itself. Meeting this challenge provides its own satisfaction. What is forgotten in day-to-day efforts, however, is that it's not possible to grow in lightness in such big steps.

Arguments against JSON-driven development

Ulaş Türkmen — Tue, 23 Aug 2016 07:38:36 GMT

I don't have to explain how popular JSON is. There are very few projects that don't need to work with JSON, even when they are not related to network programming. The ubiquity of JSON is causing some developers to rely on it a bit too much, however. I'm witnessing this in the Python world, but would not be surprised to hear that it happens with other languages too. List, dictionary and primitive types have become the exclusive building blocks in many projects, to the detriment of code quality. These days, it's not unusual to see something like the following:

This function receives and returns dictionaries that have either primitives or lists as values. It builds a dictionary by checking for keys and iterating over values, which leads to item lookup and list iteration being strewn all over the place. This coding style (let's call it the JSON-driven style) has a number of serious disadvantages:

It completely defeats object orientation. The above code is C without pointers. It offers nothing of the abstraction powers of object orientation. With dictionaries, there is no encapsulation. Let's say that you want to change the way the cell labels are accessed. You would have to touch the above function, although it's not strictly its business. I know that it's now en vogue to sneer at OO, but done right, it can be very powerful, especially in big and complex codebases. Use dictionaries, and you throw that out the window.

It doesn't say what it's doing. This is mostly a result of the previous point regarding OO, but deserves its own discussion. The above code is filled with auxiliary logic that has nothing to do with what it actually tries to achieve. For example, the for loop matches books from the persistence layer to shops, but there is nothing there that even remotely signals that; you have to read the code in detail and build the idea yourself.

It doesn't use the excellent built-in object system. Python's object system (or the protocol) is beautifully designed, and very powerful. It has features like properties, dynamic attribute lookup with getattr, and all kinds of metaprogramming magic. Anyone who has worked with one of the Python ORMs such as SQLAlchemy or the Django ORM will know how much can be achieved with these relatively straightforward tools. The above code completely skips that machinery, and gives the developer only loops and key lookup as tools. The resulting code is accordingly primitive.

The infestation is difficult to control. Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them. When you start working and thinking with dictionaries, you also use them even when you don't have to, or when you shouldn't. This will also inhibit discovery of more interesting features of Python which might actually improve your code.

It's error-prone. Dictionaries and lists don't give you any guarantees about their contents. Each time you access something, either the exception case has to be checked or handled properly, or you have to live with various kinds of exceptions. No one in his right mind would do the first, which leaves the second. In the above code, for example, every key lookup could throw a KeyError. One could say that using objects is not much different, since accessing invalid attributes on an object also causes an exception, but the responsibility for setting and handling attributes is localized to the class in the case of objects, and you don't have to distribute it all over the codebase.

It's ugly as sin. One of the distinguishing features of Python as a language is that good Python code also looks good in an editor. It's not zigzagged, there aren't any large or deep indentation blocks, and there is a rhythm to the size of the different scopes such as functions and classes. When you use lists and dictionaries, however, you are bound to frequently check for membership and existence of keys, which makes achieving this aesthetic virtually impossible.

What to do

Here is what you should do: Take the popular advice relating to Unicode, and apply it to JSON. The fundamental advice on Unicode is to decode and encode at system boundaries. That is, you should never be working on non-Unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into the system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation. Work with the business logic objects, which give you all the OO niceties plus Python's object protocol. Once you are done, encode these objects into JSON again, and send them to wherever they are needed.
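A minimal sketch of this advice, using only the standard library's `json` and `dataclasses` modules. The `Book` record, its fields and the discount rule are hypothetical; a validation library could do the same job with less code, and the point is only where decoding, validation, behavior and encoding happen.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Book:
    isbn: str
    title: str
    price_cents: int

    def __post_init__(self) -> None:
        # Validation happens once, at instantiation, not at every key lookup.
        if not isinstance(self.isbn, str) or not self.isbn:
            raise ValueError("isbn must be a non-empty string")
        if not isinstance(self.price_cents, int) or self.price_cents < 0:
            raise ValueError("price_cents must be a non-negative integer")

    @classmethod
    def from_json(cls, raw: str) -> "Book":
        # Decode at the system boundary: missing or unknown keys raise TypeError,
        # invalid values raise ValueError, so bad data never gets further in.
        return cls(**json.loads(raw))

    def discounted(self, percent: int) -> "Book":
        # Behavior lives on the object instead of in dict-munging helpers;
        # replace() re-runs __post_init__, so a >100% discount is rejected too.
        return replace(self, price_cents=self.price_cents * (100 - percent) // 100)

    def to_json(self) -> str:
        # Encode again only when the object leaves the system.
        return json.dumps(asdict(self))


book = Book.from_json('{"isbn": "978-0", "title": "Invisible Cities", "price_cents": 1200}')
print(book.discounted(25).to_json())
# {"isbn": "978-0", "title": "Invisible Cities", "price_cents": 900}

try:
    Book.from_json('{"isbn": "978-0", "title": "Invisible Cities"}')
except TypeError as exc:
    print("rejected:", exc)
```

Everything between `from_json` and `to_json` works with a `Book` whose invariants already hold, so the business logic needs no membership checks at all.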

Update

I had a very interesting discussion with my colleague Mouad, and he pointed out two things. The first is the danger of creating anemic objects, i.e. objects with only data fields and no behavior, and then using these in functions such as the above instead of dictionaries. This of course defeats the purpose of having objects, since you are only delegating the dictionary business to the __dict__ attribute of the objects. Real business objects encapsulate their logic. The other topic is the performance aspect. To be perfectly honest, I didn't think about performance at all when writing this. I usually stick to the adage "Make it work, make it good, make it fast." However, if you are working within strict performance bounds, and don't want to get into any monkey business such as compiling C extensions, which might complicate deployment more than necessary, it might make sense to use dictionaries and lists in the critical places, since they are highly optimized.

Why I'm not a big fan of Scrum

Ulaş Türkmen — Mon, 11 Jul 2016 12:28:08 GMT

Scrum is now the default agile software development methodology. This management framework, which is "simple to understand but difficult to master", is used by 66% of all agile companies. After two extensive workshops, more than five years, and a couple hundred sprints working in Scrum, I have some points of criticism about it. I think it's not naturally conducive to good software, it requires too much planning effort on the part of the developers, and it inhibits real change and improvement. In the following, I will go into these points in more detail by organizing them around more concrete topics.

Before you go to the comments section to tell me that I have no idea what I'm talking about, please keep in mind a few things. First of all, this is not a rant against agile. I'm a big fan of agile, as it is explained in e.g. The New Methodology, and I believe that the potential of this concept has not been exhausted yet. Also, I'm not against every idea and practice in Scrum. For example, the principles of the whole team taking responsibility for the code base, or always having an integrated, working master are really awesome. Last but not least, the following points are directed against standard Scrum as described in the official guide. If you are doing something totally different but still calling it Scrum, this post is probably not so relevant for you.

One thing I would like to refrain from here is anecdotal evidence. My individual experiences, as far as they are not related to the proper Scrum entities, are not really relevant, since many of them are individual mistakes, and will therefore intentionally be left out from the following.

Obsession with points

The use of story points appears to be one of the defining features of Scrum. Each user story is given a certain count of story points, and a team is confronted with the number of points it has "achieved" at the end of a sprint. Some teams also follow their daily progress in terms of story points with a burndown chart that is consulted every day at stand-up. The points collected at the end of a sprint constitute the velocity of a team, and the team is expected to keep this velocity. During planning, the team takes on the stories it thinks it can finish by the end of the sprint, based on the velocity from previous sprints. The velocity of the teams serves to project an estimate of what can be achieved in the future, for the purpose of business planning.
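The arithmetic behind such a projection is simple, which is part of its appeal. A sketch, assuming velocity is defined as the average of the last three sprints (teams differ on the window, so this is an assumption) and using hypothetical sprint data:

```python
import math
from typing import Sequence


def velocity(completed_points: Sequence[int], window: int = 3) -> float:
    """Average points completed per sprint over the last `window` sprints."""
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


def sprints_needed(backlog_points: int, completed_points: Sequence[int]) -> int:
    """Naive projection: remaining points divided by recent velocity, rounded up."""
    return math.ceil(backlog_points / velocity(completed_points))


history = [21, 34, 18, 27, 30]  # hypothetical points completed in past sprints
print(velocity(history))  # 25.0
print(sprints_needed(120, history))  # 5
```

Note that the projection yields a single number and says nothing about how much the underlying sprints varied; the 18-point and the 34-point sprint disappear into the same average.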

There are many murky things about story points, but somehow the scrum masters and coaches will not abandon them. First of all, what are story points? Are they measures of the time it takes to complete a story? If yes, then why are they not in terms of time? Are they measures of complexity? If yes, why are we not talking about the complexity of stories, and how we can reduce it, instead of how we can achieve as many points as possible? That is, shouldn't we be aiming to do as few points as possible? The best measure I have heard is effort. You can work on a task for three hours at half effort, or work hard for an hour and finish it, which explains why it's not about time.

No matter how you define story points, the real issue with them doesn't go away. The main purpose of points is making planning more reliable, and providing a temporal perspective for business. They never fail to take on a life of their own, however, with teams working to gather points instead of delivering good software. I don't understand why points are special compared to the oft-mocked cases of bug hunts or lines of code written. If devs are measured on points, they will optimize on points. Has the code base improved? Did it become more modular, simpler, habitable (see the section Habitability and Piecemeal Growth in this book (pdf) by Richard P. Gabriel)? None of these questions is of relevance. The points have to be gathered. The spice has to flow. That's what counts.

One can of course counter that if you write stories for accomplishing these counter-examples, you would get points for them. But the point of stories is that they have acceptance criteria that can be tested for, and demo'ed at the end of a sprint (see the point on creating user value below). How can you demo that your code base has become more habitable? Will the acceptance criterion be that the code is "just, you know, nicer"? In practice, refactoring that aims to improve existing code is done as a part of new stories. Instead of simply adding more to the spaghetti code that exists, you try to "leave the grounds better than you found them", as per Pragmatic Programmer lore. This might well be true of simple refactoring where you move code, reorganize classes, or rename things, but the really complicated cases that require rethinking of the base abstractions cannot be covered with this simple recipe.

I definitely understand the need for making software development plannable for business purposes. If the business people cannot rely on some kind of an estimate as to how much can be achieved in a certain time frame, they are navigating in the dark, and the developers are in danger of losing their jobs. Programmers also occasionally dig deeper when they have already dug themselves into a hole, so it makes sense to set limits to stories. But there are, must be better ways to make reliable estimations of how much effort stories require.

Meeting extravaganza

Scrum meetings (aka rituals) have been among the most miserable hours of my life, and this is coming from someone with a number of visits to the Berlin foreigners' office. First of all, they are long. I don't care that the meetings take place every 2 weeks if I'm going to be sitting there for three hours. They have too many attendees. Most of the stuff presented is not relevant for most people, but everyone comes because there might be something relevant, and because they have to. The review meeting causes utterly unnecessary anxiety (Oh my god, will my feature work?). It's as if the whole work of the sprint will get evaluated then and there (which in some Scrum implementations actually is the case), and you either get the points or don't, no matter how much thought you put into a piece of work. The app is now faster? Who cares, I don't get the exact response that was expected, so no points for you. One implicit requirement of every story is thus "should be reliably demoable to a roomful of people", which requires much more work than you would imagine (think payments).

In the planning meeting, you get to discuss with others whether something is two points or five, and then actually list the things you are going to do. I presented my gripes with story points above, but in the context of planning meetings a few more sentences are in order. Why estimate stories that you are going to break down anyway? The breakdown will be a much more detailed analysis of stories, so doing that would provide a much more precise estimate. Another thing that outright astonishes me is how little attention is paid to whether estimations are correct. This is one area where teams can learn the most, because incorrect estimates point to misunderstanding of the code base and domain, but decent review of estimates is rarely, if ever, done. Tracking and estimate reviews would also enable Monte Carlo simulation of delivery dates, which sounds awesome, but is, again, rarely done.
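A minimal sketch of what such a simulation could look like, assuming the only tracked data is the points completed in past sprints (hypothetical numbers below) and that each future sprint behaves like a randomly drawn past one. Instead of a single projected date, it yields a distribution, from which one can read off, for example, the number of sprints needed with 85% confidence.

```python
import math
import random
from typing import List, Sequence


def simulate_sprints_needed(backlog_points: int, past_velocities: Sequence[int],
                            runs: int = 10_000, seed: int = 42) -> List[int]:
    """Per run, draw sprint velocities from history until the backlog is done."""
    if not past_velocities or min(past_velocities) <= 0:
        raise ValueError("need at least one positive past velocity")
    rng = random.Random(seed)  # seeded, so the forecast is reproducible
    outcomes = []
    for _ in range(runs):
        remaining, sprints = backlog_points, 0
        while remaining > 0:
            remaining -= rng.choice(past_velocities)
            sprints += 1
        outcomes.append(sprints)
    return sorted(outcomes)


def percentile(sorted_outcomes: List[int], p: float) -> int:
    """Smallest sprint count within which at least a fraction p of runs finished."""
    index = max(math.ceil(p * len(sorted_outcomes)) - 1, 0)
    return sorted_outcomes[index]


history = [21, 34, 18, 27, 30]  # hypothetical points completed in past sprints
outcomes = simulate_sprints_needed(120, history)
for p in (0.5, 0.85, 0.95):
    print(f"{p:.0%} chance of finishing within {percentile(outcomes, p)} sprints")
```

The quality of the forecast depends entirely on the recorded history, which is why the tracking and review of estimates has to come first.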

Next up is retrospective. Frequent feedback meetings (in which also the estimates are reviewed) are actually a great idea, because the best opportunity to learn from something is right after it happened, but in Scrum, the retro is explicitly supposed to be about the Scrum process itself, not about the codebase, the technology stack or development patterns. So you have this hour for the team, and you are supposed to use it to talk about Scrum itself. Blergh.

The daily standup deserves a blog post of its own. This religious ritual has become a staple of every team in the world. Ten minutes of staring into the void, talking about what you did while no one else listens, because they were in the middle of something five minutes ago and will go back to it in another five minutes, and waiting for everyone else to finish. I know this sounds cynical, but it is the end result of asking people to do it every freaking day. Nowadays devs are communicating on all kinds of channels (email, Slack, GitHub/GitLab, ticketing system) and tracking detailed progress on some of these. What's the point in having them stand around for another ten minutes to repeat a few standard sentences? The daily standup is in my opinion a manifestation of a significant but unspoken component of Scrum: Control. The main goal of Scrum is to minimize risk and make sure the developers do not deviate from the plan. I will come back to "Scrum controlmania" later.

One problem Scrum meetings share with all other meetings is that they are synchronous. For teams working remotely, this can become a serious issue, because you have to synch across continents, ending up with people attending meetings at 7 in the morning on one side of the world, and at 4 in the afternoon on the other. This might sound like a simple scheduling problem, but synchronicity is more than that: It means cutting into people's daily routines to force information exchange that could as well be handled otherwise. As argued here, the agile manifesto is complicit in this meeting obsession, due to its emphasis on face-to-face communication. What I have a hard time understanding is why the ancient, simple communication form of text takes a back seat. The truth of the matter is that, especially under the constraint of distributed teams, it's difficult to beat text. It is definitely true that writing well without offending others is not the simplest thing in the world, but why not educate the developers and stakeholders in this dark art? They will have to learn to communicate anyway, so you might as well target this asynchronous mode of communication supported by all tools out there. Text is the best means of communication, and a team that masters it will have a huge advantage. Scrum, however, does not build on text, but on meetings.

Sprint until your tongue is hanging out

Scrum is organized in units of sprints. A sprint is an iteration in which work is done, evaluated, and the process is adapted. The idea of the sprint is that the developers take on a certain amount of work, and do their best to finish it, as in, you know, they sprint. Nobody is allowed to change the acceptance criteria of the stories in the sprint, or add/remove stories. The sprint has its own backlog, which can be changed only in agreement with the team and the product owner. I find the idea that you should get somewhere by sprinting repeatedly rather weird. As any developer will tell you, software development is a marathon, not a series of sprints. But let's forget the semantic point for a moment, since it's a bit too obvious, and scrum proponents could claim it's just a convention that does not have to reflect the actual spirit.

But still, why the artificial two-week unit? In the above-mentioned guide, there is even talk of four weeks. Four weeks is a lot of time, and it is an ordinary occurrence that one or more stories become superfluous the way they were written, or other, more urgent things come into focus. If the aim is to be agile, why not accept this as the correct way to work in the first place? In my experience, two weeks is too long for review purposes, too: It's impossible to remember at the end of the sprint what bothered or satisfied you in the beginning. If you shorten it to one week, however, it feels like spending twice the time in the scrum rituals, although they might be shorter.

There is a more fundamental problem with the sprint idea, in my opinion. The reason software is so difficult to plan is that you discover new things about the problem at hand and your idea of a solution as you implement it. These discoveries affect not only the estimate, but also the actual path you are taking to the solution (as excellently described in this Quora answer). The immediate work items, which constitute the head of the backlog, are the most affected by these discoveries. So essentially, a sprint is working on a frozen set of items that are most prone to change within that time frame. This is also relevant for the point made above, of assuming too linear a trajectory for software development.

Oversimplification of development process

What's so difficult about software development? Write stories, put them on a board, split them into tasks, and then start processing them from the top to the bottom. Gather points by closing stories, pick a new story after closing one, and watch your burndown chart go down. There are a million complications with this approach, of course. How should the teams manage dependencies among each other's backlogs? Can I collaborate with someone, or make sure that I'm not stepping on someone else's toes? One of the most central questions of large-scale software development alluded to above is how to rearrange work in the face of new discoveries as you are actually working; how to rebuild the ship while you're sailing, so to say. This does happen in Scrum within the sprint, and the results of the sprint flow into the next planning session, but it is not foreseen, or even taken to be possible, that the development team can rearrange work while it is making progress.

The Scrum coach will find fifty ways of attacking each and every one of these topics, but all of them will be in the form of one more thing. One more meeting, one more document, one more backlog, one more item in the definition of done. The development process of a scrum team resembles one of those overly-pimped cars after a while: There are so many fancy bits and pieces that the actual car is not recognizable underneath anymore. The development process starts to resemble the oft-mocked enterprise development process, where devs are occupied with attending meetings and filling out documents more than anything else. Talking about the code the team is writing, and how to improve the codebase, might just be one of the meetings among others, if it exists at all.

Creating customer value

Every story in Scrum has to end in customer value. The acceptance criteria have to explicitly state what the customers will derive from the results of that story, in the well-known "As a …" format. The idea sounds great in its simplicity, but leads to some really convoluted results when taken to the extreme (which Scrum masters have consistently told me should be done). The most obvious thing is refactoring, already mentioned above. If neither the behavior nor performance change, why even bother with refactoring? And one thing I would be ready to bet my career on is, if you want to develop quality software, you should always be refactoring. As an engineer, I care about many things that will not lead to more sales, or the customer going "It got better" in the very short run. Making the platform more reliable, understandable, aesthetically pleasing is worth spending time on, but none of this is easily expressible as delivering customer value. For that matter, is writing a blog post delivering customer value? Will I get points for it? "As a customer, I want to read Ulaş's blog post" just doesn't sound right. What about contributing to open-source software? Reading the code of an important external dependency, such as the web framework your team uses, and working on bugs or feature requests to get a better understanding was not part of any Scrum backlog I've ever seen.

One more note on refactoring, since this is a favorite topic of mine. Why is it that Scrum coaches keep on saying "You should always be refactoring"? Because the assumption is that refactoring will be a few hours' work, or even shorter if it's renaming a class here and replacing a file there. These are only the most superficial cases of refactoring, however. The most difficult refactorings, incidentally also the ones that make the biggest difference, target balls of mud that need considerable effort and work to disentangle, and this is not happening "always". It is the ideal condition to be able to do mini-refactorings, and improve code little by little, but small steps bring you nowhere in the case of these hardened balls of mud. It's of course all well and good if you can somehow magically plan such a complicated refactoring and find a place for it in your backlog. If you can't, which is much more probable given that deep-reaching refactoring is difficult to foresee, good luck telling your product owner that you will be lost in the depths of your codebase for a while.

Scrum is not native to software

Any team that builds something can work with Scrum. This is often touted as a selling point, but it is an admission of a shortcoming, in my opinion. Claiming that Scrum is generic is admitting that it is not cut out for the specific nature of software development. What is the job of a software developer? Writing code? I don't think so. I think it's inventing and customizing machine-executable abstractions, and Scrum has no facilitating aspects specifically for this kind of work. Scrum does not tell you how to organize interdependent processes that mutate while they are in flux. It doesn't tell you how to match domains to common abstractions. It doesn't tell you how to distinguish important differences from superficial ones based on context.

Of course, one can claim that this is not the job of Scrum, which is a software management methodology, and not a software engineering methodology, that it's only concerned with organizing the teams' time and workload, and anything else is the business of an engineering methodology, such as XP. If that is the case, why the hell am I, the software engineer, doing most of the work – apart from the product owner, whose job description is doing Scrum anyway? Isn't it by definition the job of the managers, and not of the developers, to be practicing Scrum? Shouldn't I, as a developer, be spending that whole batch of time and energy on software engineering relevant things, instead of on demoing stories, discussing the ordering of stories, and debugging the process itself? Why are the developers practicing only Scrum, and not, let's say, XP with bits of Scrum thrown in?

Another sign of the software-distant nature of Scrum is how little talk there is of an agile codebase in Scrum organizations. It's a non sequitur to think that Scrum is agile, agile teams produce agile code, ergo Scrum teams produce agile code. Having and keeping an agile codebase is crucial to "being" agile, and is actually hard work that requires much more than only following Scrum. It is difficult to introduce processes to manage this work, however, because

Scrum makes claims that it is enough for design to "emerge", and
Where there is Scrum, people are reluctant to introduce even more rituals and documents.

In short: Does Scrum help you write good code? Does it help you achieve modularization, expressiveness, complexity reduction? The simplest answer I have is a clear no.

Scrum inhibits deep understanding and innovation

This is actually my biggest gripe about Scrum. As mentioned above, in Scrum, the gods of story points per sprint reign supreme. For anything that doesn't bring in points, you need to get the permission of the product owner or scrum master or someone who has a say over them. Refactoring, reading code, researching a topic in detail are all seen as "not working on actual story points, which is what you are paid to do". Specialization is frowned upon. Whatever technology you develop or introduce, you are not allowed to become an expert at it, because it is finishing a story that brings the points, not getting the gist of a technology or mastering an idea. These are all manifestations of the control mania of Scrum.

I recently read Innovation: The Missing Dimension (my review of the book), a book that focuses on an aspect of innovation that is invisible if you look at design only from a problem-solving perspective. An important part of solving a problem is finding the right problem to solve, and this cannot be treated as a problem itself. It rather requires a community (what the authors call an interpretive community) that can reformulate the given domain and create linguistic and technological tools that allow novelty. This idea is inherent to the original agile principles in the form of individuals and interactions taking precedence over processes and tools. Scrum, however, is much closer to the problem-solving approach, where analysis (breaking down a problem, and reassembling the solution) is the organizational tool. In order for an interpretive community to emerge, an organization needs ambiguity, open-ended conversations, and alternative perceptions. Scrum leaves all of this to something else, whatever that is. These are not the domain of Scrum, but where there is Scrum, there is very little time and energy left for anything else. What's more, the conditions necessary for the emergence of an interpretive community, and thus for innovation, are seen by Scrum as risks that have to be controlled and eliminated. You cannot Scrum innovation.

You might of course think that innovation is not necessary for you, or that it's overrated, and your company can survive without innovating. But keep in mind that the software industry is probably the most innovative one out there. There are new technologies every day, and the basic tools of software development go through revolutions every couple of years. If you're not innovating, someone else who does might knock on the doors of your customers at some point. Also, innovation, in the sense of reinventing, is what software developers love to do, and is a great incentive for keeping top talent.

Summary, Ideas for Alternatives

So, in summary, Scrum

wastes too much of the developers' time for management
does not lead to good quality code
is a control freak which does not leave room for new ideas and innovation.

Discussions of software methodologies are a bit like discussions of open-source software. The default answer to any substantial criticism is "What is your alternative?", which is pretty much the equivalent of "Why don't you submit a patch?". Unfortunately, software management lies at the intersection of many disciplines, and is a huge field in itself. My priorities as a developer lie elsewhere, namely in algorithms, programming languages, computer networks etc. I cannot squeeze 500-page tomes on software management into my already crammed bookshelves.

This won't stop me from making probably ill-advised and rather general proposals, or at least from clarifying my expectations as a dev. First, estimations and forecasting. I don't think there is anything wrong with estimating individual stories in terms of time, and deriving a general estimate of how long a project will take from this. The problem here is that the way stories are split and estimated is orthogonal to the way devs work. That is, the work I, as a dev, put into organizing the backlog is not helping me in processing the backlog. If it were possible to organize and study the backlog so that this process also helps the devs, they would do it much more eagerly and thoroughly. One way to achieve this might be putting work items through what I would call an algebra of complexity, i.e. an analysis of the sources of complexity in a work item and how they combine to create delays. The team could then study the backlog to locate the compositions that cause the most work and stress, and untangle these knots to improve the codebase. The backlog would then resemble a network of equations instead of a list of items, where solving one equation would simplify the others by replacing unknowns with more precise values.

The other proposal I would have is to get rid of the review, planning and stand-up meetings. Like, just stop doing them. There is no reason to make them so grand and rigid. You can replace most of the synchronous communication with textual communication, and create ad hoc meetings to discuss specific work items. Instead of having sprints that are marked by these meetings, one could simply point to the backlog as a measure of work done and pending. The retrospective, on the other hand, is the only meeting in which I saw magic happen in Scrum, but it has to happen more frequently, and concentrate more on the code base, as mentioned above.

To make it short, my dream workflow would combine offline working, continuous analysis of the sources of complexity and errors, and detailed, open-ended discussion on the path on which the team is approaching the goal (or not). The correct way of building software should align the understanding that devs have of the problem and the complexity involved with the aims of the other parts of the company.

PostgreSQL Vacuuming: An Introduction for Busy Devs

Ulaş Türkmen — Wed, 20 Apr 2016 12:13:56 GMT

If you have interacted with PostgreSQL at any point in your developer career, you have met it: The autovacuum daemon. It fires up every now and then, consumes resources, and disappears again, without telling you what it did, and why it ran in the first place. In this post, I would like to give an idea of what vacuuming is, what the autovacuum daemon does, and how you can become friends with it.

What is vacuuming?

The concept of vacuuming has to do with the way PostgreSQL implements certain RDBMS features. A modern RDBMS has to offer concurrency control for transactions. That is, different transactions have to be able to see different views of the data, depending on which statements they have already executed. This concept is called transaction isolation, and constitutes the I in ACID. Some rows might be edited by one transaction, changing certain fields, whereas others might be deleted in one transaction while they are still visible to others. Furthermore, each transaction can be rolled back, which undoes the changes it made. The management of data state is complicated by the fact that an RDBMS has to keep the storage of the table and any indexes on it intact while managing data visibility. It cannot just go ahead and modify data on the primary data structures; this would lead to an invalid state.

The solution implemented by PostgreSQL is called Multi-version Concurrency Control (MVCC). The basic idea is to mark rows according to the transactions which are affecting them, and manage visibility accordingly. Each transaction gets an ID from a simple 32-bit integer sequence. Rows are then marked with the IDs of the transactions that created them and, if applicable, deleted them. These marks are stored in the xmin and xmax columns, which are normally hidden, but visible if explicitly queried for. Using the sample tables from the previous post on PostgreSQL performance, we can insert some data, and then see the transaction IDs:

BEGIN TRANSACTION;
SELECT txid_current(); -- prints the current transaction id
INSERT INTO person (first_name, last_name) VALUES ('Hercule', 'Poirot');
COMMIT TRANSACTION;

On my computer, the SELECT txid_current(); statement prints out 156078. When we query the columns xmin and xmax, we can see the following values:

test=# SELECT xmin, xmax, first_name, last_name FROM person;
  xmin  | xmax | first_name | last_name
--------+------+------------+-----------
 156078 |    0 | Hercule    | Poirot
(1 row)

As you can see, the xmin column of the relevant row is set to the ID of the transaction in which it was inserted. xmin can be interpreted as the lowest transaction ID that can see this row. Any transactions that started beforehand, and thus have lower IDs, cannot see it. The meaning of the xmax column is the exact opposite: It is set to the ID of the transaction that deletes (or updates) the row, and any transactions that come after that one cannot see it. An xmax of 0, as above, means that the row has not been deleted. Essentially, for a transaction to see a row, the relationship xmin ≤ current_txid < xmax should hold, with a transaction seeing its own inserts and an xmax of 0 counting as infinitely large. This is a simplification; the real check also considers whether the transactions involved have committed. There are two more hidden columns (cmin and cmax) that record command IDs, so that visibility can also be decided between the individual statements within a single transaction, but the details are not relevant here. For details of the algorithm, have a look at these slides.
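To make the rule concrete, here is a deliberately simplified Python model of it. It only compares transaction IDs; the real check also consults snapshots and commit status, which this sketch leaves out. RowVersion and is_visible are illustrative names, not anything from PostgreSQL, and the model assumes a transaction can see its own inserts.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    xmin: int      # ID of the transaction that inserted this row version
    xmax: int = 0  # ID of the deleting/updating transaction; 0 = not deleted


def is_visible(row: RowVersion, current_txid: int) -> bool:
    """Simplified check: ignores snapshots and commit status entirely."""
    # The row must have been inserted by this or an earlier transaction...
    inserted = row.xmin <= current_txid
    # ...and must not have been deleted by this or an earlier transaction.
    not_deleted = row.xmax == 0 or current_txid < row.xmax
    return inserted and not_deleted


poirot = RowVersion(xmin=156078)
print(is_visible(poirot, 156077))  # False: lower ID, cannot see the insert
print(is_visible(poirot, 156080))  # True

# A DELETE in transaction 156090 only sets xmax; the row stays on disk.
poirot.xmax = 156090
print(is_visible(poirot, 156085))  # True: lower ID than the deleting transaction
print(is_visible(poirot, 156095))  # False
```

The last two lines are the reason vacuuming exists: the deleted row is still physically present, because older transactions may still need to see it.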

MVCC is not the only method for RDBMS concurrency control; other databases use other mechanisms, such as rollback segments in Oracle or MySQL. These are like blocks of work which, when a transaction fails, are undone on rollback or in the next read that refers to those blocks. The advantage of MVCC compared to other methods is that rolling back a transaction has minimal cost. There is no cleanup that has to be done when a rollback happens; the memory and processor load is the same as in the commit case.

Enter VACUUM

The disadvantage of MVCC is the topic of this post: The necessity of vacuuming. The primary purpose of vacuuming is garbage collection. Since PostgreSQL does not remove any rows from physical storage when they are updated or deleted, after some time (depending on the frequency of update and delete activity in the database), the database will be occupying a lot of essentially unused disk space. Garbage collection is not the only purpose of vacuuming, though. Two related tasks are maintaining the visibility map and preventing transaction ID wraparound. The visibility map is PostgreSQL's way of avoiding unnecessary trips to the heap, where the actual row data is stored. When a query finds rows in an index, PostgreSQL has to check whether these rows are visible, i.e. not already deleted and waiting to be vacuumed, by fetching the data from the heap. This I/O trip is avoided using the visibility map, which records which pages on the heap contain only visible rows. If a page is marked in this map, PostgreSQL does not have to visit it to ensure visibility. As a side note, the visibility map is the reason it took PostgreSQL longer to implement index-only scans, which are possible only since version 9.2: They required a visibility map that stays reliable even after a crash.
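The following toy model shows why the visibility map saves I/O during an index-only scan. It is a sketch under simplifying assumptions: the page numbers, the index_only_scan function and the heap_check callback are made up for illustration, and the real map is a compact bitmap stored alongside the table rather than a Python set.

```python
def index_only_scan(index_entries, all_visible_pages, heap_check):
    """Return the values found via the index, plus the number of heap visits.

    index_entries: list of (value, page_number) pairs taken from an index.
    all_visible_pages: heap pages marked all-visible in the visibility map.
    heap_check: callable(value, page) -> bool standing in for the expensive
        trip to the heap that decides whether a row is visible.
    """
    results = []
    heap_visits = 0
    for value, page in index_entries:
        if page in all_visible_pages:
            results.append(value)   # visibility guaranteed, no heap I/O
        else:
            heap_visits += 1        # must fetch the heap page to be sure
            if heap_check(value, page):
                results.append(value)
    return results, heap_visits


entries = [("Hercule", 0), ("Jane", 0), ("Marple", 1), ("Sherlock", 2)]
dead = {"Marple"}  # pretend this row was deleted but not yet vacuumed
rows, visits = index_only_scan(entries, {0, 2}, lambda v, p: v not in dead)
print(rows)    # ['Hercule', 'Jane', 'Sherlock']
print(visits)  # 1: only page 1 had to be read from the heap
```

Vacuuming is what marks pages as all-visible again, so a table that is rarely vacuumed gets fewer of the benefits of index-only scans.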

Transaction ID wraparound is the name given to the problem that, since these IDs are 32-bit integers, they cannot be greater than 2³² − 1. When a database has processed more transactions than that, the transaction ID overflows and starts again from the beginning. If no further action were taken, nearly all existing rows would suddenly become invisible, because their xmin values would be higher than the new, small transaction IDs, making them look as if they came from the future. The solution implemented by PostgreSQL is setting the xmin of rows with sufficiently old xmin values to a special value, FrozenTransactionId, which is always considered to be lower (ergo older) than any normal transaction ID. This happens as a part of vacuuming, so if you do not vacuum your database for a long time, there is a real possibility that old data suddenly becomes invisible.

Edit: As Peter pointed out in the comments, the transaction ID comparison is presented in a simplified manner here. The real comparison of IDs uses modulo-2³² arithmetic, so that the space of IDs wraps around. That is to say, for any normal ID x, there are about 2³¹ IDs that count as older than x, and just as many that count as newer. See the documentation for details.
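The modulo comparison from the edit note can be sketched in Python as follows. It is modeled on the TransactionIdPrecedes function in PostgreSQL's C source, reduced to the essentials; the constant names here are illustrative, and the sketch relies on PostgreSQL reserving IDs 0 to 2 for special purposes, with 2 being the frozen ID.

```python
FROZEN_XID = 2          # PostgreSQL's FrozenTransactionId
FIRST_NORMAL_XID = 3    # IDs 0-2 are reserved for special purposes


def xid_precedes(a: int, b: int) -> bool:
    """Return True if transaction ID a counts as older than b."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        # Special IDs compare as plain integers, so the frozen ID is
        # older than every normal transaction ID.
        return a < b
    # Subtract modulo 2**32 and read the result as a signed 32-bit integer.
    diff = (a - b) % 2**32
    if diff >= 2**31:
        diff -= 2**32
    return diff < 0


print(xid_precedes(100, 200))           # True
print(xid_precedes(2**32 - 10, 5))      # True: 5 comes after the wraparound
print(xid_precedes(FROZEN_XID, 5))      # True: frozen rows are always older
# A row more than 2**31 transactions old suddenly looks like it is from
# the future, which is why its xmin has to be frozen before that happens.
print(xid_precedes(3, 3 + 2**31 + 1))   # False
```

The last line is the wraparound problem in a nutshell: without freezing, a sufficiently old row flips from "past" to "future" and becomes invisible.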

Manual vacuuming is as simple as running VACUUM; in psql, or rather VACUUM VERBOSE; if you want to actually see what is happening. These commands also accept the name of a table as an optional argument. If this argument is omitted, VACUUM is executed on the whole database. Running only VACUUM is what one could call the first level of vacuuming; it takes care of deleted rows and updates the visibility map. Contrary to what the garbage collection analogy above might suggest, however, it does not return the storage space to the operating system (apart from completely empty pages at the end of a table, which it can truncate). Instead, it updates what's called the free space map (FSM) to mark the pages that have free space due to deleted or updated rows. The next time a new row has to be written, PostgreSQL can consult this map and use the free space in these pages, instead of requesting more storage space from the OS. If you want to reclaim all free space, you need to run VACUUM FULL;, which might be necessary if you e.g. manually delete a lot of rows. Full vacuuming reprocesses the table data and writes a brand new, compacted copy that consumes only the space it needs. However, think twice before you run it: It locks the tables it is processing, and will block both read and write queries.
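Here is a toy model of the difference between deleting rows and running a plain VACUUM that records their space in the free space map. The ToyHeap class and its byte accounting are invented for illustration (real pages also hold headers and item pointers, and the real FSM is a compact tree structure), but it shows the key behavior: dead rows are not reusable until vacuum records their space, and the table never shrinks.

```python
PAGE_SIZE = 8192  # PostgreSQL's default page (block) size in bytes


class ToyHeap:
    """Toy table: tracks free and dead bytes per page, nothing else."""

    def __init__(self):
        self.free = []  # reusable bytes per page, as recorded in the "FSM"
        self.dead = []  # bytes held by dead rows, not yet reusable

    def insert(self, row_size):
        # First try pages that the free space map says have enough room.
        for page, free in enumerate(self.free):
            if free >= row_size:
                self.free[page] -= row_size
                return page
        # Otherwise, extend the table file by one page.
        self.free.append(PAGE_SIZE - row_size)
        self.dead.append(0)
        return len(self.free) - 1

    def delete(self, page, row_size):
        # DELETE only marks the row dead; its space is not reusable yet.
        self.dead[page] += row_size

    def vacuum(self):
        # Plain VACUUM: move dead space into the free space map.
        # The number of pages (the file size) stays the same.
        for page, dead in enumerate(self.dead):
            self.free[page] += dead
            self.dead[page] = 0


heap = ToyHeap()
heap.insert(5000)         # page 0
heap.insert(5000)         # page 1: page 0 has only 3192 bytes left
heap.delete(0, 5000)
print(heap.insert(5000))  # 2: the dead row's space is not reusable yet
heap.vacuum()
print(heap.insert(5000))  # 0: vacuum made page 0's space reusable
print(len(heap.free))     # 3: the table never shrank
```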

Vacuum ≠ Analyze

As I mentioned in my previous post, PostgreSQL relies on statistics of column value distributions to generate efficient query plans. Updating these statistics is not the job of VACUUM, and requires a separate command, namely ANALYZE. You can run ANALYZE; either standalone in psql (or ANALYZE VERBOSE; for more output), or run both maintenance commands together with VACUUM ANALYZE;. As with VACUUM, you can pass [VACUUM] ANALYZE the name of a single table. Fun note: Both ANALYZE and ANALYSE work, so go ahead and spell it the British way if you are keen to do so.

The autovacuum daemon

In order to make the jobs of database users worldwide easier, PostgreSQL since 8.1 comes with a daemon that runs both VACUUM and ANALYZE at certain intervals: The famous autovacuum daemon. It runs as a separate daemon process, the presence of which you can check with a simple ps aux | grep autovacuum. If you don't have any running vacuum processes, you should only see a "launcher process", otherwise you might also see workers. The autovacuum daemon checks each database in regular intervals to see whether it needs vacuuming and/or analyzing. If the number of rows that were updated or deleted is above a certain threshold for a table, these processes are executed. The number of deleted and updated rows is read from the statistics views; we can see an approximation for the person table with the following query:

test=# SELECT n_tup_del, n_tup_upd FROM pg_stat_all_tables WHERE relname = 'person';
 n_tup_del | n_tup_upd
-----------+-----------
         0 |         0
(1 row)

The threshold is calculated according to the following formula:

autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * pg_class.reltuples)

The constants starting with autovacuum in the above formula can be queried in psql from the pg_settings view. The last value can be obtained with SELECT reltuples FROM pg_class WHERE relname='person';. Bringing these together, we can write the following query as an approximation of what the autovacuum daemon does to decide whether to vacuum a table:

SELECT
(pt.n_tup_del + pt.n_tup_upd) > pgs_threshold.setting::int + (pgs_scale.setting::float * pc.reltuples)
AS should_vacuum
FROM pg_class pc JOIN pg_stat_all_tables pt ON pc.relname = pt.relname
                 CROSS JOIN pg_settings pgs_threshold
                 CROSS JOIN pg_settings pgs_scale
WHERE pt.relname='person'
AND pgs_threshold.name = 'autovacuum_vacuum_threshold'
AND pgs_scale.name = 'autovacuum_vacuum_scale_factor';

You have to keep in mind that the counters in pg_stat_all_tables are accumulated since the last statistics reset (the time of which is recorded in pg_stat_database.stats_reset), so the query above is only a rough approximation. The documentation describes the threshold in terms of tuples made obsolete since the last vacuum, which pg_stat_all_tables exposes as the n_dead_tup column; only counting those makes sense, since otherwise, in the limit, the autovacuum daemon would have to vacuum every table in every run. The autovacuum daemon does a similar calculation to decide whether to run ANALYZE; details can be found in the PostgreSQL documentation.
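The decision rule above can also be written as a small Python function, which is handy for playing with the numbers. This is a sketch: the function names are made up, dead_tuples stands for the number of updated or deleted rows, and the default arguments are PostgreSQL's documented defaults for autovacuum_vacuum_threshold (50) and autovacuum_vacuum_scale_factor (0.2), which your installation may override.

```python
def autovacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Dead tuples above which autovacuum would pick the table up."""
    return base_threshold + scale_factor * reltuples


def should_vacuum(dead_tuples, reltuples, base_threshold=50, scale_factor=0.2):
    """Mirror of the SQL approximation above (dead = updated + deleted)."""
    return dead_tuples > autovacuum_threshold(reltuples, base_threshold,
                                              scale_factor)


# For a table with one million rows, the default threshold is 200,050.
print(should_vacuum(dead_tuples=150_000, reltuples=1_000_000))  # False
print(should_vacuum(dead_tuples=250_000, reltuples=1_000_000))  # True
```

Because the scale factor multiplies the table size, the number of dead rows a table accumulates before being vacuumed grows linearly with the table.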

Improving Vacuuming

A frequent issue with the autovacuum daemon is that it gets to work at unexpected times of the day, maybe in the middle of a high load period, and causes deteriorated performance. Another symptom of an improper vacuuming regime is queries that are executed with suboptimal query plans. The primary reason for this is outdated table statistics, which can be corrected by the ANALYZE statements that run as a part of vacuuming. As you can see above, it's difficult to imitate the behavior of autovacuum. The ideal solution would be to find out whether the statistics are out of date, but that's difficult to determine, and not even autovacuum does that:

The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.

You are also strongly advised never to turn off autovacuum, because of the risks involved. Even if you vacuum manually on a regular basis, there might be unexpected bouts of high activity that affect many rows. Also, autovacuum will have little work to do if it runs after a manual vacuum, so it makes sense to just leave it running. The most sensible thing to do is to adjust the settings so that large and active tables are vacuumed more frequently. Here is a query to find out which tables have the most rows:

SELECT reltuples,relname FROM pg_class WHERE relkind='r' ORDER BY reltuples DESC;

You can either schedule a cron job to vacuum the largest tables regularly if you have periods of low load, or, as in our case, if the load on your application is continuous, you can adjust the parameters for these tables so that they are vacuumed more frequently. The vacuum parameters can be set separately for individual tables with statements like the following:

ALTER TABLE person SET (autovacuum_vacuum_scale_factor = 0.0);
ALTER TABLE person SET (autovacuum_vacuum_threshold = 4000);

As per the formula above, these settings would cause autovacuum to vacuum the person table after roughly every 4000 row updates or deletes, no matter how many rows are already in the table. For a large table, this means more frequent vacuuming, with each individual vacuum run having less work to do, which leads to better performance. The settings for individual tables can be queried from the pg_class table as follows:
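To see what the per-table settings actually change, the sketch below evaluates the threshold formula for a few hypothetical table sizes, once with the default settings (50 and 0.2) and once with the values from the ALTER TABLE statements above.

```python
def threshold(reltuples, base, scale):
    # Same formula as above: base + scale * reltuples
    return base + scale * reltuples


for rows in (10_000, 1_000_000, 10_000_000):
    default = threshold(rows, base=50, scale=0.2)   # PostgreSQL defaults
    tuned = threshold(rows, base=4000, scale=0.0)   # the ALTER TABLE above
    print(f"{rows:>10,} rows: default {default:>11,.0f}, tuned {tuned:>5,.0f}")
```

Note that for a small table, a fixed threshold of 4000 actually means less frequent vacuuming than the default; the override only pays off for tables with more than about 20,000 rows, which is why it makes sense to apply it only to the largest tables.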

SELECT relname, reloptions FROM pg_class WHERE relname='person';

A simple test using pg_dump and pg_restore has revealed that settings changed with the ALTER statement above are also preserved in the dump and restore process, so you don't have to run it for every new instance of your database if you're reading in dumps.

Lessons from Legacy

Ulaş Türkmen — Wed, 30 Mar 2016 14:58:07 GMT

I have spent the last two years of my working life on an e-commerce application that is mainly occupied with coordinating inventory items, orders and shipments. The main user interface of this application is a REST API tied to an Angular frontend. The same API is also used by various middleware applications that sync with other e-commerce applications. Since our company has moved on to a new product, but we still have a customer using this legacy system, I had to take on the duty of keeping it running. As I dug deeper into it to make improvements, a couple of decisions we made at the time stood out to me as suboptimal in hindsight. It is one thing to read about general software design principles in the abstract, and another to see them demonstrated on a live, growing system in which you have a stake. Here is my attempt at making my observations as concrete as possible.

Don't make a straitjacket out of constraints

When it comes to securing the integrity of your data, PostgreSQL can do wonders, with such features as constraints, triggers, enumerations, and normalization, the sine qua non of relational databases. Add SQLAlchemy to the mixture, and anything is possible. For example, with a feature called composite secondary joins, SQLAlchemy allows you to present a join across multiple tables as a field on a model, making complicated normalization schemes possible. On the Python end, you are accessing a simple entity attribute, while in the background PostgreSQL is jumping over foreign keys to make sure that, for example, the total available quantity for a certain product is the sum of all inventories for that product in different storage cells. This is all well and good as long as you are aware of the trade-offs. If you rely on the database too much, like we did, there are a number of dangers lurking around the corner. The first is performance. Accessing an instance attribute is a cheap operation in Python, and one does not think twice about it while coding. If there is a complicated join behind such an access, however, you might be creating a time sink that gets deeper with the size of the database. The ugly truth about slow database queries is that they rarely get caught in development or testing, because the database has little data and few connections. Production is where database performance problems get diagnosed, and avoiding the need for that diagnosis in the first place is not a bad idea.

More impactful than performance, however, was how difficult it became to change things at the database level because of our excessive use of constraints. This is a textbook case of tight coupling: We propagated constraints from the application into the database, where they didn't really belong. Generally speaking, leaving room for action on the database gives you the freedom to circumvent otherwise complicated issues by modifying data. When you pin everything down with constraints, enumerations etc., you are giving this freedom away. For example, we used enumerations to limit the kinds of connectors (our name for the async jobs that were queued for execution). Since we already had these enumerations, we also used them to denote the sources of orders. I recently found out that this dependency made removing connectors pretty much impossible. I could purge the code for an unused connector, but its name in the enumeration had to stay, because there were orders created by this connector, and their source field contained the name of the connector. Another affected area was deployment. Due to our excessive use of triggers, running migrations required dropping and then recreating all triggers. This made deployment longer and more error-prone, something I will touch upon later.

As general principles, I would venture to extract the following:

Anything that changes regularly at the code level should be secured only at that level, and not in the database.
Anything that requires locking complete tables when modifying is unacceptable as a consistency feature.

Make sure that securing your data does not turn into an exercise in rigidity. DB constraints are OK when they are used sparsely, but you should always be conscious of the trade-offs you are making.

Have a strategy for growing business logic

Just in case you haven't recognized it yet: Organizing code is difficult. A decent portion of a developer's time goes into figuring out where a certain line of code goes. One serious mistake that we made was distributing business logic around the code base, instead of centralizing it. The logic for routing orders to consumers, for example, was in the module that contained the queued jobs, whereas the code that was responsible for changing the state of orders was in the API module. Among the consequences of disorganized business logic were the following:

We frequently had a really hard time deciding on the correct location for changing or adding functionality. Conversely, finding the code responsible for a feature was never straightforward.
Tests for the various units of code manipulating the same DB model were complicated. Since the logic for state changes and processes was spread all over the place, the DB state had to be created for each test. If we had abstracted away the business logic into code that didn't need database objects, but could work with plain Python instances, testing would have been easier, faster and less error-prone.
Mentally tracking a process was an exercise in compiling code in your mind. A module was never really complete without inserting code from some other module in your mind's eye. Actually, I would sometimes temporarily copy-paste code to make understanding things easier.

As soon as you start building business logic, you should have a clear idea of how you want to keep it coherent, united and testable. These orientation points should also make your code open to discovery and modification.
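As one possible strategy (not what we did at the time), here is a minimal sketch of centralized business logic for order states. The states, the ALLOWED_TRANSITIONS table and the transition function are hypothetical; the point is that the rules live in one module and operate on plain Python objects, so they can be tested without a database, while the API and the queued jobs would only call into this module and persist the result.

```python
from dataclasses import dataclass, replace

# Hypothetical order states; the real application's states are not listed here.
ALLOWED_TRANSITIONS = {
    "new": {"routed", "cancelled"},
    "routed": {"shipped", "cancelled"},
    "shipped": set(),
    "cancelled": set(),
}


class InvalidTransition(Exception):
    """Raised when an order is asked to move to a state it cannot reach."""


@dataclass(frozen=True)
class Order:
    order_id: int
    state: str = "new"


def transition(order: Order, new_state: str) -> Order:
    """Return a copy of the order in new_state, enforcing the rules above."""
    if new_state not in ALLOWED_TRANSITIONS.get(order.state, set()):
        raise InvalidTransition(f"{order.state} -> {new_state}")
    return replace(order, state=new_state)


# Plain objects, no database: this is all a unit test needs.
order = transition(Order(order_id=1), "routed")
order = transition(order, "shipped")
print(order.state)  # shipped
try:
    transition(order, "cancelled")
except InvalidTransition as exc:
    print("rejected:", exc)  # rejected: shipped -> cancelled
```

Keeping the transition table as data also answers the "where does this go?" question: a new rule is a new entry in one dictionary, not a new branch somewhere in the API or job code.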

How you bring code to users is as important as how you build it

Writing the code is half the work of creating a feature. The rest is actually bringing it to the user. This involves testing, merging into the main branch, and deploying to production. There were a lot of things we got right here; we were using a very simple integration approach with pull requests merged into master, and integrated testing. There were also some things we got wrong. The first of these was slow testing. This is something most of the industry is getting wrong, in my opinion. We as developers have gotten used to test suites that run for tens of minutes, and accept this as a necessary evil of large code bases. Our test suite was also rather long-running; some test runs took more than half an hour to complete. The result was distraction, prolonged deployment times, the occasional skipped test run, and overall developer dissatisfaction. A long-running test suite is also frequently broken for reasons unrelated to the code, reasons more related to infrastructure or annoying race conditions. Such failures have led to the biggest frustrations of my life as a developer. As I'm trying to concentrate on a difficult issue, a one-liner commit seemingly breaks the test run. Such things sometimes cost me a whole day, and they usually turned out to be caused by unrelated issues, such as a change in the minor version of a CLI program. Through such breakage and long waiting times, the test suite can no longer be used as a tool that gives feedback on correctness. It becomes a nuisance, a bureaucratic process that has to be completed to deploy code. If I had a say for a new project, I would put a hard limit of 10 minutes on test runs, and work very hard to keep this limit.

The next step in delivering the code is deploying it to production. Again, we weren't doing badly in this area, but it could have been better. Deploying code did not take long, but it was prone to breaking. The fixes required better devops skills, which not enough people possessed at the time. I know that this is easier said than done, but I have to join the chorus: Having a dead simple and robust deployment process should be the highest priority of a team that runs web applications to earn their daily bread.

Be very very careful with end-to-end tests

Keeping with the testing theme, one thing we really regretted was going overboard with end-to-end (e2e) tests. These were implemented using the test runner for Angular, and fired up Chrome to go through many frontend features. The Angular frontend was run against an actual backend. The e2e tests appeared to be a real wonder weapon at the beginning. Look, you can actually run the whole application! Can you get a better guarantee that everything is OK and you can deploy? With this thinking, we wrote e2e tests for each corner case of a given functionality. Once we accumulated a decent number of such tests, however, they turned out to be rather problematic. We faced two principal difficulties in this context. First, one had to set up the exact environment through the database so that the frontend could find what it was looking for. Second, navigating through the interface was painful even with very precise setup, because one had to find the exact selector for the elements (buttons, inputs etc.) to be manipulated. These selectors also broke very easily when the HTML structure changed. Adding one new list to a page meant having to fix the selectors in all the tests for that page.

I know that frontend testing is a difficult area, and won't pretend that I have the solution to it. There was one lesson that surfaced in our review discussions, however: It would have been a better idea to abstract away some of the JS and unit test it without the DOM or the backend application. The browser has many complexities that are difficult to control in a completely automated environment, and it is best to avoid these when you can.

Don't leave transparency to your tools

We had to do our fair share of firefighting, which went like this: Customer calls us, tells us that they are getting weird responses from our API, or maybe no responses at all, which puts us into scrambling mode. We go to the logs for details, but our application is not telling much, so we have to check the logs from the web server, message queue and the database. From the information spit out by them, we need to piece together what the issue is, and deploy a fix after figuring it out. The fact that we had to consult the infrastructure to find out the exact problem was the reason that our customer was calling us in the first place, instead of us getting alerted to the issue.

While discussing this topic with our current CTO, who has a lot of experience with large systems, she remarked that if you leave the job of reporting on your platform to the infrastructure components such as database or message queue, the developers will go to them to understand system behavior. This will make debugging and performance-monitoring an indirect process, where you have to interpret the data coming from these components to draw conclusions on what is happening in your own code. The more sensible thing to do is build in proper logging and monitoring from the beginning to make sure your system is transparent, i.e. it should be telling you what's going on during normal and faulty operation.
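A minimal sketch of what building transparency into the application can look like, using Python's standard logging module: each (hypothetical) connector run reports its start, end, duration and failures itself, instead of leaving that to the web server or database logs. The run_connector function and the field names are made up for illustration; the extra fields only show up in the output if your log formatter or log shipper includes them, as the format string in the usage block does.

```python
import logging
import time

logger = logging.getLogger("connectors")


def run_connector(name, job):
    """Run a hypothetical connector job and log its outcome and duration."""
    start = time.monotonic()
    logger.info("connector started", extra={"connector": name})
    try:
        result = job()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "connector failed",
            extra={"connector": name,
                   "duration_s": round(time.monotonic() - start, 3)},
        )
        raise
    logger.info(
        "connector finished",
        extra={"connector": name,
               "duration_s": round(time.monotonic() - start, 3)},
    )
    return result


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(connector)s %(message)s",
    )
    run_connector("shop_sync", lambda: "ok")
```

Once such records flow into a monitoring system, an alert on "connector failed" can reach the team before the customer's phone call does.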

Focus on your core technology

Every large application relies on at least one infrastructure component as its core. Depending on the kind of application, this core can be different. Data-driven web applications nearly always have an RDBMS at their center, which was also the case with our app; in our case, it was PostgreSQL. PostgreSQL is an incredible piece of technology. It is robust, feature-complete, fast, and somehow still evolving. Despite relying on it so heavily, I have to admit that our knowledge of PostgreSQL was still relatively limited, in the sense that we were not using many important features properly. We had proper replication in place, and were using triggers and constraints as explained above, but our use of indexing was really limited, for example. I discovered this when I took the time to examine the index usage of our database (which I documented in another blog post), and created a number of indexes that improved the most frequent queries. The result was our customer representative telling me that the app was so fast that he thought something was wrong. Indexing is obviously not a specialty of PostgreSQL; it is a feature of RDBMSs in general. It is therefore database theory that we should have tried to understand a bit better, and you learn it much faster when you are working on a data-heavy application that should have been performing much better yesterday.

Other core technologies such as message queues or key-value stores all have their tricky corners which, when understood properly, can help you navigate difficult situations, and earn yourself some time when you desperately need it. Most important, though, is understanding the theory common to all such systems, and the common algorithms that exist to navigate the constraints. You should take the time to read the documentation a bit, and inform yourself about not only the specifics but also the general theory.

Nothing beats incrementally improved systems

To finish on a positive note, I need to mention that our application has been improving continuously, becoming more robust and performant as we keep working on it incrementally, attacking one issue at a time. Continuous attention and incremental improvement beat rewriting a large codebase, provided that it is not utter spaghetti code or written without a sound understanding of best practices and of the programming model supported by the programming language(s) used. This was one of the biggest advantages of our team: all developers were experienced and diligent students of Python and/or JS, the two languages our application was built with. Every developer strove to write clear and idiomatic Python and JS, and there was constant exchange on how to do things the right way. The end result is a codebase that can still evolve, even with a single developer left working on it.

What PostgreSQL Tells You About Its Performance

Ulaş Türkmen — Mon, 29 Feb 2016 10:47:19 GMT

Recently at work I was tasked with improving our legacy application. It has been neglected for a while, and takes its revenge by causing frequent firefighting and overall crappy performance. The application is tightly coupled with a PostgreSQL database, and many things that are normally not the job of a database (such as keeping version history) are delegated to this single PostgreSQL instance. The result is a feedback loop where the database is under immense load for even the simplest things, causing frequent deadlocks and extremely long queries, which leads to decreased performance and long request times, which leads to even more load. To put an end to this spiral of endless firefighting, and to improve my knowledge of Postgres, I decided to spend some time with the legacy application. The first step was analyzing the database performance, to find out whether there was anything that would give us the biggest advantage for comparatively small effort.

Generally speaking, the following are the factors that we need to focus on to judge how well a database cluster is performing:

Index usage: The most important algorithmic fundamental of a relational database is the B-tree index. If the right indexes are missing, the database will do sequential scans across frequently used tables (linear in table size) instead of index lookups (logarithmic in table size).
IO: PostgreSQL does its best not to read data from disk, either delaying reading as much as possible, or using the cache. Whether reading disk can be avoided depends mostly on cache configuration.
Concurrent connections: Many parallel connections consume a lot of memory and CPU. You should make sure that your database is not plagued with more connections than it can handle.
Deadlocks: These nasty buggers are the biggest killers in terms of performance, because they lead to long queries, blocked connections, and expensive transaction rollbacks. If you have a lot of deadlocks, your locking queries need a review.
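To make the difference between linear and logarithmic concrete, here is a toy Python sketch with made-up function names. A binary search over a sorted list stands in for the B-tree descent; a real B-tree has a much higher fan-out and lives in disk pages, so only the growth rate carries over, not the exact counts.

```python
def linear_scan(rows, target):
    """Check every row in turn, like a sequential scan; return (found, comparisons)."""
    comparisons = 0
    for value in rows:
        comparisons += 1
        if value == target:
            return True, comparisons
    return False, comparisons


def indexed_lookup(sorted_keys, target):
    """Binary search over sorted keys, standing in for a B-tree descent."""
    comparisons = 0
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid      # target is at mid or in the lower half
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return found, comparisons


if __name__ == "__main__":
    keys = list(range(1_000_000))
    print(linear_scan(keys, 999_999))     # (True, 1000000)
    print(indexed_lookup(keys, 999_999))
```

On a million keys the scan needs up to a million comparisons to find the last row, while the binary search never needs more than 20.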

Collecting General Performance Data

Not surprisingly, PostgreSQL keeps a number of tables within its own schema with an abundance of information on the above dimensions. These tables all start with either pg_stat or pg_statio, and are generally referred to as the stats tables. It is important to keep in mind that there are two kinds of statistics in PostgreSQL. The first kind is for internal use by the query planner. This data is kept in the pg_statistic catalog. As the documentation points out, this catalog should not be readable by ordinary users. A publicly readable view on this data, in a more human-friendly format, is pg_stats.

The second kind of statistics is for monitoring, and these tables are the focus of this post. The monitoring stats tables can be divided into three groups: database-specific, table-specific and query-specific. Let's start with database-specific statistics. The statistics for each database are kept in the pg_stat_database table. In addition to the expected columns, such as database name and ID (datname and datid), this table contains the following columns relevant to our interests:

numbackends: Number of backends currently connected to this database.
blks_read, blks_hit: Number of times disk blocks were read vs. number of cache hits for these blocks.
xact_commit, xact_rollback: Number of transactions committed and rolled back, respectively.
deadlocks: Number of deadlocks since last reset. As mentioned above, very important for database performance.

numbackends is an important column, not only because too high a value can cause issues, as mentioned above, but also because the change in this number during normal operation gives us a hint about how long queries are taking. Combining the value of numbackends with the oldest running query from the pg_stat_activity table might also be informative, to make sure that there are no long-running connections that were not properly closed.
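A minimal sketch of that check in Python, assuming the pid, state and query_start columns have already been fetched from pg_stat_activity (for example with SELECT pid, state, query_start FROM pg_stat_activity) into dictionaries. The function name and the five-minute threshold are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta, timezone


def long_running(activity_rows, now, threshold=timedelta(minutes=5)):
    """Return (pid, age) pairs for non-idle backends whose current query
    started more than `threshold` before `now`, oldest first.

    Each row is a dict with the pg_stat_activity columns pid, state and
    query_start; query_start may be None (SQL NULL).
    """
    flagged = []
    for row in activity_rows:
        # Plain idle connections are not running anything; skip them.
        if row["state"] == "idle" or row["query_start"] is None:
            continue
        age = now - row["query_start"]
        if age > threshold:
            flagged.append((row["pid"], age))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

States such as "idle in transaction" are deliberately kept, since a transaction left open for hours is exactly the kind of connection that was not properly closed.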

The ratio of cache hits to total reads can be determined with the following query:

SELECT blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio
FROM pg_stat_database
WHERE datname=current_database();

This number is the most important metric for measuring IO performance; it should be very close to 1. Otherwise you should consider changing the shared_buffers configuration option. A similar ratio of the number of committed transactions vs. all transactions is also important:

SELECT xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio
FROM pg_stat_database
WHERE datname=current_database();
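The same two ratios can also be computed client-side, for example in a monitoring script. This hypothetical Python sketch takes a pg_stat_database row as a dict; the one thing it adds over the SQL is a guard for the case where both counters are zero (e.g. right after a reset), in which the SQL version fails with a division-by-zero error. Wrapping the denominator in NULLIF(..., 0) is the equivalent fix on the SQL side.

```python
def ratio(part, rest):
    """Return part / (part + rest), or None when both counters are zero
    instead of dividing by zero."""
    total = part + rest
    if total == 0:
        return None
    return part / total


def database_health(row):
    """Compute both ratios from a pg_stat_database row given as a dict."""
    return {
        "cache_hit_ratio": ratio(row["blks_hit"], row["blks_read"]),
        "successful_xact_ratio": ratio(row["xact_commit"], row["xact_rollback"]),
    }
```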

Except for numbackends, all these values are accumulated since they were last reset. Resetting can be carried out by logging into the database and running select pg_stat_reset();. The time of the last reset is stored in the stats_reset column. Resetting statistics affects only the monitoring tables; pg_statistic is populated by ANALYZE and is not affected.
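Because the counters are cumulative, a ratio computed since the last reset reacts only slowly to recent changes. Instead of resetting, one option is to take two snapshots a few minutes apart and compare them. A minimal Python sketch, assuming each snapshot is a dict holding the blks_hit, blks_read and stats_reset columns of pg_stat_database; the function name is made up:

```python
def interval_cache_hit_ratio(before, after):
    """Cache hit ratio for the interval between two pg_stat_database snapshots.

    Returns None if the statistics were reset in between (the counters would
    go backwards) or if no blocks were requested during the interval.
    """
    if after["stats_reset"] != before["stats_reset"]:
        return None
    hits = after["blks_hit"] - before["blks_hit"]
    reads = after["blks_read"] - before["blks_read"]
    if hits < 0 or reads < 0 or hits + reads == 0:
        return None
    return hits / (hits + reads)
```

The same delta approach works for the transaction counters and, per table, for idx_scan and seq_scan.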

The most useful table-specific stats table is pg_stat_all_tables. Running \d pg_stat_all_tables in psql reveals some very interesting columns:

last_vacuum, last_analyze : The last time vacuum and analyze have been executed manually on this table.
last_autovacuum, last_autoanalyze : The last time this table has been vacuumed or analyzed by the autovacuum daemon.
idx_scan, idx_tup_fetch: The number of times an index scan was made on this table, and the number of rows fetched this way.
seq_scan, seq_tup_read: The number of times a sequential scan was made, and the number of rows read this way.
n_tup_ins, n_tup_upd, n_tup_del : Number of rows inserted, updated and deleted.
n_live_tup, n_dead_tup : Estimated number of live rows vs. dead rows.

The most meaningful stats from a performance perspective are those related to index vs. sequential scans. An index scan happens when the database can determine which rows to fetch using only an index, a data structure that is cheap to traverse. A sequential scan, on the other hand, happens when a table has to be processed linearly in order to determine which rows belong in a result set. Sequential scans are very costly operations for big tables, because reading rows is expensive: the actual table data is stored in an unordered heap. The aim of a database user should therefore be to tweak the index definitions so that the database does as few sequential scans as possible. I strongly recommend the book SQL Performance Explained on the topic of indexes. The ratio of index scans to all scans for the whole database can be calculated as follows:

SELECT sum(idx_scan)/(sum(idx_scan) + sum(seq_scan)) as idx_scan_ratio
FROM pg_stat_all_tables
WHERE schemaname='public';

Besides the tables of the current database, pg_stat_all_tables also contains system tables such as the catalogs and the TOAST tables, which is why we only look at the tables in the public schema. The value returned by the above query should be very close to 1; otherwise you have a serious problem. For a more detailed report of how individual tables are faring in this area, you can use the following query, which calculates the same ratio per table and sorts the tables in ascending order. Tables without any index report NULL in idx_scan, so it is replaced with 0; otherwise these tables would get a NULL ratio and be sorted last, although they deserve the most attention:

SELECT relname, coalesce(idx_scan, 0)::float/(coalesce(idx_scan, 0) + seq_scan + 1) as idx_scan_ratio
FROM pg_stat_all_tables
WHERE schemaname='public'
ORDER BY idx_scan_ratio ASC;

As pointed out in this blog post, it might be a good idea to pay special attention to index usage on tables with many rows, and make sure they are as highly optimized as possible.
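If you post-process the per-table numbers outside the database, the same ranking, plus a filter for the large tables mentioned above, could look like the following hypothetical Python sketch. Rows are assumed to be dicts with relname, idx_scan, seq_scan and n_live_tup from pg_stat_all_tables; the min_rows threshold is an arbitrary parameter:

```python
def rank_tables(rows, min_rows=0):
    """Order tables by index scan ratio, worst first, skipping small tables.

    idx_scan is None (SQL NULL) for tables without any index; those are
    treated as having zero index scans rather than being dropped.
    """
    ranked = []
    for row in rows:
        if row["n_live_tup"] < min_rows:
            continue
        idx = row["idx_scan"] or 0
        seq = row["seq_scan"] or 0
        ratio = idx / (idx + seq + 1)  # the +1 mirrors the SQL query and avoids 0/0
        ranked.append((row["relname"], ratio))
    ranked.sort(key=lambda item: item[1])  # lowest ratio first
    return ranked
```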

Running select pg_stat_reset(); as superuser also resets pg_stat_all_tables, in addition to pg_stat_database.

Trigger behavior

One question we had in mind was how the stats relate to queries running within trigger functions. PostgreSQL is known for doing the sensible thing, so we expected stats to be collected within triggers as well, but it's best to make sure by running a simple test. Let's create an empty database with the following simple tables:

CREATE TABLE person (
  id SERIAL PRIMARY KEY,
  last_name VARCHAR(255),
  first_name VARCHAR(255)
);

CREATE TABLE address (
  id SERIAL PRIMARY KEY,
  person_id integer REFERENCES person(id),
  fullname VARCHAR(255),
  street VARCHAR(255),
  city VARCHAR(255)
);

We can insert the following rows into the person and address tables:

INSERT INTO person (first_name, last_name)
VALUES ('Hercule', 'Poirot');

INSERT INTO address (person_id, fullname, street, city)
VALUES (1, 'Hercule Poirot', 'Rue des Martyrs', 'Paris');

A quick check of pg_stat_all_tables after resetting the stats reveals the following:

SELECT pg_stat_reset();
SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person';
SELECT * from person where first_name='Hercule';
SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person';

The first SELECT query on pg_stat_all_tables returns 0, 0, 0, whereas the second one returns 0, 1, 0, as one would expect. To test whether these statistics also cover queries run inside triggers, we can add a trigger to the person table with the following lines:

CREATE OR REPLACE FUNCTION update_fullname() RETURNS TRIGGER AS $$
    BEGIN
        UPDATE address
          SET fullname = concat(NEW.first_name, ' ', NEW.last_name)
          WHERE person_id = NEW.id;
        RETURN NULL;
    END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS update_fullname_trigger ON person;
CREATE TRIGGER update_fullname_trigger
    AFTER UPDATE ON person
    FOR EACH ROW
    EXECUTE PROCEDURE update_fullname();

After installing the update_fullname trigger, which changes the fullname column in the address table when a person changes, we can reset the statistics and run a simple update to see what happens:

SELECT pg_stat_reset();
UPDATE person SET first_name = 'Marcel' WHERE id=1;
SELECT idx_scan,seq_scan,n_tup_upd FROM pg_stat_all_tables
  WHERE schemaname='public' AND relname='address';

This should return 0, 1, 1, meaning that the query run by the trigger was registered in the statistics.

Monitoring Query Performance

The tables mentioned until now give you a general overview of the performance characteristics of your database. When it comes to finding the reasons for these characteristics, you need to go one level deeper, to individual queries. The one table with the most information on the performance of individual queries is pg_stat_statements. Unfortunately, this table is populated by an extension that has to be preloaded when the server starts, so enabling it requires a restart. I would strongly encourage you to install it anyway, since the data it records is impossible to derive or collect otherwise. Enabling the extension is a matter of installing the package postgresql-contrib-9.X for your PostgreSQL version and distribution, and adding (or uncommenting) the following lines in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

After restarting the server, log in to the database of interest and run CREATE EXTENSION pg_stat_statements;. From then on, various statistics will be collected for each individual query and stored in the pg_stat_statements table. The important identifier columns in this table are the following:

dbid: The ID of the database in which the query was run. The corresponding column in the pg_database table is called oid, and is a hidden system column. Note that statements from all databases in the cluster show up in pg_stat_statements, so you will usually want to filter on this column, e.g. with WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database()).
queryid: A hash of the internal representation of the query. How this hash is calculated involves a number of subtleties, which are discussed below.
query: A representative text for what PostgreSQL considers to be the same query.

Query hash generation takes as its input the representation that PostgreSQL generates after a query has been parsed and matched to the relevant tables and columns. For plannable queries, i.e. SELECT, INSERT, UPDATE and DELETE, the constant values in the query are then ignored. The resulting internal representation is an abstract "summary" of the query. Different query texts can thus map to the same queryid, most commonly when they differ only in the values of constants, or in whitespace and capitalization. See the PostgreSQL documentation on the topic for further details.
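To get a feel for which queries end up sharing an entry, here is a rough, text-based approximation in Python. It is not what PostgreSQL does: PostgreSQL hashes the parsed query tree, while this sketch only replaces literals with placeholders using regular expressions and normalizes whitespace and case. The function names and the choice of hash are arbitrary.

```python
import hashlib
import re

# Rough patterns for literal constants: single-quoted strings (with '' escapes)
# and unsigned integer or decimal numbers.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def normalize(query):
    """Replace literals with '?', collapse whitespace and lowercase the rest."""
    text = _STRING.sub("?", query)  # strings first, so digits inside them go too
    text = _NUMBER.sub("?", text)
    text = _SPACE.sub(" ", text).strip()
    return text.lower()


def fingerprint(query):
    """Short hash of the normalized text, playing the role of a queryid."""
    return hashlib.sha1(normalize(query).encode("utf-8")).hexdigest()[:16]
```

With this, SELECT * FROM person WHERE id = 1 and select * from person where id = 42 get the same fingerprint. A text-based approach is stricter than PostgreSQL's, though: it treats id=1 and id = 1 as different queries, whereas both parse to the same tree.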

The columns in the pg_stat_statements table relevant for performance analysis are the following:

calls: Number of times executed
total_time: Total time spent executing this statement, in milliseconds
min_time, max_time, mean_time: The minimum, maximum and mean time per execution, in milliseconds

As with the above statistics tables, pg_stat_statements aggregates values between resets. This table requires a different function to reset, the aptly named pg_stat_statements_reset.

A simple test shows that queries run through triggers are accounted for in the pg_stat_statements table, too. After creating the tables, registering the trigger, and resetting the statistics with SELECT pg_stat_statements_reset();, let's run the following simple query again:

UPDATE person SET first_name = 'Marcel' WHERE id=1;

Asking for the statistics shows us that the UPDATE statements in the trigger have been registered properly:

test=# select calls,total_time,left(query,30) from pg_stat_statements where dbid=874591
order by calls desc;
 calls | total_time |              left
-------+------------+--------------------------------
     2 |      0.201 | select calls,total_time,left(q
     1 |      0.019 | UPDATE address                +
       |            |           SET f
     1 |      8.898 | select pg_stat_statements_rese
     1 |      0.564 | UPDATE person SET first_name =

Once the pg_stat_statements extension is enabled, improving database performance in terms of query duration (the most important thing, as far as the users are concerned) is as simple as finding the longest-running queries ordered either by average or total time, finding sample values for the parameters, and running them with EXPLAIN or EXPLAIN ANALYZE. See this old but still relevant tutorial for a quick introduction to using EXPLAIN.
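Picking the candidates for EXPLAIN can be scripted as well. This is a minimal Python sketch, assuming the rows from pg_stat_statements have been fetched into dicts with the query, calls and total_time columns described above (on PostgreSQL 13 and later the column is called total_exec_time instead); the function name and its parameters are hypothetical:

```python
def top_statements(rows, by="total", limit=10):
    """Rank pg_stat_statements rows by total time or by mean time per call."""

    def total(row):
        return row["total_time"]

    def mean(row):
        # Guard against rows with zero calls instead of dividing by zero.
        return row["total_time"] / row["calls"] if row["calls"] else 0.0

    if by == "total":
        key = total
    elif by == "mean":
        key = mean
    else:
        raise ValueError("by must be 'total' or 'mean'")
    return sorted(rows, key=key, reverse=True)[:limit]
```

The two orderings answer different questions: ranking by total time finds the statements that contribute most to the overall load, even if each call is fast, while ranking by mean time finds the statements that individual users experience as slow.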

One more thing we wanted to achieve was to regularly query our database instance for the above mentioned pieces of information, and display them on our Kibana dashboard. Unfortunately, Logstash proved to be a roadblock with its weird parsing behavior and incomprehensible bugs (hence my current attempt to rewrite it in Python), but for the time being, here is a bash script which uses psql to query PostgreSQL for the stats tables, and pipes everything to syslog:

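#!/bin/bash
# Query PostgreSQL statistics with psql and send them to syslog.
# Usage: <script> database|statements
set -e

DB=db_name  # replace with the name of the database to monitor

case "$1" in
    database)
        # One syslog message: connection count, cache hit ratio, transaction success ratio
        psql -x "$DB" -c "select numbackends,blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio,xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio from pg_stat_database where datname=current_database();" | grep -v RECORD | sed '/^$/d' | tr '\n' ' ' | logger
        ;;
    statements)
        # One syslog message per statement, for the ten statements with the highest total time
        psql -x "$DB" -c "select queryid, total_time, (total_time::float/calls) as mean_time, left(query,40) as short_query from pg_stat_statements where dbid=(select oid from pg_database where datname=current_database()) order by total_time desc limit 10;" | tr '\n' ' ' | sed 's/-\[ RECORD [0-9]* \][-+]*/\n/g' | xargs -d '\n' -n 1 logger
        ;;
    *)
        echo "usage: $0 database|statements" >&2
        exit 1
        ;;
esac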
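Since the plan is to move this pipeline to Python anyway, here is a sketch of what the same idea could look like there. It is not the rewrite mentioned above, just an illustration under a few assumptions: psql is on the PATH, the standard library's Unix-only syslog module replaces logger, all function names are made up, and multi-line values are continued on lines with an empty name column and a trailing + marker, as psql's default aligned output does. Parsing the psql -x output into key/value pairs also joins such multi-line values (e.g. query texts spanning several lines) into a single value.

```python
import re
import subprocess

# psql -x prints a header like "-[ RECORD 1 ]------+------" before each row
RECORD_HEADER = re.compile(r"^-\[ RECORD \d+ \]")


def parse_expanded(output):
    """Parse `psql -x` output into a list of dicts, one per record."""
    records = []
    current = None
    last_key = None
    for line in output.splitlines():
        if RECORD_HEADER.match(line):
            current = {}
            records.append(current)
            last_key = None
        elif current is not None and "|" in line:
            key, _, value = line.partition("|")
            key = key.strip()
            value = value.strip().rstrip("+").rstrip()  # drop psql's wrap marker
            if key:
                current[key] = value
                last_key = key
            elif last_key is not None:
                # Continuation line of a multi-line value (empty name column)
                current[last_key] += " " + value
    return records


def to_log_line(record):
    """Format a record as 'key=value key=value ...' for one syslog message."""
    return " ".join(f"{key}={value}" for key, value in record.items())


def run_and_log(db_name, sql):
    """Run `sql` through psql and send one syslog message per result row."""
    import syslog  # standard library, but only available on Unix

    result = subprocess.run(
        ["psql", "-x", "-d", db_name, "-c", sql],
        check=True, capture_output=True, text=True,
    )
    for record in parse_expanded(result.stdout):
        syslog.syslog(to_log_line(record))
```

For example, run_and_log("db_name", "select numbackends from pg_stat_database where datname = current_database();") would send a single message like numbackends=3.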