Krzysztof Kowalczyk blog

How I implemented wc in the browser in 3 days

2023-03-21T00:00:00Z

Building wc in the browser

From time to time I like to run wc -l on my source code to see how much code I wrote.

For those not in the know: wc -l shows the number of lines in files.

Actually, what I have to do is more like find . -name "*.go" | xargs wc -l because wc isn’t particularly good at handling directories.

I just want to see the number of lines in all my source files, man. I don’t want to google the syntax of find and xargs for the hundredth time.

After learning about the File System API, I decided to write a tool that does just that as a web app. No need to install software.

I did just that and you can use it yourself.

Here’s how it sees itself:

The rest of this article describes how I did it.

Building software quickly

It only took me 3 days, which is a testament to how productive the web platform can be.

My weapons of choice are:

Svelte for frontend
Tailwind CSS for CSS
JSDoc for static typing of JavaScript
File System API to access files and directories on your computer
vite for a bundler and dev server
render to deploy

For a small project, Svelte and Tailwind CSS are arguably overkill. I used them because I standardized on that toolset. Standardization allows me to re-use prior experience and sometimes even code.

Why those technologies?

Svelte is React without the bloat. Try it and you’ll love it.

Tailwind CSS is CSS but more productive. You have to try it to believe it.

JSDoc is a happy medium between no types at all and TypeScript. I have great internal resistance to switching to TypeScript. Maybe 5 years from now.

And none of that would be possible without browser APIs that allow access to files on your computer. Which Firefox doesn’t implement because they are happy to lose market share to browsers that implement useful features. Clearly $3 million a year is not enough to buy yourself a CEO with an understanding of the obvious.

Implementation tidbits

Getting list of files

To get a recursive listing of files in a directory, use showDirectoryPicker() to get a FileSystemDirectoryHandle. Call dirHandle.values() to get an async iterator over the directory’s entries. Recurse if an entry is a directory.

Not all browsers support that API. To detect if it works:

/**
 * @returns {boolean}
 */
export function isIFrame() {
  let isIFrame = false;
  try {
    // in iframe, those are different
    isIFrame = window.self !== window.top;
  } catch {
    // do nothing
  }
  return isIFrame;
}

/**
 * @returns {boolean}
 */
export function supportsFileSystem() {
  return "showDirectoryPicker" in window && !isIFrame();
}

Because people on Hacker News always complain about slow, bloated software I took pains to make my code fast. One of those pains was using an array instead of an object to represent a file system entry.

Wait, now HN people will complain that I’m optimizing prematurely.

Listen buddy, Steve Wozniak wrote assembly in hex and he liked it. In comparison, optimizing the memory layout of the most frequently used object in JavaScript is like drinking champagne on Jeff Bezos’ yacht.

Here’s a JavaScript trick for optimizing the memory layout of objects with a fixed number of fields: derive your class from Array.

Deriving a class from an Array

A little-known thing about JavaScript is that an Array is just an object: you can derive your class from it and add methods, getters and setters.

You get the compact layout of an array and the convenience of accessors.

Here’s the sketch of how I implemented FsEntry object:

// a file system entry is an array with 6 slots:
// [handle, parentHandle, size, path, dirEntries, meta]
// handle is a FileSystemFileHandle or FileSystemDirectoryHandle,
// parentHandle is the handle of the parent directory and
// dirEntries holds the child entries of a directory.
// meta is an extra slot for the caller to stick additional data
// without the need to re-allocate the array;
// if you need more than 1 value, store an object there
const handleIdx = 0;
const parentHandleIdx = 1;
const sizeIdx = 2;
const pathIdx = 3;
const dirEntriesIdx = 4;
const metaIdx = 5;

export class FsEntry extends Array {
  get size() {
    return this[sizeIdx];
  }

  // ... rest of the accessors
}

We have 6 slots in the array and can access them as e.g. entry[sizeIdx]. We can hide this implementation detail with getters, like size shown above, so that callers write entry.size.
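A self-contained sketch of the same idea that runs outside the browser. The make() factory, the setters, isDir and the Symbol.species override are my own illustrative additions, not the app’s code. The factory sidesteps a quirk of extending Array: new FsEntry(5) creates an empty array of length 5, not [5]. Overriding Symbol.species makes map() and filter() return plain arrays instead of FsEntry objects.

```javascript
// Slot indexes, following the layout comment above.
const handleIdx = 0;
const parentHandleIdx = 1;
const sizeIdx = 2;
const pathIdx = 3;
const dirEntriesIdx = 4;
const metaIdx = 5;

class FsEntry extends Array {
  // Hypothetical factory: always produces exactly 6 slots.
  static make(handle, parentHandle, size, path) {
    const e = new FsEntry(); // empty array, then fill the slots
    e.push(handle, parentHandle, size, path, null, null);
    return e;
  }

  // Arrays derived via map(), filter(), slice() are plain Arrays.
  static get [Symbol.species]() {
    return Array;
  }

  get handle() { return this[handleIdx]; }
  get parentHandle() { return this[parentHandleIdx]; }
  get size() { return this[sizeIdx]; }
  set size(v) { this[sizeIdx] = v; }
  get path() { return this[pathIdx]; }
  set path(v) { this[pathIdx] = v; }
  get dirEntries() { return this[dirEntriesIdx]; }
  set dirEntries(v) { this[dirEntriesIdx] = v; }
  get meta() { return this[metaIdx]; }
  set meta(v) { this[metaIdx] = v; }
  // A directory is an entry whose dirEntries slot holds an array.
  get isDir() { return Array.isArray(this[dirEntriesIdx]); }
}

const f = FsEntry.make(null, null, 120, "src/main.go");
console.log(f.length, f.size, f.path); // 6 120 src/main.go
```

The entry is still a real array (Array.isArray(f) is true), so it keeps the compact storage while callers only see named properties.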

Reading a directory recursively

Once you get a FileSystemDirectoryHandle from window.showDirectoryPicker(), you can read the contents of the directory.

Here’s one way to implement recursive read of directory:

/**
 * @param {FileSystemDirectoryHandle} dirHandle
 * @param {Function} skipEntryFn
 * @param {string} dir
 * @returns {Promise<FsEntry>}
 */
export async function readDirRecur(
  dirHandle,
  skipEntryFn = dontSkip,
  dir = dirHandle.name
) {
  /** @type {FsEntry[]} */
  let entries = [];
  // @ts-ignore
  for await (const handle of dirHandle.values()) {
    if (skipEntryFn(handle, dir)) {
      continue;
    }
    const path = dir == "" ? handle.name : `${dir}/${handle.name}`;
    if (handle.kind === "file") {
      let e = await FsEntry.fromHandle(handle, dirHandle, path);
      entries.push(e);
    } else if (handle.kind === "directory") {
      let e = await readDirRecur(handle, skipEntryFn, path);
      e.path = path;
      entries.push(e);
    }
  }
  // slots: handle, parentHandle, size, path (see the layout above)
  let res = new FsEntry(dirHandle, null, 0, dir);
  res.dirEntries = entries;
  return res;
}

The skipEntryFn function is called for every entry and lets the caller exclude a given entry. You can, for example, skip a directory like .git.

It can also be used to show progress of reading the directory to the user, as it happens asynchronously.
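A sketch of a skipEntryFn that does both jobs: it ignores .git and node_modules plus a few binary extensions, and reports progress through a callback. The makeSkipEntryFn name, the onProgress callback and the extension list are hypothetical; only the (handle, dir) signature comes from readDirRecur above.

```javascript
// Directory names and file extensions to ignore (illustrative lists).
const skipDirNames = new Set([".git", "node_modules"]);
const skipExts = new Set([".png", ".jpg", ".gif", ".exe", ".dll", ".zip"]);

/**
 * Returns a function with the (handle, dir) signature readDirRecur expects.
 * @param {(nSeen: number, dir: string) => void} onProgress
 */
function makeSkipEntryFn(onProgress = () => {}) {
  let nSeen = 0;
  return function skipEntryFn(handle, dir) {
    nSeen++;
    onProgress(nSeen, dir); // e.g. update a "scanned N entries" label
    if (handle.kind === "directory") {
      return skipDirNames.has(handle.name);
    }
    const dot = handle.name.lastIndexOf(".");
    const ext = dot > 0 ? handle.name.slice(dot).toLowerCase() : "";
    return skipExts.has(ext);
  };
}
```

Usage would be readDirRecur(dirHandle, makeSkipEntryFn((n) => console.log(n))).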

Showing the files

I use tables and I’m not ashamed.

It’s still the best technology to display, well, a table of values where cells are sized to content and columns are aligned.

Flexbox doesn’t remember anything across rows so it can’t align columns.

Grid can lay out things properly, but I haven’t found an easy way to highlight the whole row when the mouse is over it. With CSS you can only target individual cells in a grid, not rows.

With table I just style <tr class="hover:bg-gray-100">. That’s Tailwind speak for: on mouse hover set background color to light gray.

A folder can contain other folders, so we need a recursive component to implement it. Svelte supports that with <svelte:self>.

I implemented it as a tree view where you can expand folders to see their content.

It’s one big table for everything but I needed to indent each expanded folder to make it look like a tree.

It was a bit tricky. I went with an indent property in my Folder component. It starts at 0 and goes up by 1 for each level of nesting.

Then I style the first (file name) column as <td class="ind-{indent}">...</td> and use these CSS styles:

<style>
  :global(.ind-1) {
    padding-left: 0.5rem;
  }
  :global(.ind-2) {
    padding-left: 1rem;
  }
  /* ... and so on, up to .ind-17 */
</style>

It only goes up to .ind-17. Yes, if you have deeper nesting, it won’t display correctly. I’ll wait for a bug report before increasing it further.
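One way to remove the .ind-17 ceiling (my suggestion, not what the app does) is to compute the padding and bind it inline, e.g. <td style={indentStyle(indent)}> in Svelte. The helper names are hypothetical; the 0.5rem step matches .ind-1 and .ind-2 above.

```javascript
// Padding per nesting level, matching .ind-1 (0.5rem) and .ind-2 (1rem).
const indentStepRem = 0.5;

/** Inline style for a given nesting level; works for any depth. */
function indentStyle(indent) {
  return `padding-left: ${indent * indentStepRem}rem`;
}

/** Alternative: generate plain .ind-N rules up to maxLevel for a global stylesheet. */
function indentRules(maxLevel) {
  const rules = [];
  for (let i = 1; i <= maxLevel; i++) {
    rules.push(`.ind-${i} { padding-left: ${i * indentStepRem}rem; }`);
  }
  return rules.join("\n");
}

console.log(indentStyle(3)); // padding-left: 1.5rem
```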

Calculating line count

You can get the size of a file from the File object returned by FileSystemFileHandle.getFile().

For source code I want to see the number of lines. It’s quite trivial to calculate:

/**
 * @param {Blob} f
 * @returns {Promise<number>}
 */
export async function lineCount(f) {
  if (f.size === 0) {
    // empty files have no lines
    return 0;
  }
  let ab = await f.arrayBuffer();
  let a = new Uint8Array(ab);
  let nLines = 0;
  // if last character is not newline, we must add +1 to line count
  let toAdd = 0;
  for (let b of a) {
    // line endings are:
    // CR (13) LF (10) : Windows
    // LF (10) : Unix
    // CR (13) : classic Mac OS
    // classic Mac line endings are very rare, so we only count LF (10),
    // which covers both Windows and Unix lines
    if (b === 10) {
      toAdd = 0;
      nLines++;
    } else {
      toAdd = 1;
    }
  }
  return nLines + toAdd;
}

It doesn’t handle classic Mac files that use CR alone for newlines. It’s ok to write buggy code as long as you document it.
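If you did want to count classic Mac files too, one approach is to treat LF, CRLF and a lone CR each as a single line ending. This sketch (lineCountAnyEol is a hypothetical name) is synchronous and takes a Uint8Array so it runs outside the browser; in the app you would pass new Uint8Array(await f.arrayBuffer()).

```javascript
/**
 * Counts lines, treating LF, CRLF and lone CR as one line ending each.
 * @param {Uint8Array} bytes
 * @returns {number}
 */
function lineCountAnyEol(bytes) {
  let nLines = 0;
  let lastWasEol = true; // empty input => 0 lines
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 10) {
      // LF (Unix, or the second half of an already-handled CRLF is skipped below)
      nLines++;
      lastWasEol = true;
    } else if (b === 13) {
      // CR: classic Mac, or the start of a Windows CRLF
      nLines++;
      lastWasEol = true;
      if (bytes[i + 1] === 10) {
        i++; // count CRLF only once
      }
    } else {
      lastWasEol = false;
    }
  }
  // a last line without a trailing newline still counts
  return lastWasEol ? nLines : nLines + 1;
}

console.log(lineCountAnyEol(new TextEncoder().encode("one\rtwo\r\nthree"))); // 3
```

For LF-only files it returns the same result as lineCount above.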

I also skip known binary files (.png, .exe etc.) and known “not mine” directories like .git and node_modules.

Small considerations like that matter.
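An extension list can’t know every binary format. A common complementary heuristic (git uses a similar check) is to look for a NUL byte near the start of the file. This is a swapped-in technique, not something the app is described as doing; note that UTF-16 text files contain NUL bytes and would be misclassified as binary.

```javascript
/**
 * Heuristic: a NUL byte in the first maxScan bytes suggests binary content.
 * @param {Uint8Array} bytes
 * @param {number} maxScan
 */
function looksBinary(bytes, maxScan = 8000) {
  return bytes.subarray(0, maxScan).includes(0);
}

// In the browser, only the start of the file needs to be read, e.g.:
// looksBinary(new Uint8Array(await file.slice(0, 8000).arrayBuffer()))
```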

Remembering opened directories

I typically use it many times on the same directories and it’s a pain to pick the same directory over and over again.

A FileSystemDirectoryHandle can be stored in IndexedDB, so I implemented a history of opened directories as a persisted store backed by IndexedDB.
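A sketch of keeping that history as a most-recently-used list. Two handles for the same directory are not necessarily the same object, so it compares them with FileSystemHandle.isSameEntry(), which returns a Promise<boolean>. addToHistory and the default limit of 10 are hypothetical.

```javascript
/**
 * Returns a new history with handle first, no duplicates, at most maxSize entries.
 * @param {FileSystemHandle[]} history
 * @param {FileSystemHandle} handle
 */
async function addToHistory(history, handle, maxSize = 10) {
  const rest = [];
  for (const h of history) {
    // isSameEntry is async, so each comparison is awaited
    if (!(await h.isSameEntry(handle))) {
      rest.push(h);
    }
  }
  return [handle, ...rest].slice(0, maxSize);
}
```

The resulting array of handles is what would then be written to IndexedDB.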

Asking for permissions

When it comes to accessing files and directories on disk you can’t ask for forgiveness, you have to ask for permission.

User grants permissions in window.showDirectoryPicker() and browser remembers them for a while, but they expire quite quickly.

You need to re-check, and if needed re-ask for, permission on a FileSystemFileHandle or FileSystemDirectoryHandle before each access:

export async function verifyHandlePermission(fileHandle, readWrite) {
  const options = {};
  if (readWrite) {
    options.mode = "readwrite";
  }
  // Check if permission was already granted. If so, return true.
  if ((await fileHandle.queryPermission(options)) === "granted") {
    return true;
  }
  // Request permission. If the user grants permission, return true.
  if ((await fileHandle.requestPermission(options)) === "granted") {
    return true;
  }
  // The user didn't grant permission, so return false.
  return false;
}

If permissions are still valid from before, it’s a no-op. If not, the browser will show a dialog asking for permissions.

If you ask for write permission, Chrome shows 2 confirmation dialogs vs. 1 for read-only access.

I start with read-only access and, if needed, ask again for write (or delete) permission.

Deleting files and directories

Deleting files has nothing to do with showing line counts, but it was easy to implement and useful, so I added it.

You need to keep the FileSystemDirectoryHandle of the parent directory.

To delete a file: await parentDirHandle.removeEntry("foo.txt")

To delete a directory: await parentDirHandle.removeEntry("node_modules", {recursive: true})

Getting bitten by a multi-threading bug

JavaScript doesn’t have multiple threads, so you can’t have all those nasty bugs? Right? Right?

Yes and no.

Async is not multi-threading but it does create non-obvious execution flows.

I had a bug: I noticed that some .txt files were showing line count of 0 even though they clearly did have lines.

I went bug hunting.

I checked the lineCount function. Seems ok.

I added console.log(), I stepped through the code. Time went by and my frustration level was reaching DEFCON 2.

Thankfully, before I reached DEFCON 1 (cocked pistol), I had an epiphany.

You see, JavaScript has async where some code can interleave with some other code. The browser can splice those async “threads” with UI code.

No threads means there are no data races, i.e. writing memory values that another thread is in the middle of reading.

But we do have non-obvious execution flows.

Here’s how my code worked:

get a list of files (async)
show the files in UI
calculate line counts for all files (async)
update UI to show line counts after we get them all

Async is great for users: calculating line counts could take a long time as we need to read all those files.

If this process wasn’t async it would block the UI.

Thanks to async there are enough checkpoints (await points) for the browser to process UI events in between processing files.

The issue was that the function calculating line counts was using the array I got from reading a directory.

I passed the same array to Folder component to show the files. And I sorted the array to show files in human friendly order.

In JavaScript, sort() sorts an array in place, and that array was only partially processed by the line counting function.

As a result, if the sequence of events was unfortunate enough, I would skip some files in line counting: sorting moved them to positions that line counting had already passed.

Result: no lines for you!

A happy ending and an easy fix: Folder makes a copy of an array so sorting doesn’t affect line counting process.
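A small reproduction of the bug that runs in Node, with made-up file names. The await null stands in for reading a file, which yields to the event loop; countAll and demo are hypothetical names. Because the loop reads files[i] before each await, an in-place sort that runs during the await shifts entries under it.

```javascript
// Counts files one at a time, yielding between them like a real file read would.
async function countAll(files, counted) {
  for (let i = 0; i < files.length; i++) {
    const name = files[i];
    await null; // other code (e.g. the UI sorting) may run here
    counted.add(name);
  }
}

async function demo(copyBeforeSort) {
  const files = ["c.txt", "a.txt", "d.txt", "b.txt"];
  const counted = new Set();
  const work = countAll(files, counted); // runs until its first await
  // Meanwhile the UI sorts the list it was handed.
  const shown = copyBeforeSort ? [...files].sort() : files.sort();
  await work;
  return { counted, shown };
}

demo(false).then(({ counted }) => console.log([...counted])); // a.txt is missing
demo(true).then(({ counted }) => console.log([...counted])); // all four files
```

With the in-place sort, the first iteration already grabbed "c.txt"; after sorting, index 1 is "b.txt" and index 2 is "c.txt" again, so "a.txt" is never counted. Copying first ([...files] or, in newer runtimes, files.toSorted()) leaves the counting loop’s array untouched.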

The future

No software is ever finished but I arrived at a point where it does the majority of the job I wanted so I shipped it.

There is a feature I would find useful: statistics for each extension.

How many lines in .go files vs. .js files etc.?

But I’m holding off implementing it until:

I really, really want it
I get feature requests from people who really, really want it
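The feature doesn’t exist yet, but the aggregation itself is small. This sketch assumes a flat list of { path, lines } records, which is my assumption rather than the app’s FsEntry tree; statsByExtension is a hypothetical name.

```javascript
/**
 * Groups line counts by file extension, largest total first.
 * Files without an extension (and dotfiles like .gitignore) go under "(none)".
 * @param {{path: string, lines: number}[]} files
 */
function statsByExtension(files) {
  const stats = new Map();
  for (const { path, lines } of files) {
    const name = path.slice(path.lastIndexOf("/") + 1);
    const dot = name.lastIndexOf(".");
    const ext = dot > 0 ? name.slice(dot).toLowerCase() : "(none)";
    const s = stats.get(ext) ?? { files: 0, lines: 0 };
    s.files += 1;
    s.lines += lines;
    stats.set(ext, s);
  }
  return [...stats].sort((a, b) => b[1].lines - a[1].lines);
}

// .go comes first (2 files, 120 lines), then .js (1 file, 30 lines)
console.log(statsByExtension([
  { path: "cmd/main.go", lines: 100 },
  { path: "web/app.js", lines: 30 },
  { path: "util.go", lines: 20 },
]));
```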

You can look at the source code. It’s source visible but not open source.

Ideas for replit bounties

2023-03-13T00:00:00Z

Apparently replit asks all Pro users about their thoughts.

As it happens, I have a lot of thoughts about how to improve Replit bounties.

Lower transaction costs

Currently the process is:

I post a bounty
one or more people apply
I select an applicant
they do the work
I accept or not

The back-and-forth between bounty creator and applicant is a transaction cost.

The smaller the bounty amount, the higher the cost as a percentage of the job.

Choosing applicants is also arbitrary: there’s just not enough info to decide that one person is better than the other but I have to pick one.

Experiment with an (optional) mode that works more like 99designs, i.e.:

I post a “first to complete wins” bounty
whoever completes the bounty first wins

There is potential for abuse: someone does the work and I don’t pay. Penalize people who do that, potentially banning them from posting bounties.

You would need a way for devs to provide feedback on bounty creators (and vice versa) and a human who reviews this and takes action.

Messy, yes, but “do things that don’t scale” (Paul G.)

This incentivizes meritocracy: bounty makers want fast work and good devs who work fast will make more money.

Educate bounty posters

Most bounties are badly described: vague, not enough information, no clear acceptance criteria etc.

Write a concise “How to create a successful bounty” document (mostly based on what you see as good / bad practices in current bounties).

When creating a new bounty, provide a link to that article and in general promote it.

Topics to cover:

set pricing expectations (i.e. no $5 for a week of work)
examples of good and bad requirements, acceptance criteria etc.

Purge obviously bad bounties

You want to establish a reputation:

with devs as a place to make money
with bounty creators as a place to get work done quickly

It does you no good if there’s a $4 bounty to do weeks of work.

It’ll never be fulfilled, and it creates a bad impression for both devs and bounty creators who understand market wages.

You should have a human who reviews new bounties and closes obviously bad ones.

That person can also use it to educate, i.e. link to the above “how to make a good bounty” article.

Don’t list cancelled bounties by default

The default list is “all”, which includes cancelled bounties. Those are just noise.

Educate and penalize bad bounty creators

An example of bad bounty creator: someone who has applicants but doesn’t assign the bounty in reasonable amount of time.

Educate: have a bot that checks for that and, e.g., if an applicant isn’t chosen within 3 days, sends a message.

Penalize: e.g. lower their ranking in the list of bounties.

Reverse search: bounty creator looks for devs

In addition to devs picking bounties, allow bounty makers to pick devs.

Create a directory of devs where they list the technologies they know, what bounties they’ve completed, feedback from bounty creators so far, their availability etc.

Allow bounty makers to “ping” them, i.e. suggest that they are a good candidate for a given bounty.

Change how entering price works

Currently if I open a bounty and enter $200, I pay $200 and the dev gets $200 - replit cut.

As a result there are weirdly priced bounties, like $192.

It should be: if I enter $200, that’s what the dev makes and you charge me $200 + replit cut.

Don’t dismiss “Create a Bounty” with outside click

I clicked outside by accident while in the middle of filling out a bounty and thought I lost what I wrote.

I didn’t, because you persist the state, but it’s not at all obvious how to get back.

Only an explicit “Cancel” (or “Save as draft”) should dismiss the dialog.

Also, give it more horizontal space. The most important text box is tiny.

Redesign discussions

There was a bounty where I had expertise in subject matter and knew it couldn’t possibly be done.

Currently the system discourages any feedback or discussion except with the selected applicant.

I get it: you don’t want low quality discussions but I think it’s a bad focus. Moderation can mitigate low quality.

Discussions should be more like Stack Overflow or Discourse: comments are below the bounty description (not in a separate tab) and are encouraged (vs. discouraged currently).

Bounty creators should be able to moderate (hide / delete comments).

Applications and Discussion should be merged: an application is just a comment with “I apply for this job” attribute (e.g. a checkbox).

Monitor failed bounties

Look for cancelled / abandoned bounties.

Ask yourself: “why did this bounty fail?” and “what could we do to increase the chance of success for this bounty?”

Your job is not to list bounties and collect the payments.

Your job is to make devs and bounty creators successful, and that involves doing things that don’t scale, like manually reviewing failed bounties and coming up with ideas on how to make them not fail.

Beyond bounties

Other ideas

Bad error message when replit has issues

I was using replit and I got an error message: “We’re either having technical difficulties or you’ve violated our tos. You can’t access your replit”.

As you can imagine “you’re a criminal” vs. “we’ve fucked up” is a very different message.

If you don’t know which scenario happened, you should fix it.

If you do, you should tell me exactly. “maybe we’ve taken away your access” is not confidence inspiring.

Go improvement

In Go, save should use goimports instead of plain go fmt. goimports formats code the same way and also adds missing imports (and removes unused ones).

A simple change that would be a significant improvement for Go developers.

Advanced markdown processing in Go

2023-03-11T00:00:00Z

Using gomarkdown/markdown library

This article describes advanced markdown processing in Go using the gomarkdown/markdown library.

All the code examples are available at https://github.com/gomarkdown/markdown/tree/master/examples

Basics first

Here’s a good start for markdown => HTML conversion:

import (
	"github.com/gomarkdown/markdown"
	"github.com/gomarkdown/markdown/html"
	"github.com/gomarkdown/markdown/parser"
)

func mdToHTML(md []byte) []byte {
	// create markdown parser with extensions
	extensions := parser.CommonExtensions | parser.AutoHeadingIDs | parser.NoEmptyLineBeforeBlock
	p := parser.NewWithExtensions(extensions)
	doc := p.Parse(md)

	// create HTML renderer with extensions
	htmlFlags := html.CommonFlags | html.HrefTargetBlank
	opts := html.RendererOptions{Flags: htmlFlags}
	renderer := html.NewRenderer(opts)

	return markdown.Render(doc, renderer)
}

Basic markdown syntax is very limited and there are many extensions that provide additional features, like tables.

The HTML renderer is customizable as well.

Here we create markdown parser and HTML renderer with common flags plus some extensions.

Here are all available options for parser and HTML renderer:

https://pkg.go.dev/github.com/gomarkdown/markdown/parser#Extensions : they change how parser interprets markdown
https://pkg.go.dev/github.com/gomarkdown/markdown/html#Flags : they change generated HTML

Advanced processing

Markdown to HTML processing works like this:

github.com/gomarkdown/markdown/parser parses markdown and generates an AST (abstract syntax tree) as defined in github.com/gomarkdown/markdown/ast
github.com/gomarkdown/markdown/html implements an HTML renderer that takes the AST and generates HTML

There are options for even more control:

customize the HTML generator by providing html.RendererOptions.RenderNodeHook. You re-use most of html.Renderer and change the rendering of just some ast.Node types.
fork github.com/gomarkdown/markdown/html and make the changes you want to the HTML renderer
modify the AST after parsing but before rendering
customize the parser, define your own ast.Node types, add them to the tree while parsing and customize renderer to render those nodes as you want
pre-process markdown before sending to the parser

`ast.Node`

You need to understand the AST. Start by skimming https://github.com/gomarkdown/markdown/blob/master/ast/node.go.

ast.Node is an interface so you can create your own nodes as long as you implement the interface.

There are two types of nodes:

a container node has a slice of child nodes, e.g. a List contains ListItem nodes
a leaf node doesn’t have children, just content

We have ast.Leaf and ast.Container to make it easy to implement custom nodes:

type MyCustomLeafNode struct {
  ast.Leaf
  // .. additional data for this node
}

type MyCustomContainerNode struct {
  ast.Container
  // .. additional data for this node
}

To render your custom ast.Node to HTML, you provide a render hook function structured like this:

func myRenderHook(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
  // you must render custom nodes because html.Renderer doesn't understand them
  if leafNode, ok := node.(*MyCustomLeafNode); ok {
    renderMyLeafNode(w, leafNode, entering)
    return ast.GoToNext, true
  }

  if containerNode, ok := node.(*MyCustomContainerNode); ok {
    renderMyContainerNode(w, containerNode, entering)
    return ast.GoToNext, true
  }

  // you can also over-ride rendering of some specific nodes that html.Renderer would render
  if image, ok := node.(*ast.Image); ok {
    renderImage(w, image, entering)
    return ast.GoToNext, true
  }

  // return false to tell html.Renderer to use default render
  return ast.GoToNext, false
}

You should always return ast.GoToNext.

You return true to indicate that you’ve rendered the node. Returning false tells html.Renderer to use default rendering.

For container nodes we typically need to render something before rendering children and after rendering children.

That’s why we need entering argument. The simplest rendering of *ast.Paragraph would be:

func myRenderParagraph(w io.Writer, p *ast.Paragraph, entering bool) {
  if entering {
    io.WriteString(w, "<p>")
  } else {
    io.WriteString(w, "</p>")
  }
}

html.Renderer takes care of recursively rendering children.

For ast.Leaf nodes you only render on entering:

func myHr(w io.Writer, p *ast.HorizontalRule, entering bool) {
  if entering {
    io.WriteString(w, "<hr/>")
  }
}

Skim render.go to see how different nodes are rendered to HTML.

Customizing HTML renderer

To re-use most of html.Renderer but only over-ride rendering of a few nodes, you can provide a render hook.

Here’s an example of a simple hook that renders <div>{children}</div> instead of <p>{children}</p> for a *ast.Paragraph node.

// an actual rendering of Paragraph is more complicated
func renderParagraph(w io.Writer, p *ast.Paragraph, entering bool) {
  if entering {
    io.WriteString(w, "<div>")
  } else {
    io.WriteString(w, "</div>")
  }
}

func myRenderHook(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
  if para, ok := node.(*ast.Paragraph); ok {
    renderParagraph(w, para, entering)
    return ast.GoToNext, true
  }
  return ast.GoToNext, false
}

func newCustomizedRender() *html.Renderer {
  opts := html.RendererOptions{
    RenderNodeHook: myRenderHook,
  }
  return html.NewRenderer(opts)
}

If a render hook needs access to more information than io.Writer and ast.Node, we can capture it in a closure:

import (
  "io"

  "github.com/gomarkdown/markdown/ast"
  "github.com/gomarkdown/markdown/html"
)

type renderData struct {
  // ... data needed for render hook function
}

func makeRenderHook(data *renderData) html.RenderNodeFunc {
  return func(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
    // the closure has access to data
    // return false to let html.Renderer use default rendering
    return ast.GoToNext, false
  }
}

func newCustomizedRender() *html.Renderer {
  data := &renderData{}
  opts := html.RendererOptions{
    RenderNodeHook: makeRenderHook(data),
  }
  return html.NewRenderer(opts)
}

Modify ast tree

The structure of the code would be:

func modifyAst(root ast.Node) ast.Node {
  // ... tweak AST tree as needed
  return root
}

var mds = `[link](http://example.com)`

func modifyAstExample() {
	md := []byte(mds)

	extensions := parser.CommonExtensions
	p := parser.NewWithExtensions(extensions)
	doc := p.Parse(md)

	doc = modifyAst(doc)

	htmlFlags := html.CommonFlags
	opts := html.RendererOptions{Flags: htmlFlags}
	renderer := html.NewRenderer(opts)
	html := markdown.Render(doc, renderer)

	fmt.Printf("-- Markdown:\n%s\n\n--- HTML:\n%s\n", md, html)
}

You can re-arrange the tree: add, remove, rearrange nodes or add / remove information from nodes.

Working with trees is tricky, so you’ll probably use ast.Print(io.Writer, ast.Node) to pretty-print and understand the AST, both before and after making changes.

I won’t cover changing the structure of the tree but here’s an example that adds target="_blank" attribute to link (<a>) nodes that go outside of my website and adds custom blog-img class to all image (<img>) nodes.

func modifyAst(doc ast.Node) ast.Node {
	ast.WalkFunc(doc, func(node ast.Node, entering bool) ast.WalkStatus {
		if img, ok := node.(*ast.Image); ok && entering {
			attr := img.Attribute
			if attr == nil {
				attr = &ast.Attribute{}
			}
			// TODO: might be duplicate
			attr.Classes = append(attr.Classes, []byte("blog-img"))
			img.Attribute = attr
		}

		if link, ok := node.(*ast.Link); ok && entering {
			isExternalURI := func(uri string) bool {
				return (strings.HasPrefix(uri, "https://") || strings.HasPrefix(uri, "http://")) && !strings.Contains(uri, "blog.kowalczyk.info")
			}
			if isExternalURI(string(link.Destination)) {
				link.AdditionalAttributes = append(link.AdditionalAttributes, `target="_blank"`)
			}
		}

		return ast.GoToNext
	})
	return doc
}

Important parts:

ast.WalkFunc is a helper function that recursively calls a callback function on every node in AST
we do modifications only once e.g. when entering is true
all Container nodes have (optional) *Attribute which contains HTML id attribute, class names and attributes, which we can modify
some nodes have additional data we can modify, e.g. *ast.Link has AdditionalAttributes, a slice of extra attributes (as strings).

type Attribute struct {
	ID      []byte
	Classes [][]byte
	Attrs   map[string][]byte
}

Custom markdown parser, custom `ast.Node`

You can also extend the parser to recognize additional syntax not present in markdown.

Here’s an example of a parser extension that recognizes the following:

:gallery
/img/image-1.png
/img/image-2.png
/img/image-3.png

Rest of document.

A gallery is a list of image URLs. In the generated HTML it will be shown as an image gallery.

First define a custom leaf node type:

type Gallery struct {
	ast.Leaf
	ImageURLS []string
}

Then customize parser with a block parsing hook function which gets first dibs at parsing block text.

Block means: it only gets called at the beginning of each text block. Text blocks are separated by newlines.

It’s not possible to do custom inline text parsing.

If parser hook function recognizes its custom syntax, it returns ast.Node it generated (Gallery in this case), number of bytes consumed and inner content to be recursively parsed and inserted as children of ast.Container node.

Number of bytes consumed allows parser to skip the part parsed by your custom hook.

func parserHook(data []byte) (ast.Node, []byte, int) {
	if node, d, n := parseGallery(data); node != nil {
		return node, d, n
	}
	return nil, nil, 0
}

func newMarkdownParser() *parser.Parser {
	extensions := parser.CommonExtensions
	p := parser.NewWithExtensions(extensions)
	p.Opts.ParserHook = parserHook
	return p
}

Here’s the parser for :gallery syntax.

If current text starts with :gallery\n we expect a list of urls followed by an empty line.

var gallery = []byte(":gallery\n")

func parseGallery(data []byte) (ast.Node, []byte, int) {
	if !bytes.HasPrefix(data, gallery) {
		return nil, nil, 0
	}
	i := len(gallery)
	// find empty line that ends the block
	// TODO: should also consider end of document
	end := bytes.Index(data[i:], []byte("\n\n"))
	if end < 0 {
		return nil, data, 0
	}
	end = end + i
	// split into lines, each line is an image URL
	lines := string(data[i:end])
	parts := strings.Split(lines, "\n")
	res := &Gallery{
		ImageURLS: parts,
	}
	return res, nil, end
}

I will not cover how it gets rendered as HTML because it’s a very custom solution with lots of HTML obscuring the big picture.

Syntax highlighting

Code blocks are much better with syntax highlighting.

We can use github.com/alecthomas/chroma library to generate HTML with syntax highlighting for many languages.

We hook it up to the HTML renderer with a render hook function, as described above.

import (
	"fmt"
	"io"

	"github.com/gomarkdown/markdown"
	"github.com/gomarkdown/markdown/ast"
	mdhtml "github.com/gomarkdown/markdown/html"

	"github.com/alecthomas/chroma"
	"github.com/alecthomas/chroma/formatters/html"
	"github.com/alecthomas/chroma/lexers"
	"github.com/alecthomas/chroma/styles"
)

var (
	htmlFormatter  *html.Formatter
	highlightStyle *chroma.Style
)

func init() {
	htmlFormatter = html.New(html.WithClasses(true), html.TabWidth(2))
	if htmlFormatter == nil {
		panic("couldn't create html formatter")
	}
	styleName := "monokailight"
	highlightStyle = styles.Get(styleName)
	if highlightStyle == nil {
		panic(fmt.Sprintf("didn't find style '%s'", styleName))
	}
}

// based on https://github.com/alecthomas/chroma/blob/master/quick/quick.go
func htmlHighlight(w io.Writer, source, lang, defaultLang string) error {
	if lang == "" {
		lang = defaultLang
	}
	l := lexers.Get(lang)
	if l == nil {
		l = lexers.Analyse(source)
	}
	if l == nil {
		l = lexers.Fallback
	}
	l = chroma.Coalesce(l)

	it, err := l.Tokenise(nil, source)
	if err != nil {
		return err
	}
	return htmlFormatter.Format(w, highlightStyle, it)
}

// renders a code block with syntax highlighting; an error from htmlHighlight is ignored
func renderCode(w io.Writer, codeBlock *ast.CodeBlock, entering bool) {
	defaultLang := ""
	lang := string(codeBlock.Info)
	htmlHighlight(w, string(codeBlock.Literal), lang, defaultLang)
}

func myRenderHook(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
	if code, ok := node.(*ast.CodeBlock); ok {
		renderCode(w, code, entering)
		return ast.GoToNext, true
	}
	return ast.GoToNext, false
}

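The formatter is created with html.WithClasses(true), so the generated HTML marks up tokens with chroma CSS classes. You have to include chroma’s CSS, which you can generate with htmlFormatter.WriteCSS(w, highlightStyle).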

Pre-process markdown before parsing

Imagine you want to add ability to include markdown files.

You can build a parser extension that e.g. recognizes this syntax:

@include "foo/bar.md"

It’ll get complicated if you want to add more advanced functionality, like loops, variables etc.

But the text/template package already implements such features.

You can pre-process markdown with one of the many templating Go libraries before sending it to the parser.

Persisted Svelte store using IndexedDB

2023-03-09T00:00:00Z

I’m working on notepad2 for the web and I need a history of opened files that persists across browser sessions.

Since I’m using Svelte, having it available as a store makes sense.

This article describes how to implement a Svelte store whose values are persisted in IndexedDB.

What is Svelte store?

The simplest Svelte store is an object with a subscribe function.

You call subscribe to provide a callback function that will be called when the value changes:

let foo = makeSvelteStoreFoo();
foo.subscribe((v) => {
    console.log("New value of foo is:", v);
})

Inside a Svelte component, Svelte provides a nicer, less verbose syntax to use stores:

let foo = makeSvelteStoreFoo();
console.log("Value of foo is:", $foo);

Under the covers Svelte calls subscribe and makes it so $foo returns the latest value.

The $foo value is reactive so if you use it in a component like <div>{$foo}</div>, it’ll be automatically re-rendered when the value changes.

Creating a store

Typically stores are created by a function returning a store object:

function makeSvelteStoreFoo() {
    function subscribe(subscriber) {
        ...
        function unsubscribe() { ... }
        return unsubscribe;
    }
    return {subscribe: subscribe};
}

subscribe returns an unsubscribe function that you call when you no longer need to observe changes to the value.

Svelte does that for you if you use the $foo syntax: it unsubscribes when the component is destroyed.

Writable store

The above store is read-only: it provides values but you can’t change it.

To make the store writable you also need to return a set(newValue) function.

function makeSvelteStoreFoo() {
    function subscribe(subscriber) { ... }

    function set(newValue) { ... }
    return {subscribe: subscribe, set: set};
}
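Filled in, the two skeletons above fit together roughly like this. It's a minimal in-memory sketch, not the article's final store: makeCounterStore is a hypothetical name and nothing is persisted. The last lines do by hand roughly what Svelte does for $count in a component: subscribe, keep the latest value, and unsubscribe when done.

```javascript
// Minimal in-memory writable store following the Svelte store contract.
// makeCounterStore is a hypothetical name; nothing is persisted.
function makeCounterStore(initialValue) {
  let value = initialValue;
  const subscribers = new Set();

  function subscribe(subscriber) {
    subscribers.add(subscriber);
    subscriber(value); // contract: deliver the current value synchronously
    return function unsubscribe() {
      subscribers.delete(subscriber);
    };
  }

  function set(newValue) {
    value = newValue;
    subscribers.forEach((cb) => cb(value)); // notify every subscriber
  }

  return { subscribe, set };
}

// Roughly what Svelte does for `$count` inside a component:
// subscribe, keep the latest value, unsubscribe when the component goes away.
const count = makeCounterStore(0);
const seen = [];
const unsubscribe = count.subscribe((v) => seen.push(v));
count.set(1);
count.set(2);
unsubscribe();
count.set(3); // not observed: we already unsubscribed
console.log(seen); // [ 0, 1, 2 ]
```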

Global store vs. multiple instances of store

Some stores can have multiple instances. In that case you’ll export function makeSvelteStoreFoo() and call it to get a new instance of the store.

Some stores are global and should only have one shared instance. In that case you’ll only export the single global instance:

function makeSvelteStoreFoo() { ... }

export let foo = makeSvelteStoreFoo();
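To see the difference between the two approaches, here is a small sketch with a hypothetical makeStore factory and a read helper (the same idea as get from svelte/store: subscribe, capture the value, unsubscribe). Everything lives in one file, so the exported global is simulated by a single shared const.

```javascript
// Hypothetical minimal store factory with the same contract as above.
function makeStore(initialValue) {
  let value = initialValue;
  const subscribers = new Set();
  return {
    subscribe(cb) {
      subscribers.add(cb);
      cb(value); // synchronous first call
      return () => subscribers.delete(cb);
    },
    set(v) {
      value = v;
      subscribers.forEach((cb) => cb(value));
    },
  };
}

// Read the current value by subscribing and immediately unsubscribing.
function read(store) {
  let current;
  const unsubscribe = store.subscribe((v) => (current = v));
  unsubscribe();
  return current;
}

// Multiple instances: each factory call has its own state.
const editorA = makeStore("untitled");
const editorB = makeStore("untitled");
editorA.set("notes.txt");
console.log(read(editorA), read(editorB)); // notes.txt untitled

// Global store: created once and shared. In a real module this would be
// `export const theme = makeStore("light");`; a module is evaluated once,
// so every importer gets this same object.
const theme = makeStore("light");
const themeInHeader = theme; // what one importing component sees
const themeInFooter = theme; // what another importing component sees
themeInHeader.set("dark");
console.log(read(themeInFooter)); // dark
```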

Persisted store with values backed by IndexedDB

If you want your store to survive across browser sessions, you need to persist it somehow.

I like IndexedDB and the lightweight idb helper library.

IndexedDB supports more data types than localStorage, which only handles strings.

I create a single database for all my key-value needs and have each Svelte store persist its value under a unique database key.

Here’s the structure of such a store:

function makeStore() {
  function set(v) {}

  function subscribe(subscriber) {
    function unsubscribe() {}
    return unsubscribe;
  }

  return { set, subscribe };
}

Let’s fill out the implementation.

First, we need a database for the backing store:

import { KV } from "../dbutil";
const db = new KV("np2store", "keyval");

This is my global database for all key-value data, including my persisted Svelte store. See below for implementation of KV.

We use a unique database key for each store.

We keep the value in memory in a variable, which is more efficient than re-reading it from the database.

When creating a store we read the initial value from the database:

function makeStore() {
  const dbKey = "browse-folders";
  let curr = [];

  db.get(dbKey).then((v) => {
    curr = v || [];
    broadcastValue();
  });
}

The initial value is [] because this store happens to store an array so we want the right “unset” value.

broadcastValue informs all subscribers about the change in value of the store:

function makeStore() {
  const subscribers = new Set();

  function broadcastValue() {
    subscribers.forEach((cb) => cb(curr));
  }
}

We store subscriber callback functions in a Set. Here’s how subscribe() is implemented:

function makeStore() {
  const subscribers = new Set();

  function subscribe(subscriber) {
    subscriber(curr);
    subscribers.add(subscriber);
    function unsubscribe() {
      subscribers.delete(subscriber);
    }
    return unsubscribe;
  }
}

The contract for a Svelte store is that upon subscription we must synchronously call the subscriber callback to immediately provide the current value.

And finally the set(newValue) function:

function makeStore() {
  function set(v) {
    curr = v;
    broadcastValue();
    db.set(dbKey, v);
  }
}
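Pieced together, the fragments above make a complete single-tab store. This sketch swaps the IndexedDB-backed KV for a hypothetical Map-based MemoryKV with the same async get/set shape, so it runs outside the browser; the store logic follows the article's fragments.

```javascript
// Hypothetical stand-in for the KV class: same async get/set shape,
// backed by a Map so the sketch runs outside the browser (e.g. in Node).
class MemoryKV {
  constructor() {
    this.map = new Map();
  }
  async get(key) {
    return this.map.get(key); // undefined for a missing key, like idb
  }
  async set(key, val) {
    this.map.set(key, val);
  }
}

const db = new MemoryKV();

function makeStore() {
  const dbKey = "browse-folders";
  let curr = [];
  const subscribers = new Set();

  function broadcastValue() {
    subscribers.forEach((cb) => cb(curr));
  }

  // Load the persisted value; until it arrives subscribers see [].
  db.get(dbKey).then((v) => {
    curr = v || [];
    broadcastValue();
  });

  function subscribe(subscriber) {
    subscriber(curr); // synchronous first call required by the contract
    subscribers.add(subscriber);
    return function unsubscribe() {
      subscribers.delete(subscriber);
    };
  }

  function set(v) {
    curr = v;
    broadcastValue();
    db.set(dbKey, v); // persist in the background
  }

  return { set, subscribe };
}

async function demo() {
  await db.set("browse-folders", ["/src"]); // pretend a previous session saved this
  const folders = makeStore();
  const seen = [];
  folders.subscribe((v) => seen.push(v));
  await new Promise((resolve) => setTimeout(resolve, 0)); // let db.get() resolve
  folders.set(["/src", "/docs"]);
  // seen: [] (initial), then ["/src"] (loaded), then ["/src", "/docs"] (set)
  return { seen, saved: await db.get("browse-folders") };
}

const demoDone = demo();
```

One thing the sketch makes visible: a set() that happens before the initial db.get() resolves would be overwritten in memory by the loaded value, which is why the demo waits a tick before writing.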

Changes in another browser tab

What if the value is changed in another browser tab?

If you open the same website in two tabs, they both write to the same underlying database but neither knows about the other’s changes, so each can overwrite data changed by the other instance.

It turns out localStorage has a feature that allows us to fix that: the storage event notifies us about changes to localStorage made by an instance in another tab. It fires in the other tabs of the same origin, not in the tab that made the change.

I use a unique localStorage key to notify the other tabs about changes in our store.

function makeStore() {
  const dbKey = "browse-folders";
  const lsKey = "store-notify:" + dbKey;

  function getCurrentValue() {
    db.get(dbKey).then((v) => {
      curr = v || [];
      broadcastValue();
    });
  }

  getCurrentValue();

  /**
   * @param {StorageEvent} event
   */
  function storageChanged(event) {
    if (event.storageArea === localStorage && event.key === lsKey) {
      getCurrentValue();
    }
  }
  window.addEventListener("storage", storageChanged, false);

  function set(v) {
    curr = v;
    broadcastValue();
    db.set(dbKey, v).then(() => {
      // notify other browser tabs
      const v = +localStorage.getItem(lsKey) || 0;
      localStorage.setItem(lsKey, `${v + 1}`);
    });
  }
}
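Node has no localStorage or storage event, so here is a hedged simulation of the two-tab flow. All names (makeTab, makeTabStorage, onStorage, sharedDb) are hypothetical stand-ins: a shared Map plays IndexedDB, and a write to the fake localStorage notifies every other tab, mirroring the browser behavior described above.

```javascript
// Shared database, standing in for IndexedDB (one per origin).
const sharedDb = new Map();
const db = {
  async get(key) { return sharedDb.get(key); },
  async set(key, val) { sharedDb.set(key, val); },
};

// Shared localStorage data plus a list of open tabs. Like the browser,
// a write notifies every OTHER tab, and only if the value actually changed.
const lsData = new Map();
const tabs = [];
function makeTabStorage(tab) {
  return {
    getItem: (key) => (lsData.has(key) ? lsData.get(key) : null),
    setItem(key, value) {
      const changed = lsData.get(key) !== String(value);
      lsData.set(key, String(value));
      if (!changed) return;
      for (const other of tabs) {
        if (other !== tab) other.onStorage({ key });
      }
    },
  };
}

function makeTab(dbKey) {
  const tab = {};
  const localStorage = makeTabStorage(tab);
  const lsKey = "store-notify:" + dbKey;
  let curr = [];
  const subscribers = new Set();

  function broadcastValue() {
    subscribers.forEach((cb) => cb(curr));
  }
  function getCurrentValue() {
    db.get(dbKey).then((v) => {
      curr = v || [];
      broadcastValue();
    });
  }
  // Plays the role of window.addEventListener("storage", storageChanged).
  tab.onStorage = (event) => {
    if (event.key === lsKey) getCurrentValue();
  };
  tabs.push(tab);
  getCurrentValue();

  tab.store = {
    subscribe(cb) {
      cb(curr);
      subscribers.add(cb);
      return () => subscribers.delete(cb);
    },
    set(v) {
      curr = v;
      broadcastValue();
      // Returned only so the demo can wait for the write to finish.
      return db.set(dbKey, v).then(() => {
        const n = +localStorage.getItem(lsKey) || 0;
        localStorage.setItem(lsKey, `${n + 1}`); // wakes up the other tabs
      });
    },
  };
  return tab;
}

const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

async function demo() {
  const tabA = makeTab("browse-folders");
  const tabB = makeTab("browse-folders");
  await tick(); // let both tabs finish their initial load
  let seenInB;
  tabB.store.subscribe((v) => (seenInB = v));
  await tabA.store.set(["/src"]); // written in tab A...
  await tick(); // ...then tab B re-reads the database
  console.log(seenInB); // [ '/src' ]
  return seenInB;
}

const demoDone = demo();
```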

Let’s dissect this tricky line:

const v = +localStorage.getItem(lsKey) || 0;

To ensure the value changes on every write, we implement a numeric counter: setting a key to the value it already has doesn’t fire a storage event. localStorage stores strings, so we need to convert string => number on reading and number => string on writing.

+foo converts whatever foo is to a number. If foo is the string "42", +foo returns the number 42. If the key doesn’t exist yet, localStorage.getItem returns null and +null is 0.

For non-number strings it returns NaN (a special number value indicating Not A Number).

NaN is falsy, so NaN || 0 converts NaN to 0.
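The same expression, isolated so you can check each case. nextCounter is a hypothetical helper name, not part of the store above.

```javascript
// The counter trick: parse whatever localStorage returned, fall back to 0,
// and produce the next value as a string.
function nextCounter(stored) {
  const n = +stored || 0; // NaN (and 0) are falsy, so they become 0
  return `${n + 1}`;
}

console.log(+"42");               // 42
console.log(+"abc");              // NaN
console.log(+null);               // 0: getItem() returns null for a missing key
console.log(nextCounter(null));   // 1
console.log(nextCounter("41"));   // 42
console.log(nextCounter("oops")); // 1
```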

Factory function to easily create stores

You could now piece together the whole implementation and create your own stores based on this template.

Then you would notice that they share most of the code. The differences are:

database key used to persist the value
initial value

We can abstract this into a factory function that creates the store for a given key and initial value.

function makeIndexedDBStore(dbKey, initialValue, crossTab) { ... }

export const foo = makeIndexedDBStore("foo", 0, false);
export const bar = makeIndexedDBStore("bar", [], true);

Here’s our makeIndexedDBStore():

/**
 * Create a generic Svelte store persisted in IndexedDB
 * @param {string} dbKey unique IndexedDB key for storing this value
 * @param {any} initialValue
 * @param {boolean} crossTab if true, changes are visible in other browser tabs (windows)
 * @returns {any}
 */
function makeIndexedDBStore(dbKey, initialValue, crossTab) {
  function makeStoreMaker(dbKey, initialValue, crossTab) {
    const lsKey = "store-notify:" + dbKey;
    let curr = initialValue;
    const subscribers = new Set();

    function getCurrentValue() {
      db.get(dbKey).then((v) => {
        // idb's get() resolves to undefined when the key is missing
        curr = v === undefined ? initialValue : v;
        subscribers.forEach((cb) => cb(curr));
      });
    }

    getCurrentValue();

    /**
     * @param {StorageEvent} event
     */
    function storageChanged(event) {
      if (event.storageArea === localStorage && event.key === lsKey) {
        getCurrentValue();
      }
    }
    if (crossTab) {
      window.addEventListener("storage", storageChanged, false);
    }

    function set(v) {
      curr = v;
      subscribers.forEach((cb) => cb(curr));
      db.set(dbKey, v).then(() => {
        if (crossTab) {
          const n = +localStorage.getItem(lsKey) || 0;
          localStorage.setItem(lsKey, `${n + 1}`);
        }
      });
    }

    /**
     * @param {Function} subscriber
     */
    function subscribe(subscriber) {
      subscriber(curr);
      subscribers.add(subscriber);
      function unsubscribe() {
        subscribers.delete(subscriber);
      }
      return unsubscribe;
    }

    return { set, subscribe };
  }
  return makeStoreMaker(dbKey, initialValue, crossTab);
}

KV store

Here’s a helper class for a key-value store using the idb library.

import { openDB } from "idb";

export class KV {
  dbName;
  storeName;
  dbPromise;

  constructor(dbName, storeName) {
    this.dbName = dbName;
    this.storeName = storeName;
    this.dbPromise = openDB(dbName, 1, {
      upgrade(db) {
        db.createObjectStore(storeName);
      },
    });
  }

  async get(key) {
    return (await this.dbPromise).get(this.storeName, key);
  }
  async set(key, val) {
    return (await this.dbPromise).put(this.storeName, val, key);
  }
  async del(key) {
    return (await this.dbPromise).delete(this.storeName, key);
  }
  async clear() {
    return (await this.dbPromise).clear(this.storeName);
  }
  async keys() {
    return (await this.dbPromise).getAllKeys(this.storeName);
  }
}

Find programming work by increasing luck surface area

2022-06-29T00:00:00Z

Antonio asked on HN: How do I earn a small amount of money to sustain myself as a developer?

I wrote a response centered around increasing luck surface area.

This essay expands on it because I’ve seen this a few times now: good developers asking how to find work while botching the basics.

First a caveat: it only works if you are a good programmer.

I’ve looked at Antonio’s GitHub account and his website and I’m convinced he is a good programmer.

He created 3 non-trivial Mac OS apps, written in web technologies / React and packaged with Electron.

He wrote non-trivial open-source React libraries.

His code looks good.

He made a good-looking personal website and websites for his apps.

A developer writing good code with proven ability to ship products should be in high demand.

So why isn’t there a line of companies waiting to hire him?

It’s the basics

I believe that many capable people don’t succeed as much as they could, not because they fail to do something brilliant but because they fail at the basics.

The basics they are not aware of. The unknown (to them) unknowns.

Basics for programmers looking for freelance / consulting jobs

If you’re a programmer looking for a freelance or consulting job, what are the absolute basics?

A potential employer should be able to learn about what you offer, at what price and why you’re the right person for their job.

Therefore you should have a decent website with a page that describes what work you can do, how much you charge and proof that you’re good at what you do.

I’m no longer looking for work, because I’m currently working on Filerion, a web-based file manager for online storage like Dropbox and S3.

When I was looking for a job, I was promoting my “I’m a Go consultant for hire” page.

I’m not claiming it’s the best self-promoting page ever.

I’m claiming that you get 80% of value from having a decent page in the first place. The remaining 20% would be in refining your pitch.

The important basics of such page:

it should be short and concise; this is a business proposal, not a novella
a way to contact you i.e. your e-mail address
describe what you can do. In my case it was programming in Go
provide proof that you’re good at it. In my case I linked to my past writing about Go, my open source Go libraries and a Go contract I successfully completed
social proof (list of well-known companies I worked at)
my time zone because it’s important in remote work
and a photo for that human connection

Antonio created more than enough stuff to show that he is a capable developer. He should create a similar web page pitching himself as a web or React or Mac developer.

Here are other people’s pages:

Increasing luck surface area

You went to a React meetup, started a conversation with a random person. You talked about your React project, that random person needed a React developer and you got a contract gig out of that conversation.

That’s luck.

I wouldn’t recommend striking up conversations with random people at meetups as the best way to find programming jobs.

You can’t control luck but you can increase your luck surface area.

Did you notice how a few paragraphs above I mentioned Filerion, a web-based file manager for online storage like Dropbox and S3?

It’s an example of increasing the luck surface area for Filerion.

Oops, I did it again. It’s getting recursive in here.

This post started as a comment on HN and I decided to turn it into a blog post. I believe it’s useful for people so I’ll promote it on HN when completed. It might or might not get upvoted but I’ll increase my luck surface area by posting it there.

Most of these things will not lead to job offers but all you need is one job to make this worthwhile.

Listen to Wayne Gretzky: you miss 100% of the shots you don’t take.

Here are a few ideas for increasing luck surface area for programmers looking for a freelance job:

make a link to your “hire me” page very visible on your website. Don’t be afraid to make it “in your face” big. It should be on every page of your website and it should not be missed
put your pitch in every place that allows it. Just a short “Hire me for X.” Where to put it?
- your Hacker News profile
- your Twitter profile
- your GitHub profile
- your linkedin profile
- your reddit profile
- a footnote on every page on your website
- readme of every open source project you publish
if appropriate, in your social comments. Every week there’s a HN discussion about terrible hiring practices in software. I’m sure you could make non-spammy comments that include “BTW: if you’re looking for an expert X programmer, I’m available”
- all that is low probability but also low effort and it does increase your luck surface area
research freelancing / remote job platforms and, if they fit your desired job, create an account and start applying for jobs:
- upwork
- fiverr
- remoteok.com
- codementor
- every jobs site and forum
- sub-reddits like https://www.reddit.com/r/javascriptjobs/
- craigslist has a jobs section
- use google to find more freelancing / contracting websites

Have a process

So you did all the low effort stuff described above. What’s next?

One of the things that separates professionals from amateurs is having a process.

Marketing yourself as a contractor is no different than marketing anything else:

create leads
convert leads into clients

The “hire me” web page is how you convert leads into clients.

How do you get more leads?

Professional marketers have a process.

A set of known techniques for generating leads: buy TV ads, buy newspaper ads, buy online ads, pay influencers to promote your stuff, throw a conference, publish a white paper, promote on social media.

Professional marketers are not paid because only they know how to buy ads on Google.

They are paid for knowing a wide array of techniques, knowing which technique applies in a given context and simply spending hours doing the work and being better at it than someone who isn’t doing it full time.

Here’s how programmers looking for programming jobs can generate leads.

Publish technical articles

Write articles on technologies you’re an expert on and promote them on relevant social media.

The process is:

write an article and publish on your website (or medium or dev.to or all the above)
the article must create value. Self promotion can only be a cherry on top, not the sundae
tweet about it
promote on relevant forums and websites

I used to write articles about Go and promote them on r/golang. Some of them would end up on Hacker News even without me submitting them.

Publish open source packages

More time-consuming than writing an article, but if you’re a full-time programmer, especially in the JavaScript / React ecosystem, you can often extract bits and pieces of functionality as open source packages.

In package readme link to your “hire me” web page.

Provide value, increase your luck surface area.

Don’t forget to promote the packages the same way you promote articles (tweet about it, post in relevant reddit groups etc.).

Build online tools

Build a tool / program that provides value and link to your “hire me” web page.

This takes more effort than writing an article but if it provides value, it will be a constant source of new leads.

See this linear interpolator for an example of a tool that wasn’t hard to build and has a good call to action.

Turn experience into lead generating artifacts

All this seems like a lot of work.

If you’re a full-time programmer you’re doing a lot of programming.

That work is experience.

You can sometimes cheaply turn that experience into artifacts.

Here are examples of how I turned experience into lead generating artifacts:

I did a contract job porting a large Java library to Go. I spent 600 hours on it. After the contract was finished I spent a few more hours writing the Lessons learned porting 50k loc from Java to Go article. It was very popular on both Reddit and Hacker News. I spent 600 hours doing the work and gaining valuable experience. The cost of writing the article was, comparatively speaking, very low: just a few hours to get thousands of people to read it. You can see that the majority of my recent articles follow this pattern of turning experience into articles
I had an idea of using Notion as a CMS for my blog. Notion at the time didn’t offer an API so I reverse engineered their protocol and wrote Go code to build a blog out of content in Notion. It took many, many hours. I spent a fraction of that time to generate lead generation artifacts from that experience
- an article about the process of reverse engineering the API
- an open-source library for accessing the Notion API. Again, I had already spent many hours doing the work of reverse engineering and writing the code. It took a relatively small number of additional hours to extract that into an open-source library

Look back at your own past programming work.

Can you write an article about something you did?

Can you extract part of the code as an open-source library?

Can you summarize your experience and provide insight?

Goto 1

Notice that this is now a process: a set of actions you can repeat over and over again.

There’s an infinite number of useful articles, open source packages or online tools that you can create.

The only limit to how many leads you can generate using this process is the number of hours you’re willing to spend.

Don’t do it just once.

Set aside a certain number of hours per week dedicated to marketing yourself.

A decent goal would be to write and promote one article per week.

Stay focused

If you’re selling yourself as a Go programmer, all your articles, open source packages and tools should be related to Go.

Not Java, not Rust. Until you have more job offers than you can handle, everything you write should plausibly generate leads for your Go software contracting.

I’m sure many people will look at the above suggestions and say: that sounds like a lot of work, I don’t have that kind of time.

Many of those people browse Twitter for hours, write Hacker News comments that will be forgotten in a day etc.

Just recently a gentleman posted his super busy schedule as a working programmer and a father as a comment on Hacker News. He was trying to show that, due to lack of time, it would be impossible for him to do a take-home coding assignment when looking for a job.

I’m not defending take home assignments but somehow his schedule didn’t include the time he wasted reading and commenting on Hacker News. And I’m pretty sure there’s a little bit of Twitter or Reddit or Facebook usage in there as well. Not to mention the 3 hours of tv watching of an average American adult.

A lot of people could wrangle a few hours a week by cutting down on social media or TV and use those hours for focused self-promotion.

Job search as a problem to be solved

We, the developers, are problem-solving machines.

We encounter problems daily and we solve them.

How do we solve them? Mostly by formulating the right question and entering it, verbatim, into a search engine.

For Filerion, my web-based file manager for online storage like Dropbox and S3, I needed to implement file upload via drag & drop.

I knew it’s possible because I’ve used many websites that are doing it but I had no clue how to do it.

How did I figure it out? I asked Google: “javascript drag and drop file”.

What do you know: plenty of people have already solved that problem and wrote tutorials. All I had to do is to follow them and do the work.

A simple matter of asking the right question

This is my foolproof system of solving any problem:

formulate a question
enter it into a search engine

I’m only half joking. 80% of the work is asking the right question.

Antonio did the first step by formulating the question and asking on Hacker News.

He got a bunch of good suggestions, but he would get even more simply by entering similar questions into a search engine.

“How to get a freelance job”. “successful freelancer”. “successful contracting”. “increase contracting rates”. “successful upwork freelancing”.

This is just a sampling of possible questions related to getting a job as a freelancer / contractor / consultant.

Specialize

It’s counter-intuitive but you should specialize.

When you’re looking for a programming contract it’s better to narrow down your pitch.

“Backend developer” is better than “Developer” or “Full-stack developer”

“Go backend developer” is better than “Backend Developer” or “Go developer”.

“MySQL query optimization expert” is better than “Go backend developer”.

It’s counter-intuitive because why would you want to appeal to fewer potential employers?

If there are a million jobs for Java and a million jobs for JavaScript, isn’t it better to pitch yourself as a Java OR JavaScript developer?

No, it isn’t.

You only need one job. One employer.

Someone looking for a Java developer will likely pick someone who pitches themselves as expert in Java, not a Java / JavaScript expert.

Same for someone looking for a JavaScript developer.

Your pitch targets fewer potential employers but it has a much greater chance of appealing to that particular employer.

Furthermore, if you have 10 hours to invest, it’s better to invest all 10 hours into showcasing your JavaScript skills than to split them into 5 hours of showing Java skills and 5 hours of showing JavaScript skills.

Level up

There’s an abundance of free or cheap advice on improving your freelancing game.

Blog posts, books, YouTube videos, Udemy courses, podcasts.

Just today on HN there’s a link to an interview with a successful consultant.

You don’t have to wait for material like this to serendipitously show up on HN.

Formulate a question and enter it into a search engine or YouTube or Amazon website (for books) or podcast search engine.

Kaizen

The Japanese have the best words. One of those words is kaizen. The idea of continuous improvement.

The above suggestions might be overwhelming and seem like too much work to do at once.

Start with the basics: a “hire me” web page linked from your Twitter / GitHub etc. profiles.

Then gradually create more lead generation artifacts.

Write one blog post. Then another one. And another one. Kaizen, my friend.

I will coup whoever I want

This is my website so I can shill whatever I want.

I’m working on Filerion, a web-based file manager for online storage services like Dropbox and S3.

I’m working in the open and documenting the process of creating a software product from scratch on a day to day basis.

Follow along if you want to know what it takes to build a software product: technologies used, marketing activities and general musings related to programming and productivity.

Extreme #include discipline for C++ code

2022-04-12T00:00:00Z

C++ takes long to compile

There is more than one reason for it but one of the reasons is excessive re-parsing of the same .h header files.

In SumatraPDF I’m using an extreme #include discipline to keep compilation times in check.

The rule is simple: a .h file cannot #include other .h files.

I didn’t come up with this idea, I got it from Rob Pike: http://doc.cat-v.org/bell_labs/pikestyle

I’ve been following this rule for several years in SumatraPDF, a medium sized C++ project of over 100k loc. It works.

“It works” is more important than it seems. Many ideas seem great on paper but fail in practice. Name an economically successful communist country.

Don’t get me wrong: the price of minimizing compilation times is eternal vigilance.

Writing C++ while following that rule is annoying.

In code, things depend on other things. If a struct in foo.h depends on struct in bar.h a quick fix is to #include "bar.h" in foo.h.

You do it once and it works.

Done once and for all: in your foo.c you just include foo.h and it brings in bar.h.

That convenience comes with a hidden price. Imagine you have foo2.h that also depends on bar.h so you also #include "bar.h" in foo2.h.

You then #include "foo2.h" in foo.c and bang! You just included and parsed bar.h twice.

In real C++ codebases the same headers are unnecessarily re-included and re-parsed hundreds of times.

It’s a known problem. We try to mitigate it with #ifndef include guards, #pragma once etc. but in my experience those band-aids don’t solve the problem.

Following Rob Pike’s rule, we must #include "bar.h", "foo.h" and "foo2.h" in foo.c, in the correct order.

The “correct order” part is what makes it annoying.

Let’s face it: a month after writing foo.h I no longer remember that it depends on bar.h.

So the way it goes is:

I #include "foo.h" in brand_new.cpp file
I get a compilation error: what is this Bar you're referring to?
I dig around and figure out that Bar is a struct defined in bar.h so I #include "bar.h" before foo.h
I get another compilation error: what is that Bar2 you speak of? This could be an unmet dependency of foo.h or of the newly included bar.h
I keep adding 10 more #include to satisfy their cascading dependencies

What used to be a simple #include "foo.h" can end up a lengthy game of #include whack-a-mole.

So beware: following this extreme rule will be occasionally painful.

I wasn’t following this rule from the beginning. A refactor of SumatraPDF code to follow it was painful.

I find this price is worth paying and not just because of shorter compilation times.

It also forces me to design better (simpler) dependencies.

Entropy is real. Complexity grows but our heads remain small.

In large programs you have hundreds of structs, classes, functions, enums and they form a complex web of dependencies.

It’s way too much to fully understand at once so we get sloppy, we take shortcuts just to get that damn thing to compile.

Over time the sloppiness accumulates and we might end up with an inter-dependent, circular mess. You just want to #include "Button.h" and somehow it ends up bringing in NuclearPowerPlant.h

I did that in my own code. Once things get tangled, it’s really hard to untangle them.

The chaos wins.

Don’t let chaos win. Be in control.

I don’t think I’ve ever seen a C++ code base that follows this rule.

This makes me either a madman or a genius.

An idea for reducing compilation times that has more awareness (but also not much adoption in actual code bases) is the pimpl (pointer to implementation) idiom.

I’m not using it because it requires writing more code. That is not a price I’m willing to pay.

@levelsio and survivorship bias

2021-10-20T00:00:00Z

Pieter Levels is a prolific maker of software.

He’s also a very successful maker of software: he’s close to making $1.5 million a year from his business, almost all of it profit.

Almost all of it is his profit since for most of the time he was a sole developer / marketer / copy writer, with a part-time sysadmin for ensuring the server stays alive. Recently he hired customer support and chat moderator.

This level of success attracts attention.

It also attracts inevitable claims of survivorship bias. The gist of it is: if you do things he did, you won’t have the same level of success.

I think it’s a very defeatist attitude. Life is hard, there is no step-by-step guide to $1.5 million a year business.

You can, however, do things right or you can do them badly.

To create a successful business you need to do more things right than badly.

Let’s call it Do Things Right bias.

Let’s dissect Pieter Levels’s journey to see what he did right and how that differed from doing things badly.

I’ve been following Pieter for a long time and he has created a large body of tweets and blog posts, a book and a few YouTube talks, most of it related to his journey from making $0 a year to $1.5 million a year.

People say you can learn from other people’s mistakes. It’s even better to learn from other people’s successes.

What did Pieter do right?

Pieter ships products

Here’s a list of his projects going back 11 years. Limiting the list to only software products:

2010: Uber clone
2012: dating site for college campuses, YouTube network
2013: YouTube analytics
2014: Slack community for digital nomads, GIF Book, Nomad Jobs, Nomad List 1.0, Go Fucking Do It, Play My Inbox
2015: Startup Retreats, Taylor Bot (a Telegram bot), Nomad List 2.0, Remote OK
2016: Virtual Reality, Places to Work, learning 3D modelling
2017: Nomad List 3.0, Nomad Gear, Mute, Hoodmaps
2018: Nomad List FIRE Calculator, MAKE Book, Maker Rank, No More Google
2019: Bali Sea Cable, Nomad List 5, How Much Is My Side Project Worth
2020: QR Menu Creator, IdeasAI, Remote OK Workers, Airline List, Nomad List Climate Finder
2021: Rebase, Inflation Chart, MAKE Book NFT

You can see him building a product from the first line of code to the front page of Reddit.

Doing it badly

Not shipping products.

Commenting on Hacker News about how you couldn’t possibly take time away from your precious Hacker News commenting to write some code instead.

Pieter is persistent

He’s clearly persistent.

His overnight success was 11 years in the making.

A lot of his projects failed but he didn’t give up.

He kept making more.

He creates new things even when he doesn’t have to because Nomad List and RemoteOk bring in more money than he needs.

Doing it badly

Abandoning all work after first failure and using survivorship bias as a convenient excuse.

Pieter understands compound interest

If you have a profitable business, there are 2 ways to make even more money:

start a new business
sell more stuff to your existing customers

The second option tends to be an easier and more reliable way to increase revenue and profits.

The hardest thing in business is getting attention of potential customers.

It’s much easier to get attention of customers you already have.

While Pieter starts new projects, he’s smart enough to mostly do things related to his most successful business: Nomad List.

He keeps improving Nomad List.

He built a jobs website for remote workers and does other related projects like Nomad List Climate Finder.

This is similar to the magic of compounded interest in investing.

Doing it badly

Not doubling down on things that work. Chasing distractions.

Pieter abandons projects that are not working

You need to be persistent, but in a smart way.

You’ll know when you have a product market fit.

Nomad List started as a Google spreadsheet that Pieter created for himself and accidentally left unprotected.

Within days people were adding more information to it. That was a clear sign of interest so he turned it into a website.

All projects start as weakling babies that need to be initially nurtured to health. But if they never manage to get up on their feet, you have to throw them in the river.

Ok, that was an exceedingly bad analogy.

Doing it badly

Sticking with a product that doesn’t perform instead of creating a new project that just might.

Pieter charges money

Freemium is a popular tactic for getting users.

Many people find it psychologically hard to charge for software.

Those are potential reasons why people don’t charge for their products.

Not Pieter. He asks for money early and he asks for a lot. His book is $49 (and I bought it). Nomad List is $199.

The only real validation is people paying for your product.

Doing it badly

Not charging at all, not charging early, not charging enough.

Pieter ships quickly

Nomad List started as a Google spreadsheet.

Pieter turned it into a simple website and made that simple website available immediately.

He then kept working on it and improving it.

This is true of all his projects: ship the smallest thing that provides value, promote it, double-down on things that are working.

Doing it badly

Spending a year writing that web app without showing it to anyone.

Pieter does a lot of marketing and promotion

From early days he was promoting his stuff on his blog, Hacker News, Product Hunt, Reddit, Twitter etc.

There’s a thin line between promotion and spamming. Don’t cross it.

Most people, especially people good at programming, neglect promotion and marketing.

Some say that you should spend as much time on marketing as you do on coding the product.

I don’t know what the right amount is but it’s not zero.

Doing it badly

Not spending any time on promotion and marketing.

Not writing blog posts related to your product.

Not tweeting, not trying to promote your writing on Hacker News or Reddit or Quora.

If a tree falls in the forest and there’s no one to see it, write a blog post about it and post it on Hacker News.

Pieter is good at promotion and marketing

I’m not sure if this can be taught, but Pieter seems to have a natural gift for storytelling and promotion.

Consider these two examples:

2014: I’m launching 12 Startups in 12 Months.
- “12 startups in 12 months” is a much more interesting story than “I’m starting a startup”.
- This is important because if you submit “I’m starting a startup” blog post to Hacker News, it’s unlikely to get traction. But “12 startups in 12 months” - it very well might.
- People are attracted to new, never-been-done-before.
- This trope is now over-used so you can’t just copy it. But this is a big world and you don’t have to be absolutely first and only. Pieter was inspired by the 180 websites in 180 days project.
How I went from 100 to 0 things.
- He was robbed.
- It happens to many people, but most people don’t blog about it and promote that blog post on Hacker News.

It’s more than a catchy headline, though.

The content of those blog posts is good and goes beyond the obvious.

Sharing a lousy story will not do you any good.

Doing it badly

Not spending any time on promotion and marketing.

On the other hand, there are people who simply spam with low-effort, low-quality posts. That is not going to work either.

Pieter is productive

The number of projects he shipped is impressive.

The depth of some of his projects is impressive.

It’s not inhumanly great, though.

When you consider he did it over 10 years, it’s more than doable.

The difference between “doable” and “done” is measured in hours spent in front of the monitor, writing code, testing code, learning new things, applying those learnings.

Most people just don’t spend enough time and / or don’t spend the time productively.

Doing it badly

Writing a book is a time consuming activity.

Writing software is a time consuming activity.

There’s no way around spending a lot of time on your business (occasional exceptions notwithstanding).

Pieter learned programming by himself

This is not about self-taught vs. going to college.

This is about doing what’s necessary.

Pieter had ideas for software but didn’t know how to program.

For most people that’s an insurmountable problem. It’s learned helplessness.

All you need to learn programming is out there, for free. On YouTube, on blogs.

You just need to start learning and keep going.

You can read what Pieter says about the DIY ethos.

Doing it badly

Trying to outsource programming to contractors or someone you find on Upwork, fantasizing about raising VC money and hiring programmers, trying to find a technical cofounder who will do all the work for 10% of the company.

Some people succeed by hiring programmers or contractors.

Most people don’t, because most people suck at managing programmers even more than the Winklevoss brothers.

If you are a broke college student, you just don’t have the money to hire people.

Pieter copies best practices

We are not born being the best programmers, best marketers, best writers.

There is no step-by-step guide to becoming the person that creates great products and markets them well.

But there are many businesses that do things well.

You can deconstruct what they do well and copy it.

It requires an attentive, inquisitive mind.

Let’s take https://makebook.io/ as an example. Do you see what Pieter did well there?

Go ahead, look some more. What did he do well on that website?

Answer: social proof.

At the top: “Product Hunt’s Book of the year”. At the bottom: twitter testimonials of people fawning over his book.

Does your business use social proof to make people more likely to buy?

This is just one of many things that a business can do well.

For another example: go to https://nomadlist.com/ and try to sign up for Nomad List.

It’s a master class in how to convert a visitor into a customer.

Figuring out why it makes for a great conversion funnel is left as an exercise for the reader.

Doing it badly

Never asking yourself: how can I do X better? How can I make my business better?

Not noticing the best practices that other businesses use.

Not learning from blogs, books, tweet storms.

Not applying the things you learn to your business.

Pieter is not a technologist

It’s counter-intuitive that not being a technologist i.e. someone impressed by and interested in new technologies, can be an advantage when running a software business.

He built his $1.5 million/year empire on PHP, SQLite, and a single VPS.

Those are not technologies that will impress anyone.

PHP is 27 years old. It might be older than you. It sure isn’t Rust or Swift.

SQLite is 21 years old and everyone knows that a serious web service needs to use a serious database, like PostgreSQL.

But they get things done. More precisely: Pieter gets things done using them.

Doing it badly

I’m a technologist, so trust me, I know the siren song of fancy new languages, Kubernetes clusters and the latest web frameworks.

I’m doing it badly. I’m trying to do it less badly.

When it comes to being productive, using a mature tool that you know well beats chasing the latest thing.

For me it’s Go on the backend and Svelte on the front-end but as Pieter shows, you can be very prolific with a very old technology.

Pieter surfs the high waves

YouTube analytics in 2013.

Slack channel in 2014.

Virtual Reality in 2016.

Telegram bot in 2015.

Being a nomad in 2015.

QR Menu creator in 2020.

Bitcoin / NFT in 2021.

The guy is at the forefront of trends.

This doesn’t necessarily make or break his businesses but it’s easier to promote novel things.

This is not blind chasing of fads.

He doesn’t have a podcast or a TikTok, and he doesn’t hang out on Clubhouse.

He did nomadic lifestyle years before it was a trend.

Don’t ignore trends, but do apply your personal and business filter.

It’s also different from being a technologist.

He didn’t rewrite his website on blockchain but he did notice that Bitcoin is a trendy thing and he did a project that is related to Bitcoin and provides value related to Bitcoin.

Doing it badly

Not paying attention to new trends.

Not trying to figure out what product you can build to take advantage of the new trend.

Not being selective about which trends to exploit.

Pieter is humble

The first thing he said in this comment:

All of this is possible because I’ve been a HN reader since 2010 and was inspired by all of you and especially @patio11 on here to bootstrap my own things and do it VERY publicly.

This might not impact his business but it does impact how people perceive him.

Doing it badly

Being an arrogant prick that people dislike and root against.

Doing Things Right bias

Survivorship bias is for losers.

Doing Things Right bias is for winners.

So put that coffee down until you start Doing Things Right.

Learning from Pieter

Do you want to create successful projects? You can learn a lot from Pieter:

watch his How to build a startup without funding talk
read his blog posts
read his book
watch his 4-hour AMA
listen to an interview he gave

Don’t just read it passively.

You’re a business detective. He has a successful business. Your job is to figure out all the things he did right, not just the things he’s telling you.

You need to figure out that he’s using social proof even if he never talks about using social proof.

Lessons learned from 15 years of SumatraPDF, an open source Windows app

2021-07-25T00:00:00Z

I released the first version of SumatraPDF in 2006. That’s 15 years ago, which seems like a good time for a retrospective.

The app

SumatraPDF is a multi-format (PDF, ePub, Mobi, comic book, DjVu, XPS, CHM) viewer for Windows and currently looks like this:

The code

SumatraPDF is an open-source document reader for Windows. It started as a PDF reader, hence the name. Over time I’ve added support for e-book formats (epub, mobi), comic books (cbz, cbr), DjVu, XPS, image formats etc.

It’s about 127k lines of C++ (not counting libraries written by others).

It’s written against Win32 API, not using GUI abstraction libraries like Qt. This contributes to making it as small and fast as possible.

Almost all of it was written by 2 people, with occasional contributions from others.

The amount of code written is actually higher. It is the nature of long running code bases that the code gets written and re-written. We delete, add, change.

It’s a side project, done after hours, not a full-time effort. What does the daily grind of working on an app look like?

It looks like this:

You can also take a peek at my dev log. I only started it a year ago, so it only covers 1 year out of 15.

Why I created SumatraPDF

SumatraPDF is what I call an accidental success.

I never wanted to write a PDF reader for Windows.

In 2006 I was working at Palm and one of my job duties was writing a PDF reader for Foleo, an ARM and Linux powered mini laptop. You never heard of Foleo because it was cancelled weeks before launch for reasons I’m not privy to.

At the time I didn’t know how popular PDF was, but Palm management did, which is why they decided that a PDF reader was a must-have application. I ended up being the (sole) dev on the project.

Writing a PDF rendering library is a multi-year effort. We didn’t have years, so I used the open-source Poppler library.

My job was to write a basic PDF viewer that used Poppler to render PDF pages into a bitmap in memory and blit those bitmaps on screen.

PDF is a complex format and rendering of some PDFs is slow. I wanted to improve the speed because Jeff Bezos told me that speed is something that customers will always care about.

Accidental app

The way to improve speed is to profile the code and look at the result.

Unfortunately, the toolchain for unreleased ARM hardware wasn’t very good. Forget about a profiler, kid, be grateful you have a C++ compiler and don’t have to enter assembly by typing hex, like Steve Wozniak.

Windows had decent profilers, so I compiled Poppler for Windows.

Once I had the library working on Windows, I wrote the simplest GUI app that would show the pages and allow navigating between them.

What do you know: I had a simple PDF reader for Windows.

I released it on my website. It couldn’t do much so I tagged it as version 0.1.

If you’re not embarrassed by your app then you’ve waited too long to release it

I didn’t come up with this nugget of wisdom but I agree with it.

Getting early users, learning what features they want the most beats toiling for months or years and implementing lots of features before you know anyone even cares.

Profiling, performance optimization and contributing to open source

Back to profiling: my plan worked.

I profiled the documents that took the longest to render and made a few surprisingly simple and surprisingly effective optimizations.

If memory serves, 2 optimizations had the biggest effect:

optimizing the string class with what is known as “small string optimization”, i.e. adding a small buffer inside the string class to hold small strings inline (as opposed to always allocating memory for the string). Strings were used frequently and most of them were small
fixing byte-at-a-time I/O by converting it to bulk reads. The way the code was structured, some code paths would do a virtual C++ call and a call to the C read() function for each byte. Those calls are cheap on their own but not when you make 5 million of them
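To make the first optimization concrete, here is a minimal sketch of a string class with small string optimization. It’s not Poppler’s actual string class: the `SmallString` name and the 15-character inline capacity are assumptions for illustration. Strings that fit in the inline buffer never touch the allocator; longer ones fall back to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical string class illustrating small string optimization (SSO):
// short strings live in an inline buffer, long ones are heap-allocated.
class SmallString {
  public:
    static constexpr size_t kInlineCap = 15;  // max chars stored inline (+1 for '\0')

    explicit SmallString(const char* s = "") { Assign(s, std::strlen(s)); }
    SmallString(const SmallString& other) { Assign(other.data_, other.len_); }
    SmallString& operator=(const SmallString& other) {
        if (this != &other) {
            Free();
            Assign(other.data_, other.len_);
        }
        return *this;
    }
    ~SmallString() { Free(); }

    const char* c_str() const { return data_; }
    size_t size() const { return len_; }
    // True when the characters are in the inline buffer (no heap allocation).
    bool IsInline() const { return data_ == inline_; }

  private:
    void Assign(const char* s, size_t len) {
        len_ = len;
        // Only strings longer than the inline capacity allocate memory.
        data_ = (len <= kInlineCap) ? inline_ : new char[len + 1];
        std::memcpy(data_, s, len);
        data_[len] = '\0';
    }
    // Releases heap memory (if any) and resets to an empty inline string.
    void Free() {
        if (!IsInline()) delete[] data_;
        data_ = inline_;
        len_ = 0;
    }

    char* data_ = inline_;
    size_t len_ = 0;
    char inline_[kInlineCap + 1];
};
```

The trade-off is a bigger object, since the inline buffer is always there, in exchange for skipping the allocator for the many short strings.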

As a good boy I did submit my changes to Poppler.

As is my experience with contributing to open source projects, it was more of a miss than a hit.

Yes, I got 13 commits in but the project wasn’t very active and the maintainers weren’t eager to accept anything beyond small changes. Forget any major refactors.

I’m not one to voluntarily bash my head against the wall so I stopped trying.

(As you can see, I’m a fantastic team player.)

Code quality

I want it and you should want it too.

How to maintain high code quality while working mostly solo, with no-one doing code reviews, no dedicated QA team?

Here’s how:

test the code yourself. Step through newly added code in the debugger, verify the newly added functionality works as expected and in general use the app a lot
automated crash reporting. Unfortunately it’s a pain to build, but it’s the single most important thing you can do to improve the quality of your software. Briefly: set up exception handlers to catch crashes in the app; in the crash handler, download symbols from the server to get a readable callstack, create a crash report that includes callstacks of all threads, program and OS information and the log, and submit that to a server. On the server, process those files and generate web pages for easy viewing of the crashes. Like I said: it’s a pain to build. Once you have crashes, look at them occasionally, try to figure out what went wrong and fix it
assert(). Asserts are a well-established practice in C++ code: additional code, only executed in debug builds, that verifies that some conditions are true. If they’re not, something went wrong and you should investigate. I wrote my own assert-like function which I enable in non-debug pre-release builds so that I automatically get bug reports from people hitting those conditions. Trust me: there’s no amount of testing you can do yourself that would match all the different things that a thousand people will do just by using the app.
logging. When investigating issues it helps to know what sequence of events led to a crash. My tiny logging module logs to a block of memory, which gets sent along with the crash report. I also have an option to log to a file and I’ve recently added logging to a separate logging app via a named pipe. This is perfect because most of the time I don’t care about the logs, but when I do, I don’t want to restart the app to enable logging. With a separate logging app, SumatraPDF is logging all the time and when it detects that the logging app is running, it also logs to it. Implementation was trivial: the logging app creates a named pipe and SumatraPDF’s logger tries to open that pipe (like a file). If the open succeeds, the logging app is running and reads the logs we write to the pipe
static code analysis: the max level of warnings in the C++ compiler, warnings turned into errors, Visual Studio’s /analyze option, cppcheck, clang-tidy, GitHub’s CodeQL. Run those occasionally and fix the errors and warnings
ASAN (AddressSanitizer) is fantastic. It was added in a point release of Visual Studio 2019. At a modest performance cost it can detect if you over-write memory (e.g. a buffer overflow) or use memory after it was freed. I have a configuration with ASAN enabled; it’s fast enough to be used as a regular build.
stress testing. Sumatra’s job is mostly to render complex document formats. Due to the complexity of the formats, crashes are often triggered by specific files. To find those crashes I wrote stress test code that reads and renders all files in a directory. I typically run it before a release on a large collection of test files I’ve amassed over the years
unit testing. I don’t have a lot of them, they’re mostly for testing edge cases for low-level functionality like string formatting. They occasionally find bugs.
memory leaks. It’s surprisingly hard to find an easy to use memory leak detection tool. I’m working on a very simple built-in leak detector. In the meantime I’m using Dr. Memory. It works but it’s super slow.
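A minimal sketch of the assert and logging ideas above, assuming hypothetical names (`SoftAssert`, `MemLog`, `PRE_RELEASE_BUILD`) that are not SumatraPDF’s actual code. Log messages go into a bounded in-memory buffer that a crash report could include, and a failed soft assert records the condition, file and line instead of aborting. Sending the report to a server is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical in-memory log: keeps the most recent messages so a crash
// report can include them. The oldest entry is dropped once it's full.
class MemLog {
  public:
    explicit MemLog(size_t maxEntries) : maxEntries_(maxEntries) {}

    void Log(const std::string& msg) {
        if (maxEntries_ == 0) return;
        if (entries_.size() == maxEntries_) entries_.pop_front();
        entries_.push_back(msg);
    }

    // Joins all entries, one per line, e.g. to attach to a crash report.
    std::string Dump() const {
        std::string out;
        for (const std::string& e : entries_) out += e + "\n";
        return out;
    }

  private:
    size_t maxEntries_;
    std::deque<std::string> entries_;
};

static MemLog gLog(64);
static int gSoftAssertFailures = 0;

// Assumption: the build system sets this to 1 for debug and pre-release
// builds and to 0 for final releases.
#ifndef PRE_RELEASE_BUILD
#define PRE_RELEASE_BUILD 1
#endif

// Unlike assert(), a failed SoftAssert() doesn't abort the program: it logs
// the condition with file and line so the failure shows up in a bug report.
#if PRE_RELEASE_BUILD
#define SoftAssert(cond)                                                   \
    do {                                                                   \
        if (!(cond)) {                                                     \
            ++gSoftAssertFailures;                                         \
            gLog.Log(std::string("SoftAssert failed: ") + #cond + " at " + \
                     __FILE__ + ":" + std::to_string(__LINE__));           \
        }                                                                  \
    } while (0)
#else
#define SoftAssert(cond) ((void)0)
#endif
```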

Frequent releases

When you don’t have many features, improving the app is fast and easy. It doesn’t take much effort to implement “Go to” dialog (implemented in v 0.2).

On one hand I don’t want to release too often but I also do want the users to get new features as quickly as possible.

My policy of new releases is: release when there’s at least one notable, user-visible improvement.

Web apps take it to the extreme (some companies deploy to production multiple times a day).

In desktop software it’s more involved and I had to build functionality to make it easy, i.e. add a check for new releases and write an installer that can update the program.

BTW: I mean “frequent in proportion to amount of new code written”. SumatraPDF releases are not frequent in absolute terms but frequent if you consider that it’s a part-time, after hours project.
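The check for new releases mostly boils down to comparing the installed version with the latest version published on the server. A minimal sketch, assuming versions are plain dotted numbers like “3.2” or “3.2.1”; `IsNewerVersion` is a hypothetical name, and fetching the latest version string is not shown:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits "3.2.1" into {3, 2, 1}. Assumes only digits and dots
// (std::stoi throws on anything non-numeric).
static std::vector<int> ParseVersion(const std::string& ver) {
    std::vector<int> parts;
    std::stringstream ss(ver);
    std::string part;
    while (std::getline(ss, part, '.')) {
        parts.push_back(part.empty() ? 0 : std::stoi(part));
    }
    return parts;
}

// Returns true if `latest` is newer than `installed`. Missing components
// count as 0, so "3.2" and "3.2.0" are equal.
static bool IsNewerVersion(const std::string& installed, const std::string& latest) {
    std::vector<int> a = ParseVersion(installed);
    std::vector<int> b = ParseVersion(latest);
    size_t n = a.size() > b.size() ? a.size() : b.size();
    for (size_t i = 0; i < n; i++) {
        int x = i < a.size() ? a[i] : 0;
        int y = i < b.size() ? b[i] : 0;
        if (x != y) return y > x;
    }
    return false;
}
```

Comparing numerically matters: a plain string comparison would consider “3.10” older than “3.9”.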

Treat open source projects like commercial software

The majority of open source projects probably don’t fall into this category, but if you want your open source project to be as successful as possible, act as if it were a commercial product from a software company.

What does it mean in practice?

From day one I created a website for the app. It had screenshots, it had documentation, it was easy to download and install. Granted, a kind soul on Reddit called it “a website made by a 6-year old”. The lesson here is two-fold:

ignore haters and assholes
a website built by a 6-year old is better than no website. It doesn’t have to be pretty, it has to be functional

I did basic SEO. Nothing beyond Google’s “SEO 101” docs: just pay attention to URLs, put the right meta-data, use the right keywords.

I had a forum for users to ask questions, submit feature requests and occasionally support each other.

I made the installation process as easy as possible.

Everything that is a good idea for promoting commercial software is also a good idea for open source project.

Switching the engine while the car is running

At some point I decided to switch from Poppler to mupdf because mupdf was better and actively maintained.

Changing the app to use completely different library is not something you can do in an afternoon.

It’s demoralizing to work long time on code that doesn’t even compile.

To keep things compiling while also working towards supporting alternative rendering engine I developed an abstraction for the rendering engine.

The engine would provide the functionality the UI needed: getting number of pages in the document, sizes of each page (to calculate layout), rendering a page as a bitmap etc.
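A minimal sketch of what such an engine abstraction could look like. The names (`Engine`, `PageCount`, `PageSize`, `RenderPage`, `TotalDocumentHeight`) are hypothetical, not SumatraPDF’s actual interface, and a toy engine that renders blank pages stands in for Poppler or mupdf. The point is that UI code depends only on the abstract class, so a second engine can be added while everything keeps compiling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct SizeF {
    float dx, dy;
};

// A rendered page: 32-bit pixels, row-major.
struct Bitmap {
    int dx = 0, dy = 0;
    std::vector<uint32_t> pixels;
};

// What the UI needs from any document format. Page numbers are 1-based.
class Engine {
  public:
    virtual ~Engine() = default;
    virtual int PageCount() const = 0;
    virtual SizeF PageSize(int pageNo) const = 0;  // used to calculate layout
    virtual Bitmap RenderPage(int pageNo, float zoom) = 0;
};

// Toy engine for illustration: every page is a blank 612x792 page.
class BlankEngine : public Engine {
  public:
    explicit BlankEngine(int pages) : pages_(pages) {}
    int PageCount() const override { return pages_; }
    SizeF PageSize(int) const override { return {612.f, 792.f}; }
    Bitmap RenderPage(int pageNo, float zoom) override {
        SizeF s = PageSize(pageNo);
        Bitmap bmp;
        bmp.dx = static_cast<int>(s.dx * zoom);
        bmp.dy = static_cast<int>(s.dy * zoom);
        bmp.pixels.assign(static_cast<size_t>(bmp.dx) * bmp.dy, 0xFFFFFFFF);  // white
        return bmp;
    }

  private:
    int pages_;
};

// UI-side code: only sees Engine, never Poppler or mupdf types.
float TotalDocumentHeight(const Engine& engine, float zoom) {
    float total = 0;
    for (int i = 1; i <= engine.PageCount(); i++) total += engine.PageSize(i).dy * zoom;
    return total;
}
```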

I’m much less enthusiastic about abstractions than most programmers (at least those who like to opine on Hacker News) but in this case it served me well.

I was able to incrementally convert the program from using the Poppler API, to using Poppler via the Engine abstraction, to using mupdf via the Engine abstraction.

For a while I supported both engines at the same time but eventually I switched to just mupdf, to keep the app small.

This opened the door for supporting other formats via the same abstraction.

Simplicity vs. customizability

Simplicity sells.

I learned that from the history of Mozilla Firefox.

Before Firefox there was Netscape Navigator. It was a beast of an app, combining web browser with e-mail client.

Netscape couldn’t help themselves and kept piling features upon features, leading to a very complex UI.

A small group of renegades within Mozilla forked the code and focused on a simple UI.

Simple Firefox was much more popular than the complex Navigator and eventually ate it completely.

From the beginning my goal was to keep the UI of SumatraPDF as simple as possible. An 80/20 app: 80% of functionality with 20% of the UI.

This requires resolve. I constantly get requests to add more icons to the toolbar and I constantly have to say “no” because adding 2 more icons to the toolbar to satisfy 10% of users makes the app slightly worse for 100% of the users.

Another trap is the siren song of additional settings. Sometimes people suggest that instead of doing X, the program should do Y. Not willing to remove X, they suggest adding a new UI setting “[ ] Do Y instead of X”.

Having a settings dialog with 100 settings is not a good solution. It makes the app worse for everyone by overwhelming them with choices and hiding important options in a sea of unimportant ones.

Not to mention that every conditional behavior requires more code, more potential bugs and more testing.

That being said, I also believe customizability is important. I believe that a big reason for Winamp being such a dominant music player (at the time) was its ability to skin the whole UI.

Some advanced features might only be used by 20% of users but those users are most likely power users that will evangelize the app more than the other 80% of the users.

My solution to UI simplicity vs. customizability: advanced settings file.

I designed a simple, human readable (and human writeable) textual format for advanced settings. Think JSON, but better.

I didn’t bother to write UI for changing those advanced settings. I just launch notepad.exe with the file. When user changes the settings and saves the file, I reload it and apply the changes.

Be water, my friend

Change is the only constant. We must adapt to the changes in the world.

I can’t believe how many popular projects still use craptastic Sourceforge for source repository or mailing list.

Actually, I can believe: changing things takes effort and the path of least resistance is to do nothing.

I started with Sourceforge, switched to code.google.com and then to github.com.

I switched forum software three times.

I’ve added a browser plugin and then removed it when browsers stopped supporting such plugins.

I changed the format for storing preferences from binary to human readable text.

Windows XP went from being the OS used by the majority of users to no longer being supported by SumatraPDF (long after Microsoft stopped supporting it).

At first I only had a 32-bit build; now I have both but emphasize the 64-bit build.

Think outside of the box

Thinking outside of the box is hard because the box is invisible.

SumatraPDF wasn’t the first PDF reader application ever written.

But most PDF readers do not become multi-format readers.

In hindsight it’s an obvious idea to support as many document formats as possible but it took me 5 years to realize it.

Most readers are still single format and I do believe being multi-format helped SumatraPDF become popular.

I can’t say it’s a totally unique idea. There were multi-format image viewers long before SumatraPDF and I probably was inspired by them.

Small and fast - pick both

By today’s standards SumatraPDF is tiny (installer smaller than 10 MB) and starts up instantly.

I believe being small and seemingly fast was a big reason for adoption.

This comes back to Jeff Bezos’ wisdom: there will never be a time when users want bloated and slow apps so being small and fast is a permanent advantage.

How do I keep SumatraPDF small?

I avoid unnecessary abstractions. Windows’ system of controls is a giant pain in the ass to program against. I could use wrappers like Qt, WxWidgets or Gtk. They are easier to use but cause instant, giant bloat.

I’m not afraid to write my own implementation of things. I have my own JSON, HTML / XML parsers that are a fraction of size of the popular libraries for those tasks.

I aggressively take advantage of rich functionality included in Windows.

Let’s say I need to do a network request. I could include a monster library like curl or I could write 300 lines of code using win32 APIs. I wrote 300 lines of code.

An absence of bloat is hard to notice because it isn’t there.

My pet peeve is over-using XML for storing data.

When I worked at Palm I was at a design meeting for auto-update system for a phone. Part of it was storing information about the current version in the image, downloading information about the latest version and comparing them.

The developer decided to use XML for storing that information. That seemed like a lot of bloat for storing simple information like a version number. A compliant XML parser alone is a lot of code. “Surely a simple binary format would be easier to implement,” I suggested, and was ignored.

If you don’t have the power to fire someone, your ideas will be ignored.

(As you can see, I’m a great team player.)

For storing advanced settings I designed and implemented a file format that is smaller than XML, readable and writeable by humans and can be implemented in a few hundred lines of code. It’s as powerful as JSON and even more readable.

It’s so simple that after implementing it I had the time to implement a serialization system for C++ objects and a Go code generator. To add more settings I don’t have to write more C++ code. I just add data definition to Go generator, re-run it and get data-driven C++ parsing auto-generated.
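The actual syntax of the settings format isn’t shown here, so this is only a sketch of the data-driven idea under loud assumptions: a simplified `Key = Value` line format (not SumatraPDF’s real syntax) and hypothetical names. A table of field descriptors, the kind of data a code generator could emit, drives a single generic parser, so adding a setting means adding a struct field and a table row, not new parsing code. The Go generator is not shown; the table is written by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical settings; a generator could emit this struct and the table.
struct Settings {
    bool showToolbar = true;
    int zoomPercent = 100;
    std::string theme = "light";
};

enum class FieldType { Bool, Int, String };

// Describes one setting: its name in the file, its type and which member
// of Settings it maps to (only the member pointer matching `type` is set).
struct FieldInfo {
    const char* name;
    FieldType type;
    bool Settings::*boolField;
    int Settings::*intField;
    std::string Settings::*strField;
};

static const FieldInfo gSettingsFields[] = {
    {"ShowToolbar", FieldType::Bool, &Settings::showToolbar, nullptr, nullptr},
    {"ZoomPercent", FieldType::Int, nullptr, &Settings::zoomPercent, nullptr},
    {"Theme", FieldType::String, nullptr, nullptr, &Settings::theme},
};

// Removes leading and trailing spaces, tabs and '\r' (Windows line endings).
static std::string Trim(const std::string& s) {
    size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// One generic loop handles every field listed in the table. Lines without
// '=' and unknown keys are ignored, so those settings keep their defaults.
static void ParseSettings(const std::string& text, Settings& s) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::string key = Trim(line.substr(0, eq));
        std::string val = Trim(line.substr(eq + 1));
        for (const FieldInfo& f : gSettingsFields) {
            if (key != f.name) continue;
            switch (f.type) {
                case FieldType::Bool: s.*f.boolField = (val == "true"); break;
                case FieldType::Int: s.*f.intField = std::atoi(val.c_str()); break;
                case FieldType::String: s.*f.strField = val; break;
            }
        }
    }
}
```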

It’s my project and I act like it

When someone pays you to write code you have to do it the way they like it.

A big attraction of working on code you’re not paid for is that there is no one who can tell you what to do or how to do it.

My code would not pass a code review at Google and not because it’s bad but because it’s often unorthodox. Outside of accepted dogma.

(As you can see, I’m a great team player.)

I always used SumatraPDF as my playground for testing crazy ideas.

Minimize the code size by not using STL? That’s crazy but I did it. Granted, in 2006 STL wasn’t very good.

I learned that Plan 9 C code used a non-traditional scheme for #include files: they don’t wrap each .h file in #ifdef guards that make multiple inclusion safe, and .h files don’t include other .h files. As a result, .c files have to include every .h file they need, in the right order. It’s a bit of a pain and no other modern C++ codebase I know of maintains such discipline.

But it’s my project so I did it and I keep doing it. It prevents circular dependencies between .h files and doesn’t inflate C++ build times by carelessly including the same files over and over again.

I implemented a CSS-inspired UI system. Not great, but mine. And I plan to replace it with a different one.

Because I can.

Because no one can tell me not to.

Cross-platform is over-rated

SumatraPDF is unabashedly a Windows only app.

Supporting other platforms (Linux, Mac, Android) is one of the most frequent requests. A request that I have to decline.

First, there is a pragmatic reason: I just don’t have the bandwidth to write code for 3 platforms.

Second, I believe an excellent app for one platform can become more popular than a mediocre app for 3 platforms.

Coming back to the first reason: I don’t have the bandwidth to write 3 excellent apps. Part of the reason SumatraPDF is small is my use of win32 APIs for the UI.

The only way for one person to even attempt cross-platform app is to use a UI abstraction layer like Qt, WxWidgets or Gtk.

The problem is that Gtk is ugly, Qt is extremely bloated and WxWidgets barely works.

Tests are not necessary, neither are code reviews

I’m not saying tests are bad or that you shouldn’t write tests or do code reviews.

I’m saying that they are not necessary.

Dogma is powerful. Sometimes in my corporate life I felt like writing tests was just going through the motions. Maybe we should spend more time writing code instead, I thought.

But try to make a nuanced point about more tests vs. more code to your fellow developers and you’ll be burned at the stake and your smoldering carcass will be thrown to wild dogs. Village children will use your severed head to play soccer.

(As you can see, I’m a great team player.)

And yet I do know that you can write complex, relatively bug free code without tests, because I did it.

I do know that you can write complex, relatively bug free code without anyone looking over your code, because I did it.

If no one uses your app then who cares if it crashes.

If many people use your app and it crashes, they’ll tell you and then you’ll fix it.

Overnight success takes a decade

SumatraPDF is relatively popular. Not Facebook popular or DOOM popular, but more popular than most apps. A respectable level of popular.

It all started with v 0.1 and a trickle of downloads. It remained a trickle for many, many months.

I’m not sure there’s a lesson here.

Success often takes a long time.

Unfortunately, at that stage it’s indistinguishable from (eventual) failure, so this wisdom doesn’t help you if you’re working on a not-yet-successful project and debating whether to continue or abandon it.

The money

Open source is not a good business model.

If you want to make money, do literally anything else: try to sell software, do consulting, build a SaaS and charge monthly for it, rob a bank.

I did experiment with making money and made some.

There was a time when AdSense would pay a decent CPM, so I put AdSense ads on the website and it made some money. I no longer do because the rates plummeted and it isn’t worth annoying people. My soul has a price and AdSense can no longer afford it.

Now I’m experimenting with Patreon and PayPal donations. It makes more than $100 a month but not much more than that.

Like I said: don’t start open source project with intent to make money.

Rarely can you have both the freedom to do whatever you want and good pay, so pick what is more important to you. Open source gives you freedom but not money.

On to the future

I need to continue being like water.

For years I resisted adding editing features. “It’s just a reader” I said. But why not add editing? If people want it, give it to them.

The future of all software is as a web app. Why not bring the spirit of SumatraPDF to the web?

Those are just a few ideas I have today.

Being like water means that in 5 years I’ll have other ideas, informed by what’s happening at that time.

How I use Roam Research

2021-06-27T00:00:00Z

I take lots of notes and I’m always looking for that perfect note taking app.

Because I take a lot of notes, mobile-first note taking apps (e.g. Google Keep) are of no interest to me.

To me the most important things are:

lowest possible friction in adding new notes
fast way of finding existing notes

In theory I could use Google Docs as a note taking app but both Notion and Roam Research make it much faster to jot a new note or find a note I wrote in the past.

Over the years I’ve cycled through many note taking systems: a single big text file, collection of markdown files in a git repository, a hosted wiki, a self-hosted wiki, Evernote, my own note taking web app.

A combination of Notion and Roam Research is, I hope, the end of the road and will be good enough for the rest of my life.

This article describes how I use Roam Research.

Roam use cases

Here are my main use cases of Roam:

taking notes / writing
keeping a log of activity (what I did)
managing todo list
managing projects
bookmarking

Roam can be hard to grasp at first. Things you need to know:

Roam is a collection of named pages
you can trivially link between pages. Writing [[foo]] or #foo anywhere creates a link to a page named foo. A page also lists backlinks (i.e. pages linking to this page) at the end
there’s automatically a page for every day and the default view is a page for current day
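To make the linking model concrete, here is a toy sketch of extracting `[[foo]]` and `#foo` references and building a backlinks index from them. It’s not how Roam is implemented, and the parsing rules are simplified assumptions: a #tag ends at the first character that isn’t a letter, digit, `-` or `_`, and Roam’s `#[[multi word]]` tags aren’t handled.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>

// Returns names of pages referenced as [[name]] or #name in `text`.
static std::set<std::string> ExtractLinks(const std::string& text) {
    std::set<std::string> links;
    size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, 2, "[[") == 0) {
            size_t end = text.find("]]", i + 2);
            if (end == std::string::npos) break;  // unterminated link
            if (end > i + 2) links.insert(text.substr(i + 2, end - i - 2));
            i = end + 2;
        } else if (text[i] == '#') {
            // A tag runs until the first character that isn't a letter,
            // digit, '-' or '_'.
            size_t j = i + 1;
            while (j < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[j])) || text[j] == '-' ||
                    text[j] == '_')) {
                j++;
            }
            if (j > i + 1) links.insert(text.substr(i + 1, j - i - 1));
            i = j;
        } else {
            i++;
        }
    }
    return links;
}

// `pages` maps page name -> page text. The result maps a page name to the
// set of pages linking to it, i.e. its backlinks.
static std::map<std::string, std::set<std::string>> BuildBacklinks(
    const std::map<std::string, std::string>& pages) {
    std::map<std::string, std::set<std::string>> backlinks;
    for (const auto& page : pages) {
        for (const std::string& target : ExtractLinks(page.second)) {
            backlinks[target].insert(page.first);
        }
    }
    return backlinks;
}
```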

Knowing that, let’s go through my use cases.

Taking notes / writing

Let’s say I have an idea to write an article about my use of Roam Research (i.e. this very article).

In most other note taking apps I would first have to think about where to put that note.

In Roam I just start writing in the page for current day and tag it with #draft.

Later on I can go to the draft page and see all the drafts I’m currently working on.

There is a greater point here: Roam is “write now, organize later” system.

When I have a thought, I jot it down in today’s page and tag it with keywords that will help me find it later.

It’s the least amount of friction between having a thought and recording that thought for the future.

Taxonomy is personal. I use #draft for my draft articles, but you could just as well use #article or #articles (using singular vs. plural for tags is surprisingly hard to decide).

Logging of activity

In my personal life, logging activity is just for curiosity.

I write down what I did in the page for the current day and tag log entries with #done.

I visit the done page to review what I did in the past.

What if I want to track what I did on a specific project, e.g. SumatraPDF?

I further tag the entry with a tag for the project e.g. #sumatrapdf. Roam has filtering capabilities that allow me to see entries tagged with #done AND #sumatrapdf.

In professional life, logging your work could help you level up. If no-one knows what you did, did it happen? Will it lead to a promotion?

Roam is perfect for logging what you did so that you can review it later and send a summary to your manager or consulting client.

If I were a Google software engineer, I would tag my entries with #done and #work.

If I were doing a consulting job, I would tag them with #done, #consulting and maybe with a tag specific to the project or the client.

Again, taxonomy is personal. You can use other tags if they make more sense to you.

Managing projects

I work on several software projects at the same time.

I have a page per project that serves as a hub of information related to the project.

On that page I have:

most important links (e.g. to the GitHub repository, the Digital Ocean dashboard for a deployed project etc.)
list of things to do next
list of ideas for future improvements
links to competing products
technical overviews. Memory is always fading so it pays to jot down things that might help you remember key information
documentation

Todo list

I don’t use dedicated todo list applications. I think that creating hundreds of todo items is overwhelming and counter-productive.

I do believe in one of the tenets of the Getting Things Done system: write down your ideas and todo tasks to get them out of your head, in a way that lets you review them in the future.

I maintain a “maybe do list” instead of “todo list”.

For example, if I have an idea for SumatraPDF, I jot it down in Roam and tag it with #sumatrapdf. I can then review those ideas in the future when I have time to work on SumatraPDF.

I do maintain a very short (few items) todo list for near future. Mostly today and tomorrow.

This is an anti-procrastination device.

I find that when I start working on a task, I manage to finish it.

Procrastination is what keeps me from even starting.

Having a list of things to do in the morning helps me start working (as opposed to starting to browse Twitter).

It’s worth mentioning that other people do use Roam as a sophisticated todo list. Roam is flexible enough. For example Roam understands dates so you can have pages with tags that record a completion date for a task and you can use Roam’s powerful query syntax to find tasks that are supposed to be done in the next week.

Bookmarking

Bookmark managers in browsers are not good at managing large lists of bookmarks. Hierarchies don’t scale.

Roam’s easy tagging makes it a replacement for services like pinboard.in (and once-famous-now-defunct del.icio.us). Found a page about Docker you want to bookmark for later? Drop the link in Roam and tag it with #docker. Find it later in the docker page.

There is a use case for browser’s bookmarks: a fast way to get to most frequently used websites. I’m experimenting with using Roam for that as well.

I’ve created a bookmarks page with the most frequently used links. Opening a Roam page from scratch is relatively slow so I keep that page open at all times as a pinned tab (a Chrome thing).

One more thing…

Actually, lots of things.

I’m a relatively light user of Roam. I’m still discovering new use cases for Roam and better ways of managing notes.

But even the basic usage of jotting things down and tagging them for cross-referencing is very useful. Speed and flexibility are important and Roam has both.

Roam can offer much more if you’re willing to invest time. Powerful search queries, tags with values, storing files. There’s a reason people become obsessed with Roam.

I’m not obsessed but I do appreciate a great tool.

Notion and Roam?

Why use both Notion and Roam? Why not stick with just one of them?

Notion has some use cases that Roam Research doesn’t support well. For example sharing a subset of notes with other people.

They also have different philosophies of managing notes.

Notion is strongly hierarchical. Pages are nested within pages. It’s great when the hierarchy is obvious but sometimes I spend brain cycles figuring out where in the hierarchy a new page should go.

Roam is a flat collection of interlinked pages. It’s more difficult to grasp but removes the thinking. Write things down, tag them now and maybe organize later.

Roam is more quirky. While Notion has a more polished UI, Roam is often faster to use. Trying to change a link destination in Notion can be a struggle.

Final thoughts

We live in a golden era of note taking apps. I wish I had Notion or Roam Research 10 years ago.

Notion opened the floodgates for high quality note taking apps. Before Notion there was just Evernote. After Notion we got lots of other options, including Roam.

I think they are both powerful enough to serve as a note taking system for life.

The things we do to ship desktop software

2019-05-01T00:00:00Z

I wrote a small utility for Windows. It indexes a hard drive and allows you to find a file by name in under a second.

It might surprise you that I spent more time on things that are not related to the core functionality. Let’s call it the tax of shipping desktop software.

Here are some of the taxes you need to pay to ship a desktop application to users.

Logging

In the long term you want to be able to diagnose problems quickly and logging helps in that.

So I’ve implemented logging to a file.

I went beyond the basics and implemented a way to easily see the logs in a window. It’s simple to implement, as all you need is the built-in read-only text box control.

You can toggle the log window from the File menu. As an additional touch, if it’s a dev build and an error gets logged, the log window opens automatically.

Website

You need to have a website. It’s a simple website: few screenshots and a download button, but it does take a few hours to make.

It also has a feedback page, so that people can tell me how to improve the program.

Signing certificate

On Windows, anti-virus software and Microsoft’s anti-phishing systems are lousy and often flag innocent software as malware.

You can decrease the probability of that by signing your installer and executables with a code signing certificate.

Unfortunately, this certificate is both expensive and a pain in the ass to buy. Entities selling them want proof of your (or your company’s) identity but the rules are often bureaucratic idiocy.

Installer

You need an installer because it makes the life of your users easier.

What installer does is not complicated:

copy the files in the right place
create necessary registry entries for un-installation
create a shortcut on the desktop and in Start Menu
implement un-installation logic to delete that which has been created during installation (files, shortcuts, registry entries)

Auto-update system

I want the users to get the latest version in the easiest possible way. One possible way to implement it:

Build system

To make release process smooth, you need automated build and release system.

In my case it’s a Go program that builds the software, signs it, uploads it to Digital Ocean Spaces for storage and updates the info needed by the auto-update system.
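On the client side, an auto-update check usually boils down to fetching that info and comparing versions. Below is a minimal sketch that assumes a hypothetical JSON manifest with ver and url fields; the article doesn’t describe its actual format, and the network download is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// updateInfo is a hypothetical manifest the build system could upload,
// e.g. {"ver":"1.10.0","url":"https://example.com/setup.exe"}.
type updateInfo struct {
	Ver string `json:"ver"`
	URL string `json:"url"`
}

// newerVersion reports whether version a is greater than version b,
// comparing dot-separated parts numerically so that 1.10 > 1.9.
// Missing or non-numeric parts count as 0.
func newerVersion(a, b string) bool {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var na, nb int
		if i < len(pa) {
			na, _ = strconv.Atoi(pa[i])
		}
		if i < len(pb) {
			nb, _ = strconv.Atoi(pb[i])
		}
		if na != nb {
			return na > nb
		}
	}
	return false
}

func main() {
	const current = "1.9.0"
	manifest := `{"ver":"1.10.0","url":"https://example.com/setup.exe"}`

	var info updateInfo
	if err := json.Unmarshal([]byte(manifest), &info); err != nil {
		fmt.Println("bad manifest:", err)
		return
	}
	if newerVersion(info.Ver, current) {
		fmt.Println("update available:", info.Ver, info.URL)
	}
}
```

Comparing numerically rather than as strings matters: a plain string comparison would consider "1.10.0" older than "1.9.0".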

The taxes add up

Those things are not hard to implement. Individually, none of them takes a lot of time.

But when you add it all up, for this particular project I estimate I spent more time building those systems than on the core functionality.

The light at the end of the tunnel is:

I can re-use most of this code in other software
it only needs to be written once. The more time I spend on the app, the less expensive those things are as percentage of total development time

Lessons learned porting 50k loc from Java to Go

2019-04-05T00:00:00Z

I was contracted to port a large Java code base to Go.

The code in question is a Java client for RavenDB, a NoSQL JSON document database. Code with tests was around 50 thousand lines.

The result of the port is a Go client.

This article describes what I’ve learned in the process.

Testing, code coverage

Large projects benefit greatly from automated testing and tracking code coverage.

I used TravisCI and AppVeyor for testing. Codecov.io for code coverage. There are many other services.

I used both AppVeyor and TravisCI because a year ago Travis didn’t have Windows support and AppVeyor didn’t have Linux support.

Today, if I were setting this up from scratch, I would stick with just AppVeyor, as it can now do both Linux and Windows testing and the future of TravisCI is murky after it was acquired by a private equity firm, which reportedly fired the original dev team.

Codecov is barely adequate. For Go, they count non-code lines (comments etc.) as not executed. It’s impossible to get 100% code coverage as reported by the tool. Coveralls seems to have the same problem.

It’s better than nothing but there’s an opportunity to do things better, especially for Go programs.

Go’s race detector is great

Parts of the code use concurrency and it’s really easy to get concurrency wrong.

Go provides a race detector that can be enabled with the -race flag during compilation.

It slows down the program but additional checks can detect if you’re concurrently modifying the same memory location.

I always run tests with -race enabled and it alerted me to numerous races, which allowed me to fix them promptly.

Building custom tools for testing

In a project that big it’s impossible to verify correctness by inspection. Too much code to hold in your head at once.

When a test fails, it can be a challenge to figure out why just from the information in the test failure.

The database client driver talks to the RavenDB database server over HTTP, using JSON to encode commands and results.

When porting Java tests to Go, it was very useful to be able to capture the HTTP traffic between the Java client and server and compare it with the HTTP traffic generated by the Go port.

I built custom tools to help me do that.

For capturing HTTP traffic from the Java client, I built a logging HTTP proxy in Go and directed the Java client to use that HTTP proxy.

For the Go client, I built a hook in the library that allows intercepting HTTP requests. I used it to log the traffic to a file.

I was then able to compare HTTP traffic generated by Java client to traffic generated by my Go port and spot the differences.

Porting process

You can’t just start porting 50 thousand lines of code in random order. Without testing and validating after every little step I’m sure I would be defeated by complexity.

I was new to RavenDB and to the Java code base. My first step was to get a high-level understanding of how the Java code works.

At its core, the client talks to the server over HTTP. I captured the traffic, looked at it and wrote the simplest Go code to talk to the server.

When that was working it gave me confidence I’ll be able to replicate the functionality.

My first milestone was to port enough code to be able to port the simplest Java test.

I used a combination of bottom-up and top-down approaches.

The bottom-up part is where I identified the code at the bottom of the call chain responsible for sending commands to the server and parsing responses, and ported it.

The top-down part is where I stepped through the test I was porting to identify which parts of the code needed to be ported to make that test work.

After successfully porting the first test, the rest of the work was porting one test at a time, along with all the code needed to make that test work.

After the tests were ported and passing, I improved the code to make it more Go-ish.

I believe that this step-by-step approach was crucial to completing the work.

Psychologically, when faced with a year-long project, it’s important to have smaller, intermediate milestones. Hitting those kept me motivated.

Keeping the code compiling, running and passing tests at all times is also good. Allowing bugs to accumulate can make it very hard to fix them when you finally get to it.

Challenges of porting Java to Go

The objective of the port was to keep it as close as possible to the Java code base, as it needs to be kept in sync with Java changes in the future.

I’m somewhat surprised by how much code I ported in a line-by-line fashion. The most time consuming part of the port was reversing the order of variable declaration, from Java’s type name to Go’s name type. I wish there was a tool that would do that part for me.

String vs. string

In Java, String is an object that really is a reference (a pointer). As a result, a string can be null.

In Go, string is a value type. It can’t be nil, only empty.

It wasn’t a big deal and most of the time I could mechanically replace null with "".

Errors vs. exceptions

Java uses exceptions to communicate errors.

Go functions return values of the error interface type.

Porting wasn’t difficult but it did require changing lots of function signatures to return error values and propagate them up the call stack.

Generics

Go doesn’t have them (yet).

Porting generic APIs was the biggest challenge.

Here’s an example of a generic method in Java:

public <T> T load(Class<T> clazz, String id) { /* ... */ }

And the caller:

Foo foo = load(Foo.class, "id");

In Go, I used two strategies.

One is to use interface{}, which combines a value and its type, similar to Object in Java. This is not the preferred approach. While it works, operating on interface{} is clumsy for the user of the library.

In some cases I was able to use reflection and the above code was ported as:

func Load(result interface{}, id string) error

I could use reflection to query the type of result and create values of that type from the JSON document.

And the caller side:

var result *Foo
err := Load(&result, "id")
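A runnable sketch of how such a Load could use reflection to allocate a value of the caller’s type. Assumptions: an in-memory map (documents) stands in for the database, and this is not the RavenDB client’s actual implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// documents is a hypothetical in-memory stand-in for the database.
var documents = map[string]string{
	"id": `{"Name":"Frank","Age":42}`,
}

type Foo struct {
	Name string
	Age  int
}

// Load decodes the JSON document with the given id into result, which
// must be a non-nil pointer: either *Foo, or **Foo (as in the caller above).
func Load(result interface{}, id string) error {
	rv := reflect.ValueOf(result)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return errors.New("Load: result must be a non-nil pointer")
	}
	doc, ok := documents[id]
	if !ok {
		return fmt.Errorf("Load: document %q not found", id)
	}
	target := rv.Elem() // the caller's variable, e.g. of type *Foo
	if target.Kind() != reflect.Ptr {
		// result is *Foo: decode straight into the caller's value.
		return json.Unmarshal([]byte(doc), result)
	}
	// result is **Foo: use reflection to create a new Foo.
	v := reflect.New(target.Type().Elem()) // a *Foo pointing at a zero Foo
	if err := json.Unmarshal([]byte(doc), v.Interface()); err != nil {
		return err
	}
	target.Set(v) // store the new *Foo in the caller's variable
	return nil
}

func main() {
	var result *Foo
	if err := Load(&result, "id"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result.Name, result.Age)
}
```

The price of this design is that type errors move from compile time to run time: passing a non-pointer compiles fine and only fails when Load is called.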

Function overloading

Go doesn’t have it (and most likely will never have it).

I can’t say I found a good way to port overloaded functions.

In some cases overloading was used to create shorter helpers:

void foo(int a, String b) {}
void foo(int a) { foo(a, null); }

Sometimes I would just drop the shorter helper.

Sometimes I would write 2 functions:

func foo(a int) {}
func fooWithB(a int, b string) {}

When the number of potential arguments was large I would sometimes do:

type FooArgs struct {
	A int
	B string
}
func foo(args *FooArgs) { }
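With an argument struct, the zero value of each field can play the role of the shorter Java overload that passed null. In this sketch the string return value and the "default" fallback are hypothetical additions for illustration.

```go
package main

import "fmt"

// FooArgs collects optional arguments; omitted fields keep their zero value.
type FooArgs struct {
	A int
	B string
}

func foo(args *FooArgs) string {
	b := args.B
	if b == "" {
		b = "default" // hypothetical default, like foo(a, null) in Java
	}
	return fmt.Sprintf("a=%d b=%s", args.A, b)
}

func main() {
	fmt.Println(foo(&FooArgs{A: 1}))         // like foo(1) in Java
	fmt.Println(foo(&FooArgs{A: 1, B: "x"})) // like foo(1, "x") in Java
}
```

The limitation is that the zero value must not be a meaningful argument: if an empty string were a valid B, a pointer field or an explicit flag would be needed to tell "not set" apart from "set to empty".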

Inheritance

Go is not especially object-oriented and doesn’t have inheritance.

Simple cases of inheritance can be ported with embedding.

class B extends A { }

Can sometimes be ported as:

type A struct { }
type B struct {
	A
}

We’ve embedded A inside B, so the fields and methods of A are promoted to B and can be used as if B inherited them.

It doesn’t work for virtual functions.

There is no good way to directly port code that uses virtual functions.

One option to emulate virtual functions is to use embedding of structs and function pointers. This essentially re-implements the virtual table that Java gives you for free as part of the object implementation.
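A minimal sketch of that option, with hypothetical Animal and Dog types. The base struct holds a function field that acts as a one-entry virtual table, so a base method (Describe) ends up calling the "overridden" implementation.

```go
package main

import "fmt"

// Animal plays the role of a Java base class.
type Animal struct {
	Name      string
	speakImpl func() string // hand-rolled virtual table slot
}

// Speak is the "virtual" method: it dispatches through speakImpl.
func (a *Animal) Speak() string {
	if a.speakImpl != nil {
		return a.speakImpl()
	}
	return "..." // base class behavior
}

// Describe is a base class method that calls the virtual Speak.
func (a *Animal) Describe() string {
	return a.Name + " says " + a.Speak()
}

// Dog "extends" Animal by embedding it and filling in the slot.
type Dog struct {
	Animal
}

func NewDog(name string) *Dog {
	d := &Dog{Animal: Animal{Name: name}}
	d.speakImpl = d.bark // the "override"
	return d
}

func (d *Dog) bark() string { return "woof" }

func main() {
	fmt.Println(NewDog("Rex").Describe())
	fmt.Println((&Animal{Name: "Generic"}).Describe())
}
```

Simply defining a Speak method on Dog would not work here: Describe is a method of Animal, so it would keep calling Animal.Speak. That is exactly the gap the function field fills.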

Another option is to write a stand-alone function that dispatches the right function for a given type by using type switch.

Interfaces

Both Java and Go have interfaces but they are different things, like apples and salami.

A few times I did create a Go interface type that replicated Java interface.

In more cases I dropped interfaces and instead exposed concrete structs in the API.

Circular imports between packages

Java allows circular imports between packages.

Go does not.

As a result I was not able to replicate the package structure of Java code in my port.

For simplicity I went with a single package. Not ideal, because it ended up being a very large package. So large, in fact, that Go 1.10 couldn’t handle so many source files in a single package on Windows. Luckily it was fixed in Go 1.11.

Private, public, protected

Go’s designers are under-appreciated. Their ability to simplify concepts is unmatched and access control is one example of that.

Other languages gravitate to fine-grained access control: public, private, protected specified with the smallest possible granularity (per class field and method).

As a result a library implementing some functionality has the same access to other classes in the same library as external code using that library.

Go simplified that by only having public vs. private and scoping access to package level.

That makes more sense.

When I write a library to, say, parse markdown, I don’t want to expose internals of the implementation to users of the library. But hiding those internals from myself is counter-productive.

Java programmers noticed that issue and sometimes use an interface as a hack to fix over-exposed classes. By returning an interface instead of a concrete class, you can hide some of the public APIs available to direct users of the class.

Concurrency

Go’s concurrency is simply the best and a built-in race detector is of great help in repelling concurrency bugs.

That being said, in my first porting pass I went with emulating Java APIs. For example, I implemented a facsimile of Java’s CompletableFuture class.

Only after the code was working did I re-structure it to be more idiomatic Go.

Fluent function chaining

RavenDB has very sophisticated querying capabilities. Java client uses method chaining for building queries:

List<ReduceResult> results = session.query(User.class)
                        .groupBy("name")
                        .selectKey()
                        .selectCount()
                        .orderByDescending("count")
                        .ofType(ReduceResult.class)
                        .toList();

This only works in languages that communicate errors via exceptions. When a function additionally returns an error, it’s no longer possible to chain it like that.

To replicate chaining in Go I used a “stateful error” approach:

type Query struct {
	err error
}

func (q *Query) WhereEquals(field string, val interface{}) *Query {
	if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) GroupBy(field string) *Query {
	if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) Execute(result interface{}) error {
	if q.err != nil {
		return q.err
	}
	// do logic
	return nil
}

This can be chained:

var result *Foo
err := NewQuery().WhereEquals("Name", "Frank").GroupBy("Age").Execute(&result)

JSON marshaling

Java doesn’t have built-in JSON marshaling and the client uses the Jackson JSON library.

Go has JSON support in standard library but it doesn’t provide as many hooks for tweaking marshaling process.

I didn’t try to match all of Java’s functionality as what is provided by Go’s built-in JSON support seems to be flexible enough.

Go code is shorter

This is not so much a property of Java as of the culture that dictates what is considered idiomatic code.

In Java setter and getter methods are common. As a result, Java code:

class Foo {
	private int bar;

	public void setBar(int bar) {
		this.bar = bar;
	}

	public int getBar() {
		return this.bar;
	}
}

ends up in Go as:

type Foo struct {
	Bar int
}

3 lines vs. 11 lines. It does add up when you have a lot of classes with lots of members.

Most other code ends up being of equivalent length.

Notion for organizing the work

I’m a heavy user of Notion.so, a hierarchical note taking application.

Here’s how I used Notion to organize my work on the Go port:

Here’s what’s there:

not shown above, I have a page that is a calendar view where I take short notes about what I work on on a given day and how much time I spent. This is important information since it was an hourly contract. Thanks to those notes I know that I spent 601 hours over 11 months
clients like to know the progress. I had a page for each month where I summarized the work done like this:

Those pages were shared with the client.
A short-term todo list helps when starting work each day:
I even managed invoices as Notion pages and used “Export to PDF” function to generate PDF version of the invoice

Additional resources

I’ve provided some additional commentary in response to questions:

in Hacker News discussion
in /r/golang discussion

Other material:

if you need a NoSQL, JSON document database, give RavenDB a try. It’s chock full of advanced features
if you’re programming in Go, try a free Essential Go programming book
if you’re interested in Notion, I’m the world’s most advanced user of Notion:
- I reverse engineered Notion API
- I wrote an unofficial Go library for Notion API
- all content on this website is written in Notion and published with my custom toolchain

Trade-offs in designing a versatile log format

2019-03-25T00:00:00Z

This article shows that when designing software even seemingly simple things are complicated and trade-offs abound.

I wanted to log events to a file. I had several requirements for my design:

it should be simple and therefore easy to implement
it should be human-readable
it should allow various types of events

It’s not a hard problem.

I could log it as a stream of JSON objects. It would allow different types of events. It’s easy to implement (in the sense that there’s a library in every language for the hard part of encoding/decoding JSON).

One thing: it’s not particularly human friendly. I wouldn’t enjoy looking at tail -f of a JSON log.

How about something like Apache server logs:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

It’s readable but doesn’t allow for different types of events.

Implementing this format isn’t challenging but the format is ad-hoc. For each kind of data I would have to write a completely new formatter / parser.

How about something more structured:

ip: 127.0.0.1
request: GET /apache_pb.gif HTTP/1.0
status code: 200

We can write a library that serializes key / value pairs and that gives us ability to write different kinds of events to the same file with generic code.

But hold on, how do we know where one event ends and another starts?

We must amend our format and add a record separator, like ---:

ip: 127.0.0.1
request: GET /apache_pb.gif HTTP/1.0
status code: 200
---

We’re not there yet. We use newline to separate lines that encode a single key / value pair. What if the value has a newline in it?

body: hello fred
I'm writing to you...

We could escape the value:

body: hello fred\nI'm writing to you...

Escaping is not a good solution.

Without escaping, we can just write out the data as-is.

With escaping we have to scan the whole thing to determine if it needs escaping, and escape if it does.

Same goes for un-escaping.

This is slow and error prone.

Furthermore, what if the value is really large? It won’t be readable encoded on a single line.

Let’s amend our format to accommodate large values:

body:+33
hello fred
I'm writing to you...

33 is the size of the value. Knowing the size we don’t need escaping. It’s faster, simpler to implement, more readable and supports binary data (like images).

To formalize, key / value pair can be encoded in 2 ways:

${key}: ${value}\n
${key}:+${sizeOfValue}\n${value}

Let’s revisit the idea of using a separator:

ip: 10.0.0.1
request: GET / HTTP/1.0
---
ip: 10.0.0.2
request: GET /index.html HTTP/1.0
---

For delimiting records we can use the same tricks we used for encoding large values:

36
ip: 10.0.0.1
request: GET / HTTP/1.0
43
car: Toyota Corolla
year: 2018
price: 21000

The first line is the size of the data, written as a decimal string, followed by a newline. Data of that size follows.

This is very generic framing, agnostic to what is inside. It could be a picture, a JSON-encoded data or our key / value format.

We’ve arrived at a layered design:

first layer is encoding arbitrary chunks of data by writing size and then data
second layer is key / value format inside the data

It would be even simpler if the size was an 8-byte (64-bit) integer. It wouldn’t be human-readable, though, so I picked a slightly more complicated, string-based encoding.

It’s not quite right yet.

Above we logged HTTP request info and car info in the same file.

How do we know what type of record we read?

Let’s amend our framing with an optional name:

36 httplog
ip: 10.0.0.1
request: GET / HTTP/1.0
43 carinfo
car: Toyota Corolla
year: 2018
price: 21000

Adding optional name (httplog and carinfo) allows us to know what kind of data is encoded in a given chunk of data.

Finally, let’s add a mandatory timestamp in Unix epoch format, in milliseconds:

36 1553564864010 httplog
ip: 10.0.0.1
request: GET / HTTP/1.0
43 1553564864115 carinfo
car: Toyota Corolla
year: 2018
price: 21000

Unlike the name, which is optional, I decided the timestamp is mandatory. You can set it to 0 if you don’t need it but for most logging needs you want a timestamp.

Traditional Unix epoch time has a precision of one second, which doesn’t seem enough. Millisecond precision seems good enough. Nanoseconds were also an option but seem like overkill.

What if you need more structure than key / value pairs?

Simplicity means that you can’t implement every possible feature. This format doesn’t preclude more structure: you can always use JSON as the value in a key / value field. It’s just not going to be as easy to use.

Implementing the thing

This is not just theoretical exploration. I’ve implemented this format as a Go package siser.

It took me 2 days to implement. It’s just under 500 lines of code:

$ wc -l reader.go record.go util.go writer.go
     167 reader.go
     196 record.go
      79 util.go
      57 writer.go
     499 total

That doesn’t count tests, which I do have because I need this code to be rock solid. Storing data is serious business and has to be reliable.

How fast is it? I benchmarked it against JSON encoding:

BenchmarkSiserMarshal-12      	 1000000	      1136 ns/op
BenchmarkJSONMarshal-12       	 1000000	      1407 ns/op
BenchmarkSiserUnmarshal-12    	 5000000	       374 ns/op
BenchmarkJSONUnmarshal-12     	  500000	      3353 ns/op

The benchmark is mostly to make sure that I didn’t make a big performance mistake. It would be embarrassing to be slower than JSON encoding. As a side note: it’s impressive how fast JSON marshaling in Go standard library is.

I’m moving all my logging needs to this format.

This format is also good for very simple stores aka databases. The file can be seen as an append-only database log. To update a record I just write a new entry, which supersedes the earlier one.
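Reading such a store back is a replay: apply entries in file order and let later ones win. A minimal sketch, assuming records have already been decoded into hypothetical key / value entry structs.

```go
package main

import "fmt"

// entry is a hypothetical decoded record: a key plus its value.
type entry struct{ key, value string }

// replay rebuilds the current state from an append-only log. Entries are
// applied in file order, so a later entry for the same key overwrites an
// earlier one.
func replay(log []entry) map[string]string {
	state := make(map[string]string)
	for _, e := range log {
		state[e.key] = e.value
	}
	return state
}

func main() {
	log := []entry{
		{"car", "Toyota Corolla"},
		{"price", "21000"},
		{"price", "19500"}, // an "update" is just a newer entry
	}
	fmt.Println(replay(log)["price"])
}
```

Writes stay cheap (always an append), at the cost of the file growing with every update; a store that lives long enough eventually needs to compact the log by rewriting only the latest entries.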

The roads not taken

After reading this you might not be impressed. This design is clearly inspired by many others. All I did was put known things together in a specific, but very familiar, way.

Simplicity is insidiously non-trivial. The design described here is v2 of the siser library. The first design used --- as the record separator and didn’t have a name or timestamp.

Only several months after the first version did I gain enough insight to improve it.

In turn siser was an evolution of less robust ideas I implemented earlier. It took experiencing the limitations of those earlier designs in real use for better ideas to emerge.

This is even more apparent when you look at the mistakes of others.

MIME is a format used for encoding e-mail messages. While in some ways it’s very close to this format, they made the mistake of using a boundary string to separate multiple parts. Compared to framing data with a size prefix it’s much harder to implement. A MIME decoder is more than 500 lines of code and doesn’t offer more features.

Another example of a massive mistake of the past is choosing XML as the format for ant or Visual Studio build files. XML is super slow, unreadable for humans and hard to work with programmatically. A conforming implementation of an XML parser requires thousands of lines of code.

Obvious things are often only obvious in hindsight.

How I implemented Oembed Proxy for GitHub

2018-10-13T00:00:00Z

Why Oembed Proxy for GitHub

I’m writing a programming book Essential Go in Notion and I need to include code snippets.

Notion has support for code blocks but it’s not good enough for my use case.

I want to make sure the code compiles so I write small programs and store them in a GitHub repository. My custom book building script compiles and runs the programs to ensure the code is correct.

Notion supports embedding GitHub gists but not files from git repositories hosted on GitHub.

I researched things and it turns out there’s a standard called Oembed that was created to enable embedding arbitrary content from one website in another.

Notion supports Oembed.

I didn’t find an existing service that can provide Oembed support for GitHub repositories so I built one myself.

This article describes the high-level design of Oembed Proxy for GitHub.

What is Oembed?

Let’s say you’re implementing a rich-text editor on the web and you want to allow embedding arbitrary content from other services: a tweet, a YouTube video, a Flickr photo.

You can add code to support each service you know about (and pray that they provide a stable way to get the necessary information) but it’s not scalable. There are way too many web services out there and more are created every day.

Some people noticed the problem and created the Oembed protocol, which provides a standard way to expose content for embedding. Now you only need to write code to support Oembed.

Here’s how it works, using Oembed Proxy for GitHub as an example.

In our example Notion is an Oembed client that wants to embed a file from the GitHub repository https://github.com/essentialbooks/books/blob/master/README.md in the body of a document.

Notion supports the Oembed standard. If GitHub supported Oembed on their servers, you would add an embed block and use the https://github.com/essentialbooks/books/blob/master/README.md link directly.

GitHub doesn’t support Oembed so instead, you can use a URL via my Oembed Proxy: https://www.onlinetool.io/gitoembed/widget?url=https%3A%2F%2Fgithub.com%2Fessentialbooks%2Fbooks%2Fblob%2Fmaster%2FREADME.md

This is the https://www.onlinetool.io/gitoembed/widget page with the GitHub link provided as the url argument.

If you view that page, you’ll see the file from GitHub with source code highlighting.

Oembed supports auto-discovery. If you peek at the HTML of that page, you’ll see this in the <head> section:

<link rel="alternate" type="application/json+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
<link rel="alternate" type="text/xml+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=xml&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />

Those are instructions telling Oembed client (Notion in this example) how to get embeddable HTML.

Oembed supports 2 formats for providing this information: JSON and XML. In my testing Notion worked with just application/json+oembed but I implemented both just in case other clients only understand XML.

The Oembed client parses the HTML to extract those links and, if present, fetches the Oembed information. In our example it’s in https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md and looks like this:

{
	"version": 1,
	"type": "rich",
	"provider_name": "gitoembed",
	"provider_url": "https://www.onlinetool.io/gitoembed/",
	"height": 320,
	"width": 720,
	"title": "README.md",
	"html": "\u003ciframe width=\"100%\" height=320 src=\"https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md\" frameborder=\"0\" onload=\"resizeFrame(this);\"\u003e\u003c/iframe\u003e"
}

I hope the format is mostly self-explanatory.

The interesting bit is html field, which is:

<iframe 
	width="100%"
	height=320 
	src="https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md" 
	frameborder="0"
	onload="resizeFrame(this);">
</iframe>

We could send the actual HTML content to insert, but it’s more customary to send an iframe that loads the HTML.

In my implementation the iframe’s src is the same page from which we extracted the Oembed JSON link, so it does double duty: it is both the content and an Oembed pointer to the content. Those could be different URLs.

Implementation details of Oembed Proxy for GitHub

First I needed a server. Usually I use Digital Ocean, but this time I went for the biggest bang for the buck and used a C2L server from Scaleway.

For ~$30 I get an 8-core server with 32 GB of RAM, a 250 GB SSD and 600 Mbit/s of unmetered bandwidth. It’s a bare metal server, not a VPS, so it’s all mine, eliminating the risk of noisy neighbors.

On Digital Ocean the closest server with such specs would cost $160. I keep a list of cheap VPS servers for comparison.

The downside is that the servers are in Europe (you can choose between Amsterdam and Paris), so latency for users in the US will be higher than if the server were hosted in the US.

For the OS I went with Ubuntu 18.04. I know it best and it’s one of the most popular distros.

The server is written in Go. It’s my go-to language for writing backend code.

The service isn’t very complicated:

it downloads the file via GitHub’s https://raw.githubusercontent.com URL
to avoid overloading GitHub’s servers (and exceeding their throttling limits) I cache downloaded files for a day. The cache expires because files on GitHub can change and I don’t want to serve an outdated version forever
for code highlighting I use the chroma library

There’s even less front-end code: an explanation of the service and a way to test it, implemented with a form and a few lines of JavaScript.

Powering a blog with Notion and Netlify

2018-07-30T00:00:00Z

The last iteration of this blog was a Go program running on Digital Ocean’s cheapest VM ($5/month).

Recently I’ve made 2 big changes:

I converted it to a static site hosted on Netlify
I used Notion for writing the posts instead of writing markdown files in a text editor

Moving to Netlify

My blog was effectively a static website. It didn’t need a backend so writing a custom server and running it on a VPS was overkill.

A few months back Netlify reduced the price of their cheapest plan to $0 (from $10/month).

I’m always looking to simplify and cheapify my life so I bit the bullet and converted my custom server to generate static HTML files suitable for hosting on Netlify.

If you want to publish a static website and are starting from scratch, the best approach is to use one of the many static site generators (e.g. Hugo or Jekyll).

I already had a lot of content in .html and markdown files accumulated over the years, plus code that generated the website for serving from a custom web server, so I refactored the code to generate static HTML instead.

The refactoring process was time-consuming but simple.

For dynamically generated HTML I changed the code to generate a .html file and set up the appropriate URL => file mapping in the _redirects file.
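A sketch of generating that mapping in Go. redirectsFile is a hypothetical helper and the paths are made up. In Netlify’s _redirects format each line is "from to status"; status 200 makes it a rewrite, so the browser keeps the pretty URL while Netlify serves the .html file.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// redirectsFile (hypothetical helper) builds the contents of a Netlify
// _redirects file from a map of pretty URL -> generated .html file.
func redirectsFile(urlToFile map[string]string) string {
	urls := make([]string, 0, len(urlToFile))
	for u := range urlToFile {
		urls = append(urls, u)
	}
	sort.Strings(urls) // deterministic output, so the file diffs cleanly between builds
	var sb strings.Builder
	for _, u := range urls {
		// status 200 = rewrite: serve the file without changing the URL
		fmt.Fprintf(&sb, "%s %s 200\n", u, urlToFile[u])
	}
	return sb.String()
}

func main() {
	fmt.Print(redirectsFile(map[string]string{
		"/article/notion-blog": "/article/notion-blog.html",
		"/archives":            "/archives.html",
	}))
}
```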

For local testing I use Caddy and generate a Caddyfile with the appropriate redirects. Local testing isn’t identical because Netlify and Caddy differ slightly in the semantics of their redirects, but it’s good enough.

Netlify also has an option to do a draft deploy under a unique URL. This is good for previewing the changes before publishing.

At the end of code refactoring I effectively ended up with a custom static site generator.

The verdict

I’m happy with the result.

Netlify has all the features I care about for a basic website.

Notably, they provide free SSL with Let’s Encrypt, allow custom domains and have capable redirect support. They use a CDN, so the site should be faster than when hosted on a single server.

The only thing I miss is analytics for 404 errors. With my own server I logged all requests for pages that don’t exist. Many of them were bots trying to hack me via known vulnerabilities in popular software like WordPress, but sometimes a 404 was caused by my mistake or by an incorrect link to a valid article. Seeing those, I was able to fix most of them by adding redirects.

I hope Netlify can sustain their generous free plan. Luckily there are plenty of options to host a static website, so even if Netlify goes under, it’ll be easy to move somewhere else, like Firebase Hosting, surge.sh, GitHub Pages, GitLab Pages and many others.

Using Notion as Content Management System

The easier it is to write, the more I write.

Using markdown files fails the “as easy as possible” part.

In absolute terms, creating a new markdown file doesn’t take much time.

In practice it’s enough friction to deter me from writing. In my worst year I only wrote 1 new blog post.

In a perfect world I would open an app and start writing. When I’m done writing I would publish with a click of a button.

Enter Notion

Notion is as close to a perfect writing tool as it gets: open a page and start writing.

The problem is: all that content is trapped in Notion. You can publish (publicly share) a page but I want more flexibility:

I want to host on my own domain; my website is partly a tool for marketing myself
I want integration with Google Analytics
I want a custom design
I want to provide an RSS feed

Thankfully I’m a programmer: if something can be done with software, I can do it.

I started a one-man Notion Liberation Front. My goal: liberate content trapped inside Notion.

I reverse-engineered their API, wrote a Go library and after another round of code refactoring I had my blog powered by Notion thanks to this Go program.

I imported my old blog posts into Notion. They have a decent markdown importer although I did have to do some cleaning up.

I went even further and published most of my notes to my website as well. I still have many private notes in Notion, but most of them could just as well be publicly visible.

There’s no way I could publish so much content without Notion.

I also automated publishing using the cron functionality of Travis CI. Every day a script downloads the latest content from Notion, caches it in a git repository and regenerates the website.

Everything now runs on auto-pilot. I can just write new articles in Notion and my website will be automatically updated every day.

How I reverse engineered Notion API

2018-07-23T00:00:00Z

Notion is a great tool for writing but the content is trapped inside the web app.

The company is working on an official API but I’m impatient.

This article describes how I reverse engineered their API and created a Go library notionapi.

It all began with a failure.

My first attempt at extracting Notion content was traditional web scraping.

I found a Python script that uses Selenium to recursively spider a Notion page and publish it to Firebase Hosting.

I ported it to Node to use Puppeteer (better technology than Selenium).

While it worked, this approach is limited to getting the verbatim HTML of the pages as they are rendered by the Notion application.

I wanted to be able to change the look of the page and add elements like headers, footers and a navigation bar.

I briefly considered trying to reconstruct the structure of the page from rendered HTML but at best that would be a lot of ugly guesswork.

The lightbulb moment

Modern Single Page Applications (SPA) work by getting data from the server in structured format (most often JSON) and rendering HTML in the browser with JavaScript.

A trip to Chrome Dev Tools confirmed that Notion works like that.

When loading a Notion page I saw XHR requests like /api/v3/getRecordValues and /api/v3/loadPageChunk.

Lucky for me the API is not obfuscated. It returns responses as JSON data. It isn’t hard to figure out the meaning of fields.

Working with the original JSON structure is much easier than trying to reconstruct it from rendered HTML.

Building tools

I could have looked at API requests between client and server in Chrome dev tools but it’s not the best workflow.

Instead I wrote a Node.js script that logs all XHR requests the web browser makes when rendering a given page.

That has several advantages over using dev tools:

I could filter out requests to third-party services like amplitude, fullstory and intercom
I could filter out requests that are not interesting like /api/v3/ping
I could pretty-print JSON
I could write captured traffic to a file for further analysis

Here’s the script:

https://gist.github.com/kjk/f33bc37d6ca8282b5c52b17391384693

The big picture analysis

Looking at the captured data, I found that the structure of Notion content is not complicated.

Everything, including a top-level page, is a block.

Blocks are identified by a unique id in the standard UUID format.

Blocks are arranged into a tree i.e. some blocks have children.

Blocks have metadata, like creation time, last edit time, version etc.

There are different kinds of blocks: a page, text, todo item, list item etc.

Some blocks have properties specific to that block type. For example a page block has title property.
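To make the structure concrete, here is a simplified Go model of the block tree described above. The Block type and its fields are my illustration of those observations, not the notionapi types, and the type strings ("page", "text", "todo") are assumptions.

```go
package main

import "fmt"

// Block is a simplified, hypothetical model of a Notion block.
type Block struct {
	ID             string            // unique id in UUID format
	Type           string            // kind of block, e.g. "page", "text", "todo" (names assumed)
	CreatedTime    int64             // metadata (units assumed)
	LastEditedTime int64
	Version        int64
	Properties     map[string]string // type-specific, e.g. "title" for a page
	Children       []*Block          // blocks form a tree
}

// walk visits b and then its descendants, depth-first, in order.
func walk(b *Block, depth int, visit func(b *Block, depth int)) {
	visit(b, depth)
	for _, child := range b.Children {
		walk(child, depth+1, visit)
	}
}

func main() {
	page := &Block{
		ID:         "page-1",
		Type:       "page",
		Properties: map[string]string{"title": "My page"},
		Children: []*Block{
			{ID: "text-1", Type: "text", Properties: map[string]string{"title": "Hello"}},
			{ID: "todo-1", Type: "todo", Children: []*Block{{ID: "text-2", Type: "text"}}},
		},
	}
	// print an indented outline of the page
	walk(page, 0, func(b *Block, depth int) {
		fmt.Printf("%*s%s %s\n", depth*2, "", b.Type, b.ID)
	})
}
```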

To get the content of a page we start with its UUID, which we can get from the last part of the page’s URL.
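A sketch of extracting that id in Go, assuming the last path segment ends with the 32 hex digits of the id (with or without dashes), as in https://www.notion.so/My-Page-0367c2db381a4f8b9ce360f388a6b2e3. pageIDFromURL is a hypothetical helper; it returns the id in the dashed 8-4-4-4-12 UUID form.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// pageIDFromURL (hypothetical helper) extracts the page UUID from a Notion page URL.
func pageIDFromURL(pageURL string) (string, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	// drop dashes from the title part and from an already-dashed UUID
	last := strings.ReplaceAll(path.Base(u.Path), "-", "")
	if len(last) < 32 {
		return "", fmt.Errorf("no page id in %q", pageURL)
	}
	id := strings.ToLower(last[len(last)-32:])
	for _, c := range id {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return "", fmt.Errorf("no page id in %q", pageURL)
		}
	}
	// re-insert dashes in the standard 8-4-4-4-12 UUID layout
	return id[0:8] + "-" + id[8:12] + "-" + id[12:16] + "-" + id[16:20] + "-" + id[20:32], nil
}

func main() {
	id, err := pageIDFromURL("https://www.notion.so/My-Page-0367c2db381a4f8b9ce360f388a6b2e3")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(id) // 0367c2db-381a-4f8b-9ce3-60f388a6b2e3
}
```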

We call the /api/v3/getRecordValues API to get the list of blocks in the page and then /api/v3/loadPageChunk to get the content of those blocks.

The majority of the work was figuring out what kinds of blocks there are and how they are represented in JSON, and writing code to retrieve the data and present it in a format that is easier to work with than the raw data returned by the server.

Testing different kinds of blocks

A Notion page consists of different kinds of blocks and we need to know how each kind is represented in the JSON response.

To investigate it systematically, I’ve created a test page for each kind of block and used the request logging script to look at JSON returned by the server for that block.

Writing Go library

Next step was writing a Go library.

I captured sample JSON responses from getRecordValues and loadPageChunk and used Quicktype to generate Go structures.

I had to tweak them a bit to accommodate variations in JSON structure.

The rest of the effort was writing a helper function that abstracts the details of HTTP requests and returns an easy-to-use struct describing a Notion page.

The result of that work is the notionapi Go package.

Using the library in practice

This was not just an academic exercise.

This blog was powered by markdown files I stored in a GitHub repository.

My goal was to move the content to Notion, so that I could edit it more easily, then convert it to HTML and publish it as my website/blog.

You can see the code here.

The high-level structure of the code:

I use my Go notionapi library to download the content from Notion
I cache the downloaded data and store it in a git repository. This makes sure I have a copy of the data even if Notion disappears, makes it faster to tweak the publishing code (no need to re-download) and is nicer to Notion’s servers (no re-downloads unless I absolutely have to)
I convert Notion data to HTML, wrap it in templates for my pages and write HTML files to disk
I deploy to Netlify

Advanced web spidering with Puppeteer

2018-07-18T00:00:00Z

Puppeteer is a node.js library that makes it easy to do advanced web scraping and spidering.

The older generation of web scraping and spidering tools would grab and analyze HTML pages as returned by a web server.

It doesn’t work well anymore because fewer and fewer websites are static HTML pages. Today websites are often applications written in JavaScript that generate HTML on the client, not the server.

To get the final HTML output your scraper needs to run that JavaScript.

That used to be very difficult but Puppeteer makes it easy.

Puppeteer uses Chrome to run the web application and CDP (Chrome DevTools Protocol) to access the web page.

This article describes some more advanced techniques, but let’s start with a basic example.

Save web page to a file

First install the library:

yarn add puppeteer when using yarn
npm install --save puppeteer when using npm

This is the simplest possible usage of Puppeteer:

navigate to a page of interest
get content of the webpage as HTML and save it to a file

const puppeteer = require("puppeteer");
const fs = require("fs");

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.google.com/", { waitUntil: "networkidle2" });
  // hacky defensive move but I don't know a better way:
  // wait a bit so that the browser finishes executing JavaScript
  // (page.waitFor() was deprecated and later removed, so use a plain timer)
  await new Promise(resolve => setTimeout(resolve, 1 * 1000));
  const html = await page.content();
  fs.writeFileSync("index.html", html);
  await browser.close();
}

run().catch(err => {
  console.error(err);
  process.exit(1);
});

Handling failures

What if a url you tried to load didn’t exist?

The web server will return a ‘Not Found’ page with HTTP status code 404. The above script would treat such a page as a perfectly valid response.

Most of the time you want to handle this as an error case.

For example, if you’re writing a bot that checks for broken links, you want to distinguish a 404 Not Found response from a 200 OK response.

In the HTTP protocol, status codes 4xx and 5xx indicate errors, 2xx indicate success and 3xx indicate redirection.

Puppeteer lets you intercept HTTP requests before they are sent (enable it with page.setRequestInterception(true) and handle the "request" event) as well as inspect completed HTTP responses (the "response" event).

Here’s a program that prints information about all HTTP requests and responses:

const puppeteer = require("puppeteer");

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const mainUrl = "https://blog.kowalczyk.info/pas";
  let mainUrlStatus;
  await page.setRequestInterception(true);
  page.on("request", request => {
    const url = request.url();
    console.log("request url:", url);
    // with interception enabled, every request must be continued (or aborted)
    request.continue();
  });
  page.on("requestfailed", request => {
    const url = request.url();
    console.log("request failed url:", url);
  });
  page.on("response", response => {
    const request = response.request();
    const url = request.url();
    const status = response.status();
    console.log("response url:", url, "status:", status);
    // remember the status of the page we asked for
    if (url === mainUrl) {
      mainUrlStatus = status;
    }
  });
  await page.goto(mainUrl);
  console.log("status for main url:", mainUrlStatus);
  await browser.close();
}

run().catch(console.error);

Here’s what it’ll print:

$ node test.js
request url: https://blog.kowalczyk.info/pas
response url: https://blog.kowalczyk.info/pas status: 404
request url: https://fonts.googleapis.com/css?family=Roboto:400,700&subset=latin,latin-ext
response url: https://fonts.googleapis.com/css?family=Roboto:400,700&subset=latin,latin-ext status: 200
request url: https://fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmWUlfChc9AMP6lQ.ttf
request url: https://fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu7GxPKTU1Kg.ttf
response url: https://fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu7GxPKTU1Kg.ttf status: 200
response url: https://fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmWUlfChc9AMP6lQ.ttf status: 200
status for main url: 404

Notice that fetching a page also fetches all resources used by that page, just like in a web browser. For that reason, to find out the status code for the URL we requested, we have to remember it in a variable in the response hook.

The requestfailed hook is for errors at the network connection level, e.g. DNS resolution failed, there’s no network at all, the network connection got interrupted etc.

See `console.log` from inside the browser

Your JavaScript code is executed in two different contexts:

the main script is executed in Node.js. In that context console.log("foo") prints to the shell
scripts passed to the Page.evaluate method are serialized to text, sent to the browser via Chrome DevTools Protocol and executed inside the browser. In that context console.log("foo") prints to the browser console, which you can’t see.

To see what console.log prints in the browser, you can hook it and re-log it to the shell:

const puppeteer = require("puppeteer");

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const url = "https://blog.kowalczyk.info/";

  // this hooks `console.log()` in the browser
  page.on("console", async msg => {
    console.log("The whole message:", msg.text());
    console.log("\nEach argument:");
    for (const arg of msg.args()) {
      // arg is a JSHandle; jsonValue() returns a Promise resolving to its value
      // https://pptr.dev/#?product=Puppeteer&show=api-class-jshandle
      console.log(await arg.jsonValue());
    }
  });
  await page.goto(url);
  await page.evaluate(() => {
    // This is executed inside the browser so not visible in our script
    // unless we hook 'console' events
    console.log("Message from the browser", 5);
  });
  await browser.close();
}

run().catch(console.error);

Quickly testing `evaluate` scripts

It’s slow to test browser scripts executed via Page.evaluate because you have to start the browser, load the page etc.

To test scripts faster I test them directly in the browser, using excellent Chrome dev tools.

My process is:

prepare the script, in IIFE form, in the editor
copy & paste in console window in Chrome dev tools

What is IIFE form? To avoid conflicts with JavaScript state from previous runs, I wrap the code inside an Immediately Invoked Function Expression:

(function() {
  // code here is isolated from things outside this function
  console.log("My script");
  // ... my script

  // when debugging I can trigger JavaScript debugger from inside the script
  // with debugger statement:
  debugger;
})(); // wrapping parentheses make it an expression, () immediately invokes it

It’s faster to iterate on code this way. You can also use the browser’s JavaScript debugger.

As shown in the snippet, I can trigger the debugger for single-stepping through the code with the debugger; statement.

Study Puppeteer API

Now that you’ve seen a few advanced uses of Puppeteer, you should study its API a bit to learn what else is possible. CDP is very powerful:

Page class allows hooking many events, reading and setting cookies, simulating interaction like mouse clicks etc.
Tracing class allows creating a trace file for future inspection in Chrome DevTools
Worker class allows interacting with Web Workers
Coverage class allows measuring JavaScript and CSS coverage
Keyboard class allows simulating keyboard events

Other CDP tools and libraries

Puppeteer is not the only tool that takes advantage of Chrome DevTools Protocol. A bunch of them are listed in Awesome Chrome DevTools.

57 MicroConf videos for self-funded software businesses

2017-12-24T00:00:00Z

MicroConf is a conference for small/indie/self-funded software businesses. Many of their talks are available on Vimeo but not well indexed. They have a better index (and another here) on their website, but those aren’t great either.

This is a list of videos with a bit of info about each one. I hope it will help you find the videos that are useful to you.

1. Lizards Thru Doorways: Proven ways to Widen Your Funnel Using Just Your CTAs

Video https://vimeo.com/132932415 by Joanna Wiebe, 39 min.

Joanna is a copywriter for hire i.e. she writes the text of emails, sales pages etc. for other businesses.

Her talk is about writing better copy. Tips and case studies. She also has a lot of free (and paid) tutorials on copywriting on her website https://copyhackers.com/. Recap: https://kaidavis.com/microconf-2015/joanna/

2. An Inside Story of Self-Funded SaaS Growth

Video https://vimeo.com/132932414 by Rob Walling, 51 min.

This is a history of Rob’s SaaS startup https://www.drip.com (email marketing software). He also wrote about it online: https://wpcurve.com/bootstrapped-drip-into-a-7-figure-saas-business/.

Recap: https://kaidavis.com/microconf-2015/rob/

3. Accelerating Growth: Grow Faster Without Working Yourself to Death

Video https://vimeo.com/132139313 by Hiten Shah, 54 min.

Recap: https://kaidavis.com/microconf-2015/hiten/

4. Amplification - Content Marketing That Works

Video https://vimeo.com/132139312 by Justin Jackson, 11 min.

Content marketing is about writing blog posts and making podcasts and videos to attract people to your website so that you can pitch them your products.

5. How to Aggressively Acquire Customers for your SaaS with an Efficient Outbound Sales Process

Video https://vimeo.com/132139308 by Jordan Gal, 12 min.

An outbound sales process uses email and phone calls (cold-calling) to find customers.

6. How I Designed Our (High-Touch) Sales & Onboarding to Run Without Me (2014)

Video https://vimeo.com/132139307 by Brian Casel, 11 min.

Based on experience from his startup Restaurant Engine, he talks about converting leads (from his inbound traffic) into customers and how to automate it.

Recap: https://www.phraseexpander.com/microconf-2014/brian-casel-automated-content-marketing-machine-microconf-2014/

7. Creating an Explosive Email Course

Video https://vimeo.com/132139306 by David Kadavy, 13 min.

Talks about creating an email course to get more leads (build an email list).

8. Do This, Not That: Creating an Exceptional Customer Support Experience from Day 1 (2015)

Video https://vimeo.com/131466750 by Sarah Hatter, 29 min.

Sarah runs CoSupport, which teaches companies how to create good customer support.

Recap: https://kaidavis.com/microconf-2015/sarah/

9. How to Build a Solo SaaS Sales Machine (2015)

Video https://vimeo.com/131441010 by Steli Efti, 56 min.

Recap: https://kaidavis.com/microconf-2015/steli/

10. Lessons Learned Building a WordPress Plugin Business to $10k/month

Video https://vimeo.com/130984501 by Phil Derksen, 15 min.

11. How To Systematically Fight SaaS Churn And Win (2015)

Video https://vimeo.com/130984497 by Robert Graham, 13 min.

Recap: https://bootstrapping.io/microconf-2015/robert/

12. How Bookkeeping Tripled My Revenue in Two Years (and Other Unexpected Cash Flow Advice) (2015)

Video https://vimeo.com/130984492 by Jesse Mecham, 48 min.

Recap: https://kaidavis.com/microconf-2015/jesse/

13. Growing Your Userbase with Better Onboarding (2015)

Video https://vimeo.com/130797721 by Samuel Hulick, 30 min.

Recap: https://kaidavis.com/microconf-2015/samuel/

14. Q&A and Smart Bear Live (2015)

Video https://vimeo.com/130797720 by Jason Cohen, 1hr 7 min.

Recap: https://kaidavis.com/microconf-2015/jason/

15. Micro-ISV to Micro-acquisition: Selling my 11-year one-man software business (2015)

Video https://vimeo.com/130797718 by Jacob Thurman, 11 min.

Recap: https://kaidavis.com/microconf-2015/jacob/

16. How to start a SaaS business in any market with no idea or connections, using only excel, email & phone (2015)

Video https://vimeo.com/130797716 by Pawel Brzeminski, 12 min.

Recap: https://kaidavis.com/microconf-2015/pawel/

17. How I Grew My Productized Consulting Offering To $100K YRR In 12 Months (2015)

Video https://vimeo.com/130797714 by Einar Vollset, 9 min.

Recap: https://kaidavis.com/microconf-2015/einar/. A story of building http://www.appaftercare.com/

18. The 3 Week Startup (2015)

Video https://vimeo.com/130499701 by Keith Perhac, 10 min.

Recap: https://kaidavis.com/microconf-2015/keith/

Tactical tips about how (and why) to build SaaS quickly (in 1 week).

19. Leveling Up (2015)

Video https://vimeo.com/129913527 by Patrick McKenzie, 1 hr.

Recap: https://kaidavis.com/microconf-2015/patio11/

20. How to Validate Your Idea and Launch to $7k in Recurring Revenue (2014)

Video https://vimeo.com/96267945 by Rob Walling, 58 min.

Recap: http://www.christophengelhardt.com/rob-walling-validate-idea-launch-7k-recurring-revenue-microconf-europe-2014/

Recap: https://www.phraseexpander.com/microconf-2014/rob-walling-validate-idea-launch7k-microconf-2014/

21. 6 Tricks That Helped Me Triple My SaaS’ Growth Rate (2014)

Video https://vimeo.com/95680318 by Brennan Dunn, 41 min.

Recap: http://www.christophengelhardt.com/brennan-dunn-6-tricks-helped-triple-saas-growth-rate-microconf-europe-2014/

Recap: https://www.phraseexpander.com/microconf-2014/brennan-dunn-triple-saas-growth-rate-microconf-2014/

22. Lifting the Veil: The Data Behind Successful Product Launches (2014)

Video https://vimeo.com/95680316 by Ryan Delk, 13 min.

Recap: https://www.phraseexpander.com/microconf-2014/ryan-delk-data-behind-successful-product-launches-microconf-2014/

23. 3 Habits for Building (and Growing) a Product Empire (2014)

Video https://vimeo.com/95680313 by Nathan Barry, 38 min.

Recap: https://www.phraseexpander.com/microconf-2014/nathan-barry-3-habits-grow-product-empire-microconf-2014/

24. From Zero to $4M/year Without Quora, Hacker News, or Mixergy (2014)

Video https://vimeo.com/95653848 by Jesse Mecham, 50 min.

Recap: https://www.phraseexpander.com/microconf-2014/jesse-mecham-from-zero-4m-microconf-2014/

25. 10 Business Questions Every Entrepreneur Needs to Ask Their Analytics (And Where to Find the Answers) (2014)

Video https://vimeo.com/95653743 by Annie Cushing, 46 min.

Recap: https://www.phraseexpander.com/microconf-2014/annie-cushing-10-business-questions-analytics-microconf-2014/

26. How To Slay the Customer Support Beast (2014)

Video https://vimeo.com/95052087 by Ian Landsman, 44 min.

Recap: https://www.phraseexpander.com/microconf-2014/ian-landsman-slay-customer-support-beast-microconf-2014/

27. Don’t Burn-up in the Launch: Staying Emotionally and Relationally Healthy While Launching Your Startup (2013)

Video https://vimeo.com/72211933 by Sherry Walling, 19 min.

Recap: http://www.christophengelhardt.com/sherry-walling-dont-burn-up-in-the-launch-microconf-2013/

Recap: http://www.christophengelhardt.com/sherry-walling-dont-burn-up-in-the-launch-staying-emotionally-and-relationally-healthy-while-launching-your-startup-microconf-europe-2013/

28. Playing the Long Game: Making Entrepreneurship a Sustainable Life (2014)

Video https://vimeo.com/95052086 by Sherry Walling, 56 min.

Recap: https://www.phraseexpander.com/microconf-2014/sherry-walling-entrepreneurship-sustainable-life-microconf-2014/

29. From Idea to $5k/mo in 5 Months (2014)

Video https://vimeo.com/94623532 by Josh Pigford, 12 min.

History of building https://baremetrics.com/

Recap: https://www.phraseexpander.com/microconf-2014/josh-pigford-idea-5k-5months-microconf-2014/

30. UX Basics That Convert Users into Customers (2014)

Video https://vimeo.com/94623531 by Samuel Hulick, 7 min.

Recap: https://www.phraseexpander.com/microconf-2014/samuel-hulick-uxbasics-convert-user-into-customers-microconf-2014/

31. Business Hacks & Epic Wins (2014)

Video https://vimeo.com/94623529 by Mike Taber, 1 hr 4 min.

Recap: https://www.phraseexpander.com/microconf-2014/mike-taber-business-hacks-epic-wins-microconf-2014/

32. How to Grow Your Self-Funded Business Faster (2014)

Video https://vimeo.com/94187473 by Hiten Shah, 43 min.

Recap: https://www.phraseexpander.com/microconf-2014/hiten-shah-grow-your-self-funded-business-faster-microconf-2014/

33. Designing the Ideal Bootstrapped Business (2013)

Video https://vimeo.com/74338272 by Jason Cohen, 1hr 5 min.

Recap: http://www.christophengelhardt.com/jason-cohen-microconf-2013/

34. Dude. Marketing is not your thing (2013)

Video https://vimeo.com/72461554 by Jody Burgess, 13 min.

Recap: http://www.christophengelhardt.com/dude-marketing-is-not-your-thing-microconf-2013/

35. How a Non-Technical Founder Built a 6 Figure Saas App Using Only Free Public Data Sources (2013)

Video https://vimeo.com/72461551 by Brecht Palombo, 15 min.

Recap: http://www.christophengelhardt.com/brecht-palombo-how-a-non-technical-founder-built-a-6-figure-saas-app-using-only-free-public-data-sources-microconf-2013/

36. Finding Customers Who Are 100x More Valuable Without 100x the Effort (2013)

Video https://vimeo.com/72456666 by Erica Douglass, 48 min.

Recap: http://www.christophengelhardt.com/erica-douglass-how-to-measurably-move-the-needle-with-your-software-company-microconf-2013/

37. Bootstrapping an App Business (2013)

Video https://vimeo.com/72260021 by Patrick Thompson, 16 min.

Recap: http://www.christophengelhardt.com/patrick-thompson-bootstraping-an-app-business-microconf-2013/

38. Building Things To Help Sell The Things You Build (2013)

Video https://vimeo.com/72140534 by Patrick McKenzie, 1 hr 13 min.

Recap: http://www.christophengelhardt.com/patrick-mckenzie-building-things-to-help-sell-the-things-you-build-microconf-2013/

39. Killer Content Marketing (2013)

Video https://vimeo.com/72081980 by Hiten Shah, 44 min.

Recap: http://www.christophengelhardt.com/hiten-shah-killer-content-marketing-microconf-2013/

40. SEO Demystified: Practical Techniques That Produce Astonishing Results (2013)

Video https://vimeo.com/71547333 by Dave Collins, 57 min.

Recap: http://www.christophengelhardt.com/dave-collins-seo-demystified-practical-techniques-that-produce-astonishing-results-microconf-europe-2013/

41. Shut Up and Take My Money: How to Find Business Ideas Customers Want (2013)

Video https://vimeo.com/71250239 by Josh Kaufman, 42 min.

Recap: http://www.christophengelhardt.com/josh-kaufman-shut-up-and-take-my-money-how-to-find-business-ideas-customers-want-microconf-2013/

42. Copywriting that Converts: How to Sell Without Selling Your Soul (2013)

Video https://vimeo.com/71008640 by Joanna Wiebe, 50 min.

Recap: http://www.christophengelhardt.com/joanna-wiebe-copywriting-that-converts-microconf-2013/

Another: http://www.workhappy.net/2013/05/how-you-can-drastically-improve-the-copy-on-your-site-even-if-you-only-have-5-minutes.html

43. Lean Analytics: How to Focus on What Matters (2013)

Video https://vimeo.com/70981922 by Ben Yoskovitz, 43 min.

Recap: http://www.christophengelhardt.com/ben-yoskovitz-measure-what-matters-microconf-2013/

44. How to Sell Anything to Anyone (2013)

Video https://vimeo.com/70901902 by Mike Taber, 41 min.

Recap: http://www.christophengelhardt.com/mike-taber-microconf-2013-a-k-a-the-liquor-fairy/

45. How to 10x in 15 months (2013)

Video https://vimeo.com/70901901 by Rob Walling, 51 min.

Recap: http://www.christophengelhardt.com/rob-walling-how-to-10x-in-15-months-microconf-2013/

46. Cheap and Easy Customer Support (2012)

Video https://vimeo.com/51306662 by Sarah Hatter, 40 min.

47. Google AdWords: Stop Losing & Start Exploiting (Really) (2012)

Video https://vimeo.com/51187193 by Dave Collins, 58 min.

48. How I Bootstrapped and Sold My Software Company By Maxing Out My Credit Cards (2012)

Video https://vimeo.com/50372726 by Bill Bither, 46 min.

49. From Idea to 7 Figures in 2 years: The Story of Woothemes

Video https://vimeo.com/50209990 by Adii Pienaar, 45 min.

50. Ask Me Anything (2012)

Video https://vimeo.com/49023444 by Peldi Guilizzoni, 48 min.

51. If You Don’t Like Drunk Frat Boys, Don’t Open an Irish Pub… (2012)

Video https://vimeo.com/48962410 by Amy Hoy, 43 min.

Transcript: https://stackingthebricks.com/a-customer-is-your-mvp-a-video-talk-on-making-products-that-sell/

52. Growth Hacking (2012)

Video https://vimeo.com/48592609 by Dan Martell, 39 min.

53. Losers Have Goals, Winners Have Systems (2012)

Video https://vimeo.com/48571431 by Mike Taber, 27 min.

54. Naked Business: How Honesty Makes You More Money (2012)

Video https://vimeo.com/48549019 by Jason Cohen, 55 min.

55. Finding Your Flywheel (2012)

Video https://vimeo.com/47465229 by Rob Walling, 1 hr 04 min.

56. How to Engineer Marketing Success (2012)

Video https://vimeo.com/47311461 by Patrick McKenzie, 51 min.

57. More Lessons I’ve Learned as a Serial Entrepreneur (2012)

Video https://vimeo.com/46893380 by Hiten Shah, 48 min.

Related resources:

Guide to predefined macros in C++ compilers (gcc, clang, msvc etc.)

2017-11-07T00:00:00Z

When writing portable C++ code you need conditional code that depends on the compiler used or on the OS for which the code is compiled.

Here’s a typical case:

#if defined (_MSC_VER)
// code specific to Visual Studio compiler
#endif

To perform those checks you need to check pre-processor macros that various compilers set.

It can be either a simple defined vs. not defined check (e.g. __APPLE__) or a check of the macro’s value (e.g. _MSC_VER encodes the version of the Visual Studio compiler).

This document describes macros set by various compilers.

Checking for OS (platform)

To check for which OS the code is compiled:

Linux and Linux-derived           __linux__
Android                           __ANDROID__ (implies __linux__)
Linux (non-Android)               __linux__ && !__ANDROID__
Darwin (Mac OS X and iOS)         __APPLE__
Akaros (http://akaros.org)        __ros__
Windows                           _WIN32
Windows 64 bit                    _WIN64 (implies _WIN32)
NaCl                              __native_client__
AsmJS                             __asmjs__
Fuchsia                           __Fuchsia__

Checking the compiler

To check which compiler is used:

Visual Studio       _MSC_VER
gcc                 __GNUC__ (also defined by clang; add !defined(__clang__) to exclude it)
clang               __clang__
emscripten          __EMSCRIPTEN__ (for asm.js and webassembly)
MinGW 32            __MINGW32__
MinGW-w64 32bit     __MINGW32__
MinGW-w64 64bit     __MINGW64__ (also defines __MINGW32__)

Checking compiler version

gcc

__GNUC__ (e.g. 5) and __GNUC_MINOR__ (e.g. 1).

To check that this is gcc compiler version 5.1 or greater:

#if defined(__GNUC__) && (__GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ >= 1))
// this is gcc 5.1 or greater
#endif

Notice that the check has to be: major > 5 || (major == 5 && minor >= 1). If you only check major == 5 && minor >= 1, it won’t match version 6.0.

clang

__clang_major__, __clang_minor__, __clang_patchlevel__

Visual Studio

_MSC_VER and _MSC_FULL_VER:

VS                        _MSC_VER   _MSC_FULL_VER
1                         800
3                         900
4                         1000
4.2                       1020
5                         1100
6                         1200
6 SP6                     1200    12008804
7                         1300    13009466
7.1 (2003)                1310    13103077
8 (2005)                  1400    140050727
9 (2008)                  1500    150021022
9 SP1                     1500    150030729
10 (2010)                 1600    160030319
10 (2010) SP1             1600    160040219
11 (2012)                 1700    170050727
12 (2013)                 1800    180021005
14 (2015)                 1900    190023026
14 (2015 Update 1)        1900    190023506
14 (2015 Update 2)        1900    190023918
14 (2015 Update 3)        1900    190024210
15 (2017 Update 1 & 2)    1910    191025017
15 (2017 Update 3 & 4)    1911
15 (2017 Update 5)        1912

More information:

MinGW

MinGW (aka MinGW32) and MinGW-w64 32bit: __MINGW32_MAJOR_VERSION and __MINGW32_MINOR_VERSION

MinGW-w64 64bit: __MINGW64_VERSION_MAJOR and __MINGW64_VERSION_MINOR

Checking processor architecture

gcc

The meaning of those should be self-evident:

__i386__
__x86_64__
__arm__. If defined, you can further check:
- __ARM_ARCH_5T__
- __ARM_ARCH_7A__
__powerpc64__
__aarch64__

Tutorial for github.com/kjk/flex Go package (implementation of CSS flexbox algorithm)

2017-08-04T00:00:00Z

Package github.com/kjk/flex implements CSS flexbox layout algorithm in Go.

It’s a pure Go port of Facebook’s Yoga C library.

High-level API overview

Although it implements the CSS flexbox spec, it isn’t tied to CSS/HTML in any way. Yoga, for example, can be integrated into an iOS app and used to lay out a UIView hierarchy.

The library works on an abstract tree of nodes. In HTML a node would correspond to a block element like a div. In a Cocoa app, a node could represent a UIView or NSView.

On Windows, it could represent an HWND-based control.

The high-level use is:

create a tree of nodes that represents the layout you want
set desired flexbox properties on each node using node.StyleSet*() functions
call flex.CalculateLayout(rootNode, parentWidth, parentHeight, direction)
each node is now measured and positioned so you can e.g. size and position widgets associated with each node. You can get the size and position of nodes with node.LayoutGet*() functions
when the layout hierarchy changes (e.g. the node represents a label and you’ve changed its text, which changes its intrinsic size), call node.MarkDirty() and CalculateLayout() to re-calculate the new size/position of the nodes

An example

Let’s assume that we want to re-create the following HTML layout:

<div id="percentage_multiple_nested_with_padding_margin_and_percentage_values" style="width: 200px; height: 200px; flex-direction: column;">
  <div style="flex-grow: 1; flex-basis: 10%; min-width: 60%; margin: 5px; padding: 3px;">
    <div style="width: 50%; margin: 5px; padding: 3%;">
      <div style="width: 45%; margin: 5%; padding: 3px;"></div>
    </div>
  </div>
  <div style="flex-grow: 4; flex-basis: 15%; min-width: 20%;"></div>
</div>

The equivalent Go code is:

config := flex.NewConfig()

root := flex.NewNodeWithConfig(config)
root.StyleSetWidth(200)
root.StyleSetHeight(200)

rootChild0 := flex.NewNodeWithConfig(config)
rootChild0.StyleSetFlexGrow(1)
rootChild0.StyleSetFlexBasisPercent(10)
rootChild0.StyleSetMargin(flex.EdgeLeft, 5)
rootChild0.StyleSetMargin(flex.EdgeTop, 5)
rootChild0.StyleSetMargin(flex.EdgeRight, 5)
rootChild0.StyleSetMargin(flex.EdgeBottom, 5)
rootChild0.StyleSetPadding(flex.EdgeLeft, 3)
rootChild0.StyleSetPadding(flex.EdgeTop, 3)
rootChild0.StyleSetPadding(flex.EdgeRight, 3)
rootChild0.StyleSetPadding(flex.EdgeBottom, 3)
rootChild0.StyleSetMinWidthPercent(60)
root.InsertChild(rootChild0, 0)

rootChild0Child0 := flex.NewNodeWithConfig(config)
rootChild0Child0.StyleSetMargin(flex.EdgeLeft, 5)
rootChild0Child0.StyleSetMargin(flex.EdgeTop, 5)
rootChild0Child0.StyleSetMargin(flex.EdgeRight, 5)
rootChild0Child0.StyleSetMargin(flex.EdgeBottom, 5)
rootChild0Child0.StyleSetPaddingPercent(flex.EdgeLeft, 3)
rootChild0Child0.StyleSetPaddingPercent(flex.EdgeTop, 3)
rootChild0Child0.StyleSetPaddingPercent(flex.EdgeRight, 3)
rootChild0Child0.StyleSetPaddingPercent(flex.EdgeBottom, 3)
rootChild0Child0.StyleSetWidthPercent(50)
rootChild0.InsertChild(rootChild0Child0, 0)

rootChild0Child0Child0 := flex.NewNodeWithConfig(config)
rootChild0Child0Child0.StyleSetMarginPercent(flex.EdgeLeft, 5)
rootChild0Child0Child0.StyleSetMarginPercent(flex.EdgeTop, 5)
rootChild0Child0Child0.StyleSetMarginPercent(flex.EdgeRight, 5)
rootChild0Child0Child0.StyleSetMarginPercent(flex.EdgeBottom, 5)
rootChild0Child0Child0.StyleSetPadding(flex.EdgeLeft, 3)
rootChild0Child0Child0.StyleSetPadding(flex.EdgeTop, 3)
rootChild0Child0Child0.StyleSetPadding(flex.EdgeRight, 3)
rootChild0Child0Child0.StyleSetPadding(flex.EdgeBottom, 3)
rootChild0Child0Child0.StyleSetWidthPercent(45)
rootChild0Child0.InsertChild(rootChild0Child0Child0, 0)

rootChild1 := flex.NewNodeWithConfig(config)
rootChild1.StyleSetFlexGrow(4)
rootChild1.StyleSetFlexBasisPercent(15)
rootChild1.StyleSetMinWidthPercent(20)
root.InsertChild(rootChild1, 1)
flex.CalculateLayout(root, flex.Undefined, flex.Undefined, flex.DirectionLTR)

After CalculateLayout we can see the position of each node e.g.:

fmt.Printf("root left: %f\n", root.LayoutGetLeft()) // 0
fmt.Printf("root top: %f\n", root.LayoutGetTop()) // 0
fmt.Printf("root width: %f\n", root.LayoutGetWidth()) // 200
fmt.Printf("root height: %f\n", root.LayoutGetHeight()) // 200

To see an example for every flexbox property, look at github.com/facebook/yoga/gentest/fixtures. Their names hint at which properties are being used.

Each file there has a corresponding *_test.go file in the github.com/kjk/flex directory which shows how to express it in Go.

Size of root’s parent

Notice that in this particular example we used flex.Undefined as both height and width of the parent container.

Imagine you’re using flex to implement layout for a desktop application where each flex.Node represents a control inside the window.

The window is the parent of the root node.

When the user resizes the window, you pass the width/height of the window to flex.CalculateLayout().

When you create the window initially, you might do the reverse: pass flex.Undefined as width/height of parent container and then use the size of root node as the size of the window, to size it to its content.

Measure function

Imagine that a node represents an OS button. The button has some intrinsic size dictated by its text.

To represent that size, flex allows setting a measuring function with node.SetMeasureFunc(measureFunc MeasureFunc). Its definition is:

type MeasureFunc func(node *Node, width float32, widthMode MeasureMode, height float32, heightMode MeasureMode) Size

The function takes width/height hints, derived from the size of the parent container, and returns the intrinsic size of the node. widthMode and heightMode tell it how to interpret each hint (e.g. whether it’s an exact size or only an upper bound).

This is useful e.g. when a node represents a paragraph of text. When you know the width of the parent container, you can break the text into multiple lines.

If the measuring function needs some state, you can store it in node.Context.

Solo founders with profitable businesses, collected stories

2017-06-23T00:00:00Z

People sometimes wonder: can I have a successful business as a single founder?

The answer is: yes.

This is a collection of solo-preneur success stories (with the occasional 2-person band).

I only include businesses generating significant revenue. In this context that’s around $5k/mo or more, enough to replace a full-time salary.

Maybe it’ll inspire you to start your own solo business.

Before you get too excited, keep the following in mind.

This list is the pinnacle of survivorship bias. A solo-preneur software business is no different than any other business, and most businesses fail. You have a 10-20% chance of success, so pick your idea wisely, work hard and, if you fail, do it again.

At the same time, this is only the tip of the iceberg. These are the stories that people shared, cribbed from a few online sources.

There are 100x more successful solo founders that don’t share their numbers publicly. A silent majority of successful solo businesses.

1. Anonymous making $750k/year with a desktop app, sold via his website

Source: https://news.ycombinator.com/item?id=13168965

Advice:

make a desktop app that you can sell for $50-$300.
attack a large market that has stagnated or has entrenched players with lousy apps, i.e. make a better mousetrap.
Electron is a good technology for writing such an app.

Important factors of success:

SEO
good reputation
big market
staying alive long enough for word of mouth to kick in

2. JollyTurns

Source: https://news.ycombinator.com/item?id=13170798

Web/iOS/Android app for ski resorts.

Income: unknown.

How it makes money: in-app purchase on iOS/Android ($1 per ski resort or $15 for all of them, see https://itunes.apple.com/us/app/jollyturns/id719208522)

First version (iOS only) released in Dec 2013 after 2.5 years of work (https://jollyturns.com/blog/first-public-release).

Insight: code is the easy part, marketing is the hardest.

Tried to find a partner in SV but couldn’t. Wrote code himself, hired people to collect data about ski resorts.

3. CRM plugin that finds location of customer’s offices based on address of the hotel you’re traveling to

Source: https://news.ycombinator.com/item?id=13167930

Sold for $50. “made pretty good money”.

4. Niche app, $200k/year after 6 years

Source: https://news.ycombinator.com/item?id=13170346

Most likely a USB driver that allows using USB devices remotely over a network (https://www.virtualhere.com; https://forum.openwrt.org/search.php?action=show_user_posts&user_id=129043 uses the same user name as the HN post).

No marketing, gets sales via word of mouth and internet searches.

5. Website templates, $100k/year profit

Source: https://news.ycombinator.com/item?id=13935663

Sold on https://themeforest.net/. Working about 4hrs a week.

6. Shopify plugins

Source: https://news.ycombinator.com/item?id=13167990

Makes enough money to hire a full-time developer for onboarding new clients, support and documentation. Does product development himself.

Insight: the trick for coming up with product ideas is to first do custom development. When enough clients are willing to spend a few thousand for a personal implementation of something, that’s when you know you have an opportunity to charge $40 per month for a SaaS version.

7. $20k/mo from 3 niche SaaS products

Source:

In the past also ran a real estate SaaS, making $70k/mo at its height (it then crashed when the real estate market crashed).

Advice:

tip for a newbie would be to look for something niche that solves a pain point or allows your customer to make or save money. For example: if your customer can spend $10/mo on your software and make or save $30/mo from that, you will have no problem getting & keeping customers.
don’t focus on becoming a unicorn. You can make some serious money and build a very comfortable life running a $300k to $1M business, and your chances of succeeding at that are much greater.
Look for things outside of tech. There are so many problems to solve in small businesses. Many will say there is no money to make with small businesses.
Learn everything you can about advertising. Get really good at it and be willing to spend money on advertising.
Be willing to kill something off quickly if it doesn’t make money. Test your market early to make sure people will pay for it. I have made the mistake of not doing this and I have lost a lot of money because of it. Now, I need to be able to see a positive ROI on my spend within 3 to 4 months. So if I spent $100 to acquire a customer, I want to be able to get that back + more within 3 to 4 months. I know this timeline is probably really short for a VC funded company, but I have always been bootstrapped so don’t have the luxury of risking a longer time-frame for return.
it is not easy! Prepare to put in long hours especially in the beginning. Prepare for it to take a mental toll at times. You will second guess yourself, feel insecure, be consumed oftentimes with your business.

8. Ngrok

Source: https://news.ycombinator.com/item?id=13168185

Full time for past 4 years.

Insight: minimize support by improving UX, documentation, error messages.

9. Football betting analysis/predictions website, £75k / year

Source: https://news.ycombinator.com/item?id=11216868

http://betalyst.com/ makes £75,000 per year in advertising/sponsorship/affiliate revenue.

Gets 125k visitors a month.

Works 2-3 hrs a week.

Website traffic generation:

Android app with 75k users
10% of traffic is from organic search
majority of traffic from email (20k subscribers)

10. Watermarking desktop app for Mac/Windows

Source:

Makes $3k-$5k per month after 5 years.

Sells for $30/$60/$140.

11. Real estate startup, $130k/year of profit

Source: https://news.ycombinator.com/item?id=10887978

Profit:

$130k in 2013
$100k in 2014
$140k in 2015

Source of revenues:

30% AdSense
20% users
50% affiliate marketing

Sources of traffic:

50% organic
37% referral
13% direct

No marketing, no blog, no social presence.

12. Dan Grossman, improvely and w3counter, $45k/mo

Source:

https://news.ycombinator.com/item?id=10881301
https://news.ycombinator.com/item?id=9726951 : how he started
https://news.ycombinator.com/item?id=10201718, https://news.ycombinator.com/item?id=8396497 : business tips
https://news.ycombinator.com/item?id=8406049

https://www.w3counter.com/ uses a freemium model: people pay a subscription for advanced features.

https://www.improvely.com/ : SaaS priced $29/$79/$149/$299 / mo.

First customers for improvely came from $100-$200/month AdWords advertising for the first few months and $79/month banner ad on a web stats site bought via BuySellAds.com.

Used the SnapEngage chat widget on the website to talk visitors into signing up.

Word of mouth and referrals started quickly after that.

Now referrals are the biggest signup drivers.

13. VNC application for Mac and iOS

Sells for $30 on Mac and $20 on iOS.

Source:

14. NomadList, $400k/year revenue

Source:

Revenue source: membership fees for community of digital nomads and remote workers.

15. B2B Windows desktop app, $1 million/year sales

Source: https://news.ycombinator.com/item?id=12066104

Wrote a scratch-my-itch app that was a side project for 10 years until it started making $120k/mo. Went full time after that. Is in a very crowded niche.

Insight: coding is easy, marketing is hard. Must be persistent.

16. Pinboard, bookmarking web service, $200k/year

Source:

Revenue history: https://blog.pinboard.in/2016/07/pinboard_turns_seven/

17. s3stat, “equivalent of a nice Senior Developer salary”

Source:

18. StoreSlider, $700k in 2016

Source: https://news.ycombinator.com/item?id=14438303

Makes money with affiliate revenue from eBay.

Built with Lumen on PHP 7.1, Nginx, running on Linode.

Source of traffic: word of mouth, social sharing, Google search.

Did a lot of A/B testing to maximize conversions.

19. BuiltWith.com, estimated $12 million/year

Source:

20. tarsnap, backup service, “better than Google salary”

Source: https://news.ycombinator.com/item?id=14442425

21. Sidekiq, $1 million/year

Source: https://www.indiehackers.com/businesses/sidekiq

Open source library for Ruby, a background job framework.

Sells a pro version for $950/year and an enterprise version. Only needs 800 customers.

22. Balsamiq, $2 million in revenue after 18 months

Source:

It’s no longer a one-person company, but it was created in 2008 by a single person and within 18 months reached $2 million in revenue.

It’s a desktop app for creating mockups.

23. John Gruber, $32k/mo

Source: https://daringfireball.net/feeds/sponsors/

Makes $8k per week for sponsorship (ads) on his very popular, Apple-oriented website.

24. Sales tracking & CRM app for small business.

Source: https://news.ycombinator.com/item?id=14439284

https://www.bottomlinehq.com/, 6-digit revenue.

Freemium model, $30/year. Web and iOS.

25. https://officesnapshots.com/, full-time salary.

Source: https://news.ycombinator.com/item?id=12065574

Started 9 years ago, full-time for last 4 years.

Revenue: AdSense, and later selling his own advertising (ads are priced per month and sold in blocks of 1 to 12 months).

26. pubexchange.com

Source:

Started in 2013, operated solo since then. Profitable since 2014.

In the early days he pitched his service on LinkedIn; now people come from referrals.

27. park.io, $125k/mo

Source:

Started in 2014. In his spare time he wrote a script to auto-buy a domain when it expires and turned that into a paid service by adding registration, payments etc.

Most users find it from either parked domains or word of mouth.

Insight: automate all the things.

28. cronitor, $6k/mo revenue after 3 years

Source:

Started in 2014, written part-time by 2 people. They wrote it because it solved a real problem they had at a startup where they worked.

Marketing tactics:

did a Show HN
answered questions on StackOverflow
added a link from a popular, open-source PHP library they had
created Stackshare page
submitted to startupli.st (site defunct)
submitted to “One Thing Well” website
wrote high-quality docs for SEO (topic-based articles on ‘how to use cronitor to do X’)

Raised prices after 6 months from $7/$20/$50 => $10/$25/$50 and then $24/$70/$150.

29. bugmuncher, $4k/mo revenue

Source:

Web-based bug tracking software.

Started as a side project in 2010, went full time in Nov 2015, reached living wage in Nov 2016.

30. https://info-beamer.com, close to living wage

Source: https://news.ycombinator.com/item?id=13514865

Digital signage for Raspberry Pi. Started as a side project, turned into a profitable business.

31. Anonymous app, $5k/mo profit on $7k/mo revenue

Source: https://news.ycombinator.com/threads?id=gaeappthrowaway

App hosted on App Engine, ~50 users paying between $30/mo and $500/mo. Analytics API.

32. WordPress theme, $5k/mo

Source: https://news.ycombinator.com/item?id=8107836

Sold via ThemeForest.

33. Radio Silence, main income

Source:

Mac app (https://radiosilenceapp.com/) that evolved from side project to providing main income for the author. Sells for $9.

Author was able to quit his job.

Business tip: build a related free app and host it on the same domain.

34. Ryan Clark, 10 Games, $3+ million over 10 years

Source: http://www.gamasutra.com/blogs/RyanClark/20150917/253842/What_Makes_an_Indie_Hit_How_to_Choose_the_Right_Design.php

Working full time on his games since 2004. Wrote 10 games in 11 years; 8 have been profitable, 3 grossed more than $1M.

35. https://PhantomJsCloud.com, ramen profitable for Seattle

Source: https://news.ycombinator.com/item?id=13327835

36. Desktop app, seating planning, $120k+/year

Source:

2 desktop apps for Windows, written in C++.

Over 10 years, sold 40,000 licenses of the first desktop app. The cheapest license is $30, so that’s at least $1.2 million over 10 years, i.e. at least $120k/year.

37. Desktop app in construction industry, making a living

Source: https://news.ycombinator.com/item?id=11659140

21-year-old app for Windows, written in Delphi 5.

38. Video games, making a living

Source:

Multi-platform games written in C#, based on Unity game engine, released on Steam. Makes a game every X months.

39. Pinegrow Web Editor, comfortable living

Source:

Desktop web editor built with NWJS/Electron. Started by a single person, grew to 3 full-time people.

Launched in January 2014 after 2.5 years in development, sold $100k the first year. Sells for $49/$79.

Marketing: the website, plus asking for an e-mail address when starting the trial, to build an e-mail list for sending promotions.

Tried Carbon, Google and Reddit ads but was losing money on them.

40. https://ipinfo.io, full-time job

Source:

Wrote and launched it in a couple of hours in response to a StackOverflow question. Posted it as an answer and forgot about it; it became popular, so he implemented paid plans and started charging for it.

41. http://duetapp.com, $3-4k/mo

Self-hosted, web-based invoicing and project management app.

Source: https://news.ycombinator.com/item?id=8630931

42. https://betterexplained.com, $6k/mo after 10 years

A website with math tutorials. Started in 2006. Content is free. Makes money selling ebooks (on Amazon Kindle and directly from the website), Amazon affiliate links and newsletter sponsorships.

Source:

43. Website with special-interest news, $10-15k/mo

Money from AdSense.

Source: https://news.ycombinator.com/item?id=8630369

44. $5.5k/mo from Udemy course

Source: https://news.ycombinator.com/item?id=8631035

45. Storemapper, $21k/mo

Source:

46. Brendan Dunn, $451k revenue in 2014 from several products

Source:

His income:

https://doubleyourfreelancing.com/rate/, online course, $207k
http://doubleyourfreelancing.com/leads/, online course, $40k
https://planscope.io/, SaaS app, $71k
http://buildaconsultancy.com/, live classes, $44k from 3 live classes
consulting and coaching, $89k from 6 weeks of consulting

47. Cooking blog, $5-6k/mo

He does the design/programming/marketing/monetization work behind http://www.theyummylife.com, his mother does the writing. Revenue from Amazon affiliates, ads and ebook.

Source:

Revenue from AdSense, 8 million monthly uniques, after 6 years.

Source: https://news.ycombinator.com/item?id=4468067

49. https://www.tiki-toki.com/, $5k/mo

Web and desktop app for creating pretty timelines. Makes money from premium accounts and selling desktop app.

Source: https://news.ycombinator.com/item?id=4469672

50. Webapp in education space, $90k/mo

Revenue: AdSense.

Source: https://news.ycombinator.com/item?id=4468535

51. Large web community, $90-110k/mo

Revenue from subscriptions, AdSense, other ad revenue, and license and royalty revenue

Source: https://news.ycombinator.com/item?id=2567487

52. Zencastr, $12k/mo

Zencastr is a web-based tool that helps podcasters record their guests in studio quality.

Source:

53. Workflowy, $800k/year

Jesse Patel learned how to program by building https://workflowy.com/. After 9 months of working on it alone, he asked a friend he knew from college to join him. They got into YC to work on a different idea but pivoted back to Workflowy.

They started charging 2 years after they launched and got enough revenue to pay for living expenses.

They have 100k paying users and $800k/year revenue.

Source:

https://www.indiehackers.com/podcast/037-jesse-patel-of-workflowy

Analyzing browserify bundles to minimize JavaScript bundle size

2017-01-04T00:00:00Z

When building web apps, it’s important to keep the size of JavaScript code delivered to the browser as small as possible.

I write in ES6 or TypeScript then use browserify to combine all JavaScript code into a single bundle file. For production builds I use uglify to make the bundle smaller.

Unfortunately, by default we are blind to what ends up in the final bundle. A single import can introduce surprising, unneeded dependencies.

The first step in fixing bloat is to see what code ends up in the final bundle.

Disc

Disc is one tool that visualizes the contents of a JavaScript bundle.

To use it:

npm install -g disc
add the fullPaths: true option to browserify options (without it, file paths are turned into opaque numbers)
discify dist/bundle.min.js >out.html (or whatever bundle.min.js is called in your setup)
open out.html (on Mac; elsewhere open it manually in the browser)

The visualization is very pretty but not very helpful for understanding what’s in the bundle.

source-map-explorer

source-map-explorer shows the same information but in a more useful way.

To use it:

npm install -g source-map-explorer
make sure your build generates a source map (.js.map) file
source-map-explorer dist/bundle.min.js dist/bundle.min.js.map

This will open the browser for you with the treemap visualization.

Analyzing dependency tree

Disc and source-map-explorer can tell you what but not why.

When you see a JavaScript package that shouldn’t be there, you need to know why it’s there i.e. where it was imported from.

I haven’t found a tool that makes it easy, but it’s possible to create a primitive debug tool yourself.

var browserify = require('browserify');
var through = require('through2');

// browserifyOpts and showDeps come from the rest of the build script
var b = browserify(browserifyOpts);
if (showDeps) {
  // for debugging dump (flattened and inverted) dependency tree
  // b is browserify instance
  b.pipeline.get('deps').push(through.obj(
  function(row, enc, next) {
    // format of row is { id, file, source, entry, deps }
    // deps is {} where key is module name and value is file it comes from
    console.log(row.file || row.id);
    for (let k in row.deps) {
      const v = row.deps[k];
      console.log('  ', k, ':', v);
    }
    // pass the row down the pipeline, otherwise it's dropped from the bundle
    next(null, row);
  }));
}

This displays dependencies in the format:

/quicknotes/node_modules/react-dom/lib/LinkedValueUtils.js
   ./reactProdInvariant : /quicknotes/node_modules/react-dom/lib/reactProdInvariant.js
   ./ReactPropTypesSecret : /quicknotes/node_modules/react-dom/lib/ReactPropTypesSecret.js
   react/lib/React : /quicknotes/node_modules/react/lib/React.js
   fbjs/lib/invariant : /quicknotes/node_modules/fbjs/lib/invariant.js
   fbjs/lib/warning : /quicknotes/node_modules/fbjs/lib/warning.js

It’s not an ideal presentation, but you can figure out who ultimately imports a given JavaScript file by following the chain of imports.
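Chasing the chain by hand gets tedious in a large bundle. Below is a minimal sketch of automating it, assuming you collect the rows from the debug stream into an array instead of printing them; `DepsRow`, `findImportChain` and the sample file paths are hypothetical, made up for illustration. It inverts the `deps` maps (file to the files that import it) and does a breadth-first search from the suspicious file back to an entry file, so the chain it returns is one of the shortest.

```typescript
// Shape of a row from browserify's 'deps' stage, as described in the debug
// snippet: deps maps a module name to the resolved file it comes from.
interface DepsRow {
  id: string;
  file?: string;
  entry?: boolean;
  deps: Record<string, unknown>;
}

// Hypothetical helper: returns one shortest import chain from an entry file
// to `target` (entry first, target last), or null if no entry reaches it.
function findImportChain(rows: DepsRow[], target: string): string[] | null {
  // Invert the tree: file -> files that import it.
  const importedBy = new Map<string, string[]>();
  const entries = new Set<string>();
  for (const row of rows) {
    const from = row.file || row.id;
    if (row.entry) entries.add(from);
    for (const dep of Object.values(row.deps)) {
      if (typeof dep !== "string") continue; // skip anything that isn't a resolved path
      const importers = importedBy.get(dep) ?? [];
      importers.push(from);
      importedBy.set(dep, importers);
    }
  }

  // Breadth-first search from target towards the entry files.
  // prev maps an importer to the file it imports on the path towards target.
  const prev = new Map<string, string | null>([[target, null]]);
  const queue: string[] = [target];
  while (queue.length > 0) {
    const file = queue.shift()!;
    if (entries.has(file)) {
      // Follow prev links from the entry file back down to target.
      const chain: string[] = [];
      let f: string | null = file;
      while (f !== null) {
        chain.push(f);
        f = prev.get(f) ?? null;
      }
      return chain;
    }
    for (const importer of importedBy.get(file) ?? []) {
      if (!prev.has(importer)) {
        prev.set(importer, file);
        queue.push(importer);
      }
    }
  }
  return null;
}

// Made-up rows, loosely modeled on the seedrandom case.
const rows: DepsRow[] = [
  { id: "1", file: "/app/main.js", entry: true, deps: { "./random": "/app/random.js" } },
  { id: "2", file: "/app/random.js", deps: { seedrandom: "/node_modules/seedrandom/index.js" } },
  { id: "3", file: "/node_modules/seedrandom/index.js", deps: { crypto: "/node_modules/crypto-browserify/index.js" } },
  { id: "4", file: "/node_modules/crypto-browserify/index.js", deps: {} },
];

// prints the chain from /app/main.js down to the crypto module
console.log(findImportChain(rows, "/node_modules/crypto-browserify/index.js"));
```

The search runs backwards from the unwanted file rather than forwards from the entry because you usually know what you want to get rid of, not where it came from.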

Things I learned

How does it help in practice? Here are 2 examples of how I reduced JavaScript bundle bloat by using those tools.

bloated highlight.js

In QuickNotes I use highlight.js library to do syntax highlighting for code snippets.

Looking at output of source-map-explorer I noticed that highlight.js is 476 kB in size. That seemed excessive.

The problem was that while the core of highlight.js is small, it supports 168 languages and doing import 'highlight.js' bundles all of them.

I only need to support a small subset of the most popular languages.

One way to fix it would be to use https://highlightjs.org/download/ to generate a custom bundle. That would require repeating this manual step every time I want to upgrade to a newer version.

I settled on a hacky but more automated solution.

Doing import 'highlight.js' loads node_modules/highlight.js/index.js which imports all languages.

I created a custom index.js that only imports the languages I want. Before every compilation, I overwrite node_modules/highlight.js/index.js with my custom version.

That way I can still use npm to manage the library and easily update to new version.

The result? Saved 416 kB.

bloated seedrandom.js

At work we use the tiny seedrandom.js library.

When inspecting our JavaScript bundle I noticed suspicious libraries in it, like an asn1 decoder.

I suspected our code doesn’t do asn1 decoding. Searching the codebase didn’t turn up any direct use.

I speculated that it’s imported indirectly by some other library.

I used my ad-hoc dependency tree dump to figure out that this code is imported from seedrandom.js.

This piece of code is a culprit:

// When in node.js, try using crypto package for autoseeding.
  try {
    nodecrypto = require('crypto');
  } catch (ex) {}

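Browserify resolves require('crypto') at build time and, because crypto is a Node built-in, substitutes its browser implementation (crypto-browserify). As a result this line adds 294 kB of unneeded crypto code to our web app.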

The fix was to fork the repo and remove those lines.

Automating things

It’s handy to be able to re-run this analysis. Here’s a sample script analyze_bundle.sh I have in one of my projects:

#!/usr/bin/env bash
set -u -e -o pipefail

# uses source-map-explorer (https://www.npmjs.com/package/source-map-explorer)
# to visualize what modules end up in final javascript bundle.

install_sme() {
  if [ ! -f ./node_modules/.bin/source-map-explorer ]; then
    npm install source-map-explorer
  fi
}

analyze_prod()
{
  rm -rf s/dist/*
  install_sme

  ./node_modules/.bin/gulp jsprod
  ./node_modules/.bin/source-map-explorer s/dist/bundle.min.js s/dist/bundle.min.js.map
}

analyze_prod

The particulars will depend on your build system. The general idea is to run the build to generate the .js and .js.map files and then run source-map-explorer on them.

Optimizing JavaScript by using arrays instead of objects

2016-10-23T00:00:00Z

The best optimizations are achieved by thinking about a problem holistically.

In this article I describe an optimization that uses arrays instead of objects while providing a class API for accessing data.

Imagine you’re building a web-based note taking application.

It uses modern, single-page architecture. Front-end is written in React and backend provides JSON data to React.

The main view is a list of all notes of a given user. You need a backend API call that returns the list of a user’s notes. You survey how everyone else implements such an API and come up with the following: an /api/getnotes?user_id=<user_id> call which returns a JSON response that looks like:

{
  "notes": [
    {
      "id": 1,
      "title": "first note",
      "createdAt": "2016-08-14 15:34:32Z",
      // ... more properties
    },
    {
      "id": 2,
      "title": "second note",
      "createdAt": "2016-08-14 16:03:12Z"
      // ... more properties
    },
    // ... more notes
  ]
}

You notice there’s a lot of redundancy as we repeat property names in every note object.

In our case the structure of the note is fixed i.e. it always has the same properties. We can encode this data more efficiently:

{
  "notes": [
    [1, "first note", "2016-08-14 15:34:32Z", ... more properties],
    [2, "second note", "2016-08-14 16:03:12Z", ... more properties],
    // ... more notes
  ]
}
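A minimal sketch of the backend-side transformation, assuming a note has only the three fields shown (`id`, `title`, `createdAt`); `NoteObject`, `NoteRow`, `encodeNote` and `decodeNote` are hypothetical names. Because property names no longer travel in the JSON, the field order lives only in code and both sides must agree on it. The comparison at the end measures uncompressed JSON length only.

```typescript
// Hypothetical note shape; the real API returns more properties.
interface NoteObject {
  id: number;
  title: string;
  createdAt: string;
}

// Compact form: fields in a fixed order (id, title, createdAt).
type NoteRow = [number, string, string];

// Backend side: object -> array.
function encodeNote(n: NoteObject): NoteRow {
  return [n.id, n.title, n.createdAt];
}

// Inverse transformation, handy for checking that nothing is lost.
function decodeNote(row: NoteRow): NoteObject {
  const [id, title, createdAt] = row;
  return { id, title, createdAt };
}

const notes: NoteObject[] = [
  { id: 1, title: "first note", createdAt: "2016-08-14 15:34:32Z" },
  { id: 2, title: "second note", createdAt: "2016-08-14 16:03:12Z" },
];

const verbose = JSON.stringify({ notes });
const compact = JSON.stringify({ notes: notes.map(encodeNote) });
// uncompressed sizes of the two encodings, in characters
console.log(verbose.length, compact.length);
```

Every note in the array form drops the repeated keys (`"id":`, `"title":`, `"createdAt":`), which is where the savings come from.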

This is a holistic optimization that achieves several speedups at once:

backend generates less text (JSON response)
backend compresses less data
browser downloads less data
browser decompresses less data
browser parses less JSON text
JavaScript arrays are most likely more memory efficient than objects, so the data uses less memory

If you know how gzip compression works you might protest that our effort to remove property names is mostly futile because gzip is very good at removing such redundancies.

I benchmarked QuickNotes using my own notes and found that even after compression the size difference of two versions is ~50%. This might be a difference between a browser downloading 150 kB of data vs. 300 kB.

This representation comes at the cost of programmer convenience.

With objects we say note.title. With array representation it’s more work:

const noteIdIdx = 0;
const noteTitleIdx = 1;
// ... constants for more properties

const title = note[noteTitleIdx];

This is not great. We can improve this by writing accessor functions:

function getTitle(note) {
  return note[noteTitleIdx];
}

JavaScript is a pliable language and we can get the best of both worlds: the array representation with a class API.

We create a class that derives from Array and extends it with accessor functions.

This example uses TypeScript, because static typing rocks, but it also works in plain JavaScript once the type annotations and casts are removed.

const noteIdIdx = 0;
const noteTitleIdx = 1;
// ... constants for more properties

class Note extends Array<unknown> {
  ID(): number {
    return this[noteIdIdx] as number;
  }

  Title(): string {
    return this[noteTitleIdx] as string;
  }
  // ... more accessor functions
}

// this "upgrades" rawArray object from being Array instance
// to Note instance by patching prototype chain.
// Beware: if rawArray is not an Array instance, bad things will happen
function toNote(rawArray: unknown[]): Note {
  Object.setPrototypeOf(rawArray, Note.prototype);
  return rawArray as Note;
}

// one way to convert a raw array to a Note: upgrade it in place
const rawNote = [1, "first note" /* , ... more properties */];
const note = toNote(rawNote);
const title = note.Title();

// a less efficient but also less hacky way is to construct a new Note
// and copy the elements into it. new Note(rawNote2) would NOT work:
// the Array constructor would create a one-element array holding rawNote2.
const rawNote2 = [2, "second note" /* , ... more properties */];
const note2 = new Note();
note2.push(...rawNote2);
const title2 = note2.Title();

Class Note extends the built-in JavaScript Array, so it's as efficient as an array and inherits all of its functionality.

We add a couple of functions for getting/setting note data for a better API.

The magic happens in toNote function.

rawNote is an instance of Array. We could add our accessor functions directly to Array.prototype, but that would make them available to all Array instances.

By defining a class Note that inherits from Array, Note.prototype inherits all of Array.prototype functions and gets our additional functions.
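A small sketch that makes the prototype chain visible, assuming an ES2015 or later compile target (with an ES5 target, TypeScript's down-leveled classes don't subclass Array correctly). The Title accessor mirrors the one above.

```typescript
class Note extends Array<unknown> {
  Title(): string {
    return this[1] as string;
  }
}

// The prototype chain of a Note instance:
// note -> Note.prototype -> Array.prototype -> Object.prototype
const note = new Note();
note.push(1, "first note"); // push is found on Array.prototype
console.log(Object.getPrototypeOf(note) === Note.prototype); // true
console.log(Object.getPrototypeOf(Note.prototype) === Array.prototype); // true

// Title lives only on Note.prototype; plain arrays don't get it.
console.log(note.Title()); // first note
console.log("Title" in [], Array.isArray(note)); // false true
```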

To convert a raw array to a Note object, we can either construct a new object from the raw array or "upgrade" the object in place with Object.setPrototypeOf(rawNote, Note.prototype).

This is dangerous: if the object being upgraded is not an instance of Array, bad things will happen. It’s not a technique that should be overused.
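One way to take some of the danger out of the in-place upgrade is to check the input first. A sketch with a hypothetical toNoteChecked helper that uses Array.isArray and throws instead of patching a non-array; the parsed list (one valid note, one stray object) is made up for illustration.

```typescript
const noteIdIdx = 0;
const noteTitleIdx = 1;

class Note extends Array<unknown> {
  ID(): number {
    return this[noteIdIdx] as number;
  }
  Title(): string {
    return this[noteTitleIdx] as string;
  }
}

// Hypothetical defensive variant of toNote: refuse to patch anything that
// is not a real array, instead of silently producing a broken object.
function toNoteChecked(raw: unknown): Note {
  if (!Array.isArray(raw)) {
    throw new TypeError("toNoteChecked: expected an array");
  }
  Object.setPrototypeOf(raw, Note.prototype);
  return raw as Note;
}

const parsed: unknown = JSON.parse('[[1,"first note"],{"id":2}]');
for (const item of parsed as unknown[]) {
  try {
    console.log(toNoteChecked(item).Title());
  } catch (e) {
    console.log("skipped:", (e as Error).message);
  }
}
```

The check costs one Array.isArray call per note, which is cheap compared to parsing the JSON in the first place.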

Upgrading the object in place should be more efficient than creating a new object as it avoids an allocation.

On the other hand, according to MDN, changing the prototype of an object is a slow operation and can also make later access to that object slower, so it can go either way.
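If the choice matters for your app, measure it in the engine you target. A rough micro-benchmark sketch; the iteration count and note shape are arbitrary, Date.now() only has millisecond resolution, and JIT effects make micro-benchmarks easy to misread, so treat the numbers as a hint only.

```typescript
class Note extends Array<unknown> {
  Title(): string {
    return this[1] as string;
  }
}

// Strategy 1: patch the prototype of the existing array in place.
function upgrade(raw: unknown[]): Note {
  Object.setPrototypeOf(raw, Note.prototype);
  return raw as Note;
}

// Strategy 2: allocate a new Note and copy the elements into it.
function copy(raw: unknown[]): Note {
  const n = new Note();
  n.push(...raw);
  return n;
}

// Converts `iterations` fresh arrays and calls an accessor on each result.
function bench(name: string, convert: (raw: unknown[]) => Note, iterations: number): number {
  let totalLen = 0; // use the results so the work can't be optimized away
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    const note = convert([i, `note ${i}`, "2016-08-14 15:34:32Z"]);
    totalLen += note.Title().length;
  }
  console.log(`${name}: ${Date.now() - start} ms`);
  return totalLen;
}

bench("setPrototypeOf", upgrade, 200_000);
bench("copy", copy, 200_000);
```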

To summarize:

optimization is often achieved by looking at a problem as a whole
thanks to the flexibility of JavaScript, we can implement a micro-optimization where we represent objects as arrays but still have a convenient, class-like API

Go package for better guid generation

2015-02-12T00:00:00Z

The need to generate a globally unique identifier comes up often.

The way described in RFC 4122 is popular but it can be done better.

I wrote betterguid Go package that does it better.

Unique id generated by this package:

is a 20 character string, safe to include in URLs (no need for escaping)
consists of 8 characters of timestamp (millisecond precision) followed by 12 characters encoding 72 bits (9 bytes) of random data
sorts lexicographically, i.e. in order of creation
72 bits of random data make it extremely unlikely that IDs collide with IDs generated by other clients
is monotonically increasing, even within the same timestamp

You can read a longer description of the algorithm.
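A TypeScript sketch of the scheme described in the list above, for illustration only (betterguid itself is Go; this is not its source). 8 characters encode the millisecond timestamp in a 64-character, ASCII-ordered alphabet and 12 characters carry 72 random bits. When two IDs are generated in the same millisecond, the previous random part is incremented by one, which keeps IDs monotonically increasing. newID and its optional now parameter are hypothetical names, and Math.random is used for brevity even though it is not cryptographically strong.

```typescript
// 64 URL-safe characters in ASCII order, so string order matches numeric order.
const PUSH_CHARS = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

let lastTime = 0;
// 12 base-64 digits (12 * 6 = 72 bits) of randomness from the previous call.
const lastRand: number[] = new Array(12).fill(0);

// `now` is a parameter only to make the sketch testable.
function newID(now: number = Date.now()): string {
  const sameMs = now === lastTime;
  lastTime = now;

  // 8 base-64 digits = 48 bits of millisecond timestamp, most significant first.
  const timeChars: string[] = [];
  let t = now;
  for (let i = 0; i < 8; i++) {
    timeChars.unshift(PUSH_CHARS.charAt(t % 64));
    t = Math.floor(t / 64);
  }

  if (!sameMs) {
    // New millisecond: fresh randomness (not cryptographically strong).
    for (let i = 0; i < 12; i++) {
      lastRand[i] = Math.floor(Math.random() * 64);
    }
  } else {
    // Same millisecond: add 1 to the previous random part, with carry,
    // so the new ID sorts after the previous one.
    let i = 11;
    while (i >= 0 && lastRand[i] === 63) {
      lastRand[i] = 0;
      i--;
    }
    if (i < 0) {
      throw new Error("random part overflowed");
    }
    lastRand[i]++;
  }

  return timeChars.join("") + lastRand.map((d) => PUSH_CHARS.charAt(d)).join("");
}

console.log(newID(), newID());
```

Because the alphabet is in ASCII order, comparing two IDs as strings compares their timestamps first and their random parts second, which is what makes them sortable.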

My implementation is based on this JavaScript code.

Related: comparison of 7 Go libraries for unique id generation.

Tip for per-test verbose logging in Go

2014-12-03T00:00:00Z

One way to narrow down a problem when debugging a test is to add logging with e.g. fmt.Printf().

The problem with this approach is lack of selectivity: imagine you have 100 tests and only 1 test fails. For debugging the issue you only need to see logs when executing that 1 test but you’ll drown in log output from all 100 tests.

My solution: control logging with a global variable verboseLog and toggle it per test.

Something like this:

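package mycode // hypothetical package name

import (
    "fmt"
    "testing"
)

var (
    verboseLog = false
)

func myCodeWithBugs(s string) {
    if verboseLog {
        fmt.Printf("s: %s\n", s)
    }
    // ... rest of the code
}

func TestMyCode(t *testing.T) {
    var tests = []struct {
        // ... test fields
        debug bool
    }{
        { /* ..., */ debug: false}, // without verbose logging
        { /* ..., */ debug: true},  // with verbose logging
    }
    for _, test := range tests {
        verboseLog = test.debug
        // ... run the test case
    }
    // don't leak verbose logging into other tests
    verboseLog = false
}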

SumatraPDF 3.0 released

2014-10-29T00:00:00Z

We, the SumatraPDF developers, have released version 3.0 of Sumatra, a multi-format reader (PDF, epub and mobi ebooks, comic books, etc.) for Windows.

You can download it from the official SumatraPDF website.

The biggest change in this version is the addition of tabs, contributed by Stefan Stefanov.

If you don't like tabs, you can go back to the old UI using the Settings/Options… menu.

We added support for a table of contents and links in the ebook UI.

We added support for PalmDoc ebooks.

Comic books now support CB7 and CBT formats (in addition to CBZ and CBR).

We added support for LZMA and PPMd compression in CBZ comic books.

You can now save Comic Book files as PDF.

We swapped keybindings:

F11 : fullscreen mode (still also Ctrl+Shift+L)
F5 : presentation mode (also Shift+F11, still also Ctrl+L)

We added a document measurement UI. Press ‘m’ to start. Keep pressing ‘m’ to change measurement units.

We added new advanced settings: FullPathInTitle, UseSysColors (no longer exposed through the Options dialog) and UseTabs.

We replaced the non-free UnRAR with a free RAR extraction library. If some CBR files fail to open for you, download unrar.dll from the RAR website and place it alongside SumatraPDF.exe.

We deprecated the browser plugin. We don't remove it if it was installed by a previous version, but both Chrome and Firefox are removing support for plugins, so there's no point in keeping it.

Finally, some of you really didn’t like the yellow background color. You’ve won: it’s now gray.
