The other side of the moon

Ansible: Extracting multiple attributes from a list of dicts

2025-05-14T16:27:00.004-04:00

I've been writing a bunch of Ansible playbooks, and in one case I had to transform a list of dicts by extracting two attributes from each dict to create a new list of dicts. For example, given a list like this:

entities:
  - id: 123
    label: Label 1
    type: foo
    status: enabled
  - id: 234
    label: Label 2
    type: foo
    status: enabled
  - id: 345
    label: Label 3
    type: bar
    status: enabled

I need to transform it into this:

entities:
  - id: 123
    type: foo
  - id: 234
    type: foo
  - id: 345
    type: bar

I found the examples in the Ansible docs to be very limited. In most cases there are no examples that show the use of additional parameters to filters. I definitely couldn't find anything that would let me extract two attributes from a list of dicts. There are examples that extract a single attribute using map(attribute='xxx'), but nothing that extracts more than one, so I had to come up with something of my own.

I ended up with two possible solutions depending on how much flexibility you have in your playbook.

1. Using `loop`

The easier option is to use loop and construct the new list one element at a time. You can do this if you have the option to use a set_fact block separate from where you need to use the variable.

- set_fact:
    entities_transformed: '{{ entities_transformed|d([]) + [{"id": item.id, "type": item.type}] }}'
  loop: '{{ entities }}'

This set_fact block creates a new fact called entities_transformed by repeatedly appending each transformed element to a new list.

2. As a one liner

If you need to write it all as a one liner without a set_fact, then this second approach works for you.

  entities_transformed: '{{ entities
                            | map("dict2items")
                            | map("selectattr", "key", "in", ["id", "type"])
                            | map("items2dict") }}'

This works in multiple steps and I'll explain each with what the output looks like at that stage.

`map("dict2items")`

This transforms the entities list into the following:

  - - key: id
      value: 123
    - key: label
      value: Label 1
    - key: type
      value: foo
    - key: status
      value: enabled
  - - key: id
      value: 234
    - key: label
      value: Label 2
    - key: type
      value: foo
    - key: status
      value: enabled
  - - key: id
      value: 345
    - key: label
      value: Label 3
    - key: type
      value: bar
    - key: status
      value: enabled

`map("selectattr", "key", "in", ["id", "type"])`

This strips down to the required keys:

  - - key: id
      value: 123
    - key: type
      value: foo
  - - key: id
      value: 234
    - key: type
      value: foo
  - - key: id
      value: 345
    - key: type
      value: bar

`map("items2dict")`

This reverses the first step giving us the following:

  - id: 123
    type: foo
  - id: 234
    type: foo
  - id: 345
    type: bar
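
For readers who find it easier to follow outside of Jinja2, here is a rough plain-Python sketch of the same three stages. The helper names dict2items and items2dict are stand-ins that mirror the Ansible filters; they are not Ansible's own implementation, and the variable names step1 and step2 are only there for illustration.

```python
# Plain-Python stand-ins for the three pipeline stages described above.
entities = [
    {"id": 123, "label": "Label 1", "type": "foo", "status": "enabled"},
    {"id": 234, "label": "Label 2", "type": "foo", "status": "enabled"},
    {"id": 345, "label": "Label 3", "type": "bar", "status": "enabled"},
]


def dict2items(d):
    """Turn {"k": v} into [{"key": "k", "value": v}, ...]."""
    return [{"key": k, "value": v} for k, v in d.items()]


def items2dict(items):
    """Reverse of dict2items: rebuild a dict from key/value pairs."""
    return {item["key"]: item["value"] for item in items}


wanted = ["id", "type"]

# Stage 1: every dict becomes a list of key/value pairs.
step1 = [dict2items(entity) for entity in entities]

# Stage 2: keep only the pairs whose key is in the wanted list.
step2 = [[kv for kv in items if kv["key"] in wanted] for items in step1]

# Stage 3: turn each filtered list of pairs back into a dict.
entities_transformed = [items2dict(items) for items in step2]

print(entities_transformed)
```

Each stage maps over the outer list and works on one inner element at a time, which is the same shape as the chained map() calls in the one liner.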

Both options work equally well, but I prefer the second because it avoids creating additional facts and requiring loops in places where I cannot use one.

Fixing a system without enough RAM for a text editor

2025-04-22T15:00:00.002-04:00

Someone on quora asked why people still use editors like emacs and vim when more modern alternatives exist.

There were so many great answers that I didn't need to answer the original question. I couldn't possibly add more to the question of why emacs or vim. Instead, I was reminded of an experience where even emacs/vim weren't options.

Sometime in the mid to late '90s, I visited my sister at university. She was in a biology lab, and they had a single 80386 PC with DOS and Windows 3.1. The computer wouldn't start Windows and they didn't know why and they asked me if I could do anything.

Since I love debugging obscure problems like this, I decided to take a look. It turned out to be a simple case of there not being enough available RAM to start Windows. The box did however have 4MB of RAM, which should have been more than enough to start Windows... except, this was a 386, and for compatibility with older real-mode DOS software, RAM was split into Conventional memory (640KB), System ROM (640K-1M) and Extended memory (everything above 1MB), and this box wasn't configured to use extended memory (if you remember HIMEM.SYS).

To make matters worse, AUTOEXEC.BAT loaded a bunch of programs at startup that used up a bunch of RAM, which meant that I couldn't even start the basic EDIT program to edit AUTOEXEC.BAT or CONFIG.SYS.

My only option at that point was to fall back to the absolute basics.

COPY CON AUTOEXEC.BAT
COPY CON CONFIG.SYS

The equivalent on unix would be redirecting cat into a file (cat > file). COPY CON on MS-DOS stands for COPY what's on the CONSOLE (the keyboard in this case) to the destination file, overwriting it if it exists.

(See What is copy con? for details)

And I had to be really careful with what I typed because typing in the wrong thing would mean the system might not start up, and I didn't have a boot disk on me (remember I was just visiting with no plan of actually fixing a computer), and if I did have a boot disk, none of this would have been necessary.

Anyway, I managed to build a very basic AUTOEXEC.BAT and CONFIG.SYS from memory (though I cannot remember now what I put into them), which allowed me to reboot the machine with enough RAM to start EDIT which allowed me to further edit the files to reboot with enough RAM to start Windows.

What I learnt from this is that no matter how good a system you may have access to, you need to be prepared to use the absolute minimum available tools. On DOS this was COPY CON. On unix over a slow or lossy network, you might actually have to edit a file by sending single lines of sed. In order to be prepared to do this, you need to do this a lot. It turns out that vim and emacs are really just one step above sed (well technically one step above ed which is half a step above sed) although they are extendable to have all the features of Eclipse or Visual Studio if you like, but even without those extensions, they are far more powerful.

Even while working with Eclipse, I'll find that there are times when I need to quit Eclipse, open my files in Vim, run a few commands to do things that would take me ages to do in Eclipse and then return to Eclipse. I need to use Eclipse because that's what our dev team has standardized on, and it makes it easier when screen sharing with other devs.

If you liked this post, there's a far more fun video of how the JPL team debugged and fixed an issue 15 billion miles away on Voyager 1.

On Migrating Character Encodings

2025-03-17T13:31:00.006-04:00

Several discussions I've had with friends and colleagues recently reminded me of an incident we faced several years ago at Yahoo!

Now Yahoo! as a company was made up of many different local offices around the world, each responsible for content in their locale. Since there was a lot of user generated content, this meant users in a particular locale could easily enter content (blog posts, restaurant reviews, etc.) in their local language script.

Everyone was happy!

From about 2005 onwards, the company was looking to unify some of the platforms used around the world. For example, we had something like 4 or 5 different platforms to do ratings and reviews and it didn't make sense to have different architectures, database layouts, BCP setups, and a separate team managing each of these, so we started unifying. Building a common architecture was the easy part. I worked on several of these projects. Getting front end teams to migrate was also not terribly hard. Migrating content though, was tough because each region had content in their own locale and MySQL didn't let you set multiple character encodings on text columns.

So the i18n team started working with teams across Y! to move everything to utf8. The easy part was changing HTTP headers and <meta> tags. Content was a little harder, but doable with iconv(1) since in most cases we knew the source character encoding and the destination was always utf-8. In some cases we had to guess, but it generally worked...

...until at one point we also decided to do it for authentication.

One of the things that was localized was authentication, because it allowed users in, for example, South Korea, to use Hangul characters in their passwords. Usernames were always restricted to just alphanumeric characters and underscores (If I remember correctly).

Passwords are stored, as they should be, salted and hashed, so the character encoding of the database column was always us-ascii, which is compatible with utf-8, so no biggie..., except the input character encoding used by the browser was based on the HTTP headers or META tags of the page, and the transfer encoding was based on the enctype of the login FORM.

Prior to this move, these were all set to a character encoding that made sense locally, so Korea used EUC-KR and China used Big5, and so the hashed passwords used the byte sequences that resulted from treating the input as one of these encodings.

After the move, the user would still type in the same password, but when we converted them to bytes, we used utf-8, which resulted in a different byte sequence than the original encoding, so hashing this new sequence of bytes resulted in a different hash, and users could no longer log in. Well, only users that used non-ASCII characters in their passwords.
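
The effect is easy to reproduce. The sketch below is only an illustration: the salted SHA-256 scheme, the salt, and the sample passwords are all assumptions made for the example, not what Yahoo! actually used.

```python
import hashlib


def hash_password(password: str, salt: bytes, encoding: str) -> str:
    """Hash the password bytes produced by the given character encoding.

    Salted SHA-256 is an assumption for this example only.
    """
    return hashlib.sha256(salt + password.encode(encoding)).hexdigest()


salt = b"example-salt"   # hypothetical salt
password = "비밀번호"     # a password typed in Hangul

# Same characters, different byte sequences, so different hashes.
euc_kr_bytes = password.encode("euc_kr")
utf8_bytes = password.encode("utf-8")
hash_before = hash_password(password, salt, "euc_kr")
hash_after = hash_password(password, salt, "utf-8")
print(len(euc_kr_bytes), len(utf8_bytes), hash_before == hash_after)

# Pure ASCII passwords encode to the same bytes either way, so they still match.
ascii_before = hash_password("hunter2", salt, "euc_kr")
ascii_after = hash_password("hunter2", salt, "utf-8")
print(ascii_before == ascii_after)
```

The stored hash never changed; only the bytes fed into the hash function did, which is why the breakage was limited to non-ASCII passwords.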

I forget what the actual fix was, but there were several options on the table. One was to revert the character encoding changes on the login page and to re-encode all passwords after a successful login. Another was to generate two hashes, one using utf-8 and another using the pre-migration character encoding for the region and to allow a success on either to go through.
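
The second option could look something like the sketch below. This is a guess at the shape of such a fix, not the one that was deployed: the verify function, the salted SHA-256 scheme, and the sample values are all hypothetical.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes, encoding: str) -> str:
    """Salted SHA-256 over the encoded password (an assumption for this example)."""
    return hashlib.sha256(salt + password.encode(encoding)).hexdigest()


def verify(password, salt, stored_hash, legacy_encoding):
    """Return the encoding whose hash matched the stored one, or None."""
    for encoding in ("utf-8", legacy_encoding):
        try:
            candidate = hash_password(password, salt, encoding)
        except UnicodeEncodeError:
            # The password cannot be represented in this encoding at all.
            continue
        if hmac.compare_digest(candidate, stored_hash):
            return encoding
    return None


salt = b"example-salt"                                     # hypothetical salt
stored = hash_password("비밀번호", salt, "euc_kr")          # hash saved before the migration

matched = verify("비밀번호", salt, stored, "euc_kr")
if matched is not None and matched != "utf-8":
    # A successful legacy match is the moment to re-hash the password as utf-8.
    stored = hash_password("비밀번호", salt, "utf-8")

print(matched, verify("비밀번호", salt, stored, "euc_kr"))
```

Re-hashing on a successful legacy match combines both ideas: users are migrated one login at a time, and the legacy branch can eventually be removed.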

When users interact

2025-02-01T10:43:00.009-05:00

When looking at the Core Web Vitals, we often try optimizing each independently of the others, but that's not how users experience the web. A user's web experience is made up of many metrics, and it's important to look at these metrics together for each experience. Real User Measurement (RUM) allows us to do that by collecting operational metrics in conjunction with user actions, the combination of which can tell us whether our pages actually meet the user's expectations.

In this experiment, I decided to look at each of the events in a page's loading cycle, and break that down by when the user tried interacting with the page. For those interactions, I looked at the Interaction to Next Paint, and the rate of rage clicking to get an idea of user experience and whether that experience may have been frustrating or not.

Before I jump into the charts, I should note an important caveat about the data. This analysis was done using RUM data collected by Akamai's mPulse product which collects data at or soon after page load. Not all page views resulted in an interaction before data was collected. Most of the analysis was restricted to page views where we had at least one interaction prior to data collection. We see on average, between 2-25% of beacons collected (across sites) had an interaction. Most sites had a recorded interaction on about 10% of beacons. I also separately looked at data collected during page unload/pagehide and while it captured more interactions, it did not have a noticeable effect on the results.

Each of the following charts is from a different website in mPulse's dataset.

Exploring the chart

Interaction analysis chart for Virtual Globe Trotting

Let's now look at the various features of this chart.

The chart shows multiple dimensions of data projected onto a 2D surface, so some parts of it will appear wonky. We'll walk through that in this section.

Event labels

The first thing we'll describe are the events. These are the vertical colored lines with labels to their right. These represent transition events in the page load cycle. The events we include are:

You may have already noticed that in this particular chart, First Paint is _after_ First Contentful Paint, which is counter-intuitive. The reason we see this is that the number of data points with First Paint on them is different from those with First Contentful Paint. Safari and Firefox, for example, support FCP but not FP. When aggregating these points, the same percentile value when applied to two data sets will likely get you values from two different experiences. This effect is more prominent when the sample sizes are different. In general we would not expect the delta to be too far off, and in the data I've looked at, it hasn't been more than 50ms off.
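
Here is a small, deliberately exaggerated sketch of that effect. The beacon values and field names are made up for illustration and are not mPulse data; the real-world gap mentioned above is far smaller.

```python
from statistics import median

# Hypothetical beacons. "fp" is None where the browser does not report First Paint.
beacons = [
    {"browser": "Chrome", "fp": 900, "fcp": 1000},
    {"browser": "Chrome", "fp": 1100, "fcp": 1200},
    {"browser": "Chrome", "fp": 1300, "fcp": 1400},
    {"browser": "Safari", "fp": None, "fcp": 400},
    {"browser": "Safari", "fp": None, "fcp": 500},
    {"browser": "Firefox", "fp": None, "fcp": 600},
    {"browser": "Firefox", "fp": None, "fcp": 700},
]

# Each metric is aggregated over whichever beacons happen to carry it.
fp_values = [b["fp"] for b in beacons if b["fp"] is not None]
fcp_values = [b["fcp"] for b in beacons]

median_fp = median(fp_values)
median_fcp = median(fcp_values)
print(median_fp, median_fcp)
```

On every individual beacon First Paint comes before First Contentful Paint, yet the median First Paint lands later because it is drawn from a different, slower subset of page views.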

The events to keep an eye on are the Largest Contentful Paint or Time to Visually Ready, the Time to Interactive, and the delta between them. LCP is not currently supported on Safari, so we use boomerang's cross-browser calculation of TTVR in those cases.

Time to Interactive is considered a lab measurement, but `boomerang` measures it in a cross-browser manner during RUM sessions, and passes that data back to mPulse. It is approximately the time when interactions with the page are expected to be smooth due to no more long animation frames and blocking time.

The next thing to note is that these events are positioned on this projection based on when they occurred relative to interactions _as well as_ when they occurred relative to page load time. By definition this means that all interactions should show up after LCP, but they may show up differently on the chart due to the projection from multiple dimensions down to two. There's also the fact that TTVR calculations do not stop at first interaction, so on browsers that do not support LCP, we may see interactions before the proxy for that event.

The absolute value of each event is calculated across the entire dataset, even on pages without interactions, so it might look like events aren't placed where their values dictate they should be. However, the percentage of users interacting before & after an event is always correct.

The last label to take note of is the fraction of users that interacted before `boomerang` considered the page to be interactive. In this case, it's 12% of users.

Data distributions

Interaction analysis chart showing mouseover details.

There are a few different distributions shown on this chart (and even more when we look at the mouseover in the chart above).

The blue area chart is the _population density_. It shows, for every 5% interval of the page load time, how many users first interacted with the page at that point in the page's loading cycle.
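
As a rough idea of how such a series can be built, the sketch below buckets each page view's first interaction by its position relative to page load time. The sample beacons, the bucket_pct helper, and the choice to round down to the bucket start are assumptions for illustration, not mPulse's actual implementation.

```python
from collections import Counter


def bucket_pct(interaction_ms: int, load_ms: int, width: int = 5) -> int:
    """Express the first interaction as a % of page load time, floored to a bucket."""
    pct = 100 * interaction_ms // load_ms
    return (pct // width) * width


# Hypothetical (first_interaction_ms, page_load_ms) pairs, one per page view.
beacons = [(500, 2000), (1900, 2000), (2100, 2000), (1000, 4000), (4200, 4000)]

density = Counter(bucket_pct(interaction, load) for interaction, load in beacons)
for bucket in sorted(density):
    print(f"{bucket:>3}%: {density[bucket]} users")
```

Normalizing by page load time is what lets a fast page and a slow page land in the same bucket, so values above 100% are interactions that happened after onload.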

The blue dots that trace the population density chart show the median _Interaction to Next Paint_ value for all of those interactions. Keep in mind that INP is not supported on Safari, whereas `boomerang`'s own measurements for TTI do work across browsers.

The vertical position of the red dots shows the _probability_ that interactions at that time resulted in _rage clicks_ while the size of the red dots shows the _intensity_ of these rage clicks. Rage clicks are collected across browsers.

The thin orange line shows Frustration Index for users that interacted within that window.

We also have the median Total Blocking Time for each of these interactions, though that's only visible in the live versions of these charts and not in most of the screenshots posted here.

In this second chart, we see that 59% of users interacted with the site before it became interactive. Its TTI is further from the LCP time compared to the first site.

Insights from the data

Interaction analysis chart showing INP increasing around TTI.

When we look at this data across websites, we see the same patterns. Users expect to be able to interact with the site once the page is largely visible, however, the user experience for interactions is sub-optimal until the time to interactive which can be much later in the page's loading cycle.

In most cases we see a high Total Blocking Time in the period between LCP and TTI, resulting in a slow INP, and higher probability of rage clicking.

When looking to optimize a site for user experience, we shouldn't look at each metric in isolation. A really fast LCP is a great first user experience, but it's also a signal to the user that they can proceed with interacting to complete their task. It's important that the rest of the page be ready for those interactions and keep up the good experience.

The elephant in the room

Interaction analysis chart for Akamai.com focussing on the population series.

As an aside, has anyone else noticed that these charts almost always look like a sleeping elephant (or maybe a hat)? I've seen very few sites where this isn't the case, so I looked into that pattern.

The population distribution pattern we see is a gradual curve increasing, then a dip that looks like the elephant's neck, then a bump that could be its ears, a sharp dip and long flat region that could be its trunk.

It could well be a Normal distribution if it weren't for the dip and spike right around PLT.

A basic Normal Distribution curve with a mean of 75 and standard deviation of 30.

The drop-off after OnLoad is expected. `boomerang.js` sends a beacon on or soon after page load (sites can configure a beacon delay of a few seconds to capture post-onload events). This results in a drop-off in data with interactions after onload. The post onload interactions are on pages that are faster than the average.

The strange pattern is the spike in interactions just at or after onload (it's sometimes at 100% and sometimes at 105%). The dip at 95% & 100% shows up on most, but not all sites, but the spike shows up everywhere.

I looked closer at the data around those buckets and there is very little difference in terms of experience. The page load time, LCP time, TTI time, etc. are all very similar at the 25th and 75th percentile (in other words, the experiences are comparable). The only difference is that more users prefer to interact with the site just after the onload event has fired than just before it. It's not a big delay - about 200-400ms on average across sites, but it does look like some portion of users still wait for the loading indicator to complete before they interact.

Conclusions

In conclusion, I think there's a lot to be learned from looking at when your users interact with your site. Which parts of the page have finished loading when that interaction happens? What's still in flight? What do they experience? Is there too much of a delay between your LCP and the site becoming usable?

A good loading experience needs your page to transition from state to state smoothly without too much delay between states. Looking at the loading Frustration Index can identify pages where this isn't the case.

When comparing different events on the page, look at the aggregate of deltas rather than the delta of aggregates.
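
A tiny example of why the order matters. The three beacons below are made up, and the field order (LCP, TTI) is only for illustration.

```python
from statistics import median

# Hypothetical (lcp_ms, tti_ms) pairs, one per page view.
beacons = [(1000, 1200), (2000, 5000), (3000, 3100)]

# Aggregate of deltas: compute the gap per page view, then take the median.
median_of_deltas = median(tti - lcp for lcp, tti in beacons)

# Delta of aggregates: take each median separately, then subtract.
delta_of_medians = median(tti for _, tti in beacons) - median(lcp for lcp, _ in beacons)

print(median_of_deltas, delta_of_medians)
```

The median LCP and the median TTI come from different page views, so subtracting them describes an experience that no user actually had. The per-view delta keeps both measurements tied to the same experience.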

And lastly, keep an eye out for that elephant.

References

Glossary on Mozilla Developer Network

Web Vitals on Google's Web.Dev

Implementations in mPulse

Uploading a file using SFTP through Julia's LibCURL

2022-05-03T11:12:00.006-04:00

The problem

A colleague was recently tasked with using SFTP to upload an automated report to a customer's server. The code that generates the data runs in Julia 1.6, but there were no restrictions on where the upload code had to run.

Unfortunately our command line curl wasn't built with sftp support so we couldn't use that, and our system doesn't have sftp installed, so we couldn't use that either. The question was whether we could use Julia's built-in LibCURL library to do the upload.

tl;dr

Yes, you can, by converting a Julia IOStream to a C pointer to the file that needs uploading.

The details

Julia uses LibCURL which is a simple wrapper around the libcurl C API. This generally means that we have to pass in C pointers for a lot of things. Now Julia can automatically do type conversions for basic types like Strings and Numbers, but more complex structs will need some work to make sure they're in the right format.

The basic libcurl code we need to reproduce is this:

curl = curl_easy_init()

curl_easy_setopt(curl, CURLOPT_URL, "sftp://user:password@server:port/path/to/file.csv") # This would be a NULL terminated string in C, but Julia does that conversion for us
curl_easy_setopt(curl, CURLOPT_UPLOAD, 1)
curl_easy_setopt(curl, CURLOPT_PROTOCOLS, CURLPROTO_SFTP)

curl_easy_setopt(curl, CURLOPT_READDATA, file_ptr)   # We need to pass in a C file pointer here

res = curl_easy_perform(curl)
curl_easy_cleanup(curl)

The complication is that Julia uses libuv to open files, and the return value from Julia's open function is an IOStream. Fortunately, Julia has code in Libc that converts between an IO and a FILE *:

struct FILE
    ptr::Ptr{Cvoid}
end

modestr(s::IO) = modestr(isreadable(s), iswritable(s))
modestr(r::Bool, w::Bool) = r ? (w ? "r+" : "r") : (w ? "w" : throw(ArgumentError("neither readable nor writable")))

function FILE(fd::RawFD, mode)
    FILEp = ccall((@static Sys.iswindows() ? :_fdopen : :fdopen), Ptr{Cvoid}, (Cint, Cstring), fd, mode)
    systemerror("fdopen", FILEp == C_NULL)
    FILE(FILEp)
end

function FILE(s::IO)
    f = FILE(dup(RawFD(fd(s))),modestr(s))
    seek(f, position(s))
    f
end

Base.unsafe_convert(T::Union{Type{Ptr{Cvoid}},Type{Ptr{FILE}}}, f::FILE) = convert(T, f.ptr)

Using this, we can open a file in Julia, and pass it on to curl:

fh = open("file.csv", "r")   # Open the file for reading and get an IOStream
fp = Libc.FILE(fh)           # Convert the IOStream to a FILE*
curl_easy_setopt(curl, CURLOPT_READDATA, fp.ptr)

Note that in the call to CURLOPT_READDATA, we need to pass in the ptr member of the FILE struct, since that's the actual C object.

Complete example

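using LibCURL

curl = curl_easy_init()

curl_easy_setopt(curl, CURLOPT_URL, "sftp://user:password@server:port/path/to/file.csv")
curl_easy_setopt(curl, CURLOPT_UPLOAD, 1)
curl_easy_setopt(curl, CURLOPT_PROTOCOLS, CURLPROTO_SFTP)

fh = open("file.csv", "r")   # Open the file for reading and get an IOStream
fp = Libc.FILE(fh)           # Convert the IOStream to a FILE*
curl_easy_setopt(curl, CURLOPT_READDATA, fp.ptr)

res = curl_easy_perform(curl)   # res is CURLE_OK when the upload succeeds
curl_easy_cleanup(curl)
close(fh)                       # Release the Julia file handle once the upload is done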

The metrics game

2021-08-30T14:07:00.000-04:00

A recent tweet by Punit Sethi about a Wordpress plugin that reduces Largest Contentful Paint (LCP) without actually improving user experience led to a discussion about faking/gaming metrics.

Core Web Vitals

Google recently started using the LCP and other Core Web Vitals (aka CWV) as a signal for ranking search results. Google's goal in using CWV as a ranking signal is to make the web better for end users. The understanding is that these metrics (Input delays, Layout shift, and Contentful paints) reflect the end user experience, so sites with good CWV scores should (in theory) be better for users... reducing wait time, frustration, and annoyance with the web.

If I've learnt anything over the last 20 years of working with the web, it's that getting to the top of a Google search result page (SRP) is a major goal for most site owners, so metrics that affect that ranking tend to be researched a lot. The LCP is no different, and the result often shows up in such "quick fix" plugins that Punit discusses above. Web performance (Page Load Time) was only ever spoken about as a sub-topic in highly technical spaces until Google decided to start using it as a signal for page ranking, and then suddenly everyone wanted to make their sites faster.

My background in performance

I started working with web performance in the mid 2000s at Yahoo!. We had amazing Frontend Engineering experts at Yahoo!, and for the first time, engineering processes on the front-end were as strong as the back-end. In many cases we had to be far more disciplined, because Frontend Engineers do not have the luxury of their code being private and running on pre-selected hardware and software specs.

At the time, Yahoo! had a performance team of one person — Steve "Chief Performance Yahoo" Souders. He'd gotten a small piece of JavaScript to measure front-end performance onto the header of all pages by pretending it was an "Ad", and Ash Patel, who may have been an SVP at the time, started holding teams accountable for their performance.

Denial

Most sites' first reaction was to deny the results, showing scans from Keynote and Gomez, which at the time only synthetically measured load times from the perspective of well connected backbone agents, and were very far off from the numbers that roundtrip was showing.

The Wall of Shame

I wasn't working on any public facing properties, but became interested in Steve's work when he introduced the Wall of Fame/Shame (depending on which way you sorted it). It would periodically show up on the big screen at URLs (the Yahoo! cafeteria). Steve now had a team of 3 or 4, and somehow in late 2007 I managed to get myself transferred into this team.

The Wall of Shame showed a kind of stock-ticker like view where a site's current performance was compared against its performance from a week ago, and one day we saw a couple of sites (I won't mention them) jump from the worst position to the best! We quickly visited the sites and timed things with a stop-watch, but they didn't actually appear much faster. In many instances they might have even been slower. We started looking through the source and saw what was happening.

The sites had discovered AJAX!

Faking it

There was almost nothing loaded on the page before the onload event. The only content was some JavaScript that ran on onload and downloaded the framework and data for the rest of the site. Once loaded, it was a long-lived single page application with far fewer traditional page views.

Site owners argued that it would make the overall experience better, and they weren't intentionally trying to fake things. Unfortunately we had no way to actually measure this, so we added a way for them to call an API when their initial framework had completed loading. That way we'd get some data to trend over time.

At Yahoo! we had the option of speaking to every site builder and to work with them to make things better. Outside though, is a different matter.

Measuring Business Impact

Once we'd started LogNormal (and continuing with mPulse), and were serving multiple customers, it soon became clear that we'd need both business and engineering champions at each customer site. We needed to sell the business case for performance, but also make sure engineering used it for their benefit rather than gaming the metrics. We started correlating business metrics like revenue, conversions, and activity with performance. There is no cheap way to game these metrics because they depend on the behaviour of real users.

Sites that truly cared about performance, and the business impact of that performance, worked hard to make their sites faster.

This changed when Google started using speed as a ranking signal.

With this change, sites now had to serve two users, and when in conflict, Real Users lost out to Googlebot. After all, you can't serve real users if they can't see your site. Switching to CWV does not change the situation because things like Page Load Time, Largest Contentful Paint, and Layout Shift can all be faked or gamed by clever developers.

Ungameable Metrics

This brings us back to the metrics that we've seen couldn't be gamed. Things like time spent on a site, bounce rate, conversions, and revenue, are an indication of actual user behaviour. Users are only motivated by their ability to complete the task they set out to do, and using this as a ranking signal is probably a better idea.

Unfortunately, activity, conversions, and revenue are also fairly private corporate data. Leaking this data can affect stock prices and clue competitors in to how you're doing.

User frustration & CrUX

Now the goal of using these signals is to measure user frustration. Google Chrome periodically sends user interaction measurements back to their servers, collected as part of the Chrome User Experience report (CrUX). This includes things like the actual user experienced LCP, FID, and CLS. In my opinion, it should also include measures like rage clicks, missed clicks, dead clicks, jank while scrolling, CPU busy-ness, battery drain, etc. These are metrics that only come into play while a user is interacting with the site, and that affect or reflect how frustrating the experience may be.

It would also need to have buy-in from a few more browsers. Chrome has huge market share, but doesn't reflect the experience of all users. Data from mPulse shows that across websites, Chrome only makes up, on average, 44% of page loads. Edge and Safari (including mobile) also have a sizeable share. Heck, even IE has a 3% share on sites where it's still supported.

In the chart below, each box shows the distribution of a browser's traffic share across sites. The plot includes (in descending order of number of websites with sizeable traffic for that browser) Chrome, Edge, Mobile Safari, Chrome Mobile, Firefox, Safari, Samsung Internet, Chrome Mobile iOS, Google, IE, and Chrome Mobile WebView.

It's unlikely that other browsers would trust Google with this raw information, so there probably needs to be an independent consortium that collects, anonymizes, and summarizes the data, and makes it available to any search provider.

Using something like the Frustration Index is another way to make it hard to fake ranking metrics without also accidentally making the user experience better.

Comparing these metrics with Googlebot's measures could hint at whether the metrics are being gamed or not, or perhaps it even lowers the weight of Googlebot's measures, restricting it only to pages that haven't received a critical mass of users.

We need to move the balance of ranking power back to the users whose experience matters!

Safely passing secrets to a RUN command in a Dockerfile

2021-08-06T14:28:00.003-04:00

There may be cases where you need to pass secrets to a RUN command in a Dockerfile, and it's very important that these secrets not be leaked into the environment or the image. In particular, these secrets should not be stored in the image (either on disk or in the environment, not even in intermediate layers), and they should not show up when using docker history.

While working on this topic, I found many blog posts that point to pieces that may be used, but nothing that pulls it all together, so I decided to write this post with everything I've found. I'll provide a list of references at the end.

In my case, I needed to temporarily pass a valid odbc.ini file to my Julia code so that I could build a SysImage with the appropriate database query and result parsing functions compiled. I did not want the odbc.ini file available in the image.

Step 0: Make sure you have docker >= 18.09

docker version

You most likely have a new enough version of docker, but in the odd chance that you're running a version older than 18.09, please upgrade. My tests were run on 19.03 and 20.10.

Developing the Dockerfile:

Step 1: Specify the Dockerfile syntax

At the top of your Dockerfile (this has to be the absolute first line), add the following:

# syntax=docker/dockerfile:1

This tells docker build to use the latest 1.x version of the Dockerfile syntax.

There are various docs that specify using 1.2 or 1.0-experimental. These values were valid when the docs were written, but are dated at this point. Specifying version 1 tells docker build to use whatever is latest on the 1.x tree, so you can still use 1.3, 1.4, etc. Specifying 1.2 restricts it to the 1.2.x tree.

Step 2: Mount a secret file where you need it

At the RUN command where you need a secret, --mount it as follows:

RUN --mount=type=secret,id=mysecret,dst=/path/to/secret.key,uid=1000 your-command-here

There are a few things in here, which I'll explain one by one.

type=secret: This tells docker that we're mounting a secret file from the host (as opposed to a directory or something else)
id=mysecret: This is any string you'd like. It has to match the id passed in on the docker build command line
dst=/path/to/secret.key: This is where you'd like the secret file to be accessible. Any file already at this location will be temporarily hidden while the secret file is mounted, so it's safe to use a location that your code will expect at run time.
uid=1000: This is the userid that should own the file. This defaults to 0 (root), so is useful if your command runs as a different user. You can also specify a gid

The full list of supported parameters for secret mounts is available at the buildkit github page

You can add the same --mount at different locations in your Dockerfile, and with different dst and uid values. The file is mounted only for the duration of that RUN command and not persisted to any layers.

Running `docker build`

Step 3: Set the environment variable

This step is optional on newer versions of Docker.

Once you're ready to run docker build, tell docker to use BuildKit

DOCKER_BUILDKIT=1

You can either put this right before running the command, or export it into your shell.

Step 4: Run `docker build` with your secret file

docker build --secret id=mysecret,src=/full/path/to/secret.key .

It's important to note that tilde (~) expansion does not work here. You can use an absolute or relative path, but you cannot use expansion.
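
If you'd like to keep using ~ in your paths, one option is to expand it yourself before calling docker. Here's a minimal sketch in Python. Note that secret_build_args is a hypothetical helper (it is not part of docker), and the secret path is a made-up example. It only builds the argument list, so you can inspect it before running anything.

```python
import os


def secret_build_args(secret_id, secret_path, context="."):
    """Build the argv for `docker build --secret ...` with the path expanded.

    Hypothetical helper for illustration; not part of docker.
    """
    # Expand a leading ~ and make the path absolute ourselves, because
    # the path embedded in "id=...,src=..." is not expanded for us.
    full_path = os.path.abspath(os.path.expanduser(secret_path))
    return ["docker", "build", "--secret", f"id={secret_id},src={full_path}", context]


# Made-up example path; substitute your own secret file.
args = secret_build_args("mysecret", "~/secrets/secret.key")
print(" ".join(args))
```

You could then hand the list to subprocess.run(args, check=True) to run the build.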

That's IT!!!

Jenkins

If you run your docker builds through jenkins, you'll need a few more steps. The bulk of it is documented in this Cloudbees-CI article about injecting secrets.

Once you've gotten your secret file into Jenkins, and bound it to an environment variable in your Build Environment, you have to update the docker build command to use this variable instead.

For example, if we bound the secret file to a variable called MYSECRETFILE, then we'd change our build command to:

docker build --secret id=mysecret,src=${MYSECRETFILE} .

References

These links were very useful in figuring out this solution.

Don’t leak your Docker image’s build secrets by Itamar Turner-Trauring
Slava Semushin on Stackoverflow
Dockerfile documentation about secrets
BuildKit specific syntax for Dockerfiles (these docs are not in the Dockerfile docs)
Cloudbees-CI article about injecting secrets into Jenkins

Recovering from Big Sur upgrade snafu

2021-02-17T00:00:00.003-05:00

Apple recently pushed out a new release of macOS called Big Sur. Unfortunately, the upgrade process is problematic. Specifically, the upgrader does not check for the required disk space before starting the upgrade, and if the target system doesn't have enough disk space (35GB or so), then the upgrade fails partway through, leaving your system in a mostly unusable state.

This is what happened to me.

My environment

The system was a 13" Macbook with a 128GB SSD drive. 128GB is pretty small and doesn't leave space for many large items.
The system had just a single user.
At the start of the upgrade, the system had about 13GB of free disk space (>10%).
Desktop, Documents and Photos were backed up to iCloud, but Downloads weren't, and some very large photos & videos had been removed from iCloud to save space there, so they only existed locally.

Prior discussion

Mr. Macintosh has published a very detailed explanation of the issue and various ways to get around it without any data loss. This is a very good article that got me very far in my investigation. I was lucky that the latest updates had been posted just a few hours before I hit the problem myself.

Unfortunately, none of the suggested fixes worked for me.

I couldn't mount the drive in Target Disk Mode as my password wouldn't work (the password still worked when logging in locally, but that took me back to the upgrade loop).
I couldn't start up the system in Recovery Mode as it wanted a password, but again, wouldn't accept the password (the same password that worked when fully booting up).
I couldn't access the disk when booting from an external startup disk because of the same issue.

Many posts I found online seemed to suggest that a firmware password was required, but I'd never set this up.

Single User Mode

Eventually, what showed the most promise was booting into Single User Mode and then fiddling around with all available disk devices.

Password worked for Single User Mode

To start up in Single User Mode, press and hold Cmd+S while starting up, until the Apple logo shows up.
The system prompts you for a password, and my password did in fact work in this mode.
After signing in, you're dropped into a unix shell.
There's only a basic file system mounted, which contains a limited number of unix commands and none of your data

Mount the Data partition

Once in single user mode, I had to mount my data partition. I first used the mount command to see what was already mounted. It showed that the only mounted device was /dev/disk1s1. I assumed that my Data partition would be /dev/disk1s2 and that it would have the same filesystem, and I chose a convenient mount point:

# mount -t apfs /dev/disk1s2 /System/Volumes/Data

Miraculously, this did not ask me for a password, and mounted my Data partition. I was able to look through the files and identify potential targets to remove. I also noticed that the disk was now completely full (0 bytes free). This was due to the Big Sur installer, which took up 11GB, and then added a few files, using up the entire 13GB that I had available.

Things were getting a little cumbersome here as most of the unix commands I needed to use were not on the primary partition, but on the mounted partition, so I added the appropriate folders to the unix PATH environment variable:

PATH="$PATH":/System/Volumes/Data/usr/bin

I was starting to see that choosing a 3 level deep path as my mount point perhaps wasn't a great idea. I also learned that while the screen is quite wide, the terminal environment is set to show 80 columns of text, and goes into very weird line wrapping issues if you type past that. It's even worse if you try tab completion at this point.

Transferring large files

Some of the large files & folders I identified were downloaded packages that could be removed. Unfortunately this only got me 2GB back. To get enough space back, I'd have to remove some photos and videos that weren't stored on iCloud. I figured I'd copy them over to an SD card and then could delete them.

I popped in the SD Card, and the kernel displayed some debug messages on the terminal. It told me that the card was in /dev/disk4, so I tried mounting that at a random empty directory:

# mount -t exfat /dev/disk4 /System/VM

This did not work!

No SD Cards in Single User Mode

By default, SD Cards are formatted with an EXFAT file system (the kind used by Windows and all digital cameras). Unfortunately, you cannot mount an EXFAT filesystem in Single User Mode as the exfatfs driver isn't compiled into the kernel. It's loaded up as a dynamic module when required. This only works when booting in standard mode with a kernel that allows dynamic loading. Single User Mode does not.

Reformat the SD Card

This was a brand new SD Card, so I decided to reformat it as an Apple file system. I used a different Macbook to do this, but my first attempt didn't work. It isn't sufficient to just format the SD Card. You also need to partition it, and that's where the filesystem is created.

I created a single APFS partition across the entire SD Card and then tried mounting it.

Unfortunately, now it was no longer at /dev/disk4 even though that's what the kernel debug messages said. Looking at /dev/disk* showed me that /dev/disk5s1 was a potential candidate.

# mount -t apfs /dev/disk5s1 /System/VM

Finally, this worked. I was able to copy my files over, and remove them from the Data partition. This freed up about 45GB, which allowed me to continue with the upgrade.

After the upgrade completed, I appear to have 75GB free. I haven't had a chance to check where the space has changed. I also plan to permanently use the SD Card (256GB) as an external hard drive.

Understanding Emotion for Happy Users

2020-11-18T10:44:00.001-05:00

How does your site make your users feel?

Introduction

So you’ve come here for a post about performance, but here I am talking about emotion… what gives? I hope that if you haven’t already, then as this post progresses, you’ll see that performance and emotion are closely intertwined.

While we may be web builders, our goal is to run a business that provides services or products to real people. The website we build is a means of connecting people to that service or product.

The way things are…

The art and science of measuring the effects of signal latency on real users is now about 250 years old. We now call this Real User Measurement, or RUM for short, and it’s come a long way since Steve Souders’ early work at Yahoo.

Browsers now provide us with many APIs to fetch performance metrics that help site owners make sites faster. Concurrently, the Core Web Vitals initiative from Google helps identify metrics that most affect the user experience.

These metrics, while useful operationally, don’t give us a clear picture of the user experience, or why we need to optimise them for our site in particular. They don’t answer the business or human questions of, “Why should we invest in web performance?” (vs., for example, a feature that customers really want), or even more specifically, “What should we work on first?”.

Andy Davies recently published a post about the link between site speed and business outcomes…

Context influences experience,
Experience influences behaviour,
Behaviour influences business outcomes.

All of the metrics we collect and optimise for deal with context, and we spend very little time measuring and optimising the rest of the flow.

Switching Hats

Over the last decade working on boomerang and mPulse, we slowly came to the realisation that we’ve been approaching performance metrics from a developer centric view. We’d been drawing on our experience as developers – users who have browser dev tools shortcuts committed to muscle memory. We were measuring and optimising the metrics that were useful and easy to collect from a developer’s point of view.

Once we switched hats to draw on our experiences as consumers of the web, the metrics that really matter became clearer. We started asking better questions...

What does it mean that performance improved by 100ms?
Are all 100ms the same?
Do all users perceive time the same way?
Is performance all that matters?

In this post, we’ll talk about measuring user experience and its effects on behaviour, what we can infer from that behaviour, and how it affects business outcomes.

Delight & Frustration

In Group Psychology and the Analysis of the Ego, Freud notes that “Frustration occurs when there is an inhibiting condition that interferes with or stops the realization of a goal.”

Users visit our sites to accomplish a goal. Perhaps they’re doing research to act on later, perhaps they want to buy something, perhaps they’re looking to share an article they read a few days ago.

Anything that slows down or prevents the user from accomplishing this goal can cause frustration. On the other hand, making their goal easy to find and achieve can be delightful.

How a user feels when using our site affects whether they’ll come back and “convert” into customers (however you may define convert).

The Link Between Latency & Frustration

In 2013, Tammy Everts and her team at Radware ran a usability lab experiment. The study hooked participants up to EEG devices, and asked them to shop on certain websites. Half the users had an artificial delay added to their browsing experience and neither group was made aware of the performance changes. They all believed they were testing the usability of the sites. The study showed that...

A 500ms connection speed delay resulted in up to a 26% increase in peak frustration and up to an 8% decrease in engagement.

Similarly in 2015, Ericsson ConsumerLab neuro research studied the effects of delayed web pages on mobile users and found that “Delayed web pages caused a 38% rise in mobile users' heart rates — equivalent to the anxiety of watching a horror movie alone.”

This may not be everyone’s cup of tea, and the real implication is that users make a conscious or unconscious decision on whether to stick around, return, or leave the site.

Cognitive Bias

Various cognitive biases affect how individual experiences affect perception and behaviour. Understanding these biases, and intervening when an experience tends negative can improve the overall experience.

Perceptual Dissonance

Also known as Sensory Dissonance, Perceptual Dissonance results from unexpected outcomes of common actions.

The brain’s predictive coding is what helps you do things like “figure out if a car coming down the road is going slow enough for you to cross safely”. A perceptive violation of this coding is useful in that it helps us learn new things, but if that violation breaks long standing “truths”, or if violations are inconsistent, it makes learning impossible, and leads to psychological stress, and frustration.

On the web, users expect websites to behave in a certain way. Links should be clickable, sites should in general scroll vertically, etc. Things like jank while scrolling, nothing happening when a user clicks a link (dead clicks), or a click target moving as the user attempts to click on it (layout shift) cause perceptual dissonance and frustration.

If these bad experiences are consistent, then users come to expect them. Our data shows that users from geographies where the internet is slower than average tend to be more patient with web page loads.

Survivorship Bias

We only measure users who can reach our site. For some users, a very slow experience is better than an unreachable site.

In 2012, after YouTube made their site lighter, Chris Zacharias found that aggregate performance had gotten worse. On delving into the data, they found that new users who were previously unable to access the site were now coming in at the long tail. The site appeared slower in aggregate, but the number of users who could use it had gone up.

Negativity Bias

Users are more likely to remember and talk to their friends about their bad experiences with a site than they are about the good ones. We need only run a twitter search for “$BRAND_NAME slow” to see complaints about bad experiences.

Bad experiences are also perceived to be far more intense than equivalent good experiences. To end up with a neutral overall experience, bad experiences need to be balanced with more intense good experiences. A single bad experience over the course of the session makes it harder to result in overall delight.

Active Listening

Research shows that practicing Active Listening can have a big impact on countering Negativity Bias. Simply acknowledging when you’ve screwed up and didn’t meet the user’s expectations can alleviate negative perception. If we detect, via JavaScript, that the page is taking too long to transition between loading states, we could perhaps display a message that acknowledges and apologizes for things going slower than expected.

Hey, we realise that it’s taking a little longer than expected to get to what you want. You deserve better. We’re sorry and hope you’ll stick around a bit.

Users will be more forgiving if their pain is acknowledged.

Measuring Emotion

There are many ways we could measure the emotional state of users using our site. These range from active engagement to completely creepy. Naturally not all of these will be applicable for websites...

Use affective computing (facial analysis, EEGs, pulse tracking, etc.)
Ask the user via a survey popover
Business outcomes of behaviour
Behavioural analysis

Affective Computing

For website owners, affective computing isn’t really in play. Things like eye tracking, wireless brain interfaces, and other affective computing methodologies are too intrusive. They work well in a lab environment where users consent to this kind of tracking and can be hooked up to measurement devices. This is both inconvenient, and creepy to run on the web.

Ask the user

Asking the user can be effective as shown by a recent study from Wikipedia. The study used a very simple Yes/No/No Comment style dialog with randomized order. They found that users’ perceived quality of experience is inversely proportional to median load time. A temporary 4% improvement to page load time resulted in an equally temporary 1% increase in satisfied users.

This method requires active engagement by the user and suffers from selection bias and the Hawthorne effect.

It’s hard to quantify what kinds of experiences would reduce the effects of selection bias and result in users choosing to answer the survey, or how you’d want to design the popover to increase self-selection.

The Hawthorne effect, on the other hand, suggests that individuals change the way they react to stimuli if they know they’re being measured or observed.

Business Outcomes

Measuring business outcomes is necessary but it can be hard to identify what context resulted in an outcome. One needs to first understand the intermediate steps of experience and behaviour. Did a user bounce because the experience was bad, or did they just drop in to do some research and will return later to complete a purchase?

Behavioural analysis

Applying the results of lab based research to users actively using a website can help tie experience to behaviour. We first need to introduce some new terms that we’ll define in the paragraphs that follow.

Rage Clicks, Wild Mouse, Scrandom, and Backtracking are behavioural signals we can use. In conjunction with when in a page’s life cycle users typically expect different events to take place, they can paint a picture of user expectations and behaviour.

Correlating these metrics with contextual metrics like Core Web Vitals on one hand, and business outcomes on the other can help us tell a more complete story of which performance metrics we should care about and why.

Rage, Frustration & Confusion

To measure Rage, Frustration & Confusion, we look at Rage Clicks, Wild Mouse and Backtracking.

Rage Clicks

Rage Clicks occur when users rapid-fire click on your site. It is the digital equivalent of cursing to release frustration. We’ve probably all caught ourselves rage clicking at some point. Click once, nothing happens, click again, still nothing, and then on and on. This could be a result of interaction delays, or of users expecting something to be clickable when it isn't.

Rage clicks can be measured easily and non-intrusively, and are easy to analyse.
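
As a rough illustration, rage clicks could be picked out of a list of click timestamps like this. The thresholds (at least 3 clicks, each within 500ms of the previous one) and the function name are assumptions made for this sketch, not values taken from any particular tool.

```python
def find_rage_clicks(click_times_ms, min_clicks=3, max_gap_ms=500):
    """Return (first_click, last_click, count) for each burst of rapid clicks.

    A burst is a run of clicks where every click follows the previous one
    by at most max_gap_ms. Both thresholds are assumed values.
    """
    bursts = []
    run = []
    for t in sorted(click_times_ms):
        # A long pause ends the current run of clicks.
        if run and t - run[-1] > max_gap_ms:
            if len(run) >= min_clicks:
                bursts.append((run[0], run[-1], len(run)))
            run = []
        run.append(t)
    # Don't forget the run that was still open at the end of the list.
    if len(run) >= min_clicks:
        bursts.append((run[0], run[-1], len(run)))
    return bursts


# Made-up timestamps: four quick clicks, then a few isolated ones.
clicks = [1000, 1200, 1350, 1500, 5000, 9000, 9100]
print(find_rage_clicks(clicks))  # [(1000, 1500, 4)]
```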

Fullstory has some great resources around Rage Clicks.

Wild Mouse

Research shows that people who are angry are more likely to use the mouse in a jerky and sudden, but surprisingly slow fashion.

People who feel frustrated, confused or sad are less precise in their mouse movements and move it at different speeds.

There are several expected mouse movements while a user traverses a website. Horizontal and vertical reading patterns are expected and suggest that the user is engaged in your content.

On the other hand, random patterns, or jumping between options in a form can suggest confusion, doubt, and frustration. See Churruca, 2011 for the full study.

The JavaScript library Dawdle.js can help classify these mouse patterns.

Scrandom

Scrandom is the act of randomly scrolling the page up and down with no particular scroll target. This can indicate that a user is unsure of the content, that the page is too long, or that the user is waiting for something to happen and making sure that the page is still responsive without accidentally clicking anything.
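
One simple proxy for this kind of scrolling is to count how often the scroll direction flips. The sketch below assumes we already have a list of sampled vertical scroll positions. The function name and the sample data are made up for illustration, and what counts as "too many" reversals is left open.

```python
def count_scroll_reversals(scroll_positions):
    """Count how many times scrolling switches between down and up."""
    reversals = 0
    last_direction = 0  # 0 means no movement seen yet
    for prev, cur in zip(scroll_positions, scroll_positions[1:]):
        delta = cur - prev
        if delta == 0:
            continue  # no movement between these two samples
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


# Made-up samples of the vertical scroll offset in pixels.
steady = [0, 200, 400, 600, 800]
erratic = [0, 300, 100, 400, 50, 500, 200]
print(count_scroll_reversals(steady))   # 0
print(count_scroll_reversals(erratic))  # 5
```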

Backtracking

Backtracking is the process of hitting the back button on the web. Users who are confused or lost on your site may hit the back button often to get back to a safe space. This behaviour may manifest itself in different ways, but can often be identified with very long sessions that appear to loop.

Tie this into the Page Load Timeline

In his post on Web Page Usability, Addy Osmani states that loading a page is a progressive journey with four key moments to it: Is it happening? Is it useful? Is it usable? and Is it delightful? And he includes this handy graphic to explain it:

The first three are fairly objective. With only minor differences between browsers, it’s straightforward to pull this information out of standard APIs, and possibly supplement it with custom APIs like User Timing.

We’ve found that over 65% of users expect a site to be usable after elements have started becoming visible but before it is actually Interactive. Contrast that with 30% who will wait until after the onload event has fired.

Correlating Rage with Loading Events

Comparing the points in time when users rage click with the loading timeline above, we see some patterns.

The horizontal axis on this chart is time as a relative percent of the full page load time. -50 indicates half of the page load time while +50 is 1.5x the page load time. The vertical axis indicates intensity of rage while point radius indicates probability of rage clicks at that time point. The coloured bars indicate 25th to 75th percentile ranges for the particular timer relative to full page load with the line going through indicating the median.
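
To make the relative scale concrete, here is a small sketch of how an event time could be converted to that axis. The function name and the sample timings are assumptions for illustration.

```python
def relative_percent(event_time_ms, page_load_ms):
    """Position of an event relative to full page load, as a percentage.

    0 is the moment the page finished loading, negative values are
    before it, positive values are after it.
    """
    return (event_time_ms - page_load_ms) / page_load_ms * 100


# Made-up example: a page that takes 4 seconds to load.
print(relative_percent(2000, 4000))  # -50.0 (half of the page load time)
print(relative_percent(6000, 4000))  # 50.0 (1.5x the page load time)
```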

We see a large amount of rage between content becoming visible and the page becoming interactive. Users expect to be able to interact with the page soon after content becomes visible, and if that expectation isn’t met, it results in rage clicking.

We also see a small stream of rage clicks after the page has completed loading, caused by interaction delays.

There’s a small gap just before the onload event fires. The onload event is when many JavaScript event handlers run, which in turn result in Long Tasks, and increased Interaction Delays. What we’re seeing here is not the absence of any interaction, but survivorship bias where the interactions that happen at that time aren’t captured until later.

The horizontal axis on this chart is relative time along the page load timeline. We looked at various combinations of absolute and relative time across multiple timers, and it was clear that relativity is a stronger model, which brings us to a new metric based on relative timers...

Frustration Index

The frustration index, developed by Tim Vereecke, is a measure based on the relation between loading phases. We’ve seen that once one event occurs, users expect the next to happen within a certain amount of time. If we miss that expectation, the user's perception is that something is stopping or inhibiting their ability to complete their task, resulting in frustration.

The Frustration Index encapsulates that relationship. The formula we use is constantly under development as research brings new things to light, but it’s helpful to visit the website to understand exactly how it works and see some examples.

So how do we know that this is a good metric to study?

Correlating Rage & Frustration

It turns out that there is a strong correlation (ρ=0.91) between the intensity of rage (vertical axis) that a user expresses and the calculated frustration index (horizontal axis) of the page.

Rather than looking at individual timers for optimization, it is better to consider all timers in cohesion. Improving one of them changes the user’s expectation of when other events should happen and missing that expectation results in frustration.

Beyond this, the formula is something we can apply client-side to determine if we’re meeting expectations, and practice active listening if we’re not.

Correlating Frustration & Business Outcomes

Looking at the correlation between Frustration Index and certain business metrics also shows a pattern.

Bounce Rate is proportional to the frustration index with a sharp incline around what we call the LD50 point (for this particular site). ρ_b=0.65
Average Time spent on the site goes down as frustration increases, again sharply at first and then tapering off. ρ_t=-0.49

LD₅₀

The LD₅₀, or Median Lethal Dose is a term borrowed from the biological sciences. Buddy Brewer first applied the term to web performance in 2012, and we’ve been using it ever since.

In biology, it’s the dosage of a toxin that kills off 50% of the sample, be it tumour cells, or mice.

On the web, we think of it more in terms of when 50% of users decide not to move on in their journey. We could apply it to bounce rate, or retention rate, or any other rate that’s important to your site, and the “dose”, may be a timer value, or frustration index, or anything else. Depending on the range of the metric in question, we may also use a percentile other than the median, for example, LD₂₅ or LD₇₅.
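
One possible way to read an LD value off measured data is to bucket sessions by "dose" and find the first bucket where the rate crosses the chosen fraction. This is only a sketch of that idea. The function name is hypothetical and the numbers are invented for illustration, not measured data.

```python
def lethal_dose(rate_by_dose, fraction=0.5):
    """Return the smallest dose whose rate reaches `fraction`, or None.

    rate_by_dose is a list of (dose, rate) pairs, where rate is for
    example the bounce rate (0.0 to 1.0) observed at that dose.
    """
    for dose, rate in sorted(rate_by_dose):
        if rate >= fraction:
            return dose
    return None


# Invented data: bounce rate per frustration index bucket.
bounce_rate_by_frustration = [
    (0, 0.18), (10, 0.22), (20, 0.31), (30, 0.52), (40, 0.61), (50, 0.70),
]
print(lethal_dose(bounce_rate_by_frustration))        # 30 (LD50)
print(lethal_dose(bounce_rate_by_frustration, 0.25))  # 20 (LD25)
print(lethal_dose(bounce_rate_by_frustration, 0.75))  # None (never reached)
```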

This isn’t a single magic number that works for all websites. It isn’t even a single number that works for all pages on a site or for all users. Different pages and sites have different levels of importance to a user, and a user’s emotional state, or even the state of their device (eg: low battery), when they visit your site can affect how patient they are.

Patience is also a Cultural Thing

People from different parts of the world have a different threshold for frustration.

Many of our customers have international audiences and they have separate sites customized for each locale. We find that users from different global regions have different expectations of how fast a site should be.

In this chart, looking at 5 high GDP countries (that we have data for), we see a wide distribution in LD₂₅ value across them, ranging from a value of 10 for Germany to the 40s for Australia and Canada. It’s not shown in this chart, but the difference is even wider when we look at LD₅₀, with Germany at 14 and Canada at 100.

So how fast should our site be?

We’ve heard a lot about how our site’s performance affects the user experience, and consequently how people feel when using our site. We’ve seen how the “feel” of a site can affect the business, but what does all of that tell us about how to build our sites?

How fast should we be to reduce frustration?
What should we be considering in our performance budgets?
How do we leave our users feeling happy?

I think these may be secondary questions…

A better question to start with, is:

Will adding a new feature delight or frustrate the user?

Acknowledgements

Thanks to Andy Davies, Nic Jansma, Paul Calvano, Tim Vereecke, and Cliff Crocker for feedback on an earlier draft of this post.

Thanks also to the innumerable practitioners whose research I've built upon to get here including Addy Osmani, Andy Davies, Gilles Dubuc, Lara Hogan, Nicole Sullivan, Silvana Churruca, Simon Hearne, Tammy Everts, Tim Kadlec, Tim Vereecke, the folks from Fullstory, and many others that I'm sure I've missed.

Implementing Spearman's Rank Correlation in SQL

2019-10-07T12:00:00.000-04:00

In my last post, I showed how to implement Pearson's Correlation as an SQL Window function with window frame support. In this post, I'll follow up with implementing Spearman's Rank correlation co-efficient in SQL.

While Pearson's correlation looks for linear relationships between two vectors (i.e., you wouldn't use it for exponential relationships), Spearman's rank correlation looks for monotonicity, or in plain English, do the two values go up & down together?

So here's the really cool part. Spearman's Rank correlation co-efficient is the Pearson's correlation co-efficient of the ranks of the two vectors. We already know how to calculate Pearson's correlation co-efficient, so what we need to do here is first calculate ranks of our vectors.
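
To see this outside of SQL, here's a small Python sketch that ranks two lists the way RANK() does and then runs a plain Pearson correlation on the ranks. The function names and the sample data are made up for illustration. One caveat: RANK() gives tied values the same (lowest) rank, whereas textbook Spearman usually averages the ranks of ties, so results can differ slightly when the data has many ties.

```python
from math import sqrt


def sql_rank(values):
    """Mimic RANK() OVER (ORDER BY value ASC): ties share the lowest rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


def pearson(xs, ys):
    """Pearson's correlation coefficient, straight from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(sql_rank(xs), sql_rank(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic, but not linear
print(pearson(x, y))     # less than 1, the relationship isn't linear
print(spearman(x, y))    # 1.0, the two always go up together
```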

We can do this using the SQL RANK function, which also works as a window function with window frame support:

RANK() OVER (PARTITION BY <partition cols> ORDER BY x ASC) as R_X,

RANK() OVER (PARTITION BY <partition cols> ORDER BY y ASC) as R_Y,

The two important things to note here are that RANK() does not take a parameter, instead you specify what you want to rank on in the ORDER BY clause, and secondly, make sure both parameters are ordered in the same direction, ASC or DESC.

Now even though the RANK() function supports window frames, you don't want to use them here. This is because if you're using sliding windows, each row will have a different rank depending on the window, and we won't be able to correlate an outer window.

Once we have the ranks in an inner query, we can run either the standard CORR function, or the windowed CORR that we developed in the previous post on these derived columns instead:

SELECT CORR(R_X, R_Y) FROM (
    SELECT
        RANK() OVER (PARTITION BY <partition cols> ORDER BY x ASC) as R_X,

        RANK() OVER (PARTITION BY <partition cols> ORDER BY y ASC) as R_Y
      FROM ...
)

If implementing this as a window function, then use R_X and R_Y as the inputs to the SUM() functions with an additional nested query.

I hope this was helpful, leave a comment or tweet @bluesmoon if you'd like to chat.

Implementing Pearson's CORR as a database window function

2019-10-02T13:56:00.000-04:00

I recently needed to find the Pearson's Correlation Coefficient between two columns in a Snowflake database. Now Snowflake and PostgreSQL both support a CORR aggregate function that correlates along a GROUP BY. Snowflake additionally supports CORR as a window function, but only for use with PARTITION BY. It does not support window frames. Other databases like MySQL do not have a CORR function at all. See SQL in a nutshell for more information on database support.

If you want to know about window functions, Julia Evans (@b0rk) has a great SQL tip on them.

In my case, I needed to find the Pearson's coefficient along a sliding window of rows, i.e., for a list of N rows, I needed N-k+1 coefficients, one for each sliding window of size k. As far as I could tell, only Oracle supports this functionality, and I wasn't that desperate, so I set about figuring out how to implement it myself.

Fortunately, Pearson's correlation coefficient is calculated using a very simple algebraic function. The full details are on the Wikipedia page linked above, but it's helpful to break it down into manageable pieces.

At a high level, the coefficient ρ (the greek letter rho) is defined as the covariance of the vectors divided by the product of the standard deviations of the two vectors, or mathematically:

ρ(x,y) = cov(x, y) / (σ(x) * σ(y))

In SQL, this would be

COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))

This simplifies things a bit since STDDEV_POP does support window frames, but COVAR_POP does not. That reduces our problem to implementing COVAR_POP as a window function.

This is much simpler, because the covariance uses sum and count, both of which are implemented as window functions with window frame support:

COVAR_POP = (SUM(x * y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*), or mathematically:

cov(x, y) = (Σ(x * y) - Σx * Σy / N) / N

But it gets even better. Since we're calculating these SUMs and COUNTs anyway, why not use them to implement STDDEV as well? A simplified formula for STDDEV uses the sum of squares and the square of the sum as follows:

σ = SQRT(N * Σx^2 - (Σx)^2) / N.

Combining the formulae above, we get:

ρ(x,y) = cov(x, y) / (σ(x) * σ(y))

       = ( (Σ(x * y) - Σx * Σy / N) / N ) / (     (SQRT(N * Σx^2 - (Σx)^2) / N) * (SQRT(N * Σy^2 - (Σy)^2) / N) )

       =     (Σ(x * y) - Σx * Σy / N)      / ( N * (SQRT(N * Σx^2 - (Σx)^2) / N) * (SQRT(N * Σy^2 - (Σy)^2) / N) )

       =     (Σ(x * y) - Σx * Σy / N)      / (     SQRT(N * Σx^2 - (Σx)^2)       * SQRT(N * Σy^2 - (Σy)^2) / N )

       = N * (Σ(x * y) - Σx * Σy / N)      / (     SQRT(N * Σx^2 - (Σx)^2)       * SQRT(N * Σy^2 - (Σy)^2) )

       =     (N * Σ(x * y) - Σx * Σy)      / (     SQRT(N * Σx^2 - (Σx)^2)       * SQRT(N * Σy^2 - (Σy)^2) )

I've left the product of the two square roots in the denominator as-is rather than simplifying it further because the simplification could result in numeric overflow.

So, we now have a function for Pearson's correlation coefficient using only SUM & COUNT, both of which support window functions and window frames.
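To convince myself the algebra holds, here's a rough sketch in Python (not part of the SQL solution) that computes ρ from the sums and the count alone and compares it with the textbook definition. The function names and the sample data are made up for illustration, and returning 0.0 when a variance is zero is a convention I've assumed, not something the math dictates.

```python
import math

def pearson_from_sums(xs, ys):
    """Pearson's rho using only N, sum(x), sum(y), sum(x*x), sum(y*y), sum(x*y)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum2_x = sum(x * x for x in xs)
    sum2_y = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sum2_x - sum_x * sum_x   # N * Σx^2 - (Σx)^2
    var_y = n * sum2_y - sum_y * sum_y   # N * Σy^2 - (Σy)^2
    if var_x <= 0 or var_y <= 0:
        return 0.0  # assumed convention: no spread means no correlation
    return (n * sum_xy - sum_x * sum_y) / (math.sqrt(var_x) * math.sqrt(var_y))

def pearson_by_definition(xs, ys):
    """cov(x, y) / (sigma(x) * sigma(y)), using population statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
rho_sums = pearson_from_sums(xs, ys)
rho_def = pearson_by_definition(xs, ys)
print(rho_sums, rho_def)
```

Both functions agree on this data to within floating point error, which is all the derivation promises.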

For each of these, we can now SELECT something like this in an inner query:

COUNT( * )   OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS N,

SUM(x)       OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS SUM_X,

SUM(x * x)   OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS SUM2_X,

SUM(y)       OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS SUM_Y,

SUM(y * y)   OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS SUM2_Y,

SUM(x * y)   OVER (PARTITION BY <partition cols> ORDER BY <minute> ROWS BETWEEN $k2 PRECEDING AND $k2 FOLLOWING) AS SUM_XY,

$k2 is half the sliding window size, so if you wanted a window of roughly 60 elements, k2 would be 30 (the frame then holds at most 61 rows: 30 before, the current row, and 30 after). The ROWS BETWEEN r1 PRECEDING AND r2 FOLLOWING syntax specifies a window of rows extending at most r1 rows before the current row, and at most r2 rows beyond the current row. We follow that with an outer query that SELECTs this:

x,
y,
CASE
    WHEN N * SUM2_X > SUM_X * SUM_X AND N * SUM2_Y > SUM_Y * SUM_Y
    THEN (N * SUM_XY - SUM_X * SUM_Y) / (SQRT(N * SUM2_X - SUM_X * SUM_X) * SQRT(N * SUM2_Y - SUM_Y * SUM_Y))
    ELSE 0.0
END AS corr_yx

And there we have it... Pearson's correlation coefficient implemented as a window function with window frame support.

With the above combination, we can get a Pearson's correlation for each (x, y) tuple in the table that correlates a sliding window of data. In my case this was a timeseries database, so for each minute of data, I get a correlation coefficient of (at most) 61 minutes around that minute (at most 30 before and at most 30 after).
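If you want to sanity check the SQL output outside the database, the same sliding window can be sketched in a few lines of Python. This is only an illustration of the frame semantics (at most k2 rows before and at most k2 rows after the current row), with made-up function names; it recomputes the sums for every window, which is simple but not how a database would do it.

```python
import math

def window_corr(xs, ys):
    """Correlation of one window, from sums and count only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum2_x = sum(x * x for x in xs)
    sum2_y = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sum2_x - sum_x * sum_x
    var_y = n * sum2_y - sum_y * sum_y
    if var_x > 0 and var_y > 0:
        return (n * sum_xy - sum_x * sum_y) / (math.sqrt(var_x) * math.sqrt(var_y))
    return 0.0  # same fallback as the ELSE branch of the CASE expression

def sliding_corr(xs, ys, k2):
    """One correlation per row, over rows i-k2 .. i+k2 clipped to the data."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - k2)
        hi = min(len(xs), i + k2 + 1)
        out.append(window_corr(xs[lo:hi], ys[lo:hi]))
    return out

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]        # perfectly correlated with xs
result = sliding_corr(xs, ys, 1)  # windows of at most 3 rows
print(result)
```

The windows at the two ends are shorter than the rest, which is the "at most" part of the frame definition.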

We can still use the CORR aggregate function on the entire list of (x, y) tuples, or post calculate that in a language like Julia.

For the next installment, maybe I'll write up how to do Spearman's Rank Correlation Coefficient.

Set your password to "password" - A tale of datacenter security

2017-04-13T17:23:00.001-04:00

This is an old story. Old enough that the guilty parties have either learnt from their mistakes or have moved on to other things.

It was 1999. I was an intern at an ERP/CRM company called Thirdware Solutions Pvt. Ltd., or TSPL for short. If you'd asked me a year earlier, I wouldn't have heard of this company, and I wouldn't have imagined ever working in this space. However, the people who interviewed us made a convincing argument that as interns we'd be able to work on fringe technologies that they couldn't spare the rest of their full time engineers for. The web was one of these. It was very new to Indian businesses, and the dot com bubble was at its peak. I'd been dabbling with HTML, JavaScript, and CSS for about 3 years and was interested in learning more about the underlying protocols. So I joined them.

I largely consider this a good decision. After the initial training on what the company did and how it did it, I was given some freedom to explore. In about a week I'd reformatted my Windows box and installed RedHat Linux 5.2. Reading through all the HOWTOs that I could find, I managed to set up daemons for DNS, DHCP, SMTP, HTTP and POP3.

At the time the company had a single Pentium box that would connect to the internet over dialup. They used a commercial HTTP & Email proxy on this box that only allowed 3 people to connect at a time, so people in the office developed an honour system where each would connect, send & receive email, and then disconnect as soon as possible. If anyone was expecting large customer requests, they'd let the rest of the office know and people would stay off the network. I took it upon myself to "fix" this. With the blessing of our network administrator and the company CEO, I wrote a little Java app that proxied ports 25 and 110. I couldn't figure out HTTP proxying yet, but POP3 & SMTP were ok. I just had to give everyone a new "proxy email address" that they'd use when connecting to the proxy and that would be translated to the actual address when going out to the server.

We left the web throttled at 3 users since no one in the office needed to access the web, and email was the most critical use of the internet.

This worked out quite well, so the leadership team started to trust me a fair bit. I do not think that any other company would have trusted an intern with the kinds of decisions they let me make following that, but it leads directly into how we avoided a fairly bad security situation.

A few months later we decided to make our ERP/CRM system available over the web. A full rewrite would take over a year, but we found something called Citrix App Server, that ran on Windows NT and would make any desktop application available to someone over the web taking care of basic authentication. We tested it out locally and it worked well on our LAN, so we now had to make it available to our customers. Except, this wasn't happening over a 56K dialup network that only allowed 3 users through at a time.

We ended up speaking to the top ISPs in India at the time, and got a great deal from one of them to put our Windows NT box inside their datacenter, on their always on network.

A few weeks later we locked the hard drive. No, this is not a security thing. This is a process of moving the drive's arm to the outermost "locked" position, so that significant vibrations would not result in the head hitting and damaging the disk platters. We did this because the next step was me and our network admin sitting in the backseat of the company president's car with our Windows NT Pentium 6 Tower PC resting across our laps while our company president drove us down the length of Mumbai trying to avoid as many potholes as possible.

We made it to the other end and when I powered the host back up and unlocked the drive (automatically on boot up), it still ran, so we were happy. We went into an unmarked building, carried the box to a floor with security guards outside the door, and a keypad entry. Inside, there were closets of blades and a few minitowers and tower hosts sitting on the bottom shelf of a rack. We were told to put our box next to the others, and then the guy who ran the datacenter said the magic words.

Start up your server, and set the Administrator password to "password"

I glanced over at the other boxes, and they all had stickers on them saying "Administrator/password"

The three of us from TSPL looked at each other, and our president told me to decide. I asked the datacenter guy why he needed that. He said that sometimes they need to shut down the boxes so they can move them to a different power strip. I asked him if it would be sufficient to give him an account that only had local access and could only reboot the box. He thought about it for a bit and said yes.

So I created a new account that required a physically attached keyboard for login, and all it had was the ability to reboot the box. Our app was set up to start up automatically on boot, so we weren't worried about someone having to start it. DC guy physically locked the box to a rack, showed us that he was keeping the key, and we headed back to the office.

We now needed to test our setup, so we asked everyone in the office to let us use the internet connection. We tried accessing our app, and it worked!

Since I had Admin access to our box, I was also able to open the "Network Neighbourhood" of our box in the datacenter. On that network, I saw all the other hosts that were in the datacenter. They had names identifying them from India's largest IT companies. These were companies I'd initially thought of interning at.

I looked at our president and grinned, and he looked back and said, "Send me a safe summary report when you're done" and walked off to his office.

I double clicked on one of the other big boxes and was prompted for a username and password to connect to it.

You can probably guess what happened next ;)

Velocity Santa Clara 2015 -- My List

2015-05-27T13:11:00.000-04:00

At Velocity SC 2015, these are the talks that I'd really like to see if only I could be in more than one room at a time.

Wednesday, May 27

09:00 Service Workers by Pat Meenan

Exciting new technology currently available in Chrome

11:00 Building performant SPAs by Chris Love

Since we're doing a lot of SPA work for boomerang at the moment, I'm very interested in performance best practices for SPAs

13:30 Metrics Metrics Everywhere by Tammy & Cliff

Tammy and Cliff are my colleagues at SOASTA, and this talk is based on a lot of the data that I've been working to collect over the last few years. I'm torn between this and the next one also at the same time.

13:30 Self-healing systems by Todd & Matt

Todd & Matt are also colleagues at SOASTA, and this talk is about the infrastructure we've developed to collect the metrics that are covered in the talk that Tammy & Cliff are doing. I really wish I could be at both.

15:30 Linux Perf Tools by Brendan Gregg

Always interested in tools to analyse Linux performance.

Thursday, May 28

I haven't listed the keynotes here because that's the only track at the time and I don't need to choose which room to be in.

13:45 LinkedIn's use of RUM by Ritesh Maheshwari

LinkedIn uses a modified version of boomerang, and I'm keen to know what they've done.

13:45 Stream processing and anomaly detection by Arun Kejriwal

A topic that I'm very interested in.

13:45 Design & Performance by Steve Souders

Steve's talks are always educational

14:40 Visualising Performance Data by Mark Zeman

Again, this is something I'm working on at the moment, so very interested.

14:40 Failure is an Option by Ian Malpass

Etsy's devops talks are always educational.

16:10 Crafting performance alerting tools by Allison McKnight

I'm very interested in crafting alerts from RUM data.

Friday, May 29

09:00 RUM at MSN by Paul Roy

14:25 Missing Bandwidth by Bill Green

14:25 Winning Arguments with Performance Data by Buddy Brewer

17:05 All talks at this time slot

This last slot is unfortunate. Every talk at this slot is interesting and by good speakers.

IE throws an "Invalid Calling Object" Exception for certain iframes

2015-01-08T13:59:00.000-05:00

On a site that uses boomerang, I found a particular JavaScript error happen very often:

TypeError: Invalid calling object

This only happens on Internet Explorer, primarily IE 11, but I've seen it on versions as old as 9. I searched through Stack Overflow for the cause of this error, and while many of the cases sounded like they could be my problem, further investigation showed that my case didn't match any of them. The code in particular that threw the exception was collecting resource timing information for all resources on the page. Part of the algorithm involves drilling into iframes on the page, and this error showed up on one particular iframe. There are a few things to note:

   ("performance" in frame) === true;

   frame.hasOwnProperty("performance") === false;

The latter is not a surprise since hasOwnProperty("performance") is not supported for window objects on IE (I've seen this before when investigating JSLint problems.) There was no problem accessing frame.document, but accessing frame.performance threw an exception.

    frame.performance;    // <-- throws "TypeError: Invalid calling object" with error code -2147418113

    frame["performance"]; // <-- throws "TypeError: Invalid calling object" with error code -2147418113

In fact, frame.<anything except document> would throw the same exception. So I looked at the iframe's document object some more, and found this:

    frame.document.location.pathname === "/xxx/yyy/123/4323.pdf";

The frame was pointing to a PDF document, and while IE was creating a reference to hold the performance object of this document, it prevented any attempts to access this reference. I tested Chrome and Firefox, and they both create and populate a frame.performance object for PDF documents.

jslint's suggestion will break your site: Unexpected 'in'...

2014-08-22T00:45:00.001-04:00

I use jslint to validate my JavaScript before it goes out to production. The tool is somewhat useful, but you really have to spend some time ignoring all the false errors it flags. In some cases you can take its suggestions, while in others you can ignore them with no ill effects.

In this particular case, I came across an error, where, if you follow the suggestions, your site will break.

My code looks like this:

   if (!("performance" in window) || !window.performance) {
      return null;
   }

jslint complains saying:

Unexpected 'in'. Compare with undefined, or use the hasOwnProperty method instead.

This is very bad advice for the following reasons:

Comparing with undefined will throw an exception on Firefox 31 if used inside an anonymous iframe.
Using hasOwnProperty will cause a false negative on IE 10 because window.hasOwnProperty("performance") is false even though IE supports the performance timing object.

So the only course of action is to use the in operator for this case.

Don't guess at TimeZones in JavaScript

2013-08-08T04:01:00.000-04:00

I spent quite some time a couple of months ago working on timezone support for mPulse and thought I should document the insanity, but never quite got around to it. Then there was this post on Hacker News about reading a user's timezone in JavaScript and using that to display the right time. That post brought back a flood of horrific memories, prompting me to put my thoughts down.

First, while Trevor's post has a good hack to display a time in the user's current timezone, that hack works in only one case -- displaying the current time to the user in their device's timezone.

If you've worked with timezones and front end development for a while, this is probably the first hack you'll think up. It turns out that in most cases, this is insufficient.

We'll first look at the problems with this approach, and then look at the requirements for proper timezone support.

Problems

The user's device timezone is not always correct. Some users fix their device timezone to their home time even when they're travelling, however the information you need to display may be pertinent for the location where they are right now.
On the other hand, the user may have their device set to automatically update timezone, but they actually want to see times in their home time (because, for example, that's when they call home, or have their calendar app configured).
The timezone offset (which is what you actually get from JavaScript), only tells you the offset from UTC for "right now". This information is irrelevant if you need to display a time that is not now, because daylight saving rules may come into effect.
You cannot use a lookup table for offset to timezone, because there isn't a one-to-one mapping between offset and timezone. It's a many-to-many mapping, and it changes.
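The last two points are easy to demonstrate. Here's a small sketch in Python (chosen only because it ships with the timezone database bindings; the problem is the same in JavaScript) showing that one timezone has different offsets at different times of the year, and that two different timezones can share an offset on one date and not on another. The helper name is made up, and the code assumes the IANA timezone data is installed on the machine running it.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

def utc_offset_hours(zone_name, when_utc):
    """Offset from UTC, in hours, of zone_name at the given instant."""
    local = when_utc.astimezone(ZoneInfo(zone_name))
    return local.utcoffset().total_seconds() / 3600

january = datetime(2013, 1, 15, 12, 0, tzinfo=timezone.utc)
july = datetime(2013, 7, 15, 12, 0, tzinfo=timezone.utc)

# Same timezone, different offsets depending on the date.
la_jan = utc_offset_hours("America/Los_Angeles", january)
la_jul = utc_offset_hours("America/Los_Angeles", july)

# Different timezones, same offset in July but not in January,
# because Phoenix does not observe daylight saving time.
phx_jan = utc_offset_hours("America/Phoenix", january)
phx_jul = utc_offset_hours("America/Phoenix", july)

print(la_jan, la_jul, phx_jan, phx_jul)
```

An offset of UTC-7 read in July cannot tell you whether the user is in Los Angeles or Phoenix, and it says nothing about what their offset will be in January.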

Second attempt

A second attempt might be to figure out the timezone name by parsing the JavaScript Date.toString() output. This was my second attempt when writing the strftime function for the YUI Library.

I did this study in 2008, and it turns out that browsers are pretty inconsistent with respect to Date.toString() output.

Requirements

Ok, before going into this, read this post on Stack Overflow about daylight saving time and timezones.

So, what we need is the ability to do the following:

store any date or range of dates.
display a date in any timezone that makes sense for the user, and/or the event(s) being displayed, and/or the environment.
display date ranges that may cross a timezone boundary.
display a historic date in a historic timezone that may have changed due to political decisions.

The first requirement should be pretty straightforward. We'd like to store dates, and the best way is really a unix timestamp or an ISO8601 date. I prefer the latter because it takes into account leap seconds as well (unix timestamps are leap second agnostic [1],[2]). I also always use Zulu time for an ISO8601 date.

This is not sufficient, however. We also need to store the timezone name of the event. This is so that we can display historic events in the timezone they originally occurred in, even if the definition of that timezone changes. The timezone name should come from the Olson Database (also known as the tz or IANA time zone database).

With these two pieces of information (event date/time & event timezone name), we can render the date in several ways... the original event date/time, the event date/time relative to the user's current timezone, etc.
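As a rough illustration of storing those two pieces and rendering them in more than one way, here's a sketch in Python. The record layout, the field names and the render helper are all hypothetical, and it assumes the IANA timezone data is available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

def render(iso_zulu, zone_name):
    """Render a stored Zulu-time ISO8601 string in the named timezone."""
    instant = datetime.strptime(iso_zulu, "%Y-%m-%dT%H:%M:%SZ")
    instant = instant.replace(tzinfo=timezone.utc)
    return instant.astimezone(ZoneInfo(zone_name)).strftime("%Y-%m-%d %H:%M %Z")

# Hypothetical stored event: the instant in Zulu time plus the event's zone name.
event = {"when": "2013-08-08T08:01:00Z", "zone": "America/New_York"}

in_event_zone = render(event["when"], event["zone"])  # where it happened
in_user_zone = render(event["when"], "Asia/Kolkata")  # where the reader is

print(in_event_zone)
print(in_user_zone)
```

The stored record never changes; only the zone name passed at display time does.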

We also need to handle date ranges. This could be something like your spring vacation, that just happened to cross several timezones because you left San Francisco on March 8th, flew to the UK, stayed there until April 7th, and then flew back. Your flight departure from SFO is in Pacific Standard Time and your arrival at LHR is in Greenwich Mean Time. Your departure from LHR is in British Summer Time, and your arrival at SFO is in Pacific Daylight Time.

What's most important is that you display these dates in their specific timezones regardless of where the user actually is.

So should we guess or ask the user what they want?

By all means guess at what the user's timezone might be. Use a combination of GeoIP + JavaScript timezone offset to figure out where they might be (note that both of these could be wrong), but give them the option to specify the timezone that they care about.

Also, when displaying event dates, use a date local to the event, but use JavaScript to allow the user an easy way to flip it to their local timezone if they like. That's progressive enhancement.

What else shouldn't we do?

Don't try and guess the user's language or preferred currency from their current location. Always ask and store their response.


Reducing checkboxes

2013-03-18T16:48:00.002-04:00

Alex Limi has an excellent post on the overuse of checkboxes in Firefox's preferences screen. It reminded me of something Nat mentioned during his talk with Miguel de Icaza back at Linux Bangalore 2003 about Gnome. They mentioned several UI idioms including checkboxes and disabled menu items, but the gist of it was, every time you give the user a decision to make, you're making their lives harder. As the domain expert for this product, it's your job to pick sane defaults and not bother the user with these choices.

We took this to heart on the Ayttm project. At the time ayttm probably had over 200 user modifiable configuration options in the preferences screen, and each plugin could add its own. It was way past the point of violating one of our primary design requirements, that it should be easy enough for Colin's mum to use. We had a bit of a dilemma though. While our target audience was definitely non technical, we had a significant number of geeky early adopters who really wanted the ability to modify everything.

Over the next few days we stripped out almost every configurable option from the Preferences screen, however, we left them all in the config file on disk. Any user that really wanted to modify the options could edit the config file in their favourite text editor and make the changes themselves. This made everyone happier. Our technical users were happy that they didn't have to click through too many screens to change all their options, and our non technical users had a preferences screen where the most they'd have to do was enter their account information, and the type of smileys they wanted.

The Gnome Human Interface Guidelines cover a lot about designing intuitive interfaces, so go read that.

Nexus 4 — First impressions

2013-02-01T12:41:00.000-05:00

Just got a Nexus 4. Started it up, and tried to sign in to my Gmail account. I have two-factor auth set up for extra security, and you'd expect a Google device to work well with Google auth, but here's a list of bugs I've found within minutes of opening the package.

It first asks you to sign in to your gmail account. Type in your username and password.
It then tells you you need to sign in on the web to continue, so it opens the browser for you to sign in.
Focus is on the username field, but there's no keyboard available. You need to tap on the username field (which already has focus) to bring up the keyboard. This is counter-intuitive.
Type in your username and password and submit. It then takes you to the 2FA page (at least in my case) where you enter your security code.
Again, tap on the field to bring up the keyboard. This field is a numeric field, but the keyboard starts out in alphabetic mode. This is probably a bug with all mobile devices, but you'd think that something this new would have fixed it.
Type in the code, and now try to click on the checkbox that says "Remember this device". Except the keyboard goes away and you now end up clicking a link that explains what 2FA is.
Ok, cool, I just want to go back and hit submit... except there's no back button on this browser. There's no toolbar that normally pops up at the bottom of the browser. No, this is a special browser that does not allow you to navigate through the browser history. FAIL.
The only way forward is to shut down the phone and then start it up again. Except at this point you start from scratch, including selecting your language.

A few other things I've noticed...

The timezone by default appears to be UTC. You'd think that it'd localise this based on my location, which it knows and is configured to use.
If a transient alert message pops up, and you try to tap on it, you'll actually tap on the item below the message.
The icons are a bit too small for someone with normal sized fingers like me. It's easy to tap on the wrong item.
The position of the power button and volume control buttons means that when you press the power button with your thumb, your forefinger will inadvertently hit the volume control (or vice-versa). This happens because of Newton's third law of motion. Google/LG engineers should know this since it's a 300 year old basic law of motion.
You cannot move a widget from one screen to another by dragging it. You have to remove it from the old screen and then go through the process of adding and configuring it again for the new screen.
When you select punctuation on the keyboard, entering an apostrophe should switch back to alphabetic mode. It doesn't.
It just reboots at times.

Will post more as I use it.

A correlation between load time and usage

2012-12-12T02:03:00.000-05:00

We frequently see reports of website usage going up as load time goes down, or vice-versa. It seems logical. Users use a site much more if it's fast, and less if it's slow. However, consider the converse too. Is it possible that a site merely appears to be faster because users are using it more, and therefore have more of it cached? I've seen sites where the server-side cache-hit ratio is much higher when usage is high resulting in lower latency. At this point I haven't seen any data that can convince me one way or the other. Have you?

Startup Lessons: What should you optimise?

2012-10-25T17:48:00.000-04:00

In the early days, when you can't afford hardware, optimise for efficient code.

Sometimes this results in unreadable code or a language that not too many developers are familiar with. This is okay. You're trying to reduce the cost of hardware.

When you get large enough to hire other developers, the cost of hardware is no longer your largest expense. At this point, optimise for code readability.

This might mean writing slightly less efficient code or moving to a more popular language. That's okay. Developer efficiency is more important at this time.

Analyzing Performance Data with Statistics

2012-08-13T10:54:00.001-04:00

At LogNormal, we’re all about collecting and making sense of real user performance data. We collect over a billion data points a month, and there’s a lot you can tell about the web and your users if you know what to look at in your data. In this post, I’d like to go over some of the statistical methods we use to make sense of this data.

You’d use these methods if you wanted to build your own Real User Measurement (RUM) tool.

The entire topic is much larger than I can cover in a single post, so go through the references if you’re interested in more information.

Data and Distribution

The data we’re looking at is the time between the user initiating an action (like a page load), and that action completing. In most cases, this is the time from a user requesting a page and that page’s onload event firing.

For a site with few page views, the probability distribution looks something like this:

And for a site with a large number of page views, the probability distribution looks like this:

The x-axis is page load time while the y-axis is the number of data points that fell into a particular bucket. The actual number of data points that fall into a bucket is not important, it’s the relative value that is.

We notice that as the number of data points increases, the distribution becomes more striking until you’re able to map well known probability distribution functions (PDFs) to it. In this case, the load time looks a lot like a Log-normal distribution, i.e., a distribution where the y-axis versus the logarithm of the x-axis is Normal (or Gaussian) in nature.

You can read more about Log-normal distributions at Wikipedia and on Wolfram Alpha.

Now while these tend to be the most common, they aren’t the only distribution. We also sometimes see a bimodal distribution like this:

This often shows up when two different distributions are added together.

Central tendency

Now regardless of the distribution, we need a measure of central tendency to tell us what the “Average” user experience is. Note that I’ve included the term “Average” in quotes because it’s often misinterpreted. Most people use the term average to refer to the arithmetic mean. Statistically though, the Average could refer to any single number that summarises a dataset, and there are many that we can pick from.

The most common are the arithmetic mean, the median and the mode. There’s also the geometric mean, the harmonic mean and a few others that we won’t cover.

I’ll briefly go over each of these terms.

The arithmetic mean

The arithmetic mean is simply the sum of all readings divided by the number of readings. In the rest of this document, we’ll refer to the arithmetic mean as amean or μ.

The best thing about the amean is that it’s really easy to calculate. Even with a very large number of data points, you only need to hold on to the sum of all points and the number of points. In terms of memory, this should require two integers.

The biggest drawback of the amean is that it is very susceptible to outliers. For example, the arithmetic mean of the following three sets is the same:

Set 1: 1, 1, 1, 1, 1, 1, 1, 1, 1, 91
Set 2: 6, 7, 8, 9, 10, 11, 12, 13, 14
Set 3: 2, 3, 3, 3, 4, 16, 17, 17, 17, 18
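A quick check in Python, keeping only a running sum and a count the way a streaming implementation would. The function name is just for illustration.

```python
def amean(values):
    """Arithmetic mean from a running sum and a count."""
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count

set1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]
set2 = [6, 7, 8, 9, 10, 11, 12, 13, 14]
set3 = [2, 3, 3, 3, 4, 16, 17, 17, 17, 18]

means = [amean(set1), amean(set2), amean(set3)]
print(means)  # → [10.0, 10.0, 10.0]
```

Three very different shapes, one identical amean.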

The median

Unlike the amean, the median actually does show up in the data set (more or less). The median is the middle data point (also called the 50^th percentile) after sorting all data points in ascending order. In Set 2 above, the median is 10. Set 1 is slightly different because there are an even number of data points, and consequently, two middle points. In this case, we take the amean of the two middle points, and we end up with (1+1)/2 == 1.

The cool thing about medians is that they aren’t susceptible to outliers. The point with value 91 in Set 1 doesn’t affect the median at all. It reacts more to where the bulk of the data is.

What makes medians hard to calculate is that you need to hold a sorted list of all data points in memory at once. This is fine if your dataset is small, but as you start growing beyond a few thousand data points, it gets fairly memory intensive. (Ever notice how most databases do not have a MEDIAN function, but most spreadsheet applications do?)

Medians are also not a great measure of central tendency when you have a bimodal distribution like in Set 3. In this case we have two separate clusters of data, but the median ends up being (4+16)/2 == 10, which isn’t even in the dataset.
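Here's a minimal sketch of the median in Python, applied to the three sets. It sorts a full copy of the data, which is exactly the memory cost described above.

```python
def median(values):
    """Middle point of the sorted data, or the amean of the two middle points."""
    ordered = sorted(values)  # needs every data point in memory at once
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

set1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]
set2 = [6, 7, 8, 9, 10, 11, 12, 13, 14]
set3 = [2, 3, 3, 3, 4, 16, 17, 17, 17, 18]

m1, m2, m3 = median(set1), median(set2), median(set3)
print(m1, m2, m3)  # → 1.0 10 10.0
```

The outlier in Set 1 has no effect, while the result for Set 3 lands in the empty gap between its two clusters.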

The mode

We’ve mentioned bimodal distributions. The term bimodal comes from the fact that the distribution has two modes. Which brings up the question, “What is a mode?”. The French term la mode means fashion (the related phrase à la mode means in style). In terms of a data distribution, the mode is the most popular term in the dataset.

Looking at our three datasets above, the modes are 1, n/a and (3, 17).

In the first set, the most popular (or frequent) value is 1. In set 3, it’s a tie between 3 and 17. In set 2, each term shows up only once, so there’s no “most popular” term.

When looking at large sets of real data though, we’ll approximate a bit: we may call a distribution multi-modal (or bimodal) if it has more than one term that’s far more popular than the others, even if all modes do not have equal popularity.

Looking back at our third distribution, we see that the two peaks aren’t of the same height, but they’re close enough, and each is a local maximum so we consider the distribution bimodal.

Finding a single mode involves finding the most popular data point in a data set. This isn’t too hard, and only requires a frequency table, which is less memory intensive than storing every data point. Finding multiple modes involves walking the distribution to find local maxima.
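The frequency table approach for exact modes could look something like this in Python. Treating a set where every value appears once as having no mode is the convention used for Set 2 above; finding approximate peaks in a real distribution needs more than this.

```python
from collections import Counter

def modes(values):
    """Most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)  # the frequency table
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every term shows up once, so there is no most popular term
    return sorted(v for v, c in counts.items() if c == top)

set1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]
set2 = [6, 7, 8, 9, 10, 11, 12, 13, 14]
set3 = [2, 3, 3, 3, 4, 16, 17, 17, 17, 18]

print(modes(set1), modes(set2), modes(set3))  # → [1] [] [3, 17]
```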

The geometric mean

Just like the arithmetic mean, the geometric mean involves counters. Unlike the arithmetic mean, it uses multiplication and roots rather than addition and division. We’ll use the terms gmean or μ_g to refer to the geometric mean.

The geometric mean of a set of N numbers is the N^th root of the product of all N numbers. Writing this in code, it would look like:

pow(x1 * x2 * ... * xN, 1/N)

The problem we’re most likely to encounter when calculating a geometric mean is numeric overflow. If you multiply too many numbers, your product is going to overflow sooner or later. Luckily, there’s more than one way to do things mathematically, and we can convert multiplication and roots into summation and division using logs and exponents. So the above expression turns into:

exp((log(x1) + log(x2) + ... + log(xN))/N)

So when would one use the geometric mean?

It turns out that the amean is great for Normal distributions and similarly, the gmean is great for Log-normal distributions.
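Here's a small Python sketch of both forms of the geometric mean, with a deliberately extreme input to show the overflow. The function names and the test values are made up; with floating point numbers the product quietly becomes infinity rather than raising an error.

```python
import math

def naive_gmean(values):
    """N-th root of the product; the product can overflow."""
    product = 1.0
    for v in values:
        product *= v
    return product ** (1.0 / len(values))

def gmean(values):
    """Same quantity via logs: exp of the arithmetic mean of log(x)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

small = [1.0, 10.0, 100.0]
huge = [1e200, 1e200, 1e200]

print(naive_gmean(small), gmean(small))  # both close to 10
print(naive_gmean(huge))                 # inf: the product overflowed
print(gmean(huge))                       # close to 1e200
```

The log form only needs a running sum of log(x) and a count, so it streams just as easily as the amean.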

Spread

So we’ve looked at central tendency a bit, but that only tells us what our data is centered around. None of the central tendency numbers tell us how closely we’re centered around that value. We also need a means to tell us the spread of the data.

When dealing with means (arithmetic or geometric), we can use the variance, standard deviation or standard error. The method for calculating these numbers is mostly the same for arithmetic and geometric means, except that we’ll use exponents and logs for the geometric spread.

For the median, spread is determined by looking at other percentile values, quartiles and inter quartile filtering.

Arithmetic Standard Deviation

The traditional way of calculating a standard deviation involves taking the square of the difference between each data point and the amean, then taking the mean of all those squares, and finally taking the square root of this new mean. This is also called the Root Mean Square (RMS for short) of the deviations from the mean.

This requires us to keep track of every data point until we’ve calculated the mean.

A less memory intensive way, and one which can be streamed is to calculate a running sum of squares. If we then want the standard deviation at any point, we use the following expression:

sqrt(sum(x^2)/N - (sum(x)/N)^2)

As data comes in, we need to add it to our sum and add its square to our sum of squares, and that’s about it.
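
Here is a rough sketch of that streaming approach in Python. The class name RunningStats and its method names are made up for illustration. It computes the population standard deviation from the expression above, and clamps the variance at zero as a guard against floating point rounding (my own addition).

```python
import math

class RunningStats:
    """Keeps only N, sum(x) and sum(x^2); no individual data points."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        # Called once per incoming data point.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def stddev(self):
        # sqrt(sum(x^2)/N - (sum(x)/N)^2)
        variance = self.total_sq / self.n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))

stats = RunningStats()
for x in [6, 7, 8, 9, 10, 11, 12, 13, 14]:
    stats.add(x)
print(stats.mean(), stats.stddev())
```

One caveat with this form is that subtracting two large, nearly equal numbers can lose precision, so it works best when values are not enormous compared to their spread.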

The details of how we end up with this expression are well documented on the Wikipedia page about Standard deviation, so I won’t go into them here. We’ll refer to the standard deviation using the terms SD or σ.

Geometric Standard Deviation

The geometric standard deviation is almost the same, except we use log(x) instead of x, and once we get the final result, pass it to the exp() function. Similar to the arithmetic standard deviation, we’ll use the symbol σ_g to refer to the geometric standard deviation.

The one difference between arithmetic and geometric standard deviation is the notation we use for the spread. For arithmetic standard deviation, we use μ ± σ whereas for the geometric standard deviation, we use μ_g */ σ_g.
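
A small sketch of the geometric version, assuming positive values. The function name geometric_stats is hypothetical, and the low and high bounds are simply the gmean divided and multiplied by the geometric standard deviation, as the notation suggests.

```python
import math

def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation)."""
    logs = [math.log(x) for x in values]
    n = len(logs)
    mean_log = sum(logs) / n
    # Same sum-of-squares expression as before, applied to log(x).
    var_log = sum(v * v for v in logs) / n - mean_log ** 2
    gmean = math.exp(mean_log)
    gsd = math.exp(math.sqrt(max(var_log, 0.0)))
    return gmean, gsd

gmean, gsd = geometric_stats([1, 10, 100])
# The spread is multiplicative: divide and multiply instead of subtract and add.
low, high = gmean / gsd, gmean * gsd
print(gmean, gsd, low, high)
```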

Percentiles

As we’ve seen in the curves above, real user performance data can have a very long tail, and if you care about user experience, you’ll want to know how far to the right that tail goes. We can do this by looking at things like the 95^th, 98^th or 99^th percentile values of the curve. The method of getting a percentile is exactly the same as the method for getting the median. We need to sort all data points in ascending order, and then pick the p^th point depending on the percentile we care about.

For example, if we have 1 million data points, and want to know the 95^th percentile, we’d look for the 1,000,000 * 0.95 == 950,000^th point. Similarly, for the 75^th or 25^th percentiles, we’d look for the 750,000^th or 250,000^th points.

In Set 2 (reproduced below), with 9 points, what would the 95^th and 25^th percentiles be? For fractional indices, round down if your arrays are zero based.

Set 2: 6, 7, 8, 9, 10, 11, 12, 13, 14
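
To check your answer, here is a minimal Python sketch of the rule described above: sort, multiply the count by the percentile, and round down for a zero-based index. The function name percentile is my own, and other definitions of percentiles (for example ones that interpolate between points) will give slightly different numbers.

```python
import math

def percentile(values, p):
    """Pick the p-th percentile (0-100) by rounding the index down."""
    ordered = sorted(values)
    index = math.floor(len(ordered) * p / 100)
    # Guard for p == 100, which would otherwise run off the end.
    index = min(index, len(ordered) - 1)
    return ordered[index]

set2 = [6, 7, 8, 9, 10, 11, 12, 13, 14]
print(percentile(set2, 95))  # index floor(8.55) = 8, so 14
print(percentile(set2, 25))  # index floor(2.25) = 2, so 8
```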

Inter Quartile Range

The Inter Quartile Range covers the middle 50% of data points in a set, i.e., all points between the 25^th and 75^th percentiles. It is more robust than the entire range (min, max) since outliers are not included. Strictly speaking, the IQR is a single number: the difference between the 75^th and 25^th percentile values.

Data filtering

When dealing with real user performance data, we may need to apply two levels of filtering. The first is to strip out absurd data. Remember that as with any data received over a web interface, you really cannot trust that the data you’re receiving was sent by code you wrote, and not by someone else trying to masquerade as your code. The best you can do is require sane limits on your inputs, and make sure they fit these limits.

In addition to limiting for sanity, you also need to split your data set into two (or three) parts, one of which includes typical data points, and the remainder including outliers. These are both interesting sets, but need to be analysed separately.

Band-pass & sanity filtering

If you’ve ever used a graphic equaliser on a music system, you’ve worked with a band pass filter. In the audio-electronics world, a band-pass filter passes through components of the audio stream that fall within a certain frequency band, while blocking everything else. For example, to enhance bass effects and dampen other effects, we might pass through signals between 20Hz and 200Hz and block everything else, or let something else deal with it through a parallel stream.

With performance data, we can define similar limits. You should never see a page load time less than 0 seconds, and in fact it’s highly unlikely that you’d see a page load time of under 50 milliseconds (loading content from cache may be an exception). It’s also unlikely that you’d see a page load time of over 3-4 minutes… not because it doesn’t happen, but because users are unlikely to hang around that long^*.

Similarly, if you see timestamps in the distant past or distant future, chances are that it’s either fake data, or a very badly misconfigured system sending you that data. In both cases it’s probably something you want to drop (or pass to a separate handler for further analysis).

* Users may tolerate very long page load times if the page loads in a background tab and they never actually see it.
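
A sanity filter along these lines might look like the following Python sketch. The beacon field names (load_ms, timestamp) and the one-day clock tolerance are assumptions made up for the example. The 50 millisecond and 4 minute limits come from the discussion above, and you should tune them for your own site.

```python
import time

MIN_LOAD_MS = 50                  # anything faster is suspicious
MAX_LOAD_MS = 4 * 60 * 1000       # users rarely wait longer than this
MAX_CLOCK_SKEW_S = 24 * 60 * 60   # assumed tolerance for timestamps

def is_sane(beacon, now=None):
    """True if a beacon falls inside the pass band."""
    now = time.time() if now is None else now
    load_ok = MIN_LOAD_MS <= beacon["load_ms"] <= MAX_LOAD_MS
    time_ok = abs(now - beacon["timestamp"]) <= MAX_CLOCK_SKEW_S
    return load_ok and time_ok

def split_beacons(beacons, now=None):
    """Keep sane beacons and set the rest aside for separate analysis."""
    kept = [b for b in beacons if is_sane(b, now)]
    dropped = [b for b in beacons if not is_sane(b, now)]
    return kept, dropped
```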

IQR Filtering

IQR filtering is based on the Inter Quartile Range that we saw earlier. Its job is to strip out outliers so we only look at typical data. To filter a dataset using IQR filtering, we first find the inter quartile range (Q3-Q1).

We then define a field width of 1.5 times this range: fw = 1.5 * (Q3-Q1).

Finally, we run a band-pass filter on the data set with an open interval of (Q1-fw, Q3+fw). An open interval is one in which the end points are not included, so your test would be x[i] > Q1-fw && x[i] < Q3+fw.

We can also include points that fall below the interval into a low-outliers group, and points that fall above the interval into a high-outliers group.
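
Put together, the whole procedure can be sketched in Python as below. The function names are hypothetical, and the quartiles use the same round-down indexing as the percentile discussion earlier, which is only one of several common conventions.

```python
import math

def quartile(ordered, fraction):
    """Value at the given fraction of an already sorted list."""
    index = min(math.floor(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]

def iqr_filter(values):
    """Split values into (low outliers, typical points, high outliers)."""
    ordered = sorted(values)
    q1 = quartile(ordered, 0.25)
    q3 = quartile(ordered, 0.75)
    fw = 1.5 * (q3 - q1)
    # Open interval: the end points themselves count as outliers.
    low = [x for x in ordered if x <= q1 - fw]
    typical = [x for x in ordered if q1 - fw < x < q3 + fw]
    high = [x for x in ordered if x >= q3 + fw]
    return low, typical, high

low, typical, high = iqr_filter([6, 7, 8, 9, 10, 11, 12, 13, 14, 100])
print(low, typical, high)
```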

The great thing about IQR filtering is that it’s based on your dataset and not on some arbitrary limits derived from intuition ;) In other words, it will work just as well for datapoints coming in over a slow dialup network and datapoints coming in over a T3 line. A straight band-pass filter might not.

Example Numbers

As an example, here are some of the numbers we see for a near Log-normal distribution of data:

Beacons	A-Mean	G-Mean	Median	95^th	98^th
1.5M	4.2s	2.50s	2.39s	11s	19s
870K	6.4s	3.99s	3.87s	17s	29s

Notice how close the geometric mean and the median are to each other while the arithmetic mean gets pulled out to the right.

Endgame

We’ve gone over how to find central tendency and spread and how to filter your data. This covers everything you need to analyse your performance data. Have a look at the references for more information.

If you need an easy way to collect data, have a look at boomerang, LogNormal’s OpenSource Extendible JavaScript library that measures page load time, bandwidth, dns and a bunch of other performance characteristics in the user’s browser. It is actively developed and supported by the folks at LogNormal, and contributors from around the world.

And if you’d rather not do all of this yourself, have a look at the RUM tool we’ve built at LogNormal. Send us up to 100 million data points a month and we’ll do the rest.

Also, to celebrate Speed Awareness Month, we’re giving away 2 free months of our Pro service if you sign up at http://www.lognormal.com/promos/speedawarenessmonth.

Lastly, I will be speaking about this topic at Performance meetup groups around the US. The first one will be tomorrow (August 14, 2012) in Boston.

Thanks for following along, and let’s go make the web faster.

iOS, Google WiFi and 2 factor auth -- clearly untested UX

2012-06-29T00:56:00.002-04:00

So after WebPerfDays today, a bunch of us ended up at a Pizza place in Mountain View. Naturally the first thing we all did was search for wifi in the area and try to get on to a network from our mobile devices.

Now Mountain View has Google Wifi, and it appears as if they now require you to sign in with your Gmail account, and that's where the problem comes in... for me at least. I have two factor auth turned on for my google accounts, which means that after I type in my username and password, I get to a second screen to enter my second authentication token. This token comes from an app on my iOS device... the same device I was trying to log in with.

I switched to the app to get the token number, but as soon as I did that, iOS decided that I didn't actually want to sign in to the wireless network, and disassociated itself from the Access Point (AP).

Once I'd got the number, I switched back to the settings app and it initiated login again, which means I had to enter my username and password again, and by the time I'd reached the token screen, the token had expired.

This is what the token screen looks like:

It was rather annoying.

It then hit me that I could copy the token to the clipboard, and then paste it into the token text field, which should shave a few seconds off and maybe let me through.

That worked, but it was still annoying.

We started talking about how this interface could be improved. There are a few reasons why this is a problem, and I think they're mostly Apple's fault.

When you connect to a wireless network, iOS attempts to connect to www.apple.com. If it gets redirected somewhere else, it assumes that it's being asked to authenticate, and displays whatever page it gets redirected to in a browser-like window.

The problem is that if you do anything other than interact with the content in this window, iOS treats it exactly the same as hitting the "Cancel" button (top right of the screenshot), terminates the login and dissociates from the AP.

This means that you cannot switch to the Authenticator App (second app at the bottom of the screenshot) to get your token.

Can Apple fix this?

Yes, just don't cancel sign in unless I explicitly click cancel.

Can Google fix this?

Maybe, if they could provide a link or something that would open the Authenticator app right from that page and let me pull the number out of it (I don't know enough about iOS to know if this is possible).

Do any Apple/Google engineers want to take this up?

Password reset over HTTP -- Part 3

2012-06-09T14:17:00.002-04:00

It's been a while since my last two posts on the topic. This time it's Groupon.

The password reset page is over HTTP:

The reset password email that you receive contains a link that looks like this:

http://groupon.com/users/password_reset/{token}?utm_source=password_reset \
    &utm_medium=email&sid={sid}&user={uid}&date={YYYYmmdd}

This link does a 301 to itself and then a 302 to a HTTPS version of itself.

The good thing is that your new password is sent over SSL. The bad thing is that your reset token is sent in clear text.

Update: Groupon fixed this issue a couple of hours after I reported it.

Saving a PDF that doesn't allow saving form contents

2012-01-27T17:36:00.000-05:00

Several organisations, consulates, for example, have forms that need to be filled up in PDFs. They have very smart PDFs that change as you fill out the form and generate nice 2D bar codes at the end with all the information easily scannable when you submit the form. Most of these forms can be saved after you've filled them out, which is important if it's complex and you need to work on it for a few days, or if you need to put it on a pen drive and take it somewhere else to print.

Every now and then however, I've come across a PDF that won't allow you to save the form contents. This kinda sucks, so I decided to find a work around.

I first tried the print to PDF option, however Adobe won't let you print these particular PDFs to a PDF.

I tried Preview, but you can't fill out the form in Preview.

Then I tried actually printing it to a dummy printer. Note that this is for MacOSX.

1. Open your printer settings. If you don't have a printer, create one.
2. Open the Print Queue and pause it.
3. Print your document to this printer.
4. Look in /private/var/spool/cups for a file that was created within the last few minutes. Its name should start with d and have a lot of numbers after it. You'll need sudo:
```
sudo ls -l /private/var/spool/cups
```
5. Copy this file somewhere convenient, and give it a .ps extension (it's a postscript file).
6. Open this file with Preview. I recommend reordering the pages and then saving it as a PDF.

That's it. There's a way to do it for windows too, but I don't have a windows box handy to document it. Linux is trivial.

Password reset over HTTP -- Part 2

2012-01-25T12:21:00.000-05:00

So it looks like I've been forgetting a lot of my passwords recently. After yesterday's issue with delicious submitting passwords in the clear, today I have a problem with livemocha.com.

As before, their login page is properly secured, but the password reset page is over HTTP:

This is the password reset page:

And this is the URL the passwords are POSTed to, in clear text:

They also include third party code on their page, in this case it's a flash object from userplane.com, google analytics, and some JavaScript from pbc.com (alias for paybycash.com)

I've gotten in touch with them via their online form. Let's hope they respond.
