Cheyne Wallace

How To Use Web Sockets (Socket IO) With Digital Ocean Load Balancers And Kubernetes With Ingress Nginx

Cheyne — Thu, 13 Feb 2020 07:44:49 +0000

Web sockets are awesome, although they’re not exactly new technology. You would assume there is a wealth of information on the internet covering every possible use case you can imagine. Recently, however, I came across a problem that I assumed would have a heavily documented solution online, and as it turns out, I found very little.

The problem I was trying to solve was running a multi-server web socket application (using Socket IO) within Kubernetes on Digital Ocean’s hosted K8S solution, with a Digital Ocean load balancer attached to an Nginx Ingress controller. (That’s ingress-nginx, not nginx’s ingress controller.)

This should be fine... right? Well, as I soon discovered, there are some tricky gotchas here that caught me out. I wasted hours tweaking configuration, so I’m writing this blog post so you don’t have to do the same.

First, the problem and a shameless plug.
We run a dashboard management software product called VuePilot

One of the core pieces of functionality VuePilot offers is the ability to remotely manage and control your dashboards and TV screens mounted around your office.

We offer a centralised dashboard that users can use to start, stop, update and configure the dashboard screens in your organisation. Basically allowing you to be lazy and not get out of your seat to update the dashboard screens at the other end of the building.

So, when you click that button to take over a screen and display some new dashboard on it, that happens by way of web sockets. Machines are always online and available for commands from the user at any time.

More info
How To Remotely Manage Office Dashboard Screens
How To Manage Multiple Dashboard Screens From One Machine

Recently we’ve seen a large uptick in users and have decided to break apart the app and move everything into Kubernetes to offer greater control over our scaling.
For example, the service that handles this remote management behaviour lives as its own service now that purely just handles web socket connections.

So, back to the point of the article, rather than offer a “step by step” guide to setting up load balancers and Kubernetes on Digital Ocean (which would be long) I’m just going to run over the sticking points that you will likely hit when you attempt to do it yourself.

I’d like to point out that if anyone feels like correcting me or pointing out another solution on any of these points, please do so, however this is what worked for me.

Assuming you’ve got your cluster up and running and you’ve configured your ingress-nginx controller of type “LoadBalancer” which has created your Digital Ocean load balancer, what should you be aware of?

Use HTTP & HTTPS, Not TCP As Load Balancer Protocols

This one confused me for a while, and I still don’t quite understand why it behaves this way. Here’s what I found.

When Kubernetes provisions the load balancer for you, by default the protocol will be set to TCP with the relevant ports (most likely 80 and 443) being routed to the random ports Kubernetes has assigned to the service.

I was unable to get anything working at all with TCP set as the protocol, which is a contrast from how an AWS ELB works whereby it always lists TCP as the protocol and works fine.

Switching to HTTP and HTTPS solved this for me. I suspect playing around with “proxy protocol” may also solve this, but I wasn’t able to get it working for my use case.

Using the HTTPS protocol on the load balancer has the added benefit (if you wish) of offloading TLS / SSL termination at the load balancer level which is not possible when using TCP as the load balancer protocol. You can of course use the SSL Passthrough option if you wish to terminate SSL at the pod level.

You can also do this from the user interface, but you should configure this in your Kubernetes manifests to ensure you can restore this configuration if need be.

Configuring HTTPS

Within your ingress controller service configuration, setting the

service.beta.kubernetes.io/do-loadbalancer-certificate-id

annotation will automatically switch your TCP 443 routing to HTTPS 443 by supplying the ID of the certificate you want to use (from Digital Ocean’s certificate manager).

You can create this certificate from the Digital Ocean dashboard under Account > Security. Annoyingly you need to use doctl with the command

doctl compute certificate list

to get the certificate ID, as it’s not visible in the dashboard for some reason.

Configuring HTTP

Within your ingress controller service configuration, setting the

service.beta.kubernetes.io/do-loadbalancer-protocol: http

annotation will switch your TCP 80 routing to be HTTP 80

Here’s an example ingress service config

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: http
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
    # Use "doctl compute certificate list" to get this ID
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "xxxx-xxxxx-xxxxx"

Side note: You do not need to change the TCP settings at the service level for your ingress controller. Mine still looks like

ports:
  - name: http
    containerPort: 80
    protocol: TCP
  - name: https
    containerPort: 443
    protocol: TCP

Long Life Certificates For CloudFlare Users

Digital Ocean offers the ability to generate LetsEncrypt certificates for you, provided you host your app’s DNS records with them. That is awesome, unless you are using a service like CloudFlare, in which case it sucks because your DNS will be hosted at CloudFlare.

CloudFlare already provides me with managed certificates and well, I’m not going back to managing them myself so how to get around this?

We only need to secure the transmission between CloudFlare and the DO load balancer. The end user will only ever see the CloudFlare certificate, so really we just need a valid cert for secure transport, which could even be self signed. Self signed certs, however, just don’t feel right, and CloudFlare has another option we can use.

It’s not completely managed, but CloudFlare’s Origin CA certificates allow us to generate a certificate, signed by CloudFlare for our domains, with a 15 year expiry, which we can then add to Digital Ocean and assign to our load balancer. If you add another domain you will need to regenerate the certificate to include it, but this is a pretty simple task.

More info here
https://blog.cloudflare.com/cloudflare-ca-encryption-origin/

Multi Server “Session ID unknown” Disconnect Errors / Broken Long Polling

Chances are, you’ll want to run more than one pod serving your Socket IO servers, for which you’ll likely use something like Redis as a PubSub backend for communication between the pods. When you do, you’re going to hit one ugly problem.

Your server logs will show your clients repeatedly connecting and disconnecting every few seconds and your client console will be blasted with “Session ID unknown” errors.

“What the hell?!” I can hear you say.. Well, that’s the nice version of what I was saying.

This is a result of multiple levels of load balancing at both the DO Load Balancer and the Kubernetes service level balancing you onto different pods.

Socket IO starts by long polling the endpoint, then requests an “Upgrade” of the connection to web sockets, which the server answers with an HTTP 101 (Switching Protocols). The problem here is that the follow up requests don’t land on the same pod that holds the session, and so … “Session ID unknown”
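
To make the failure concrete, here is a minimal sketch of the situation. It is a toy model rather than Socket IO's real implementation: the Pod class, the session id format and the round robin picker are all made up for illustration. Each pod only remembers the sessions it created, so a follow up request that is balanced onto another pod is rejected.

```javascript
// Simplified model: each pod only knows the sessions it created itself.
class Pod {
  constructor(name) {
    this.name = name;
    this.sessions = new Set();
  }

  // First long polling request: the pod creates and remembers a session id.
  handshake() {
    const sid = `${this.name}-${this.sessions.size + 1}`;
    this.sessions.add(sid);
    return sid;
  }

  // Any follow up request (further polls or the upgrade) must present a known sid.
  followUp(sid) {
    return this.sessions.has(sid) ? 'ok' : 'Session ID unknown';
  }
}

// Round robin: every request goes to the next pod in turn.
function roundRobin(pods) {
  let next = 0;
  return () => pods[next++ % pods.length];
}

const pods = [new Pod('pod-a'), new Pod('pod-b')];
const pick = roundRobin(pods);

const sid = pick().handshake();      // lands on pod-a
const result = pick().followUp(sid); // lands on pod-b, which never saw this sid
console.log(result);                 // Session ID unknown
```

If the follow up request is sent back to the pod that issued the session id, the lookup succeeds. Sticky sessions achieve this by pinning the client to one pod, and websocket-only transport achieves it by using a single connection.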

There are two ways to solve this

Solution 1: Use Session Affinity

Session affinity essentially means sticky sessions: any follow up requests from the same user will be routed to the same pod.

This will allow you to keep using the Long Poll > Upgrade to WS default method for Socket IO.

Just be aware that heavy traffic users making many requests will not be load balanced to other pods. That is fine for web sockets, but if you are serving other static content and API requests from the same app, an individual user’s requests will not be spread across your pods, which may screw a little with your load balancing strategy.

To do this, you simply set the “affinity” annotation at the ingress level. This will set a “route” cookie containing a hash, which nginx uses to route follow up requests to the same upstream pod.

Here’s an example ingress definition

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"

Solution 2: Disable Long Polling

You can disable long polling altogether and go straight to web sockets, which prevents this issue whilst not interfering with the natural load balancing. You’ll want to be sure that your users’ browsers will be fine with this before enabling it, as the long poll upgrade is a fairly nice feature of Socket IO and offers a fallback in case of web socket failure.

To remove long polling and force only web sockets, simply set the transports property in your client to “websocket”

const ioSocket = io('https://ws.myapp.com', {
  transports: ['websocket'],
});

Excessive Client Reconnects

By default our clients will reconnect every 60 seconds as per the default nginx “proxy-read-timeout” configuration. This is pretty excessive so let’s make this something longer, like an hour (3600 seconds).

Again, we can configure this in the ingress annotation definition

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

Secure Web Sockets (WSS)

With the above protocol changes in place and the load balancer terminating TLS on 443 with our new certificate, we can now force WSS instead of WS connections, which will be encrypted all the way up to the entry point to our cluster. As mentioned above, you can also use SSL Passthrough if you want to terminate at the pod level.

An example of how you can force WSS from your client side code

const ioSocket = io("https://ws.myapp.com", {
  secure: true,
});

The usage of HTTPS in the URL will also tell Socket IO to upgrade to secure transmission.
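
As a rough illustration of that scheme mapping, the sketch below turns an http(s) URL into the matching web socket URL. The helper name toWebSocketUrl is hypothetical and this is not Socket IO's actual code; it only shows why an https URL ends up as a wss connection.

```javascript
// Map an http(s) URL to the matching web socket scheme:
// http -> ws, https -> wss.
function toWebSocketUrl(httpUrl) {
  return httpUrl.replace(/^http(s?):\/\//, 'ws$1://');
}

console.log(toWebSocketUrl('https://ws.myapp.com')); // wss://ws.myapp.com
console.log(toWebSocketUrl('http://ws.myapp.com'));  // ws://ws.myapp.com
```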

Redirecting HTTP To HTTPS

Rather than do this at the app level we can do this at the load balancer level by setting the

do-loadbalancer-redirect-http-to-https

annotation to true in our ingress controller service definition.

Example

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"

Enable CORS

Depending on your application you’ll possibly want to enable CORS (Cross Origin Resource Sharing) on your ingress to allow clients to connect from other domains.

Again, this is done at the ingress resource level with annotations, here’s an example CORS configuration that essentially opens the ingress to all origins.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Access-Control-Allow-Origin: $http_origin";

The Final Configuration

Most of what’s been mentioned above happens in the ingress controller service and ingress resource definitions.
Here’s what the final configuration files look like

Ingress Resource

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    #kubernetes.io/ingress.class: nginx-general
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Access-Control-Allow-Origin: $http_origin";
spec:
  rules:
    - host: ws.vuepilot.com
      http:
        paths:
          - path: /
            backend:
              serviceName: vuepilot-node
              servicePort: 8080

Ingress Controller Service

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: http
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
    # Use "doctl compute certificate list" to get this ID
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "xxx-xxx-xxx"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 80
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

Here’s hoping these tips save you some time and hassle.

Deduplicating Large Data With Rails

Cheyne — Thu, 13 Apr 2017 01:51:47 +0000

Storing and retrieving large chunks of data from your database can be tricky if it’s not done correctly. What happens when you want to store a relatively large document or body of text in your database but that large chunk of text is likely to be identical for thousands of new records? You’re faced with the possibility of storing gigabytes of duplicated data that really doesn’t need to be there.

At Vue Pilot, we had this problem a lot. Thousands of duplicate records, with unique IDs but identical content just occupying disk space. It was pretty clear this wasn’t an optimal solution.

Here’s the scenario. Say you have a piece of software that generates a log file on installation with some key information you would like to track and archive. You push this log file back to your server via an API endpoint and store it in the database. The log file is roughly 100KB in size, so for every 100,000 installs that’s close to 10GB of storage.

The problem is that 80% of these log files are identical. They contain the same information, but we treat and store them as though they were unique, wasting disk space and blowing out our table size and tuple count with an unnecessary number of rows.
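
The numbers above can be worked through quickly. The sketch below uses the figures from the scenario (100KB per log, 100,000 installs, 80% identical) and adds one simplifying assumption of my own: that all of the identical logs share a single body, so they collapse to one stored row. It is written in JavaScript purely as a calculator.

```javascript
const LOG_SIZE_KB = 100;
const INSTALLS = 100000;
const DUPLICATE_RATIO = 0.8; // share of logs that are identical

const KB_PER_GB = 1000 * 1000; // decimal units, to match the rough figures in the text

// Storing every log as its own row.
const naiveGb = (LOG_SIZE_KB * INSTALLS) / KB_PER_GB;

// Assumption: the identical 80% collapse to one row; the other 20% stay unique.
const uniqueLogs = Math.round(INSTALLS * (1 - DUPLICATE_RATIO)) + 1;
const dedupedGb = (LOG_SIZE_KB * uniqueLogs) / KB_PER_GB;

console.log(naiveGb);              // 10
console.log(uniqueLogs);           // 20001
console.log(dedupedGb.toFixed(2)); // 2.00
```

Under that assumption the log table shrinks to roughly a fifth of its size. The join table adds a row per install, but each row is only a pair of ids.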

We all know disk space is cheap, but processing power isn’t. Large tables cause large indexes, which in turn cause slower queries, require more CPU cycles and mean you need to be more careful about writing your future SQL queries. If you query on a non-indexed field you can trigger a full table scan and your app takes a big performance hit in the process.

People love to say “storage is cheap! store everything!” but moving, loading and managing 100GB backup files is a lot more difficult and time consuming than 20GB backup files, so it’s worth optimizing where possible.

So how do we optimize this? We’re going to create three tables: the users table, one for the log files, and one that acts as a join table between the users and logs. Instead of having a direct relation between the Log and the User, there will be a relation from User to the UserLog and from the UserLog to the Log.

Instead of just saving the log file when we receive it, we will instead hash the contents and perform a look up on this hash in the database to see if we have already seen this exact block of data before. If we get a match on the hash we will instead use the id of that record and throw away the data. This means that if 1000 people generate the exact same log output, we will still only store it once.
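
Before the Rails version, here is the same idea as a small standalone sketch in JavaScript. The LogStore class and its in-memory Map are stand-ins for the database table and are only there for illustration: the content is hashed, the hash is looked up, and a new entry is created only when the hash has not been seen before.

```javascript
const crypto = require('crypto');

// In-memory stand-in for the logs table, keyed by content checksum.
class LogStore {
  constructor() {
    this.logsByChecksum = new Map(); // checksum -> { id, data }
    this.nextId = 1;
  }

  // Returns the id of the stored log, reusing an existing entry for identical content.
  store(data) {
    const checksum = crypto.createHash('md5').update(data).digest('hex');
    let log = this.logsByChecksum.get(checksum);
    if (!log) {
      log = { id: this.nextId++, data };
      this.logsByChecksum.set(checksum, log);
    }
    return log.id;
  }
}

const store = new LogStore();
const first = store.store('some log file text here');
const second = store.store('some log file text here');
const third = store.store('something else');
console.log(first, second, third); // 1 1 2
```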

Let’s have a look at how to set this up

class CreateLog < ActiveRecord::Migration
  def change
    create_table :logs do |t|
      t.string :checksum
      t.text :data
    end

    create_table :user_logs do |t|
      t.integer :log_id
      t.integer :user_id
    end

    create_table :users do |t|
      t.string :name
    end

    # Unique, so a concurrent duplicate insert raises ActiveRecord::RecordNotUnique
    add_index :logs, :checksum, unique: true
    add_index :user_logs, :user_id
    add_index :user_logs, :log_id
  end
end

Ok, so we have the base tables we need to deduplicate our data. Now let’s see how we would store the log files. (For the sake of a cleaner read, I’m going to omit all the usual boilerplate code you would find in a normal application, like validation and authentication, and get to the point.)

Assuming that we are receiving the logs from an API endpoint somewhere and that we just want to store them for later inspection and return a status code, here is a simple way we could implement it.

class LogController < ApplicationController
  def create
    log = Log.store(params[:log_data])
    user_log = UserLog.create(log_id: log.id, user_id: current_user.id)
    head :created # 201 with an empty body
  end
end

class User < ActiveRecord::Base
  # user_logs must be declared before the association that goes through it
  has_many :user_logs
  has_many :logs, through: :user_logs
end

class UserLog < ActiveRecord::Base
  belongs_to :user
  belongs_to :log
end

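class Log < ActiveRecord::Base
  has_many :user_logs

  # Class method: returns the existing Log for this content, or creates one.
  def self.store(data)
    key = Digest::MD5.hexdigest(data)
    log = Log.find_by_checksum(key)
    if log.nil?
      log = Log.new(data: data, checksum: key)
      Log.transaction(requires_new: true) do
        begin
          log.save!
        rescue ActiveRecord::RecordNotUnique
          # Another request stored the same content first.
          # This is only raised if checksum has a unique index.
          raise ActiveRecord::Rollback
        end
      end
      # If our save was rolled back, return the row the other request created
      log = Log.find_by_checksum(key) unless log.persisted?
    end
    log
  end
end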

It’s as simple as that. The Log.store method uses Digest::MD5.hexdigest to calculate an MD5 checksum of the log data, then on the next line looks up whether a record with that checksum already exists. If it does not, we create a new log record and save it, being sure to wrap the save operation in a transaction just in case another log file with the same content is being stored at the same time.

The User model has a through: :user_logs relation to the Log model, which means that when you call logs on a user it performs the log look up via the smaller join table, which just consists of ids, in order to find the relevant logs. This essentially resolves to a query that looks something like

SELECT * FROM logs 
INNER JOIN user_logs ON logs.id = user_logs.log_id 
WHERE user_logs.user_id =

As you can see from the following console output, trying to store the same data repeatedly will result in the same record being returned and not repeatedly stored.

2.3.1 :004 > log = "some log file text here"
2.3.1 :005 > Log.store(log).id
 => 4 
2.3.1 :007 > Log.store(log).id
 => 4 
2.3.1 :008 > Log.store(log).id
 => 4 
2.3.1 :009 > Log.store(log).id
 => 4 
2.3.1 :010 > new_log = "something else"
2.3.1 :011 > Log.store(new_log).id
 => 5

Uploading To S3 With AngularJS And Pre-Signed URLs

Cheyne — Thu, 14 Jul 2016 23:10:57 +0000

This is a revised post based on the popular original article found at Uploading To S3 With AngularJS
The content is similar other than a few key steps which have been removed/altered but for the sake of clean reading, I thought I would create a new post for it.

Scenario

The scenario we’re going to build for here will be to upload a file (of any size) directly to AWS S3 into a temporary bucket that we will access using Pre-Signed URLS
The purpose of this front end application will be to get files into AWS S3, using JavaScript and some basic backend code.

A good example is having a user upload a file from a web form, which your application server will then pull back down and encrypt or resize before pushing it into a more permanent bucket for storage.

By uploading directly to S3 we take load off our application server by not keeping long running connections open while slow clients upload large files, a problem which is especially visible when using services like Heroku.

Step 1: Add the file directive

The file directive simply takes the attributes from a file input type and binds it to the $scope.file object so you can easily work with the filename, file size etc from your controllers.

As soon as you select a file, you can access $scope.file.name or $scope.file.size to get the filename and size for handling client side validation and unique S3 object names.

This means you can validate the file size from the browser with something simple like;

if($scope.file.size > 10485760) { // 10MB = 10 * 1024 * 1024 bytes
  alert('Sorry, file size must be under 10MB');
  return false;
}
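
For the unique S3 object names mentioned above, one possible approach is to prefix the original filename with a timestamp and a short random token. The helper below is a hypothetical sketch, not part of any library, and the exact naming scheme is an assumption you should adapt to your own bucket layout.

```javascript
// Build an object name that is very unlikely to collide with earlier uploads.
// `now` and `random` are parameters so the function is easy to test.
function uniqueObjectName(filename, now = Date.now(), random = Math.random()) {
  const token = Math.floor(random * 1e6).toString(36);
  return `${now}-${token}-${filename}`;
}

const nameA = uniqueObjectName('report.pdf', 1468537857000, 0.25);
const nameB = uniqueObjectName('report.pdf', 1468537857000, 0.75);
console.log(nameA !== nameB); // true
```

In a controller you would call it with just the filename, for example uniqueObjectName($scope.file.name), and use the result as the S3 object key.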

Go ahead and include the following directive in your project;

directives.directive('file', function() {
  return {
    restrict: 'AE',
    scope: {
      file: '@'
    },
    link: function(scope, el, attrs){
      el.bind('change', function(event){
        var files = event.target.files;
        var file = files[0];
        scope.file = file;
        scope.$parent.file = file;
        scope.$apply();
      });
    }
  };
});

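In your HTML you will include the file element by adding the file attribute to a file input, for example <input type="file" file>, so that the directive binds to it.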

Step 2: Configure CORS And Expiry On The Bucket

CORS or “Cross Origin Resource Sharing” allows us to restrict the operations that can be performed on a bucket to a specific domain, like your website’s domain. Typically a CORS Ajax request will first initiate an OPTIONS HTTP request to the server, which will return the allowed options for that endpoint before the real Ajax request actually happens. Think of it like an access request: the server will inspect where the request originated from and return a set of allowed options (or none) for that origin.

Don’t worry, you won’t have to make that request yourself. Angular will handle all of that for you, but it’s good to have a basic understanding of what’s happening during the lifetime of the request.
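
To picture the OPTIONS exchange described above, here is a small sketch of the decision the server makes. It is a simplified model and not S3's implementation: the preflight function, the shape of the rules object and the status codes are assumptions made for illustration.

```javascript
// Hypothetical rule set, mirroring the kind of policy configured on the bucket.
const corsRules = {
  allowedOrigins: ['http://localhost:3000', 'https://www.yourdomain.com'],
  allowedMethods: ['PUT'],
};

// Decide how to answer a preflight request for a given origin and method.
function preflight(origin, requestedMethod, rules) {
  const allowed =
    rules.allowedOrigins.includes(origin) &&
    rules.allowedMethods.includes(requestedMethod);

  if (!allowed) {
    // No CORS headers are returned, so the browser blocks the real request.
    return { status: 403, headers: {} };
  }

  return {
    status: 200,
    headers: {
      'Access-Control-Allow-Origin': origin,
      'Access-Control-Allow-Methods': rules.allowedMethods.join(', '),
    },
  };
}

console.log(preflight('https://www.yourdomain.com', 'PUT', corsRules).status); // 200
console.log(preflight('https://evil.example', 'PUT', corsRules).status);       // 403
console.log(preflight('https://www.yourdomain.com', 'DELETE', corsRules).status); // 403
```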

Add The CORS Policy
From Your AWS console, under S3 click into your bucket then click the Properties button. There you will see a “Add CORS Configuration” button. It’s here that you’ll configure your bucket to only allow PUT requests from particular origins.

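You can use the following sample config – just edit to reflect your development, production and staging environments.

<CORSConfiguration>
    <CORSRule>
        <AllowedOrigin>http://localhost:3000</AllowedOrigin>
        <AllowedOrigin>https://www.yourdomain.com</AllowedOrigin>
        <AllowedOrigin>http://staging.yourdomain.com</AllowedOrigin>
        <AllowedMethod>PUT</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <ExposeHeader>x-amz-server-side-encryption</ExposeHeader>
        <ExposeHeader>x-amz-request-id</ExposeHeader>
        <ExposeHeader>x-amz-id-2</ExposeHeader>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>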

It’s a good idea to split these into other buckets, but for simplicity we’ll just use the one bucket.

Configure Object Expiry
It’s a good idea to expire the objects in this bucket after some short period to prevent people from just uploading huge objects to screw with you. Your server side code should handle moving and deleting valid files so you can assume those that are left after 24 hours are not meant to be there.

From your S3 console, view a bucket and then click Properties, expand the “Lifecycle Rules” section and follow the prompts. Set the action to “Permanently Delete Only” and set it for 1 day which will delete any objects in the bucket that are older than 1 day permanently.

Now you’re ready to lay down some code.

Step 3: Generate The Pre-Signed URL And Upload

This is a two step process and assumes you have already configured the AWS SDK credentials on whatever backend framework you’re using.
First we create a simple function on the server side that generates the URL based on the filename and file type, then we pass that back to the front end for it to push the object to S3 using the Pre-Signed URL as a destination.

Generate The URL
On the server side you will need to be using the AWS SDK. We’ll use Ruby for this example. Create a controller action to generate and return the presigned URL as follows

  def presigned
    if params[:filename] && params[:type]
      s3 = AWS::S3.new
      obj = s3.buckets[YOUR_TEMP_BUCKET].objects[params[:filename]]
      url = obj.url_for(:write, :content_type => params[:type], :expires => 10*60)  # Expires 10 Minutes
      render :json => {:url => url.to_s}
    else
      render :json => {:error => 'Invalid Params'}
    end
  end

The PreSigned URL is dynamic and includes details about the object name, bucket, signature and expiration details. It’s important to note that the file name and the content type are included as part of the signature, so you need to include them in the front end request to S3 exactly as you did to generate the URL, otherwise you’ll get an invalid signature response.
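
The sketch below shows why a mismatched content type breaks the signature. It is a deliberately simplified stand-in and not the real AWS signing algorithm: the sign function, the fields it signs and the secret are all hypothetical. The point is only that any signed field that changes between generating the URL and making the request produces a different signature.

```javascript
const crypto = require('crypto');

// Simplified stand-in for URL signing; NOT the real AWS algorithm.
function sign(secret, request) {
  const stringToSign = [
    request.method,
    request.contentType,
    request.expires,
    `/${request.bucket}/${request.key}`,
  ].join('\n');
  return crypto.createHmac('sha256', secret).update(stringToSign).digest('hex');
}

const secret = 'hypothetical-secret';
const signedFor = {
  method: 'PUT',
  bucket: 'temp-bucket',
  key: 'photo.png',
  contentType: 'image/png',
  expires: 600,
};

const serverSignature = sign(secret, signedFor);
const sameRequest = sign(secret, { ...signedFor });
const wrongType = sign(secret, { ...signedFor, contentType: 'application/octet-stream' });

console.log(serverSignature === sameRequest); // true
console.log(serverSignature === wrongType);   // false
```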

Now in your Angular controller, create the upload function

$scope.upload = function(file) {
  // Get The PreSigned URL
  $http.post('/presigned', { filename: file.name, type: file.type })
    .success(function(resp) {
      // Perform The Push To S3
      $http.put(resp.url, file, {headers: {'Content-Type': file.type}})
        .success(function(resp) {
          //Finally, We're done
          alert('Upload Done!')
        })
        .error(function(resp) {
          alert("An Error Occurred Attaching Your File");
        });
    })
    .error(function(resp) {
      alert("An Error Occurred Attaching Your File");
    });
}

The $scope.upload method here could be broken out into a service or factory to clean things up a little, but you could also just drop this method into your controller and with a few minor tweaks be up and running.
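
As one way of breaking the upload out, the sketch below wraps the two requests in a small factory that takes the HTTP client as a parameter. The names createUploader and fakeHttp are hypothetical, and the sketch assumes a promise based client whose responses expose the body as resp.data, as $http does when used with then. The fake client is only there so the example runs on its own.

```javascript
// Build an upload function around any client with promise-returning post/put.
function createUploader(http) {
  return function upload(file) {
    // Step 1: ask our server for the pre-signed URL.
    return http
      .post('/presigned', { filename: file.name, type: file.type })
      // Step 2: push the file to S3 with the same content type that was signed.
      .then((resp) =>
        http.put(resp.data.url, file, { headers: { 'Content-Type': file.type } })
      );
  };
}

// Stand-in client that records calls instead of touching the network.
const calls = [];
const fakeHttp = {
  post: (url) => {
    calls.push(['POST', url]);
    return Promise.resolve({ data: { url: 'https://example.invalid/signed' } });
  },
  put: (url, body, config) => {
    calls.push(['PUT', url, config.headers['Content-Type']]);
    return Promise.resolve({ status: 200 });
  },
};

const upload = createUploader(fakeHttp);
const done = upload({ name: 'photo.png', type: 'image/png' });
done.then((resp) => console.log(resp.status, calls.length)); // 200 2
```

In an Angular app you could register this with something like a factory that passes $http in, then call the returned upload function from the controller.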

Step 4: Processing The Upload

This step is really going to depend on what you want to do with the file and is going to vary depending on your application server but, generally speaking, now all you need is the path to the file in S3 (which is basically just the bucket name plus the object name) and you can pull the file down, process it and push it back to another location from the server in some sort of background process that doesn’t tie up the front end.

As an example, in Ruby On Rails, using the AWS SDK for Ruby you could pull the file down, transform it and push it back up to another bucket with something like this

# Get The Temporary Upload
s3 = AWS::S3.new
temp_obj = s3.buckets['YOUR_TEMP_BUCKET'].objects[params[:uploaded_file]]

begin
  # Read File Size From S3 For Server Side Validation
  size = temp_obj.content_length

  # Assign A Local Temp File
  local_file = "#{Rails.root}/tmp/#{params[:uploaded_file]}"

  # Read In File From S3 To Local Path
  File.open(local_file, 'wb') do |file|
    temp_obj.read do |chunk|
      file.write(chunk)
    end
  end

  #########################################
  # Perform Some Local Transformation Here
  #########################################

  # Now Write Back The Transformed File
  perm_obj = s3.buckets['YOUR_PERMANENT_BUCKET'].objects[params[:uploaded_file]]
  perm_obj.write(File.open(local_file))
rescue StandardError => exception
  # That's A Fail
ensure
  # Delete The Original File
  temp_obj.delete
end

Summary

We’ve seen now how we can upload files directly to AWS S3 using JavaScript. It may seem like a lot of work from the first few steps in this article, but they are necessary in order to prevent people abusing your S3 bucket and your app.

I’ve been using this technique for a while now and it’s been pretty solid. It certainly solved my issue with Heroku H12 timeouts, which were causing me endless headaches.

Have any suggestions how to improve on this technique? Let me know in the comments section below

Uploading To S3 With AngularJS

Cheyne — Fri, 15 Aug 2014 04:45:58 +0000

A little while back I found myself needing to handle file uploads without touching the application server. This is a common scenario for people using Heroku as they limit all requests to 30 seconds, which in theory sounds fine for most requests, but destroys any chance you have at directly handling file uploads within your application.

I restricted my allowed attachment size down to 2MB but still there were people in more remote parts of the world who were hitting that 30 second timeout. Heroku’s response is to simply not upload to the app server and instead go direct to something like AWS S3.

This got me thinking – can I upload a large file to Amazon S3 using the AWS-JS-SDK with JavaScript and work it into my existing AngularJS application?

As it turns out, yes you can and just in case you want to jump ahead and skip the blog post, I’ve put up a sample application to demonstrate it along with the full source code here;

See A Live Demo Here

http://cheynewallace.github.io/angular-s3-upload/

Full Source Code And Sample Project Here

https://github.com/cheynewallace/angular-s3-upload

UPDATE – July 14th 2016

This has been a very popular article for several years now; however, many people have asked how to avoid using the restricted public keys method shown in this article. I have created a new post based on this original one that explains how to use Pre-Signed URLs instead.

If you are creating a new public web application you should use this new post as a guide and consider this original one deprecated.

Please see
Uploading To S3 With AngularJS and Pre-Signed URLs

Scenario

The scenario we’re going to build for here will be to upload a file (of any size) directly to AWS S3 into a temporary bucket that we will access using a restricted and public IAM account.
The purpose of this front end application will be to get files into AWS S3, using only JavaScript libraries from our browser.
We can then kick off a background process with our application server to process these uploaded files at a later time.

A good example of this is having a user upload a file from a web form for which your application server will then pull back down, encrypt or resize before pushing it back into a more permanent bucket for storage.

Step 1: Add The AWS JS SDK To Your Project

This is Amazon’s JavaScript SDK and it won’t take long before you notice the file size of this library. It’s pretty obvious that this is more of a backend NodeJS library than a browser JS library; even though they have a “browser” version, it still weighs in at about 230KB.

The easiest way to get this library is to simply “bower install aws-sdk-js” and you’ll get the latest “browser” version.

If you’re already feeling the weight from too many libraries on your app, you can clone the NodeJS repo and compile it yourself, which I found saved me about 50KB in file size.

More info can be found on the AWS Docs under Compiling The AWS SDK

You can compile and minify just the S3 component of the SDK from the NodeJS modules using the following command;

MINIFY=1 node dist-tools/browser-builder.js s3 > aws-sdk.min.js

You can also just use my compiled custom version found here

Step 2: Add the file directive

The file directive simply takes the attributes from a file input type and binds it to the $scope.file object so you can easily work with the filename, file size etc from your controllers.

As soon as you select a file, you can access $scope.file.name or $scope.file.size to get the filename and size for handling client side validation and unique S3 object names.

This means you can validate the file size from the browser with something simple like;

if($scope.file.size > 10485760) { // 10MB = 10 * 1024 * 1024 bytes
  alert('Sorry, file size must be under 10MB');
  return false;
}
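
The same check can be pulled into small reusable helpers. The sketch below is a minimal example rather than code from the sample application: the names MAX_UPLOAD_BYTES, isFileSizeValid and uniqueObjectName are made up for illustration, and the timestamp-plus-random prefix is just one assumed way of keeping S3 object names from colliding.

```javascript
// Hypothetical helpers (names are illustrative, not from the sample app).
var MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB expressed in bytes

// True when a file is selected and is no larger than maxBytes.
function isFileSizeValid(file, maxBytes) {
  return Boolean(file) && typeof file.size === 'number' && file.size <= maxBytes;
}

// Prefix the original filename with a timestamp and a short random string
// so two users uploading "photo.png" do not overwrite each other.
function uniqueObjectName(filename) {
  var stamp = Date.now().toString(36);
  var rand = Math.random().toString(36).slice(2, 8);
  return stamp + '-' + rand + '-' + filename;
}
```

In a controller you could call isFileSizeValid($scope.file, MAX_UPLOAD_BYTES) before uploading and use uniqueObjectName($scope.file.name) as the object name.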

Go ahead and include the following directive in your project;

directives.directive('file', function() {
  return {
    restrict: 'AE',
    scope: {
      file: '@'
    },
    link: function(scope, el, attrs){
      el.bind('change', function(event){
        var files = event.target.files;
        var file = files[0];
        scope.file = file;
        scope.$parent.file = file;
        scope.$apply();
      });
    }
  };
});

Step 3: Setup The AWS Credentials

You’re probably wondering how to lock down this new upload functionality considering it’s all JavaScript. There are a few ways you can do this, those being by use of Pre-Signed URLs or by creating a “public” IAM account.

The use of Pre-Signed URLs involves making a call to your application server first and retrieving a “pre signed” location for the upload, which your server and AWS negotiate on the fly. To keep things simple, we’re not going to use this method today.

The second method, which is the one we’re going to use today, is a public IAM account. By that I mean a regular IAM account that is heavily restricted to do only one particular function. In this case, it will only have permission to PUT files into a particular AWS bucket and nothing else.

This user’s API key will be public, so anyone will be able to upload to your bucket if they use this key. That is why we will want to configure the bucket to expire all objects within 24 hours, so even if someone did try and upload a 10 Gigabyte file to screw with you, it would only sit there for a few hours. We will also configure CORS, which will stop browsers uploading content from pages on any site other than yours (more on that later).

Once you upload a file to this temporary bucket from your application, you will want to ping your application server with details of the new file and move it into a new permanent bucket. It’s right here where you will be able to perform any transformations, encryption, resizing or processing.

Create The User
Go into your AWS console and visit the “Security Credentials” section. Create a new user and call it something like “app_public”. Make sure you download the key information when it is presented, as this is what we’ll be feeding into our app later to upload with.

Under the permissions section, click “attach a new policy“, then select the policy generator.
Select Amazon S3 as the service and only select the PutObject action from the drop down list.

The ARN is an Amazon Resource Name. Because PutObject acts on the objects inside the bucket rather than on the bucket itself, the ARN needs to end with /* and is going to look like;

arn:aws:s3:::your_bucket_name/*

Click “add statement”, then save and apply policy. Now your user has write-only access to the bucket.

Your policy is going to look something like this;

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt126637111000",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your_bucket_name/*"
      ]
    }
  ]
}
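
If you look after several environments, it can help to generate this policy instead of hand-editing it. The following is a minimal sketch under stated assumptions: buildPutOnlyPolicy is a hypothetical helper, and it scopes the statement to the objects inside the bucket (the bucket ARN followed by /*) because s3:PutObject is an object-level action.

```javascript
// Hypothetical helper: returns a PUT-only policy document for one bucket.
function buildPutOnlyPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['s3:PutObject'],                        // nothing but uploads
        Resource: ['arn:aws:s3:::' + bucketName + '/*']  // objects in the bucket
      }
    ]
  };
}

// Print the JSON you would paste into the policy editor.
console.log(JSON.stringify(buildPutOnlyPolicy('your_bucket_name'), null, 2));
```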

Step 4: Configure CORS And Expiry On The Bucket

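You can use the following sample config. Just edit it to reflect your development, production and staging environments.

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>http://localhost:3000</AllowedOrigin>
        <AllowedOrigin>https://www.yourdomain.com</AllowedOrigin>
        <AllowedOrigin>http://staging.yourdomain.com</AllowedOrigin>
        <AllowedMethod>PUT</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <ExposeHeader>x-amz-server-side-encryption</ExposeHeader>
        <ExposeHeader>x-amz-request-id</ExposeHeader>
        <ExposeHeader>x-amz-id-2</ExposeHeader>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>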

It’s a good idea to split these into other buckets, but for simplicity we’ll just use the one bucket.
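
The expiry half of this step can be expressed as a lifecycle rule on the temporary bucket. As a rough sketch, the helper below builds a one-day expiry rule in the shape the S3 lifecycle configuration uses. buildExpiryParams and the rule ID are made-up names, and applying the result with an administrator’s credentials (for example through the SDK’s putBucketLifecycleConfiguration call, or by entering the same rule in the console) is an assumption about your setup, not something the sample application does.

```javascript
// Hypothetical helper: builds the parameters for a one-day expiry rule.
function buildExpiryParams(bucketName) {
  return {
    Bucket: bucketName,
    LifecycleConfiguration: {
      Rules: [
        {
          ID: 'expire-temporary-uploads', // illustrative rule name
          Status: 'Enabled',
          Filter: { Prefix: '' },         // empty prefix matches every object
          Expiration: { Days: 1 }         // remove objects one day after upload
        }
      ]
    }
  };
}
```

The restricted upload user should not be the one applying this rule, since its policy only allows PutObject.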

Now you’re ready to lay down some code.

Step 5: Write Some Angular

It’s worth mentioning again that there is a complete sample application available here that you can pull down and play around with or a live demo application here that you can try out. Simply provide the bucket name, access key and secret access key in the form to upload.

So, if we’ve configured everything mentioned above correctly, we’re ready to see this in action. The following snippet is a simplified version of the sample application I mentioned earlier. It does the basic operations we need to upload a file to S3, which includes:

Configure the AWS S3 credentials and bucket object
Check to make sure a file is selected
PUT the object into the S3 bucket whilst displaying progress information to the console
Alert with any errors or config issues.

Here’s what the method in your controller is going to look like

$scope.creds = {
  bucket: 'your_bucket',
  access_key: 'your_access_key',
  secret_key: 'your_secret_key'
}

$scope.upload = function() {
  // Configure The S3 Object 
  AWS.config.update({ accessKeyId: $scope.creds.access_key, secretAccessKey: $scope.creds.secret_key });
  AWS.config.region = 'us-east-1';
  var bucket = new AWS.S3({ params: { Bucket: $scope.creds.bucket } });

  if($scope.file) {
    var params = { Key: $scope.file.name, ContentType: $scope.file.type, Body: $scope.file, ServerSideEncryption: 'AES256' };

    bucket.putObject(params, function(err, data) {
      if(err) {
        // There Was An Error With Your S3 Config
        alert(err.message);
        return false;
      }
      else {
        // Success!
        alert('Upload Done');
      }
    })
    .on('httpUploadProgress',function(progress) {
          // Log Progress Information
          console.log(Math.round(progress.loaded / progress.total * 100) + '% done');
        });
  }
  else {
    // No File Selected
    alert('No File Selected');
  }
}

and the file input element using the file directive, which is a plain file input carrying the directive as an attribute:

<input type="file" file>

The file is sent to S3 in chunks rather than all at once.
The ‘httpUploadProgress’ event fires as each chunk finishes uploading, which is where we can update our progress bars or percentage counters for a more aesthetic UI / UX.
In the snippet above we’re simply logging this to the console, but in the sample application I mentioned earlier I’ve used a bootstrap progress bar to indicate the overall progress.
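
If you want to feed a progress bar instead of the console, the calculation can live in a small helper. This is only a sketch: percentDone is a hypothetical name, and it assumes the progress object carries the same loaded and total fields used in the snippet above.

```javascript
// Hypothetical helper: turns an httpUploadProgress payload into 0-100.
function percentDone(progress) {
  if (!progress || !progress.total) {
    return 0; // total is unknown or zero, so avoid dividing by it
  }
  var pct = Math.round(progress.loaded / progress.total * 100);
  return Math.min(100, Math.max(0, pct)); // clamp to a sane range
}
```

Inside the event handler you would assign the result to a scope variable and call $scope.$apply(), since the event fires outside of Angular’s digest cycle.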

Adding Folders
If you want to arrange the uploads into folders, you can do this by simply adding the folder name to the end of the bucket name. So for example, setting $scope.creds.bucket to “MYBUCKET/user1” would upload the file into the MYBUCKET bucket under the folder user1.

putObject Configuration
The putObject params object can hold a lot more configuration than what is shown in this example. Setting content expiry, content type and ACL information are just a few examples of what can be done by adding attributes to the params object. You can read more about configuring these here

A more comprehensive version of the above code, including file size validation, unique file names and proper notifications can be seen here: https://github.com/cheynewallace/angular-s3-upload/blob/master/js/controllers.js

Step 6: Processing The Upload

I won’t expand on this too much as this article was more focused on uploading using purely JavaScript, but you can see how we’re able to POST details of this file’s new location in S3 fairly easily to the server.

$scope.s3_path = $scope.creds.bucket + '/' + $scope.file.name;
// mybucket/document.pdf
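
The details sent to the server can be sketched as a small payload. The helper name buildUploadNotification and the extra size and content_type fields are assumptions for illustration; the uploaded_file field is named to line up with the params[:uploaded_file] value the Rails example reads.

```javascript
// Hypothetical helper: describes the finished upload for the application server.
function buildUploadNotification(bucket, file) {
  return {
    uploaded_file: file.name,          // read as params[:uploaded_file] server side
    s3_path: bucket + '/' + file.name, // same value as $scope.s3_path
    size: file.size,
    content_type: file.type
  };
}
```

From the controller, this object could be sent with something like Angular’s $http.post to whichever endpoint your application exposes for processing uploads.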

As an example, in Ruby On Rails, using the AWS SDK for Ruby you could pull the file down, transform it and push it back up to another bucket with something like this

# Get The Temporary Upload
s3 = AWS::S3.new
temp_obj = s3.buckets['YOUR_TEMP_BUCKET'].objects[params[:uploaded_file]]

begin
  # Read File Size From S3 For Server Side Validation
  size = temp_obj.content_length

  # Assign A Local Temp File
  local_file = "#{Rails.root}/tmp/#{params[:uploaded_file]}"

  # Read In File From S3 To Local Path
  File.open(local_file, 'wb') do |file|
  	temp_obj.read do |chunk|
  		file.write(chunk)
  	end
  end

  #########################################
  # Perform Some Local Transformation Here
  #########################################

  # Now Write Back The Transformed File
  perm_obj = s3.buckets['YOUR_PERMANENT_BUCKET'].objects[params[:uploaded_file]]
  perm_obj.write(File.open(local_file))

rescue StandardError => exception
  # That's A Fail
ensure
  # Delete The Original File
  temp_obj.delete
end

Summary

We’ve now seen how we can upload files directly to AWS S3 using only JavaScript. It may seem like a lot of work from the first few steps in this article, but they are necessary in order to prevent people abusing your S3 bucket and your app, so I would avoid creating an open bucket or being lazy with the IAM policy.

I’ve been using this technique for a while now and it’s been pretty solid. It certainly solved my issue with Heroku H12 timeouts, which were causing me endless headaches.

Have any suggestions on how to improve this technique? Let me know in the comments section below.

Simple Element Toggling With AngularJS

Cheyne — Tue, 10 Jun 2014 01:24:33 +0000

Toggling UI elements is a chore that nobody enjoys. Back when your web apps were primarily jQuery or raw JavaScript you would need to wire up an event handler, catch the click event, determine the current visibility state and switch it. A fairly small job, but still an annoying one when you have a page full of these elements.

With Angular you can accomplish this without writing any code, simply by using two HTML attributes.

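For example, say I want to hide an input box, and have it appear when I click a button, then hide when I click it again.

<button ng-click="showEmailInput = !showEmailInput">Toggle Input</button>
<input type="text" ng-show="showEmailInput">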

And that’s all there is to it.

The text input is only being shown when the $scope.showEmailInput variable evaluates to true, which of course it doesn’t because it doesn’t exist yet, so it is hidden.

Once you click the Toggle Email Input button it simply sets the $scope.showEmailInput variable to be the inverse of what it currently evaluates to which means it will now become true and the variable will actually exist on the scope now.

This is a super clean way to toggle the visibility of elements without requiring any plumbing code, events or variables to be manually configured on your scope.
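
To see why the variable does not need to exist up front, here is a plain JavaScript stand-in for what the toggle expression does to the scope. It is only an illustration: the scope object and toggle function are hypothetical and no Angular is involved.

```javascript
// Plain-JavaScript stand-in for what the toggle expression does to the scope.
var scope = {};                                // showEmailInput does not exist yet

function toggle(target, name) {
  target[name] = !target[name];                // !undefined is true on the first click
  return target[name];
}

console.log(Boolean(scope.showEmailInput));    // false: input hidden
console.log(toggle(scope, 'showEmailInput'));  // true: input shown
console.log(toggle(scope, 'showEmailInput'));  // false: input hidden again
```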

Resend Devise Confirmation Emails For Incomplete Accounts

Cheyne — Fri, 20 Dec 2013 00:37:45 +0000

It’s slightly frustrating looking through your users table and seeing a bunch of accounts that signed up but never confirmed, only existing in this limbo land where you can’t quite count them as a conversion and they seem to have dropped off the face of the earth.

This rake task will look through your users table for unconfirmed devise accounts and one by one resend their confirmation emails.

You can schedule this to run once a day and attempt to reclaim lost users. Here’s how

Create a file named “user.rake” in your Rails project under lib/tasks

Paste the following in

namespace :user do
  task :resend_confirmation => :environment do
    users = User.where('confirmation_token IS NOT NULL')
    users.each do |user|
      user.send_confirmation_instructions
    end
  end
end

Now from the terminal, from within your Rails app path, simply run

rake user:resend_confirmation

or if you’re a Heroku user

heroku run rake user:resend_confirmation

Heroku offers a free scheduler which you can use to schedule the emails to be resent daily. Simply use the add-on found here: https://addons.heroku.com/scheduler

Using Gravatar With AngularJS

Cheyne — Fri, 06 Dec 2013 22:16:45 +0000

Want to use Gravatar in your AngularJS app?
Use this simple directive to insert Gravatar images in your app.

Before you get started, you will need to make sure you have an MD5 hashed version of your user’s email address, trimmed and lower-cased before hashing. Gravatar uses this hashed version of the email address in order to determine which avatar to display.

It’s simple in any back end language to generate this hash on page load, or send it down with your JSON model.

In Ruby

Digest::MD5.hexdigest(email_address.strip.downcase)

In PHP

md5(strtolower(trim($email_address)))
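
In Node.js

The same hash can be produced with Node’s built-in crypto module. This is a minimal sketch: gravatarHash and gravatarUrl are hypothetical helper names, and the URL format (hash in the path, s for size, d for the default image) follows Gravatar’s image request conventions.

```javascript
var crypto = require('crypto');

// Gravatar expects the MD5 of the trimmed, lower-cased email address.
function gravatarHash(email) {
  var normalised = String(email).trim().toLowerCase();
  return crypto.createHash('md5').update(normalised).digest('hex');
}

// Hypothetical helper: builds an avatar URL with a size and a fallback image.
function gravatarUrl(email, size, defaultImage) {
  return 'https://www.gravatar.com/avatar/' + gravatarHash(email) +
    '?s=' + size + '&d=' + encodeURIComponent(defaultImage);
}
```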

Now, include this directive in your AngularJS application

 myApp.directive('gravatar', function() {
  return {
    restrict: 'AE',
    replace: true,
    scope: {
      name: '@',
      height: '@',
      width: '@',
      emailHash: '@'
    },
    link: function(scope, el, attr) {
     scope.defaultImage = 'https://somedomain.com/images/avatar.png';
    },
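    template: '<img alt="{{ name }}" height="{{ height }}" width="{{ width }}" ng-src="https://www.gravatar.com/avatar/{{ emailHash }}?s={{ width }}&d={{ defaultImage }}">'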
  };
 });

You can change the “defaultImage” property to link to a default avatar to display when the user has no Gravatar.

Then simply drop the gravatar tag into your HTML, passing the name, height, width and email-hash attributes.

Serving Compressed Assets With Heroku and Rack-Zippy

Cheyne — Sun, 29 Sep 2013 20:38:50 +0000

It’s a little known fact that Heroku does NOT, by default, serve the compressed version of your assets to the client browser.

Often it’s all too easy to get lost in the magic that is Heroku slug compilation and the simplicity of a “git push heroku master” code deploy that we forget the basics of web development and end up serving our clients bloated CSS and JavaScript files, often at ridiculous sizes.

This is especially prominent of late with the rise of the “do it all” frameworks like Bootstrap, Foundation, AngularJS, Backbone JS etc.

Chances are, if you use any of these frameworks, you’re sending hundreds of kilobytes of code down to the browser on every request that is never even used, forcing your users to download the lot each time.

Heroku states clearly in their HTTP Routing docs:

Since requests to Cedar apps are made directly to the application server – not proxied through an HTTP server like nginx – any compression of responses must be done within your application.

So, let’s see what that looks like to the end user.

File: application.css
Size: 70KB
Time: 595ms

Here’s a typical application.css from a Rails app coming down from Heroku. We can see that the file was 70KB coming down the wire and took a total of 595ms. (The top figure is the one we’re interested in)

The fact that the two numbers regarding the file size are almost the same means that there is no compression or caching happening.

The top number represents the size of the file downloaded, and the bottom represents the actual size of the file. In this case the file is 70KB and the client had to download the full 70KB.
Consider how big the standard Bootstrap CSS framework is, together with an accompanying JavaScript MVC framework bundled with your own custom CSS and JS and you can begin to see how this can get out of control.

If we browse the public directory on one of our Heroku dynos we can see that the compressed versions of our assets are there already, pre-generated by the assets pre-compilation on your last deploy. They’re simply not being served to the browser.

Rack-Zippy

Fortunately, serving these compressed assets is pretty simple with the help of a gem called Rack-Zippy. https://github.com/eliotsykes/rack-zippy

Rack-Zippy is Rack middleware that serves up these compressed assets instead of the full uncompressed version, dramatically improving client download times.

To quote from the GitHub gem page:

rack-zippy replaces the ActionDispatch::Static middleware used by Rails, which is not capable of serving the gzipped assets created by the rake assets:precompile task. rack-zippy will serve non-gzipped assets where they are not available or not supported by the requesting client.

Installation is simple.

Add to your Gemfile

gem 'rack-zippy'

Then from the command line

bundle install

Add this line to config/application.rb

config.middleware.swap(ActionDispatch::Static, Rack::Zippy::AssetServer)

Push to Heroku and you’re done.
Now let’s take another look at the file sizes

File: application.css
Size: 12.3KB
Time: 180ms

That’s a pretty drastic improvement, from 70KB down to 12.3KB, and this is only taking into account a single CSS file. Your JavaScript will show similar improvements.

So, before you start obsessing over excess code or removing images that enhance the appearance of your website in an attempt to improve client response times, I would first ensure you’re serving up compressed assets. It’s probably the easiest thing you can do that yields a performance gain this high.
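
If you want a rough idea of what gzip will do for one of your own files before deploying, you can measure it locally. The sketch below uses Node’s built-in zlib module; compressionReport is a hypothetical helper and the sample stylesheet is made up, so the numbers it prints say nothing about your real assets.

```javascript
var zlib = require('zlib');

// Returns the raw and gzipped byte counts for a string of CSS or JavaScript.
function compressionReport(source) {
  var raw = Buffer.from(source, 'utf8');
  var gzipped = zlib.gzipSync(raw);
  return {
    rawBytes: raw.length,
    gzippedBytes: gzipped.length,
    savedPercent: Math.round((1 - gzipped.length / raw.length) * 100)
  };
}

// Made-up, repetitive stylesheet standing in for a real application.css.
var sampleCss = '.btn-primary { color: #fff; background: #337ab7; }\n'.repeat(500);
console.log(compressionReport(sampleCss));
```

Swap sampleCss for the contents of a real file (read with fs.readFileSync) to compare your own before and after sizes.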

Other Options
Another option for serving compressed assets is to modify your config.ru file to use Rack::Deflater.
I have not tested this method, but I hear it’s also another viable option.
More info can be found here: http://www.gaurishsharma.com/2012/04/enable-gzip-compression-for-rails-3-2-on-heroku-cedar.html

Get Active Ports and Associated Process Names In C#

Cheyne — Sun, 28 Jul 2013 18:57:15 +0000

Recently I found myself needing to find a way to determine what open and listening ports along with their associated running processes are currently active on a Windows machine using C#.

Some extensive Googling returned pretty dismal results.

Retrieving a list of open ports is simple.
Retrieving a list of running processes is simple.
A combination of the two is not.
There were a few external libraries, but it just seemed like overkill for something that should be a pretty simple task.

The solution was to just parse the output of a netstat -a -n -o command. This code snippet returns a list of “Port” objects which contain the process name, port number and protocol as properties.

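// ===============================================
// The Method That Parses The NetStat Output
// And Returns A List Of Port Objects
// Requires: System, System.Collections.Generic, System.Diagnostics,
//           System.IO, System.Text.RegularExpressions
// ===============================================
public static List<Port> GetNetStatPorts()
{
  var Ports = new List<Port>();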
 
  try {
    using (Process p = new Process()) {
 
      ProcessStartInfo ps = new ProcessStartInfo();
      ps.Arguments = "-a -n -o";
      ps.FileName = "netstat.exe";
      ps.UseShellExecute = false;
      ps.WindowStyle = ProcessWindowStyle.Hidden;
      ps.RedirectStandardInput = true;
      ps.RedirectStandardOutput = true;
      ps.RedirectStandardError = true;
 
      p.StartInfo = ps;
      p.Start();
 
      StreamReader stdOutput = p.StandardOutput;
      StreamReader stdError = p.StandardError;
 
      string content = stdOutput.ReadToEnd() + stdError.ReadToEnd();
      // Wait For netstat To Finish So ExitCode Is Available
      p.WaitForExit();
      string exitStatus = p.ExitCode.ToString();
      
      if (exitStatus != "0") {
        // Command Errored. Handle Here If Need Be
      }
 
      //Get The Rows
      string[] rows = Regex.Split(content, "\r\n");
      foreach (string row in rows) {
        //Split it baby
        string[] tokens = Regex.Split(row, "\\s+");
        if (tokens.Length > 4 && (tokens[1].Equals("UDP") || tokens[1].Equals("TCP"))) {
          string localAddress = Regex.Replace(tokens[2], @"\[(.*?)\]", "1.1.1.1");
          Ports.Add(new Port {
            protocol = localAddress.Contains("1.1.1.1") ? String.Format("{0}v6",tokens[1]) : String.Format("{0}v4",tokens[1]),
            port_number = localAddress.Split(':')[1],
            // PIDs can exceed the Int16 range, so convert to Int32
            process_name = tokens[1] == "UDP" ? LookupProcess(Convert.ToInt32(tokens[4])) : LookupProcess(Convert.ToInt32(tokens[5]))
          });
        }
      }
    }
  } 
  catch (Exception ex) 
  { 
    Console.WriteLine(ex.Message);
  }
  return Ports;
}
 
public static string LookupProcess(int pid) 
{
  string procName;
  try { procName = Process.GetProcessById(pid).ProcessName; } 
  catch (Exception) { procName = "-";}
  return procName;
}
 
// ===============================================
// The Port Class We're Going To Create A List Of
// ===============================================
public class Port
{
  public string name
  {
    get
    {
      return string.Format("{0} ({1} port {2})",this.process_name, this.protocol, this.port_number);
    }
    set { }
  }
  public string port_number { get; set; }
  public string process_name { get; set; }
  public string protocol { get; set; }
}
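
The token positions the parser relies on are easier to see in isolation. The sketch below replays the same row-splitting idea in JavaScript, purely as an illustration: parseNetstatRow is a hypothetical name, the sample rows are made up, not captured from a real machine, and it detects IPv6 by the leading bracket instead of the placeholder replacement used in the C# above.

```javascript
// Parses one row of `netstat -a -n -o` output; returns null for header rows.
function parseNetstatRow(row) {
  var tokens = row.split(/\s+/);            // leading whitespace makes tokens[0] ''
  if (tokens.length <= 4 || (tokens[1] !== 'TCP' && tokens[1] !== 'UDP')) {
    return null;
  }
  var local = tokens[2];
  var isV6 = local.charAt(0) === '[';       // IPv6 addresses are wrapped in [ ]
  return {
    protocol: tokens[1] + (isV6 ? 'v6' : 'v4'),
    port: local.slice(local.lastIndexOf(':') + 1),
    // UDP rows have no State column, so the PID sits one token earlier.
    pid: parseInt(tokens[1] === 'UDP' ? tokens[4] : tokens[5], 10)
  };
}

// Illustrative rows, not captured from a real machine.
console.log(parseNetstatRow('  TCP    0.0.0.0:135    0.0.0.0:0    LISTENING    888'));
console.log(parseNetstatRow('  UDP    [::]:5353      *:*                       1234'));
```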

Heroku Postgres Row Limit Email Notifications

Cheyne — Tue, 14 May 2013 16:27:50 +0000

So, you’ve got a side project hosted with Heroku. You only have basic database requirements, so the 10,000 row limited Postgres dev plan seems like a good choice and could possibly be a free alternative for a decent amount of time until you need to upgrade.

Heck, at $35 a month per dyno, you need to try and save some money somewhere, right?

Take for example one of my side projects, NoteShred.com. It allows you to send password protected, encrypted notes to people over the internet with a unique URL and have the note automatically destroy itself after it’s been read.
The beauty of this application is that it is constantly deleting rows from the database every time a note is “shredded“, so a 10,000 row count is actually a decent amount of space for this application.

The problem is that 10,000 isn’t a huge number, and all it takes is one big spike of traffic and you’ve hit the limit.
Heroku provides a basic alerting service that will tell you once you’re at 7,000 records and again once you’ve hit the limit. You will have 24 hours to get your row count back under the limit before you will lose write access to your database. (Thanks to @hgmnz and @ctshryock for pointing this out)
More Info Here.

Unfortunately, this is not configurable and the cut off may come at a bad time, say if you’re on holidays or are not able to dive in and trim back the records before this 24 hour window expires.

The 7,000 row alert is nice, however it is not terribly useful for applications like NoteShred where you have a slow creeping database and the difference between 7,000 records and 10,000 records is a large gap in time.
I am most likely going to disregard the 7,000 row alert email because I still have 3,000 left, the next email would be informing me that I have hit the limit and have 24 hours to get it back under control.

A configurable threshold with a series of alerts leading up to the cut off point would be more useful, or even just an alert closer to the limit, say at 9000 rows.

If you want to keep an eye on this row count, you can do so using the Rails built-in Mailer and a basic Rake task. With this simple solution you will be alerted when your application reaches a threshold nearing that 10,000 mark. You can easily customize this rake task to alert you at times when you feel it’s more useful than the standard 7,000 and 10,000 alerts.

In a nutshell, this will:

Check your row count every hour
Email you if you’re over the threshold

Step 1: Create A Rake Task

In your Rails application, under the folder /lib/tasks/ create the file report.rake

Enter the following

namespace :report do
  task :heroku_row_report => :environment do
    #Send Warning Email If Over The 9000 Threshold
    warning_threshold = 9000
    query = 'select sum(n_live_tup) as records from pg_stat_user_tables'
    record_count = ActiveRecord::Base.connection.execute(query)[0]['records'].to_i
    if record_count > warning_threshold
      puts "Uh Oh .. We're Up To #{record_count} Records"
      Mailer.heroku_row_report(record_count.to_s).deliver
    else
      puts "We Cool, Only #{record_count} Records"
    end
  end
end

Simply change the

warning_threshold = 9000

part to whatever number you want to be alerted at.

Step 2: Create The Mailer Action and View

You will need an action in your Mailer that will send you the email if the rake task discovers your row count is over the threshold.
Add the following to your Mailer class (/app/mailers/mailer.rb)

def heroku_row_report(rows)
  @rows = rows
  mail :to => "your_email@domain.com", :subject => "Heroku Row Report"
end

Now add the view (/app/views/mailer/heroku_row_report.html.erb)

Heroku Row Count Warning
Your Heroku Row Count Is At: <%= @rows %>

Obviously you can substitute the above mailer action and view with a custom one of your choosing.

Step 3: Test The Rake Task

At this point, let’s test the rake task.
Open up your console and change into your Rails application directory, then run:

heroku run rake report:heroku_row_report

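You should see one of the rake task’s two messages printed to the screen: the “We Cool” line with your current record count if you are under the threshold, or the “Uh Oh” line if you are over it.

Remember, you will only be emailed if the row count exceeds the threshold you set earlier, so in my case, where the count was still under the threshold, I would not be emailed.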

If you wanted to get an email all the time, a simple way to do this is to change the threshold to 0.

Step 4: Schedule The Rake Task

You’ll want this rake task to run every hour or so. This is simple to do with the free Heroku Scheduler found here: https://addons.heroku.com/scheduler.

Simply add the Heroku Scheduler add-on to your project, jump into the scheduler dashboard, create a new job and enter

rake report:heroku_row_report

as the task to run.

Set the job to run every 1 hour, daily, and you’re done.