Jeff Beard

Ubuntu Landscape and MotD integration kills Gitlab SSH performance

2016-06-19T17:03:08Z

I had nothing to do with this discovery but my colleague Lance Johnston, who did, felt that we should share it because of a lack of information about the issue on the Internet.

I lead a team at work that, among other things, manages our version control system. We started to have some performance issues with our Gitlab instance as usage increased, and when they started to impact users we decided to restart the Gitlab services first.

We observed a pretty high load when we checked before stopping the services, around 10-12, which we expected to go down when we shut down the services. However, when the services were off the load did not go down, which was very curious.

Lance investigated, and as he watched ‘top’ he observed batches of inbound ssh connections, as one would expect. But each time the connections arrived, he immediately saw another batch of processes named ‘landscape-sysinfo’.

A little digging turned up some information indicating that whenever a shell is spawned, such as when there’s an ssh connection, the Message of the Day is presented. The MotD runs the ‘landscape-sysinfo’ program in order to collect metrics that are presented to users when they log in. We have literally hundreds of ssh connections at any given time as Jenkins and developers do their jobs, so this program was producing a consistently high load average.

Since the vast majority of ssh connections are not interactive, we disabled the Message of the Day and the load dropped immediately to .01, with the Gitlab services off. When they were turned back on we stabilized around .7 and during the work day it doesn’t go over 5 during usage spikes.
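
For reference, one way to do this on Ubuntu is to take the execute bit off the landscape script in /etc/update-motd.d/, which is where pam_motd looks for the scripts that build the dynamic MotD. This is a minimal sketch under that assumption, not necessarily the exact change we made; the function name and the directory argument are hypothetical and only there so it can be tried on a copy first.

```shell
#!/bin/sh
# Sketch: stop the MotD from running landscape-sysinfo on every login.
# ASSUMPTION: the MotD scripts live in /etc/update-motd.d and the landscape
# one has "landscape" in its file name (e.g. 50-landscape-sysinfo).
disable_landscape_motd() {
    motd_dir="${1:-/etc/update-motd.d}"
    for script in "$motd_dir"/*landscape*; do
        [ -e "$script" ] || continue   # the glob matched nothing
        chmod a-x "$script"            # non-executable scripts are skipped
    done
}
```

Run as root against the real directory, this would be `disable_landscape_motd /etc/update-motd.d`. Restoring the execute bit turns the system information back on.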

Store Time Machine Backups on an Ubuntu Server

2014-04-20T22:31:48Z

I found this concise article (author’s claim verified) on setting up Mac OS X Time Machine backups on a network drive. I tried using SMB/CIFS to no avail but setting up a Netatalk share did the trick!

Note that I did not modify the Avahi configuration since it wasn’t necessary to make the share usable for backups.

Processing files from S3 with Cascading

2013-08-10T19:05:23Z

Cascading is a Hadoop ecosystem framework that provides a higher level abstraction over MapReduce. I recently worked on a Cascading prototype that would read log files from an Amazon Web Services S3 bucket, do a minor transform, land the output in HDFS then move the files to another S3 bucket configured for archiving.

Figuring out how to get Cascading to read a stream of data from S3 turned out to be a bit tricky since neither the documentation nor the example applications brought all the pieces together explicitly, so this article captures what I’ve learned.

The first thing to understand is that Cascading’s S3 support is really just an extension of Hadoop support for S3.

Secondly, as noted in the Hadoop S3 wiki page, there are two of what I’ll call “formats” that Hadoop supports: HDFS file types and what is called “native” files. The former is a file in the same format as it would be stored in HDFS and the latter is what can be thought of as a “plain old file”. In the prototypes I’ve worked on I needed to access native files which were gzipped, delimited files (Hadoop can process gzipped files natively and Cascading offers an extension that supports zip files too).

In the end the tricky part was finding the properties and understanding the two different URI schemes. After that, it turned out to be very simple to stream files from S3 into a Cascading job. Just set up a Tap with an S3 URI:


import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

// ...
public class Main {

    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        // AWS credentials are passed on the command line
        String accessKey = args[0];
        String secretKey = args[1];

        // Credentials for the "native" S3 file system (s3n:// URIs)
        properties.setProperty("fs.s3n.awsAccessKeyId", accessKey);
        properties.setProperty("fs.s3n.awsSecretAccessKey", secretKey);

        properties.setProperty("fs.defaultFS", "hdfs://localhost:8020/");
        properties.setProperty("fs.permissions.umask-mode", "007");

        AppProps.setApplicationJarClass( properties, Main.class );

        HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

        // Source: a gzipped, tab delimited "native" file in S3
        String input = "s3n://my-bucket/my-log.gz";

        Tap inTap = new Hfs( new TextDelimited( false, "\t" ), input );

        Pipe copyPipe = new Pipe( "copy" );
        // Sink: comma separated output under /tmp/output on the default HDFS
        Tap outTap = new Hfs( new TextDelimited( false, "," ), "hdfs://localhost:8020/tmp/output" );

        FlowDef flowDef = FlowDef.flowDef()
            .addSource( copyPipe, inTap )
            .addTailSink( copyPipe, outTap );

        flowConnector.connect( flowDef ).complete();
    }
}

The code above would stream a tab delimited file directly from S3 and output it to the HDFS folder /tmp/output as a comma separated file.

I should also note that this code can be run on Elastic MapReduce as well, so the data never has to leave the cloud.

Netalyzr – Network debugging tool

2012-01-01T17:51:29Z

I’ve had a transient issue with my Internet access randomly “going away”. It’s annoying but generally clears up within a minute or two. I came across a tool called Netalyzr by a group within UC Berkeley. Netalyzr is a Java application available as either an in-browser Applet or a command line utility. It runs a number of network connectivity tests and provides a detailed report hosted on their web site that uses a simple red/yellow/green motif to show problems and their relative importance.

While Netalyzr didn’t clearly show what was going on with my Internet connection it did raise a red flag about network buffers that might be the issue. Unfortunately, that’s a router configuration issue on the part of my ISP so I’m not hopeful for a resolution. But I can always gather data then open a trouble ticket with the vendor.

Regardless, Netalyzr looks like a great tool for troubleshooting connectivity issues.

Prey Project, ping and Cygwin

2012-05-29T13:42:02Z

File this with the obscure issue department…

The Prey Project looked like a nice system for tracking stolen devices and has gotten a lot of good press recently. I decided to try it out. After getting everything set up and working I noticed a lot of Cygwin bash shells running the ping command. The commands accumulated and eventually degraded system performance, which is when I noticed.

Prey has a partial UNIX environment (MinGW) contained in it and consists of shell scripts wrapping a number of UNIX utilities compiled for Windows. I say partial because it doesn’t include the “ping” command, which is a dependency for the software. And the shell scripts apparently don’t take into account the potential for a user having other UNIX-like environments installed (Cygwin also has a bash shell and the ping command, but there are others as well). So what was happening is that the script (pull) naively looks at what operating system it is installed on, looks for a ping command and issues what it believes are the correct command line arguments. For Windows it’s this:

ping -n 1 www.google.com

This doesn’t work because Cygwin’s ping.exe doesn’t have a “-n” switch. But for some reason it doesn’t fail when it encounters an invalid option. Rather, it tries to ping the IP address 0.0.0.1. This doesn’t work, of course, but the ping command tries forever, and new instances of the bash shell and ping keep spawning until they kill your computer.

Anyway, I hard coded a change to the script on my system and filed a bug with the Prey developers.

I also submitted an email to the Cygwin mailing list describing the Cygwin ping issue.

Quick-n-dirty git getting started guide

2011-09-20T13:28:02Z

As a git neophyte I approve of this post:

http://news.ycombinator.com/item?id=2970637

UPDATE: I also found this helpful site: gitref.org

Lighting a fire under WordPress

2011-09-07T21:25:53Z

Since I moved my personal web site from Roller to WordPress a couple of years ago, my web site had been a dog. After reading an article about a PHP-based web site configured to support 9 million hits per day, and knowing through experience that my site should be significantly faster, I decided it was time to light a fire under WordPress.

(Note that I’ve included gists at the bottom of the article with the important configuration files.)

I was using a Slicehost slice with a typical Apache/mod_php configuration but there wasn’t enough memory so it would start swapping with a little use which caused frequent outages. But rather than upgrade to the next sized Slice, I found that I could double my RAM for the same money simply by moving to Linode. So that was the first change I made. (FWIW, I’m not suggesting this as a performance enhancement but it’s definitely a better value.)

Next was a series of changes, some of which were noted in the Tumbledry article, some not. The Tumbledry article was thin on details so I did the research myself and came up with a number of articles, the best being this article on setting up nginx, PHP-FPM, APC, memcached and the W3 Total Cache WordPress plugin. The cryptkcoding article shows how to set up an Ubuntu Linux system for seriously fast WordPress performance that consumes incredibly few system resources.

First, an ease of use feature that I discovered: someone did a build of PHP 5.3.8 for Ubuntu 10.04 LTS. Since PHP 5.3 includes PHP-FPM, you can keep everything package based.

To use these packages add these lines to /etc/apt/sources.list:

deb http://ppa.launchpad.net/brianmercer/php/ubuntu lucid main
deb-src http://ppa.launchpad.net/brianmercer/php/ubuntu lucid main

After updating the sources list, run this command to update the apt cache:

sudo apt-get update

(For more details on using these packages, as well as setting up a similar system, check out this HowToForge article. In particular, there are some useful comments at the bottom.)

Anyway, one of the main features of this setup was swapping out the Apache web server for nginx and PHP-FPM. For me, as for most PHP developers, Apache and mod_php have been the default setup for PHP applications for years. However, I can now vouch for the nginx/PHP-FPM combo as a stable and fast production environment. (I will try out this combo for development on my next PHP project to see how it works.)

Importantly, the system now uses a UNIX socket for the connection between the web and application servers rather than TCP/IP. That means that for the core application and web services there are two UNIX sockets used, one between the web server and the application server and another between the application server and the database server (MySQL clients use the UNIX socket when the “localhost” host name is used or the host name is blank).

Anyway, to really see the difference the architecture changes made, I used a blitz.io Rush to hammer the two instances.

First up was the old Slicehost system. This is the Rush configuration I used (same as Tumbledry):

--pattern 1-250:60 -T 4000 -r california

The result: this rendered the system completely unresponsive and required a hard reboot. Here’s a shot of “top” before the system stopped responding. Note all the memory being consumed, load on the way up and lots of Apache processes:

Oh no! The system just died:

Next was the new Linode hosted solution. The result is that the new architecture sustained the Rush with virtually zero CPU usage (I’m not kidding) and no change in memory usage. Varnish takes most of the load.

So as it turns out, I can also serve up 9 million hits per day from a small (512MB RAM), inexpensive ($20 per month) virtual server.

Here are the important configuration files:

nginx.conf:

nginx virtual host config:

php5-fpm.conf:

varnish:

wordpress.vcl (varnish site config):

Tip for optimizing MySQL data types

2012-12-11T05:48:32Z

This is a tip that I’ve kept forgetting to write down so here it is:

During a system’s life cycle, requirements change and components are refactored. This includes databases as well, and particularly as data grows. Decisions and assumptions are made at the beginning of a system’s life cycle that may or may not hold up over years of operation and it’s good practice to continually analyze how well the initial design is working.

When doing analysis in support of refactoring database schemas in MySQL, I’ve found this little bit of SQL to be invaluable.

Code:

SELECT * FROM your_table PROCEDURE ANALYSE();

(I suggest giving it a try on a small table with few rows.)

PROCEDURE ANALYSE interrogates the values in a table, shows the smallest and largest values and suggests a type for each column. While the results frequently indicate that an ENUM type is the most appropriate you can add arguments to the ANALYSE procedure to get more rational suggestions. However, even with no arguments the results can be useful.

For example, you might see that the max value of a column is actually much smaller than the type it’s using allows. I’ve frequently seen INT(11) columns that would work fine as a MEDIUMINT or even TINYINT. Or you might find that an ENUM type is better since the number of distinct values in a VARCHAR column is small. (The benefit of an ENUM is that the data is stored as an integer rather than the string value so its footprint on the disk can be significantly smaller.)

Anyway, while it’s not a panacea, PROCEDURE ANALYSE() is another helpful tool.

MySQL udf_median on Windows 7 64bit

2011-11-24T03:11:40Z

In a minor but ongoing saga of supporting the venerable MySQL UDF function udf_median, I can now add a HOWTO for building it on Windows 7 x64 using Microsoft Visual C++ Express 2010.

I should point out my previous article on the subject since there are parts of it that are still applicable.

This is likely applicable to other MySQL UDFs as well but I haven’t tried.

I used the MySQL 5.1.57 x64 version for my system and I downloaded the zip archive rather than the installer. (Note that the server can still be installed as a service but you will need to run the cmd.exe program as an administrator in order to run the command line installation process.)

I also used Microsoft Visual C++ Express 2010 for this and my Windows version is Windows 7 Ultimate x64.

In addition to VS C++, you will also need the Windows SDK for Windows 7 which you can download here. This addition is critical since it contains the x64 compiler and other tools.

When you have everything installed, follow the instructions on Roland Bouman’s blog post about building UDFs on Windows but stop before building and installing the function then follow these steps to 64bit glory (if you are specifically using udf_median, you might also be interested in this post):

Right-click on the project (not the solution), choose “Properties”. At the top of the dialog, from the “Configuration” dropdown, select “All Configurations”.
Next expand “Configuration Properties” then select “General”. In the field on the right labeled “Platform Toolset” make sure the value “Windows7.1SDK” is selected
Now let’s make it x64. At the top of the dialog box, click on the “Configuration Manager” button.
In the resultant grid, select the “Platform” dropdown and choose “New…”. When the “New Project Platform” comes up, select x64 from the top dropdown then click “OK” then “Close” then close the Properties dialog.
Now you can try building the project
If the build is successful, you will find the .dll, in my case udf_median.dll, in the “Debug” or “Release” folder. Put that in the MySQL lib/plugin/ directory
Install the plugin using SQL like this:
CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.dll';

And that should be it.

Here are a few things that I came across while setting up my project that might be helpful:

In one instance the linker reported this error:
1>LINK : fatal error LNK1104: cannot open file 'kernel32.lib'
The fix for this was making sure that the “Platform Toolset” in the project configuration properties was set to “Windows7.1SDK”
If you get this error:
ERROR 1126 (HY000): Can't open shared library 'file.dll'
this can sometimes mean that the .dll wasn’t compiled correctly (i.e. this is what you get when you try to use a 32bit .dll with a 64bit server) or the symbol exports didn’t work. I used the dumpbin program to make sure that the functions that needed to be exported were. Under the “Tools” menu in VC++, select “Visual Studio Command Prompt” then navigate to the directory with your .dll file and run this command:
dumpbin /exports udf_median.dll
You should see the exported functions in the output.
I did not have to alter any of the C code to make this work even though I saw some comments on Roland’s post indicating that might be necessary.

That’s about it. Feel free to contact me if you have any questions or if the steps didn’t work for you.

Intercept HTTP requests with Squid

2011-04-20T14:33:02Z

On one of my projects we had some questions about how much bandwidth was being used by requests to a third party service but we didn’t have any view beyond general traffic on the network interface. I hit upon the idea of using a transparent proxy to log requests and then using log analysis to break out data transfer amounts per third party service. And since we already had squid as part of our infrastructure applications it seemed like a good choice.

The tricky part of this setup is that everything is hosted on the same hardware node and we also have some web services that needed to be left untouched. These requirements implied some network configuration using iptables to force outbound web requests through the proxy.

So the first thing I needed to do was install squid. On this project we use CentOS on all our hosts so this was easily accomplished like this:

sudo yum install squid

Next was adjusting the configuration. The default squid.conf comes with lots of documentation, which is helpful but makes the configuration file difficult to read and navigate, so the first thing I did was get rid of it like so:

cd /etc/squid && sudo cp squid.conf squid.conf.orig && sudo egrep -v '^#' squid.conf > /tmp/squid.conf

This leaves a lot of empty lines in the file which can be removed like this:

sudo sed '/^$/d' /tmp/squid.conf > /tmp/squid.clean && sudo mv /tmp/squid.clean squid.conf
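
The two passes above can probably be collapsed into one. Here is a small sketch of that idea; the function name strip_comments is my own and not part of squid, and it only prints the cleaned configuration so the result can be reviewed before anything is overwritten.

```shell
#!/bin/sh
# Sketch: print a config file without comment lines and without empty lines.
# Matches the same lines as the egrep and sed commands: lines starting with
# "#" and lines that are completely empty.
strip_comments() {
    grep -Ev '^(#|$)' "$1"
}
```

For example, `strip_comments /etc/squid/squid.conf.orig > /tmp/squid.clean` writes the trimmed file to /tmp, from where it can be moved into place with sudo as before.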

Next up was setting up networking and squid.

The squid site has a great set of examples, one of which looked like it would suit my purposes nicely.

First I configured squid by adding this directive:

http_port 3128 transparent

Then I started squid:

sudo service squid start

I also wanted to make sure squid starts when the system is rebooted:

sudo chkconfig --levels 2345 squid on

Next up was network configuration.

I needed to set up iptables with some NAT rules to force requests through the proxy server. The first command clears out any existing rules. If you already have a custom kernel network config, use this with caution:

sudo iptables -t nat -F

The next rule is for a typical transparent proxy setup. In the setup that I was working with I did not need this rule, something I discovered when the command disabled the existing web sites. So if you have a web server on the same host DO NOT do this:

sudo iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 80 -j REDIRECT --to-port 3128

Here is the start of the iptables configuration we implemented.

Apply the rules to force local HTTP traffic through the transparent proxy:

gid=`id -g squid`
sudo iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --gid-owner $gid -j ACCEPT
sudo iptables -t nat -A OUTPUT -p tcp --dport 80 -j DNAT --to-destination HOSTIP:3128

Replace the string “HOSTIP” with the IP address of the host you’re configuring.

At this point I needed to test the setup so I tailed the access log. Or at least I tried to. The default directory permissions on the /var/log/squid directory prevented me from viewing its contents. I fixed that with this:

sudo chmod 0775 /var/log/squid

Then I was able to tail the /var/log/squid/access.log. So I created a request to see if it was logged:

wget http://www.google.com

I saw the request logged in the squid access log so I was satisfied that it and the networking were functional. However, the log format wasn’t what we needed to feed to awstats, which is what I was going to use to process the log.

Since we already use squid and process its logs I grabbed the configuration from our production configuration file:

logformat combined %>a %ui %un [%{%d/%b/%Y:%H:%M:%S %z}tl] "%rm %ru HTTP/%rv" %Hs %h" %Ss:%Sh
access_log /var/log/squid/access.log combined

Then I restarted squid and tested again. It looked good so I tested one of our batch processes that makes HTTP requests to make sure that it did what I wanted. It did; however, I noticed that query strings from the URI were not being logged. A quick google told me that I needed to update the squid.conf with this:

strip_query_terms off

As it turns out, squid strips the query string after the “?” by default. This is apparently to “protect privacy” but we needed the query string to identify individual requests more accurately.

At this point I had the system set up and working. It logged all the outbound HTTP requests and the existing web services remained unaffected. All that was left to do was set up awstats to process the logs.
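
Until awstats is in place, a quick awk pass over the access log can give a rough answer to the original question of how much data goes to each third party. This is only a sketch and not what we ran in production: it assumes whitespace-separated lines in the usual “combined” layout, where field 7 is the request URL and field 10 is the response size in bytes, and the function name bytes_per_host is made up for the example. Check the field positions against your own logformat first.

```shell
#!/bin/sh
# Sketch: total response bytes per destination host from a squid access log.
# ASSUMPTION: field 7 is the full request URL (http://host/path) and
# field 10 is the response size in bytes.
bytes_per_host() {
    awk '{
        split($7, parts, "/")        # "http:", "", "host", "path", ...
        bytes[parts[3]] += $10       # accumulate bytes for this host
    }
    END {
        for (host in bytes) printf "%s %d\n", host, bytes[host]
    }' "$1" | sort -k2,2nr           # largest consumers first
}
```

Calling `bytes_per_host /var/log/squid/access.log` prints one line per host with its byte total, biggest first.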