hross.net

Headless Chrome Basic Auth

2017-05-30T00:00:00+00:00

I recently spent some time checking out Headless Chrome, since it seems likely to at least kill off PhantomJS.

It doesn’t appear fully ready for prime time yet, but there are already some cool uses in the wild.

All I wanted to do was hit a page that had basic auth enabled, but it turned out to be more of a task than I thought. Here are some of my notes:

Get Ubuntu box up and running.
Install latest Chrome:

(From askubuntu)

sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
sudo dpkg -i google-chrome*.deb

Test Headless Chrome:

google-chrome-stable --headless --disable-gpu --no-sandbox --remote-debugging-port=9222 --remote-debugging-address=10.138.33.145  http://www.google.com

Navigating to that port/IP should display a page you can debug your Chrome session on.

Install node, npm, chrome-remote-interface.
Set up chrome-remote-interface node script.
Use Chrome DevTools docs to browse options.
Come up with something like this repo here.
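One common way to get past the basic auth prompt is to enable the DevTools `Network` domain and call `Network.setExtraHTTPHeaders` with an `Authorization` header before navigating. This is a sketch under assumptions, not necessarily what the repo above does. The header value is just `Basic ` followed by the Base64 encoding of `user:password`. Here is a minimal Java sketch that builds it. The credentials are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the value of an HTTP Basic Authorization header:
    // "Basic " followed by Base64("user:password").
    static String basicAuth(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; substitute the ones for your protected page.
        System.out.println("Authorization: " + basicAuth("user", "pass"));
    }
}
```

Headers added through `Network.setExtraHTTPHeaders` go out with every request the page makes, including requests to third-party hosts. Keep credentials used this way scoped to test environments.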

Git Dates

2016-03-30T00:00:00+00:00

Here is another article I recently released via Microsoft. Pretty simple, but handy for keeping track of Git Dates.

Git History Simplification

2016-02-23T00:00:00+00:00

I have been pretty busy working on our Git implementation in Visual Studio Team Services here at Microsoft, so I have not been posting much. However, I did find some time to write an article about Git History Simplification over on the Visual Studio site.

How to Evaluate Product Management at a Technology Company

2016-02-12T00:00:00+00:00

Part 3 of 3, which has been up on Medium for a while: How to Evaluate Product Management at a Technology Company.

How to Evaluate Engineering at a Technology Company

2015-09-18T00:00:00+00:00

Part 2 of 3, now up over at Medium: How to Evaluate Engineering at a Technology Company.

How to Evaluate Management at a Technology Company

2015-09-03T00:00:00+00:00

Usually I post more tech stuff on this blog (when I actually post), so I thought I’d start a blog over at Medium that is a little more squishy. Check out my first post on How to Evaluate Management at a Technology Company.

Reading SEC Data From EDGAR in C#

2013-04-13T00:00:00+00:00

Since I’ve been working on side projects, moving to North Carolina, and generally having a life, I have not updated this blog much. That doesn’t mean I haven’t been working on things, though. One of my more recent projects has been scraping the SEC EDGAR database for public filing data on companies. Surprisingly, I found very little in the C# realm for doing this.

I wrote a bunch of stuff to help me grab the data and shove it into a database using ServiceStack. Nothing earth shattering, but if you are interested in learning how to evaluate companies, or you just want this kind of data, it’s a great place to start.

Take a look at the Github Project for more info.

Charity Campaign (building a node.js web app)

2011-09-09T00:00:00+00:00

Over the past month or so I’ve been putting together a node.js application for fun. The goal was to get some “real world” development experience with node.js and mongodb, figure out what the challenges are (is event oriented programming hard? how will I live life without a traditional RDBMS?) and build something people might actually use.

To wit: Charity Campaign… an application for tracking charity drive donations. The basic concept is to have teams of users submitting various items, assign bonuses to those users based on what they submit, give some basic security/management and display stats.

The idea isn’t new (other people I know have written Google App Engine applications for it), but it’s a lot easier to focus on a niche you know than to make up a new one for the purposes of writing software.

Without further ado, here are some of the pieces/concepts I used to build the application:

node.js packages

express for most of the http framework and middleware support
connect-form for file uploads
ldapjs for ldap connectivity
mongodb for mongo db connectivity
async for keeping my sanity when writing event driven code that had dependencies

concepts/info

The vision media samples were uber helpful in learning how to use express. The mvc sample is the basis for my routing framework.

Stackoverflow was useful for finding configuration management information.

Found a good csv parsing starting point (ended up hacking this code to bits, but the basics and the regex are there).

Twitter recently released Bootstrap and it seemed cool, so I converted the user interface to use it (previously I was using some free HTML template or other).

false starts

Geddy looked like a good MVC starting place, but development appears to be halted so I didn’t get far.

Mongoose looked cool, but seemed a bit heavy and constrictive. It probably would have helped with data validation and error handling, though.

I initially started with couch-db and cradle, but I didn’t care about the replication or json consumption features and generating unique integers for slugs started to become a PIA.

The mongodb tutorials were much more accessible and accommodating.

summary

Hopefully this application actually gets used at some point. Even if it doesn’t, building and committing it to github has been a satisfying experience in and of itself.

Converting HTML to Markdown

2011-09-07T00:00:00+00:00

I started converting some old posts and found it was a breeze (other than the annoyance of link conversion). There is a great web based HTML to markdown converter. Here are a couple of links in case you’re in the same situation:

Setting up a Drupal Training Instance

2011-09-07T00:00:00+00:00

In a couple of days I’m providing Drupal training at PPC. The goal of the training is to provide an overview and “get your feet wet” with Drupal.

I found a couple of cool things while looking for some decent training resources. First, the handbooks on drupal.org are top notch (and free). Second, there is an excellent (free) ebook over at learnbythedrop called Building Your Blog With Drupal. It covers a soup to nuts install and configuration of a blogging platform (and is pretty up to date as of this writing). Perfect for a beginning training session.

To facilitate the training I set up an nginx install with php on my local windows machine (the training is in windows to make things less of a culture shock). You can find a great tutorial on doing that here and an nginx config tailored for Drupal here. Finally, I set up mysql with phpmyadmin and I was ready to rock.

The best part of this is that I needed to set up 25+ training machines with the same configuration. Because it’s all batch file oriented, I simply had to copy the entire directory tree, take a dump of the database and copy the whole thing to an EC2 windows instance. Finally, with a little nssm goodness I was able to make the whole thing automatic.

Presto. Drupal training.

Adventures in GitHub Blogging

2011-09-05T00:00:00+00:00

I got tired of Drupal as a blogging platform awfully quickly. It ended up being too much for my needs (as is probably the case with anyone who is using it solely for blogging).

I stumbled across a couple of interesting solutions (like hosting on s3). Ultimately, though, I like the idea of keeping it simple and having the content in a format I can easily export and read with a text editor.

Windows Live Writer is still the best blogging tool I’ve ever used… too bad I’ve started focusing some of my development efforts on my mac lately. Time to move back to a text editor…

Luckily, github has some awesome abilities to both version your blog and host it at the same time, thanks to github pages.

Tom Preston-Werner has an interesting post about doing just that, as do some other people.

Cool stuff:

I can use markdown for my blog posts
My entire blog is automagically in source control
Updating entries is as simple as making changes using git
I can edit the blog entries on github from anywhere I have an internet connection
Pretty much free, minus the cost of pointing a domain name at the blog if I so choose.
Preston-Werner makes his blog template freely available on github
Disqus works great for comments

Not as cool stuff:

No fancy editors (I am more than okay with that)
I have to maintain my own HTML templates, SEO, etc
I have to use git (pro and con)
Have to learn yet another templating language (Liquid, which Jekyll uses). This looks useful: Jekyll Template Wiki.
All my blog updates are public on github (unless I want to start paying them for private repos)

Debugging Drupal Cron

2011-06-01T00:00:00+00:00

Recently, we had some issues with Drupal cron and indexing of Solr results, so I figured I’d share a couple of quick tips on debugging Drupal cron:

Get Supercron. That will let you individually run each cron task so you can figure out which one is failing.
Usually the problem is with search_cron. In order to dig deeper into this, you can hack core with watchdog statements and run a few database queries to identify problematic nodes. See this thread here for a quick how to. Here is an example of a hacked up search_cron function for debugging purposes:

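function search_cron() {
	// We register a shutdown function to ensure that search_total is always up
	// to date.
	register_shutdown_function('search_update_totals');

	// Update word index, logging before and after each module: a "start"
	// entry with no matching "finished" entry points at the failing module
	foreach (module_list() as $module) {
		watchdog('search_debug', 'update index for ' . $module);
		module_invoke($module, 'update_index');
		watchdog('search_debug', 'finished update index for ' . $module);
	}
}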

Finally, if things get really hairy, you can build your search cron task into a module and run it by itself on demand. Build your own search cron task and exclude it. Details here.

Quick and Dirty Drupal Profiling

2011-04-26T00:00:00+00:00

One major question I had when I first started debugging module code was “how can I see how my code is performing?” Turns out it’s pretty easy to get this info in a variety of ways.

XHProf

The talk of the PHP profiling town seems to be Facebook’s latest entry, XHProf. For what I’m looking for it’s a bit heavy, and it only seems to work on Linux, which is a problem since I’m developing and deploying on a Mac and a Wintel box.

Xdebug

Ah yes, my old friend XDebug. As it turns out, you can set up function profiling in XDebug with a couple of php.ini options. I won’t spend too much time on setup, since Zend did an exhaustive job of explaining it here. Or you can probably just take a look at the great documentation here.

Webgrind

Okay, so you created yourself a bunch of cachegrind files and you’re ready to see what’s happening. You now have a couple of options, depending on your platform (WinCacheGrind on Windows, KCacheGrind on Linux or MacCallGrind on a Mac). However… the Mac version isn’t free and setting up Linux compatibility stuff to view a call stack isn’t my thing. Turns out there is a truly righteous web based grind file viewer called webgrind. Cross platform and super easy install. Sweet. The only problem you may have is that large cachegrind files will take forever to load.

Devel Module

So what about Drupal specific profiling information? Turns out there is a great module for that called Devel. It will give you SQL statement profiling and memory usage, and a variety of other features.

Finally, on the OS side you have your system specific tools like Process Explorer, Activity Monitor, top, etc. Happy profiling!

Quick and Dirty Drupal Debugging

2011-04-14T00:00:00+00:00

Over the past few months I’ve had the opportunity to branch out and hack on some Drupal modules. In doing so, I noticed that (a) I had no idea what I was doing, and (b) there wasn’t a simple, easy description of how to debug Drupal code. Hopefully this will serve as a quick and dirty primer.

Watchdog

This is the quick and dirty way to “write to console” in Drupal. Simply putting the following PHP statement in a module:

watchdog("foobar", "hello world");

Will print to the Drupal log (Admin > Recent Log Messages).

XDebug

I highly recommend you immediately download and install XDebug for your PHP installation. Many *AMP stacks include it by default (MAMP and XAMPP among them).

Along with a bunch of other stuff, it will “pretty print” PHP error messages in HTML, so the next time you take your Drupal instance down by forgetting a semicolon, you’ll see a nicely formatted call stack, memory profile, etc. Installing it is ridiculously easy, since they give you very specific instructions based on your configuration.

var_dump and debug_backtrace

Here is where XDebug comes in really handy, since it auto-formats these functions when they are used. In a nutshell, var_dump will literally dump the contents of any variable directly to the current HTML page it executes on. If you add it to some module code, it will crap its output directly to the screen.

Even more useful is the debug_backtrace function, which gives you an array that contains the entire stack trace at the time it was called, and the values of the variables that were passed to all of the functions in that stack trace (you need XDebug to really appreciate this function).

set_error_handler

Another nifty trick is replacing the default PHP error handler with your own (using the function set_error_handler). Typically I replace it with a custom handler that uses debug_backtrace to give me a call stack and/or use var_dump to see what’s going on. Here is an example:

set_error_handler('debugErrorHandler');

// problem code here          

restore_error_handler();

// error handler function
//TODO: remove this after we figure out the problem
function debugErrorHandler($errno, $errstr, $errfile, $errline) {
	var_dump(debug_backtrace());
	// call php error handler
	return false;
}

Warning: mysql_real_escape_string() expects parameter 1 to be string, array given in includes/database.mysql.inc on line 321

This error seems to end up all over the place and it makes a great “how to debug” story. A lot of times someone will write some ugly code, forget about it, then push it to your development environment or provide it via a module.

What do you do if you start seeing this (or a similar type of error) appearing in your watchdog logs? How can you figure out which module is causing the problem?

Simply put, you follow the instructions here, except you replace the function with this one:

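function db_escape_string($text) {
	global $active_db;
	// Dump the call stack to the page whenever an array sneaks in,
	// so you can see which module passed it
	if (is_array($text)) {
		var_dump(debug_backtrace());
	}

	return mysql_real_escape_string($text, $active_db);
}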

Alright, well that’s about it. Happy debugging!

Helvetica

2010-03-10T00:00:00+00:00

The best thing about Netflix is the instant queue and its constant ability to surprise me. I end up watching movies and documentaries about things I would normally never see, simply because I wouldn’t know they exist. One such example is the documentary Helvetica.

I am sure there are many consultants, programmers and others in the IT world who, like me, end up wearing the graphic designer hat from time to time. It may be to create icons or images for a web site, or it may be to edit CSS styles.

If you are one of these types of people, and you have even a passing interest in design, Helvetica is for you. It takes you down the rabbit hole of typeface design, various artistic movements, and the “why’s” and “how’s” of what makes a good typeface. This is one of those movies that could actually make you better at your job and provide some interesting historical context around something we take for granted when we design web sites.

It will also make you look at normal everyday signage in a totally different light.

Apache Directory

2010-03-05T00:00:00+00:00

In one of my previous posts I mentioned Apache Directory Studio as a great way to view LDAP directories, but the entire Apache Directory project really deserves its own post (and here it is).

First of all, Apache has implemented a fully featured directory browser (Apache Directory Studio) and a fully featured LDAP directory (Apache Directory Server). Both of these projects are entirely Java based. Studio is an eclipse based directory browser with all the bells and whistles. JXplorer is nice, but not as powerful or as easy to use.

However, the real power of ApacheDS is the server. It’s entirely java based and is available for a variety of platforms, including Windows. I have yet to find an easier platform to install and use on a variety of operating systems (especially Windows). Rather than trying to build a confusing OpenLDAP implementation, you can simply download, install, and start ApacheDS in 5 to 10 minutes.

And, oh by the way, you can also embed and manipulate ApacheDS in your own applications, since it’s written entirely in Java and the source code is freely available.

So, if you are looking for an easy, free, directory implementation for your next proof of concept, demo, or unit test, look no further than ApacheDS.

Google Translate is Awesome

2010-01-14T00:00:00+00:00

I am not sure this is even enough for a blog post, but… well it is. Here are just a few of the cool things you can do with Google Translate:

Use the new audio capability to generate audio translations of text, automatically.
Use the translate API to translate your entire web page by simply including some javascript.
Programmatically translate stuff using the unofficial Java API. There is probably one for .NET, but I haven’t tried it.
Translate bits and pieces of your web site, on demand, using a very cool jQuery plugin called Sunday Morning.

Synchronizing Email with Ruby

2010-01-11T00:00:00+00:00

Recently, I was in the midst of a Windows 7 installation when my company decided to migrate my email to a new mail server. As we in the IT world are aware, migrations rarely go as planned. It seemed like as good a time as any for me to start a project I have long considered: migrating all of my email to Gmail.

I guess this is technically something I’m not supposed to do. Then again, it is no less secure than downloading and sending email using my local laptop and a standard email client (provided the passwords/accounts are properly encrypted). Either way, I love Gmail for personal email and there is no way my entire work organization is going to switch to Gmail, so I decided to set up my own little synchronization process.

Here is what I did:

1.) Enable IMAP on my Gmail account. My work email is already IMAP, so this let me drag and drop folders from one mail account to another using Thunderbird. Once all my folders were migrated, I only had to worry about new email in my inbox.

2.) Set up a synchronization process from my work email to my Gmail account. Transferring mail itself is pretty simple. RFC 822 (since superseded by RFC 5322) defines what mail messages look like, so a message is the same data in one mail account as in the other. The trick is moving them automatically.

Since I already have a Linux host that runs full time (this site), it seemed like my most sane option would be to write something that I could schedule using cron. Since I am a member of the cargo cult, I thought I could pretty easily find something on Google written in Java.

After some searching, though, it seemed like the best and simplest examples were all written in Ruby. Unfortunately, none of them did exactly what I wanted so I figured I would have to write a bit of code. Before I began this endeavor, I knew nothing about Ruby (yes, I am way behind), but it seemed like a good time to learn.

I started off with some setup:

http://wiki.dreamhost.com/Ruby

http://wiki.dreamhost.com/index.php/Ruby_on_Rails

Next I went with a few blogs/docs and some source from the beginnings of larch:

http://ruby-doc.org/stdlib/libdoc/net/imap/rdoc/classes/Net/IMAP.html

http://wonko.com/post/ruby_script_to_sync_email_from_any_imap_server_to_gmail

http://codeclimber.blogspot.com/2008/06/using-ruby-for-imap-with-gmail.html

While I like larch, it doesn’t delete from my inbox, has way more features than I need, and it is more object oriented (and thus harder to understand) than I would like. Since I am a Ruby novice, I wanted something simple that I could make sure was working. Here is what I came up with:

#!/usr/bin/env ruby
# Shebang lines don't expand ~, so point this at your Ruby via env or an
# absolute path (e.g. the build in ~/run/bin).
require 'net/imap'

puts "Synchronizing mailboxes..."

# create destination imap (Gmail) over SSL
dest = Net::IMAP.new("imap.gmail.com", port: 993, ssl: true)
dest.login("my.account@gmail.com", "password")

# create source imap (work) over SSL
source = Net::IMAP.new("imap.work.com", port: 993, ssl: true)
source.login("my.account@work.com", "password")

puts "Logins complete. Checking source mailbox for mail."

source.select('INBOX')
source.search(["NOT", "DELETED"]).each do |message_id|
  puts "Found message: #{message_id}"
  msg = source.fetch(message_id, ['RFC822', 'FLAGS', 'INTERNALDATE'])[0]

  # \Recent is maintained by the server and can't be set by a client,
  # so drop it before appending
  flags = msg.attr['FLAGS'].reject { |flag| flag == :Recent }

  puts "Transferring message with id: #{message_id}"
  dest.append('INBOX', msg.attr['RFC822'], flags, msg.attr['INTERNALDATE'])

  puts "Marking message as deleted in source inbox."
  source.store(message_id, "+FLAGS", [:Deleted])
end

# Messages flagged \Deleted are not actually removed until they are expunged
source.expunge

puts "Transfer complete. Logging out."

source.logout
source.disconnect

dest.logout
dest.disconnect

puts "done."

Simple, huh? I set this up to run as a cron job every ten minutes and that’s all it took.

Windows Tools

2009-09-12T00:00:00+00:00

It came out just in time for my birthday this year: Scott Hanselman’s 2009 Windows Power Tools list. It is indeed a great list. If you are a developer working in Windows and have a few minutes to spare, you might just find something on that list that makes your life a million times easier.

As some of you may know, I recently took a new job within Oracle, and as such I had the privilege of getting a brand new laptop for my birthday (okay, I happened to start the job the same week as my birthday, but it felt like I got a birthday present). There is nothing quite like a fancy new laptop and a list of awesome tools and utilities to install on it.

Anyway, since I’ve finally got it perfectly configured (never to be this fast or nice again), and since searching through the 2009 list is a bit of a grind, I thought I’d share my own, much smaller, list of tools I love (I’ll try to keep this not-so-development-oriented):

PS Hot Launch - I hate the start menu, and the quick start tray is never big enough for all my icons. Enter PS Hot Launch, where I can now keep all my frequently used programs and bind hotkeys for startup. There are a lot of heavy duty hotkey managers and the like, but for my money (or lack thereof) this is the easiest, best, and most free version.

Console - I never use a dos prompt anymore. This one looks way cooler and supports more features (like easy copy and paste).

GNU Win32 - One of my biggest peeves about Windows is not being able to do find . | xargs grep "blah" and get results. I sometimes use cygwin, but this is much less overhead and more native. Thank you GNU.

Microsoft TimeZone - This is a weird one, but I find myself constantly having to check what time it is on the west coast, central, etc. I am never sure how many hours to subtract. This super simple free utility runs in your tray and lets you customize up to five locales to show current times for when you click on it. A lot easier than googling “current time pacific”.

Google Desktop - When it first came out, I thought “will I really use this?” Now I can’t go a day without it. How many times have you thought about an email you sent two months ago but could only remember various key words? Or an old coding project that has something specific you did that you now can’t remember? I can’t even count the number of hours this has saved me in “hunting for stuff” time. Seriously… if you don’t have this you don’t even know what you’re missing.

Textpad - My favorite of the “enhanced text editors” crowd. It has a vast array of pluggable syntax highlighting (no more Eclipse to edit one line of Java code), explorer shell integration, and an intuitive interface without a ton of annoying bells and whistles.

Postbox - A lightweight email client. If it had better calendar integration I’d give it a gold star. Still a great quick and dirty client that has some nice search capability.

PuTTY - Remember when people made applications that fit in a single executable and just did the job? Yeah… I do too. If you’re not using PuTTY I have no idea why not.

Xming - Perhaps a limited audience on this one, but if you need to use X in Windows, this one is for you.

SQLDeveloper - Wait… you mean to tell me Oracle made a lightweight, user friendly, super powerful database tool? And I don’t have to use sqlplus or Toad anymore? And it’s free?! Say no more…

Apache Directory Studio/Server - I absolutely love these two. I used to use Softerra LDAP Browser, which is a great tool in its own right, but I cannot tell you how happy I am to see an easy to install, easy to configure LDAP directory for Windows. I can now test LDAP integrations with impunity! Documentation is a bit sparse, but the first time I downloaded and installed this sucker I got all warm and fuzzy inside.

WinMerge - I blogged about this earlier and it is great for file and directory comparisons.

PDF Split and Merge - This is a great tool if you find yourself having to create expense reports or combine PDFs of scanned documents. The name says it all.

Darkroom - I tend to take a lot of notes in text editors and this one just looks awesome. You may not use it all the time, but it definitely gets an A+ on style points.

Here are a few more that are pretty common so I won’t write blurbs about them: Pidgin, ImgBurn, PaintDotNet, LiveWriter, and Wireshark. And of course there are the myriad plugins you can get for Firefox, but they deserve a separate post.

Alright… happy downloading!

Sample Search Portlets

2009-09-03T00:00:00+00:00

A while back I wrote this post detailing my struggle to find a quick and dirty way of displaying paginated table data in the portal (or anywhere, for that matter). I ended up settling on the method I found at Spartan Java.

While I enjoyed the series of articles, I still ended up having to review a jQuery primer, the DWR documentation and the iBATIS user guide. I would have liked to have been able to download the finished application. On top of that there were the usual struggles with portletizing the code, and scrambling to jam a bunch of features into the finished web application before launch.

Thus, when I got asked to provide some “development best practices” portal code, I was in a bit of a bind. I wanted to show off the WCI portal’s ability to consume and use a variety of frameworks, but my original source was… pretty crappy. In the end I went back and rewrote a very simple sample application from scratch (including documentation) using the methods described above.

Hopefully I can save you some time and effort in similar endeavors. Without further ado here is a link to a sample database search application using the Oracle HR sample data.

It’s nothing fancy (yes, there are even possible SQL injection vulnerabilities), but if you want a simple example of jQuery, iBATIS, Java and the portal, this is definitely something to check out. All you should have to do is import the war file into eclipse and read the index.html documentation.
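The sample itself uses iBATIS, but the core of paginating a search against Oracle without opening up SQL injection holes is small enough to sketch in plain Java. This is an illustration under assumptions, not code from the sample.
- It assumes the HR schema's EMPLOYEES table.
- It uses the ROWNUM idiom, since Oracle releases before 12c have no OFFSET/FETCH.
- It binds the search term instead of concatenating it into the SQL.

```java
import java.util.Arrays;

public class PagedSearch {

    // Classic (pre-12c) Oracle pagination: cap the ordered query with
    // ROWNUM <= lastRow, then discard rows below firstRow in the outer query.
    // The search term is a bind variable, so user input never becomes SQL text.
    static final String SQL =
        "SELECT * FROM ("
      + "  SELECT q.*, ROWNUM rn FROM ("
      + "    SELECT employee_id, first_name, last_name FROM employees"
      + "    WHERE UPPER(last_name) LIKE UPPER(?) ORDER BY last_name"
      + "  ) q WHERE ROWNUM <= ?"
      + ") WHERE rn >= ?";

    // Returns the 1-based {firstRow, lastRow} bounds for a 1-based page number.
    static int[] rowBounds(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int first = (page - 1) * pageSize + 1;
        int last = page * pageSize;
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        int[] bounds = rowBounds(3, 10);
        // With a JDBC PreparedStatement built from SQL, these would be bound as:
        //   ps.setString(1, "%" + term + "%");
        //   ps.setInt(2, bounds[1]);
        //   ps.setInt(3, bounds[0]);
        System.out.println(Arrays.toString(bounds)); // [21, 30]
    }
}
```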

Configuring Oracle XE

2009-08-27T00:00:00+00:00

Lately I have found myself setting up a lot of Amazon EC2 instances, new computers, virtual machines and the like. As such, I’ve come to know and love Oracle XE. While it “just works”, there are a few tweaks that vastly improve performance and behavior. All of these tweaks require you to log in as the system user and run the noted SQL. In no particular order, they are:

Modifying the default listening port of the web server (why is it listening on the default Tomcat port?!): begin dbms_xdb.sethttpport('7080'); dbms_xdb.setftpport('2100'); end;
Increasing the number of sessions and processes so you don’t get locked out of the database (because of scope=spfile, these take effect after the database is restarted): alter system set sessions=250 scope=spfile; alter system set processes=200 scope=spfile;
Making the web server available for non-local access (in case you are running out of a console): begin dbms_xdb.setlistenerlocalaccess(false); end;

Here are the original links for these tips:

Finally, I have a tip of my own. If you are running Oracle XE in an EC2 instance (see this article), you will undoubtedly notice that when you restart your brand new AMI with a new IP, Oracle will fail to start (doh!). In order to fix this, you need to do the following:

Make copies of your listener.ora and tnsnames.ora files. Modify them so that your current hostname in EC2 is replaced with “localhost”, and rename them to listener.ora.localhost and tnsnames.ora.localhost.

Add this script to the /etc/init.d/oracle-xe startup script (under start is preferable):

NEWHOST=`hostname`
sed "s/localhost/$NEWHOST/" /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/listener.ora.localhost > /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/listener.ora
sed "s/localhost/$NEWHOST/" /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/tnsnames.ora.localhost > /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/tnsnames.ora

Profit (you may want to modify the above to run as the oracle user – not as root).

(sorry Windows users, you will have to create your own variation of this)

EC2/Cloudwatch Gaming Results

2009-05-31T00:00:00+00:00

As I mentioned in my previous post, I wanted to capture some real world info on hosting a game server in the cloud. The results were a rousing success. We had 5 or 6 people connected at various times, played some Deathmatch and Capture the Flag, and everyone had a ping of 40 or less the entire time. I didn’t notice any latency whatsoever and there were absolutely no packet loss or lag complaints throughout.

Cost

I haven’t broken down the numbers yet, but all told I started up an EC2 instance and hosted a game for 2 hours. I also attached an elastic IP for ease of use. That cost me less than $0.50. I’d say that’s a pretty good deal.

Usage

Below are the usage stats for network I/O and CPU usage. I gathered these using my simple Java application and created these no-frills charts in Microsoft Excel (all told, this took about 5 minutes to put together):

Figure 1 - Network I/O over a 2 hour F.E.A.R. game

Figure 2 - CPU Usage over a 2 hour F.E.A.R. game

Conclusion

This is a short and imperfect analysis, but overall I’d say the “small” EC2 instance could easily have handled a 16 person game, both from a load and network traffic standpoint, and it would have cost me a dollar or so to host for 2 hours. That seems like great bang for your buck if you’re looking to crank up a quick game and then move on to something else.

Update Publisher Publishing Targets

2009-05-21T00:00:00+00:00

I have recently been working on a utility for porting ALUI databases from a production environment to a development environment. Fabien Sanglier started this effort, and I hope to have some code to contribute to his ALUI toolbox project very soon.

In the meantime, however, I have been banging my head against the pain that is migrating Publish and Preview target URLs in Publisher. These URLs are stored in a binary BLOB in the Publisher database, and are actually serialized Java classes, making them extremely difficult to update (especially when you don’t have access to the original Publisher source code).
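Because the targets are plain Java serialization, a migration generally has four steps:
1. Read the BLOB bytes.
2. Deserialize them with `ObjectInputStream`.
3. Change the URL.
4. Write the object back with `ObjectOutputStream`.

The sketch below shows that round trip with a hypothetical `Target` class standing in for the real Publisher class. It assumes nothing about that class's actual fields or setters. The real version only works with the Publisher classes on the classpath and matching `serialVersionUID`s.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BlobRoundTrip {

    // Hypothetical stand-in for the Publisher class stored in the BLOB.
    static class Target implements Serializable {
        private static final long serialVersionUID = 1L;
        String url;
        Target(String url) { this.url = url; }
    }

    // Object -> bytes, as they would be written back to the BLOB column.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Bytes read from the BLOB column -> object (its class must be on the classpath).
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Rewrites the host in the stored URL and returns the new BLOB bytes.
    static byte[] migrate(byte[] blob, String oldHost, String newHost)
            throws IOException, ClassNotFoundException {
        Target target = (Target) deserialize(blob);
        target.url = target.url.replace(oldHost, newHost);
        return serialize(target);
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = serialize(new Target("http://prod.example.com/publish"));
        byte[] updated = migrate(blob, "prod.example.com", "dev.example.com");
        System.out.println(((Target) deserialize(updated)).url);
    }
}
```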

My original plan was to wrap all of this stuff into one “uber-utility” and then blog about it. Recently, though, I saw this post on the Oracle Webcenter Interaction discussion forums: http://forums.oracle.com/forums/thread.jspa?threadID=900736&tstart=0 and it made me think I should probably post the code for migrating Publishing Targets, for the benefit of the sanity of the community at large.

Here is a link to a jar file which will update Publisher publish targets. If you crack the jar file with a zip editor, you will be able to update the configuration.properties file in the root directory to suit your needs.

I took the liberty of including the Publisher classes in my own jar, making it simpler to run from a command line. To run it, you will only need to download the correct JDBC driver for your database:

Oracle JDBC Driver

SQL Server JDBC Driver

Next, simply execute it from a java command line with the driver in your classpath, like so (this is Windows syntax; on Linux or Unix, separate classpath entries with : instead of ;):

java -cp updatepublishtargets.jar;ojdbc14.jar net.hross.content.UpdatePublishTargets

Note that the utility runs in debug mode by default, so nothing will happen to your Publisher database until you set debug to false in the configuration. Now is probably also a good time to let you know that I provide no warranties of any kind with this code.

In order to build and run the source, you will need the content.jar and dom4j.jar found in the WEB-INF/lib directory of your ptcs.war. Here is the relevant source code, in case you are looking to build your own version of the utility (source is also in the jar):

package net.hross.content;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import net.hross.utility.Configuration;

import com.plumtree.content.data.AttributeKey;
import com.plumtree.content.data.impl.RdbiPublishingTarget;

public class UpdatePublishTargets {

    public static void main(String[] args) {
        Connection connection = Configuration.getConnection();

        if (null == connection) {
            System.out.println("Unable to connect to database. Exiting.");
            return;
        }

        int directoryId = Integer.parseInt(Configuration
                .getString(Configuration.CONFIG_DIRECTORY_ID));
        boolean debug = Boolean.parseBoolean(Configuration
                .getString(Configuration.CONFIG_DEBUG_MODE));
        String newPublishTarget = Configuration
                .getString(Configuration.CONFIG_PUBLISH_TARGET);

        System.out.println("Updating publish targets for directory ID: "
                + directoryId);

        System.out.println();
        if (debug) {
            System.out.println("** DEBUG MODE ON ** Nothing will be updated.");
        } else {
            System.out.println("** DEBUG MODE OFF ** This is happening for real.");
        }
        System.out.println();

        updatePublishTarget(connection, directoryId, newPublishTarget, debug);
    }

    /**
     * Update the publishing target for a specified directory ID (-1 for all
     * items).
     * 
     * @param connection
     *            Publisher database connection.
     * @param directoryId
     *            Directory ID to update. -1 for all items.
     * @param newPublishTarget
     *            - New publishing target.
     * @param debug
     *            - if true, no replace will be made, data will just be output.
     */
    public static void updatePublishTarget(Connection connection,
            int directoryId, String newPublishTarget, boolean debug) {

        // create a statement to query the directory id
        try {

            // create prepared statement for directory query
            PreparedStatement psDirectory = null;
            if (directoryId > 0) {
                psDirectory = connection
                        .prepareStatement("SELECT * FROM PCSDIRECTORY WHERE ITEMTYPE=0 AND DIRECTORYID=?");
                psDirectory.setInt(1, directoryId);
            } else {
                psDirectory = connection
                        .prepareStatement("SELECT * FROM PCSDIRECTORY WHERE ITEMTYPE=0");
            }
            ResultSet rs = psDirectory.executeQuery();

            // loop through any rows we need to check
            while (rs.next()) {

                // get basic info about the object
                String itemName = rs.getString("ITEMNAME");
                int size = rs.getInt("DATASIZE");
                
                // reset directory ID in case it was generic
                directoryId = rs.getInt("DIRECTORYID");

                // get binary input stream
                InputStream input = rs.getBinaryStream("DATABYTES");

                // if there's actually some settings, let's check them
                if ((null != input) && (0 != size)) {

                    // generic catch statement for problems with this item
                    try {
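                        byte[] buffer = new byte[size];
                        // read() may return fewer bytes than requested, so read the whole BLOB
                        new java.io.DataInputStream(input).readFully(buffer);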

                        // load the hash map from the database
                        Map map = (HashMap) deserialize(buffer);

                        // loop through the keys in the hash map
                        Iterator keys = map.keySet().iterator();
                        while (keys.hasNext()) {
                            Object key = keys.next();

                            // this should probably always be true
                            if (key.getClass().equals(AttributeKey.class)) {
                                AttributeKey akey = (AttributeKey) key;

                                // if we found a publishing target...
                                if (akey.getKeyString().equals(
                                        "PUBLISHING_TARGET")) {
                                    System.out.println();
                                    System.out.println("--------------------");
                                    System.out
                                            .println("Updating publishing target for:");
                                    System.out.println(directoryId + " - "
                                            + itemName);

                                    // get the publishing target info
                                    RdbiPublishingTarget val = (RdbiPublishingTarget) map
                                            .get(key);
                                    String publishTarget = val
                                            .getPublishDetail()
                                            .getTargetLocation();
                                    String publishBrowser = val
                                            .getPublishDetail()
                                            .getBrowserLocation();
                                    String previewTarget = val
                                            .getPreviewDetail()
                                            .getTargetLocation();
                                    String previewBrowser = val
                                            .getPreviewDetail()
                                            .getBrowserLocation();
                                    String ftpUser = val.getPublishDetail()
                                            .getUsername();
                                    String ftpPassword = val.getPublishDetail()
                                            .getPassword();

                                    System.out
                                            .println("Publish browser location: "
                                                    + publishBrowser);
                                    System.out.println("Preview target: "
                                            + previewTarget);
                                    System.out
                                            .println("Preview browser location: "
                                                    + previewBrowser);
                                    System.out.println("FTP user: " + ftpUser);
                                    System.out.println("FTP password: "
                                            + ftpPassword);
                                    System.out.println("Old publish target: "
                                            + publishTarget);
                                    System.out.println("New publish target: "
                                            + newPublishTarget);

                                    // if we are doing this for real, update
                                    // values
                                    if (!debug) {
                                        val.setTargetValues(newPublishTarget,
                                                publishBrowser, previewTarget,
                                                previewBrowser, ftpUser,
                                                ftpPassword);

                                        map.put(key, val);

                                        // update the directory
                                        serializeToDirectory(connection,
                                                directoryId, map);
                                        System.out.println("Update successful.");
                                    }
                                    System.out.println("--------------------");
                                    System.out.println();
                                }
                            }
                        }

                        // clean up
                        input.close();
                    } catch (IOException ex) {
                        System.out.println("Something bad happened.");
                        ex.printStackTrace();
                    }
                } // if null
            } // while next rs
        } catch (SQLException ex) {
            System.out.println("Something bad happened.");
            ex.printStackTrace();
        }

        System.out.println("Procedure successfully completed.");
    }

    private static Object deserialize(byte bytes[]) {
        try {
            ByteArrayInputStream byteStream = new ByteArrayInputStream(bytes);
            ObjectInputStream objectStream = new ObjectInputStream(byteStream);
            return objectStream.readObject();
        } catch (Exception ex) {
            return null;
        }
    }

    private static void serializeToDirectory(Connection conn, int directoryId,
            Object obj) throws IOException, SQLException {
        byte bytes[] = getBytes(obj);
        ByteArrayInputStream byteStream = new ByteArrayInputStream(bytes);

        PreparedStatement ps = conn
                .prepareStatement("UPDATE PCSDIRECTORY SET DATASIZE=?, DATABYTES=? WHERE DIRECTORYID=?");
        ps.setInt(1, bytes.length);
        ps.setBinaryStream(2, byteStream, bytes.length);
        ps.setInt(3, directoryId);
        ps.execute();
        conn.commit();
    }

    public static byte[] getBytes(Object obj) throws java.io.IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        oos.close();
        bos.close();
        byte[] data = bos.toByteArray();
        return data;
    }
}

Monitoring Performance with Amazon CloudWatch

2009-05-19T00:00:00+00:00

It is rare that I am on the bleeding edge of technology. Normally, I don’t think it’s worth the time and effort necessary to learn something brand new unless it has been at least somewhat widely adopted and accepted by the community at large.

Oddly enough, my blog post about running a game server on EC2 turned out to be perfectly timed, as Amazon launched its new CloudWatch, Auto Scaling and Elastic Load Balancing services on Sunday. And since, as I discussed earlier, I have been looking at ways to monitor the usage of my EC2 game server, I somehow find myself on the bleeding edge of the cloud.

Why CloudWatch?

As I discussed in my previous post, setting up monitoring on an EC2 instance wasn’t that hard to do. However, it did come with some drawbacks:

Maintenance: Although it can be fun to install new software and learn its ins and outs, actually upgrading, maintaining and patching that software, and watching it for security risks, is a major pain in the rear end. CloudWatch solves this problem by providing a simple service for retrieving performance data, with no maintenance or special setup required.
Granularity: As I discovered with Munin, there are limits on how frequently you can store performance data, not to mention the storage requirements for vast quantities of it. Again, CloudWatch hides this from us.
Performance: Last but certainly not least, monitoring something usually incurs a performance hit. In my previous article I was sampling data on the same host I was tracking statistics from, so the very act of collecting performance data could skew that data. Since CloudWatch abstracts this away from individual instances, this is no longer a problem.

Getting Started With CloudWatch

There are quite a few resources available to get you started with CloudWatch. I recommend taking a look at the javascript scratch pad and the other various developer libraries already available (more on this later).

If you really want to get down to the nitty gritty, you should start with the CloudWatch command line interface (CLI). Here are some simple steps to get you started:

Download the EC2 API Tools first (you’ll need them to set up monitoring). Check out the Getting Started Guide for instructions on extracting the tools and setting up the proper environment variables.
Download the CloudWatch API Tools. Check out the included readme for details on environment variable setup.
Start up an EC2 instance like you normally would (see my previous post).
Enable monitoring on your running instance using the EC2 API Tools command: ec2-monitor-instances <instance_id>.
Take a look at the CloudWatch Getting Started Guide for details on the available monitoring parameters, etc.
Run the CloudWatch command mon-get-stats to get some statistics from your running instance (mon-get-stats --help should give you some examples).

Here are a few things to keep in mind when running the command line utility:

I normally output data to a CSV file so I can create fancy graphs in Excel. Here is an example command (Windows) that delimits stats by comma and outputs to a CSV file: mon-get-stats CPUUtilization --start-time 2009-05-19T21:00:00 --end-time 2009-05-19T22:00:00 --period 60 --statistics Average --namespace AWS/EC2 --delimiter "," --dimensions "InstanceId=i-2bb5cc42" > stats.csv
Timestamps: As per the forums, input timestamps are in ISO-8601 format with the default timezone UTC (Eastern Daylight Time + 4 hours). Output timestamps are in UTC and cannot be changed (so start thinking in Greenwich Mean Time).
Virtually as soon as monitoring is enabled, statistics are retrieved from your instances. Data is available up to a per-minute frequency and is stored for two weeks.
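Since all of the timestamps are UTC, it helps to convert your local game-night window before building a --start-time/--end-time pair. Here is a minimal sketch using java.time (Java 8 and later, so much newer than the CloudWatch tools discussed here). The class and method names are hypothetical, and America/New_York is just an example zone; java.time applies the daylight saving offset for you, so you don’t have to remember whether it’s +4 or +5 hours.

    import java.time.LocalDateTime;
    import java.time.ZoneId;
    import java.time.ZoneOffset;
    import java.time.format.DateTimeFormatter;

    public class CloudWatchTimes {

        // ISO-8601 without an offset, the same shape as the mon-get-stats example values.
        private static final DateTimeFormatter ISO_NO_OFFSET =
                DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

        // Convert a wall-clock time in the given zone to a UTC timestamp string.
        static String localToUtc(String localIso, String zoneId) {
            LocalDateTime local = LocalDateTime.parse(localIso);
            return local.atZone(ZoneId.of(zoneId))
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .format(ISO_NO_OFFSET);
        }

        // Convert a UTC timestamp string (e.g. from CloudWatch output) back to local wall-clock time.
        static String utcToLocal(String utcIso, String zoneId) {
            LocalDateTime utc = LocalDateTime.parse(utcIso);
            return utc.atOffset(ZoneOffset.UTC)
                    .atZoneSameInstant(ZoneId.of(zoneId))
                    .format(ISO_NO_OFFSET);
        }

        public static void main(String[] args) {
            // 5 PM Eastern on May 19, 2009 (daylight time, UTC-4); prints 2009-05-19T21:00:00
            System.out.println(localToUtc("2009-05-19T17:00:00", "America/New_York"));
        }
    }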

Writing a Simple Java Monitoring Utility

As much fun as I was having trying to parse and decipher various command line inputs, I was somewhat disappointed in the output. For one thing, there was the time formatting problem. For another, only one set of statistics (CPU utilization, network I/O, etc.) was available at a time.

I am not one to do more work than I need to, so instead of setting off to invent an uber-utility for aggregating data, I simply downloaded the Java library for CloudWatch and hacked up some of the sample code until I had a very basic utility for downloading and aggregating the data I wanted. I present it below in case someone finds it useful:

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Properties;

import com.amazonaws.cloudwatch.AmazonCloudWatch;
import com.amazonaws.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.cloudwatch.AmazonCloudWatchException;
import com.amazonaws.cloudwatch.model.Datapoint;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsResponse;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsResult;

public class GrabStats {

    public static void main(String[] args) {
        
        String fileName = "C:\\stats.csv";

        String startTime = "2009-05-19T20:00:00";
        String endTime = "2009-05-20T00:00:00";
        
        String[] statList = { "CPUUtilization","NetworkIn","NetworkOut" }; //(%, bytes, bytes)
        
        HashMap<String, HashMap<String, Double>> map = new HashMap<String, HashMap<String, Double>>();
        
        // grab stats for each stat value
        for (int i = 0; i < statList.length; i++) {
            HashMap<String, Double> stats = getStatistics(startTime, endTime, statList[i]);
            map.put(statList[i], stats);
        }
        
        // write to disk
        try {
            FileWriter fw = new FileWriter(fileName);
            
            // write the header
            fw.write("Date");
            for (int i = 0; i < statList.length; i++) {
                fw.write(",");
                fw.write(statList[i]);
            }
            fw.write("\n");
            
            // get a date iterator from our first statistic
            Iterator<String> dateIterator = map.get(statList[0]).keySet().iterator();

            while(dateIterator.hasNext()) {
                String date = dateIterator.next();
                fw.write(date);
                
                // get values for each stat at this date
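                for (int i = 0; i < statList.length; i++) {
                    Double value = map.get(statList[i]).get(date);
                    fw.write(",");
                    // leave the cell empty if this stat has no datapoint at this date
                    fw.write(value == null ? "" : value.toString());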
                }
                
                fw.write("\n");
            }
            
            fw.close();
        } catch (IOException ex) {
            // error storing data
            System.out.print("Error writing file: " + fileName);
        }

    }

    // define the cloudwatch service (should be a singleton)
    private static final String _accessKeyId = "";
    private static final String _secretAccessKey = "";
    private static AmazonCloudWatch _service = new AmazonCloudWatchClient(
            _accessKeyId, _secretAccessKey);

    public static HashMap<String, Double> getStatistics(String startTime,
            String endTime, String statName) {
        HashMap<String, Double> map = new HashMap<String, Double>();

        // build the request with some defaults
        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest();
        ArrayList<String> stats = new ArrayList<String>();
        stats.add("Average");
        request.setStartTime(startTime);
        request.setEndTime(endTime);
        request.setPeriod(60); // statistics every minute
        request.setMeasureName(statName);
        request.setNamespace("AWS/EC2");
        request.setStatistics(stats);

        try {

            GetMetricStatisticsResponse response = _service
                    .getMetricStatistics(request);

            if (response.isSetGetMetricStatisticsResult()) {
                GetMetricStatisticsResult getMetricStatisticsResult = response
                        .getGetMetricStatisticsResult();
                java.util.List<Datapoint> datapointsList = getMetricStatisticsResult
                        .getDatapoints();
                for (Datapoint datapoints : datapointsList) {
                    map.put(datapoints.getTimestamp(), datapoints.getAverage());
                }
            }

        } catch (AmazonCloudWatchException ex) {

            System.out.println("Caught Exception: " + ex.getMessage());
            System.out.println("Response Status Code: " + ex.getStatusCode());
            System.out.println("Error Code: " + ex.getErrorCode());
            System.out.println("Error Type: " + ex.getErrorType());
            System.out.println("Request ID: " + ex.getRequestId());
            System.out.print("XML: " + ex.getXML());
        }

        return map;
    }

}
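One weakness of the aggregation loop in GrabStats is that it walks the dates of the first statistic in HashMap order, so the CSV rows come out unsorted and any timestamp missing from the first statistic is dropped. Here is a minimal sketch of an alternative that works on plain maps shaped like the ones getStatistics returns (timestamp string to average). It takes the union of all timestamps in a TreeSet so rows are sorted, and leaves blank cells for gaps. The class name StatsCsv and the sample timestamps are hypothetical, and this has not been run against the CloudWatch library itself.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeSet;

    public class StatsCsv {

        // Build CSV text: a header row, then one row per timestamp across all statistics.
        // String ordering is chronological for fixed-width ISO-8601 timestamps in a single zone.
        static String toCsv(List<String> statNames, Map<String, Map<String, Double>> statsByName) {
            TreeSet<String> timestamps = new TreeSet<>();
            for (String name : statNames) {
                Map<String, Double> series = statsByName.get(name);
                if (series != null) {
                    timestamps.addAll(series.keySet());
                }
            }

            StringBuilder sb = new StringBuilder("Date");
            for (String name : statNames) {
                sb.append(',').append(name);
            }
            sb.append('\n');

            for (String ts : timestamps) {
                sb.append(ts);
                for (String name : statNames) {
                    Map<String, Double> series = statsByName.get(name);
                    Double value = (series == null) ? null : series.get(ts);
                    // Empty cell when a statistic has no datapoint at this timestamp.
                    sb.append(',').append(value == null ? "" : value.toString());
                }
                sb.append('\n');
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            Map<String, Double> cpu = new HashMap<>();
            cpu.put("2009-05-19T21:01:00Z", 12.5);
            cpu.put("2009-05-19T21:00:00Z", 10.0);
            Map<String, Double> netIn = new HashMap<>();
            netIn.put("2009-05-19T21:00:00Z", 2048.0);

            Map<String, Map<String, Double>> stats = new HashMap<>();
            stats.put("CPUUtilization", cpu);
            stats.put("NetworkIn", netIn);

            System.out.print(toCsv(Arrays.asList("CPUUtilization", "NetworkIn"), stats));
        }
    }

Run as-is, this prints a header row followed by the 21:00 row (10.0 and 2048.0) and the 21:01 row (12.5 and an empty NetworkIn cell), already in chronological order for Excel.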

Conclusion

The CloudWatch tools and utilities are nothing less than I’d expect from Amazon. Everything worked as expected, the documentation was well put together and there were no real surprises with the API. Overall, I am very satisfied with the finished product of my meager efforts.

There are, of course, a few shortcomings:

It would be nice to have more statistics available (memory usage being the main one I’m thinking of). Having the ability to define and collect your own statistics via an API would be even better. Since the API already has a flexible way of defining statistic and type, I have to assume this is coming.
Output visualization is certainly lacking. It would be great to see someone hack a Google Chart generator into the javascript scratch pad (given my lack of copious amounts of free time, this person won’t be me).
Adding some statistic collection and enablement to ElasticFox would certainly make things easier to set up and administer.

I have to assume these drawbacks will be addressed in future updates, as they have been in the past. I am willing to accept them as the price to pay for being on the bleeding edge of the cloud.

Direct Web Remoting, jQuery and Tables

2009-05-11T00:00:00+00:00

I recently came across a project where I had a need to display the results of a large SQL query in an HTML table using Java. Of course, I wanted to paginate it, style it, use AJAX to update it, and avoid the need for bulky toolkits or large frameworks. Oh yeah, I was also on an extremely tight deadline (read: proper coding and design principles were not used).

I looked into a couple of options:

Display Tag: I used this for another project, but it uses session variables to paginate and doesn’t lend itself well to AJAX updates.
GWT: One of the guys over at Function1 used it recently and it looked pretty slick. Unfortunately, it seemed like a lot of overhead and a styling headache for simple “display a table” functionality.
DWR and jQuery: As it turns out, I found a great series of blog posts (part 1, part 2) over at Spartan Java that pretty much laid out a solution to my problem.

As you can see from the posts on Spartan Java, creating the framework to display SQL results using DWR and jQuery was simple, fast, and fairly straightforward. Because the poster makes some assumptions about your knowledge of DWR and jQuery, I would suggest combining the above with the getting started guide for DWR and the jQuery tutorials, if you are unfamiliar with either.

If you know basic DHTML and Java, the learning curve should be no problem.

DWR and the Portal

As usual, the code I was writing was eventually destined to show up in the portal. Because jQuery is just a simple javascript library, it works great in the portal without issue. Unfortunately, DWR, like many other AJAX frameworks, has its issues when it is gatewayed.

After combing through the dustier nooks of the documentation, googling profusely, and downloading the source code, I discovered the secret to making DWR work. Since some of you may be thinking “wow, DWR looks like something I want to use in my next portlet”, I thought I’d elaborate:

Add an anchor tag to whatever portlet you will eventually display in the portal. This anchor tag should have an id (let’s call it “gatewaybase”) and it should reference the base path of DWR in your application (this will almost always be /dwr/). So, for any portlet I want to use DWR in, I would always include something like the following alongside my DWR script tags (this is in a JSP, so you might need to change the base path):

<script src="dwr/interface/ClassNameExample.js"></script>
<a id="gatewaybase" target="/dwr/" href="/dwr/"></a>

The trick to getting DWR working is intercepting its initialization javascript. As it turns out, DWR unofficially supports this, but does not document it. Assuming you’re using jQuery, adding this javascript to either an external .js file, or directly in the page, should do the trick:

jQuery(document).ready(function() {
    if (typeof(PTPortalPage)!="undefined") {
        //TODO: this check won't work if JS in gateway
        dwr.engine._urlRewriteHandler = doInterceptUrl;
    } else if (document.getElementById("gatewaybase") != null) {
        dwr.engine._urlRewriteHandler = doInterceptUrl;
    }
});
    
function doInterceptUrl(data) {
    // this function intercepts http requests from DWR
    // and gateways them using an anchor on the main page
    //TODO: is there a better way? AJAX request for base?
    var rooturl = document.getElementById("gatewaybase").href;
    var nongateroot = document.getElementById("gatewaybase").target;
    
    data = data.replace(nongateroot, rooturl);
    return data; 
};

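This intercepts every URL that DWR requests from your page and replaces the non-gatewayed DWR base path (the anchor’s target attribute) with the properly gatewayed one (the anchor’s href, which gets gatewayed along with the rest of the portlet), using the anchor tag you added to your portlet.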

Also note that you may need to modify your web.xml with the following init-param for DWR:

<servlet>
  <servlet-name>dwr-invoker</servlet-name>
  <display-name>DWR Servlet</display-name>
  <servlet-class>org.directwebremoting.servlet.DwrServlet</servlet-class>
  <init-param>
     <param-name>debug</param-name>
     <param-value>true</param-value>
  </init-param>

  <init-param>
     <param-name>crossDomainSessionSecurity</param-name>
     <param-value>false</param-value>
  </init-param>
</servlet>

I’m sure there are probably a million ways to build a better mousetrap when displaying tables with Java, but using the above technologies was quick, easy and rewarding.

Running a Game Server on Amazon EC2

2009-05-10T00:00:00+00:00

Yes, it’s true that I haven’t posted in quite a while. My bad. Hopefully you enjoy this little tidbit, even though it’s my first non-ALUI post on this blog…

Game Night

Recently, after a long workday, some co-workers and friends of mine started discussing a “game night”. All of us have jobs and lives outside of work and are no longer college students, but all of us remember the glory days of Counter-Strike, Quake and the like.

Of course, none of us has anything more than a decently performing laptop, and all of us have an aversion to spending money. And so it was that we happened upon a game called F.E.A.R. Combat. Perhaps it’s a bit long in the tooth, and perhaps it is behind the times, but it sure is fun, and it sure is FREE.

The point of that longwinded story is that every other Wednesday has become game night, or more specifically, F.E.A.R. night. And since we are all computer geeks, and we all work in the web technology world in some way or another, someone brought up the idea of running a F.E.A.R. instance on Amazon’s EC2.

Recently, I had a bit of time on my hands and an urge to try it out, and thus this post was born…

Starting and Connecting to an EC2 Instance

First, I signed up for Amazon EC2 (actually, I had already signed up when I wrote this blog post). The invaluable Getting Started Guide contained all the basics I needed to sign up, start instances, make images, etc.

Next, I made sure to download the ElasticFox plugin for Firefox. This makes managing and running EC2 instances much easier. If you want to get started quickly, here is a great Getting Started Guide for the plugin.

After installing and setting up ElasticFox, I was ready to start up a base image. I chose to run an Ubuntu image, since package management and documentation are readily available. This site has a few base AMIs which I used to get started. I simply searched for the AMI ID I wanted and followed the ElasticFox instructions on starting an instance.

One thing I had to keep in mind was that I wanted to allow the proper TCP/UDP access so that people could connect to my server. In this case, I allowed the following ports:

Application	Protocol	Port
SSH	TCP	22
HTTP	TCP	80
F.E.A.R.	TCP	27888
TeamSpeak	UDP	8767
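Once the security group is set up, it’s handy to confirm from your own machine that the TCP ports are actually reachable. Here is a minimal sketch that simply tries a TCP connect with a timeout. The class name PortCheck is hypothetical, and a successful connect only proves something is listening on TCP; UDP services such as TeamSpeak on 8767 cannot be checked this way, and if F.E.A.R. only answers on UDP this check will report its port as closed.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortCheck {

        // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
        static boolean isTcpPortOpen(String host, int port, int timeoutMillis) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return true;
            } catch (IOException ex) {
                // Refused, timed out, or unreachable.
                return false;
            }
        }

        public static void main(String[] args) {
            // Pass your Elastic IP as the first argument; defaults to localhost.
            String host = args.length > 0 ? args[0] : "127.0.0.1";
            // TCP ports from the table: SSH, HTTP, F.E.A.R.
            int[] tcpPorts = { 22, 80, 27888 };
            for (int port : tcpPorts) {
                System.out.println(host + ":" + port + " open? " + isTcpPortOpen(host, port, 3000));
            }
        }
    }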

The other thing I did, in order to keep things simple for future connections, was associate a static IP with my running instance (these are called Elastic IPs in EC2 parlance). The procedure is mind-numbingly simple in ElasticFox, so I’ll refer you to the Getting Started Guide if you need more information on how to do it.

At first, I had some issues actually connecting to my image using SSH (ElasticFox will auto-launch an SSH client). The problem ended up being that I was using PuTTY for SSH, and it does not recognize the private key format used by EC2. Doh. Fortunately, you can convert your keys using PuTTYgen. Amazon was nice enough to dedicate an appendix in their Getting Started Guide to this exact problem.

Problem solved.

My next steps were the steps you’d take to install and configure any server so that it could host F.E.A.R. Combat, a TeamSpeak server (an in-game voice communication server), and a Munin monitoring instance (so I could get some stats to see how well EC2 performed in a real world scenario).

Preparing for the Installation

After everything was running, I wanted to make sure I had the prerequisites to run F.E.A.R. and install any optional components. As it turned out, my Ubuntu instance was fairly locked down. In order to download and install what I needed, I had to update my sources list to include the universe and multiverse repositories.

Once this was done, I updated the list of installable applications via:

apt-get update

And installed some C++ compatibility libraries for the dedicated server via:

apt-get install libstdc++5

At this point I was all set to install the base components of my server.

Installing and Configuring F.E.A.R. Combat

My first step was to download the F.E.A.R. dedicated linux server here.

Since I already had the prerequisites installed (see above), all I had to do was extract the archive to disk, modify the included start.sh to my liking (I used a custom configuration via the optionsfile argument, used nohup to prevent it from shutting down accidentally, etc.), and start the server.

Installing and Configuring TeamSpeak

TeamSpeak is an in-game voice communication server. Since my game night buddies are mostly remote, I figured it would be nice to provide some voice communication for trash talk and strategy.

I logged in as root and ran:

apt-get install teamspeak-server

A teamspeak user was added, the server started and I was ready to rock and roll. As for configuring the server… it seemed to work okay, so I didn’t bother =). However, you can find some instructions for configuration here.

Installing and Configuring Munin

Munin is a monitoring tool that allows you to capture CPU, memory, process data, and all kinds of other stats in 5 minute increments. It can be used for monitoring many systems with many kinds of statistics, but that is outside of the scope of this post. For now let’s just say I wanted a simple way to capture statistics for my AMI.

The installation also turned out to be very simple. It involved using apt-get to install Apache and Munin. Rather than regale you with the details, I’ll just point you to this simple tutorial.

Note: I did have some issues getting Munin to work at first, but once I made sure my local node was listening on the loopback adapter only, it seemed to work. See Section 1.3 (Configuring the Node) of the tutorial for details.

Creating and Registering an AMI

At this point I had everything I needed to run a game server. I tested client connections to my Teamspeak host, Apache server hosting Munin, and the F.E.A.R. server itself and everything worked great.

The only problem was that if I ever shut down the running instance, all of my work would be gone and I would have to re-install everything the next time I wanted to host a game. Thus, I needed to create an AMI from my base image.

The procedure for this was relatively simple, and well documented in the Getting Started Guide here. However, there are a few things you might want to know before you dive in:

You’ll need at least a basic working knowledge of Amazon S3, since you’ll need it to store your finished AMI. I suggest grabbing S3Fox and using it to create an Amazon S3 bucket. This process is fairly simple, but still a minor annoyance.
The base image I used did not have the EC2 API tools installed on it, which meant that I could not register my EC2 instance without installing them. I did this by running:

apt-get install ec2-api-tools

After that, all I needed to do was set my JAVA_HOME environment variable and follow the rest of the Getting Started Guide.

Final Thoughts

Security

As you probably noticed, the configuration on my AMI is hardly secure. I ran things as root and didn’t bother changing passwords, restricting IPs, etc. I offer no excuses, save my own laziness.

However, the nice thing about an AMI is that an instance of it only runs on game night for a few hours. I’m hardly worried about being hacked. Any time there is a problem, all I have to do is terminate the instance and boot up a fresh one from the AMI. Since nothing is persistent, and there are no credentials on the box, this is great.

Imagine if I had set up a dedicated server for this. I’d have to worry about all kinds of hardening due to the longevity of the configuration. Yuck.

Going Further

Of course, as always, there are some things I could have done that would have taken this post further:

Capturing “real” usage stats and anecdotal performance data (is this a feasible, reliable, and cost-effective solution?). This will probably follow in a future blog post (after the next “game night”).
Writing a wrapper for the AMI so that it can be started and stopped on-demand via the web. Someone could definitely write a dedicated hosting web site if they could figure out all the possible licensing restrictions.

Otherwise… that’s about it. After reading this you should be in a position to create your own AMIs using EC2. The overall experience for me was rather pleasant, though there were some things I think Amazon could have done to simplify the process.

Adventures in Search - Part 4 - Search Node Operation

2008-12-08T00:00:00+00:00

You know, it’s funny how some things can seem extremely complicated and then when you crack them open they turn out to be fairly easy to understand. Remember the mystery behind how a G.I. Joe stayed together, but then you broke one and found it was simply a rubber band holding his guts together? Turns out search server is much like that. A terribly complicated-seeming C program that, fundamentally, is held together by a rubber band.

What is a Search Node?

From my previous posts, you’ve probably inferred that search nodes are the fundamental building blocks of ALUI’s search capability. In fact, search nodes are actually the *only* building blocks of the search capability. Everything you need to set up a clustered or non-clustered search environment is contained in one simple install, a few directories and an executable.

Stripped down, this seemingly complicated system amounts to the following:

An executable running somewhere listening for requests
An open TCP port that receives text-based search queries
Two directories that contain everything search needs to operate: a cluster directory and a node directory.

Here’s a more complicated picture of what I just listed:

Figure 1 - Search Node Architecture

Search Requests

Let’s start with the executable. When you start it (from the bin directory on the command line in a *nix environment, or as a service on Windows), it uses environment variables to find its various configuration files, starts up a process, opens a TCP socket on whatever port you tell it to, and sits around waiting for stuff to happen.

The “stuff that happens” turns out to also be fairly simple. Search server doesn’t actually know anything about portals, documents or anything else for that matter. It sits around and waits for one of two things:

An index request (put some information into the search index so it can be searched for later)
A search request (search for something in the current index)

These two things are specified in a text-based custom language over a TCP port. What I mean is that you, Joe Six-pack, could open up a telnet session to your search server port and type a query (an index request or a search request) freehand, were you so inclined. You would type something like the following:

( FIELDALIAS ptsearch,[2]PT1,[2]PT1_en,[0.1]PT2,[0.1]PT2_en,[0.1]PT50 ) (((ptsearch:a) TAG phraseQ OR (ptsearch:a*) TAG nearQ) AND ((subtype:"PTCARD")[0])) AND ((((@type:"PTPORTAL")[0]) OR ((@type:"PTCONTENTTEMPLATE")[0])) AND (((ptacl:"u2") OR (ptacl:"51"))[0]) AND (((ptfacl:"u2") OR (ptfacl:"51"))[0]))
METRIC logtf [1] RESULTS 10 PRINT FIELDS parentids,ptacl,ptfacl,PT51,PT56,@type,subtype,ancestors,PT58,PT7,PT53,abstracttype,
PT1,PT1_en,PT2,PT2_en,PT3,PT4,PT5,PT6,PT8,collab_properties,collab_project_url,collab_project_name,collab_icon_alttext_index,collab_acl,publisheduser,portletid TERMS 10000 results[1-10] KWIC 15

Obviously, this kind of query isn’t very pretty or intuitive, but the point is you could type it via telnet and search server would spit out an XML formatted response to your query. You can see these types of queries in your search node logs if you set your logging levels high enough. Lucky for you, the search API takes care of all of this heavy lifting and converts those XML results into the pretty HTML you see when you perform a search in the portal.
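If you’d rather script the telnet experiment than type it, here’s a minimal Python sketch. It assumes the node accepts one query per connection terminated by a carriage return and line feed (the way telnet sends a line), and that it answers with XML and then closes the connection or goes quiet until the timeout fires. Those framing details, and the `exchange`/`send_query` names, are my assumptions rather than documented search server behavior.

```python
import socket


def exchange(sock: socket.socket, query: str) -> str:
    """Send one raw query over an already-connected socket and return the node's reply."""
    # Assumption: the node reads a CRLF-terminated line, as telnet would send it.
    sock.sendall((query + "\r\n").encode("utf-8"))
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # the node closed the connection, so the reply is complete
                break
            chunks.append(data)
    except socket.timeout:
        # Assumption: if the node leaves the connection open, silence means it is done.
        pass
    return b"".join(chunks).decode("utf-8", errors="replace")


def send_query(host: str, port: int, query: str, timeout: float = 10.0) -> str:
    """Connect to a search node's query port, send `query`, and return the raw (XML) reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, query)
```

Calling `send_query` with your node’s host, the port from your search node configuration, and a query copied out of your search logs should return the same XML you would see in the telnet window.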

Building a Search Index

“Okay Ross,” you’re probably thinking, “I can run search queries over telnet to see what’s in my search index. That’s all well and good, but how does all that junk get in the index in the first place?”

How indeed. As I mentioned above, that junk gets in there via an index request, which is much like a search request (runs over a TCP port, follows a specific querying language), but allows whoever or whatever to put information into search instead of extracting it.

If you look closely at your Publisher content.properties file, Collaboration config.xml file or even at the portal database (PTSERVERCONFIG table), you will see an “Indexing Search Port” and “Indexing Search Host” specified. What these values really do is tell each product (Portal, Publisher, Collaboration) where to submit their new document data (i.e. when someone publishes something, uploads something to a project, or a crawler runs). That data is submitted over the same TCP port to the same type of node that handles queries.

How an Indexing Request Works

Here’s a brief explanation followed by a couple of pictures:

An index request is submitted to a search node. Since that search node may be part of a multi-node cluster, the request goes straight to the cluster file system (remember, all nodes share this directory).
The request is assigned a transaction ID and added to a queue on the cluster (you can see this in the form of the requests folder in the cluster folder of your search node).
Every search node in the cluster independently maintains its own transaction ID, which corresponds to the last index request it processed. These nodes continually poll the shared requests folder. If they find a transaction that has a higher ID than the one they maintain, they pull the information for that transaction and add it to their local search index. They then update their local transaction ID to match the transaction they just processed.
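To make those three steps concrete, here’s a small Python simulation. The shared requests folder is an ordinary directory, each queued request is a file named after its transaction ID, and each node keeps its last applied ID in memory. The file layout, the `.txn` extension, and the `submit`/`SearchNode` names are all hypothetical; this illustrates the polling idea, not the search server’s actual on-disk format.

```python
import os
import tempfile
from pathlib import Path


def submit(requests_dir: Path, payload: str) -> int:
    """Queue an index request on the shared cluster directory; return its transaction ID."""
    requests_dir.mkdir(parents=True, exist_ok=True)
    # Write the payload to a temp file first so no poller ever sees a half-written request.
    fd, tmp = tempfile.mkstemp(dir=requests_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(payload)
    next_id = max((int(p.stem) for p in requests_dir.glob("*.txn")), default=0) + 1
    try:
        while True:
            try:
                # os.link fails if the target already exists, so claiming an ID is atomic even
                # when several submitters race (assumes the shared file system supports hard links).
                os.link(tmp, requests_dir / f"{next_id}.txn")
                return next_id
            except FileExistsError:
                next_id += 1
    finally:
        os.unlink(tmp)


class SearchNode:
    """One node: polls the shared queue and applies anything newer than its local ID."""

    def __init__(self, requests_dir: Path):
        self.requests_dir = requests_dir
        self.last_txn = 0  # local transaction ID: the last request this node applied
        self.index: list[str] = []  # stand-in for the node's local index directory

    def poll(self) -> int:
        """Apply every queued transaction newer than last_txn, in ID order; return the count."""
        pending = sorted(
            int(p.stem) for p in self.requests_dir.glob("*.txn") if int(p.stem) > self.last_txn
        )
        for txn_id in pending:
            payload = (self.requests_dir / f"{txn_id}.txn").read_text(encoding="utf-8")
            self.index.append(payload)  # "add it to the local search index"
            self.last_txn = txn_id
        return len(pending)
```

Note that the two nodes in a test like this never talk to each other: everything they agree on lives in the shared directory, which is the whole idea behind the cluster file system.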

You can actually see this process in real time by amping up your search logs and watching the transaction IDs increment when you upload a Collab document, create an admin object, etc. Here are a few PowerPoint diagrams I created of this process:

Figure 2 - Adding an index request to the cluster’s transaction queue.

Figure 3 - Updating a local search index from the transaction queue.

Conclusion

As far as node operation goes, that should clear up most of the mystery. At this point, you should understand most of the hows and whys of search operation. The last piece to this puzzle is the “checkpoint” feature, which I’ll review in the final exciting chapter of this blog series.

Using diff to avoid re-importing PTE's

2008-10-23T00:00:00+00:00

Let’s take a quick timeout from Search for a more basic post…

I don’t have a “Cool Tools” section of my blog, like some other notable ALUI bloggers, but I do know of a few “cool tools” that have helped me do my job. One of my favorites is a fancy diff utility called WinMerge.

(go download it now if you haven’t already)

One of the primary things I use it for is validating product upgrades. If you’re as lazy and/or paranoid as I am, you have probably paused during an ALUI upgrade when you saw the step “re-import the PTE”. As most of us know, re-importing a PTE is a mixed bag, as it comes along with a lot of dependencies and can frequently wipe out customizations to web services, portlets, etc. Worse yet, you never quite know what’s happening when you import.

What if we could analyze a PTE and figure out what changes were made so that we could:

make the changes ourselves
not bother re-importing
at least know what changes were going to be made to our existing data?

Turns out this is rather simple (and, obviously, involves WinMerge).

Let’s use a relevant example to demonstrate: a Publisher upgrade from 6.4 to 6.5. This is a minor version upgrade, so you would think there would be relatively few changes to the PTEs. Nonetheless, the install guide tells me to re-import, re-import, re-import.

Yuck.

Instead, I’ll take an alternate approach. First, I run the Publisher 6.5 upgrade installer as I normally would. However, once I get to the re-import step, I navigate to the ptcs/6.4/serverpackages directory of my previous Publisher install and grab the publisher.pte file therein. Next, I grab the same PTE file from my ptcs/6.5/serverpackages directory.

Now I have both default install PTEs. Any differences between them will be the changes due to the 6.4 to 6.5 upgrade. Since these PTEs are really just XML files with fairly obvious naming conventions, I simply open them up side by side in WinMerge and compare the differences…

As it turns out, the only changes to the Publisher package in 6.5 are some /jspell URLs that have been added to the gateway settings for some web services. Since I can read the new URLs in WinMerge, I can copy the gateway URLs and add them manually. Now I no longer need to import the PTE.

… and even if there were more changes and I had to re-import, I would be well informed of what they were before running the import.
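If you’d rather script the comparison than eyeball it, Python’s standard `difflib` module produces the same kind of line-by-line diff WinMerge shows. This sketch assumes the PTE files are multi-line XML readable as UTF-8 (if one is stored on a single line, pretty-print both first or the diff will be one giant changed line); the function names are my own.

```python
import difflib
from pathlib import Path


def pte_diff(old_text: str, new_text: str,
             old_label: str = "old.pte", new_label: str = "new.pte") -> list[str]:
    """Unified diff of two PTE exports, one list entry per output line."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


def changed_lines(diff: list[str]) -> list[str]:
    """Keep only added (+) and removed (-) lines."""
    body = diff[2:]  # the first two lines of a non-empty diff are the ---/+++ file headers
    return [line for line in body if line.startswith(("+", "-"))]


def compare_ptes(old_path: str, new_path: str) -> list[str]:
    """Read two PTE files from disk and return just their changed lines."""
    old = Path(old_path).read_text(encoding="utf-8", errors="replace")
    new = Path(new_path).read_text(encoding="utf-8", errors="replace")
    return changed_lines(pte_diff(old, new, old_path, new_path))
```

Pointing `compare_ptes` at the ptcs/6.4/serverpackages and ptcs/6.5/serverpackages copies of publisher.pte gives a short list you can read before deciding whether to import at all.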

Okay. We now return you to your regularly scheduled programming.

Adventures in Search - Part 3 - Search Administration

2008-10-20T00:00:00+00:00

Here we are, back again for another installment in my new blog “mini-series” about search. When I first started researching these posts (er… presentation, actually) the mini-series might have been more aptly titled “Lost” (not to be confused with ABC’s hit series, except for the mass confusion and never-ending storyline).

Last time I promised some hard-hitting dirt on Search Administration, and as always, I deliver on my blog promises. Okay, maybe hard hitting is a bit of a stretch… let’s talk about Search Administration. Most of you are probably familiar with the Search Cluster Manager and Search Service Manager in the Administrative Utilities drop down, but what are they and how do they work?

Let’s start tackling this with a diagram:

This diagram represents the end-all be-all of the search administration process. There are two parts:

Portal communication with a search node directly. This is the *Search Service Manager* (left side of the diagram). It is basically the portal asking a node about the health and topology of the search server and the node replying with this information. That contact node is extremely important, since it tells the portal front end which search nodes to query and how. The query is performed over the same port as any other search request, using the same mechanisms, and will show up in your search logs if you have them at a high enough verbosity.
Portal communication with the search topology indirectly. This is done via the *Search Cluster Manager* (right side of the diagram). I have heard much rumor and hearsay regarding the Search Cluster Manager, so let me clear up any misconceptions you might have with a properly bolded and formatted statement:

The Search Cluster Manager is a Java web application that reads and writes files on the Cluster File System.

What this really means is that the Search Cluster Manager is totally unnecessary. All administration can be done with the cadmin tool (in your search server’s bin directory) or via direct changes to specific initialization files (this is what the Search Cluster Manager does, anyway). So basically, the diagram above actually looks like this:

Wrap Up

So that’s it. Basically, the takeaways here are:

Search Cluster Manager is simply a prettied up version of the command line utility and does not need to run for search to function in the portal.
Search Service Manager controls the contact node and determines search topology for the portal front end.

Pretty simple, eh? Next up… some more interesting details on node operation.

Adventures in Search - Part 2 - Search Architecture

2008-10-15T00:00:00+00:00

Breaking Down a Search Collection

Last time I listed the various functions of search and reposted my first search slide. It was fairly simple, just an abstract “Search Collection” diagram. This time let’s break that diagram down a bit more:

What we see above is a less abstract view of the same diagram. Instead of one giant “Search” lump, we actually have an API, which makes the communication decisions, and a collection of search nodes. These nodes are just processes running somewhere, listening on a specific port. More about them later.

Partitions

That was pretty simple, right? Let’s throw in one more wrinkle before moving on to the complicated bits: Partitions. A partition is simply a grouping of search data into a set of nodes. Applying that concept to the above diagram, a partitioning of our search collection might look something like:

In other words, some of the data indexed by search (search results) will reside in Partition 1 on Node 1, and some of the data will reside in Partition 2 on Nodes 1 and 2. If we draw out the partitions in a more abstract manner, they look like this:

As you can see, there are two separate “bins” of data. When new information is indexed it goes into one of these two bins. It is important to note that the two partitions do not share any data, so when you search for something the results from Partition 1 and Partition 2 must be aggregated together. Duplicate data will, however, exist on Nodes 1 and 2 in Partition 2 (see above).
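Here’s a minimal Python sketch of what that aggregation implies for a query: every partition has to be asked once, any live replica in a partition will do because replicas hold the same data, and the per-partition hits are merged by score. The topology dictionary, node names, and scoring are hypothetical stand-ins; I’m not claiming this is how the search API actually picks nodes or ranks results.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float


def search_collection(
    partitions: dict[str, list[str]],  # partition name -> nodes holding a copy of it
    query: str,
    run_on_node: Callable[[str, str], list[Hit]],
    live: set[str],
    limit: int = 10,
) -> list[Hit]:
    """Query one live replica per partition and merge the results by score."""
    merged: list[Hit] = []
    for partition, nodes in partitions.items():
        replicas = [n for n in nodes if n in live]
        if not replicas:
            # A whole partition is unreachable, so part of the index can't be searched.
            raise RuntimeError(f"no live node for {partition}")
        # Replicas in one partition hold the same data, so asking one of them is enough.
        merged.extend(run_on_node(replicas[0], query))
    merged.sort(key=lambda hit: hit.score, reverse=True)
    return merged[:limit]
```

Raising when a partition has no live node, instead of quietly returning partial results, is a design choice in this sketch: a missing partition means a whole bin of documents silently drops out of the results.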

Search Coordination

With all this data moving about, being partitioned, searched, etc., you may be wondering how all of the search nodes communicate with one another. How do they know which partition they belong to, which node they are and what data has already been indexed?

The answer, it turns out, is extremely simple. They all must share at least one common set of files and directories, which I’ll call the “Cluster File System”. There is no special port-to-port communication, magic pixie dust, or any other way for search nodes to talk to each other. The cluster file system contains configuration information about the entire search topology, as well as a common queue/locking mechanism for incoming search indexing requests (more detail later). In other words, our previous diagram now looks like this:

And that’s really all there is to it. I’ve just covered all of the concepts you’ll need for a basic understanding of search.

Wrap Up

Alright, well we’ve covered the basics, but as you know, I’m never fully satisfied with the basics. Hopefully you now have a basic understanding of search operation and are ready to stick with me for the under-the-covers part. Most of the information I’ve provided to this point is covered in the docs, just (in my opinion) not very well. Next time look for some more detailed information on how search administration works and under-the-covers node operation.

Adventures In Search - Part 1 - What is Search?

2008-07-22T00:00:00+00:00

If you’ll recall, a few posts ago I promised to start fleshing out the presentation I gave at Participate in this blog. It’s a somewhat boring task, since I already came up with a presentation, but since I gave the presentation and posted it to my blog, there has been a lot more interest in it than I anticipated.

Apparently, everybody else is just as confused about what search is and how it works as I was. So how about we break out the flashlight and provide a point of reference for the folks who weren’t at Participate, or who prefer reference material to a presentation (I know I am in that camp).

What is Search?

As illustrated by the diagram below, when I talk about search, I simply mean a repository of information. On one side, information about the stuff we want to search is added to the repository and on the other side users or programs query that repository with requests for that information:

However, this is a somewhat simplified version of Search, since it interfaces with our portal in more ways than just the search box in the header.

Here is a comprehensive listing of search uses (that I know of):

Portal Search Box - When you search for things in the portal as a non-administrator.
Administrative Search - When you search for things as an administrator (folders, objects, etc)
Knowledge Directory - All of the folder/document browsing screens are built from the search index, not the database. You can change this in Portal Admin Options, but it’s not recommended, for performance reasons.
Content Crawlers - Every time a new document is submitted, metadata is updated, etc (basically every time a crawler is run)
Publisher - Used only when publishing/saving content. (Publisher search is actually just a database query.)
Collaboration File Upload - Used when uploading/indexing Collaboration documents.
Collaboration Search - Collaboration search actually does use Search.
IDK Search Factory - When you use the IDK to perform search requests.

That’s really it. In its basic operation, search is extremely simple. Next time I’ll start to delve into search architecture, specifically Nodes and Partitions.

Adventures In Search - Part 1 - What is Search?

2008-07-01T00:00:00+00:00

Since Fabien finally updated his blog with a nice write-up on the published content redirect, and I told him I would try and beat him to the punch, I think I now owe a post or two. This one has been sitting unpublished for a few days, so here we go…

One of my favorite parts of portal work is the fact that most portal implementations touch a variety of technologies: different programming languages, a variety of internal systems and many different authorization and authentication mechanisms. One of the most common of these is LDAP, whether it be in the form of Active Directory or some other LDAP server.

Unfortunately, I have the same problem I’d like to think most techies have: if I don’t work with something for 6 months or so, I tend to forget at least half of the important details. And since LDAP integration usually only happens every so often, mostly on new portal installs, I find I tend to forget the details only to have to re-learn them again.

Hopefully this post can serve as a reminder, and perhaps a primer for the uninitiated.

User Synchronization

For every user that logs into the portal, the portal has a record of their account in its PTUSERS database table. Even if the user lives in Active Directory, LDAP, or what have you, they must still be listed in this table in order to log into the portal (that is, they must exist as a user object in the portal).

The PTUSERS table is updated periodically by jobs run on configured authentication sources which essentially go out to an LDAP directory (or custom directory), ask for any new user accounts or groups and set them up in the portal.

One thing to note about this mechanism is that there is a time lag between when a user is created in a directory and when they can log into the portal. It would be nice if the authentication source were queried if the user was not found, and you can introduce customizations to do this, but OOTB you have to wait on the sync job.

Every authentication source in the portal also has an associated prefix, as well as a set of users and groups. The prefix is like (and can even be) a Windows domain. It is used to distinguish duplicate user names on different authentication sources.
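The sync behavior described above, including the lag and the prefix, fits in a few lines of Python. This is a toy model: the portal’s user table is a set of (prefix, user name) pairs, and `sync_auth_source` and `can_log_in` are hypothetical stand-ins for the authentication source job and the login check; neither name comes from the product.

```python
def sync_auth_source(prefix: str, directory_users: list[str],
                     ptusers: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Add directory accounts the portal doesn't know about yet; return what was added."""
    # Keying on (prefix, name) lets "jdoe" exist in two auth sources without colliding.
    added = [(prefix, name) for name in directory_users if (prefix, name) not in ptusers]
    ptusers.update(added)
    return added


def can_log_in(prefix: str, name: str, ptusers: set[tuple[str, str]]) -> bool:
    """Out of the box, only users already copied into the portal can log in."""
    return (prefix, name) in ptusers
```

Between a user’s creation in the directory and the next sync run, `can_log_in` returns False for them; that window is the lag.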

A Quick Reference

Unfortunately, it can be highly confusing when you’re trying to figure out what all the different user properties in the portal are, how they relate to LDAP/AD configuration, and what they actually mean. To that end, I have come up with the following table that (hopefully) explains each mapping in enough detail that you can see how it is built and what it is meant to do.

PTUSERS Column	AD Auth Source Value	LDAP Auth Source Value	User Profile Value	Description
NAME	Auth Source Prefix + User Name Attribute	Auth Source Prefix + User Name Attribute	Display Name	A "throw away" descriptive name for the user. Can be changed with a PWS or manually by the user.
MAPPINGAUTHNAME	User Name Attribute	User Name Attribute	none	A base mapping name for the user (without auth source prefix)
LOGINNAME	Auth Source Prefix + User Name Attribute	Auth Source Prefix + User Name Attribute	Login Name	The name a user has to type to log into the portal on the login screen (including the value that must be in the auth source drop down)
AUTHUNIQUENAME	objectGUID in AD (not specifiable)	User Unique Name Attribute (defaults to DN)	Remote Unique Name	A uniqueness constraint in the directory to tell the portal this is the same user, even if their login or name changes.
AUTHUSERNAME	User Authentication Attribute (usually userPrincipalName to guarantee cross domain uniqueness - not sAMAccountName which is only unique in the domain)	User Authentication Name Attribute (if not specified, defaults to DN - Distinguished Name)	Remote Authentication Name	The property used to authenticate the user with the directory. May not be used for authentication if an SSO solution is in place.
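To see how the columns fit together, here’s a Python sketch that builds a PTUSERS-style row from a directory entry using the mappings in the table. The dictionary keys (`dn`, `objectGUID`, the attribute names passed in) and especially the backslash used to join prefix and user name are assumptions for illustration; look at an actual PTUSERS row to see exactly how your portal combines the two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PtUserRow:
    name: str             # display name; a PWS or the user can change it later
    mappingauthname: str  # user name without the auth source prefix
    loginname: str        # what the user types at login (with the auth source picked)
    authuniquename: str   # stable identity in the directory
    authusername: str     # what the portal authenticates with against the directory


def map_directory_user(entry: dict[str, str], prefix: str, *, is_ad: bool,
                       user_name_attr: str,
                       auth_attr: Optional[str] = None,
                       unique_attr: Optional[str] = None,
                       separator: str = "\\") -> PtUserRow:  # separator is an assumption
    user_name = entry[user_name_attr]
    prefixed = f"{prefix}{separator}{user_name}"
    if is_ad:
        unique = entry["objectGUID"]  # AD: always objectGUID, not configurable
        auth_name = entry[auth_attr or "userPrincipalName"]  # usually userPrincipalName
    else:
        unique = entry[unique_attr] if unique_attr else entry["dn"]  # LDAP: defaults to DN
        auth_name = entry[auth_attr] if auth_attr else entry["dn"]  # LDAP: defaults to DN
    return PtUserRow(prefixed, user_name, prefixed, unique, auth_name)
```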

A couple of notes:

The help files on the authentication source configuration files are surprisingly helpful.
If you need to figure out your LDAP/AD structure, see user properties or run queries, I highly recommend this free tool: Softerra LDAP Browser.

See you next time.

Everything You Ever Wanted to Know About Search (But Were Afraid to Ask)

2008-05-15T00:00:00+00:00

As I mentioned previously on this blog, I’m at BEA Participate this week. In order to convince the powers that be to send me out here for free, I gave a presentation on Search Server. As usual, it was based on my initial lack of understanding of the product and subsequent painful discovery of its inner workings.

If you were here, I hope you enjoyed it. If you weren’t, I’ll try to write up a few detailed posts when I get some time. For now, here’s a link to the slide deck.

As for the conference itself, my hat is off to all the BEA folk who pulled it off. It was great to see our customers, partners and BEA employees at the event. It was definitely a reminder of the large number of smart, and fun, people I have the pleasure of working with on a regular basis. And special kudos to BDG, who pulled together a really slick portal for the conference.

Keeping Track of Your Job Logs

2008-04-25T00:00:00+00:00

Over time, there are a lot of bad things that can happen to a portal installed at a customer site. Unfortunately, philosophically speaking, we are all fighting entropy in our daily lives (think about how many times you’ve done laundry or taken out the trash). Portal maintenance is just another way of doing that.

Every portal instance that’s running properly has at least one Automation Server running in the background. The Automation Server is supposed to take care of automated tasks like Analytics data collation, the periodic synchronization of users and groups, and system maintenance.

The cool thing about Automation Server is that it will save a log of each job’s results so that you can view it later. These results are stored in the database in the PTJOBLOGS table. The uncool thing is that sometimes the job results are many pages long. So many pages, in fact, that a few runs can start to eat up space in your database at an enormous rate. There are a few things you should know about this, and a few things you should know about how to mitigate it.

Managing Job Log Space Usage

First, let’s review your space-saving options:

You can reduce the frequency of jobs that have verbose output. This is probably a crappy option, unless you really don’t need to run the job in question.
You can reduce the verbosity of your jobs. Every job in the portal contains a setting called Logging Level. This setting allows you to control what actually shows up in the result log for the job. Your options are Silent, Low, Normal and Verbose. Obviously, setting a job to silent can have a detrimental impact if it fails (you won’t know why), whereas setting it to verbose can have a detrimental impact on the amount of space it eats up. Some jobs don’t always need a high level of verbosity.
You can increase the frequency at which your job logs are cleaned up. By default, the portal will save job logs for 60 days before they are removed from the database. The *Weekly Housekeeping Job* will remove any job logs older than 60 days each time it is run. In my experience, this is quite a long time. Unfortunately, there is no way to change this value in the user interface of the portal. You can, however, change it in the database with the following SQL:

UPDATE PTSERVERCONFIG SET VALUE=<days> WHERE SETTINGID=15

Where <days> is the number of days you’d like to keep your job logs. Reducing this number will reduce the size of your PTJOBLOGS table. (Do I need to mention you should double-check the SETTINGID and back up the table before running this?)

What happens when the job log table gets too big?

If you haven’t mitigated the size of your PTJOBLOGS table, or you somehow forgot to schedule the Weekly Housekeeping Job for a while, or you have some other space problem in your database, you may run into some issues:

Your Weekly Housekeeping Job may fail. Unfortunately, this job tries to run a query on PTJOBLOGS and then delete the necessary rows. If you have an enormous number of rows, the SQL statement it uses to do this is a long running operation which may actually cause automation server to think the Weekly Housekeeping Job has become unresponsive and kill it (it will show up as Failed in your job history).
Your database may have space issues.

To correct these problems, you can do one of two things. You can either truncate the PTJOBLOGS table (not recommended, but possibly necessary if you’re in dire straits), or you can run something like the following (this is Oracle SQL):

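-- Remove log text for every job instance up to the last one that ran before the cutoff date
DELETE FROM PTJOBLOGS WHERE INSTANCEID <= (SELECT MAX(INSTANCEID) FROM PTJOBHISTORY WHERE RUNTIME < TO_DATE('01/01/2008','mm/dd/yyyy'));
-- Remove the matching operation history and job history rows
DELETE FROM PTJOBOPHISTORY WHERE RUNTIME < TO_DATE('01/01/2008','mm/dd/yyyy');
DELETE FROM PTJOBHISTORY WHERE RUNTIME < TO_DATE('01/01/2008','mm/dd/yyyy');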

Note that you’ll have to modify this a bit for it to work on SQL Server, and you will need to change the date as appropriate, but you get the idea.
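Since the original problem is a single DELETE that runs long enough for Automation Server to give up on it, one common mitigation is to delete in small, separately committed batches. Here’s that pattern sketched in Python against SQLite, with a two-table schema (PTJOBHISTORY with INSTANCEID and RUNTIME stored as ISO date text, PTJOBLOGS with INSTANCEID) that is a simplified stand-in for the real portal tables, and the same assumption as the SQL above that instance IDs grow over time. On Oracle the batch limit would be a `ROWNUM` condition and on SQL Server `DELETE TOP (n)`; try it against a copy of your database first.

```python
import sqlite3


def purge_job_logs(conn: sqlite3.Connection, cutoff: str, batch_size: int = 1000) -> int:
    """Delete PTJOBLOGS rows for jobs that ran before `cutoff`, one small batch at a time."""
    total = 0
    while True:
        with conn:  # each batch commits on its own, so no single statement runs long
            cur = conn.execute(
                """
                DELETE FROM PTJOBLOGS WHERE rowid IN (
                    SELECT rowid FROM PTJOBLOGS
                    WHERE INSTANCEID <= (SELECT MAX(INSTANCEID) FROM PTJOBHISTORY
                                         WHERE RUNTIME < ?)
                    LIMIT ?)
                """,
                (cutoff, batch_size),
            )
        total += cur.rowcount
        if cur.rowcount < batch_size:  # a short batch means nothing old is left
            return total
```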

Using Content Expiration to Improve Portal Performance

2008-04-21T00:00:00+00:00

Recently I was asked by a colleague what kind of tips I might have for portal administrators in order to compile a “top 10” list of portal tips and tricks. I hate to possibly ruin the surprise, in case it makes it to the Participate presentation, but one of the tips I often give people is to enable content expiration on their image server.

The portal has a ton of images and JavaScript that get provided to a user’s browser on each request, and this is not necessarily a good thing (check out Yahoo’s YSlow analysis of it). Luckily, those images and JavaScript files don’t change very often. Thus, we can normally tell the user’s browser to cache those images so it doesn’t have to ask for them every time.

Sometimes, especially on intranet portals, we can cache those images for days at a time. This is called configuring content expiration. Although it doesn’t improve “real” performance, it sure reduces the number of round trips someone’s browser has to make, thereby improving perceived performance. Here are a few links to give you specifics on configuring it in your image server of choice (courtesy of Google, of course):
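Under the hood, content expiration just means the image server adds caching headers to its responses. This Python sketch computes the two usual ones, `Cache-Control: max-age` and `Expires`, for a given number of days; the function name is mine, and the exact headers your IIS or Apache setup emits depend on how you configure it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def expiration_headers(days: int, now: Optional[datetime] = None) -> dict[str, str]:
    """Headers telling browsers they may reuse a static file for `days` days."""
    now = now or datetime.now(timezone.utc)  # must be UTC for the GMT date format
    lifetime = timedelta(days=days)
    return {
        # Relative lifetime in seconds, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
        # Absolute expiry date, for older HTTP/1.0 caches.
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }
```

For a seven-day lifetime that works out to `max-age=604800`; in Apache, a mod_expires line such as `ExpiresByType image/gif "access plus 7 days"` is one way to get the server to send equivalent headers for you.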

How to configure content expiration in IIS

How to configure content expiration in Apache

And if you’re looking for more details on caching/performance improvement, this is an interesting article.

Speaking of Participate, I’ll be giving a (fairly technical) presentation on Search Server, so be sure to give me a heads up if you’ll be attending.

Integrating Amazon S3 with the Portal

2008-04-10T00:00:00+00:00

Once again, my blog posting has been sparse for the past few weeks. But, as the old adage goes: good things come to those who wait. As you can see from the title of this post, good things come in the form of integrating your portal with Amazon’s S3 web service framework. Hopefully you think that’s cool. Otherwise you may as well stop reading right now.

Okay. For those of you still with me… down the rabbit hole we go…

What is S3?

Amazon S3 is a recently released pay-as-you-go “Simple Storage Service”. Hence the alliterative S3 moniker. Simply put, rather than worrying about your storage needs by buying more disk, you open an account with Amazon and they provide you with a REST/SOAP-based interface for storing as much content as you want. You pay for uploads/downloads and storage space (prices are on the main page, or you can check out this calculator). Relatively speaking, the pay-as-you-go model works great for a rapidly expanding site or for those who don’t want to deal with the maintenance headaches of keeping up their own storage space.

In terms of the portal, integration with S3 means a place we could store a virtually unlimited amount of document data. Ideally, this would be for any of the core ALUI portal services: Knowledge Directory, Collaboration, Publisher, etc.

But how could we go about accomplishing this…?

Integration Through the Document Repository

Ah yes, my old friend the Document Repository. If you’ll recall from previous posts, the repository is just that: a central place for storing document data in the portal. So what if we could somehow create our own repository, or modify the existing one, to upload our documents to Amazon S3 instead of the file system?

As it turns out, I’m guilty of a little unintentional foreshadowing. If you read my comments on repository configuration in Part 1 of Deconstructing the Document Repository, you’ll see a mention of the possibility of implementing other types of providers.

And guess what? That’s exactly what I ended up doing.

Ah, but it’s never quite that simple…

Unfortunately, in the course of writing this article I discovered a bug in the document repository. Apparently nobody’s ever written another provider for it. How do I know that? Well, there’s an explicit cast to the FileSystemProvider in one of the basic classes that enable DR operation (yes, I realize that sentence is entirely technical mumbo jumbo). In order to implement your own provider you have to patch the class to get it to work.

Hopefully, I can convince the guys in engineering to fix this minor bug, but until then I’ve included a patched version below, along with some install instructions:

Download this file here.
Realize it’s a .class file and think “what the heck are you having me do to my DR, Ross?”
Follow these instructions anyway.
Unpack your $PT_HOME/ptdr/6.x/webapp/dr.war file with your favorite zip editor (you’ll be doing this later, anyway).
Open WEB-INF/lib/dr.jar (I have previously recommended WinRAR for these types of things).
Replace the file of the same name under com/plumtree/dr/transport/glue/server.
Repack everything, restart and test the document repository as a sanity check. You should be good to go.
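If you would rather script the patch than do it by hand in a zip editor, the unpack/replace/repack sequence can be sketched with the standard java.util.zip classes. This is a minimal sketch under a few assumptions: ArchivePatcher is a hypothetical helper (nothing like it ships with the DR), it works on in-memory byte arrays (reasonable for a war of this size), and it rewrites every entry with default compression, so per-entry timestamps and compression settings are not preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: pulls entries out of, and swaps entries into, zip/jar/war archives. */
public class ArchivePatcher {

    /** Returns the bytes of a single entry inside an archive. */
    public static byte[] readEntry(byte[] archive, String entryName) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return in.readAllBytes(); // reads only the current entry's data
                }
            }
        }
        throw new IOException("entry not found: " + entryName);
    }

    /** Rebuilds the archive entry by entry, swapping in new bytes for entryName. */
    public static byte[] replaceEntry(byte[] archive, String entryName, byte[] replacement)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        boolean replaced = false;
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive));
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                out.putNextEntry(new ZipEntry(entry.getName())); // keeps the original entry order
                if (entry.getName().equals(entryName)) {
                    out.write(replacement);
                    replaced = true;
                } else {
                    in.transferTo(out); // copies the current entry unchanged
                }
                out.closeEntry();
            }
        }
        if (!replaced) {
            throw new IOException("entry not found: " + entryName);
        }
        return buffer.toByteArray(); // the zip stream is closed here, so the archive is complete
    }
}
```

Because dr.jar lives inside dr.war, the patch takes two passes: readEntry(war, "WEB-INF/lib/dr.jar") to pull out the jar, replaceEntry on the jar with the patched class (entry names use forward slashes: the package directory plus the .class file name), then replaceEntry(war, "WEB-INF/lib/dr.jar", patchedJar) to put it back. Write the result to a new file and keep the original as your backup.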

(P.S. – I don’t have original DR source, so this is decompiled and recompiled code. Did I mention this blog should come with a disclaimer?)

Signing Up For S3

The next step in this process is to get yourself an Amazon S3 account. Since it costs money, I’m not going to just go and provide you with mine (nice try though). Simply sign up here and note your access key and secret key in the email they send you (you’ll be using them later).

Now you have two choices… you can continue with me to the How it Works section, or you can skip right ahead to the How to Install section.

How it Works

A glutton for punishment, eh? Alright. Here we go…

Managing S3

First things first. There’s a great open source Java toolkit called JetS3t. It abstracts away all those fun SOAP/REST calls you might otherwise be making and gives you straight-up Java objects to play with. I highly recommend using it if you’re planning on any S3/Java development (there are toolkits for .NET and other platforms, too). Here are some links for you to play with if you want to know more:

Out of the box, JetS3t just works: it uses REST and HTTPS calls only, so security is fairly good. You can reconfigure it by looking at the advanced configuration guide, should you so choose. You can also take a look at the additional jars it requires (Apache Commons) and go from there.

Should you want to manage your S3 account (and included files) without the benefit of actual programming, I highly recommend you grab the S3Fox Firefox Plugin.

Creating a Repository

Before I go any further, let me provide you with this zip file of the appropriate jars for this project. The only code I’ve written is contained in s3provider.jar, and the source I’m about to explain is included in it as well.

The source itself is fairly simple. The DR allows us to implement a handful of interfaces (one for a document, one for a repository and one for a factory class that creates the repository). After that, all we have to do is manage where our documents go, how they are named, and implement the requisite functions of each class.

Really, there’s not much to the source. I simply generate a new document when asked by the DR, open output/input streams to new or existing documents using their IDs, and generate new GUIDs for new documents after they are uploaded. I used a couple of the pre-existing temp file management classes to make my job even easier. All in all, the hardest part was understanding how the interfaces were supposed to work without any documentation.

The GUIDs I used for unique document naming on upload were generated using the Java Uuid Generator, an open source, native Java implementation that worked quite well for my purposes.
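Since the DR’s interfaces are undocumented, here is the general shape of such a provider sketched in plain Java rather than against the real API. Everything in it is a hypothetical stand-in: StoredDocument, DocumentRepository, and RepositoryFactory are not the DR’s actual interface names, a ConcurrentHashMap plays the part of the S3 bucket (in the real provider those reads and writes would go through JetS3t), and java.util.UUID substitutes for the Java Uuid Generator.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryProviderSketch {
    /** Hypothetical stand-in for the DR's document interface. */
    public interface StoredDocument {
        String getId();
        InputStream openInputStream() throws IOException;
    }

    /** Hypothetical stand-in for the DR's repository interface. */
    public interface DocumentRepository {
        /** Stores the uploaded bytes under a newly generated ID and returns that ID. */
        String store(InputStream content) throws IOException;
        /** Returns the document with this ID, or null if there is none. */
        StoredDocument get(String id);
    }

    /** Hypothetical stand-in for the factory class the DR instantiates from its configuration. */
    public interface RepositoryFactory {
        DocumentRepository create(Map<String, String> settings);
    }

    /** Keeps documents in a map that plays the role of the S3 bucket. */
    public static class InMemoryRepository implements DocumentRepository {
        private final Map<String, byte[]> bucket = new ConcurrentHashMap<>();

        @Override
        public String store(InputStream content) throws IOException {
            byte[] bytes = content.readAllBytes();
            String id = UUID.randomUUID().toString(); // a fresh GUID names each upload
            bucket.put(id, bytes);
            return id;
        }

        @Override
        public StoredDocument get(String id) {
            byte[] bytes = bucket.get(id);
            if (bytes == null) {
                return null;
            }
            return new StoredDocument() {
                @Override
                public String getId() {
                    return id;
                }

                @Override
                public InputStream openInputStream() {
                    return new ByteArrayInputStream(bytes);
                }
            };
        }
    }

    public static class InMemoryFactory implements RepositoryFactory {
        @Override
        public DocumentRepository create(Map<String, String> settings) {
            return new InMemoryRepository(); // settings are ignored in this sketch
        }
    }

    public static void main(String[] args) throws IOException {
        DocumentRepository repo = new InMemoryFactory().create(Map.of());
        String id = repo.store(new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        byte[] back = repo.get(id).openInputStream().readAllBytes();
        System.out.println(id + " -> " + new String(back, StandardCharsets.UTF_8));
    }
}
```

The settings parameter is where a real factory would pick up values such as awsAccessKey and awsSecretAccessKey from the provider configuration; the sketch ignores it so it can run without an AWS account.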

Perhaps you were expecting more complexity? I was fairly impressed with the DR’s flexible implementation. Actually, I initially started this project by rolling my own “document repository”, but it seemed excessively complicated and I ended up sniffing around the DR source to see if I could do something easier. Turns out I could.

How to Install

Got bored of the How it Works section, didn’t you?

This will take a bit of effort, and faith, on your part, but I assure you it’ll be worth it in the long run (oh yeah, you might want to back up these files before you start messing with them):

First we need to add the appropriate jars to the war file:
1. Unpack your $PT_HOME/ptdr/6.x/webapp/dr.war file with your favorite zip editor.
2. Extract this zip file to your hard drive and add the contained jars to the war file’s WEB-INF/lib directory. See the How it Works section for details on what these jars do.
3. Re-pack the war file.
Open your $PT_HOME/ptdr/6.1/settings/config/dr-server.xml file and get ready to start fiddling.
Now, set the provider under your desired application node to amazon. The cool thing here is that you could register one or all of your document repository services to work with S3: simply change the entries for ptcollab, ptupload, etc. from the file system provider to the S3 provider.
Next, you need to configure the provider. Add a provider to the providers section of the configuration file, like so:
amazon true com.plumtree.dr.provider.amazon.S3Factory ptupload false XXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Note that you will have to add an application section for each application you switched over to the S3 provider. Configuration is simple enough, though; you will really only need to change three things:

encrypted - Are your AWS access key and secret key encrypted? Obviously in a production environment they should be, but for testing purposes I normally leave them in plain text.
awsAccessKey - your Amazon S3 public access key
awsSecretAccessKey - your Amazon S3 secret access key

You can encrypt your keys by using this utility I have provided (extract zip, run bat file or convert to shell script). Incidentally, this can also be used to change your DR passwords (but don’t forget to change them on both sides of the wire).

So Here We Are

At this point you should have a working repository that integrates with an unlimited, web-based file system. If you’re a dork like me, you think this is great and are glad you just spent a couple hours of your free time figuring out how to set it up. Otherwise, you probably didn’t make it through the entire article.

Amazon S3 seems to have some amazing potential. The entire field of distributed services is an exciting area that could eventually change the way we think about corporate IT, and even the way we do business (wow, I just sounded like a marketing bobble-head for a second there, didn’t I?). Hopefully this post will give you some interesting insight into ways you can leverage your portal implementation with some of these new technologies, or at the very least, provide you with some information on S3.

Querying the Portal Database using the Server API

2008-02-29T00:00:00+00:00

I love the ALUI Server API. It’s robust, fairly easy to use, cross-platform, and powerful. Unfortunately, it still has its limitations. One of the biggest limitations is the same limitation that plagues most object models built on top of a database layer: its inability to be a database.

No matter how brilliant the design of the object hierarchy, there will always be situations where running a SQL query to get information would be a lot simpler. Recently, I ran into one of those situations and was lucky enough to know how to circumvent the “rules”. What follows is an analysis of that situation.

As usual, this post has me thinking I should just put “this is not supported” in my blog description.

The Problem

I was recently asked, “How can I get a list of all the portlets currently in use in my portal?” In other words, could I come up with a table of pages, their parent communities, and the portlets on them for the entire portal.

Using the server API, creating such a table is possible via the following logic:

Loop through all communities in the portal
For each community, get a list of pages
For each page, get a list of portlets.
Look up each portlet name and ID.

While this is certainly doable, you don’t need big O notation to see the nested for-each loops, the large volume of data, and the potential for this code to chew up a ton of CPU cycles and make quite a few database queries. In a production environment, this just doesn’t seem feasible.
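To put a number on “quite a few database queries,” here is the four-step loop above run against a tiny in-memory model. NestedLoopInventory and its lookup methods are hypothetical stand-ins for the Server API calls (they are not the real IPTSession methods, and the sample data is made up); each lookup bumps a counter so you can see how the round trips add up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the portal object model (not the ALUI Server API). */
public class NestedLoopInventory {
    static final Map<String, List<String>> PAGES_BY_COMMUNITY = new LinkedHashMap<>();
    static final Map<String, List<Integer>> PORTLETS_BY_PAGE = new LinkedHashMap<>();
    static final Map<Integer, String> PORTLET_NAMES = new LinkedHashMap<>();

    static {
        PAGES_BY_COMMUNITY.put("HR", List.of("HR Home"));
        PAGES_BY_COMMUNITY.put("Sales", List.of("Sales Home", "Leads"));
        PORTLETS_BY_PAGE.put("HR Home", List.of(100));
        PORTLETS_BY_PAGE.put("Sales Home", List.of(100, 101));
        PORTLETS_BY_PAGE.put("Leads", List.of(102));
        PORTLET_NAMES.put(100, "News");
        PORTLET_NAMES.put(101, "Weather");
        PORTLET_NAMES.put(102, "Pipeline");
    }

    /** Simulated API round trips; each lookup method below increments it once. */
    static int roundTrips = 0;

    static List<String> listCommunities() { roundTrips++; return new ArrayList<>(PAGES_BY_COMMUNITY.keySet()); }
    static List<String> listPages(String community) { roundTrips++; return PAGES_BY_COMMUNITY.get(community); }
    static List<Integer> listPortlets(String page) { roundTrips++; return PORTLETS_BY_PAGE.getOrDefault(page, List.of()); }
    static String portletName(int portletId) { roundTrips++; return PORTLET_NAMES.get(portletId); }

    /** Communities, then pages, then portlets, then names: the four steps listed above. */
    static List<String> buildInventory() {
        List<String> rows = new ArrayList<>();
        for (String community : listCommunities()) {
            for (String page : listPages(community)) {
                for (int portletId : listPortlets(page)) {
                    rows.add(portletName(portletId) + " | " + community + " | " + page);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = buildInventory();
        rows.forEach(System.out::println);
        System.out.println(roundTrips + " round trips for " + rows.size() + " rows");
    }
}
```

With this sample data (2 communities, 3 pages, 4 portlet placements) the loop makes 10 round trips to produce 4 rows: 1 for the community list, 2 for page lists, 3 for portlet lists, and 4 name lookups. In general that is 1 + communities + pages + placements, whereas the single SQL statement that follows costs one query no matter how large the portal gets.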

But oh, if I only had access to the database. I could write a complex query that would join a few tables together and give me what I want. One simple SQL statement. Here is the statement I would write to produce the described table:

SELECT pagegadgets.GADGETID, gadgets.NAME AS GADGETNAME,
communities.NAME AS COMMUNITYNAME, pages.NAME AS PAGENAME FROM 
PTPAGEGADGETS pagegadgets LEFT JOIN PTPAGES pages 
    ON pagegadgets.PAGEID=pages.OBJECTID 
INNER JOIN PTGADGETS gadgets 
    ON pagegadgets.GADGETID=gadgets.OBJECTID 
INNER JOIN PTCOMMUNITIES communities 
    ON pages.FOLDERID = communities.FOLDERID 
ORDER BY gadgets.NAME, communities.NAME, pages.NAME ASC

Some of you are probably thinking, “How about I just create a remote portlet that connects directly to the portal database?”, which you can do, and some customers have. However, you lose the ability to combine this table with other API code, the portability of the Server API libraries, and the cross-platform capability to execute the query.

How about I show you a way to use the Server API instead?

Casting to an Internal Session

It turns out that this process is much easier than you think. The first part involves getting an internal session object. An internal session is simply a back end class that we aren’t expected to use. It provides a lot of goodies that aren’t available to normal server API IPTSession objects. To get an internal session, we simply need to know about it. This means the following imports:

import com.plumtree.server.impl.core.PTSession;

import com.plumtree.server.impl.core.InternalSession;

and casting an IPTSession object like so:

InternalSession iSession = ((PTSession) session).GetInternalSession();

Congratulations. You have an internal session object. Intellisense will show you all kinds of undocumented goodies related to this object. Given the post topic, today we are mainly interested in the database querying ability…

Running a Query

I could write something long and witty to explain the rest of the code, but it hardly seems necessary. Here is the entire source (Java) for a tag which does as described. The SQL is hard coded into the example:

package com.bea.services.tags;

import com.plumtree.openkernel.db.IOKDBCursor;
import com.plumtree.openkernel.db.IOKDBResultSet;
import com.plumtree.openkernel.db.IOKDBRow;
import com.plumtree.portaluiinfrastructure.tags.ATag;
import com.plumtree.portaluiinfrastructure.tags.TagType;
import com.plumtree.portaluiinfrastructure.tags.metadata.*;
import com.plumtree.server.*;
import com.plumtree.server.impl.core.InternalSession;
import com.plumtree.server.impl.core.PTSession;
import com.plumtree.taskapi.portalui.TaskAPIUIUser;
import com.plumtree.uiinfrastructure.activityspace.AActivitySpace;
import com.plumtree.xpshared.htmlconstructs.PTStyleClass;
import com.plumtree.xpshared.htmlelements.*;

public class PortletLocationTag extends ATag {

    public static final ITagMetaData TAG = new TagMetaData("portletlist",
            "This tag lists portlet locations.");

    public static final OptionalTagAttribute PORTLETID = new OptionalTagAttribute(
            "portletId", "A specific portlet ID to query by.",
            AttributeType.STRING, "");
    
    public static final OptionalTagAttribute MAXROWS = new OptionalTagAttribute(
            "maxRows", "The maximum number of rows in the query.",
            AttributeType.STRING, "500");

    private static final String PORTLET_LOCATION_QUERY = 
        "SELECT pagegadgets.GADGETID, gadgets.NAME AS GADGETNAME, "
            + "communities.NAME AS COMMUNITYNAME, "
            + "pages.NAME AS PAGENAME FROM "
            + "PTPAGEGADGETS pagegadgets "
            + "LEFT JOIN PTPAGES pages ON "
            + "pagegadgets.PAGEID=pages.OBJECTID "
            + "INNER JOIN PTGADGETS gadgets ON "
            + "pagegadgets.GADGETID=gadgets.OBJECTID "
            + "INNER JOIN PTCOMMUNITIES communities ON "
            + "pages.FOLDERID = communities.FOLDERID ";
    
    private static final String WHERE_PORTLET_ID = "WHERE gadgets.OBJECTID = ";
    private static final String ORDER_BY_GADGET = " ORDER BY gadgets.NAME, communities.NAME, pages.NAME ASC";
    private static final String ORDER_BY_COMMUNITY = " ORDER BY communities.NAME, pages.NAME ASC";

    public ATag Create() {
        return new PortletLocationTag();
    }

    public TagType GetTagType() {
        return TagType.SIMPLE;
    }

    public HTMLElement DisplayTag() {

        if (!hasAdminAccess()) {
            return null;
        } // they don't have access

        // create a table for our result set
        HTMLTable result = new HTMLTable();
        result.SetWidth(CommonHTMLStrings.ONE_HUNDRED_PERCENT);
        result.SetBorder(CommonHTMLStrings.ZERO);
        result.SetCellPadding(CommonHTMLStrings.ONE);
        result.SetCellSpacing(CommonHTMLStrings.ONE);

        // get the user session
        IPTSession session = getSession();
        InternalSession iSession = ((PTSession) session).GetInternalSession();

        // run the query as a cursor
        String query = PORTLET_LOCATION_QUERY;
        
        // build the query based on tag options
        if (getPortletId() > 0) {
            query += WHERE_PORTLET_ID + getPortletId() + ORDER_BY_COMMUNITY;
        } else {
            query += ORDER_BY_GADGET;
        }
        
        // open the cursor and run it
        IOKDBCursor cursor = iSession.CreateCursor(query);
        int maxRows = getMaxRows();
        IOKDBResultSet results = cursor.Open(maxRows);
        
        // build the table of results
        HTMLTableRow tableRow = new HTMLTableRow();
        tableRow.SetStyleClass(PTStyleClass.LIST_SORT_HEADER_BG);

        HTMLTableCell tableCell = new HTMLTableCell();
        tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
        tableCell.SetStyleClass(PTStyleClass.LIST_SORT_HEADER);
        tableCell.AddInnerHTMLString("<b>Portlet Name</b>");
        tableRow.AddInnerHTMLElement(tableCell);
        
        tableCell = new HTMLTableCell();
        tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
        tableCell.SetStyleClass(PTStyleClass.LIST_SORT_HEADER);
        tableCell.AddInnerHTMLString("<b>Community Name</b>");
        tableRow.AddInnerHTMLElement(tableCell);
        
        tableCell = new HTMLTableCell();
        tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
        tableCell.SetStyleClass(PTStyleClass.LIST_SORT_HEADER);
        tableCell.AddInnerHTMLString("<b>Page Name</b>");
        tableRow.AddInnerHTMLElement(tableCell);
        
        result.AddInnerHTMLElement(tableRow);
        
        // output the results
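        for (int i = 0; i < results.GetCount(); i++) {
            // assumed result-set accessors: GetCount() and GetRow(i)
            IOKDBRow dbrow = results.GetRow(i);
            tableRow = new HTMLTableRow();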
            
            // row coloring
            if (i % 2 == 0) {
                // it's even
                tableRow.SetStyleClass(PTStyleClass.LIST_ITEM_TWO_BG);
            } else {
                // it's odd
                tableRow.SetStyleClass(PTStyleClass.LIST_ITEM_ONE_BG);
            }

            tableCell = new HTMLTableCell();
            tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
            tableCell.AddInnerHTMLString(dbrow.GetString("GADGETNAME"));
            tableRow.AddInnerHTMLElement(tableCell);

            tableCell = new HTMLTableCell();
            tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
            tableCell.AddInnerHTMLString(dbrow.GetString("COMMUNITYNAME"));
            tableRow.AddInnerHTMLElement(tableCell);
            
            tableCell = new HTMLTableCell();
            tableCell.SetVAlign(CommonHTMLStrings.MIDDLE);
            tableCell.AddInnerHTMLString(dbrow.GetString("PAGENAME"));
            tableRow.AddInnerHTMLElement(tableCell);
            
            result.AddInnerHTMLElement(tableRow);
        }

        return result;
    }

    private int getPortletId() {
        try {
            return Integer.parseInt(GetTagAttributeAsString(PORTLETID));
        } catch (Exception ex) {
            return -1;
        }
    }

    private int getMaxRows() {
        try {
            return Integer.parseInt(GetTagAttributeAsString(MAXROWS));
        } catch (Exception ex) {
            return 0;
        }
    }
    
    private IPTSession getSession() {
        return (IPTSession) GetEnvironment().GetUserSession();
    }

    private boolean hasAdminAccess() {
        return TaskAPIUIUser.HasAdminLinkAccess((AActivitySpace) this
                .GetEnvironment());
    }
}
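The only place tag input touches the SQL is the branch that appends the WHERE clause. Lifted out of DisplayTag into plain Java (PortletQueryBuilder and its method names are illustrative; the SQL strings are copied from the tag), it is easy to check a useful property of the design: because the portletId attribute is parsed to an int before concatenation, a value like 212 OR 1=1 never reaches the database as SQL. It simply fails to parse and the tag falls back to the unfiltered list.

```java
/** The query-building branch of PortletLocationTag.DisplayTag(), lifted into plain Java. */
public class PortletQueryBuilder {
    static final String PORTLET_LOCATION_QUERY =
        "SELECT pagegadgets.GADGETID, gadgets.NAME AS GADGETNAME, "
            + "communities.NAME AS COMMUNITYNAME, "
            + "pages.NAME AS PAGENAME FROM "
            + "PTPAGEGADGETS pagegadgets "
            + "LEFT JOIN PTPAGES pages ON "
            + "pagegadgets.PAGEID=pages.OBJECTID "
            + "INNER JOIN PTGADGETS gadgets ON "
            + "pagegadgets.GADGETID=gadgets.OBJECTID "
            + "INNER JOIN PTCOMMUNITIES communities ON "
            + "pages.FOLDERID = communities.FOLDERID ";
    static final String WHERE_PORTLET_ID = "WHERE gadgets.OBJECTID = ";
    static final String ORDER_BY_GADGET = " ORDER BY gadgets.NAME, communities.NAME, pages.NAME ASC";
    static final String ORDER_BY_COMMUNITY = " ORDER BY communities.NAME, pages.NAME ASC";

    /** Mirrors getPortletId(): anything that is not an integer (including null) becomes -1. */
    static int parsePortletId(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException ex) {
            return -1;
        }
    }

    /** Filters by portlet when a positive ID is given; otherwise lists every placement. */
    static String buildQuery(String rawPortletId) {
        int portletId = parsePortletId(rawPortletId);
        if (portletId > 0) {
            return PORTLET_LOCATION_QUERY + WHERE_PORTLET_ID + portletId + ORDER_BY_COMMUNITY;
        }
        return PORTLET_LOCATION_QUERY + ORDER_BY_GADGET;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("212"));
        System.out.println(buildQuery("212 OR 1=1")); // not an integer: falls back to the full list
    }
}
```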

Caveat Emptor

As Uncle Ben would say, “With great power comes great responsibility.” The power I gave you above may also allow you to perform INSERTs, UPDATEs, and DELETEs, which I strongly caution against. Not only that, but the InternalSession object doesn’t perform all those nifty security checks that happen when we use a normal session (notice the hasAdminAccess function), so make sure you either do your own access checks or limit the amount of information you provide.
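If you ever let query text come from somewhere other than a hard-coded constant, a coarse check like the one below can refuse obvious writes before the SQL reaches iSession.CreateCursor(...). ReadOnlyQueryGuard is a hypothetical helper, not part of ALUI, and it is only a keyword heuristic: it can be fooled, and it will also reject some harmless queries. It complements, and does not replace, an access check like hasAdminAccess().

```java
import java.util.regex.Pattern;

/** Hypothetical, deliberately coarse pre-check for hand-written SQL. Not a security boundary. */
public class ReadOnlyQueryGuard {
    // The statement must begin with the word SELECT.
    private static final Pattern LEADING_SELECT = Pattern.compile("SELECT\\b", Pattern.CASE_INSENSITIVE);

    // Keywords that write data or change schema (INTO catches SELECT ... INTO).
    private static final Pattern WRITE_KEYWORDS = Pattern.compile(
            "\\b(INSERT|UPDATE|DELETE|MERGE|INTO|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE|EXEC|EXECUTE)\\b",
            Pattern.CASE_INSENSITIVE);

    /** Accepts a single statement that starts with SELECT and contains none of the write keywords. */
    public static boolean isReadOnly(String sql) {
        if (sql == null) {
            return false;
        }
        String trimmed = sql.trim();
        if (trimmed.contains(";")) {
            return false; // refuse stacked statements outright
        }
        if (!LEADING_SELECT.matcher(trimmed).lookingAt()) {
            return false;
        }
        return !WRITE_KEYWORDS.matcher(trimmed).find();
    }

    public static void main(String[] args) {
        System.out.println(isReadOnly("SELECT NAME FROM PTGADGETS ORDER BY NAME ASC")); // true
        System.out.println(isReadOnly("DELETE FROM PTGADGETS"));                        // false
        System.out.println(isReadOnly("SELECT 1; DROP TABLE PTGADGETS"));               // false
    }
}
```

The word-boundary matching is why a column such as CREATED passes while the keyword CREATE is rejected.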

Happy querying.