Exploring

Firefox Caching

2007-09-04T16:29:00.000-04:00

Federico posted about some work he was doing on making Firefox not cache as many uncompressed bitmaps in memory. I was playing around with the cache stuff and noticed something: my Firefox cache is full of YouTube videos. YouTube videos aren't exactly the best thing for Firefox to cache. My internet connection is fast enough that streaming the videos works just fine. I suspect that most people who use online video frequently do so on a connection that can support streaming (otherwise, YouTube would be painfully slow, and they'd go do something else).

It turns out that eviction in Firefox's cache is based only on least-recently-used (LRU). So, let's say you have a 50 MB cache. Right now, all 50 MB of it is full of cached JavaScript, CSS, images, etc. You go to YouTube and start watching a 10 MB movie. 20% of your cache gets blown away. In all likelihood, you'll never view that video again.

Even worse is if you listen to a Flash-based media player. The MP3s that it downloads are cached just like anything else. So if you listen to 50 MB worth of music, your entire disk cache gets blown away.

LRU probably isn't the best technique to use here. I'm not sure how one would evaluate the various alternatives (what is a representative test set of browsing sessions?).
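A rough way to see the effect is to simulate a byte-budgeted LRU cache. The sketch below is Python, not Firefox's actual cache code; the LRUCache class, the 100 KB resource size, and the key names are all assumptions made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-budgeted cache that evicts the least-recently-used entry first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> size in bytes, oldest first

    def touch(self, key):
        """Mark an entry as recently used; returns True on a cache hit."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def store(self, key, size):
        """Insert an entry, evicting old ones until it fits. Returns evicted keys."""
        evicted = []
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)
        while self.entries and self.used_bytes + size > self.capacity_bytes:
            old_key, old_size = self.entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_key)
        if size <= self.capacity_bytes:
            self.entries[key] = size
            self.used_bytes += size
        return evicted


MB = 1024 * 1024
cache = LRUCache(50 * MB)

# Fill the cache with 500 small page resources of about 100 KB each
# (hypothetical sizes standing in for JavaScript, CSS and images).
for i in range(500):
    cache.store(f"resource-{i}", MB // 10)

# One 10 MB video pushes out the oldest fifth of the cache.
evicted = cache.store("youtube-video", 10 * MB)
print(f"evicted {len(evicted)} of 500 resources")
```

A policy that weighed object size or reuse count could be tried by changing only the eviction loop, which is one way to compare alternatives against a recorded browsing session.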

Yay Dual Homing!

2007-08-30T19:53:00.000-04:00

Today, we had our first drill with dual homing on reCAPTCHA. In Pittsburgh, the water main that serves the Carnegie Mellon area broke today, causing a complete water outage on campus. As a result, many servers were shut down. reCAPTCHA's servers were kept up, as they are production servers, but we were told that they might be shut down too.

It's times like these when you just love having a backup. We have a DNS service that does automatic health checking and routes away from unplanned outages. However, with DNS it takes a few minutes for these sorts of changes to take effect, so we proactively switched our traffic away from the Pittsburgh servers.

One of the funny things about using DNS for dual homing is how long it takes to really kick in. We're still getting requests to our Pittsburgh servers hours after we made the switch. This is one reason it's important that DNS not be your only load balancing solution (you need an L7 or L4 load balancer as well).

Facebook 2.0

2007-08-28T15:42:00.000-04:00

So if your profile says you are single, and looking for women, single women looking for men might soon get a higher ranking in search results? I'm not sure what other "intentions" Facebook might know about.

Of course, this could open up a whole new era of social networking: I'd call it AdFaces. If you feel that you are not showing up often enough in search results, you can bid for clicks on your profile with a CPC model. Or maybe Facebook can experiment with a cost-per-action model. Then we'll need a product like Facebook Analytics to improve profile conversion rate, and FaceSense to allow publishers to embed targeted profile advertisements on their website.

Making bugs... and fixing them

2007-07-17T01:13:00.000-04:00

Two interesting bugs from today.

First, you gotta be careful with order of operations. I wrote this code:

int someValue = ...;
storePref(MY_PREF_NAME, "" + someValue + 1);

The code looks innocent enough. However, order of operations kicks in here. String concatenation is left-associative, so the compiler translates this as (("" + someValue) + 1), or Integer.toString(someValue) + Integer.toString(1). So rather than adding one, we multiply by 10 and then add one :-). What I meant was "" + (someValue + 1). The fun part about this experience was that I had Neal Gafter sitting next to me to explain exactly what I'd done, and also to point out where this problem is discussed in his fantastic book Java Puzzlers (Neal gave me a copy, which I've been meaning to read).
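To make the two groupings concrete, here is a small Python sketch that spells out each one explicitly. Python itself refuses to add a string and an integer, so this only imitates what the Java expression does; the function names and the value 41 are made up for illustration.

```python
def concat_left_to_right(some_value):
    # Imitates Java's ("" + someValue) + 1: both operands end up as strings.
    return str(some_value) + str(1)


def concat_with_parentheses(some_value):
    # Imitates Java's "" + (someValue + 1): the integers are added first.
    return str(some_value + 1)


some_value = 41  # hypothetical value of the stored preference
buggy = concat_left_to_right(some_value)
fixed = concat_with_parentheses(some_value)
print(buggy, fixed)
```

For a non-negative value, the buggy string read back as a number is ten times the original plus one.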

In the "Fixing bugs" column, I was testing something out on IE 5.0 today (yes, five-point-oh, released in 1999. Sadly, it still has some market share). The box had Google.com as the homepage, and I noticed that it displayed a JavaScript error (for older versions of IE, displaying this error was a default setting). After I reported this, it turned out that it was actually an interaction with Google Desktop. Now, I don't expect that there are that many users with IE 5.0 and Google Desktop, but with millions of users, "not a lot" means thousands or tens of thousands of people.

Yahoo's and Microsoft's CAPTCHAs likely NOT broken

2007-07-10T19:39:00.000-04:00

BitDefender went a bit overboard in their claim about CAPTCHAs. Their statement was issued as a press release (which has clearly met their goal of getting press -- regardless of the accuracy of their statements). The article states that about 500 accounts are being created per hour. That is one CAPTCHA roughly every seven seconds, which is about the effort of one person solving CAPTCHAs by hand. If they had actually broken the CAPTCHAs of Hotmail and Yahoo, there would be tens of thousands of accounts every hour. The article also mentions that about 15,000 accounts have been created. At 2 cents per CAPTCHA, that's a $300 investment to solve the CAPTCHAs manually (this rate is easily obtainable in some countries). It's extremely unlikely that one could hire a person to actually break (rather than manually solve) the CAPTCHAs of Yahoo and Hotmail for this price. Also, if you're working on a virus-type program, one of the easiest ways to generate CAPTCHA solutions would be to use your infected users (e.g., make them type in a CAPTCHA once per day. If you integrate it into the web browser, it might not raise suspicion).

The information that BitDefender has published actually suggests that these spammers/virus makers have not beaten CAPTCHAs using OCR.
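The back-of-the-envelope numbers above can be checked in a few lines of Python. The inputs are the figures quoted from the article plus the 2-cent rate; the variable names are just for illustration.

```python
accounts_per_hour = 500        # rate quoted in the article
total_accounts = 15_000        # total quoted in the article
cost_per_captcha_cents = 2     # manual solving rate assumed in the post

# How fast one solver would have to work to sustain the reported rate.
seconds_per_captcha = 3600 / accounts_per_hour

# What it would cost to pay humans for every account created so far.
total_cost_usd = total_accounts * cost_per_captcha_cents / 100

# How long the campaign has been running at the reported rate.
hours_of_work = total_accounts / accounts_per_hour

print(seconds_per_captcha, total_cost_usd, hours_of_work)
```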

Life at Google

2007-06-27T14:42:00.000-04:00

This blog is pretty funny. It's sort of like what the Daily Show might say about Google -- the facts are mostly true (some are pretty out-dated), but they're twisted in the opposite direction of how things actually are.

The blog entry got me thinking about what I like and don't like about an internship at Google. One of my favorite things is the freedom to set my own hours. I personally have an aversion to waking up any time before 10am. Usually, I wake up, read some blogs, check personal email and reCAPTCHA support email (I can't check Google email remotely as an intern), then I walk to work around 11:30-12:30. Having the free meals every day (I rarely get to take advantage of breakfast, which ends at 9:30) is a huge plus. The blog article hinted at the end how huge of a factor the free food actually is. It's a relatively inexpensive perk that makes a huge difference.

The comments about how the developers' work areas are laid out are also really interesting. The first time I saw the Google layout, I was a bit surprised. "I thought I was getting an office!". I ended up really liking this in the end. Before Google, when working on the Mono project, the primary way to communicate with other people was IRC. When I had to ask a question, it wasn't always possible to get a response right away. At Google, my coworkers are sitting very close by. I can work something out on a whiteboard with them. I don't have to walk a long way to their office.

One thing the article didn't mention (probably because it's a problem that's worse at MSFT than Google) is that going into a big environment like Google can be intimidating. With open source, building things was always easy. ./configure; make; make install. The process takes about 10 minutes the first time, then 2-3 minutes every day, depending on how many changes there are. At Google (and I'm sure pretty much any place similar), checking things out can be an adventure. A simple build process is probably an advantage of working on an open source project, or at a smaller company.

At the end of the day, the thing I really enjoy about Google is the access to the vast repository of interesting code Google has to offer. Being able to see how a Google product works, under the hood, is just an amazing experience. I remember going snorkeling on an 8th grade trip to the Bahamas. The excitement of being able to see ocean life for the first time is very similar to my experience of being able to look into the moving parts of Google. Surely this isn't something unique to Google. I'm sure there are as many fascinating moving parts inside Microsoft, or many other large companies.

On another note, the reCAPTCHA launch went fantastically well. I was happy and relieved that we didn't have any embarrassing incidents like crashing under the load of Digg (Our servers handled it just fine!). We've had some exciting customers adopting our product. I hope to write more soon.

NYTimes Article on CAPTCHAs

2007-06-11T04:08:00.000-04:00

The New York Times is running an article today on CAPTCHAs. The article really misses some key points. For example, it talks about the CAPTCHAs on YouTube. YouTube's CAPTCHA is really, really bad. The CAPTCHA is mis-designed, using different colors to attempt to provide security. I can't imagine solving this as a color-blind user; it must be nearly impossible. Most CAPTCHA providers have migrated to using a monochrome CAPTCHA (for example Google, Yahoo and MSN). The way to create a challenging CAPTCHA today is to make segmentation difficult. This can be achieved without causing as much pain for humans.

Then there's this Asirra thing. Did anybody from the Times actually try it? Here's an unscaled image of what it looks like:

Now, you can hover over an image for a larger version. But now to solve one of these CAPTCHAs, you've got to hover over 12 images, and make a decision on each. Asirra is undeniably cute, but it's not clear that it's all that much easier than the current, well designed, CAPTCHAs. The security of Asirra is also unclear. It'd be interesting to see what happens if Asirra is ever put in front of a high value target (something that can be used to send email, host pagerank-gaining links, or host porn/warez). I have a feeling that some spammer would find a way to abuse a botnet and take advantage of some of the design issues in Asirra.

reCAPTCHA: A new way to fight spam

2007-05-23T16:31:00.000-04:00

You've probably seen a CAPTCHA before. It's those funky letters you have to enter before you sign up for an account on almost any website. I'm proud to announce a new type of CAPTCHA: reCAPTCHA (click to see a live demo!).

You might notice that reCAPTCHA has two words. Why? reCAPTCHA is more than a CAPTCHA; it also helps to digitize old books. One of the words in a reCAPTCHA is a word the computer already knows, much like a normal CAPTCHA. However, the other word is one that the computer can't read. When you solve a reCAPTCHA, we not only check that you are a human, but also use your answer for the other word to help read the book!

Luis von Ahn and I estimated that about 60 million CAPTCHAs are solved every day. Assuming that each CAPTCHA takes 10 seconds to solve, this is over 160,000 human hours per day (that's about 19 years). Harnessing even a fraction of this time for reading books will greatly help efforts to digitize books.
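The estimate works out as follows. This is just the arithmetic from the paragraph above written out in Python; the 60 million and 10 second figures are the post's own estimates, and the variable names are illustrative.

```python
captchas_per_day = 60_000_000   # estimated CAPTCHAs solved per day
seconds_per_captcha = 10        # assumed time to solve one CAPTCHA

# Total human time spent on CAPTCHAs in a single day.
human_hours_per_day = captchas_per_day * seconds_per_captcha / 3600

# The same amount of time expressed as years of round-the-clock effort.
human_years_per_day = human_hours_per_day / 24 / 365

print(round(human_hours_per_day), round(human_years_per_day, 1))
```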

reCAPTCHA provides an easy-to-use API for putting CAPTCHAs on your site. Installing is as easy as adding a few lines of code to your HTML and then making an HTTP POST request to our servers to verify the solution. We also wrote plugins for WordPress, MediaWiki, and phpBB to make it very easy to integrate.

One other interesting service reCAPTCHA provides is a way to securely obfuscate emails. Many sites display emails like bmaurer [at] foo [dot] com or use hacks with tables, javascript or encodings to get the same effect. Spammers are getting smarter and figuring out these tricks. Spammers are especially diligent at working around the strategies of well known open source software. Consider this warning on bugzilla.mozilla.org:

Although steps are taken to hide addresses from email harvesters, the spammers are continually getting better technology and it is almost guaranteed that the address you use with Bugzilla will get spam.

reCAPTCHA Mailhide provides a scalable solution to email obfuscation that can be widely deployed without being breakable. Mailhide provides a way to encrypt a user's email with a key only reCAPTCHA knows. reCAPTCHA will only display the email address when the user solves a CAPTCHA. With reCAPTCHA, I can display my email address as bmau...@andrew.cmu.edu. If you click on the three dots and solve a CAPTCHA, you can see my address. Mailhide provides a way for individual users to encode their email address as well as an API for services (like Bugzilla) to share an encryption key with reCAPTCHA.

If you're suffering problems with spam, take a look at reCAPTCHA. Not only can you solve your problems with spam, you can help preserve mankind's written history into the digital age!

LD_LIBRARY_PATH empty entries

2007-05-14T15:30:00.000-04:00

Many of us developers have a bashrc that has lines like:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/install/lib

I've always known that this isn't perfect, that one should check $LD_LIBRARY_PATH isn't empty, but had always thought it was just a minor point. If the variable is empty, the line above produces a value that starts with a colon, which means the list has an empty first entry. It turns out that the loader sees an empty entry as meaning the current working directory. This means that it looks there for libraries.

The reason I noticed this is that I was using sshfs to mount something on my workstation in Pittsburgh from my laptop in California. When I ran any command (for example "ls"), the loader would look for tons of libraries in the mounted directory, executing a stat for each one. A round trip between Pittsburgh and California is 90ms... so you can imagine everything was quite slow.

Of course, there are security implications too. I'm not that worried about a rogue directory on my laptop, but on shared systems (such as some of the university ones), I can imagine this being a risk.
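The rule for avoiding the empty entry is easy to state: only add the colon when there is already something in the list. Here is a minimal sketch of that rule in Python (rather than shell); the helper names append_search_dir and has_empty_entry and the example paths are hypothetical.

```python
def append_search_dir(existing, new_dir):
    """Append new_dir to a colon-separated list without creating empty entries."""
    # Dropping empty strings also cleans up a list that was already broken.
    parts = [part for part in (existing or "").split(":") if part]
    if new_dir not in parts:
        parts.append(new_dir)
    return ":".join(parts)


def has_empty_entry(path_value):
    """True if a non-empty list has a leading, trailing, or doubled colon."""
    return path_value != "" and "" in path_value.split(":")


# What the naive bashrc line produces when the variable starts out empty.
naive = "" + ":" + "/home/me/install/lib"
print(has_empty_entry(naive))

# The careful version never emits the stray colon.
print(append_search_dir("", "/home/me/install/lib"))
print(append_search_dir("/usr/local/lib", "/home/me/install/lib"))
```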

In the Bay Area...

2007-05-09T01:36:00.000-04:00

Starting this Saturday I'll be in the Bay Area, specifically Mountain View, for my internship at Google. While there, I'll be working on Google Calendar.

A new spam technique

2007-04-05T15:33:00.000-04:00

A spammer got very clever in terms of ways to make money:

Subject: Search on Google raise money for charity Dooniz is now affiliated with Google. That will permit to redistribute a part of the money made on Internet to charity foundations. Internet users can make a difference by search Google using Dooniz.com. A simple click can help children in difficulty or bring more money to cancer or climatic changes researches

This person basically created a homepage with loads of affiliate links, and claims that it helps charity (the site says 75% goes to charity. Yeah, right). Primarily, Google is advertised (using the site-specific search stuff that AdSense provides). People get convinced they are helping charity, when very little of the money, if any, will actually go there. They make it their homepage, and the dude profits.

Hopefully, Google will prevent the guy from cashing in on spamming...

Google Job Ads

2007-03-30T11:22:00.000-04:00

Google likes to advertise job positions with Adwords. I was searching for something about the HTTP protocol and encountered this Google ad:

Nice to see a sense of humor in the ad

Gnomefiles needs love

2007-03-02T17:05:00.000-05:00

Eugenia Loli-Queru sent me a quick note today that GNOMEfiles is in need of an owner. The site gets 25,000 pages daily on average. This seems like a pretty important resource for the GNOME community, something worth keeping.

Google's CAPTCHA Broken?

2007-02-28T22:44:00.000-05:00

A few months ago, I found a nice trick that let me read comments on my blog without polling for them. GData allows you to get an Atom feed of comments on your blog. For example, mine is http://bmaurer.blogspot.com/feeds/comments/default. I put this into Google Reader, and blog comments show up just like any other type of blog entry. Recently, I've noticed that, from time to time, I am getting spam comments. However, Google uses a CAPTCHA to protect its comments. This means one of two things:

Google's CAPTCHAs have been broken
Some spammers are willing to hire humans to break CAPTCHAs

The rate at which spammers post is very small, maybe one or two comments per month. I think this might support a theory that spammers are using humans (if they were using computers, I think it'd be easier to post on the blogs more often). However, Google may be using anti-spam filters in addition to the CAPTCHA (this would be easy enough for somebody to verify: just copy and paste some blatant spam into Blogger, and solve the CAPTCHAs). To be honest, I don't think blog spam would make enough of a profit to justify humans. Google is using the nofollow tag, so the links don't get any PageRank. I bet that spammers are able to break Google's CAPTCHA with a <1% success rate (see this paper from Microsoft Research on the importance of segmentation in CAPTCHAs: http://www.ceas.cc/papers-2005/160.pdf).

CMU dorm policy: Nerds gone wild?

2007-02-28T14:04:00.000-05:00

Recently Carnegie Mellon announced that it was going to test out a gender-neutral housing program next semester. It's hard to see how this can be all that shocking (most university housing is co-ed by room). Of course, there's always somebody with a ridiculous point of view:

Unfair or not, my fear is that nerdy kids at Carnegie Mellon might put aside writing computer language for the space program and attempt to brush up their knowledge of biology in the privacy of their own dormitories. This is wrong. Nerds should not be having love affairs with other nerds. There is always the danger that in the throes of nerd passion, their thick glasses will collide or else they will drop heavy laptops onto vulnerable body parts. [CMU Dorm Policy: Nerds gone wild?]

I'm glad to hear that some folks think of students at CMU as nerds who need to be protected from distractions such as members of the opposite sex.

Big Media DMCA Notices: Guilty until proven innocent

2007-02-07T12:05:00.000-05:00

It's no secret that media companies have started to hire companies such as BayTSP to automatically find file sharers and send letters to their ISPs. The goal of this is to use fear to persuade people to use legal methods of getting digital content.

Many ISPs, especially universities, trust the good faith of these companies and will automatically deactivate the Internet connection of anybody they get a notification for. As a personal project, and with the help of Carnegie Mellon's Information Security Office (which employs me to work on various computing security tasks), I decided to investigate the reliability of notices from companies such as BayTSP. The answer: the companies do not actually gather the data they claim to. Their standards for sending DMCA notices are very low.

In order to understand the issues, it's first necessary to have a basic understanding of BitTorrent. In order to download something via BitTorrent you download a ".torrent" file from any number of sites that index the content. This file contains a fingerprint for every piece of the file that you are attempting to download. It also contains a reference to a tracker. This tracker is the way that peers (the people downloading the content) find each other. After contacting the tracker, you contact each of the potential peers that the tracker shares with you (and other peers may contact you). The client then begins swapping parts of the file with each of the peers. What the media companies object to is that in the process of downloading the file, your client will offer parts of their copyrighted content to other users -- a violation of copyright law. In order to catch these violations, BayTSP advertises fake clients to the BitTorrent tracker and uses the list of peers which it gets back to find violations.

For my investigation, I wrote a very simple BitTorrent client. My client sent a request to the tracker, and generally acted like a normal BitTorrent client up to the point of sharing files. The client refused to download or upload any copyrighted content. It obeyed the law.

I placed this client on a number of torrent files that I suspected were monitored by BayTSP (For my own protection I don't want to identify the torrents used for this research. I used the fact that NBC is a client of BayTSP to find trackers. If you want to check if BayTSP is monitoring a torrent, look for IPs coming from ranges in test.blocklist.org). Because the university's information security office is very diligent about processing DMCA notices, I would be able to tell if the BayTSP folks sent notices based on this. With just this, completely legal, BitTorrent client, I was able to get notices from BayTSP.

To put this in to perspective, if BayTSP were trying to bust me for doing drugs, it'd be like getting arrested because I was hanging out with some dealers, but they never saw me using, buying, or selling any drugs.

The fact that BayTSP does not confirm that the client it is accusing actually uploads illegal content could cause false identification of innocent users. BitTorrent trackers work via a standard HTTP request, for example:

GET /announce?info_hash=579CC43E4D66D35AE22312985EA04275939AB477&peer_id=asdfasdfadfasdf&port=12434&compact=1

One easy way to make somebody look like a BitTorrent user would be to get them to go to a website with the code <img src="http://tracker.com:12345/announce?info_hash=579CC43E4D66D35AE22312985EA04275939AB477&peer_id=asdfasdfadfasdf&port=12434&compact=1" />. They'd be on the tracker, BayTSP would see their IP address, and it might send them an infringement notice. BayTSP might check that they are listening on the port they advertise (maybe even check for a BitTorrent handshake). If the user is using BitTorrent for legal purposes, you could just advertise a port they were listening on. More investigation is needed into exactly what triggers the notice.

One even easier trick you can use: the BitTorrent clients BayTSP uses support Peer Exchange. You can give them the name of another peer for them to rat out to the ISP.

At the end of the day, BayTSP (and probably other similar companies) are sending DMCA notices which claim that they detected a user uploading and downloading copyrighted files. This is a lie. They didn't catch the user in the act of uploading or downloading anything. A lying tracker, a peer using peer exchange, a hostile web page, or a buggy BitTorrent client could all result in a false DMCA notice.

If your ISP forwards a DMCA notice from these guys, point them here. This research suggests that they have no evidence of wrong-doing. If ISPs learn that the folks sending them DMCA notices are not being completely honest, they may be willing to reconsider their position about how they respond to the notices. The people I work with at Carnegie Mellon seemed willing to reevaluate their policies given this evidence. I believe that ISPs should require that any peer-to-peer related DMCA notice include a statement regarding exactly what evidence of sharing was found. Ideally, the notice should contain evidence that could be corroborated with log files (for example, "we found that the client at 123.1.2.3 uploaded 1 MB of file X to 4.3.2.1". The ISP may be able to check that there was 1 MB of traffic between these two clients).

A piece of good news for anybody who has gotten a bittorrent related notice from BayTSP: it doesn't seem like a studio could do much in terms of court action with the evidence BayTSP gives them.

For the technically minded, I thought I'd share some observations of the behavior of BayTSP's clients:

BayTSP's clients don't accept incoming connections, only send outgoing ones. I wonder what exactly this is for.
Some of the BayTSP clients claim to be using Azureus (and support Azureus extensions), while others run libtorrent. I'm not sure why they are doing this.
When BayTSP's clients connect to a BT user, they claim to not have downloaded any of the file, but refuse uploads. Not only does this behavior not make any sense for an actual user, but it seems like BayTSP would want to accept data, which might provide proof of infringement.
Some of the IP ranges I noticed coming from BayTSP were: 154.37.66.xx, 63.216.76.xx, 216.133.221.xx. Sometimes, they make themselves really obvious on the tracker. For example, 154.37.66.xx and 63.216.76.xx will send 10 clients to the same tracker all claiming to listen on port 12320. Maybe trackers should block these folks.

Interning at Google Again

2007-01-18T16:23:00.000-05:00

This summer, I'm going to do another internship at Google. I'll be working on Google Calendar.

Beware random CAPTCHAs found on slashdot

2007-01-01T15:02:00.000-05:00

This CAPTCHA, found on Slashdot, is pretty silly. First, the HTML doesn't really provide that much security. It wouldn't be that hard to script Gecko to render the thing. Worse, it has a very insecure implementation:

if (isset($_POST['hash']) && isset($_POST['CaptchaStr']) )
{
 if($captcha->validate_submit($_POST['hash'],$_POST['CaptchaStr']))
  $Message = "Correct.";
 else
  $Message = "Incorrent.";
}

  function check_captcha($correct_hash,$attempt)
  {
   // when check, destroy picture on disk
   if(file_exists($this->get_filename($correct_hash)))
   {
    $res = @unlink($this->get_filename($correct_hash)) ? 'TRUE' : 'FALSE';
    if($this->debug) echo "\n
-Captcha-Debug: Delete image (".$this->get_filename($correct_hash).") returns: ($res)";
   }
   $res = (md5($attempt)===$correct_hash) ? 'TRUE' : 'FALSE';
   if($this->debug) echo "\n
-Captcha-Debug: Comparing public with private key returns: ($res)";
   return $res == 'TRUE' ? TRUE : FALSE;
  }

  /** @private **/
  function get_filename($public='')
  {
   if($public=='') $public=$this->public_key;
   return $this->tempfolder.$this->filename_prefix.$public.'.jpg';
  }

So, here are a few bad things you can do:

If your OCR can read half the characters on the page, the MD5 sum lets you crack the others by brute force. Really quickly.
Forget OCR. It doesn't check that the server itself generated the hash. Hash "apple", then submit the hash and the word "apple".
There are no checks for duplicates. You can solve one CAPTCHA and submit it 1000000 times.
You can delete any JPEG file on the website, because the submitted hash is never checked for "..".
You can fill up the dude's disk by requesting lots of CAPTCHAs but not solving them.
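The "hash apple yourself" attack is easy to reproduce. Below is a small Python imitation of the comparison that script makes, followed by one possible repair that keys the hash with a server-side secret. Everything here is an assumption for illustration: the function names, the secret, and the choice of HMAC-SHA256 are mine, not part of the script.

```python
import hashlib
import hmac


def insecure_check(submitted_hash, attempt):
    # Imitates md5($attempt) === $correct_hash, where the "correct" hash
    # comes straight from the submitted form.
    return hashlib.md5(attempt.encode()).hexdigest() == submitted_hash


SERVER_SECRET = b"hypothetical-secret-known-only-to-the-server"


def issue_token(answer):
    """Token the server would embed in the form in place of a bare MD5."""
    return hmac.new(SERVER_SECRET, answer.encode(), hashlib.sha256).hexdigest()


def keyed_check(submitted_token, attempt):
    """Accept only tokens that could have been issued with the server secret."""
    return hmac.compare_digest(issue_token(attempt), submitted_token)


# The attacker never looks at the image: they pick a word and hash it.
forged_hash = hashlib.md5(b"apple").hexdigest()
print(insecure_check(forged_hash, "apple"))  # the forgery is accepted
print(keyed_check(forged_hash, "apple"))     # the forgery is rejected
```

Keying the hash only addresses the forgery and the offline cracking. Replaying one solved CAPTCHA many times would still work unless the server also remembers which tokens it has already accepted.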

Don't trust this kind of script!

Posting Zero-Day Scripting Exploits

2007-01-01T14:15:00.001-05:00

It's really sad to see people posting zero-day exploits for large applications, such as this GMail exploit. First, it's not clear what this guy's motives are. Maybe he wants to get slashdotted so that the ads on his page will get clicked due to the massive number of visitors. He might also want to get a bit of fame, which is easier to do if you post a zero-day issue and then get it slashdotted.

Maybe he just wants the security issue fixed as fast as possible and, having notified the Google security folks, is unsatisfied with their response time. If that's the case, I think he was very irresponsible in the posting of the exploit. First, it's New Year's Day. That means response time from any website is going to be slow. Thus, it will take longer to get something pushed out. Why not publish something like this on a weekday, when people are at work? The issue will be fixed faster, and Slashdot traffic will be higher (more ad clicks, more fame!).

It's also worth noting how dangerous such zero-day issues are. Spammers could do quite a bit of damage in a short amount of time (even if it was open for an hour or two). Spammers likely have (or will acquire) pages that get a fair number of clicks (domain landing pages and porn sites are likely good candidates for this). A zero day exploit could easily let them gather some great data for spamming (Imagine being able to send out an email to somebody from one of the people on their contact list, including the full name of the person! It's a spammer's dream come true).

With all that said, I think the use of JSON for things like sending contact lists is becoming a large danger. I've found and reported similar issues to Google and Facebook in the last month. I bet lots of web 2.0 sites have the exact same issue. There are two easy and secure ways to fix the issue:

Use a secret token. For example, make the URL something like google.com/contacts?tok=asdfasdfasdfasdf. Make the tok a per-user string (like an HMAC of their username). If the tok isn't correct, deny the request.
Rely on XMLHttpRequest. Insert the code "while (1);" at the top of the JS document. Using XMLHttpRequest, download the code and strip that prefix before evaluating it. People trying to use a script tag to include the document won't be able to do so, because the script will loop forever before it reaches the data.
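Both fixes are small. Here is a minimal sketch of the server-side pieces in Python, with a stand-in for the client-side step of stripping the prefix. The secret key, the function names, and the sample payload are assumptions for illustration, and HMAC-SHA256 is just one reasonable choice of HMAC.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-server-secret"
PREFIX = "while (1);"


def user_token(username):
    """Per-user tok value: an HMAC of the username under a server secret."""
    return hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()


def is_valid_token(username, tok):
    """Deny the request unless tok matches the logged-in user's token."""
    return hmac.compare_digest(user_token(username), tok)


def wrap_json(payload):
    """Serve JSON behind a prefix that hangs any script tag that includes it."""
    return PREFIX + json.dumps(payload)


def unwrap_json(body):
    """What same-origin code does after fetching the text: strip, then parse."""
    if not body.startswith(PREFIX):
        raise ValueError("missing anti-hijacking prefix")
    return json.loads(body[len(PREFIX):])


body = wrap_json({"contacts": ["friend@example.com"]})
print(unwrap_json(body))
print(is_valid_token("alice", user_token("alice")))
```

The token works because another site cannot guess the URL to include, and the prefix works because only same-origin code gets to read the response as text.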

Performance Tip of the Day: Script Tags are Blocking

2006-12-04T17:32:00.000-05:00

Today I downloaded the fantastic Firebug extension. It has a mode where it shows network activity:

I learned that if a page includes an external JavaScript file with a script tag, the browser will block rendering of the rest of the page until the request is done. I saved about 100ms on a few sites I run by moving the Google Analytics tracker to the bottom of the page (not sure why it wasn't being cached, probably because I am on an SSL site).

Posting sensitive data in JSON

2006-11-28T13:12:00.000-05:00

If you are using JSON in AJAX, make sure not to put sensitive data in a JSON feed that any page can load. Because script tags aren't restricted by the same-origin policy, a third-party site can include your feed as a script, and the browser will send the user's cookies along with the request.

Google's GData-JSON feeds (which I blogged about earlier) had just such an issue. Google allowed you to request a URL such as http://www.google.com/calendar/feeds/default/private/basic?alt=json-in-script. If you use Google Calendar, take a look at that feed with the alt= part taken off. It likely has your email address, your full name, and possibly some sensitive events in it. Any site you visited could have requested that URL and scraped the data. Note that with more advanced techniques, it's possible to get data that doesn't use the callback, i.e., array literals. See Jeremiah Grossman's blog.

Luckily, this was fixed relatively quickly after I reported it.

DomBuilder + Functional Programming == Awesome

2006-11-25T13:27:00.000-05:00

The DOM sucks. It's so so slow to type document.createElement and document.createTextNode. One nice solution for this is DomBuilder which allows you to say:

document.body.appendChild(
 DIV({ id : "el_" + times, 'onclick' : 'alert("sdsdsd")'}, 
  STRONG({ 'class' : 'test' },"Lovely"), " nodes! #" + times
 )
);

When using DomBuilder in a project of mine, I found that it couldn't handle data very well. I had a list of items, and I wanted to make a table. There's no easy way to do that with DomBuilder.

However, a bit of functional programming can save the day. Using Prototype, adding the following line of code to tagFunc gets lots of mileage:

arguments = $A(arguments).flatten ().compact ();

What is this doing? First, we turn arguments into an array so that we can handle it cleanly. Then we flatten any arrays (turn [a,[b,c]] into [a, b, c]) and then compact any null entries ([a,null,b,c] into [a,b,c]). What's the win? Now this library can handle data very elegantly:

var stocks = [{ name : "NOVL", price : 6.28 }, { name : "GOOG", price : 505.00 }];
document.body.appendChild($table (
   $tr ($th ("Name"), $th ("Price")),
   stocks.map (function (stock) {
       return $tr ($td (stock.name), $td (stock.price.toString ()));
   })
));

Note the use of map to handle each of the stocks. Without the flatten, this would not have worked. It's pretty easy to build up HTML from data like this very elegantly.
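The same flatten-then-compact idea carries over to other languages. Here is a minimal Python sketch that builds HTML strings instead of DOM nodes. The tag and flatten_compact helpers are hypothetical stand-ins for DomBuilder's tagFunc and Prototype's flatten/compact, and the sketch does no HTML escaping.

```python
def flatten_compact(items):
    """Recursively flatten nested lists and drop None entries."""
    result = []
    for item in items:
        if isinstance(item, (list, tuple)):
            result.extend(flatten_compact(item))
        elif item is not None:
            result.append(item)
    return result


def tag(name):
    """Return a builder that wraps its (possibly nested) children in <name>."""
    def build(*children):
        inner = "".join(flatten_compact(children))
        return f"<{name}>{inner}</{name}>"
    return build


table, tr, th, td = (tag(n) for n in ("table", "tr", "th", "td"))

stocks = [{"name": "NOVL", "price": 6.28}, {"name": "GOOG", "price": 505.00}]
markup = table(
    tr(th("Name"), th("Price")),
    # The list of rows is passed as one argument, just like the map() result.
    [tr(td(s["name"]), td(f'{s["price"]:.2f}')) for s in stocks],
)
print(markup)
```

Because None children are dropped, a builder call can also contain a conditional expression that yields nothing without any special casing.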

Using GCal JSON to make a free/busy schedule

2006-11-22T23:44:00.000-05:00

Lately, I seem to be getting lots of emails of the form "When are you free this week, I'd like to meet with you sometime". Each time I get this email, I have to go to my calendar, copy my appointments for the next week, and send it in a reply.

In an ideal world, I could just paste a link to my calendar in iCal format. Sadly, not enough people use a calendaring client for this to be reliable (and worse, many of the people I interact with use the horror that is Oracle Calendar, which doesn't really handle external iCal).

This week, Google added JSON output to their Google Calendar feeds. This allows me to make a pure-JavaScript solution to this problem. I created a bit of JavaScript code (here) which loads my calendar in JSON format and tells the other person when I'm busy.

It's nice to be able to only show a free-busy projection of my calendar (I don't want the world to know who I'm meeting with, where I am, etc. at every moment. I also use the calendar as a place to dump event-related data, for example, airline confirmation numbers). I also like that I only have to host a small static HTML page to do this. No figuring out where to put a PHP script, no SQL, just a bit of JavaScript.

TODO:

Handle multi-day events
Better date formatting (use day of week, month names, etc)
Combine events (If I'm busy from 10:30-11:30 and 11:30-12:30, I can just be busy between 10:30 and 12:30)
Not depend on prototype (or only take what I need)
Make it pretty
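The "combine events" item is a standard interval merge: sort by start time, then extend the previous busy block whenever the next event starts at or before its end. A minimal sketch in Python (the page itself is JavaScript), with times as minutes since midnight; the merge_busy and hm helpers and the sample schedule are made up for illustration.

```python
def merge_busy(intervals):
    """Merge overlapping or back-to-back (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hm(text):
    """Convert 'HH:MM' into minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


busy = [
    (hm("11:30"), hm("12:30")),
    (hm("10:30"), hm("11:30")),
    (hm("15:00"), hm("16:00")),
]
merged = merge_busy(busy)
print(merged)
```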

Now that javac is open source...

2006-11-13T16:10:00.000-05:00

Maybe somebody (me?) can finally make a patch for this issue:

[bmaurer@omega ~]$ cat x.java
public class x {
        public static void main (String[] args) {
                System.out.println ("hello world");
        }
}
[bmaurer@omega ~]$ time javac x.java

real    0m0.766s
user    0m0.604s
sys     0m0.040s

For the record, mcs has a time of:

[bmaurer@omega ~]$ time mcs x.cs

real    0m0.483s
user    0m0.440s
sys     0m0.024s

But Java is using a form of ahead-of-time compilation (they call it class data sharing) while my MCS is not.

Don't echo back plain text passwords

2006-11-01T22:14:00.000-05:00

Today I found two nice little security issues on an e-commerce site I use. First, the site has a page that allows you to change passwords. The code on the page is of the form <input type="password" name="password" value="MY PASSWORD IN PLAIN TEXT">. Secondly, the site had some Cross Site Scripting issues. At the end of the day, it was drop-dead easy to phish for people's passwords. Yikes.

Never, ever, ever echo sensitive data back to the user. It makes an XSS attack really damaging (and is also bad if somebody leaves their computer unlocked).
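In practice this means the template for a change-password page should never receive the stored password at all. A minimal sketch in Python, assuming a hypothetical render_password_form helper that builds the form as a string:

```python
from html import escape


def render_password_form(username):
    """Render the form with an empty password field and an escaped username."""
    return (
        '<form method="post">'
        # Escaping user-controlled values closes off the simplest XSS vector.
        f'<input type="text" name="username" value="{escape(username, quote=True)}">'
        # The password field is always empty: nothing for a script to steal.
        '<input type="password" name="password" value="">'
        "</form>"
    )


form = render_password_form("bmaurer")
print(form)
```

Since the function has no password parameter, a later change cannot accidentally start echoing it.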