Stensland Systems

Social Security

2005-03-17T09:22:59-04:00

I tend to avoid talking about politics on here, but this came up in my email today and I found it a great read (eventually I will be talking about many of the things on this guy's site, since it is all interesting reading).

From the site:

So, how realistic is the Social Security Trustees Middle Projection assumption on what the US GDP will be 27 years from now?

There is a (0.911121914 - 0.48167789) / 0.138352754 = 3.10397886261 standard deviation chance of it-or a chance of 0.000954684857590260, (about a tenth of a percent,) which is about one chance in a thousand, that the US GDP, 27 years from now, will be as bad, or worse, than their assumption.

A very conservative assumption, indeed.
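
For anyone who wants to check the arithmetic in that quote, here is a rough sketch. It takes the three figures as given, assumes a standard normal distribution, and gets the tail probability by plain numerical integration; the sub names are my own and not anything from John's site.

```perl
use strict;
use warnings;

# Standard normal density.
sub normal_pdf {
    my ($x) = @_;
    return exp(-0.5 * $x * $x) / sqrt(8 * atan2(1, 1));   # 8*atan2(1,1) = 2*pi
}

# Upper-tail probability P(Z >= z) by Simpson's rule on [z, z + 12].
# The tail beyond that window is far too small to matter here.
sub upper_tail {
    my ($z) = @_;
    my $steps = 12000;                  # must be even for Simpson's rule
    my $h     = 12 / $steps;
    my $sum   = normal_pdf($z) + normal_pdf($z + 12);
    for my $i (1 .. $steps - 1) {
        $sum += ($i % 2 ? 4 : 2) * normal_pdf($z + $i * $h);
    }
    return $sum * $h / 3;
}

# The three figures quoted above, used exactly as given.
my $z = (0.911121914 - 0.48167789) / 0.138352754;
printf "z = %.5f standard deviations, tail chance = %.6f\n", $z, upper_tail($z);
```

The tail chance it prints should land close to the "about one chance in a thousand" figure in the quote.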

Once our data feed series is finished and we have something up and working again, one of the first things I will be talking about is the Brownian motion discussion which is all over John's site.

Overconfidence

2005-03-07T15:35:18-04:00

Over at Travis Morien's site, he has a little blurb up on overconfidence in regards to trading the market.

Travis is a financial planner and in that little blurb he raises some excellent points - as he does in many sections of his site. He references many well known books and investing gurus.

It is a good little read to occasionally refer back to. Were I to jump up and down and wave my arms around more, trying to appear more confident and telling you how these formulas are a sure thing - maybe I'd have more readers.
But I'm instead going to keep going through the discussion of the analysis and then formal testing so that we can look and see what works and what doesn't.

I will warn you though that for the most part, you are going to see that technical indicators don't really work. There are some that do, but probably not in the way that you are thinking. We can get to this and more once the free data feed series is resolved and we can then write code to analyze the data that we have and test which technical indicators, if any, actually work.
There are many people who will jump up and down and wave their arms and say that if you are saying indicator XYZ doesn't work, then you aren't using it correctly. They very well may be right, but be careful those people aren't overconfident in nothing more than apophenia.

Feedburner

2005-03-04T13:57:22-04:00

For those of you that are aware of RSS feeds, you may have subscribed to our site and been enjoying the full text feed that we offer. For those of you that haven't - now is your chance.

One of the upsides of an RSS feed is that you can provide a "light" path for your users who want to get your posts, but don't necessarily want to visit the site everyday.

One of the downsides is that the person/people running the site offering said feed doesn't get a good idea of how many people are reading the feed if they aren't visiting the site (as you may or may not have noticed, we have a few counter/tracker things in place to see what sort of traffic we get).

Well, thanks to the very cool Feedburner service, we can now track the RSS readership. What this means for you is really nothing - you don't have to worry about changing your links or RSS feed - I handled that on the server side with a mod_rewrite reference which points all RSS feeds towards Feedburner. It is also smart enough to format all RSS and Atom versions appropriately for your browser, and it even fancies it up a bit so that if you view the XML in a browser, it should be human readable.
I have also updated the ping service for Stensland Systems updates to notify Feedburner when there are new updates, so it should always have the proper and updated feed.

All in all, a great service.

If Feedburner stops behaving properly, I can simply remove the feed redirects and you will instantly go back to getting the feed we publish locally (assuming you are pointing to our local feeds).

Also, as you can see at the top of this post, and also in the sidebar, there is a counter for the feed. As more people use it and refresh their feeds, we can get a more accurate assessment of how many people are hitting the feed (it is still not entirely clear to me how often that counter updates).

The nerd in me rejoices.

RSS feed back up again

2005-03-03T21:01:38-04:00

I've added an RSS 2.0 feed back again for the site. Feel free to use that for your daily updates.

(I had taken it down some time back, but I think it is a great feature for any site - so please feel free to use it)

Free Data Feed: Updated ticker listing

2005-03-03T18:10:17-04:00

I downloaded the OTC tickers with that new URL, renamed that file to "OTC.csv" and then imported that into Excel. From there I pulled out the tickers, added them to the end of my "tickers.xls" file (which is a long list of the tickers all combined). Then I copied those all into a text file and saved it out and reran the "stipTickers.pl" script against it (that script is the one which looks for characters that aren't letters and removes that ticker).

Two things to note from doing that:

1) I am on a Mac, so when I pasted them into a text document I just double-clicked the old tickers file and pasted the new data over it. The double-click defaulted to TextEdit, and when I saved the file it must have worked some Mac "magic" on it and ended each line with a Mac carriage return ("^M") instead of a Unix newline. When I ran the script, it saw the file as one very long line with an illegal character and deleted it all. I confirmed via the command line that this was the case, and to fix it I pasted the data back in via BBEdit, which is much better about acting properly and not forcing Mac things around (there are many ways around this, that was the way I chose to go).

2) I ran the script again after that. Here are the before and after figures:
Before: 11780
After: 11000
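
One of the other ways around the line ending problem is to fix it in Perl before filtering. This is only a sketch: the sub names are hypothetical, the sample tickers are just for illustration, and the letters-only filter is my reading of what the stripping script is described as doing, not a copy of it.

```perl
use strict;
use warnings;

# Convert Windows (CRLF) and classic Mac (CR) line endings to Unix (LF).
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Keep only tickers made up entirely of letters.
sub letters_only {
    my (@tickers) = @_;
    return grep { /^[A-Za-z]+\z/ } @tickers;
}

my $raw     = "KO\rIBM\rBRK.A\rMSFT\r";     # as saved with CR line endings
my @tickers = letters_only(split /\n/, normalize_newlines($raw));
print join(',', @tickers), "\n";
```

Running the normalizing step first means the filter sees one ticker per line instead of one very long line.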

So now we have an even 11K stock tickers that we are going to look at and try to get data for, and update their data at the end of the day (late at night actually).

So the next part coming in the series is going to be how to automate it all in a way that we can start getting data in, but we won't hammer the Yahoo servers. The longer ticker list also means readjusting our storage requirements to somewhere between 600 and 700MB.
In order to automate, we can essentially use the scripts which we already have, we just have to put in full paths so that when they are run by a cronjob, it will know where to look. Some might argue that had we put in the full paths from the start, we wouldn't need to go back and change this. That is completely and totally correct - but I know from my own experience on my server that this way is easier with this setup in terms of explaining it to someone else.
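
As a sketch of one way to handle the path issue without hard-coding anything, the core FindBin module can tell a script which directory it lives in. The helper name script_path is hypothetical, and this assumes the "stocks" directory and "tickers.txt" sit next to the script.

```perl
use strict;
use warnings;
use FindBin qw($Bin);      # directory containing the running script
use File::Spec;

# Build a path relative to the script itself rather than the current
# working directory, which under cron is usually somewhere else.
sub script_path {
    my (@parts) = @_;
    return File::Spec->catfile($Bin, @parts);
}

print "stocks dir:   ", script_path('stocks'), "\n";
print "tickers file: ", script_path('tickers.txt'), "\n";
```

With something like this in place, the same script should behave the same whether it is run by hand from its own directory or from a cronjob.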

So stay tuned and hopefully learn something along the way.

An update to some of the URLs

2005-03-02T19:31:54-04:00

One of our readers sent this in:

I thought you'd be interested in a bit I found out. Using "exchange=O", returns the OTCBB stocks (which I'm interested in personally). However, there aren't verbose descriptions in the CSV file returned, as there are with the major exchanges. In other words, there are only two columns in the returned file - the company name and the ticker symbol.

That of course is referring to that link in this post. So in addition to NASDAQ (Q), NYSE (N), and AMEX (A), you can now get those OTC stocks as well with "O" (that is an "oh" and not a zero).
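
Here is a rough sketch of pulling the tickers out of a two-column file like that without a trip through Excel. It assumes the ticker is the last field on each row and may be wrapped in quotes, which I have not verified against the real file; the sample rows and the sub name are made up for illustration.

```perl
use strict;
use warnings;

# Pull the ticker out of one CSV row, assuming it is the last field.
sub ticker_from_row {
    my ($row) = @_;
    $row =~ s/\r?\n\z//;                 # drop the line ending
    my ($last) = $row =~ /([^,]*)\z/;    # text after the final comma
    $last =~ s/^\s*"?|"?\s*$//g;         # trim spaces and quotes
    return $last;
}

# Made-up rows: a company name (possibly containing a comma), then a ticker.
my @rows = ('"Example Holdings, Inc.",EXHI', 'Sample Corp,SMPL');
print ticker_from_row($_), "\n" for @rows;
```

Taking the text after the final comma sidesteps commas inside company names, at the cost of breaking if the column order is ever different.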

Thanks Dan for sending that in.

Free Data Feed: Daily Updates

2005-03-01T15:34:12-04:00

Now "all" that is left is a way for us to get the end of day data to put on top of our historical data. That way we can update the data which we have in the database. It is going to take time to get thousands of tickers worth of data - more than a day unless you really slam that Yahoo server and they likely aren't going to like that. So the process will ideally involve having an automated script which gets the historical data a few times during the trading day. Then at night it stops and runs the code to get the current updates for the data and add that on to the tickers.
Once you have all of the tickers, then you no longer need to run the script to pull down the historical data.
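
To make the "don't slam the server" idea concrete, here is a minimal sketch of a loop that pauses between tickers. The sub name and the delay value are assumptions, and the callback here only prints instead of downloading anything.

```perl
use strict;
use warnings;

# Run $work on each ticker, pausing $delay seconds between calls so the
# remote server is not hit in a tight loop. Returns how many were processed.
sub process_politely {
    my ($work, $delay, @tickers) = @_;
    my $done = 0;
    for my $ticker (@tickers) {
        $work->($ticker);
        $done++;
        sleep $delay if $delay > 0 && $done < @tickers;
    }
    return $done;
}

# Stand-in for the real download: just report the ticker.
my $count = process_politely(sub { print "would fetch $_[0]\n" }, 1, qw(KO IBM MSFT));
print "processed $count tickers\n";
```

A small batch size per run plus a pause per ticker spreads the load out over the day, which is the whole point.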

Another option, and this is how I had it set up in the past, is to create a PHP page which will display all of the tickers on your list and then have a link to either view the data which you have - or to get data for it if there is none yet. This is nice because each click on a link gets sent off as its own request and is handled separately - and these days we can even throw in XMLHttpRequest and do it in background tasks without having to reload the page.
That web setup comes in extremely handy and so in the future I will definitely create a write-up on that.

For now, I am going to show you how to do it with Perl scripts and automated cron jobs.
Remember if you are going the Perl route and there is a module which you don't have (I am spoiled and use Pair.com, so they already have pretty much all of the modules you could want already installed), then you can install it on your system with CPAN if you have Perl. On my Powerbook I type "cpan" which starts it up and then as an example "install Finance::Quote" and it will then download and install all of what it needs for that. Then I can run scripts locally which call that and it will use that module. All very nice and all very easy.
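
If you aren't sure whether a module is already installed, a quick check along these lines can tell you before a script dies on it. This is just a sketch and the sub name is hypothetical.

```perl
use strict;
use warnings;

# True if the named module can be loaded on this system.
sub module_available {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # Finance::Quote -> Finance/Quote.pm
    return eval { require $path; 1 } ? 1 : 0;
}

for my $module (qw(LWP::Simple Finance::Quote)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing, install it from the cpan shell';
}
```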

So here we can get to my script. Essentially it is to be run late at night when the Yahoo servers will have the updated data (I would suggest about 2-3am EST). It will then look at all of the tickers and load them up. It will grab the data and see if it is a small file - if so, then it will try to get data for it, then try it again to see how much data is in it. If there still is no data, then it will skip it and move on. If there is data, then it will then check the date for the latest update. If it needs new data, then we will add it to the list of data to get and then move on doing this for each file we have.
Then we will run all of those into Finance::Quote and it will pull down the data and put it into the right file, updating it. We could then run analysis scripts against this and have the most recent data to work on.

Here is the code:

#getDailyCloseData.pl
use strict;
use LWP::Simple;
use Finance::Quote;

my $ticker = '';
my $file = '';
my $data = '';
my $count = 0;
#get all tickers in the stocks directory
my @allTickers = <stocks/*>;
my @updateThese = ();
my @possibleIssues = ();
my @args = ();
my @dataArr = ();
my $newRow = '';
my $test = '';
#note that we will want to pull off the "stocks" part 
#when we use that in the lookup part

sub convertDate($){
    #takes in one date form (m/d/yyyy)
    #spits out the Yahoo form (d-M-yy)
    my $inDate = shift;
    my @tmpArr = split('/',$inDate);#m,d,yyyy
    my $day = int($tmpArr[1]);
    my $month = int($tmpArr[0]) - 1;
    my $year = int($tmpArr[2]);
    my @monthStrings = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my $outDate = $day . '-' . $monthStrings[$month] . '-' . substr($year,2);
    
    return $outDate;
}

#now loop over those
for $file (@allTickers){
    #check to see if the file has more than 10 lines of data in it
    $ticker = substr($file,7);
    $count = `wc -l < $file`;
    die "wc failed: $?" if $?;
    chop($count);
    $count = int($count);
    if($count < 10){
        #not enough, try to get full history
        #we could call out to our external "getsingleHistory.pl" instead
        $data = get('http://itable.finance.yahoo.com/table.txt?s=' . $ticker . '&a=1&b=1&c=1998&ignore=.txt');
        #write it out to disk (clobber)
        open(OUTFILE, '+>' , 'stocks/' . $ticker) or die "Could not write out data file: $!\n";
        print OUTFILE $data;
        close(OUTFILE) or die "Could not close data file: $!\n";
        #it should get the newest data in there
        #print "More data for $ticker \n";#uncomment for debugging
    }
    else{
        #put it on the update list
        push(@updateThese, $ticker);
        #print "Update $ticker \n";#uncomment for debugging
    }
}

#now do the updating
#pass them into Finance::Quote
my $q = Finance::Quote->new;
$q->require_labels(qw/date open high low close volume/);
my %quotes  = $q->fetch("usa",@updateThese);

for $ticker (@updateThese){
    #check to see if there is data in there
    #if not, put it in the Issues array
    #otherwise, update the data
    $test = $quotes{$ticker,"date"};
    if(defined($test) && length($test) > 2){
        #2 seems odd, but will get around whitespace issues
        #in here we are good
        #open this ticker file, just reading in
        open(TICKER,"stocks/$ticker") or die "Bad things on open: $!\n";
        @dataArr = ();#clear it out from last time
        while(<TICKER>){
            chomp;#kill the newlines
            push(@dataArr, $_);
        }
        close(TICKER) or die "Bad things on close: $!\n";
        
        #create the new row
        #date,open,high,low,close,volume (ignore adj close)
        $newRow = '';#clear it out from last time in the loop
        $newRow .= convertDate($quotes{$ticker,"date"});#append on
        $newRow .= ',' . $quotes{$ticker,"open"};
        $newRow .= ',' . $quotes{$ticker,"high"};
        $newRow .= ',' . $quotes{$ticker,"low"};
        $newRow .= ',' . $quotes{$ticker,"close"};
        $newRow .= ',' . $quotes{$ticker,"volume"};
        
        splice(@dataArr,1,0,$newRow);#put it in the first position after the descriptors
        
        #now write it back out to the file, clobber it
        open(TICKER,'+>',"stocks/$ticker") or die "Could not open: $!\n";
        for(@dataArr){
            print TICKER "$_\n";
        }
        close(TICKER) or die "Could not close: $!\n";
    }
    else{
        #in here we are bad
        push(@possibleIssues,$ticker);
    }
}

#dump the possible issues to the screen
#if we are running it manually, then we will see them
#if we are running via cron, they will get emailed to us
for(@possibleIssues){
    print"$_\n";
}
#and done

Now, a few things we can note:

We are totally ignoring the "Adj(usted) Close" in the Yahoo data for now - we can get into that more later.
It would probably be good to add in code that would figure out what the date would be if it had current data and then if that date is already in the first line, don't bother trying to add more.
Another option is that we could manually hit Yahoo's site and pull down ticker data, then parse it ourselves, and then put that into files. I did it that way another time and using Finance::Quote is easier. Trust me.
Of course you could do all of this far more easily if you just bought a proper data feed, and I inevitably get 10 emails a week telling me that sort of thing. It would also have better/cleaner data which you could be sure to trust (there are many that don't trust public feeds and trusting your data is a big deal in the end with your analysis). But this particular post isn't about setting up that sort of data feed - instead it is just how to setup a free data feed.
Also note that yes we could put this into a SQL database instead of flat files. But for our purposes, that may just be overkill. There are several arguments I can think of as to why a SQL database would make several stages of the code easier, but it would also involve writing code to interface to the database. I may eventually write a post which discusses how to do exactly that, but for now, this particular one does it with flat files (well, CSV files of a sort I guess).
Technically if you wanted to be impolite to Yahoo, you could cheat a bit and write a script that iterates over the stock list and dumps them all out to files named after the stock into our stocks dir. You can put whatever you want in there, as long as it is less than 10 lines (a single space would be just fine). Then run this script and it will go through and get all of the updates for you as part of the process. It would be slow and it would be getting a lot of data, but it would be a way to force your "database" to fill.
This code is something I have written before, lost, and now am improving on from what I have in my head from last time. I work on it in spurts and am not trying to make it perfect. I am trying to make it work. Over time we can update/tweak it so that it is much better and easier to read, but for now, these are really just rough examples. So while you are free to email me and tell me about some coding style, I am likely to ignore it until I spend more time later to do anything cleaned up (which may be never on here).
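
The "don't bother if the date is already there" idea from the notes above could look something like this. It is only a sketch: the sub names are hypothetical, the sample row is made up, weekends are skipped but market holidays are not, and it assumes the dates in the files look like what convertDate produces (d-Mon-yy).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Most recent weekday on or before the given date, formatted d-Mon-yy.
sub last_weekday {
    my ($year, $month, $day) = @_;
    my $t = timegm(0, 0, 12, $day, $month - 1, $year);
    my @g = gmtime($t);
    while ($g[6] == 0 || $g[6] == 6) {      # 0 = Sunday, 6 = Saturday
        $t -= 24 * 60 * 60;
        @g = gmtime($t);
    }
    return sprintf('%d-%s-%02d', $g[3], $MONTHS[$g[4]], ($g[5] + 1900) % 100);
}

# True if the newest row in a file (the first line after the header)
# does not already carry the expected date.
sub needs_update {
    my ($first_data_row, $expected) = @_;
    my ($date) = split /,/, $first_data_row;
    return $date ne $expected;
}

my $expected = last_weekday(2005, 3, 6);    # 6 March 2005 was a Sunday
print "expected newest row date: $expected\n";
print "update needed\n" if needs_update('3-Mar-05,41.5,42,41,41.8,100000', $expected);
```

A ticker whose first data row already has the expected date can simply be left off the update list.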

Now that we have something working, this is good. But note that we will still need to tweak this - and there will most certainly be future posts to do just that. A few examples of things we need to do, in no particular order:

Account for errors. Sometimes when the data is pulled from the server, it will be old data. Technically included in here we also need to account for American holidays, during which the exchange will be closed (we will treat "close early" holidays as a normal day, for good or bad). Or if data isn't updating on a stock, that is probably a sign that it has delisted. We will want to alert ourselves to that so that we can then lookup those tickers and we can see what they changed their tickers to or if they delisted entirely.
Web access. Currently our scripts are designed from a command line interface perspective. This is the interface of the gods, the interface through which we live the true experience. But perhaps we want to add pretty colors and pictures of puppies - and why the hell not? Or perhaps we want to share our results with friends? Then onward and upward - with a web interface.
Automate them all. Set it up so that they are run via a cronjob - this will require being far more careful about the paths.
Are there indexes in there? If not, we will want to look to see how Yahoo thinks of indexes and then add those into our data feed - with historical data and updates. There is certainly analysis code which will be glad to have some indexes to look at (if nothing more than for comparisons).
It would be nice to get this into a MySQL database and then use that to pull data out of and see if that is faster/easier - that way it would answer all of the questions I get from people wanting to know if the text file method is worse/better than a SQL database method.
The site currently doesn't appear to be wide enough to easily display the code without causing some issues. So the site could use a redesign so that it can be wider, and it would also be good to offer up the scripts in a single zipfile so that they can be downloaded and edited easily.
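
For the error handling item in that list, one possible starting point is flagging tickers whose newest row is suspiciously old. This is a sketch under a few assumptions: the dates are in d-Mon-yy form, two-digit years below 70 mean 20yy, and the 7 day limit and the sub names are just placeholders.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH_INDEX;
@MONTH_INDEX{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Turn a d-Mon-yy date into epoch seconds, or undef if it doesn't parse.
sub parse_row_date {
    my ($text) = @_;
    my ($day, $mon, $yy) = $text =~ /^(\d{1,2})-([A-Za-z]{3})-(\d{2})$/
        or return;
    return unless exists $MONTH_INDEX{$mon};
    my $year = $yy < 70 ? 2000 + $yy : 1900 + $yy;    # assumed pivot year
    return eval { timegm(0, 0, 0, $day, $MONTH_INDEX{$mon}, $year) };
}

# Whole days from the first date to the second.
sub days_between {
    my ($from, $to) = @_;
    my ($start, $end) = (parse_row_date($from), parse_row_date($to));
    return unless defined $start && defined $end;
    return int(($end - $start) / 86400);
}

# Flag a ticker whose newest row is more than $limit days old.
sub looks_stale {
    my ($newest, $today, $limit) = @_;
    my $age = days_between($newest, $today);
    return defined $age && $age > $limit;
}

print "possibly delisted\n" if looks_stale('11-Feb-05', '1-Mar-05', 7);
```

A limit of about a week leaves room for weekends and a holiday or two before anything gets flagged for a manual lookup.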

Free Data Feed: URL oddity

2005-02-28T20:47:23-04:00

I spoke about URLs to use to pull down data off of Yahoo - and I tested it with "http://itable.finance.yahoo.com/table.txt?s=TICKER" and it worked great. But I just tested it again right now as part of the next write-up, and it isn't working.

So I am going to have to change it to this now:
"http://itable.finance.yahoo.com/table.txt?s=TICKER&a=1&b=1&c=1998&ignore=.txt" - it currently appears to be letting me use that.

I will go and add notes to the other posts now so that future searches will see it.
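
Since the URL has already changed once, it may be worth keeping it in one place. A minimal sketch, with a hypothetical sub name, that builds the URL quoted above for a given ticker:

```perl
use strict;
use warnings;

# Build the history URL for one ticker, using the parameters that are
# working for me at the moment. If the URL changes again, only this
# sub needs to be edited.
sub history_url {
    my ($ticker) = @_;
    return 'http://itable.finance.yahoo.com/table.txt?s=' . $ticker
         . '&a=1&b=1&c=1998&ignore=.txt';
}

print history_url('KO'), "\n";
```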

Apophenia

2005-02-28T14:39:49-04:00

I have been reading the book Pattern Recognition by William Gibson. It doesn't really have anything to do with the stock market or stock market analysis in general, but the general tone of the book hints a few times (and even outright mentions it) at the idea of apophenia.

The Answers.com page for apophenia tells us:

Apophenia is the experience of seeing patterns or connections in random or meaningless data. The term was coined in 1958 by Klaus Conrad, who defined it as the "unmotivated seeing of connections" accompanied by a "specific experience of an abnormal meaningfulness".

Conrad originally described this phenomenon in relation to the distortion of reality present in psychosis, but it has become more widely used to describe this tendency in healthy individuals without necessarily implying the presence of neurological or mental illness.

In statistics, apophenia would be classed as a Type I error. Apophenia is often used as an explanation of paranormal and religious claims. It has been suggested that apophenia is a link between psychosis and creativity.

Some immediate examples of this might be:
1) From William Gibson's own blog, he mentions that seeing the Virgin Mary in burnt toast is a perfect example of apophenia.
2) John Nash, made famous to the masses in the movie A Beautiful Mind (and famous to scientists/mathematicians for his equilibrium theory which revolutionized Game Theory at the time) - he was seeing many things which weren't there. This was both a curse and a godsend, since it was part of his schizophrenia but also part of his ability to use the pattern recognition to help him in scientific ways.
3) Quacks around the world: those who believe that something in the night sky which can't easily be explained by those of us without much knowledge of events in the sky must be aliens flying around in spaceships, and those who feel that everything is a government conspiracy far more complicated than anything appears on the surface. All of these are discussed in two books that are on the Suggested Reading list for this site: The Demon-Haunted World and How We Know What Isn't So.

All in all, the logical step that is probably obvious which I would like to then follow is that the stock market is an area ripe for this issue. As I have mentioned before on this site, humans are genetically evolved to see patterns easily, even if they aren't there. An error on the side of seeing a pattern is usually going to be far safer than not seeing a pattern at all. The problem is that in matters of things like the stock market, it actually can be dangerous financially to think you see something there which doesn't really exist.
This is something that I think we should keep firmly in the front of our minds when we are discussing analysis on this site. There are things that might look really good to us and seem to indicate the way a stock is or will move. But it is very much possible that we are just seeing another fine example of apophenia.

Free Data Feed - extra bits

2005-02-25T17:13:10-04:00

We have talked about how to get the ticker names for the US exchanges. We have talked about how to get the data for those tickers. Now here is a quick bit about variations on that last part - getting the data.

First up, if we want to get just a single ticker file, then the following script allows us to pass in the ticker name via the command line instead of reading in many via a big list. You would call this with "perl -w getSingleHistory.pl KO" if you wanted to get all of the history for Coke and you were in the directory of the script*.

#getSingleHistory.pl
use strict;
use LWP::Simple;#might eventually need a UserAgent if Yahoo starts requiring it 

my $stockDir = 'stocks/';#the directory we are writing the stocks into
my $ticker = shift(@ARGV);#incoming from the command line.
my $data = '';

#now we will grab the data
print "$ticker\n";#comment out this line if you don't want to see the progress
$data = get('http://itable.finance.yahoo.com/table.csv?s=' . $ticker);
#write it out to disk (clobber)
open(OUTFILE, '+>' , $stockDir . $ticker) or die "Could not write out data file: $!\n";
print OUTFILE $data;
close(OUTFILE) or die "Could not close data file: $!\n";

#and done

Another thing we noted in the previous post was that looking over the data, you will see that some of the files don't have data in them. This could be because Yahoo didn't return the data to you properly for any number of reasons, or simply because the ticker doesn't exist any longer. One's first thought might be to write a script to delete all of the files which don't have any data - which makes sense - no sense keeping them around. But the problem is if you find one that doesn't have data and then manually look it up on the data feed link - you very well may find that it loads data just fine for you. So that means there was some non-fatal error that took place. The data is out there and we want it - so no need to delete the file. Instead we just want to have a record of which files don't have data in them. If you are on a Windows system or really any system that allows a graphical view of the filesystem, then you could sort the files by size and the ones that are considerably smaller than the rest are ones that you will want to take note of. If there are only a few, then you could just look right then. But there might be many of them, so for the most part it is probably better for a script to make a list of them for you to refer to later.

#cleanup.pl
use strict;

my $count = 0;
my $file = '';
my @questionableFiles = ();

#get all of the stock filenames
my @filenames = <stocks/*>;
#iterate over them
for $file (@filenames){
    #get their linecount
    $count = `wc -l < $file`;
    die "wc failed: $?" if $?;
    chop($count);
    $count = int($count);
    if($count < 10){
        #if in here, we should look into it
        #get all text after "stocks/"
        $file = substr($file,7);       
        #put it into an array
        push(@questionableFiles, $file);
    }
}

#now dump that out to a file
open(OUTPUT,'+>', 'questionableFiles.txt') or die "Bad things man: $!\n";
for(@questionableFiles){
    print OUTPUT "$_\n";
}
close(OUTPUT) or die "Couldn't close the file: $!\n";

#and done

*This is actually an interesting point. For these scripts, you should be in the directory of the scripts since that is what the scripts assume. But later if and when we want these to be called by cron, then we are going to need full paths used in the scripts. So keep that in mind if we move to calling them via cron and you are seeing issues with them not working properly that way, but they work just fine when you call them yourself in the directory.

**NOTE**
It appears the URL that was working when I tested this is no longer working reliably. So I am now saying to use this one, which does appear to work: "http://itable.finance.yahoo.com/table.txt?s=TICKER&a=1&b=1&c=1998&ignore=.txt"
And like before, replace TICKER with whatever stock ticker.

Free Data Feed - Pulling down the data

2005-02-25T15:46:06-04:00

Continuing with our free data feed series (previously here), we now have the following things left to do:

Load that file so that we have each ticker in a spot in an array. Then iterate over that array and get all of its historical data. For each ticker, take its data and save it into a file named with the ticker symbol.
We will then want code that will get the end of day data for each ticker and update the data files, and we will want to automate this to happen late at night.

So now we will need a script which will load up the tickers and then create a historical data file for each. For mine, I am going to assume that there is a directory named "stocks" where you are running the script and it will put the data files in there. The names of the files will just be the same as the tickers. Therefore KO (Coca-Cola) will be saved into a file named "KO" in the stocks directory.

We are going to use Yahoo's historical data and for ease of use just keep it in the same format that they give it to us. If we go back to this entry and also this entry, we can see that there is a URL over at Yahoo which we can use to get historical data: "http://itable.finance.yahoo.com/table.txt?s=TICKER&d=3&e=26&f=2004&g=d&a=1&b=1&c=1998&ignore=.txt". We will use that and replace TICKER with whatever ticker we are currently pulling from our "tickers.txt" file. We could also adjust the date so that it is N years earlier than today - but since that page has the nice feature of returning as much data as it has when there isn't enough to meet the date requirements, we can just leave it as an old date and be done with it. For example, say we use the year 1998 as our data starting point (which, according to tests I have run in the past, should be plenty of data if we can get it going back that far), but the stock that we are passing in only has 2 years of data available, so there is no way it could go back to 1998. The page handles that well and just gives as much data as it does have - which is great for our needs. Choosing a date far enough in the past like that lets us avoid much scripting logic and instead just grab the data. After playing with the URL a little more, we can see there is an even nicer feature: if you leave all of the date parameters off of the URL, it just assumes you want everything up until the most recent date.
So that means we only really need "http://itable.finance.yahoo.com/table.csv?s=TICKER". Note that the URL appears to change around over time and in different browsers - they are doing some server-side redirects as far as I can tell - so don't worry too much if you see "itable" instead of "table" in the URL. Also, it doesn't appear to be vital, at least on my Mac, to have the "ignore" variable in there, but in Windows and IE it will crash if it is there and will spawn off Excel if it has "table.csv" instead of "table.txt".

Note that this script will kill off the tickers.txt file as it gets data for each stock. Since it is feasible that it might not be successful at getting the data, but still removes the ticker from the list - it is probably best to use two ticker files. One which has all of the tickers in it, and one which starts with all of the tickers in it and from which we remove the ones which we have downloaded data for. Worst case scenario we can then always start over without having to gather all of the tickers again. In the case of my examples which I am creating, I will be using "tickers.txt" to hold all of the tickers, and "needData.txt" for the tickers which remain which we don't have data for yet. That way we can just check on that file and know that if there are any tickers still in there, then we aren't done yet. We can also add tickers back into that list if we want it to redo the history for those files.
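
Setting up the second file can be as simple as copying the first one, as long as we don't overwrite a run that is already in progress. A sketch, with a hypothetical sub name, using the two filenames from above:

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Create the working list from the master list, but only when the
# working list does not exist yet, so a partly finished run is kept.
sub seed_need_data {
    my ($master, $working) = @_;
    return 0 if -e $working;
    copy($master, $working) or die "Could not copy $master to $working: $!\n";
    return 1;
}

if (-e 'tickers.txt') {
    print seed_need_data('tickers.txt', 'needData.txt')
        ? "needData.txt created\n"
        : "needData.txt already present, left alone\n";
}
```

To start over from scratch, delete "needData.txt" and run it again.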

Each page that we get will be roughly 65KB in size. Some will be larger, others smaller, but that is about it for the size. We have already established that we will be getting 7475 tickers. So 7475 * 65KB = ~475MB of data. We will need to keep that in mind in terms of 1) bandwidth and 2) disk space. Bandwidth is important if you are on a remote hosted site - for example, the server that this page is hosted on allows a certain amount of bandwidth to be transferred every month. If your monthly allowance is relatively close to 475MB, then you are going to use it up just in getting your history. Now keep in mind that you won't be doing that on a regular basis - the updates are just one line each by comparison, so they are much smaller, faster, and less of a load. Also think of it from Yahoo's perspective: they are offering you a nice thing, so you don't want to abuse it and grab too much at once. Plus I have heard there is some cutoff as to how much you can pull down (although I haven't tried testing that yet, for obvious reasons). On a 512kbps connection I saw that 100 tickers took less than 6 minutes to download and process on my Powerbook.
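
The same back-of-the-envelope numbers, as a sketch you can rerun when the ticker count changes. The sub names are hypothetical, 1MB is taken as 1024KB, and the time figure simply scales the "100 tickers in under 6 minutes" observation, so treat it as an upper bound for a similar connection.

```perl
use strict;
use warnings;

# Rough download size in MB for a number of tickers at an average
# page size in KB.
sub estimate_mb {
    my ($tickers, $kb_each) = @_;
    return $tickers * $kb_each / 1024;
}

# Total minutes, scaling an observed time per batch of tickers.
sub estimate_minutes {
    my ($tickers, $batch_size, $minutes_per_batch) = @_;
    return $tickers / $batch_size * $minutes_per_batch;
}

printf "history size: about %.0f MB\n", estimate_mb(7475, 65);
printf "download time: under %.1f hours\n", estimate_minutes(7475, 100, 6) / 60;
```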

Here is the code I used to download the data. The general idea of it is the "runCount" variable is how many tickers you want to get in one run. Then you open up the "needData.txt" file which originally has all of the tickers in it (copied from "tickers.txt" which we created before). Then it iterates over those tickers, pulls down their data, and then dumps them out into a file in the "stocks" directory. So KO (Coke) would get dumped to "stocks/KO".

#getAllHistoricData.pl
use strict;
use LWP::Simple;#might eventually need a 
                        #UserAgent if Yahoo starts requiring it 

my $runCount = 100;#this is how many tickers we are going to try to get data for
#we want to keep that fairly low since it is rude to hit Yahoo overly hard

my $stockDir = 'stocks/';#the directory we are writing the stocks into
my @allTickers = ();
my $ticker = '';
my $data = '';

#open the tickers file
open(TICKERS, 'needData.txt') or die "Could not open needData.txt: $!\n";
while(<TICKERS>){
    chomp;
    push(@allTickers, $_);
}
close(TICKERS) or die "Could not close needData.txt: $! \n";

#check that the number of tickers in the allTickers array is 
#greater than or equal to runCount
#if not, then make runCount equal to the length of allTickers
if(scalar(@allTickers) < $runCount){
    $runCount = scalar(@allTickers);
}

#now we will grab the data
while($runCount > 0){
    $ticker = pop(@allTickers);
    print "$ticker\n";#comment out this line if 
                              #you don't want to see the progress
    $data = get('http://itable.finance.yahoo.com/table.txt?s=' . $ticker . '&a=1&b=1&c=1998&ignore=.txt');
    #write it out to disk (clobber)
    open(OUTFILE, '+>' , $stockDir . $ticker) or die "Could not write out data file: $!\n";
    print OUTFILE $data;
    close(OUTFILE) or die "Could not close data file: $!\n";
    --$runCount;
}

#now dump out the tickers that we have left into the needData.txt file
#clobber - meaning '+>' 
#which will overwrite the existing data in that file with this
open(TICKERS,'+>','needData.txt') or die "Could not open needData.txt: $!\n";
for(@allTickers){
    print TICKERS "$_\n";
}
close(TICKERS) or die "Could not close needData.txt: $!\n";

#and done

Now this doesn't account for when it hits Yahoo with a ticker and doesn't get any data back. It could be that Yahoo is having some issue (or blocking your IP for pulling down too much data at once), or far more likely, the ticker no longer exists (it was delisted, bought out, etc). In that case the script will still write out a file with the ticker's name to the directory, but it will be an empty file. Also, you may notice the comment in there noting that, as it stands now, Yahoo doesn't require a user agent string. Google on the other hand does require one (assuming you are hitting a web page and not going through their API - I haven't tried the latter, so perhaps that doesn't need one). If Yahoo does start requiring it, then the code will have to be a few lines longer to create the user agent object and make its calls.
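
As a rough sketch of what those two changes could look like: the sub names "hasData" and "getWithAgent" and the agent string are made up here, and this is untested against Yahoo itself. It uses LWP::UserAgent, which is the module behind LWP::Simple.

```perl
#fetchHelpers.pl (hypothetical helpers, not part of the script above)
use strict;
use warnings;

#decide whether what came back is worth writing to disk
#undef means the request failed, and an empty or whitespace-only
#body means we got nothing useful for that ticker
sub hasData {
    my ($data) = @_;
    return (defined $data && $data =~ /\S/) ? 1 : 0;
}

#fetch one URL with an explicit user agent string
#returns the page body, or undef if the request failed
sub getWithAgent {
    my ($url, $agent) = @_;
    require LWP::UserAgent; #loaded here so hasData works without it
    my $ua = LWP::UserAgent->new(agent => $agent, timeout => 30);
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

#example call (the agent string is just a placeholder):
#my $data = getWithAgent($url, 'getAllHistoricData/0.1');
```

In the download loop, the open/print/close lines would go inside an if(hasData($data)){ ... } block. The --$runCount line should stay outside of it, otherwise a failed ticker would let the loop pop more tickers than are in the array.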

So coming up next, we will talk about how to clean up the directory (remove empty files) and also how to do the same as above, but for only one ticker, passed in through the command line.

**NOTE**
It appears the URL that was working when I tested this is no longer working reliably. So use this one instead, which does appear to work: "http://itable.finance.yahoo.com/table.txt?s=TICKER&a=1&b=1&c=1998&ignore=.txt"
And like before, replace TICKER with whatever stock ticker.

Free Data Feed... starting over

2005-02-23T01:20:33-04:00

Okay, I know I had written a series up to this point describing how we could create a data feed for free merely combining some sticks, a ball of twine, and our own two hands. But I went and didn't get enough sleep for about 2 months in a row and then one morning woke up and via a typo only the idiot I am could make, deleted all of my files.

So instead of continuing that path, I am going to start over and let you, gentle reader, live this experience with me. Also this way I can tweak it and make notes to the blog about what works for me, as well as hopefully answer some questions if they come up.

To outline what I hope to do in this series:

Get all of the ticker symbols.
Put those tickers into a file (in alphabetical order) so that we can later grab this file and use it to know which tickers to look up for data.
Load that file so that we have each ticker in a spot in an array. Then iterate over that array and get all of its historical data. For each ticker, take its data and save it into a file named with the ticker symbol.
We will then want code that will get the end of day data for each ticker and update the data files, and we will want to automate this to happen late at night.

See, that is fairly straightforward. So let's get started...

(I will be doing these examples in Perl mainly because I really like Perl. Sure, you can do them in Python, Ruby, Java, C, C++, various .NET variations, and so on. But I am going to be doing it in Perl.)

Get all ticker symbols.
There is a webpage which will allow you to get all tickers for a given exchange. If you go to: http://www.nasdaq.com/asp/symbols.asp?exchange=Q&start=A&Type=0 then you will see the NASDAQ page. Now that particular page is great, we could parse that HTML and pull out the tickers that we want as well as adjust it and pull down multiple pages, moving through the alphabet. Or, as one of our kind readers pointed out here, we could just change that "start=A" to "start=0" (that's a zero) and it throws a CSV file at us that has all of the tickers. This is a nice thing. Very nice. Now we only need to know that URL and adjust the "exchange=" variable. Q = NASDAQ, N = NYSE, and A = AMEX. So we only will have to load three URLs and then we have all of the data. Nice.
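
A small sketch of building those three URLs, using only the pieces of the URL described above. The sub name "symbolsUrl" is made up for this example, and it only prints the URLs rather than downloading anything.

```perl
#symbolUrls.pl (illustrative only)
use strict;
use warnings;

#exchange codes from the URL: Q = NASDAQ, N = NYSE, A = AMEX
my %exchanges = (Q => 'NASDAQ', N => 'NYSE', A => 'AMEX');

#build the CSV URL for one exchange code (start=0 asks for the CSV)
sub symbolsUrl {
    my ($code) = @_;
    return 'http://www.nasdaq.com/asp/symbols.asp?exchange=' . $code . '&start=0&Type=0';
}

#print one line per exchange, in order of exchange code
for my $code (sort keys %exchanges) {
    print "$exchanges{$code}: ", symbolsUrl($code), "\n";
}
```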
When we hit that URL and it tosses back a CSV file, the file gives us extra information (in order: "Company Name","Ticker Symbol","Market Value (millions)","Description (as filed with the SEC)"), and the first two rows are descriptive data (they tell us what the doc is and then what each data spot is). So we can ignore the first two rows and we only want the second position in each row (the ticker symbol of course).
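
For the curious, here is a minimal sketch of pulling out that second position in Perl. It assumes every record sits on a single line, so a field with a real newline inside it would still need a proper CSV module such as Text::CSV. The sub name "tickersFromCsv" is made up, and Text::ParseWords (which ships with Perl) is just one convenient way to split quoted fields.

```perl
#tickersFromCsv.pl (hypothetical sketch, assumes one record per line)
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

#pull the ticker (second field) out of each data row
#takes a list of raw lines and skips the first two descriptive rows
sub tickersFromCsv {
    my @lines = @_;
    my @tickers = ();
    my $row = 0;
    for my $line (@lines) {
        $row++;
        next if $row <= 2;          #the two descriptive rows
        $line =~ s/[\r\n]+$//;      #strip any flavor of line ending
        next unless $line =~ /\S/;  #skip blank lines
        my @fields = parse_line(',', 0, $line); #0 = drop the quotes
        next unless defined $fields[1] && $fields[1] ne '';
        push @tickers, $fields[1];
    }
    return @tickers;
}
```

Given an open filehandle named CSV, the call would be something like my @tickers = tickersFromCsv(<CSV>); and a quick look over the result for stray non-ticker lines is still a good idea.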
Now I spent far too long writing code to deal with it (at least on my Mac, the CSV parsing code doesn't like the CSV data due to newlines showing up in the text - but in my text editor I don't see them), but then I realized that it is only three files that we are dealing with here. I can easily just download those and manipulate them in Excel to get what I want. I downloaded each one, named them appropriately (since the download for each wants to name itself "symbols.csv") and then put them into Excel. From there I pulled out just the tickers column of each and added them all to one long Excel column. I then sorted that alphabetically, copied them all out into a text file, and named it "tickers.txt".
This allows me to easily see that there are 8255 tickers in there. Since I only want simple tickers to pass into Yahoo (which we will be using for our feed), I am going to strip out tickers that have either "^" or "/" in them. (Also manually scroll through to make sure you didn't accidentally grab any copyright text which was at the end of the files.)
After alphabetizing the list and dumping into a straight text file, here is the Perl code I use to strip out the lines which I don't feel like bothering with. (Before stripping, 8255 tickers. After stripping, 7475 tickers.)

#stripTickers.pl
use strict;

my @tickersIn = ();
my @tickersOut = ();
my $count = 0;

#get all of the tickers into an array
open(TICKERS, 'tickers.txt') or die "Could not open tickers.txt: $!\n";
while(<TICKERS>){
    chomp;
    push @tickersIn, $_;
    $count++;
}
close(TICKERS) or die "Could not close tickers.txt: $!\n";

print "Before: $count\n";
$count = 0;

#now iterate over those and if the ticker doesn't contain our bad chars, 
#then we keep it
for my $ticker (@tickersIn){
    #if it doesn't have the chars, then we add it to the keepers
    #check for "/" and "^"
    if(!($ticker =~ /\//) && !($ticker =~ /\^/)){
        push @tickersOut, $ticker;
        $count++;
    }
}

print "After: $count\n";

#rename the old tickers.txt file to tickers_old.txt
rename 'tickers.txt', 'tickers_old.txt';

#dump back out (clobber)
open(TICKERS,'+>','tickers.txt') or die "Could not open tickers.txt: $!\n";
for(@tickersOut){
    print TICKERS "$_\n";
}
close(TICKERS) or die "Could not close tickers.txt: $!\n";

#and done

That is the first stage - we now have our tickers. The next installment will be up in the next day or two which will go into how we can use Perl and cron to pull down our data.

*Note that IE appears to be displaying the code with a big gap in it - if you are using IE, please don't. Try something like Firefox.

Upgrade complete

2005-02-21T22:42:18-04:00

Okay, I have upgraded to the newest Movable Type and MT-Blacklist. I would also like to redesign the site, but that is way lower on my list of priorities. The next step is to rebuild the stock database and document that on here.
Hopefully sometime between now and the weekend.

I'm back!

2005-02-21T18:19:42-04:00

Things are still super busy right now (is that good or bad?), but I have been getting a lot of e-mails asking me to continue with the site. Therefore I am making a serious effort to maintain this.

First off, I have to admit that one bleary eyed morning I stupidly typed in the wrong command on my server and wiped out my stock database which I use for analysis.
As a result, I will need to rebuild that from scratch and set up the updating process. This actually works out well because I had been writing up something along those lines on here anyway. Since I will be going over the process from scratch on my end and it might change slightly from what I had before - I am just going to start explaining it again on here - from scratch.

Also, this is running an older version of MovableType, so I will be upgrading to the newest version at some point "soon" (feasibly even tonight). This shouldn't impact you as the end user in any way - but if you see anything odd, it might be that I am upgrading the site.

That's it! More soon...

Mandelbrot throws down the gauntlet

2004-08-07T15:20:04-04:00

Wired has up an article in which Benoit Mandelbrot is asking Wall Street's financial wizards to tackle the mathematics of the markets in the same way that the DNA genome was sequenced and the way SETI works, via a concentrated distributed effort (hmm, sounds like an oxymoron).

Do note that Mandelbrot has a book coming out soon, and this is taken from the book, so it is highly feasible that this is essentially a way for him to garner attention towards said book.

Since each financial institution feels, and perhaps rightfully so I should add, that their own techniques are unique and what makes them money - they aren't likely to want to join into a global effort to resolve something that wouldn't give them any sort of edge over anyone else.