<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>ptigas blog</title>
	
	<link>http://ptigas.com/blog</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Sun, 22 Jan 2012 15:41:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ptigas" /><feedburner:info uri="ptigas" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><image><link>http://creativecommons.org/licenses/by/2.0/</link><url>http://creativecommons.org/images/public/somerights20.gif</url><title>Some Rights Reserved</title></image><item>
		<title>name2gender in python</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/wbVAVuoAMPQ/</link>
		<comments>http://ptigas.com/blog/2012/01/21/name2gender-in-python/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 19:26:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[data]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=218</guid>
		<description><![CDATA[The problem is the following: given a name or an email address, how can you guess the gender? The answer is simple. It&#8217;s all about data. The US Social Security Administration (http://www.ssa.gov/) created a service where you can request the number of births per name per gender since<div class="read-more"><a href="http://ptigas.com/blog/2012/01/21/name2gender-in-python/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>The problem is the following: given a name or an email address, how can you guess<br />
the gender? The answer is simple. It&#8217;s all about data.</p>
<p>The US Social Security Administration (<a href="http://www.ssa.gov/" target="_blank">http://www.ssa.gov/</a>) created a service where you<br />
can request the number of births per name per gender since 1880. However, their<br />
pages are not easily accessible. After searching for a while I found this page,<br />
<a href="http://www.infochimps.com/datasets/popular-baby-names-by-year-top-1000-us-social-security-administr" target="_blank">http://www.infochimps.com/datasets/popular-baby-names-by-year-top-1000-us-social-security-administr</a>,<br />
which describes how to scrape the SSA name lists with a bash script.</p>
<pre>
#!/bin/bash
# Download the top-1000 names page of every year into names/
url=http://www.ssa.gov/cgi-bin/popularnames.cgi
mkdir -p names
for ((yr=1880; yr &lt;= 2010; yr++)); do
	echo "$yr"
	curl -d "year=$yr&#038;top=1000&#038;number=n" "$url" &gt; "names/$yr.html"
done
</pre>
<p>Alternatively, you can just download the data from the Infochimps website.</p>
<p>The problem is that this is raw information and thus not very helpful. What we need is a data structure which lets us query the gender per name (ideally as a distribution).</p>
<p>I used BeautifulSoup to parse each page and extract &lt;name, number of births, gender, year&gt; records.</p>
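<pre class="brush: python">import glob
from bs4 import BeautifulSoup  # BeautifulSoup 4 (the successor of the old BeautifulSoup module)

for path in sorted(glob.glob('names/*.html')):
    with open(path, 'r') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')

    # The year of birth is the value of the form field with id "yob".
    year = soup.find(id="yob")['value']
    # The ranking is the third table on the page.
    rows = soup.find_all('table')[2].find_all('tr')
    # Skip the first (header) and the last row.
    for tr in rows[1:-1]:
        tds = tr.find_all('td')
        # One CSV record per gender: name,number of births,gender,year
        print("%s,%s,%s,%s" % (tds[1].contents[0], tds[0].contents[0].replace(',', ''), 'male', year))
        print("%s,%s,%s,%s" % (tds[3].contents[0], tds[2].contents[0].replace(',', ''), 'female', year))</pre>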
<p>I stored the output in names.csv.</p>
<p>Then, to transform this information into a probability distribution per name, I used the following code.</p>
<pre class="brush:python">import json

def prob(m, f):
    # Turn the male/female birth counts of a name into probabilities.
    s = m + f
    return {'male': m / (1.0 * s), 'female': f / (1.0 * s)}

def load_data(path):
    # Sum the births per name and gender over all years.
    names = {}
    with open(path, 'r') as f:
        for line in f:
            name, counter, gender, year = line.rstrip().split(',')

            if name not in names:
                names[name] = {'male': 0, 'female': 0}

            if gender == 'male':
                names[name]['male'] += int(counter)
            else:
                names[name]['female'] += int(counter)

    return names

db = load_data('names.csv')

# Keep the whole distribution (not just a label) so it can be queried later.
names = {}
for name in db:
    names[name] = prob(db[name]['male'], db[name]['female'])

print(json.dumps(names))</pre>
<p>I saved this to names.json.</p>
<p>Finally, to query the gender of a name, you just compare the probabilities.</p>
<pre class='brush:python'>
import json

with open('names.json', 'r') as f:
    names = json.load(f)

def check_name(name):
    if name not in names:
        return 'unknown'
    p = names[name]
    if p['male'] &gt; p['female']:
        return 'male'
    elif p['female'] &gt; p['male']:
        return 'female'
    else:
        return 'unknown'
</pre>
<p>Because the probabilities are stored, you can also compute a confidence for the gender guessed for a given<br />
name.</p>
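<p>As a rough sketch of that idea (Python 3), the confidence can simply be the probability of the more likely gender. The function name gender_with_confidence is hypothetical, and the two distributions below are made-up numbers in the names.json format, not real SSA statistics.</p>
<pre class="brush: python">import json

# Illustrative distributions in the names.json format (made-up numbers).
names = json.loads('{"Mary": {"male": 0.004, "female": 0.996},'
                   ' "Jordan": {"male": 0.75, "female": 0.25}}')

def gender_with_confidence(name):
    # Return (gender, confidence). The confidence is the probability of the
    # more likely gender, so it lies between 0.5 (no idea) and 1.0 (certain).
    if name not in names:
        return ('unknown', 0.0)
    p = names[name]
    if p['male'] == p['female']:
        return ('unknown', 0.5)
    gender = max(p, key=p.get)
    return (gender, p[gender])

print(gender_with_confidence('Jordan'))
print(gender_with_confidence('Mary'))
print(gender_with_confidence('Xyzzy'))</pre>
<p>A caller can then decide how sure it wants to be, for example by treating every answer below some confidence as unknown.</p>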
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/01/21/name2gender-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/01/21/name2gender-in-python/</feedburner:origLink></item>
		<item>
		<title>Robot rock</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/LGkOd027JvQ/</link>
		<comments>http://ptigas.com/blog/2011/05/28/robot-rock/#comments</comments>
		<pubDate>Sat, 28 May 2011 22:24:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NoN]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=199</guid>
		<description><![CDATA[After three weeks of development we finally finished our robotics project. The task was to build a robot from the parts of the Lego NXT kit. It wasn&#8217;t straightforward, since a bad design could cause a lot of trouble in the end. Finally, we ended up with a simple<div class="read-more"><a href="http://ptigas.com/blog/2011/05/28/robot-rock/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>After three weeks of development we finally finished our robotics project. The task was to build a robot from the parts of the Lego NXT kit. It wasn&#8217;t straightforward, since a bad design could cause a lot of trouble in the end. Finally, we ended up with a simple tripod.</p>
<p>Then, the software part was to program the robot to perform a task known as global localization. In simple words, imagine that you are given a map but you have no idea where the robot is on it. Using an ultrasound sensor, you have to find the position and then use this information to drive the robot to a target. As you can see, an error in the estimation can be disastrous.</p>
<p>The method we used for the localization part is known as Monte Carlo Localization (MCL). This method samples (uniformly) a number of hypotheses (particles). Then, by comparing the real measurements with the measurements each hypothesis would produce, we re-sample the hypotheses with weights, so that the more consistent a hypothesis is with the measurements, the more probable it is that it will be chosen again.</p>
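<p>To make the re-sampling idea concrete, here is a minimal sketch of MCL in one dimension (Python 3). It is not the code that ran on the robot: the corridor, the Gaussian sensor model, the noise values and the names mcl_step and expected are all assumptions made for illustration.</p>
<pre class="brush: python">import math
import random

def mcl_step(particles, measurement, expected, noise_sigma):
    # Weight each hypothesis by how well its hypothetical measurement
    # agrees with the real one (Gaussian sensor model, an assumption).
    weights = [
        math.exp(-((measurement - expected(p)) ** 2) / (2 * noise_sigma ** 2))
        for p in particles
    ]
    # Re-sample with replacement: consistent hypotheses are chosen more often.
    resampled = random.choices(particles, weights=weights, k=len(particles))
    # A little jitter keeps the particle set from collapsing into duplicates.
    return [p + random.gauss(0, 0.5) for p in resampled]

random.seed(0)
corridor = 100.0       # hypothetical 1-D map with a wall at x = 100
true_position = 30.0   # unknown to the robot

def expected(x):
    # Distance an ultrasound sensor at position x would read to the wall.
    return corridor - x

# Start with hypotheses spread uniformly over the whole map.
particles = [random.uniform(0, corridor) for _ in range(500)]
for _ in range(10):
    reading = expected(true_position) + random.gauss(0, 2.0)
    particles = mcl_step(particles, reading, expected, noise_sigma=2.0)

estimate = sum(particles) / len(particles)
print(round(estimate, 1))</pre>
<p>After a few iterations the particles gather around the true position, so their mean can serve as the estimate.</p>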
<p>After several iterations of MCL, an estimate of the position is found. Then, using this estimate, and hoping that it is also the correct position, we guide the robot to the target. So, how do we choose how to get from one position to another? A simple and safe solution was to compute the Delaunay triangulation of the map and then compute its dual graph, <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' />. Then we find the nodes of the dual graph that are closest to the robot and to the target. So, we have a starting position, a node (A) of <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' /> that is close to the starting point, a node (B) of <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' /> that is close to the target, and a path from A to B on <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' />. Voilà, this is the path that we have to follow to be as safe as possible from hitting a wall.</p>
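<p>The path part can be sketched as follows (Python 3.8 or newer). The triangulation itself is left out: the small dual graph below is a hand-made, hypothetical example, a breadth-first search stands in for whatever search we ran on the robot, and the names closest_node, shortest_path and plan are made up for this sketch.</p>
<pre class="brush: python">import math
from collections import deque

# Hypothetical dual graph: each node is the centre of a triangle, and two
# nodes are connected when their triangles share a side.
nodes = {'a': (1, 1), 'b': (3, 1), 'c': (5, 2), 'd': (5, 5), 'e': (2, 4)}
edges = {'a': ['b', 'e'], 'b': ['a', 'c'], 'c': ['b', 'd'],
         'd': ['c', 'e'], 'e': ['a', 'd']}

def closest_node(point):
    # Node of the dual graph with the smallest Euclidean distance to point.
    return min(nodes, key=lambda n: math.dist(nodes[n], point))

def shortest_path(start, goal):
    # Breadth-first search: fewest hops between two nodes.
    # Assumes the graph is connected, so the goal is always reached.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour in edges[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    # Walk back from the goal to the start, then reverse.
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

def plan(robot, target):
    # robot -> A -> ... -> B -> target, as described in the text.
    a = closest_node(robot)
    b = closest_node(target)
    return [robot] + [nodes[n] for n in shortest_path(a, b)] + [target]

route = plan((0.5, 0.5), (5.5, 4.5))
print(route)</pre>
<p>The robot then drives through the returned way-points one after the other.</p>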
<p>Well, here it is in action.</p>
<p><iframe src="http://player.vimeo.com/video/24313757?title=0&amp;byline=0&amp;portrait=0" width="643" height="361" frameborder="0"></iframe></p>
<p>That was fun <img src='http://ptigas.com/blog/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/05/28/robot-rock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/05/28/robot-rock/</feedburner:origLink></item>
		<item>
		<title>Sol Robeson wise words</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/y6528CScfac/</link>
		<comments>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/#comments</comments>
		<pubDate>Sun, 15 May 2011 16:03:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NoN]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=182</guid>
		<description><![CDATA[Sol Robeson: Have you met Archimedes? The one with the black spots, you see? You remember Archimedes of Syracuse, eh? The king asks Archimedes to determine if a present he&#8217;s received is actually solid gold. Unsolved problem at the time. It tortures the great Greek mathematician for weeks &#8211; insomnia haunts him and he twists<div class="read-more"><a href="http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<blockquote><p><strong>Sol Robeson:</strong> Have you met Archimedes? The one with the black spots, you see? You remember Archimedes of Syracuse, eh? The king asks Archimedes to determine if a present he&#8217;s received is actually solid gold. Unsolved problem at the time. It tortures the great Greek mathematician for weeks &#8211; insomnia haunts him and he twists and turns in his bed for nights on end. Finally, his equally exhausted wife &#8211; she&#8217;s forced to share a bed with this genius &#8211; convinces him to take a bath to relax. While he&#8217;s entering the tub, Archimedes notices the bath water rise. Displacement, a way to determine volume, and that&#8217;s a way to determine density &#8211; weight over volume. And thus, Archimedes solves the problem. He screams &#8220;Eureka&#8221; and he is so overwhelmed he runs dripping naked through the streets to the king&#8217;s palace to report his discovery</p>
<p><strong>Sol Robeson:</strong> Now, what is the moral of the story?</p>
<p><strong>Maximillian Cohen:</strong> That a breakthrough will come.</p>
<p><strong>Sol Robeson:</strong> Wrong! The point of the story is the wife. You listen to your wife, she will give you perspective, meaning. You need a break, you have to take a bath or you will get nowhere.</p></blockquote>
<p>From Aronofsky’s Pi. I had forgotten how much I enjoyed that movie.</p>
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/</feedburner:origLink></item>
		<item>
		<title>Real-time jam session system</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/m1CtQ1-wTL0/</link>
		<comments>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 00:28:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[MSc Thesis]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=99</guid>
		<description><![CDATA[I&#8217;ve always been fascinated by arts and science. But, what was getting me excited most was their combination and one of my ambitions is to achieve that. As a computer scientist I couldn&#8217;t find a better way to fulfill this ambition. I finally chose my MSc project which combines my two passions. Algorithms and music.<div class="read-more"><a href="http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve always been fascinated by the arts and science. What excited me most, though, was their combination, and one of my ambitions is to achieve it. As a computer scientist I couldn&#8217;t find a better way to fulfill this ambition: I finally chose an MSc project which combines my two passions, <strong>algorithms</strong> and <strong>music</strong>.</p>
<p>As the title suggests, my project is to research and develop a system which will make use of <a href="http://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> and <a href="http://en.wikipedia.org/wiki/Music_information_retrieval" target="_blank">music information retrieval</a> to understand and then generate music automatically, simulating an improvising musician. In short, a system which will improvise in a jam session.</p>
<p>There are several ideas to begin with. The first milestone is to construct a system which uses simple MIDI input and a fixed tempo, like the system suggested in [1]. However, the goal is to be able to use real instruments, with the beat extracted from the rhythm instruments (take a look at [2] and [3] for real-time feature extraction).</p>
<p>Here is a list of the tools and languages I am planning to use:</p>
<ul>
<li><a href="http://chuck.cs.princeton.edu/" target="_blank">ChucK</a> and <a href="http://wekinator.cs.princeton.edu/" target="_blank">Wekinator</a>.</li>
<li><a href="http://en.wikipedia.org/wiki/Pure_Data" target="_blank">Pure Data</a> and/or <a href="http://en.wikipedia.org/wiki/Max_(software)" target="_blank">Max/MSP</a> for interaction, programming and generation (using it with <a href="http://www.ableton.com/maxforlive" target="_blank">Max for Live</a>).</li>
<li><a href="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank">WEKA</a> for machine learning.</li>
<li><a href="http://www.ableton.com/" target="_blank">Ableton</a> for audio synthesis (the generative part).</li>
</ul>
<p>Feel free to leave a comment, make a suggestion or just tell me what you think.</p>
<p><strong>References</strong></p>
<p>1. <strong>Kitahara, Tetsuro, Naoyuki Totani, R. Tokuami, and H. Katayose</strong>. 2010. “BayesianBand: Jam Session System Based on Mutual Prediction by User and System.” In <em>Entertainment Computing ICEC 2009</em>.</p>
<p>2. <strong>Stark, A.M., and M.D. Plumbley</strong>. 2009. “Real-time chord recognition for live performance.” In <em>Proceedings of the International Computer Music Conference</em>.</p>
<p>3. <strong>Stark, Adam M., Matthew E. P. Davies, and Mark D. Plumbley</strong>. 2009. “Real-time beat-synchronous analysis of musical audio.” Centre for Digital Music, Queen Mary University of London, London, United Kingdom, pp. 1-6.</p>
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/</feedburner:origLink></item>
		<item>
		<title>Simple CAPTCHA solver in python</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/K2io6wCBtik/</link>
		<comments>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 00:13:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=11</guid>
		<description><![CDATA[This is a very simple method for solving very simple CAPTCHAs like those proposed here and here. In this example we are going to use the following images. A few things are easy to observe. First of all, a fixed-width (monospace) font has been used. This makes extracting all the letters and using them<div class="read-more"><a href="http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>This is a very simple method for solving very simple CAPTCHAs like those proposed <a href="http://www.white-hat-web-design.co.uk/articles/php-captcha.php" target="_blank">here</a> and <a href="http://www.white-hat-web-design.co.uk/articles/php-captcha.php" target="_blank">here</a>.</p>
<p>In this example we are going to use the following images.</p>
<p style="text-align: center;"><img class="alignnone size-full wp-image-29" title="test" src="http://ptigas.com/blog/wp-content/uploads/2011/02/test1.jpg" alt="" width="75" height="25" /> <img class="alignnone size-full wp-image-30" title="test2" src="http://ptigas.com/blog/wp-content/uploads/2011/02/test2.jpg" alt="" width="75" height="25" /></p>
<p>A few things are easy to observe. First of all, a fixed-width (monospace) font has been used. This makes it very easy to extract all the letters and use them as masks to check each character, one by one. Also, the alphabet consists only of lowercase hexadecimal digits. Thus, we had to extract only 16 letters.</p>
<p>The first part was to extract all the letters. To achieve that, we first sampled several images to be sure that together they contain all 16 letters. Then, using a simple image editor, we cropped the letters one by one. We had to be careful to align all the letters properly. Here is the final mask.</p>
<p style="text-align: center;"><img class="size-full wp-image-38   aligncenter" title="letters" src="http://ptigas.com/blog/wp-content/uploads/2011/02/letters.bmp" alt="" width="250" /></p>
<p style="text-align: left;">As you can see, there is some noise which we have to remove. After playing with several techniques we ended up with the following: we turned the image to greyscale and then used a threshold to remove some of the noise. Here is the example after the filtering (cropping also applied).</p>
<p style="text-align: center;"><img class="size-full wp-image-34 aligncenter" title="source" src="http://ptigas.com/blog/wp-content/uploads/2011/02/source.bmp" alt="" width="100" /></p>
<p style="text-align: left;">So, now we have an almost clean image and some letters to play with.</p>
<p style="text-align: left;"><strong>Procedure</strong></p>
<p style="text-align: left;">Move each letter across the image and, for each position, sum the absolute differences between the pixels of the letter (mask) and the pixels of the image behind it. Thus, for each position we have a distance which tells how well the mask fits the image there: the smaller, the better. Then, store for each letter the position where the smallest distance was found. Finally, sort the letters by distance, take the five best results (our CAPTCHA has five letters) and sort them by position. The result is the CAPTCHA text.</p>
<p style="text-align: left;">More formally</p>
<p style="text-align: left;">Let</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=d%28I%2Cl%2Co%29%3D%5Csum_%7Bi%3D0%7D%5E%7BW-1%7D%5Csum_%7Bj%3D0%7D%5E%7BH-1%7D%7B%7CI%28o%2Bi%2C%20j%29-l%28i%2Cj%29%7C%7D&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='d(I,l,o)=\sum_{i=0}^{W-1}\sum_{j=0}^{H-1}{|I(o+i, j)-l(i,j)|}' title='d(I,l,o)=\sum_{i=0}^{W-1}\sum_{j=0}^{H-1}{|I(o+i, j)-l(i,j)|}' class='latex' /></p>
<p style="text-align: left;">be the distance between the image <img src='http://s.wordpress.com/latex.php?latex=I&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='I' title='I' class='latex' /> and the letter <img src='http://s.wordpress.com/latex.php?latex=l&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='l' title='l' class='latex' /> placed at position <img src='http://s.wordpress.com/latex.php?latex=o&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='o' title='o' class='latex' />, where W and H are the width and the height of the letter.</p>
<p style="text-align: left;">Then</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=p%28I%2Cl%29%20%3D%20%5Carg%5Cmin_%7Bo%7Dd%28I%2Cl%2Co%29&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='p(I,l) = \arg\min_{o}d(I,l,o)' title='p(I,l) = \arg\min_{o}d(I,l,o)' class='latex' /></p>
<p style="text-align: left;">is the position where the letter fits best. Thus, we need the 5 letters <img src='http://s.wordpress.com/latex.php?latex=l_%7B1%7D%2Cl_%7B2%7D%2Cl_%7B3%7D%2Cl_%7B4%7D%2Cl_%7B5%7D&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='l_{1},l_{2},l_{3},l_{4},l_{5}' title='l_{1},l_{2},l_{3},l_{4},l_{5}' class='latex' /> with the minimum <img src='http://s.wordpress.com/latex.php?latex=d%28I%2Cl_%7Bi%7D%2Co%29&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='d(I,l_{i},o)' title='d(I,l_{i},o)' class='latex' />, ordered by <img src='http://s.wordpress.com/latex.php?latex=p%28I%2C%20l_%7Bi%7D%29&#038;bg=T&#038;fg=ffffff&#038;s=0' alt='p(I, l_{i})' title='p(I, l_{i})' class='latex' />.</p>
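<pre class="brush: python">def p(img, letter):
    # Slide the letter mask across the image and return (distance, x)
    # for the horizontal offset x where the mask fits best.
    A = img.load()
    B = letter.load()
    best = 1000000
    best_x = 0
    # + 1 so that the right-most position is tested too
    for x in range(img.size[0] - letter.size[0] + 1):
        total = 0
        for i in range(letter.size[0]):
            for j in range(letter.size[1]):
                # compare the red channels of the two pixels
                total = total + abs(A[x+i, j][0] - B[i, j][0])
        if total &lt; best:
            best = total
            best_x = x
    return (best, best_x)</pre>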
<p style="text-align: left;">Here is the code which implements this method. You can browse and download everything from <a href="https://github.com/ptigas/simple-CAPTCHA-solver" target="_blank">https://github.com/ptigas/simple-CAPTCHA-solver</a></p>
<pre class="brush: python">from PIL import Image

# Uses p(img, letter) defined above.
def ocr(im, threshold=200, alphabet="0123456789abcdef"):
    img = Image.open(im)
    img = img.convert("RGB")
    box = (8, 8, 58, 18)
    img = img.crop(box)
    pixdata = img.load()

    letters = Image.open('letters.bmp')
    ledata = letters.load()

    # Clean the background noise: pixels that are bright in all three
    # channels become white, everything else becomes black.
    for y in range(img.size[1]):
        for x in range(img.size[0]):
            r, g, b = pixdata[x, y]
            if r &gt; threshold and g &gt; threshold and b &gt; threshold:
                pixdata[x, y] = (255, 255, 255)
            else:
                pixdata[x, y] = (0, 0, 0)

    counter = 0
    old_x = -1

    letterlist = []

    # A column whose pixels are all black separates two letters in letters.bmp.
    for x in range(letters.size[0]):
        black = True
        for y in range(letters.size[1]):
            if ledata[x, y][0] != 0:
                black = False
                break
        if black:
            box = (old_x+1, 0, x, 10)
            letter = letters.crop(box)
            t = p(img, letter)
            print(counter, x, t)
            letterlist.append((t[0], alphabet[counter], t[1]))
            old_x = x
            counter = counter + 1

    # Handle the last letter, which ends at x = 140 rather than at a separator column.
    box = (old_x+1, 0, 140, 10)
    letter = letters.crop(box)
    t = p(img, letter)
    letterlist.append((t[0], alphabet[counter], t[1]))

    # Keep the five letters with the smallest distance ...
    t = sorted(letterlist)
    t = t[0:5] # 5-letter captcha

    # ... and read them from left to right.
    final = sorted(t, key=lambda x: x[2])
    answer = ""
    for l in final:
        answer = answer + l[1]
    return answer

print(ocr('test.jpg'))</pre>
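<p style="text-align: left;">The procedure can also be tried without any image files. The following is only a toy sketch (Python 3) on hand-made data: the 2x2 masks, the tiny image and the names distance, best_position and solve are assumptions made for illustration, with 1 standing for ink and 0 for background.</p>
<pre class="brush: python">def distance(image, mask, offset):
    # Sum of absolute pixel differences with the mask placed at offset.
    return sum(
        abs(image[j][offset + i] - mask[j][i])
        for j in range(len(mask))
        for i in range(len(mask[0]))
    )

def best_position(image, mask):
    # (distance, offset) of the offset where the mask fits best.
    offsets = range(len(image[0]) - len(mask[0]) + 1)
    return min((distance(image, mask, o), o) for o in offsets)

def solve(image, masks, length):
    # Best fit per letter, keep the best `length` letters, read left to right.
    fits = [best_position(image, m) + (letter,) for letter, m in masks.items()]
    best = sorted(fits)[:length]
    return ''.join(letter for _, _, letter in sorted(best, key=lambda f: f[1]))

# Hypothetical 2x2 "letters".
masks = {
    'a': [[1, 0], [0, 1]],
    'b': [[1, 1], [1, 0]],
    'c': [[0, 1], [1, 1]],
}
# An image that shows "b" followed by "a".
image = [[1, 1, 1, 0],
         [1, 0, 0, 1]]

print(solve(image, masks, 2))</pre>
<p style="text-align: left;">Note that, like the real code, this keeps only one position per letter, so a text which repeats a letter needs a different approach.</p>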
<p style="text-align: left;"><strong>[update]</strong></p>
<p style="text-align: left;">Today I found <a href="http://www.wausita.com/captcha/" target="_blank">this</a>, a very nice tutorial on CAPTCHA solving using Python and vector space search.</p>
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/</feedburner:origLink></item>
		<item>
		<title>Hello world!</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/FQI7y7ZKFoc/</link>
		<comments>http://ptigas.com/blog/2011/02/17/hello-world/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 15:53:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NoN]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=1</guid>
		<description><![CDATA[Well, this is (I hope) my final try to keep a blog or a log of my ideas, thoughts and things I like. So, as the title says, Hello (internet) World. The template currently used is the result of my work over the last 2 days. It was developed from scratch using: 960 framework WordPress Adobe<div class="read-more"><a href="http://ptigas.com/blog/2011/02/17/hello-world/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>Well, this is (I hope) my final try to keep a blog or a log of my ideas, thoughts and things I like. So, as the title says, Hello (internet) World.</p>
<p>The template currently used is the result of my work over the last 2 days. It was developed from scratch using:</p>
<ul>
<li><a title="960 framework" href="http://960.gs/" target="_blank">960 framework</a></li>
<li><a href="http://wordpress.org/" target="_blank">WordPress</a></li>
<li><a href="http://www.adobe.com/products/photoshop/" target="_blank">Adobe Photoshop</a></li>
<li>CSS3 goodies and love</li>
</ul>
<p>This is a first draft and it will be extended with several ideas I&#8217;m working on. <del>Also, I plan to release it online. Check my <a href="https://github.com/ptigas" target="_blank">github</a> for updates.</del> You can download it from <a href="http://github.com/ptigas/nullspace" target="_blank">http://github.com/ptigas/nullspace</a></p>
<p>Feel free to leave me any (constructive) feedback.</p>
<p>Cheers</p>
]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/02/17/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/02/17/hello-world/</feedburner:origLink></item>
	</channel>
</rss>

