<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>ptigas blog</title>
	
	<link>http://ptigas.com/blog</link>
	<description>Panagiotis Tigas personal blog</description>
	<lastBuildDate>Thu, 12 Apr 2012 22:07:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ptigas" /><feedburner:info uri="ptigas" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><image><link>http://creativecommons.org/licenses/by/2.0/</link><url>http://creativecommons.org/images/public/somerights20.gif</url><title>Some Rights Reserved</title></image><item>
		<title>MSc thesis poster and complete text</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/b7Cu8UUfXsE/</link>
		<comments>http://ptigas.com/blog/2012/04/12/msc-thesis-poster-and-complete-text/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 22:07:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[MSc Thesis]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=352</guid>
		<description><![CDATA[My initial thought was to start writing posts so as to explain and document my thesis. However  as a first step I decided to post here the thesis poster and upload the complete text of my thesis on arxiv. Voila! http://arxiv.org/abs/1201.6251 and]]></description>
			<content:encoded><![CDATA[<p>My initial thought was to start writing posts so as to explain and document my thesis. However  as a first step I decided to post here the thesis poster and upload the complete text of my thesis on arxiv.</p>
<p><span id="more-352"></span></p>
<p>Voila!</p>
<p style="text-align: center;"><a title="Real-time jam-session support system" href="http://arxiv.org/abs/1201.6251" target="_blank">http://arxiv.org/abs/1201.6251</a></p>
<p style="text-align: center;">and</p>
<p style="text-align: center;"><a href="http://dl.dropbox.com/u/217134/misc/poster_grey.pdf"><img class="size-medium wp-image-353 aligncenter" title="msc thesis poster" src="http://ptigas.com/blog/wp-content/uploads/2012/04/Screen-Shot-2012-04-12-at-22.59.17-209x300.png" alt="" width="209" height="300" /></a></p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/b7Cu8UUfXsE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/04/12/msc-thesis-poster-and-complete-text/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/04/12/msc-thesis-poster-and-complete-text/</feedburner:origLink></item>
		<item>
		<title>programming bread slicing</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/SLvER_SDg64/</link>
		<comments>http://ptigas.com/blog/2012/03/18/programming-bread-slicing/#comments</comments>
		<pubDate>Sun, 18 Mar 2012 19:20:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=341</guid>
		<description><![CDATA[In the dark ages of programming, functions acted on data. To slice your bread, you passed a bread data structure to a slice function: slice(bread); Then came object oriented programming. Instead of having an external function slice our bread, we would ask the bread to slice itself by calling the slice method on a bread<div class="read-more"><a href="http://ptigas.com/blog/2012/03/18/programming-bread-slicing/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<blockquote><p>In the dark ages of programming, functions acted on data. To slice your bread, you passed a bread data structure to a slice function:</p>
<pre>slice(bread);</pre>
<p>Then came object oriented programming. Instead of having an external function slice our bread, we would ask the bread to slice itself by calling the slice method on a bread object:</p>
<pre>bread.slice();</pre>
<p>Obviously a vast improvement.<br />
Now object oriented programming has become more refined. First we create a bread-slicing object and then we simply pass bread objects to the slice method on the bread-slicer:</p>
<pre>BreadSlicer slicer = new BreadSlicer();
slicer.slice(bread);</pre>
</blockquote>
<p>via <a href="http://www.johndcook.com/blog/2012/03/18/sliced-bread/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+TheEndeavour+%28The+Endeavour%29" target="_blank">John Cook</a></p>
<p><strong>update</strong></p>
<p>There is an argument against the final improvement. <a href="http://objology.blogspot.it" target="_blank">Travis Griggs</a> advices not to make objects with -er, which probably will act the same way functions act (applying a function over an argument). You can read about it <a href="http://objology.blogspot.it/2011/09/one-of-best-bits-of-programming-advice.html" target="_blank">here</a></p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/SLvER_SDg64" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/03/18/programming-bread-slicing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/03/18/programming-bread-slicing/</feedburner:origLink></item>
		<item>
		<title>Memento decorator in python and fibonacci sequence</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/9GhCHaodRak/</link>
		<comments>http://ptigas.com/blog/2012/03/06/memento-decorator-in-python-and-fibonacci-sequence/#comments</comments>
		<pubDate>Tue, 06 Mar 2012 20:17:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=322</guid>
		<description><![CDATA[Computing fibonacci with the naive method takes exponential time. However, it&#8217;s known that by using dynamic programming you can compute fibonacci in linear time. The trick is to check if you have already computed the value of , and if yes return the already-computed value instead of calling the function again (which is expensive). Here<div class="read-more"><a href="http://ptigas.com/blog/2012/03/06/memento-decorator-in-python-and-fibonacci-sequence/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>Computing <a href="http://en.wikipedia.org/wiki/Fibonacci_number" target="_blank">fibonacci</a> with the naive method takes exponential time. However, it&#8217;s <a href="http://en.wikipedia.org/wiki/Dynamic_programming#Fibonacci_sequence" target="_blank">known</a> that by using dynamic programming you can compute fibonacci in linear time. The trick is to check if you have already computed the value of <img src='http://s.wordpress.com/latex.php?latex=f%28a%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='f(a)' title='f(a)' class='latex' />, and if yes return the already-computed value instead of calling the function again (which is expensive).</p>
<p>Here is the naive implementation in python :</p>
<pre class="brush: python">
def fib( n ) :
	if n <= 1 :
		return 1
	return fib(n-1) + fib(n-2)
</pre>
<p>Calculating fibonacci sequence from 1 to 35 takes about 8 seconds (8.056s). So, let's how we can improve this by using <a href="http://stackoverflow.com/questions/739654/understanding-python-decorators" target="_blank">decorators</a>.</p>
<p>The Memento decorator will allow us to save time by <strong>not</strong> calling a function which we've already called.</p>
<p>Here is an implementation of Memento decorator:</p>
<pre class="brush: python">
class Memento :
	__mem__ = {}
	def __init__(self, f) :
		self.f = f

	def __call__(self, arg) :
		if arg not in self.__mem__ :
			self.__mem__[arg] = self.f(arg)
		return self.__mem__[arg]
</pre>
<p>Let's see what it does. The <em>__init__</em> function is pretty obvious. It assigns the function f to an object variable. The <em>__call__</em> function is triggered every time the decorated function is called. By using a dictionary we check and return the result of a function call with the same argument (if we've already computed that). The cost of querying the dictionary is almost <img src='http://s.wordpress.com/latex.php?latex=O%281%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='O(1)' title='O(1)' class='latex' />.</p>
<p>By just applying this decorator to our function</p>
<pre class="brush: python">
@Memento
def fib( n ) :
	if n <= 1 :
		return 1
	return fib(n-1) + fib(n-2)
</pre>
<p>we manage to compute the first 1000 fibonacci numbers in less than a second.</p>
<p>Here is a gist of this example <a href="https://gist.github.com/1988734" target="_blank">https://gist.github.com/1988734</a> .</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/9GhCHaodRak" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/03/06/memento-decorator-in-python-and-fibonacci-sequence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/03/06/memento-decorator-in-python-and-fibonacci-sequence/</feedburner:origLink></item>
		<item>
		<title>Loop detection in O(n) using Floyd’s algorithm</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/SNBulv1mFqM/</link>
		<comments>http://ptigas.com/blog/2012/03/03/loop-detection-in-on-using-floyds-algorithm/#comments</comments>
		<pubDate>Sat, 03 Mar 2012 21:32:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=298</guid>
		<description><![CDATA[Given a linked list find whether the list contains a loop or not. There are several solutions to this but the following is considered one of the best. Also it&#8217;s dead simple. The algorithm is also known as &#8220;Tortoise and Hare Algorithm&#8221; def toirtoise_and_hare(l) : tortoise = l.head hare = l.head steps = 0 while<div class="read-more"><a href="http://ptigas.com/blog/2012/03/03/loop-detection-in-on-using-floyds-algorithm/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>Given a linked list find whether the list contains a loop or not.</p>
<p>There are <a href="http://ostermiller.org/find_loop_singly_linked_list.html" target="_blank">several</a> solutions to this but the following is considered one of the best. Also it&#8217;s dead simple. The algorithm is also known as &#8220;<a href="http://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_hare" target="_blank">Tortoise and Hare Algorithm</a>&#8221;</p>
<div style="margin-top:30px">
<img style="float:left" title="tortoise and hare" src="http://upload.wikimedia.org/wikipedia/commons/6/64/Tortoise_and_hare_rackham.jpg" alt="" width="200" height="277" /></p>
<div style="margin-left:205px; width:490px; margin-top:-20px">
<pre class="brush: python">def toirtoise_and_hare(l) :
	tortoise = l.head
	hare = l.head
	steps = 0
	while True:
		if hare == None :
			return (steps, 'No loop found')
		hare = hare.next
		if hare == None :
			return (steps, 'No loop found')
		hare = hare.next
		tortoise = tortoise.next
		if hare == tortoise:
			#print hare, tortoise
			return (steps, 'Loop found')
		steps += 1</pre>
</div>
<div style="clear:both"></div>
</div>
<p><span id="more-298"></span><br />
Below you can find a simple implementation of linked list in python</p>
<pre class="brush: python">class Node :
	def __init__(self, data) :
		self.next = None
		self.data = data

	def __str__(self) :
		return str(self.data)

class LinkedList :
	def __init__(self) :
		self.head = None

	def add_node(self, data):
		if self.head == None :
			self.head = Node(data)
		else :
			tmp = Node(data)
			tmp.next = self.head
			self.head = tmp

	def add_loop(self, f, t) :
		node = self.head
		i = 0
		node_f = None
		node_t = None
		while node != None :
			i += 1
			if i == f :
				node_f = node
			if i == t :
				node_t = node
			node = node.next
		node_f.next = node_t

	def print_list(self):
		node = self.head
		while node != None :
			print node.data
			node = node.next</pre>
<p>&#8230; and test &#8230;</p>
<pre class="brush: python">l = LinkedList()
for i in xrange(1000) :
	l.add_node(i)

l.add_loop(800, 200)

print toirtoise_and_hare(l)</pre>
<p>As mentioned the complexity of the algorithm is <img src='http://s.wordpress.com/latex.php?latex=O%28n%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='O(n)' title='O(n)' class='latex' />.</p>
<p><a href="https://gist.github.com/1968328" target="_blank">Here</a> is the complete gist.</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/SNBulv1mFqM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/03/03/loop-detection-in-on-using-floyds-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/03/03/loop-detection-in-on-using-floyds-algorithm/</feedburner:origLink></item>
		<item>
		<title>Review: Collaborative Filtering for Implicit Feedback Datasets</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/sbZIjd2JZi8/</link>
		<comments>http://ptigas.com/blog/2012/03/02/review-collaborative-filtering-for-implicit-feedback-datasets/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 23:12:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=246</guid>
		<description><![CDATA[Introduction Review of &#8220;Collaborative Filtering For Implicit Feedback Datasets&#8221; Collaborative Filtering is a method for providing recommendations based on preferences fetched from users about items (products, shows, tracks, etc. ). Recommender systems utilise such techniques for personalised suggestions using past history of users. For example, Amazon use a type of Collaborative Filtering, called item-based collaborative<div class="read-more"><a href="http://ptigas.com/blog/2012/03/02/review-collaborative-filtering-for-implicit-feedback-datasets/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<h2 style="text-align: left;">Introduction</h2>
<p>Review of &#8220;<a title="Review: Collaborative Filtering for Implicit Feedback Datasets" href="http://www2.research.att.com/~yifanhu/PUB/cf.pdf" rel="bookmark">Collaborative Filtering For Implicit Feedback Datasets</a>&#8221;</p>
<p><a href="http://en.wikipedia.org/wiki/Collaborative_filteringJ11QQ&amp;sig2=wz6434ryOscPir8CZvwdBw" target="_blank">Collaborative Filtering</a> is a method for providing recommendations based on preferences fetched from users about items (products, shows, tracks, etc. ). Recommender systems utilise such techniques for personalised suggestions using past history of users. For example, <a href="http://www.amazon.co.uk/" target="_blank">Amazon</a> use a type of Collaborative Filtering, called<a href="http://www10.org/cdrom/papers/519/" target="_blank"> <em>item-based collaborative filtering</em></a>. According this method, relationships between pair of items are being extracted and combined with user&#8217;s data (purchase history) so as to infer recommendations.</p>
<p>As we saw in the previous example, users provide feedback to Amazon about items by rating and buying. This (explicit) feedback is characterised by the fact that the users explicitly define what they like (by buying it or rating positively) and what they don&#8217;t (by giving low ratings). However, there are cases that this kind of feedback is not available. For example, when a user is listening to a track or an album, without rating it, it is difficult to know if he liked it or not. This kind of feedback is known as <em>implicit feedback</em> and its main characteristic is that it describes users&#8217; behaviours. As the authors argue, there are several implications that arise from the lack of explicit feedback.<span id="more-246"></span></p>
<p>What is of interest is what the user didn&#8217;t like. This information is difficult to be extracted for the reason that a user (listener) doesn&#8217;t provide negative feedback (such as low ratings) of tracks that he didn&#8217;t like. What is more, it is dangerous to assume that he didn&#8217;t like the songs the he hasn&#8217;t listened to since there is a positive probability that he hadn&#8217;t even been aware of them. Also, implicit feedback might contain a lot of nois . For example, it is very common for a family&#8217;s members to share a computer thus by tracking listening behaviours from this specific computer might not match to a specific member of the family.</p>
<p>In this paper, the authors describe a latent factor model method based on matrix factorisation which deals with this problem. Providing recommendations to users using implicit feedback data.</p>
<h2 style="text-align: left;">The method</h2>
<p style="text-align: left;">For <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=T&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> users and <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=T&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> items, users histories are stored in <img src='http://s.wordpress.com/latex.php?latex=r%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%5Ctimes%20m%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r \in \mathbb{R}^{n\times m}' title='r \in \mathbb{R}^{n\times m}' class='latex' /> matrix such as :</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=%20r_%7Bui%7D%20%3D%20%5Ctextrm%7Bpreference%20by%20user%20%7D%20u%20%5Ctextrm%7B%20of%20item%20%7D%20i%20&#038;bg=T&#038;fg=000000&#038;s=0' alt=' r_{ui} = \textrm{preference by user } u \textrm{ of item } i ' title=' r_{ui} = \textrm{preference by user } u \textrm{ of item } i ' class='latex' /></p>
<p>For example, <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> might represent the times a user <img src='http://s.wordpress.com/latex.php?latex=u&#038;bg=T&#038;fg=000000&#038;s=0' alt='u' title='u' class='latex' /> listened to track <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=T&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> multiplied each time by the percentage of the track listened.</p>
<p>What is of great importance is that in contrast to explicit feedback where <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> indicates likeness or dislikeness, in implicit feedback it indicates confidence. For example, a low value of <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> doesn&#8217;t reveal dislikeness of item <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=T&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> by user <img src='http://s.wordpress.com/latex.php?latex=u&#038;bg=T&#038;fg=000000&#038;s=0' alt='u' title='u' class='latex' /> but low confidence about the preference of user <img src='http://s.wordpress.com/latex.php?latex=u&#038;bg=T&#038;fg=000000&#038;s=0' alt='u' title='u' class='latex' /> for this item <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=T&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' />.</p>
<p>For the needs of the method the binarization <img src='http://s.wordpress.com/latex.php?latex=p_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='p_{ui}' title='p_{ui}' class='latex' /> (preferences) of <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> is computed as follows:</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=%20%20p_%7Bui%7D%20%3D%20%5Cleft%5C%7B%20%20%5Cbegin%7Barray%7D%20%7Bll%7D%20%201%26%20%3A%20r_%7Bui%7D%20%3E%200%20%5C%5C%20%200%26%20%3A%20r_%7Bui%7D%20%3D%200%20%5C%5C%20%20%5Cend%7Barray%7D%20%20%5Cright.%20%20&#038;bg=T&#038;fg=000000&#038;s=0' alt='  p_{ui} = \left\{  \begin{array} {ll}  1&amp; : r_{ui} &gt; 0 \\  0&amp; : r_{ui} = 0 \\  \end{array}  \right.  ' title='  p_{ui} = \left\{  \begin{array} {ll}  1&amp; : r_{ui} &gt; 0 \\  0&amp; : r_{ui} = 0 \\  \end{array}  \right.  ' class='latex' /></p>
<p>What is more <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> values are mapped to confidence values <img src='http://s.wordpress.com/latex.php?latex=c_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='c_{ui}' title='c_{ui}' class='latex' />. For this the authors chose to use the following linear relation:</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=%20%20c_%7Bui%7D%20%3D%201%20%2B%20%5Calpha%20r_%7Bui%7D%20%20&#038;bg=T&#038;fg=000000&#038;s=0' alt='  c_{ui} = 1 + \alpha r_{ui}  ' title='  c_{ui} = 1 + \alpha r_{ui}  ' class='latex' /></p>
<p>The main method tries to factorize <img src='http://s.wordpress.com/latex.php?latex=p_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='p_{ui}' title='p_{ui}' class='latex' /> in two factors <img src='http://s.wordpress.com/latex.php?latex=x_u%20%5Cin%20%5Cmathbb%7BR%7D%5Ef&#038;bg=T&#038;fg=000000&#038;s=0' alt='x_u \in \mathbb{R}^f' title='x_u \in \mathbb{R}^f' class='latex' /> (user factors) and <img src='http://s.wordpress.com/latex.php?latex=y_i%20%5Cin%20%5Cmathbb%7BR%7D%5Ef&#038;bg=T&#038;fg=000000&#038;s=0' alt='y_i \in \mathbb{R}^f' title='y_i \in \mathbb{R}^f' class='latex' /> (item factors) such that <img src='http://s.wordpress.com/latex.php?latex=p_%7Bui%7D%20%3D%20x_u%5E%7BT%7Dy_i&#038;bg=T&#038;fg=000000&#038;s=0' alt='p_{ui} = x_u^{T}y_i' title='p_{ui} = x_u^{T}y_i' class='latex' />. Please note that <img src='http://s.wordpress.com/latex.php?latex=f&#038;bg=T&#038;fg=000000&#038;s=0' alt='f' title='f' class='latex' /> is a parameter of the system which represents the number of hidden factors and the authors experimented with values from <img src='http://s.wordpress.com/latex.php?latex=20&#038;bg=T&#038;fg=000000&#038;s=0' alt='20' title='20' class='latex' /> to <img src='http://s.wordpress.com/latex.php?latex=200&#038;bg=T&#038;fg=000000&#038;s=0' alt='200' title='200' class='latex' />. For the estimation of the factors the authors used Tikhonov regularisation to minimise the cost function :</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=%20%20min_%7Bx%5E%2A%2Cy%5E%2A%7D%5Csum_%7Bu%2Ci%7D%7Bc_%7Bui%7D%28p_%7Bui%7D-x_u%5E%7BT%7Dy_i%7D%29%5E2%2B%5Clambda%28%5Csum_u%7B%7C%7Cx_u%7C%7C%5E2%7D%2B%5Csum_i%7B%7C%7Cy_i%7C%7C%5E2%7D%29%20%20%5Cvspace%7B-7pt%7D%20%20&#038;bg=T&#038;fg=000000&#038;s=0' alt='  min_{x^*,y^*}\sum_{u,i}{c_{ui}(p_{ui}-x_u^{T}y_i})^2+\lambda(\sum_u{||x_u||^2}+\sum_i{||y_i||^2})  \vspace{-7pt}  ' title='  min_{x^*,y^*}\sum_{u,i}{c_{ui}(p_{ui}-x_u^{T}y_i})^2+\lambda(\sum_u{||x_u||^2}+\sum_i{||y_i||^2})  \vspace{-7pt}  ' class='latex' /></p>
<p>The greatest characteristic of this optimisation problem is that 1) it incorporates confidence levels thus it takes into consideration how sufficient is the behaviour observation and 2) it can be solved by an algorithm called Alternative-Least-Squares with Weighted-<img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=T&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' />-Regularization (ALS-WR) whose sub steps can be parallelized. According this method, by fixing <img src='http://s.wordpress.com/latex.php?latex=x_u&#038;bg=T&#038;fg=000000&#038;s=0' alt='x_u' title='x_u' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=y_i&#038;bg=T&#038;fg=000000&#038;s=0' alt='y_i' title='y_i' class='latex' /> alternately the cost function is minimised using simple quadratic minimisation methods. The following algorithm is used to compute the factors :</p>
<div style="border-top: solid 1px #c0c0c0; border-bottom: solid 1px #c0c0c0;padding-top:15px">
<ol>
<li>Initialise <img src='http://s.wordpress.com/latex.php?latex=y_i&#038;bg=T&#038;fg=000000&#038;s=0' alt='y_i' title='y_i' class='latex' /> by assigning average item ratings to the first entry and random numbers for the remaining entries.</li>
<li>Fix <img src='http://s.wordpress.com/latex.php?latex=Y&#038;bg=T&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' /> matrix (<img src='http://s.wordpress.com/latex.php?latex=y_i&#038;bg=T&#038;fg=000000&#038;s=0' alt='y_i' title='y_i' class='latex' />s) and compute <img src='http://s.wordpress.com/latex.php?latex=x_u%20%3D%20%28Y%5ETC%5EuY%20%2B%20%5Clambda%20I%29%5E%7B-1%7DY%5ETC%5Eup%28u%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='x_u = (Y^TC^uY + \lambda I)^{-1}Y^TC^up(u)' title='x_u = (Y^TC^uY + \lambda I)^{-1}Y^TC^up(u)' class='latex' /></li>
<li>Fix <img src='http://s.wordpress.com/latex.php?latex=X&#038;bg=T&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' /> matrix (<img src='http://s.wordpress.com/latex.php?latex=x_u&#038;bg=T&#038;fg=000000&#038;s=0' alt='x_u' title='x_u' class='latex' />s) and compute <img src='http://s.wordpress.com/latex.php?latex=y_i%20%3D%20%28X%5ETC%5EiX%20%2B%20%5Clambda%20I%29%5E%7B-1%7DX%5ETC%5Eup%28i%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='y_i = (X^TC^iX + \lambda I)^{-1}X^TC^up(i)' title='y_i = (X^TC^iX + \lambda I)^{-1}X^TC^up(i)' class='latex' /></li>
<li>Repeat step 2 and 3 until stoping criterion is satisfied.</li>
</ol>
</div>
<p style="text-align: center; font-size: 13px; font-weight:bold">Algorithm</p>
<p><img src='http://s.wordpress.com/latex.php?latex=C%5Eu&#038;bg=T&#038;fg=000000&#038;s=0' alt='C^u' title='C^u' class='latex' /> is a <img src='http://s.wordpress.com/latex.php?latex=n%20%5Ctimes%20n&#038;bg=T&#038;fg=000000&#038;s=0' alt='n \times n' title='n \times n' class='latex' /> diagonal matrix where <img src='http://s.wordpress.com/latex.php?latex=C%5Eu_%7Bii%7D%3Dc_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='C^u_{ii}=c_{ui}' title='C^u_{ii}=c_{ui}' class='latex' />. Also <img src='http://s.wordpress.com/latex.php?latex=p%28u%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='p(u)' title='p(u)' class='latex' /> contains all values of <img src='http://s.wordpress.com/latex.php?latex=p_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='p_{ui}' title='p_{ui}' class='latex' />. Please note that matrix <img src='http://s.wordpress.com/latex.php?latex=C%5Eu&#038;bg=T&#038;fg=000000&#038;s=0' alt='C^u' title='C^u' class='latex' /> is very sparse and only <img src='http://s.wordpress.com/latex.php?latex=n_u%20%5Cll%20n&#038;bg=T&#038;fg=000000&#038;s=0' alt='n_u \ll n' title='n_u \ll n' class='latex' /> are non-zero.<br />
Complexity of step 2 is <img src='http://s.wordpress.com/latex.php?latex=O%28f%5E2N%2Bf%5E3m%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='O(f^2N+f^3m)' title='O(f^2N+f^3m)' class='latex' /> and for step 3 <img src='http://s.wordpress.com/latex.php?latex=O%28f%5E2N%2Bf%5E3n%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='O(f^2N+f^3n)' title='O(f^2N+f^3n)' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=N%3D%5Csum_u%20n_u&#038;bg=T&#038;fg=000000&#038;s=0' alt='N=\sum_u n_u' title='N=\sum_u n_u' class='latex' />. As we already mentioned, steps 2 and 3 are parallelizable and frameworks such as map reduce (Hadoop) can be used ( in fact, a variation of this model for explicit feedback has been implemented on <a href="https://issues.apache.org/jira/browse/MAHOUT-542" target="_blank">Mahout 0.5</a> ). This is of great importance since it makes it suitable for huge number of users and items. So as to recommend <img src='http://s.wordpress.com/latex.php?latex=K&#038;bg=T&#038;fg=000000&#038;s=0' alt='K' title='K' class='latex' /> items for user <img src='http://s.wordpress.com/latex.php?latex=u&#038;bg=T&#038;fg=000000&#038;s=0' alt='u' title='u' class='latex' />, a list of the <img src='http://s.wordpress.com/latex.php?latex=K&#038;bg=T&#038;fg=000000&#038;s=0' alt='K' title='K' class='latex' /> items which maximise <img src='http://s.wordpress.com/latex.php?latex=x_u%5ETy_i&#038;bg=T&#038;fg=000000&#038;s=0' alt='x_u^Ty_i' title='x_u^Ty_i' class='latex' /> is returned.</p>
<p>An important and desired feature of recommendation algorithms is explanations. The current method gives explanations of recommendations by using a clever trick which transform the preferences prediction to linear function of past behaviours, weighted by item-to-item similarity. This item-to-item similarity is specified by the user <img src='http://s.wordpress.com/latex.php?latex=u&#038;bg=T&#038;fg=000000&#038;s=0' alt='u' title='u' class='latex' /> and denoted by <img src='http://s.wordpress.com/latex.php?latex=s%5Eu_%7Bij%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='s^u_{ij}' title='s^u_{ij}' class='latex' />.</p>
<p>For the evaluation of the method they used the average number of predicted positions of items weighted by the observations <img src='http://s.wordpress.com/latex.php?latex=r_%7Bui%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='r_{ui}' title='r_{ui}' class='latex' /> ( denoted <img src='http://s.wordpress.com/latex.php?latex=%5Coverline%7Brank%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='\overline{rank}' title='\overline{rank}' class='latex' /> ). Lower values of <img src='http://s.wordpress.com/latex.php?latex=%5Coverline%7Brank%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='\overline{rank}' title='\overline{rank}' class='latex' /> means more desired predictions but not more accurate predictions. Thus, this is not to be confused with accuracy.</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/sbZIjd2JZi8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/03/02/review-collaborative-filtering-for-implicit-feedback-datasets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/03/02/review-collaborative-filtering-for-implicit-feedback-datasets/</feedburner:origLink></item>
		<item>
		<title>name2gender in python</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/wbVAVuoAMPQ/</link>
		<comments>http://ptigas.com/blog/2012/01/21/name2gender-in-python/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 19:26:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[data]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=218</guid>
		<description><![CDATA[Given a name or an email address how can you guess the gender ? The answer is as simple as that. It&#8217;s all about data. US Social Security Administration office http://www.ssa.gov/ created a service where you can request the number of births of a babies per name per gender since 1880. However, their pages are<div class="read-more"><a href="http://ptigas.com/blog/2012/01/21/name2gender-in-python/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>Given a name or an email address how can you guess the gender ? The answer is as simple as that. It&#8217;s all about data.</p>
<p>US Social Security Administration office <a href="http://www.ssa.gov/" target="_blank">http://www.ssa.gov/</a> created a service where you can request the number of births of a babies per name per gender since 1880. However, their pages are not easily accessible. After searching a while I found this page <a href="http://www.infochimps.com/datasets/popular-baby-names-by-year-top-1000-us-social-security-administr" target="_blank"> http://www.infochimps.com/datasets/popular-baby-names-by-year-top-1000-us-social-security-administr</a> where is given a method of how to scrape SAS name list with a bash script.</p>
<pre>
!#/bin/sh
url=http://www.ssa.gov/cgi-bin/popularnames.cgi
mkdir -p names
for ((yr=1879 ; $yr <= 2010 ; yr++)) ; do
	echo $yr
	curl -d "year=$yr&#038;top=1000&#038;number=n" $url > names/$yr.html
done
</pre>
<p>Alternatively you can just download the data from infochimp&#8217;s website.</p>
<p>However this is raw information and thus not very helpful. What we need is to have them in a data structure which will enable us to query the gender per name (ideally a distribution).</p>
<p>I used BeautifulSoup to parse the page and extract <name,number of births,gender,year> records.</p>
<pre class="brush: python">import glob
from BeautifulSoup import BeautifulSoup

files = glob.glob('names/*.html')

for f in files :
	html_data = open( f ,'r').read()

	soup = BeautifulSoup(html_data)
	year = soup.find(id="yob")['value']
	tables = soup.findAll('table')
	trs = tables[2].findAll('tr')
	for tr in trs[1:-1]:
		tds = tr.findAll('td')
		print "%s,%s,%s,%s" % (
                       tds[1].contents[0],
                       tds[0].contents[0].replace(',',''),
                       'male',
                       year)
		print "%s,%s,%s,%s" % (
                       tds[3].contents[0],
                       tds[2].contents[0].replace(',',''),
                       'female',
                       year)</pre>
<p>I stored the output to <a href="https://gist.github.com/raw/1965524/d41bca0e4c5adc47bad461597b5576c2efac5365/names.csv" target="_blank">names.csv</a> .</p>
<p>Then, to transform this information to a probability distribution per name I used the following code.</p>
<pre class="brush:python">import json 

def prob( m, f ) :
    s = m + f
    return {'male':m/(1.0*s), 'female':f/(1.0*s)}

def load_data( file ) :
    names = {}
    f = open( file, 'r' )
    for l in f :
        d = l.rstrip().split(',')

        name = d[0]
        counter = d[1]
        gender = d[2]
        year = d[3]

        if name not in names :
            names[name] = { 'male':0, 'female':0 }

        if gender == 'male' :
            names[name]['male'] += int(counter)
        else :
            names[name]['female'] += int(counter)

    return names

db = load_data('names.csv')

names = {}
for d in db:
    p = prob( db[d]['male'], db[d]['female'])
    if p['male'] &gt; p['female'] :
        gender = 'male'
    elif p['female'] &gt; p['male'] :
        gender = 'female'
    else:
        gender = 'both'

    names[d] = gender

print json.dumps( names )</pre>
<p>I saved this to <a href="https://raw.github.com/gist/1965523/b4c164301144730b202d3b3b46663df5ade2f577/names.json" target="_blank">names.json</a>.</p>
<p>Finally, to query the gender of a name you just compare the probabilities</p>
<pre class='brush:python'>
f = open('names.json','r')
names = json.loads( f.read() );
print names
def check_name( name ):
	if name in names :
                if names[name]['male']>names[name]['female']:
                         return 'male';
                elif names[name]['male']>names[name]['female']:
                         return 'female';
                else :
                         return 'unknown';
	else :
		return 'unknown'
</pre>
<p>The probability approach gives the flexibility to compute also a confidence of the gender for a given name [TODO].</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/wbVAVuoAMPQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2012/01/21/name2gender-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2012/01/21/name2gender-in-python/</feedburner:origLink></item>
		<item>
		<title>Robot rock</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/LGkOd027JvQ/</link>
		<comments>http://ptigas.com/blog/2011/05/28/robot-rock/#comments</comments>
		<pubDate>Sat, 28 May 2011 22:24:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NoN]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=199</guid>
		<description><![CDATA[After three weeks of development we finally finished our project in robotics. The task was to construct a robot from lego parts from the lego NXT kit. It wasn&#8217;t so straighforward how to achieve that since a bad design could cause a lot of trouble in the end. Finally, we ended up with a simple<div class="read-more"><a href="http://ptigas.com/blog/2011/05/28/robot-rock/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>After three weeks of development we finally finished our project in robotics. The task was to construct a robot from lego parts from the lego NXT kit. It wasn&#8217;t so straighforward how to achieve that since a bad design could cause a lot of trouble in the end. Finally, we ended up with a simple tripod.</p>
<p>Then, the software part was to programm the robot to perform a taks, known as global localization. In simple words, imagine that you are given a map but you have no idea about the position of the robot in the map. Then, by using a ultrasound sensor you have to find the position and the using this information to drive the robot to a target. As you see, an error in the estimation can be disastrous. </p>
<p>The method we used for the localization part is know as Monte Carlo Localization. This method (uniformly) samples a number of hypotheses (particles) and then by comparing the measurements with the hypothetical measurements we re-sample hypotheses by weightening them so as the more consistent a measurement is, the more probabilities are that a hypothesis will be re-chosen.</p>
<p>After a serveral iterations of the MCL, an estimation of the position is found. Then, using this estimation, with the hope that it is also the correct position, we guide the robot to the target. So, how do we chose how to get from a position to another ? A simple and safe solution to that was to compute the Delauney triangulation of the map and then compute the dual graph of it, <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=000000&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' />. Then we find the closest node of the dual graph to the robot and to the target. So, we have a starting position, a node (A) of <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=000000&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' /> that is close to the starting point, a node (B) of <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=000000&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' /> that is close to the target and a path from A to B on <img src='http://s.wordpress.com/latex.php?latex=G%27&#038;bg=T&#038;fg=000000&#038;s=0' alt='G&#039;' title='G&#039;' class='latex' />. Voila, this is the path that we have to follow so as to be as safe as possible that we won&#8217;t hit on a wall.</p>
<p>Well, here it is on action.</p>
<p><iframe src="http://player.vimeo.com/video/24313757?title=0&amp;byline=0&amp;portrait=0" width="643" height="361" frameborder="0"></iframe></p>
<p>That was fun <img src='http://ptigas.com/blog/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/LGkOd027JvQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/05/28/robot-rock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/05/28/robot-rock/</feedburner:origLink></item>
		<item>
		<title>Sol Robeson wise words</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/y6528CScfac/</link>
		<comments>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/#comments</comments>
		<pubDate>Sun, 15 May 2011 16:03:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NoN]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=182</guid>
		<description><![CDATA[Sol Robeson: Have you met Archimedes? The one with the black spots, you see? You remember Archimedes of Syracuse, eh? The king asks Archimedes to determine if a present he&#8217;s received is actually solid gold. Unsolved problem at the time. It tortures the great Greek mathematician for weeks &#8211; insomnia haunts him and he twists<div class="read-more"><a href="http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<blockquote><p><strong>Sol Robeson:</strong> Have you met Archimedes? The one with the black spots, you see? You remember Archimedes of Syracuse, eh? The king asks Archimedes to determine if a present he&#8217;s received is actually solid gold. Unsolved problem at the time. It tortures the great Greek mathematician for weeks &#8211; insomnia haunts him and he twists and turns in his bed for nights on end. Finally, his equally exhausted wife &#8211; she&#8217;s forced to share a bed with this genius &#8211; convinces him to take a bath to relax. While he&#8217;s entering the tub, Archimedes notices the bath water rise. Displacement, a way to determine volume, and that&#8217;s a way to determine density &#8211; weight over volume. And thus, Archimedes solves the problem. He screams &#8220;Eureka&#8221; and he is so overwhelmed he runs dripping naked through the streets to the king&#8217;s palace to report his discovery</p>
<p><strong>Sol Robeson:</strong> Now, what is the moral of the story?</p>
<p><strong>Maximillian Cohen:</strong> That a breakthrough will come.</p>
<p><strong>Sol Robeson:</strong> Wrong! The point of the story is the wife. You listen to your wife, she will give you perspective, meaning. You need a break, you have to take a bath or you will get nowhere.</p></blockquote>
<p>From Aronovsky’s Pi. I’ve forgotten how much I enjoyed that movie.</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/y6528CScfac" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/05/15/sol-robeson-wise-words/</feedburner:origLink></item>
		<item>
		<title>Real-time jam session system</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/m1CtQ1-wTL0/</link>
		<comments>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 00:28:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[MSc Thesis]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=99</guid>
		<description><![CDATA[I&#8217;ve always been fascinated by arts and science. But, what was getting me excited most was their combination and one of my ambitions is to achieve that. As a computer scientist I couldn&#8217;t find a better way to fulfill this ambition. I finally chose my MSc project which combines my two passions. Algorithms and music.<div class="read-more"><a href="http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve always been fascinated by arts and science. But, what was getting me excited most was their combination and one of my ambitions is to achieve that. As a computer scientist I couldn&#8217;t find a better way to fulfill this ambition. I finally chose my MSc project which combines my two passions. <strong>Algorithms</strong> and <strong>music</strong>.</p>
<p>As the title suggests, my project is to research and develop a system which will make use of <a href="http://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> and <a href="http://en.wikipedia.org/wiki/Music_information_retrieval" target="_blank">music information retrieval</a> so as to understand and then generate music automatically, simulating an improvising musician. In short, a system which will improvise in a jam session.</p>
<p>There are several ideas to begin with. The first milestone is to construct a system which will use simple midi input and fixed tempo, like the system suggested in [1]. However, the goal is to be able to use real instruments and beat to be extracted by rhythm instruments (take a look at [2] and [3] for real-time feature extraction).</p>
<p>Here is a list of the tools and languages i am planning to use:</p>
<ul>
<li><a href="http://chuck.cs.princeton.edu/" target="_blank">ChucK</a> and <a href="http://wekinator.cs.princeton.edu/" target="_blank">weakinator</a>.</li>
<li><a href="http://en.wikipedia.org/wiki/Pure_Data" target="_blank">Pure Data</a> and/or <a href="http://en.wikipedia.org/wiki/Max_(software)" target="_blank">Max/MSP</a> for interaction, programming, generation (using it with <a href="http://www.ableton.com/maxforlive" target="_blank">max for live</a>).</li>
<li><a href="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank">WEKA</a> for machine learning.</li>
<li><a href="ableton.com" target="_blank">Ableton</a> for audio synthesis (generative part)</li>
</ul>
<p>Feel free to leave a comment, to make a suggestion or just tell me what you think.<br />
<strong> </strong></p>
<p><strong>References</strong></p>
<p>1. <strong>Kitahara, Tetsuro, Naoyuki Totani, R. Tokuami, and H. Katayose</strong>. 2010. BayesianBand: Jam Session System Based on Mutual Prediction by User and System.<em> Entertainment Computing ICEC 2009</em></p>
<p>2.<strong> Stark, A.M., and M.D. Plumbley. </strong>2009. “Real-time chord recognition for live performance.”in <em>Proceedings of International Computer Music Conference</em>.</p>
<p>3. <strong>Stark, Adam M, Matthew E P Davies, and Mark D Plumbley</strong>. 2009. “REAL-TIME BEAT-SYNCHRONOUS ANALYSIS OF MUSICAL AUDIO Centre for Digital Music Queen Mary University of London London , United Kingdom.” <em>Analysis</em> 1-6.</p>
<p>&nbsp;</p>
<p><em><br />
</em></p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/m1CtQ1-wTL0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/02/24/real-time-jam-session-system/</feedburner:origLink></item>
		<item>
		<title>Simple CAPTCHA solver in python</title>
		<link>http://feedproxy.google.com/~r/ptigas/~3/K2io6wCBtik/</link>
		<comments>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 00:13:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://ptigas.com/blog/?p=11</guid>
		<description><![CDATA[This is a very simple method for exploiting very simple CAPTCHAs  like those proposed here and here . In this example we are going to use the following images. test image It&#8217;s easy to observe the followings. First of all, a fixed size (monospace) font has been used. This makes extracting all the letters and<div class="read-more"><a href="http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/">read more »</a></div>]]></description>
			<content:encoded><![CDATA[<p>This is a very simple method for exploiting very simple CAPTCHAs  like those proposed <a href="http://www.white-hat-web-design.co.uk/articles/php-captcha.php" target="_blank">here</a> and <a href="http://www.white-hat-web-design.co.uk/articles/php-captcha.php" target="_blank">here</a> .</p>
<p>In this example we are going to use the following images.</p>
<p style="text-align: center;"><img class="alignnone size-full wp-image-29" title="test" src="http://ptigas.com/blog/wp-content/uploads/2011/02/test1.jpg" alt="" width="75" height="25" /> <img class="alignnone size-full wp-image-30" title="test2" src="http://ptigas.com/blog/wp-content/uploads/2011/02/test2.jpg" alt="" width="75" height="25" /><br/><span style="font-size:13px">test image</span></p>
<p>It&#8217;s easy to observe the followings. First of all, a fixed size (monospace) font has been used. This makes extracting all the letters and using them as masks to check each digit, one by one, very easy. Also, the alphabet is simple lowercase hexadecimal letters. Thus, we had to extract only 16 letters.</p>
<p>The first part was to to extract all the letters. To achieve that, first of all we sampled several images so as to be sure that the images we have contains all the 16 letters. Then, using a simple image editor we cropped all the letters, one by one. We had to be careful so all the letters be aligned properly. Here is the final mask.</p>
<p style="text-align: center;"><img class="size-full wp-image-38   aligncenter" title="letters" src="http://ptigas.com/blog/wp-content/uploads/2011/02/letters.bmp" alt="" width="250" /><br/><span style="font-size:13px">mask</span></p>
<p style="text-align: left;">As you can notice there is some noise which we have to remove. After playing with several techniques we finally ended to the following. We turned the image to greyscale. Then we used a threshold to remove some of the noise. Here is the example after the filtering (cropping also applied).</p>
<p style="text-align: center;"><img class="size-full wp-image-34 aligncenter" title="source" src="http://ptigas.com/blog/wp-content/uploads/2011/02/source.bmp" alt="" width="100" /><br/><span style="font-size:13px">after filtering</span></p>
<p style="text-align: left;">So, now we have the image almost cleared and some letters to play with.</p>
<p style="text-align: left;">
<h2>Procedure</strong></h2>
<p style="text-align: left;">Move each letter across the image and take the difference of the pixels for each position and sum them. Thus for each position we have a score of how much the letter (mask) fits the letter behind it. Then, store for each letter the position where the maximum score found. Then sort by score, take the top five results (our captcha is five letters) and finally sort by position. The result is the CAPTCHA text.</p>
<p style="text-align: left;">More formally</p>
<p style="text-align: left;">Let</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=d%28I%2Cl%2Co%29%3D%5Csum_%7B0%5Cleq%20i%20%5Cleq%20W%20%5C%5C%200%20%5Cleq%20j%20%5Cleq%20H%7D%7B%5BI%28o%2Bi%2C%20j%29-l%28i%2Cj%29%5D%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='d(I,l,o)=\sum_{0\leq i \leq W \\ 0 \leq j \leq H}{[I(o+i, j)-l(i,j)]}' title='d(I,l,o)=\sum_{0\leq i \leq W \\ 0 \leq j \leq H}{[I(o+i, j)-l(i,j)]}' class='latex' /></p>
<p style="text-align: left;">be the distance of the image <img src='http://s.wordpress.com/latex.php?latex=I&#038;bg=T&#038;fg=000000&#038;s=0' alt='I' title='I' class='latex' />, with the letter <img src='http://s.wordpress.com/latex.php?latex=l&#038;bg=T&#038;fg=000000&#038;s=0' alt='l' title='l' class='latex' /> in position <img src='http://s.wordpress.com/latex.php?latex=o&#038;bg=T&#038;fg=000000&#038;s=0' alt='o' title='o' class='latex' /></p>
<p style="text-align: left;">Then</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=p%28I%2Cl%29%20%3D%20%5Carg%5Cmax_%7Bo%7Dd%28I%2Cl%2Co%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='p(I,l) = \arg\max_{o}d(I,l,o)' title='p(I,l) = \arg\max_{o}d(I,l,o)' class='latex' /></p>
<p style="text-align: left;">Thus, we need 5 letters <img src='http://s.wordpress.com/latex.php?latex=l_%7B1%7D%2Cl_%7B2%7D%2Cl_%7B3%7D%2Cl_%7B4%7D%2Cl_%7B5%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='l_{1},l_{2},l_{3},l_{4},l_{5}' title='l_{1},l_{2},l_{3},l_{4},l_{5}' class='latex' /> with maximum <img src='http://s.wordpress.com/latex.php?latex=d%28l_%7Bi%7D%2CI%2Co%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='d(l_{i},I,o)' title='d(l_{i},I,o)' class='latex' /> ordered by <img src='http://s.wordpress.com/latex.php?latex=p%28l_%7Bi%7D%2C%20I%29&#038;bg=T&#038;fg=000000&#038;s=0' alt='p(l_{i}, I)' title='p(l_{i}, I)' class='latex' />.</p>
<pre class="brush: python">def p(img, letter):
		A = img.load()
		B = letter.load()
		mx = 1000000
		max_x = 0
		x = 0
		for x in xrange(img.size[0]-letter.size[0]):
			sum = 0
			for i in xrange(letter.size[0]):
			    for j in xrange(letter.size[1]):
					sum = sum + abs(A[x+i, j][0] - B[i, j][0])
			if sum &lt; mx :
				mx = sum
				max_x = x
		return (mx, max_x)</pre>
<p style="text-align: left;">Here is the code which implements this method. You can browse and download everything from <a href="https://github.com/ptigas/simple-CAPTCHA-solver" target="_blank">https://github.com/ptigas/simple-CAPTCHA-solver</a></p>
<p style="text-align: left;">
<p style="text-align: left;">
<pre class="brush: python">from PIL import Image

def ocr(im, threshold = 200, aplhabet = "0123456789abcdef"):
	img = Image.open(im)
	img = img.convert("RGB")
	box = (8, 8, 58, 18)
	img = img.crop(box)
	pixdata = img.load()

	letters = Image.open('letters.bmp')
	ledata = letters.load()

	# Clean the background noise, if color != black, then set to white.
	for y in xrange(img.size[1]):
	    for x in xrange(img.size[0]):
			if not(pixdata[x, y][0] &gt; threshold \
			and pixdata[x, y][1] &gt; threshold \
			and pixdata[x, y][2] &gt; threshold):
				pixdata[x, y] = (0, 0, 0, 255)
			else:
				pixdata[x, y] = (255, 255, 255, 255)

	counter = 0;
	old_x = -1;

	letterlist = []

	for x in xrange(letters.size[0]):
		black = True
		for y in xrange(letters.size[1]):
			if ledata[x, y][0] &lt;&gt; 0 :
				black = False
				break
		if black :
			if True :
				box = (old_x+1, 0, x, 10)
				letter = letters.crop(box)
				t = p(img, letter);
				print counter, x, t
				letterlist.append((t[0],aplhabet[counter], t[1]))
			old_x = x
			counter = counter + 1

	box = (old_x+1, 0, 140, 10)
	letter = letters.crop(box)
	t = p(img, letter)
	letterlist.append((t[0],aplhabet[counter], t[1]))

	t = sorted(letterlist)
	t = t[0:5] # 5-letter captcha

	final = sorted(t, key=lambda x: x[2])
	answer = ""
	for l in final:
		answer = answer + l[1]
	return answer

print ocr('test.jpg')</pre>
<p style="text-align: left;"><strong>[update]</strong></p>
<p style="text-align: left;">Today I found <a href="http://www.wausita.com/captcha/" target="_blank">this</a>. Very nice tutorial for CAPTCHA solving using python and vector space searching.</p>
<img src="http://feeds.feedburner.com/~r/ptigas/~4/K2io6wCBtik" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/</feedburner:origLink></item>
	</channel>
</rss>

