<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Daniel Lemire's blog</title>
	
	<link>http://www.daniel-lemire.com/blog</link>
	<description>I am a Computer Science professor at UQAM: Web, OLAP, Databases, Time Series, Collaborative Filtering, Information Retrieval, e-Learning.</description>
	<lastBuildDate>Wed, 17 Mar 2010 17:02:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/daniel-lemire/atom" /><feedburner:info uri="daniel-lemire/atom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><feedburner:emailServiceId>daniel-lemire/atom</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>External-memory shuffling in linear time?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/6v98jzOB3Mc/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/03/15/external-memory-shuffling-in-linear-time/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 18:43:30 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2454</guid>
		<description>You can sort large files while using little memory. The Unix sort tool is a widely available implementation of this idea. Files are written to disk sequentially, without random access. Thus, you can also sort variable-length records, such as lines of text.
What about shuffling? Using the Fisher-Yates algorithm, also known as the Knuth shuffle, you can [...]</description>
			<content:encoded><![CDATA[<p>You can sort <a href="http://en.wikipedia.org/wiki/External_sorting">large files while using little memory</a>. The <a href="http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html">Unix </a><tt><a href="http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html">sort</a></tt> tool is a widely available implementation of this idea. Files are written to disk sequentially, without random access. Thus, you can also sort variable-length records, such as lines of text.</p>
<p>What about shuffling? Using the <a href="http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Fisher-Yates algorithm</a>, also known as the Knuth shuffle, you can shuffle large files while using almost no memory. But you need <a href="http://en.wikipedia.org/wiki/Random_access">random access</a> to your files. Variable-length records have no fixed offsets, so you cannot jump directly to the <em>k</em>-th record, and the algorithm does not apply to them. And indeed, the Unix <tt>sort</tt> command cannot shuffle. (It has a random-sort option, but it sorts by a random hash of the keys, so identical lines end up next to each other: it is not a shuffle. Meanwhile, the <a href="http://www.gnu.org/software/coreutils/manual/html_node/shuf-invocation.html#shuf-invocation"><tt>shuf</tt></a> command runs in RAM.)</p>
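<p>For reference, here is a minimal in-memory C++ sketch of the Fisher-Yates shuffle. The function name <tt>fisher_yates_shuffle</tt> is illustrative (the standard <tt>std::shuffle</tt> does the same job). The sketch makes the dependence on random access visible: each step swaps the current record with one at an arbitrary earlier position, which is cheap in an array of fixed-length records but not in a file of variable-length lines.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Fisher-Yates (Knuth) shuffle: walk from the last position down to the
// second, swapping each element with a uniformly chosen element at or before
// it. Every permutation is equally likely and the running time is O(n), but
// each step jumps to an arbitrary index j: this is the random access that
// variable-length records stored on disk do not offer cheaply.
template <typename T>
void fisher_yates_shuffle(std::vector<T>& records, std::mt19937_64& rng) {
  for (std::size_t i = records.size(); i > 1; --i) {
    // Pick j uniformly in [0, i - 1].
    std::uniform_int_distribution<std::size_t> pick(0, i - 1);
    std::size_t j = pick(rng);
    std::swap(records[i - 1], records[j]);
  }
}
```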
<p><strong>A solution:</strong> Tag each record with a random number. Pick random numbers from a very large set so that the probability that any two lines have the same random number is small. Then use external-memory sorting. You can implement something similar as <a href="http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/">a single line in Unix</a>.</p>
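<p>The tag-and-sort idea can be sketched in C++ as follows, assuming for illustration that the records fit in a vector of strings. Here <tt>std::sort</tt> stands in for an external-memory sort such as Unix <tt>sort</tt>, and the function name <tt>shuffle_by_sorting</tt> is hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Tagged = std::pair<std::uint64_t, std::string>;

// Shuffle by sorting: tag every record with a random 64-bit key, sort on the
// key, then drop the keys. With 2^64 possible keys, two records rarely get
// the same key. std::sort stands in for an external-memory sort.
std::vector<std::string> shuffle_by_sorting(const std::vector<std::string>& lines,
                                            std::mt19937_64& rng) {
  std::vector<Tagged> tagged;
  tagged.reserve(lines.size());
  for (const std::string& line : lines) {
    tagged.emplace_back(rng(), line);  // (random tag, original record)
  }
  // Compare on the tag only: the O(n log n) step that a linear-time
  // shuffle would avoid.
  std::sort(tagged.begin(), tagged.end(),
            [](const Tagged& a, const Tagged& b) { return a.first < b.first; });
  std::vector<std::string> result;
  result.reserve(tagged.size());
  for (Tagged& entry : tagged) {
    result.push_back(std::move(entry.second));
  }
  return result;
}
```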
<p><strong>A better solution?</strong> Shuffling is possible in linear time, <em>O</em>(<em>n</em>). Sorting is harder: comparison-based sorting needs on the order of <em>n</em> log <em>n</em> operations. Thus, using a sort algorithm for shuffling, as we just did, is inelegant. Can we shuffle variable-length records in linear time without random access?</p>
<p>Maybe we could try something concrete? Consider this algorithm:</p>
<ul>
<li>Read the original file in small blocks. Shuffle each block in RAM. Write them to temporary files. View each shuffled block as a stack of records.</li>
<li>Select a non-empty block at random. Pick and remove the record on top of the stack. Append it to the result set. Repeat. (The correct probability assignment for each block is the number of records left in the block divided by the total number of records left.)</li>
</ul>
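<p>Here is a minimal in-memory simulation of this two-phase algorithm in C++. Each inner vector stands in for a temporary file, and its back is the top of the stack; the names and the in-memory setting are assumptions for illustration. Because each block is shuffled uniformly and blocks are chosen with probability proportional to their remaining records, every interleaving of the blocks is equally likely, so the output is a uniform shuffle.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// In-memory simulation of the block algorithm. Each inner vector plays the
// role of a temporary file holding one shuffled block; back() is the top.
std::vector<std::string> shuffle_by_blocks(const std::vector<std::string>& input,
                                           std::size_t block_size,
                                           std::mt19937_64& rng) {
  assert(block_size > 0);
  // Phase 1: cut the input into blocks and shuffle each block in RAM.
  std::vector<std::vector<std::string>> blocks;
  for (std::size_t start = 0; start < input.size(); start += block_size) {
    std::size_t end = std::min(start + block_size, input.size());
    blocks.emplace_back(input.begin() + start, input.begin() + end);
    std::shuffle(blocks.back().begin(), blocks.back().end(), rng);
  }
  // Phase 2: pick a block with probability (records left in it) / (records
  // left overall), pop its top record, append it to the result.
  std::vector<std::string> result;
  result.reserve(input.size());
  std::size_t remaining = input.size();
  while (remaining > 0) {
    std::uniform_int_distribution<std::size_t> pick(0, remaining - 1);
    std::size_t r = pick(rng);
    std::size_t b = 0;
    // Map r to a block; empty blocks are skipped since r >= 0 == size.
    // With k blocks, this linear scan costs O(k) per record.
    while (r >= blocks[b].size()) {
      r -= blocks[b].size();
      ++b;
    }
    result.push_back(std::move(blocks[b].back()));
    blocks[b].pop_back();
    --remaining;
  }
  return result;
}
```

<p>A Fenwick tree over the block sizes would reduce the selection cost to <em>O</em>(log <em>k</em>) per record, but on disk the pattern of reads across <em>k</em> files also matters.</p>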
<p>(As a variation on this algorithm, you can merge the blocks two-by-two.)</p>
<p>Unfortunately, I doubt this algorithm can run in linear time.</p>
<p><strong>Your challenge:</strong> Consider variable-length records. Prove or disprove that we can implement an external-memory shuffle in linear time. Alternatively, come up with an algorithm faster than the sorting-based one.</p>
<p><strong>Update:</strong> Preston L. Bannister proposed an algorithm which solves the problem to my satisfaction. The same algorithm was described by P. Sanders in <a href="http://www.mpi-inf.mpg.de/~sanders/papers/randperm.ps.gz">Random Permutations on Distributed, External and Hierarchical Memory</a> (Information Processing Letters, 1998).</p>
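<p>The update does not spell out the algorithm. As I understand the general approach in Sanders&#8217; paper, records are first scattered at random into buckets, and each bucket is then shuffled in memory. The C++ sketch below simulates that two-pass scheme in RAM, with each bucket standing in for a temporary file; treat it as an illustration under these assumptions, not as the exact algorithm from the comment or the paper. Assuming every bucket fits in memory, both passes read and write sequentially, and the expected work is linear in the number of records.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Two-pass shuffle by random scattering (in-memory simulation).
// Pass 1: send each record to a bucket chosen uniformly at random; on disk,
// each bucket would be a temporary file written sequentially.
// Pass 2: shuffle each bucket in RAM (assumed to fit) and append the buckets
// in order.
std::vector<std::string> shuffle_by_scattering(const std::vector<std::string>& input,
                                               std::size_t num_buckets,
                                               std::mt19937_64& rng) {
  assert(num_buckets > 0);
  std::vector<std::vector<std::string>> buckets(num_buckets);
  std::uniform_int_distribution<std::size_t> pick(0, num_buckets - 1);
  for (const std::string& record : input) {
    buckets[pick(rng)].push_back(record);  // sequential append per bucket
  }
  std::vector<std::string> result;
  result.reserve(input.size());
  for (auto& bucket : buckets) {
    std::shuffle(bucket.begin(), bucket.end(), rng);  // in-RAM shuffle
    for (auto& record : bucket) {
      result.push_back(std::move(record));
    }
  }
  return result;
}
```

<p>The output is a uniform shuffle: the bucket assignment treats all records alike and each bucket is shuffled uniformly, so no ordering is favoured. If a bucket is still too large for RAM, the scattering step can be applied to it recursively.</p>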
<p><strong>Reference: </strong> This is a follow-up to my blog post <a href="http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/">External-Memory Shuffles?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/03/15/external-memory-shuffling-in-linear-time/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/03/15/external-memory-shuffling-in-linear-time/</feedburner:origLink></item>
		<item>
		<title>Which is fastest: integer addition or XOR?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/X02432oOd0s/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 00:26:40 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Science and Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452</guid>
		<description>The bitwise exclusive or (e.g., 1110 XOR 1001 = 0111) looks simpler to compute than integer addition (e.g., 2 + 9 = 11). Some research articles claim that XOR is faster. It appears to be Computer Science folklore. But is it true?
Which line runs faster? (The symbol &amp;#8220;^&amp;#8221; is the XOR.)
for(int k = 0; k &amp;#60; N; [...]</description>
			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Exclusive_or#Bitwise_operation">bitwise exclusive or</a> (e.g., 1110 XOR 1001 = 0111) looks simpler to compute than <a href="http://en.wikipedia.org/wiki/Addition#Performing_addition">integer addition</a> (e.g., 2 + 9 = 11). Some research articles claim that XOR is faster. It appears to be Computer Science folklore. But is it true?</p>
<p>Which line runs faster? (The symbol &#8220;^&#8221; is the XOR.)</p>
<p><code>for(int k = 0; k &lt; N; ++k) sum+= k;</code></p>
<p><code>for(int k = 0; k &lt; N; ++k) sum^= k;</code></p>
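<p>A minimal C++ timing harness for the two loops might look like this. It assumes 64-bit unsigned sums: the post does not say what type <tt>sum</tt> has, and with a signed <tt>int</tt> the addition would overflow for large <tt>N</tt>, which is undefined behaviour in C++. The function names are hypothetical.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The two loops from the post, each wrapped in a function.
std::uint64_t sum_with_add(std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t k = 0; k < n; ++k) sum += k;
  return sum;
}

std::uint64_t sum_with_xor(std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t k = 0; k < n; ++k) sum ^= k;
  return sum;
}

// Times one call and returns milliseconds. Storing the result in a volatile
// sink keeps the compiler from discarding the call altogether.
template <typename F>
double time_ms(F f, std::uint64_t n) {
  static volatile std::uint64_t sink;
  auto start = std::chrono::steady_clock::now();
  sink = f(n);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

<p>Compare, say, <tt>time_ms(sum_with_add, 1000000000)</tt> with <tt>time_ms(sum_with_xor, 1000000000)</tt> under the same compiler flags, repeat a few times, and check the generated assembly: an optimizer may vectorize either loop or replace the addition loop with a closed-form expression, in which case you measure the compiler rather than the instruction.</p>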
<p><strong>My result:</strong> In C++ and Java, both run at the same speed (within 1%).</p>
<p><strong>Disclaimer:</strong> I&#8217;d be delighted if you could prove me wrong. Please provide Java or C++ source code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/</feedburner:origLink></item>
		<item>
		<title>Language, Mathematics and Programming</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/CNYs9uMTW8M/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/03/08/language-mathematics-and-programming/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 15:41:15 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Software design]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2442</guid>
		<description>Even if you have extensive training in Mathematics, the average Mathematics paper is indistinguishable from the ramblings of a madman. Many of these papers seek to solve narrow problems. And yet, we respect Mathematicians.
Software programming is a form of communication, usually between human beings and machines. While different in style, programming is a subset of [...]</description>
			<content:encoded><![CDATA[<p>Even if you have extensive training in Mathematics, the <a href="http://arxiv.org/list/math/new">average Mathematics paper</a> is undistinguishable from the ramblings of a madman. Many of these papers seek to solve narrow problems. And yet, we respect Mathematicians.</p>
<p>Software programming is a form of communication, usually between human beings and machines. While different in style, programming is a subset of the language of Mathematics. If you dig into the <a href="http://github.com/explore">average source code</a>, it is undistinguishable from ramblings, even if you are an expert developer.</p>
<p>Yet, we denigrate programming. Many will even deny that it is a Mathematical language. But Mathematics and Programming are not so different:</p>
<table style="width: 75%;" border="1">
<tbody>
<tr>
<th>Mathematics</th>
<th>Programming</th>
</tr>
<tr>
<td>Building on the previous research papers requires you to dig through endless piles of boring, badly written research papers.</td>
<td>Maintaining millions of lines of code written by various people over the years is difficult, boring and error-prone.</td>
</tr>
<tr>
<td>Inventing new theorems or new mathematical theories requires much creativity.</td>
<td>Coming up with the next best iPhone application requires much creativity.</td>
</tr>
<tr>
<td>For most people, mastering even part of Mathematics requires a decade or more.</td>
<td>Please read <a href="http://norvig.com/21-days.html">Teach yourself programming in ten years</a> by Peter Norvig.</td>
</tr>
<tr>
<td>The language of Mathematics has directly contributed to technological progress. Electricity, engines, nuclear power and space travel all required extensive use of Mathematics.</td>
<td>Google changed the world through the brilliance of its software engineers. The open source revolution has changed how people think about collaboration.</td>
</tr>
<tr>
<td>Some Mathematicians are widely recognized as being extremely smart.</td>
<td>Some famous people have done a fair share of difficult and technical programming: <a href="http://en.wikipedia.org/wiki/Donald_Knuth">Donald Knuth and TeX</a>, <a href="http://en.wikipedia.org/wiki/Tim_Berners-Lee">Tim Berners-Lee and the Web</a>, <a href="http://en.wikipedia.org/wiki/Linus_Torvalds">Linus Torvalds and Linux</a>.</td>
</tr>
</tbody>
</table>
<p>Why is programming getting so little respect?</p>
<ul>
<li>The intense commercialization of programming has commoditized it. As <a href="http://www.amazon.com/Hackers-Painters-ebook/dp/B0026OR2NQ/ref=sr_1_1?ie=UTF8&amp;s=digital-text&amp;qid=1268061219&amp;sr=8-1">Paul Graham might say</a>: painters were initially &#8220;portrait takers&#8221;. It is only when painting lost its commercial function that it became recognized as a noble art. However, just as painters always used their free time to create great art, the best programmers are open sourcing beautiful code all the time.</li>
<li>The study of programming itself remains rather informal. You can get degrees in Computer Science, Computer Engineering or Software Engineering, but there is no degree in Programming. Programming is taught in universities, but generally only in the first few courses of a degree. Yet, there are degrees in Communication, Fine Art, Architecture, Music or Dance. While a degree in Computer Science or Software Engineering can make you a better programmer, the fact remains that your professors are not expert practitioners.</li>
</ul>
<p>How can we fix this? I have this secret dream of setting up the equivalent of a &#8220;Creative Writing&#8221; program, but for programmers. Call it &#8220;Creative Programming&#8221;. Basically, students would come together to write great code. Yes, such code might be useful commercially, but that would be a secondary consideration. The pursuit of greatness would be the only goal that matters. It would treat programming as a bona fide language. It would attract the best programmers as guest lecturers. Would this ever work out? I do not know.</p>
<p>I am sure that many will point out that my secret dream is impractical. Beauty should not come first: we want cheap, reliable, maintainable code. We also want programmers to be replaceable, inexpensive and practical. However, human beings can pursue greatness while being practical. Compromise is possible.</p>
<p>Let me conclude <a href="http://www.paulgraham.com/knuth.html">by quoting Donald Knuth</a>:</p>
<blockquote><p>(&#8230;) computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. A programmer who subconsciously views himself as an artist will enjoy what he does and will do it better.</p></blockquote>
<p><strong>Further reading:</strong> <a href="http://www.daniel-lemire.com/blog/archives/2010/02/12/the-best-software-developers-are-great-at-mathematics/">The best software developers are great at Mathematics?</a> and <a href="http://www.daniel-lemire.com/blog/archives/2010/03/01/is-programming-technical/">Is programming “technical”?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/03/08/language-mathematics-and-programming/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/03/08/language-mathematics-and-programming/</feedburner:origLink></item>
		<item>
		<title>Who the heck got Universities into the email business?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/ZokNitm0BTw/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/03/02/who-the-heck-got-universities-into-the-email-business/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 01:27:07 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2433</guid>
		<description>My current employer, UQAM, refuses to allow email forwarding. Students would rather forward their emails to their existing GMail accounts, for example. And the IT Department (the SITEL) agrees that it would have several benefits. However, they refuse to allow it for the following reasons:

Email forwarding may create infinite email loops. These may disrupt services [...]</description>
			<content:encoded><![CDATA[<p>My current employer, <a href="http://www.uqam.ca/">UQAM</a>, refuses to allow email forwarding. Students would rather forward their emails to their existing GMail accounts, for example. And the IT Department (the <a href="http://www.sitel.uqam.ca/">SITEL</a>) agrees that it would have several benefits. However, they refuse to allow it for the following reasons:</p>
<ul>
<li>Email forwarding may create infinite email loops. These may disrupt services and require human intervention.</li>
<li>Invalid or failing remote servers may saturate the local servers, which would be stuck with emails they are unable to forward.</li>
<li>Professors and management send confidential information by email. Yet, without full control of the email service, the University cannot ensure the needed confidentiality.</li>
<li>With email forwarding, it may be impossible to ensure and prove that an email was received and read. Thus, homework assignments, administrative inquiries or security advisories may never reach the students, or we may be unable to prove that they reached them.</li>
<li>Because we are a Canadian university, email forwarding puts us at risk: the emails may transit through American servers, where Canadian privacy law does not apply.</li>
<li>Email forwarding may put students at risk if remote accounts are stolen or lost.</li>
</ul>
<p>Can you help me debunk or mitigate these arguments? I know that some of these arguments are bogus, but I am looking for solid references. (Not that I expect to change their minds.)</p>
<p>A larger issue: shouldn&#8217;t universities stick with research and teaching? I understand that we must have networks, cables, computers, firewalls, but do we need to provide our students with email services?</p>
<p><strong>Update:</strong> It turns out that our IT people encourage students who want forwarding to GMail (say) to use the POP3 protocol. It is unclear to me how email forwarding can be a dangerous practice whereas POP3 &#8220;forwarding&#8221; can be safe.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/03/02/who-the-heck-got-universities-into-the-email-business/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/03/02/who-the-heck-got-universities-into-the-email-business/</feedburner:origLink></item>
		<item>
		<title>Is programming “technical”?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/-bf_QKVXDTQ/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/03/01/is-programming-technical/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 21:35:02 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Science and Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2430</guid>
		<description>According to student evaluations, most of my students appreciate short programming assignments. Yet, every year, some students think that programming is below them or unimportant.
Maybe I should start my courses with this theorem:
Theorem. If you understand an idea, you can implement it in software.
There is no denying that programming requires a lot of technical knowledge. [...]</description>
			<content:encoded><![CDATA[<p>According to student evaluations, most of my students appreciate short programming assignments. Yet, every year, some students think that programming is below them or unimportant.</p>
<p>Maybe I should start my courses with this theorem:</p>
<blockquote><p><strong>Theorem. </strong>If you understand an idea, you can implement it in software.</p></blockquote>
<p>There is no denying that programming requires a lot of technical knowledge. Most programmers do technical jobs, involving testing, building or refactoring code. But programming is ultimately a form of communication, and it is as noble as Mathematics or English. Let us compare:</p>
<ul>
<li>Writers are considered sexy and non-technical people. Yet, grammar and spelling are technical. Moreover, most writers earn a living by writing ads for boring products. Some of them make a living with grand novels, but fewer than you think.</li>
<li>Physicists are great thinkers. Yet, their mathematical derivations are often mind-numbing and technical. Many physicists spend years running extremely technical experiments. And when they don&#8217;t, they program extremely complex (and technical) simulations.</li>
</ul>
<p>For some reason, being a writer is considered more prestigious than being a programmer. If you ask me, <a href="http://en.wikipedia.org/wiki/Linus_Torvalds">Linus Torvalds</a> is every bit as cool as <a href="http://en.wikipedia.org/wiki/J._K._Rowling">J. K. Rowling</a>. And I&#8217;d rather have a lunch date with Linus.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/03/01/is-programming-technical/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/03/01/is-programming-technical/</feedburner:origLink></item>
		<item>
		<title>Most common questions about recommender systems…</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/HKWP-OABvss/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/02/22/most-common-questions-about-recommender-systems/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 15:27:03 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Science and Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2421</guid>
		<description>I get ten to fifteen questions a week on recommender systems from entrepreneurs and engineers. Sometimes, I help people find their way in the literature. On occasion—for a consulting fee—I get my hands dirty and evaluate, design or code specific algorithms.  But mostly, I answer the same questions again and again:
1. How much data do [...]</description>
			<content:encoded><![CDATA[<p>I get ten to fifteen questions a week on recommender systems from entrepreneurs and engineers. Sometimes, I help people find their way in the literature. On occasion—for a consulting fee—I get my hands dirty and evaluate, design or code specific algorithms.  But mostly, I answer the same questions again and again:</p>
<p><strong>1. How much data do I need? </strong></p>
<p>Given your data, you can use <a href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)">cross-validation</a> or <a href="http://en.wikipedia.org/wiki/A/B_testing">A/B testing</a> to measure objectively the effectiveness of a recommender system.</p>
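<p>As an illustration, here is a minimal C++ sketch of <em>k</em>-fold cross-validation reporting the mean absolute error (MAE). The <tt>Rating</tt> struct and the <tt>Trainer</tt> signature are hypothetical: plug in whatever recommender you want to evaluate, and shuffle the ratings first so that the folds are random. Running it on growing subsets of your data is one way to see whether more data still improves accuracy.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One observed rating: user gave item the score value.
struct Rating {
  int user;
  int item;
  double value;
};

// Hypothetical interface: a trainer takes training ratings and returns a
// predictor mapping (user, item) to a predicted rating.
using Predictor = std::function<double(int, int)>;
using Trainer = std::function<Predictor(const std::vector<Rating>&)>;

// k-fold cross-validation: rating j goes to fold j % k (shuffle the ratings
// beforehand). For each fold, train on the other folds, predict the held-out
// ratings and accumulate the absolute error. Returns the MAE over all ratings.
double cross_validated_mae(const std::vector<Rating>& ratings, std::size_t k,
                           const Trainer& train) {
  assert(k >= 2 && ratings.size() >= k);
  double total_error = 0.0;
  for (std::size_t fold = 0; fold < k; ++fold) {
    std::vector<Rating> training, testing;
    for (std::size_t j = 0; j < ratings.size(); ++j) {
      (j % k == fold ? testing : training).push_back(ratings[j]);
    }
    Predictor predict = train(training);
    for (const Rating& r : testing) {
      total_error += std::fabs(predict(r.user, r.item) - r.value);
    }
  }
  return total_error / static_cast<double>(ratings.size());
}
```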
<p><strong>2. We have this system in place, how do we know whether it is sane?</strong></p>
<p>See previous question.</p>
<p><strong>3. My online recommender system is slow!</strong></p>
<p>Laziness is your friend: don&#8217;t recompute the recommendations each time you have new data.</p>
<p><strong>4. My customers don&#8217;t like the recommendations!</strong></p>
<ul>
<li>Keep expectations in check: recommending products is difficult and even human beings have trouble doing it,</li>
<li>Explain the recommendations: nobody trusts a black box,</li>
<li>Allow your users to freely explore your data and products in convenient and exciting ways.</li>
</ul>
<p><strong>5. Which algorithm is best?</strong></p>
<p>You should start with <a href="http://en.wikipedia.org/wiki/Slope_One">simple algorithms</a>: they worked well enough for Amazon. To do better, a mix of different algorithms is probably best. You can combine them using <a href="http://en.wikipedia.org/wiki/Ensemble_learning">ensemble methods</a>.</p>
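<p>To give an idea of how simple such an algorithm can be, here is a compact C++ sketch of a weighted Slope One predictor. For each pair of items, it stores the average rating difference among users who rated both; it then predicts a rating by shifting the ratings the user already gave by these averages, weighted by how many users support each difference. The names are hypothetical, and a map keyed by item pairs is chosen for brevity: it needs memory quadratic in the number of items in the worst case.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Ratings of one user: item id -> rating.
using UserRatings = std::map<int, double>;

struct SlopeOne {
  std::map<std::pair<int, int>, double> diff_sum;  // sum of (r_j - r_i)
  std::map<std::pair<int, int>, int> pair_count;   // users who rated j and i

  // For every ordered pair (j, i) of items rated by the same user,
  // accumulate r_j - r_i and count the user.
  void train(const std::vector<UserRatings>& users) {
    for (const UserRatings& u : users) {
      for (const auto& a : u) {    // a.first = item j, a.second = r_j
        for (const auto& b : u) {  // b.first = item i, b.second = r_i
          if (a.first == b.first) continue;
          std::pair<int, int> key(a.first, b.first);
          diff_sum[key] += a.second - b.second;
          pair_count[key] += 1;
        }
      }
    }
  }

  // Weighted average over the user's rated items i of (r_i + dev(j, i)),
  // weighted by the number of users behind dev(j, i). Returns false when
  // no co-rated item gives any evidence about item j.
  bool predict(const UserRatings& u, int j, double& out) const {
    double numerator = 0.0;
    double weight = 0.0;
    for (const auto& b : u) {
      if (b.first == j) continue;
      std::pair<int, int> key(j, b.first);
      auto it = pair_count.find(key);
      if (it == pair_count.end()) continue;  // nobody rated both j and i
      double c = static_cast<double>(it->second);
      double deviation = diff_sum.at(key) / c;  // average of r_j - r_i
      numerator += (deviation + b.second) * c;
      weight += c;
    }
    if (weight == 0.0) return false;
    out = numerator / weight;
    return true;
  }
};
```

<p>For example, if one user rated item 1 as 1 and item 2 as 1.5, then another user who rated item 1 as 2 gets a predicted rating of 2.5 for item 2.</p>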
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/02/22/most-common-questions-about-recommender-systems/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/02/22/most-common-questions-about-recommender-systems/</feedburner:origLink></item>
		<item>
		<title>The best software developers are great at Mathematics?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/vhI9izt49oI/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/02/12/the-best-software-developers-are-great-at-mathematics/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 16:46:15 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2419</guid>
		<description>Among the upsides of working for a university are the stimulating academic discussions. Yesterday, a philosopher challenged me with a question:
Beyond the fact that software is expressed in Mathematics artefacts (bits, algorithms), are Information Systems fundamentally Mathematical?
For my convenience, I temporarily rephrase the question to something simpler and more concrete:
How are Software Developers limited by [...]</description>
			<content:encoded><![CDATA[<p>One of the upsides of working for a university are the stimulating academic discussions. Yesterday, a philosopher challenged me a question:</p>
<blockquote><p>Beyond the fact that software is expressed in Mathematics artefacts (bits, algorithms), <strong>are Information Systems fundamentally Mathematical</strong>?</p></blockquote>
<p>For my convenience, I temporarily rephrase the question to something simpler and more concrete:</p>
<blockquote><p><strong>How are Software Developers limited by their mathematical weaknesses?</strong></p></blockquote>
<p>I plan several blog posts around this question, but let me start with an example.</p>
<p>A common and powerful language to process XML is <a href="http://en.wikipedia.org/wiki/XPath">XPath</a>. XPath is used within web applications, scripts, databases, and so on. I often ask students the following question about XPath. Are these two expressions equivalent?</p>
<blockquote><p><tt>$x="some string"</tt></p></blockquote>
<p>and</p>
<blockquote><p><tt>not($x!="some string")</tt>.</p></blockquote>
<p>(The symbol &#8220;<tt>!=</tt>&#8221; means &#8220;different from&#8221;.)</p>
<p>Invariably, most students conclude that they are equivalent. <strong>Wrong!</strong></p>
<p>Let us examine the semantics.</p>
<ul>
<li>The expression <tt>$x="some string"</tt> means that at least one element of <tt>$x</tt> is equal to <tt>"some string"</tt>.</li>
<li>The expression <tt>$x!="some string"</tt> means that some element of <tt>$x</tt> is different from <tt>"some string"</tt>.</li>
<li>The <strong>negation</strong> of <tt>$x!="some string"</tt> is that all elements of <tt>$x</tt> are equal to <tt>"some string"</tt>. (Sorry if it sounds confusing.)</li>
</ul>
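<p>Thus, when <tt>$x</tt> is non-empty, the expression <tt>not($x!="some string")</tt> is a more restrictive condition than the expression <tt>$x="some string"</tt>. When <tt>$x</tt> is empty, the two differ in the opposite direction: <tt>$x="some string"</tt> is false, while <tt>not($x!="some string")</tt> is true.</p>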
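<p>To make these semantics concrete, here is a small Python sketch of the comparison rule. When a node-set is compared with a string, XPath 1.0 returns true if there exists a node whose string-value satisfies the comparison. The sketch is my own illustration and not an XPath engine. For simplicity, it models a node-set as a plain list of string-values, and the function names are hypothetical.</p>

```python
# Sketch of XPath 1.0 comparisons between a node-set and a string.
# Assumption: a node-set is modelled as a plain list of string-values.

def xpath_eq(nodes, s):
    """$x = s: true if at least one node's string-value equals s."""
    return any(value == s for value in nodes)


def xpath_ne(nodes, s):
    """$x != s: true if at least one node's string-value differs from s."""
    return any(value != s for value in nodes)


def xpath_not_ne(nodes, s):
    """not($x != s): true if no node differs from s (vacuously true if empty)."""
    return not xpath_ne(nodes, s)


target = "some string"
# Expected output, one line per case:
#   [] False True
#   ['some string'] True True
#   ['some string', 'other'] True False
#   ['other'] False False
for x in ([], [target], [target, "other"], ["other"]):
    print(x, xpath_eq(x, target), xpath_not_ne(x, target))
```

<p>The empty list is the telling case. There, <tt>xpath_eq</tt> returns <tt>False</tt> and <tt>xpath_not_ne</tt> returns <tt>True</tt>. With <tt>["some string", "other"]</tt>, the results are reversed.</p>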
<p>Great software developers routinely think through far more complex mathematical problems. Yet, they do not think of them as being Mathematics.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=vhI9izt49oI:RI4KJ8ekejU:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=vhI9izt49oI:RI4KJ8ekejU:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/vhI9izt49oI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/02/12/the-best-software-developers-are-great-at-mathematics/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/02/12/the-best-software-developers-are-great-at-mathematics/</feedburner:origLink></item>
		<item>
		<title>Open Sourcing your software hurts your competitiveness as a researcher?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/yCTh62CVGi0/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/02/10/open-sourcing-your-software-hurts-your-competitiveness-as-a-researcher/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 19:15:08 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2411</guid>
		<description>Almost all software I write for my research is open sourced. A fellow researcher argued today that I risk reducing the gap between me and my pursuers. By the same logic, I should keep my data to myself (and avoid listing good sources of research data).
Here is my take on this issue.

Sharing can&amp;#8217;t hurt the small fish. Almost nobody [...]</description>
			<content:encoded><![CDATA[<p>Almost all software I write for my research is <a href="http://code.google.com/u/lemire/">open sourced</a>. A fellow researcher argued today that I risk <em>reducing the gap between me and my pursuers</em>. By the same logic, I should keep my data to myself (and avoid <a href="http://www.daniel-lemire.com/blog/data-for-data-mining/">listing good sources of research data</a>).</p>
<p>Here is my take on this issue.</p>
<ol>
<li><strong>Sharing can&#8217;t hurt the small fish.</strong> Almost nobody sets out to beat Daniel Lemire at some conference next year. I have no pursuer. And guess what? You probably don&#8217;t. But if you do, you are probably doing quite well already, so stop worrying. Yes, yes, they will give you a grant even if you don&#8217;t actively sabotage your competitors. Relax already!</li>
<li><strong>Sharing your code makes you more convincing</strong>. By making your work easier to reproduce, you are instantly more credible. Trust is important in science. Why would anyone trust that I actually wrote the code and ran the experiments? Because I published my code, that&#8217;s why!</li>
<li><strong>Source code helps spread your ideas faster.</strong> In the long run, you should not care about getting papers accepted at some hot conference. What matters is the impact you have had. Make it easy for me to use your ideas! Help yourself!</li>
<li><strong>Sharing raises your profile in industry.</strong> Having open source software makes you more attractive to software engineers.</li>
<li><strong>You write better software if you share it.</strong> While not all code I publish is bug-free, documented or even usable, I care slightly more about my code because I publish it.</li>
</ol>
<p>Finally, does sharing code work? Do people download and use my software? Here are download statistics for my latest source-code publications:</p>
<table border="1">
<tbody>
<tr>
<td><a href="http://code.google.com/p/javaewah/">A compressed alternative to the Java BitSet class</a></td>
<td>over 280 downloads</td>
</tr>
<tr>
<td><a href="http://code.google.com/p/ngramhashing/">Rolling Hash C++ Library</a></td>
<td>over 200 downloads</td>
</tr>
<tr>
<td><a href="http://code.google.com/p/lemurbitmapindex/">Lemur Bitmap Index C++ Library</a></td>
<td>over 2,000 downloads</td>
</tr>
<tr>
<td><a href="http://code.google.com/p/lbimproved/">Fast Nearest-Neighbor Retrieval under the Dynamic Time Warping</a></td>
<td>over 1,400 downloads</td>
</tr>
</tbody>
</table>
<p><strong>Related reading:</strong> <a href="http://geomblog.blogspot.com/2010/02/good-prototyping-software.html">Good prototyping software</a> and <a href="http://geomblog.blogspot.com/2010/02/challenge-of-doing-good-experimental.html">The challenge of doing good experimental work</a> by Suresh Venkatasubramanian. And  <a href="http://mybiasedcoin.blogspot.com/2009/01/more-on-algorithms-and-implementation.html">More on algorithms and implementation</a> by  Michael Mitzenmacher.</p>
<p><strong>Update:</strong> Joachim Wuttke pointed out another potential benefit: your users will debug your code.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=yCTh62CVGi0:bQjmBte2yMs:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=yCTh62CVGi0:bQjmBte2yMs:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/yCTh62CVGi0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/02/10/open-sourcing-your-software-hurts-your-competitiveness-as-a-researcher/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/02/10/open-sourcing-your-software-hurts-your-competitiveness-as-a-researcher/</feedburner:origLink></item>
		<item>
		<title>Trading latency for quality in research</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/wujou371Nd8/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/02/08/trading-latency-for-quality-in-research/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 14:36:49 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2406</guid>
		<description>I am not opposed to the Publish or Perish mantra. I am an academic writer. I am what I publish. We all think of researchers as people wearing laboratory coats, working on exotic devices. And my own laboratory includes a one-million-dollar computer cluster with a SAN server as large as a fridge. I also generate [...]</description>
			<content:encoded><![CDATA[<p>I am not opposed to the <a href="http://en.wikipedia.org/wiki/Publish_or_perish">Publish or Perish</a> mantra. I am an academic writer. I am what I publish. We all think of researchers as people wearing laboratory coats, working on exotic devices. And my own laboratory includes a one-million-dollar computer cluster with a <a href="http://en.wikipedia.org/wiki/Storage_area_network">SAN server</a> as large as a fridge. I also <a href="http://code.google.com/u/lemire/">generate much software</a>. But you know what? <strong>The writing is what matters.</strong></p>
<p>And publishing is easy. Write and submit many papers  conforming to the expectations of the editors. Eventually, some of your work will be accepted. And there are thousands of journals, conferences and workshops. <strong>Just </strong><a href="http://www.amazon.ca/How-Write-Lot-Practical-Productive/dp/1591477433"><strong>write a lot</strong></a><strong>.</strong></p>
<p>Yet, <a href="http://www.daniel-lemire.com/blog/archives/2009/01/22/why-i-write-bad-papers-sometimes/">don&#8217;t publish everything you write</a>, even when what you wrote looks like a research paper. Hold on to it. Publishing everything that looks like a research paper leads to what Feynman famously described as <a href="http://en.wikipedia.org/wiki/Cargo_cult_science">Cargo Cult Science</a>. Indeed, there is a real danger that we become so good at faking science that we are no longer doing science at all! <strong>We become dishonest.</strong></p>
<p>In our haste to be published&#8230;</p>
<ul>
<li>we cut corners in our experiments, when we validate our ideas at all;</li>
<li>we pretend that our work is applicable in the real world, when it isn&#8217;t;</li>
<li>we don&#8217;t take the time to reproduce and reflect on known results;</li>
<li>we highlight the positive aspects of our research while <strong>omitting</strong> to mention the negatives;</li>
<li>we complexify the issues so that our research looks fancier;</li>
<li>we get lost in abstract nonsense.</li>
</ul>
<p>If you want your work to really matter, you should be honest. You should not fool yourself and others. So what do we do? Maybe we should <strong>publish carefully</strong>. While barely reducing our output rate as academic writers, we can introduce extra steps to keep us more honest. What do we need?</p>
<ul>
<li>Diverse points of view: it is easy to fool a small group of like-minded experts, but comparatively more difficult to fool the readers of my blog.</li>
<li>Time to reflect: if you read what you wrote months ago, and you don&#8217;t feel the urgency to communicate it more broadly, maybe it wasn&#8217;t all that good to begin with?</li>
</ul>
<p>The problem is that once a paper is published in a journal or a conference, we tend to move on. Anyhow, we cannot easily revise our published work. Are there other models? Economists regularly publish working papers—commonly known in Computer Science as technical reports. But the difference between computer scientists and economists is that economists revise their working papers. And only when their work has stood the test of time, that is, has been available freely for months or years, do they submit it to conventional peer review.</p>
<p>This year, I will try the following experiment. Both on this blog and on my publication page, I will &#8220;publish&#8221; working papers and specifically ask readers to be critical of my work. Only after a couple of months have passed (or more) will I submit my work to a journal or conference.</p>
<p>This will introduce some latency in my publication output. Can I trade latency for quality? I plan to report back in a year on this (very public) experiment.</p>
<p><strong>Further reading</strong>: <a href="http://cacm.acm.org/magazines/2009/8/34492-viewpoint-time-for-computer-science-to-grow-up/fulltext">Time for computer science to grow up</a> by Lance Fortnow.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=wujou371Nd8:DboPf9wLBdA:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=wujou371Nd8:DboPf9wLBdA:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/wujou371Nd8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/02/08/trading-latency-for-quality-in-research/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/02/08/trading-latency-for-quality-in-research/</feedburner:origLink></item>
		<item>
		<title>Where to get your ebooks?</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/YtxaSIdHQLE/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/02/03/where-to-get-your-ebooks/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 15:36:18 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Science and Technology]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2401</guid>
		<description>If you read my blog, you probably like to read in general. Thus, if you don&amp;#8217;t own an ebook device, you will soon. The choice is growing: the Amazon Kindle, the Sony Reader, the Apple iPad,&amp;#8230; I bought a kindle because my wife won&amp;#8217;t let me fill the house with books. And I hate to [...]</description>
			<content:encoded><![CDATA[<p>If you read my blog, you probably like to read in general. Thus, if you don&#8217;t own an ebook device, you will soon. The choice is growing: the <a href="http://en.wikipedia.org/wiki/Amazon_Kindle">Amazon Kindle</a>, the <a href="http://en.wikipedia.org/wiki/Sony_Reader">Sony Reader</a>, the <a href="http://en.wikipedia.org/wiki/Apple_iPad">Apple iPad</a>,&#8230; I bought a Kindle because my wife won&#8217;t let me fill the house with books. And I hate to throw away perfectly good paper books.</p>
<p>Amazon has most of the market for now. Yet, using the Kindle store on the Kindle itself is painful. Moreover, Amazon ebooks are protected by <a href="http://en.wikipedia.org/wiki/Digital_rights_management">Digital Rights Management</a> (DRM). Amazon sells you crippled ebooks that can <a href="http://gizmodo.com/5298749/kindle-drm-surfaces-to-deny-user-the-books-hes-bought-and-paid-for">stop working</a> if you copy them too often. There are often better alternatives elsewhere.</p>
<p>And, in Canada, there is a two-dollar surcharge for every wireless download using the Kindle. Since most ebooks are 0.5 MB or less, <strong>the wireless download costs at least $4 per megabyte!</strong> This is insulting! Moreover, if you buy a book by mistake (which is annoyingly common), Amazon will reimburse the cost of the book itself, but not the fee for the wireless download.</p>
<p>Thankfully, you can grab books compatible with the Kindle (in <a href="http://en.wikipedia.org/wiki/Mobipocket">Mobipocket format</a>) elsewhere. Then you can drop the file on the Kindle using the USB port.</p>
<ul>
<li>You can get nearly 2,000 great French classics for free on <a href="http://www.ebooksgratuits.com/ebooks.php">ebooksgratuits</a>. This includes a large fraction of the work of Honoré de Balzac.</li>
<li><a href="http://www.gutenberg.org/">Project Gutenberg</a> offers 30,000 free e-books in various languages (mostly English).</li>
<li><a href="http://www.webscription.net/">WebScription</a> sells DRM-free ebooks in various formats. Most books fall into the science-fiction, young-adult and fantasy genres.</li>
</ul>
<p>I am currently reading <a href="http://www.goodreads.com/book/show/88070.You_re_Not_Fooling_Anyone_When_You_Take_Your_Laptop_to_a_Coffee_Shop_Scalzi_on_Writing">You&#8217;re Not Fooling Anyone When You Take Your Laptop to a Coffee Shop</a> by Scalzi. <a href="http://www.webscription.net/p-958-youre-not-fooling-anyone-when-you-take-your-laptop-to-a-coffee-shop.aspx">I bought it at WebScription</a> for six dollars. It is a compilation of Scalzi&#8217;s blog posts on his life as a writer. I am fascinated by how much it resembles my own life. Well&#8230; Except for the fact that I don&#8217;t get paid when I publish a paper. Maybe I should put together a compilation of posts about my silly work life. Would anyone buy it for six dollars?</p>
<p>I am also reading <a href="http://en.wikipedia.org/wiki/Halting_State">Halting State</a> by Stross, which I bought on Amazon for ten dollars. I haven&#8217;t yet gotten into the mood of the novel.</p>
<p><strong>Further reading</strong>:</p>
<ul>
<li>According to a Computer Scientist, the iPad <a href="http://geomblog.blogspot.com/2010/02/could-ipad-make-computer-science.html">could make Computer Science obsolete</a>.</li>
<li>While I don&#8217;t think academic journals will be available on the Kindle any time soon, I think that has mostly to do with <a href="The insane world of academic publishing">how insane academic publishing is</a>.</li>
</ul>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=YtxaSIdHQLE:s1QZsn9X3mY:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=YtxaSIdHQLE:s1QZsn9X3mY:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/YtxaSIdHQLE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/02/03/where-to-get-your-ebooks/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/02/03/where-to-get-your-ebooks/</feedburner:origLink></item>
		<item>
		<title>Getting serious about online teaching</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/qqjRxVOvES0/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/01/30/getting-serious-about-online-teaching/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 15:27:17 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2397</guid>
		<description>Earlier this month, Michael Mitzenmacher told us about the record number of students attending his Harvard class online-only. Yesterday, Dick Lipton predicted that online learning will replace campus learning : &amp;#8220;I see no reason that On [Online Universities] could not do as good a job as Un [Campus Universities] with this basic goal [Educate Students].&amp;#8221; In [...]</description>
			<content:encoded><![CDATA[<p>Earlier this month, <a href="http://www.eecs.harvard.edu/~michaelm/">Michael Mitzenmacher</a> told us about the <a href="http://mybiasedcoin.blogspot.com/2010/01/teaching-day-one.html">record number of students attending his Harvard class online-only</a>. Yesterday, <a href="http://en.wikipedia.org/wiki/Richard_J._Lipton">Dick Lipton</a> predicted that online learning will replace campus learning: &#8220;I see no reason that On [Online Universities] could not do as good a job as Un [Campus Universities] with this basic goal [Educate Students].&#8221; In the comments, Lipton questions the importance of credentials and whether social interactions really need the campus.</p>
<p>I have already <a href="http://www.daniel-lemire.com/blog/archives/2008/02/27/online-teaching-is-the-future/">written much on the topic</a> but let me reiterate my message:</p>
<ul>
<li>In this new online world, <strong>professors are not content providers</strong>. They provide structure and motivation. They are role models. And most importantly, by their reputation, <strong>professors can provide certification</strong>. If someone gets a reference letter from Michael Mitzenmacher or Dick Lipton, I trust they know something about Computer Science, because I trust  Michael Mitzenmacher and Dick Lipton. I suspect it is not easy to get these fellows to write fake reference letters because they have a high degree of independence (job security, good money, and so on) and their greatest asset is their reputation.</li>
<li>Students are trained to expect classrooms. <strong>Many students need structure and constant attention. That is not a good thing!</strong> We are effectively training students to be good employees working in large organizations with much structure. Yet, this world made of large and stable organizations has already fallen apart. We urgently need to teach students to learn on their own, using the Web.</li>
<li>Yes, there will always be campus classes, the same way there will always be physical libraries with actual books, and newspapers printed on paper.</li>
</ul>
<p><strong>Further reading</strong>:</p>
<ul>
<li><a href="http://www.news.wisc.edu/15640">College students prefer classes with online learning</a>, University of Wisconsin-Madison</li>
<li><a href="http://www.daniel-lemire.com/blog/archives/2008/09/17/stanford-offers-10-free-online-computer-science-courses/">Stanford offers 10 free online Computer Science courses</a></li>
<li><a href="http://www.daniel-lemire.com/blog/archives/2008/09/11/from-online-courses-to-automated-teaching/">From online courses to… automated teaching</a></li>
<li><a href="http://www.daniel-lemire.com/blog/archives/2008/07/29/some-myths-about-online-teaching/">Some myths about online teaching</a></li>
<li><a href="http://www.daniel-lemire.com/blog/archives/2007/09/26/why-dont-people-use-university-libraries/">Why don’t people use university libraries?</a></li>
</ul>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=qqjRxVOvES0:SrUf_s3c0Vc:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=qqjRxVOvES0:SrUf_s3c0Vc:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/qqjRxVOvES0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/01/30/getting-serious-about-online-teaching/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/01/30/getting-serious-about-online-teaching/</feedburner:origLink></item>
		<item>
		<title>You know your research is original when…</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/b8W5hl7u570/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/01/20/you-know-your-research-is-original-when/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 14:17:47 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2393</guid>
		<description>Many consider Frank Herbert&amp;#8217;s Dune the most important work of science fiction ever written. Consider that Star Wars is just a variation on Dune. Yet, Dune was rejected by more than twenty publishers before it was finally published. It is likely that publishers rejected Dune precisely because it was such a radical departure for the genre.
Of course, [...]</description>
			<content:encoded><![CDATA[<p>Many consider Frank Herbert&#8217;s <a href="http://en.wikipedia.org/wiki/Dune_(novel)">Dune</a> the most important work of science fiction ever written. Consider that <a href="http://www.moongadget.com/origins/dune.html">Star Wars is just a variation on Dune</a>. Yet, Dune was rejected by more than twenty publishers before it was finally published. It is likely that publishers rejected Dune precisely because it was such a radical departure for the genre.</p>
<p>Of course, being rejected does not mean you are original. It could also mean that you are sloppy or uninteresting. However, there may be valid indications of your originality such as:</p>
<ul>
<li><strong>You have no competitor</strong>. Nobody quite does what you do.</li>
<li><strong>You cannot be scooped.</strong> You read new issues of journals looking for fresh ideas, without fear that someone has made you irrelevant.</li>
</ul>
<p>As MacLeod put it: <a href="http://gapingvoid.com/2004/07/31/avoid-crowds-altogether/">Don’t try to stand out from the crowd; avoid crowds altogether.</a></p>
<p><strong>Further reading</strong>: <a href="http://www.daniel-lemire.com/blog/archives/2009/10/28/the-secret-behind-radical-innovation/">The secret behind radical innovation</a> and <a href="http://www.daniel-lemire.com/blog/archives/2009/08/19/a-recipe-for-interesting-computer-science-research-papers/">A recipe for interesting Computer Science research papers</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=b8W5hl7u570:ts0td-ryhLs:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=b8W5hl7u570:ts0td-ryhLs:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/b8W5hl7u570" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/01/20/you-know-your-research-is-original-when/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/01/20/you-know-your-research-is-original-when/</feedburner:origLink></item>
		<item>
		<title>Writing tools to improve your research productivity</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/MeqsgeOdrGE/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/01/18/writing-tools-to-improve-your-research-productivity/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 15:44:44 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2387</guid>
		<description>Researchers—at least in Computer Science—spend most of their days at a desk typing. Picking the right software for writing is important.
Most of my writing time is spent on LaTeX documents. I have tried typical Word processors in the past, but they get in my way. Indeed, by mixing document content and document presentation, Microsoft Word makes [...]</description>
			<content:encoded><![CDATA[<p>Researchers—at least in Computer Science—spend most of their days at a desk typing. Picking the right software for writing is important.</p>
<p>Most of my writing time is spent on <a href="http://en.wikipedia.org/wiki/LaTeX">LaTeX</a> documents. I have tried typical word processors in the past, but they get in my way. Indeed, by mixing document content and document presentation, Microsoft Word makes it difficult to maintain consistency. Word is meant for short-lived (or throw-away) business documents. It is easy to get started and get 80% of the job done with Word. However, as the document gains complexity, as the number of revisions grows, and as the number of collaborators expands, Microsoft Word becomes inadequate.</p>
<p>Oh! I still use <a href="http://en.wikipedia.org/wiki/OpenOffice">OpenOffice</a> or <a href="http://en.wikipedia.org/wiki/Google_docs">Google Docs</a> to produce quick-and-dirty documents. But for anything meant to have lasting value, such as research, I refuse to fall into the word-processor trap. It causes some friction with colleagues, but it is a price I am willing to pay.</p>
<p>I believe <strong>every single graduate student should learn to write without a word processor</strong>. And <strong>serious science students should learn LaTeX</strong>. Even if you do not care for LaTeX, at least explore alternatives to Word such as <a href="http://en.wikipedia.org/wiki/Scrivener_(software)">Scrivener</a>.</p>
<p>In any case, you are unlikely to need more than a text editor to write your prose: Charles Stross, one of the best living science-fiction writers, <a href="http://www.antipope.org/charlie/blog-static/2010/01/writing-tools.html">wrote many of his novels</a> using a primitive text editor (<a href="http://en.wikipedia.org/wiki/Vim_(text_editor)">Vim</a>). If you have never written without Microsoft Word, how do you know that Word is not holding you back?</p>
<p>Right now, I write using a regular text editor (<a href="http://sourceforge.net/projects/smultron/files/smultron/3.5.1/">Smultron</a> for MacOS) and the <a href="http://tug.org/index.html">TeX Live 2009</a> distribution. I save all my documents to a <a href="http://en.wikipedia.org/wiki/Subversion_(software)">subversion</a> tree. Using a version control tool such as Subversion makes collaboration easy, and it lets me go back years in time. It is a good setup.</p>
<p>Programming is also a form of writing. For my experimental work, I program in C++, Java or Python, often using <a href="http://www.eclipse.org/">Eclipse</a>. I find it is slightly better for programming than my standard writing setup (using only a text editor). Eclipse has great qualities:</p>
<ul>
<li>It stays out of the way. In particular, you can collaborate with people who are not using Eclipse without any problem.  For example, it will happily let you use handcrafted makefiles to compile your C++ programs.</li>
<li>It offers incremental compilation of Java programs. Basically, it compiles as you type.</li>
<li>It suggests corrections for many common compilation errors.</li>
</ul>
<p>Essentially, while Java is still an awful language, Java with Eclipse is almost fun. Eclipse proves that sophisticated software can be helpful to programmers and writers.</p>
<p>Writing is hard and it will always be hard, no matter the tool. But at least, ease your pain!</p>
<p><strong>See also</strong> <a href="http://www.daniel-lemire.com/blog/archives/2009/06/23/physical-tools-to-improve-research-productivity/">Physical tools to improve research productivity</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/daniel-lemire/atom?a=MeqsgeOdrGE:ft5pU3mS2t4:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/daniel-lemire/atom?i=MeqsgeOdrGE:ft5pU3mS2t4:D7DqB2pKExk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/MeqsgeOdrGE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/01/18/writing-tools-to-improve-your-research-productivity/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/01/18/writing-tools-to-improve-your-research-productivity/</feedburner:origLink></item>
		<item>
		<title>The fundamental properties of computing</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/TDK4dhjmYOI/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/01/13/the-fundamental-properties-of-computing/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 15:26:13 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2379</guid>
		<description>Physics works with fundamental properties such as mass, speed, acceleration, energy, and so on. Quantum mechanics has a well known trade-off between position and momentum: you can know where I am, or how fast I am going, but not both at the same time.
Algorithms (and their implementations) also have fundamental properties. Running time and memory [...]</description>
			<content:encoded><![CDATA[<p>Physics works with fundamental properties such as mass, speed, acceleration, energy, and so on. Quantum mechanics has a <a href="http://en.wikipedia.org/wiki/Heisenberg_uncertainty_principle">well-known trade-off</a> between position and momentum: you can know precisely where I am, or how fast I am going, but not both at the same time.</p>
<p>Algorithms (and their implementations) also have fundamental properties. <strong>Running time</strong> and <strong>memory usage</strong> are the obvious ones. In practice, there is often a trade-off between memory usage and running time: you can have low memory usage or a short running time, but not both. Michael Mitzenmacher <a href="http://mybiasedcoin.blogspot.com/2010/01/algorithms-and-data-structures-course.html">reminded me</a> this morning of another property: <strong>correctness</strong>. On some difficult problems, you can get low memory usage and a short running time if you accept an approximate solution.</p>
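<p>A Bloom filter is one classic illustration of trading correctness for memory and speed. I chose this example myself; it does not come from Michael's post. A Bloom filter answers set-membership queries using a fixed number of bits, at the cost of occasional false positives. The sketch below is a minimal Python version built on my own assumptions. The class name, the parameters <tt>num_bits</tt> and <tt>num_hashes</tt>, and the use of salted SHA-256 to derive bit positions are all my choices. For readability, it stores one byte per bit rather than packing the bits.</p>

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives.

    Illustrative sketch: one byte per bit, positions derived from salted SHA-256.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # fixed memory, whatever we add later

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the hash with its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["xpath", "latex", "eclipse"]:
    bf.add(word)
print(bf.might_contain("xpath"))
```

<p>After <tt>"xpath"</tt>, <tt>"latex"</tt> and <tt>"eclipse"</tt> have been added, <tt>might_contain</tt> is guaranteed to return <tt>True</tt> for those three words. For other words it usually returns <tt>False</tt>, but not always. Memory stays fixed at <tt>num_bits</tt> entries no matter how many items are added. The price is a false-positive rate that grows as the filter fills up.</p>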
<p>I believe there are other fundamental properties like <strong>latency</strong>. Consider problems where both the input and the solution are large: statistics, image processing, finding some subgraph or sublist, text compression, and so on. In such instances, the solution comes out as a stream, and you can measure the delay between the input and the output. For example, a program that compresses text by first scanning the whole text might have high latency, even if the overall running time is not large. Similarly, we can make a Web browser seem faster by starting to render the Web page sooner, even if the overall running time of the rendering is the same. As another example, I once wrote a paper on computing the <a href="http://arxiv.org/abs/cs.DS/0610046">running maximum/minimum of an array</a> where latency was an issue.</p>
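<p>This streaming view of latency can be sketched in Python. The generator below computes the maximum of every window of <tt>window</tt> consecutive values using the standard monotonic-deque technique. It is a generic illustration and not necessarily the algorithm from the paper above, and the function name and parameters are hypothetical. It yields each maximum as soon as the last value of its window arrives, so it never has to read the whole input before producing output.</p>

```python
from collections import deque


def running_max(stream, window):
    """Yield the maximum of every `window` consecutive values of `stream`.

    Each result is emitted as soon as the last value of its window arrives,
    so output starts before the stream has been fully read.
    """
    candidates = deque()  # (index, value) pairs; values strictly decreasing
    for i, v in enumerate(stream):
        # A new value makes every older candidate that is not larger useless.
        while candidates and v >= candidates[-1][1]:
            candidates.pop()
        candidates.append((i, v))
        # The oldest candidate leaves once it slides out of the window.
        if i - window >= candidates[0][0]:
            candidates.popleft()
        # Emit once the first full window has been seen.
        if i + 1 >= window:
            yield candidates[0][1]


print(list(running_max([1, 3, 2, 5, 4, 1, 0], 3)))
```

<p>Here <tt>list(running_max([1, 3, 2, 5, 4, 1, 0], 3))</tt> returns <tt>[3, 5, 5, 5, 4]</tt>. Each value enters and leaves the deque at most once, so the total work is linear in the length of the stream.</p>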
<p>It would be interesting to draw up a list of all the fundamental properties of computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/01/13/the-fundamental-properties-of-computing/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/01/13/the-fundamental-properties-of-computing/</feedburner:origLink></item>
		<item>
		<title>The end of ‘mass universities’</title>
		<link>http://feedproxy.google.com/~r/daniel-lemire/atom/~3/drj2OA1X32M/</link>
		<comments>http://www.daniel-lemire.com/blog/archives/2010/01/11/the-end-of-mass-universities/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 18:50:55 +0000</pubDate>
		<dc:creator>Daniel Lemire</dc:creator>
				<category><![CDATA[Academia/Research]]></category>

		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2367</guid>
		<description>In the late sixties and seventies, we wanted universities to become more accessible. We founded the Open University, the Université du Québec, and many other universities with accessibility as part of their mandate.
The stated goal was to make degrees more accessible. We succeeded.
Yet, we are now facing an intriguing paradox due to this success. Technology, [...]</description>
			<content:encoded><![CDATA[<p>In the late sixties and seventies, we wanted universities to become more accessible. We founded the <a href="http://en.wikipedia.org/wiki/Open_university">Open University</a>, the <a href="http://en.wikipedia.org/wiki/Université_du_Québec">Université du Québec</a>, and many other universities with accessibility as part of their mandate.</p>
<p>The stated goal was to make <strong>degrees</strong> more accessible. We succeeded.</p>
<p>Yet, we are now facing an intriguing paradox due to this success. Technology, by making educational content easier to access than ever, is also shaking the very foundation of the University. As an example of this transformation, <a href="http://michaelnielsen.org/blog/michael-a-nielsen/">Michael Nielsen</a> pointed out this morning that you can watch <a href="http://icio.us/eaqgaz">120 hours of physics lectures</a> by <a href="http://en.wikipedia.org/wiki/Leonard_Susskind">Lenny Susskind</a> for free on YouTube. <strong>You are in deep trouble if what you are selling in 2009 are mass-produced lectures.</strong> The market price just went through the floor.</p>
<p>Lance Fortnow <a href="http://twitter.com/fortnow/status/7636722660">pointed us</a> to a <a href="http://www.edge.org/q2010/q10_2.html#rees">short essay</a> by <a href="http://www.ast.cam.ac.uk/~mjr/">Martin Rees</a> about technology and universities. Rees&#8217; point is that technology levels the playing field with respect to location. A hundred years ago, airplanes made it possible for Indian mathematicians to travel to Cambridge, where they could be taken seriously. In some sense, airplanes made Indian mathematicians more globally competitive, though only marginally so. The Web, with repositories such as <a href="http://www.arxiv.org">arXiv</a>, pushes this idea an order of magnitude further. After all, <a href="http://en.wikipedia.org/wiki/Grigori_Perelman">Grigori Perelman</a> won a million-dollar prize and the equivalent of a Nobel prize by posting a few papers on arXiv.</p>
<p>The revolution is all around us, not just in Science. Recently, an unknown writer, <a href="http://samlandstrom.com/bio.aspx">Sam Landstrom</a>, posted his novel <a href="http://www.amazon.com/MetaGame-ebook/dp/B002L4EXB0/ref=sr_1_1?ie=UTF8&amp;s=digital-text&amp;qid=1263233957&amp;sr=1-1">MetaGame</a> on the Amazon Kindle. No publisher, no ad campaign. Sales rank of his novel? 540. Considering that <a href="http://www.guardian.co.uk/business/2009/dec/28/amazon-ebook-kindle-sales-surge">Amazon reported selling more ebooks than paper books over Christmas</a>, I am sure many authors envy Landstrom&#8217;s success. Yet, Landstrom did not need an office in New York City to either write or publish his book. For all I know, he lives in his parents&#8217; basement.</p>
<p>Thankfully, bona fide Universities have some form of monopoly on University degrees. Yet, like Rees, I think that we are coming to the end of the road for mass universities:</p>
<blockquote><p>Traditional universities will survive insofar as they offer mentoring and personal contact to their students. But it&#8217;s less clear that there will be a future for the &#8216;mass university&#8217; where the students are offered little more than a passive role in lectures (generally of mediocre quality) with minimal feedback.</p></blockquote>
<p>One thing is clear to me: <strong>The value of a lecture in front of 80 students—or the equivalent as a webcasted show—is exactly zero. </strong>(From an educational point of view.)</p>
<p><strong>Disclaimer:</strong> Yes, I know that I am making enemies. Feel free to disagree with me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daniel-lemire.com/blog/archives/2010/01/11/the-end-of-mass-universities/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		<feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2010/01/11/the-end-of-mass-universities/</feedburner:origLink></item>
	</channel>
</rss>
