<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:yt="http://gdata.youtube.com/schemas/2007" version="2.0">
   <channel>
      <title>Shane's Blog</title>
      <description>Pipes Output</description>
      <link>http://pipes.yahoo.com/pipes/pipe.info?_id=HnQwlFCN3RGo3_KV1ZzWFw</link>
      <pubDate>Wed, 10 Mar 2010 02:18:12 -0800</pubDate>
      <generator>http://pipes.yahoo.com/pipes/</generator>
      <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sbutler" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="sbutler" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
         <title>Reducing Churn Through Social Network Analysis</title>
         <link>http://www.dataminingdownunder.com/2010/02/tim-manns-syddm/</link>
         <description>Earlier in the month local data miner Tim Manns presented at the Sydney Data Miners group. Tim spoke on some work he has been doing at Optus on using mobile call patterns to establish social networks and using these networks to reduce customer churn. Interestingly, there are also applications in many other areas, including data cleansing, for [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=307</guid>
         <pubDate>Tue, 23 Feb 2010 01:33:16 -0800</pubDate>
         <content:encoded><![CDATA[<p>Earlier in the month local data miner <a rel="nofollow" target="_blank" href="http://timmanns.blogspot.com/">Tim Manns</a> presented at the <a rel="nofollow" target="_blank" href="http://www.meetup.com/datarati/">Sydney Data Miners</a> group. Tim spoke on some work he has been doing at <a rel="nofollow" target="_blank" href="http://www.optus.com.au">Optus</a> on using mobile call patterns to establish social networks and using these networks to reduce customer churn. Interestingly, there are also applications in many other areas, including data cleansing: for example, where one person has purchased two mobile phones and given one to their spouse or child. Using this analysis, we can try to determine which of the two users is likely to be the actual account holder and infer details (such as age) of the other.</p>
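<p>As a rough illustration of how call records can feed a churn model, here is a minimal Python sketch. It assumes call records arrive as hypothetical (caller, callee) pairs and scores each active customer by the share of their contacts who have already churned, a common social-network churn heuristic. It is not necessarily the method Tim used at Optus, and the function names and sample data are made up for the example.</p>
<pre><code>
```python
from collections import defaultdict


def build_call_graph(calls):
    """Undirected graph: each customer maps to the set of people they called or were called by."""
    graph = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def churn_influence(graph, churned):
    """Score each still-active customer by the fraction of their contacts who have churned."""
    scores = {}
    for customer, contacts in graph.items():
        if customer in churned:
            continue
        churned_contacts = len(contacts.intersection(churned))
        scores[customer] = churned_contacts / len(contacts)
    return scores


# Hypothetical call records (caller, callee) and customers known to have churned.
calls = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
churned = {"bob", "dan"}

scores = churn_influence(build_call_graph(calls), churned)
# Highest-risk customers first; prints: eve 1.0, then cat 0.67, then ann 0.5
for customer in sorted(scores, key=scores.get, reverse=True):
    print(customer, round(scores[customer], 2))
```
</code></pre>
<p>Customers at the top of the list are the ones a retention campaign might contact first. A real system would likely also weight each link by call frequency or duration, which this sketch ignores.</p>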
<p>For a full write up of Tim&#8217;s work, check out <a rel="nofollow" target="_blank" href="http://jtonedm.com/2009/10/20/know-your-customers-by-knowing-who-they-know-paw/">James Taylor&#8217;s PAW 2009 summary</a> or head over to Tim&#8217;s <a rel="nofollow" target="_blank" href="http://timmanns.blogspot.com/">data mining blog</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>PMML Tree Model to Code Converter</title>
         <link>http://www.dataminingdownunder.com/2010/01/pmml-tree-model-to-code-converter/</link>
         <description>Lately I&amp;#8217;ve been trying to come up with a generic way to deploy models on any platform. So I&amp;#8217;d like to share some early code that takes a PMML TreeModel and converts it to R code. The intention is to get the R code generation working right, then extend to support generation for other languages. [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=301</guid>
         <pubDate>Fri, 29 Jan 2010 16:33:58 -0800</pubDate>
         <content:encoded><![CDATA[<p>Lately I&#8217;ve been trying to come up with a generic way to deploy models on any platform. So I&#8217;d like to share some early code that takes a PMML TreeModel and converts it to R code. The intention is to get the R code generation working right, then extend to support generation for other languages. Anyway, <a rel="nofollow" target="_blank" href="http://www.dataminingdownunder.com/pmmltreemodel2R.R">here it is</a> (remember &#8212; early alpha, very rough still!!). Updates to follow soon!</p>]]></content:encoded>
      </item>
      <item>
         <title>AusDM 09 &amp; Analytic Challenge</title>
         <link>http://www.dataminingdownunder.com/2009/07/ausdm09-2/</link>
         <description>The Australian Data Mining Conference (AusDM09) will be held in Melbourne in December, and Dr Phil Brierley of Tiberius Data Mining has put out the call for proposals for an analytic challenge to accompany the conference. Competitions are quite popular in data mining circles and provide a good training ground for new practitioners to get access [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=282</guid>
         <pubDate>Tue, 07 Jul 2009 06:47:52 -0700</pubDate>
         <content:encoded><![CDATA[<p>The Australian Data Mining Conference (AusDM09) will be held in Melbourne in December, and Dr Phil Brierley of <a rel="nofollow" target="_blank" href="http://www.tiberius.biz/">Tiberius Data Mining</a> has put out the call for proposals for an analytic challenge to accompany the conference. Competitions are quite <a rel="nofollow" target="_blank" href="http://www.kdnuggets.com/datasets/competitions.html">popular</a> in data mining circles and provide a good training ground for new practitioners to get access to real data and solve real problems. They also often produce surprising results, such as the team who used a <a rel="nofollow" target="_blank" href="http://www.cybaea.net/Blogs/Data/How-to-win-the-KDD-Cup-Challenge-with-R-and-gbm.html">laptop with 2GB of RAM</a> to beat IBM&#8217;s mighty clusters.</p>
<p>For businesses, this is a great opportunity to find out what is available by having others suggest new ideas and methods, or even to test your internally deployed models against the best of the best. <strong>So if you&#8217;re a business that has data, please consider getting involved!</strong> For further details, see the <a rel="nofollow" target="_blank" href="http://ausdm09.togaware.com/competition.html">competition webpage</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>PMML 4.0 Released</title>
         <link>http://www.dataminingdownunder.com/2009/06/pmml-40/</link>
         <description>The DMG has released a new version of the PMML open format for representing predictive models. The new version includes support for ensembles, new model types and more built-in functions, to name just a few of the enhancements. For a detailed summary, see the Zementis blog.</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=278</guid>
         <pubDate>Thu, 18 Jun 2009 02:50:37 -0700</pubDate>
         <content:encoded><![CDATA[<p>The <a rel="nofollow" target="_blank" href="http://www.dmg.org">DMG</a> has released a new version of the PMML open format for representing predictive models. The new version includes support for ensembles, new model types and more built-in functions, to name just a few of the enhancements. For a detailed summary, see the <a rel="nofollow" target="_blank" href="http://adapasupport.zementis.com/2009/06/pmml-40-is-here.html">Zementis blog</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>Ten Data Mining Mistakes to Avoid</title>
         <link>http://www.dataminingdownunder.com/2009/05/ten-mistakes-to-avoid/</link>
         <description>Some really good advice here from John Elder in a series of video tutorials on data mining mistakes to avoid. Tip #5, regarding contaminating the project with future data, is a good one, although sometimes it can be quite tricky (if not impossible) to &amp;#8216;rewind&amp;#8217; the data! I believe the video series is [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=258</guid>
         <pubDate>Fri, 15 May 2009 03:19:37 -0700</pubDate>
         <content:encoded><![CDATA[<p>Some really good advice here from John Elder in a <a rel="nofollow" target="_blank" href="http://www.youtube.com/view_play_list?p=79E8168EA02996A3&#038;sort_field=title">series of video tutorials on data mining mistakes to avoid</a>. Tip #5, regarding contaminating the project with future data, is a good one, although sometimes it can be quite tricky (if not impossible) to &#8216;rewind&#8217; the data! I believe the video series is part of the launch of <a rel="nofollow" target="_blank" href="http://www.elsevierdirect.com/datamining">The Handbook of Statistical Analysis and Data Mining Applications</a>. You can watch part one below or head over to YouTube for the <a rel="nofollow" target="_blank" href="http://www.youtube.com/view_play_list?p=79E8168EA02996A3&#038;sort_field=title">entire series</a>.</p>
<p><iframe class="embeddedvideo" src="http://www.youtube.com/v/Rd60vmoMMRY&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" width="532" height="323"></iframe></p>]]></content:encoded>
      </item>
      <item>
         <title>RapidMiner to get dual GUIs</title>
         <link>http://www.dataminingdownunder.com/2009/05/rapidminer-v5-gui/</link>
         <description>A forum post by Ingo Mierswa of Rapid-I indicates that the upcoming RapidMiner v5 will feature two GUIs: the existing tree-based designer and a new graph-based designer! I&amp;#8217;m quite excited about this because I&amp;#8217;ve personally found the existing UI a bit clunky. Details and screenshots are over at the user forum.</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=252</guid>
         <pubDate>Wed, 13 May 2009 09:55:44 -0700</pubDate>
         <content:encoded><![CDATA[<p>A forum post by Ingo Mierswa of Rapid-I indicates that the upcoming RapidMiner v5 will feature two GUIs: the existing tree-based designer and a new graph-based designer! I&#8217;m quite excited about this because I&#8217;ve personally found the existing UI a bit <a rel="nofollow" target="_blank" href="http://www.dataminingdownunder.com/2008/11/rapidminer-43-released/">clunky</a>. Details and screenshots are over at the <a rel="nofollow" target="_blank" href="http://rapid-i.com/rapidforum/index.php?topic=527.msg3324#msg3324">user forum</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>SAS hints at future R integration</title>
         <link>http://www.dataminingdownunder.com/2009/02/sas-hints-at-future-r-integration/</link>
         <description>In more R news, it appears SAS isn&amp;#8217;t as worried about airplane safety as originally thought, and has indicated it will include R support in an upcoming update to the SAS/IML product. For details, see NYTimes &amp;#38; Adventures in Consulting.</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=238</guid>
         <pubDate>Tue, 17 Feb 2009 03:03:23 -0800</pubDate>
         <content:encoded><![CDATA[<p>In more R news, it appears SAS isn&#8217;t as worried about <a rel="nofollow" target="_blank" href="http://blogs.sas.com/sascom/index.php?/archives/434-This-post-is-rated-R.html">airplane safety</a> as originally thought, and has indicated it will include R support in an upcoming <a rel="nofollow" target="_blank" href="http://support.sas.com/rnd/app/studio/Rinterface2.html">update</a> to the SAS/IML product. For details, see <a rel="nofollow" target="_blank" href="http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/">NYTimes</a> &amp; <a rel="nofollow" target="_blank" href="http://minequest.com/WordPress/?p=109">Adventures in Consulting</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>R in the New York Times</title>
         <link>http://www.dataminingdownunder.com/2009/01/r-project-in-nyt/</link>
         <description>The New York Times has an interesting story on the increasing use of R for data analysis within academia and industry. Several large corporates are cited as having selected R over commercial counterparts such as S and SAS.
[via Slashdot]
Update: For more R news, see also Ajay Ohri&amp;#8217;s interview with Dr Graham Williams, the author of [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=232</guid>
         <pubDate>Wed, 07 Jan 2009 18:32:17 -0800</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://www.nytimes.com">The New York Times</a> has an interesting story on the <a rel="nofollow" target="_blank" href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html">increasing use of R</a> for data analysis within academia and industry. Several large corporates are cited as having selected <a rel="nofollow" target="_blank" href="http://www.r-project.org">R</a> over commercial counterparts such as <a rel="nofollow" target="_blank" href="http://www.insightful.com/">S</a> and <a rel="nofollow" target="_blank" href="http://www.sas.com">SAS</a>.</p>
<p style="text-align:right;">[<a rel="nofollow" target="_blank" href="http://developers.slashdot.org/article.pl?sid=09/01/07/2316227">via Slashdot</a>]</p>
<p><strong>Update:</strong> For more R news, see also <a rel="nofollow" target="_blank" href="http://www.decisionstats.com/2009/01/interview-dr-graham-williams/">Ajay Ohri&#8217;s interview</a> with <a rel="nofollow" target="_blank" href="http://www.togaware.com">Dr Graham Williams</a>, the author of <a rel="nofollow" title="Rattle data mining suite for R" target="_blank" href="http://rattle.togaware.com">Rattle</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>RapidMiner 4.3 Released</title>
         <link>http://www.dataminingdownunder.com/2008/11/rapidminer-43-released/</link>
         <description>Rapid-I has released a new and improved version of the open source data mining suite RapidMiner (formerly called YALE). I&amp;#8217;ve been evaluating RapidMiner lately as a possible addition to my data mining toolbox. I&amp;#8217;ve found the biggest hurdle in learning how to use it is probably the GUI. It is a tree-based GUI which I [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=211</guid>
         <pubDate>Thu, 27 Nov 2008 20:19:56 -0800</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://rapid-i.com">Rapid-I</a> has released a new and <a rel="nofollow" target="_blank" href="http://rapid-i.com/content/view/133/1/">improved</a> version of the open source data mining suite <a rel="nofollow" target="_blank" href="http://rapidminer.com">RapidMiner</a> (formerly called YALE). I&#8217;ve been evaluating RapidMiner lately as a possible addition to my data mining toolbox. I&#8217;ve found the biggest hurdle in learning how to use it is probably the GUI. It is a tree-based GUI which I find much harder to understand than the graph-style approach used by <a rel="nofollow" target="_blank" href="http://www.spss.com/clementine/">many</a> <a rel="nofollow" target="_blank" href="http://www.sas.com/technologies/analytics/datamining/miner/">others</a>. However, RapidMiner is quite a powerful tool, and the Community Edition is free, so there is probably a lot of benefit in getting used to the strange GUI.</p>
<p>The built-in tutorial is a really good way to get a grasp of the system, and I highly recommend spending some time on it if you are interested in learning RapidMiner. The series of <a rel="nofollow" target="_blank" href="http://www.neuralmarkettrends.com/tutorials/">RapidMiner video tutorials</a> over at <a rel="nofollow" target="_blank" href="http://www.neuralmarkettrends.com/">Neural Market Trends</a> is also worth checking out.</p>
<div id="attachment_216" class="wp-caption aligncenter" style="width:211px;"><a rel="nofollow" target="_blank" href="http://rapid-i.com/images/stories/rapidi/yale/releases/4_3/01_lift.jpg"><img class="size-full wp-image-216" title="RapidMiner 4.3" src="http://www.dataminingdownunder.com/wp-content/uploads/2008/11/rmnewsml.jpg" alt="RapidMiner 4.3 includes a 3d lift chart" width="201" height="150"/></a><p class="wp-caption-text">RapidMiner 4.3 includes a 3D lift chart</p></div>]]></content:encoded>
      </item>
      <item>
         <title>SAS Forum (Australia) presentations available online</title>
         <link>http://www.dataminingdownunder.com/2008/09/sas-forum-australia-presentations/</link>
         <description>The SAS Forum (Australia) was held in Sydney back in August. I was unable to attend but luckily the presentations have been put online. Here are some that I found interesting: Make Sure Your Insight is Insightful: Analytical Marketing at NAB by Antony Ugoni (National Australia Bank)
Model Deployment and Management &amp;#8211; The ATO Story by Warwick [...]</description>
         <guid isPermaLink="false">http://www.dataminingdownunder.com/?p=3</guid>
         <pubDate>Mon, 29 Sep 2008 16:30:52 -0700</pubDate>
         <content:encoded><![CDATA[<p>The <a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/index.php?option=com_content&amp;view=article&amp;id=151&amp;Itemid=93">SAS Forum (Australia)</a> was held in Sydney back in August. I was unable to attend but luckily the presentations have been put <a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/index.php?option=com_content&amp;view=article&amp;id=151&amp;Itemid=93">online</a>. Here are some that I found interesting:</p>
<ul>
<li><a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/presentations/NAB%20-%20Antony%20Ugoni.pdf">Make Sure Your Insight is Insightful: Analytical Marketing at NAB</a> by Antony Ugoni (National Australia Bank)</li>
<li><a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/presentations/Model%20Deployment%20and%20Management%20-%20The%20ATO%20Story.pdf">Model Deployment and Management &#8211; The ATO Story</a> by Warwick Graco (Australian Taxation Office)</li>
<li><a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/presentations/Offlode%20-%20Paul%20Bracewell.pdf">Putting Cheques in Place to Identify Fraud</a> by Dr Paul Bracewell (Offlode NZ) and Flavio Palaci (Marsh Australia)</li>
<li><a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/presentations/Customer%20Value%20Creation%20Using%20Analysis.pdf">Customer Value Creation Using Analytics</a> by Arun VS (Satyam)</li>
<li><a rel="nofollow" target="_blank" href="http://www.sasforum.com/anz/presentations/SAS%20-%20Bill%20Gibson.pdf">Analysing Performance and Tuning your SAS Application</a> by Bill Gibson (SAS)</li>
</ul>]]></content:encoded>
      </item>
      <item>
         <title>Data Mining the Financial Markets</title>
         <link>http://sbutler.com/blog/2008/04/data-mining-the-financial-markets/</link>
         <description>Thomas A. Rathburn has written a series of three articles on data mining the financial markets. Rathburn takes a detailed look at the successes and failures of his efforts in the markets, and with 10-year US bonds in particular. You can read them here: part 1, part 2 and part 3. [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2008/04/data-mining-the-financial-markets/</guid>
         <pubDate>Thu, 24 Apr 2008 23:32:10 -0700</pubDate>
         <content:encoded><![CDATA[<p>Thomas A. Rathburn has written a series of three articles on data mining the financial markets. Rathburn takes a detailed look at the successes and failures of his efforts in the markets, and with 10-year US bonds in particular. You can read them here: <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/view/6386">part 1</a>, <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/view/6655">part 2</a> and <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/view/7189">part 3</a>. The articles are also available as a podcast here: <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/includes/audio/6386.mp3">1</a>, <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/includes/audio/6655.mp3">2</a>, <a rel="nofollow" target="_blank" href="http://www.b-eye-network.com/includes/audio/7189.mp3">3</a>.</p>
<p align="right">[via <a rel="nofollow" target="_blank" href="http://www.kdnuggets.com">KDnuggets</a>]</p>]]></content:encoded>
         <category>Data Mining</category>
         <enclosure length="14730638" url="http://www.b-eye-network.com/includes/audio/6386.mp3" type="audio/mpeg" />
         <enclosure length="13275487" url="http://www.b-eye-network.com/includes/audio/6655.mp3" type="audio/mpeg" />
         <enclosure length="11392788" url="http://www.b-eye-network.com/includes/audio/7189.mp3" type="audio/mpeg" />
      </item>
      <item>
         <title>Experian Bolsters Data With Hitwise Acquisition</title>
         <link>http://sbutler.com/blog/2007/05/experian-acquires-hitwise/</link>
         <description>Tim O&amp;#8217;Reilly points to the news that Experian has made a significant move to improve the quality of its online and demographic data with the acquisition of Hitwise for US$240 million. Hitwise collects user traffic data from ISPs in several countries, including Australia, and uses that information to provide companies with insight into their online [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2007/05/experian-acquires-hitwise/</guid>
         <pubDate>Fri, 04 May 2007 00:06:23 -0700</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://radar.oreilly.com/archives/2007/04/hitwise_acquire.html">Tim O&#8217;Reilly</a> points to the news that Experian has made a significant move to improve the quality of its online and demographic data with the acquisition of Hitwise for US$240 million. Hitwise collects user traffic data from ISPs in several countries, including Australia, and uses that information to provide companies with insight into their online market share. Although not mentioned in the <a rel="nofollow" target="_blank" href="http://www.hitwise.com/press-center/hitwiseHS2004/hitwise-acquisition.php">press release</a>, the Hitwise data will likely be a huge boon for Experian&#8217;s marketing services, and will probably allow it to develop more accurate geo-demographic profiles.</p>]]></content:encoded>
         <category>Business</category>
      </item>
      <item>
         <title>Winning the DARPA Grand Challenge</title>
         <link>http://sbutler.com/blog/2006/09/grand-challenge-video/</link>
         <description>Sebastian Thrun of Stanford Racing gives a great talk on what it took to build an autonomous vehicle to win the DARPA Grand Challenge. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video here.</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/09/grand-challenge-video/</guid>
         <pubDate>Sat, 16 Sep 2006 21:21:05 -0700</pubDate>
         <content:encoded><![CDATA[<p>Sebastian Thrun of <a rel="nofollow" target="_blank" href="http://www.stanfordracing.org/">Stanford Racing</a> gives a great talk on <a rel="nofollow" target="_blank" href="http://video.google.com/videoplay?docid=8594517128412883394">what it took to build an autonomous vehicle</a> to win the <a rel="nofollow" target="_blank" href="http://www.darpa.mil/grandchallenge/index.asp">DARPA Grand Challenge</a>. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video <a rel="nofollow" target="_blank" href="http://video.google.com/videoplay?docid=8594517128412883394">here</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>In-cell Graphing</title>
         <link>http://sbutler.com/blog/2006/08/in-cell-graphing/</link>
         <description>The guys from Juice Analytics have put together an interesting series on in-cell graphing (parts 1, 2, &amp;#038; 3). This is a feature due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. As an added bonus, [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/08/in-cell-graphing/</guid>
         <pubDate>Fri, 11 Aug 2006 02:30:11 -0700</pubDate>
         <content:encoded><![CDATA[<p>The guys from <a rel="nofollow" target="_blank" href="http://juiceanalytics.com/weblog/">Juice Analytics</a> have put together an interesting series on in-cell graphing (parts <a rel="nofollow" target="_blank" href="http://www.juiceanalytics.com/weblog/?p=236">1</a>, <a rel="nofollow" target="_blank" href="http://www.juiceanalytics.com/weblog/?p=239">2</a>, &#038; <a rel="nofollow" target="_blank" href="http://www.juiceanalytics.com/weblog/?p=240">3</a>). This is a feature due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. As an added bonus, I can confirm it works in <a rel="nofollow" target="_blank" href="http://openoffice.org">OpenOffice.org</a>, <a rel="nofollow" target="_blank" href="http://www.gnome.org/projects/gnumeric/">Gnumeric</a> and even <a rel="nofollow" target="_blank" href="http://spreadsheets.google.com">Google Spreadsheets</a> (all to varying degrees).</p>]]></content:encoded>
      </item>
      <item>
         <title>Article: HCF gets a helping hand from predictive analytics</title>
         <link>http://sbutler.com/blog/2006/06/article-hcf-gets-a-helping-hand-from-predictive-analytics/</link>
         <description>From the ComputerWorld article:
Private health insurer HCF has implemented a predictive analytics suite to help weed out fraudulent claims, target individual members and streamline the monotonous labour of data analysis.</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/06/article-hcf-gets-a-helping-hand-from-predictive-analytics/</guid>
         <pubDate>Mon, 12 Jun 2006 20:43:59 -0700</pubDate>
         <content:encoded><![CDATA[<p>From the <a rel="nofollow" target="_blank" href="http://www.computerworld.com.au/index.php/id;378252446;fp;16;fpid;0">ComputerWorld article</a>:</p>
<blockquote><p>Private health insurer HCF has implemented a predictive analytics suite to help weed out fraudulent claims, target individual members and streamline the monotonous labour of data analysis.</p></blockquote>]]></content:encoded>
         <category>Data Mining</category>
      </item>
      <item>
         <title>Data Mining with Oracle</title>
         <link>http://sbutler.com/blog/2006/05/oracle-data-mining/</link>
         <description>If you are interested in data mining and haven&amp;#8217;t already seen the Oracle Data Mining and Analytics blog, it is worth checking out. It has some great how-tos, including time series forecasting (parts 1, 2, 3) and real-time scoring &amp;#038; model management (parts 1, 2, 3).</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/oracle-data-mining/</guid>
         <pubDate>Tue, 30 May 2006 04:45:32 -0700</pubDate>
         <content:encoded><![CDATA[<p>If you are interested in data mining and haven&#8217;t already seen the <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/">Oracle Data Mining and Analytics blog</a>, it is worth checking out. It has some great how-tos, including time series forecasting (parts <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/01/time-series-forecasting-part-1_23.html">1</a>, <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/03/time-series-forecasting-2-single-step.html">2</a>, <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/05/time-series-forecasting-3-multi-step.html">3</a>) and real-time scoring &#038; model management (parts <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/02/real-time-scoring-model-management-1.html">1</a>, <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/02/real-time-scoring-model-management-2.html">2</a>, <a rel="nofollow" target="_blank" href="http://oracledmt.blogspot.com/2006/02/real-time-scoring-model-management-3.html">3</a>).</p>]]></content:encoded>
         <category>Data Mining</category>
      </item>
      <item>
         <title>Smart SPAM &amp; Fighting it</title>
         <link>http://sbutler.com/blog/2006/05/smart-spam/</link>
         <description>For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). To trick the filter, spammers must make their messages more HAM-like. The way to beat this is by giving [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/smart-spam/</guid>
         <pubDate>Fri, 12 May 2006 19:26:28 -0700</pubDate>
         <content:encoded><![CDATA[<p>For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). To trick the filter, spammers must make their messages more HAM-like. The way to beat this is by giving your email classifier as much training data as possible and continually updating it. Learning only from your company&#8217;s emails is probably not fool-proof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, such as <a rel="nofollow" target="_blank" href="http://mail.google.com">Gmail</a> and <a rel="nofollow" target="_blank" href="https://www.google.com/hosted">its hosted version</a>, should never have this problem because the filter learns from thousands of users&#8217; SPAM folders.</p>
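<p>To make the role of training data concrete, here is a minimal naive Bayes sketch in Python. It is an illustrative assumption, not how Gmail or any particular product works: the class name and sample messages are hypothetical, tokenisation is a plain whitespace split, and add-one (Laplace) smoothing keeps unseen words from zeroing a score. Because <code>train()</code> only adds to word counts, newly flagged SPAM can be fed in at any time, which is what continually updating the filter amounts to here.</p>
<pre><code>
```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes classifier for SPAM vs HAM with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Incremental update: each newly labelled email just adds to the counts.
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior for the class.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Add the smoothed log likelihood of each word; unseen words get a small non-zero probability.
        for word in words:
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


# Hypothetical training emails.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")

print(spam_filter.classify("buy cheap pills"))      # spam
print(spam_filter.classify("team meeting monday"))  # ham
```
</code></pre>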
<p>Researchers from the University of Calgary <a rel="nofollow" target="_blank" href="http://pharos.cpsc.ucalgary.ca/Dienst/UI/2.0/Describe/ncstrl.ucalgary_cs/2006-808-01">claim</a> that the next evolution of SPAM will be smart SPAM, which will infiltrate your computer via spyware/viruses and <a rel="nofollow" target="_blank" href="http://arstechnica.com/news.ars/post/20060502-6726.html">&#8216;mine&#8217; your emails</a>. By creating emails based on messages you&#8217;ve actually sent, the spammers hope they will be more believable to readers.</p>
<p>I would argue, however, that such a situation would merely make services like Gmail more attractive: firstly, because they have a truly massive body of knowledge to use to fine-tune their spam filters, and secondly, because it is unlikely such spyware could infiltrate a web-based system. Even if a program was distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.</p>]]></content:encoded>
      </item>
      <item>
         <title>Data Mining Cup 2006</title>
         <link>http://sbutler.com/blog/2006/05/data-mining-cup-2006/</link>
         <description>The Data Mining Cup (DMC2006) has launched. This year the competition focuses on eBay auctions. The task is to predict, for each new auction, whether the actual sales revenue will be higher than the average sales revenue of the product category.</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/data-mining-cup-2006/</guid>
         <pubDate>Thu, 04 May 2006 20:02:11 -0700</pubDate>
         <content:encoded><![CDATA[<p>The <a rel="nofollow" target="_blank" href="http://www.data-mining-cup.com/2006/Wettbewerb/Aufgabe/1146583837/">Data Mining Cup</a> (DMC2006) has launched. This year the competition focuses on eBay auctions. The task is to predict, for each new auction, whether the actual sales revenue will be higher than the average sales revenue of the product category.</p>]]></content:encoded>
      </item>
      <item>
         <title>DARPA Grand Challenge</title>
         <link>http://sbutler.com/blog/2006/05/darpa-urban-challenge/</link>
         <description>Start your engines, the DARPA Grand Challenge is on again, only this time it&amp;#8217;s an urban challenge! In the last two competitions, autonomous vehicles raced through the desert, with the 2005 winner, Stanford, taking home a US$2 million prize. Stanford&amp;#8217;s software in action: Input from GPS and many sensors [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/darpa-urban-challenge/</guid>
         <pubDate>Wed, 03 May 2006 19:18:23 -0700</pubDate>
         <content:encoded><![CDATA[<p><em>Start your engines</em>, the <a rel="nofollow" target="_blank" href="http://www.darpa.mil/grandchallenge">DARPA Grand Challenge</a> is on again, only this time it&#8217;s an urban challenge! In the last two competitions, autonomous vehicles raced through the desert, with the 2005 winner, <a rel="nofollow" target="_blank" href="http://www-cs.stanford.edu/group/roadrunner/">Stanford</a>, taking home a US$2 million prize.</p>
<p><a rel="nofollow" title="stanford1.png" class="imagelink" target="_blank" href="http://sbutler.com/blog/wp-content/uploads/stanford1.png"><img alt="stanford1.png" id="image132" src="http://sbutler.com/blog/wp-content/uploads/stanford1.thumbnail.png"/></a> <a rel="nofollow" title="stanford2.png" class="imagelink" target="_blank" href="http://sbutler.com/blog/wp-content/uploads/stanford2.png"><img alt="stanford2.png" id="image133" src="http://sbutler.com/blog/wp-content/uploads/stanford2.thumbnail.png"/></a><br clear="all"/><strong> Stanford&#8217;s software in action:</strong> Input from GPS and many sensors feeds the algorithms that determine a safe path (see the <a rel="nofollow" target="_blank" href="http://www.darpa.mil/grandchallenge05/TechPapers/Stanford.pdf">tech report</a>).</p>]]></content:encoded>
      </item>
      <item>
         <title>Using Gmail for Backups</title>
         <link>http://sbutler.com/blog/2006/05/gmail-for-backups/</link>
         <description>While writing a thesis, it is obviously imperative to have foolproof backups in place. So why not back up to that free 2.7GB Gmail account? Here&amp;#8217;s what you have to do: Install &amp;#8220;email&amp;#8221; (Gentoo users: emerge net-mail/email)
Edit /etc/email/email.conf (Gentoo users: as a minimum you must set REPLY_TO)
Test the commands. They are:
cd /path/to/your/thesis/
tar -czf /tmp/thesis.tar.gz *.*
email --blank-mail --smtp-server [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/gmail-for-backups/</guid>
         <pubDate>Tue, 02 May 2006 16:55:56 -0700</pubDate>
         <content:encoded><![CDATA[<p>While writing a thesis, it is obviously imperative to have foolproof backups in place. So why not back up to that free 2.7GB <a rel="nofollow" target="_blank" href="http://gmail.com">Gmail</a> account? Here&#8217;s what you have to do:</p>
<ol>
<li>Install &#8220;<em><a rel="nofollow" title="Command line email client called " target="_blank" href="http://email.cleancode.org/">email</a></em>&#8221; (Gentoo users: <code>emerge net-mail/email</code>)</li>
<li>Edit <code>/etc/email/email.conf</code> (Gentoo users: as a minimum you must set <code>REPLY_TO</code>)</li>
<li>Test the commands. They are:<br />
<code>cd /path/to/your/thesis/<br />
tar -czf /tmp/thesis.tar.gz *.*<br />
email --blank-mail --smtp-server <strong>mail.yourserver.com</strong> --from-name <strong>"your name"</strong> --from-addr <strong>you@youremail.com</strong> --subject "Cron: Thesis Backup (`date`)" <strong>you@gmail.com</strong> --attach /tmp/thesis.tar.gz &gt; /dev/null 2&gt;&#038;1<br />
rm -f /tmp/thesis.tar.gz<br />
</code></li>
<li>Now add this as a <a rel="nofollow" target="_blank" href="http://www.adminschoice.com/docs/crontab.htm#Crontab%20file"><code>/etc/crontab</code> entry</a>. This example sends the backup at 7am each day.<br />
<code>0 7 * * * <strong>unixusername</strong> cd /path/to/your/thesis/ &#038;&#038; tar -czf /tmp/thesis.tar.gz *.* &#038;&#038; email --blank-mail --smtp-server <strong>mail.yourserver.com</strong> --from-name <strong>"your name"</strong> --from-addr <strong>you@youremail.com</strong> --subject "Cron: Thesis Backup (`date`)" <strong>you@gmail.com</strong> --attach /tmp/thesis.tar.gz &gt; /dev/null 2&gt;&#038;1; rm -f /tmp/thesis.tar.gz<br />
</code></li>
<li>The final step is to <a rel="nofollow" target="_blank" href="https://mail.google.com/support/bin/answer.py?answer=6579&#038;topic=1539">create a Gmail filter</a>! It would be nice to stop these emails from being downloaded via POP, but I think that <a rel="nofollow" target="_blank" href="https://mail.google.com/support/bin/answer.py?answer=13291&#038;topic=1555">may require a filter that moves the incoming backup emails to Trash</a>.</li>
</ol>
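<p>If you would rather keep the crontab entry short, you can wrap the same steps in a small script. The sketch below is not the setup from this post. It assumes the same <em>email</em> client and the same flags used above. The script name, the <code>thesis_backup</code> function and the <code>SMTP_SERVER</code>, <code>FROM_NAME</code> and <code>FROM_ADDR</code> variables are hypothetical placeholders. It also changes two things. First, <code>tar -C</code> archives the whole directory, because <code>*.*</code> skips files without an extension, such as a <code>Makefile</code>. Second, a private <code>mktemp -d</code> directory replaces the fixed <code>/tmp/thesis.tar.gz</code>, so overlapping runs cannot clobber each other's archive.</p>

```shell
#!/bin/sh
# thesis-backup.sh: a hypothetical wrapper sketch, not the original post's script.
# Assumes the "email" client from step 1 is installed and configured.
# SMTP_SERVER, FROM_NAME and FROM_ADDR are placeholders mirroring the bold
# values in the commands above; override them in the environment as needed.
SMTP_SERVER="${SMTP_SERVER:-mail.yourserver.com}"
FROM_NAME="${FROM_NAME:-your name}"
FROM_ADDR="${FROM_ADDR:-you@youremail.com}"

# thesis_backup DIR RECIPIENT
# Archives everything under DIR and mails it to RECIPIENT as an attachment.
# Returns non-zero if archiving or sending fails.
thesis_backup() {
    # "local" is not strict POSIX but is supported by dash and bash.
    local dir to workdir archive status
    dir=$1
    to=$2
    # Private temporary directory: overlapping runs cannot clobber each
    # other's archive, and the attachment is still named thesis.tar.gz.
    workdir=$(mktemp -d) || return 1
    archive="$workdir/thesis.tar.gz"
    # -C enters DIR before archiving, so a missing directory makes tar fail
    # (and nothing is sent) instead of archiving the wrong place.
    if ! tar -czf "$archive" -C "$dir" .; then
        rm -rf "$workdir"
        return 1
    fi
    # Same flags and argument order as the manual command above.
    email --blank-mail --smtp-server "$SMTP_SERVER" \
        --from-name "$FROM_NAME" --from-addr "$FROM_ADDR" \
        --subject "Cron: Thesis Backup ($(date))" \
        "$to" --attach "$archive"
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# When run directly (e.g. from /etc/crontab), take DIR and RECIPIENT as arguments.
if [ "$#" -eq 2 ]; then
    thesis_backup "$1" "$2"
fi
```

<p>If you save it as, say, <code>/usr/local/bin/thesis-backup.sh</code> (a hypothetical path) and make it executable, the crontab entry becomes <code>0 7 * * * unixusername /usr/local/bin/thesis-backup.sh /path/to/your/thesis you@gmail.com</code>.</p>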
<p>Obviously you don&#8217;t have to use this for backing up a thesis; it could easily be modified to back up whatever you want.<br />
Note: I can&#8217;t see any mention of TLS support in the <em>email</em> client, which is why I&#8217;ve suggested using your own SMTP server rather than Google&#8217;s.</p>]]></content:encoded>
         <category>Geek</category>
      </item>
      <item>
         <title>Visualising Digg</title>
         <link>http://sbutler.com/blog/2006/05/digg-graph/</link>
         <description>Digg, The Blog has info on a nice visualisation of activity on digg.com. Kevin mentions that the zip-line effect in the videos is probably caused by bots. Pretty cool!</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/05/visualising-digg/</guid>
         <pubDate>Tue, 02 May 2006 08:46:09 -0700</pubDate>
         <content:encoded><![CDATA[<p><em>Digg, The Blog</em> has info on a nice <a rel="nofollow" target="_blank" href="http://diggtheblog.blogspot.com/2006/05/visualizing-digg-data.html">visualisation of activity on digg.com</a>. Kevin mentions that the zip-line effect in the videos is probably caused by bots. Pretty cool!</p>]]></content:encoded>
         <category>Geek</category>
      </item>
      <item>
         <title>Google Scholar</title>
         <link>http://sbutler.com/blog/2006/04/google-scholar/</link>
         <description>As a long-time CiteSeer user, I must say Google Scholar is a real breath of fresh air. It is yet another academic search interface, although this time it&amp;#8217;s done right (unlike Rexa, which is waaaay too inaccurate). It&amp;#8217;s a great interface and you can actually find whatever you&amp;#8217;re looking for. It&amp;#8217;s quite amazing!
Library [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/04/google-scholar/</guid>
         <pubDate>Sun, 30 Apr 2006 05:48:49 -0700</pubDate>
         <content:encoded><![CDATA[<p>As a long-time CiteSeer user, I must say <a rel="nofollow" target="_blank" href="http://scholar.google.com">Google Scholar</a> is a real breath of fresh air. It is yet another academic search interface, although this time it&#8217;s done right (unlike <a rel="nofollow" target="_blank" href="http://rexa.info/">Rexa</a>, which is waaaay too inaccurate). It&#8217;s a great interface and you can actually find whatever you&#8217;re looking for. It&#8217;s quite amazing!</p>
<p>Library support is on the way, too. At the moment, though, I could only find the National Library of Australia and <a rel="nofollow" target="_blank" href="http://www.deakin.edu.au">Deakin University</a>, but the level of integration is very promising.</p>
<p><strong>Tip</strong>: Maintaining your BibTeX reference database can be a pain sometimes. When using Google Scholar, make sure you enable BibTeX export in the preferences; it will save you heaps of time. Even when you already have the PDF, it is easy to do a quick search, click &#8220;Import into BibTeX&#8221;, then copy and paste the entry into your .bib file. There is a slight bug: the entry uses the field &#8220;authors&#8221; instead of &#8220;author&#8221;, but that is easy to fix on the fly.</p>]]></content:encoded>
         <category>Uni</category>
      </item>
      <item>
         <title>LaTeX Presentations</title>
         <link>http://sbutler.com/blog/2006/04/latex-presentations/</link>
         <description>Now you can design presentation templates in Inkscape and use them as a LaTeX style. From the LaTeX Presentation Designer website:
The package contains a documentclass called &amp;#8220;presentation&amp;#8221; which takes as an argument a slide style. The package also provides a simple python program that can interpret SVG files generated by Inkscape and build a slide [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/04/latex-presentations/</guid>
         <pubDate>Mon, 17 Apr 2006 19:18:05 -0700</pubDate>
         <content:encoded><![CDATA[<p>Now you can design presentation templates in Inkscape and use them as a LaTeX style. From the <a rel="nofollow" target="_blank" href="http://jedidiah.stuff.gen.nz/lpd.html">LaTeX Presentation Designer</a> website:</p>
<blockquote><p>The package contains a documentclass called &#8220;presentation&#8221; which takes as an argument a slide style. The package also provides a simple python program that can interpret SVG files generated by Inkscape and build a slide style, usable by the &#8220;presentation&#8221; document class, directly from it. This means that creating new custom slide designs is as simple as drawing what you want your slides to look like in Inkscape.</p></blockquote>
<p align="center"><a rel="nofollow" class="imagelink" title="LaTeX presentation designer screenshot" target="_blank" href="http://jedidiah.stuff.gen.nz/lpd.html"><img alt="lpd-gradient-screenshot.png" id="image116" src="http://sbutler.com/blog/wp-content/uploads/lpd-gradient-screenshot.thumbnail.png"/> <img id="image114" alt="LaTeX presentation designer screenshot" src="http://sbutler.com/blog/wp-content/uploads/lpd-marble-screenshot.thumbnail.png"/> <img id="image115" alt="LaTeX presentation designer screenshot" src="http://sbutler.com/blog/wp-content/uploads/lpd-forest-screenshot.thumbnail.png"/></a></p>
<p>LaTeX-based presentations are handy when you want nice-looking equations on your slides. Other good alternatives are <a rel="nofollow" target="_blank" href="http://latex-beamer.sourceforge.net/">Beamer</a> and <a rel="nofollow" target="_blank" href="http://amath.colorado.edu/documentation/LaTeX/prosper/">Prosper</a>.</p>]]></content:encoded>
         <category>General</category>
      </item>
      <item>
         <title>Getting to know R Graphs</title>
         <link>http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/</link>
         <description>Check out the R Graph Gallery, which includes not only detailed descriptions of graphs you can produce in R but also the R source code! Props to Martin for the link.</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/</guid>
         <pubDate>Thu, 06 Apr 2006 18:09:05 -0700</pubDate>
         <content:encoded><![CDATA[<p>Check out the <a rel="nofollow" target="_blank" href="http://addictedtor.free.fr/graphiques/">R Graph Gallery</a>, which includes not only detailed descriptions of graphs you can produce in <a rel="nofollow" target="_blank" href="http://www.r-project.org/">R</a> but also the R source code! Props to <a rel="nofollow" target="_blank" href="http://statgraphics.blog.com/644674/">Martin</a> for the link.</p>]]></content:encoded>
      </item>
      <item>
         <title>What’s in a name?</title>
         <link>http://sbutler.com/blog/2006/04/whats-in-a-name/</link>
         <description>Dennis Forbes gives a fantastic analysis of one of the biggest databases on the Internet: the DNS records. His analysis includes insights into domain name length, personal and family name usage and other characteristics. For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part [...]</description>
         <guid isPermaLink="false">http://sbutler.com/blog/2006/04/whats-in-a-name/</guid>
         <pubDate>Wed, 05 Apr 2006 02:19:27 -0700</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://www.yafla.com/dforbes/">Dennis Forbes</a> gives a fantastic analysis of one of the biggest databases on the Internet: the <a rel="nofollow" target="_blank" href="http://www.yafla.com/dforbes/2006/03/29.html">DNS records</a>. His analysis includes insights into domain name length, personal and family name usage and other characteristics. For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part, so keep a lookout for that too.</p>]]></content:encoded>
         <category>Geek</category>
      </item>
   </channel>
</rss>
