<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Data Mining Research</title>
	
	<link>http://www.dataminingblog.com</link>
	<description>Data mining crossroads - research, applications, news, list of blogs and customized search engine about data mining.</description>
	<lastBuildDate>Tue, 14 May 2013 18:05:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dataminingblog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="dataminingblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">dataminingblog</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>How To Gain A Competitive Advantage With Big Data</title>
		<link>http://www.dataminingblog.com/how-to-gain-a-competitive-advantage-with-big-data/</link>
		<comments>http://www.dataminingblog.com/how-to-gain-a-competitive-advantage-with-big-data/#comments</comments>
		<pubDate>Tue, 14 May 2013 18:05:39 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1918</guid>
		<description><![CDATA[Today&#8217;s post is a guest post from Dorian Travers, Marketing Analyst. He discusses ways to get competitive advantage using meaningful data. I thank him for his contribution.
How To Gain A Competitive Advantage With Big Data
The chief objective for both small and large companies is growth.  Business growth creates more jobs, more revenue, and keeps the [...]]]></description>
			<content:encoded><![CDATA[<p><em>Today&#8217;s post is a guest post from Dorian Travers, Marketing Analyst. He discusses ways to get competitive advantage using meaningful data. I thank him for his contribution.</em></p>
<p><strong>How To Gain A Competitive Advantage With Big Data</strong></p>
<p>The chief objective for both small and large companies is growth. Business growth creates more jobs and more revenue, and keeps the corporation healthy and self-sustaining. Achieving growth year over year can be something of a conundrum, though. In some industries, complex data streams pour into unrelated departments, yet combining those pieces of data can be crucial to building a feasible growth trajectory. Getting a handle on all your data can help you interpret sales patterns, customer behavior, product lifecycles, and more. Once you understand your data better, you can apply the appropriate b2b pricing strategy to create steady growth.</p>
<p><strong>Discovering Pearls</strong></p>
<p>Service and parts organizations may have transactions with thousands of vendors and customers in a given month.  These transactions produce data that can be buried within the organization’s infrastructure, and as the company moves forward each day it may be difficult to stop and organize the data in a meaningful way.  There are big data solutions that are available to help companies unlock these pearls so proper analysis and planning for future growth can occur.  Finding an application that marries data housed within departments like accounting and operations can improve sales and create more revenue.</p>
<p><strong>What Data Is Meaningful?</strong></p>
<p>For service companies there is usually lots of data available for analysis.  It can be housed within the same department or completely separate from one department to the next.  The data that is most meaningful displays how sales prices were arrived at and how the sales and marketing teams achieved those sales.</p>
<ul>
<li><strong>Operations Management data. </strong>Data from operations management teams may provide insight regarding products—how much it costs to purchase raw materials, the volume at which raw materials are consumed, the shelf life of all raw materials and finished goods.</li>
<li><strong>Sales and Marketing data.</strong> The sales and marketing team may track sales patterns and the buying behavior of its customers.  Access to that data can provide the company with valuable information to create a stronger sales strategy and product price optimization.  Sales and marketing personnel can also provide insight about the customers they sell products and services to.</li>
<li><strong>Accounting and Finance data. </strong>The accounting and finance departments use data to invoice customers and collect revenue.  In order to do so, however, they also need the date a sale was closed, the price of the goods or services being sold, the quantity sold, and any rebate or discounts being offered.</li>
</ul>
<p style="text-align: center;"><a href="http://www.dataminingblog.com/wp-content/uploads/GPim2.png"><img class="aligncenter size-full wp-image-1920" title="GPim2" src="http://www.dataminingblog.com/wp-content/uploads/GPim2.png" alt="GPim2" width="448" height="167" /></a></p>
<p><strong>Making Meaning from the Data</strong></p>
<p>Data mining provides corporations with a multitude of insights that might otherwise remain buried, inaccessible, and meaningless. Combining information that can be shared across multiple departments allows the corporation to increase its ability to reach more customers and its potential for more sales. As sales and revenue increase, the business can grow, create more jobs, and stay self-sustaining and solvent. Large corporations in particular, those with multiple departments working on related tasks without the ability to share data in meaningful ways, can gain a competitive advantage with big data interpretation.</p>
<p>For more information, you can visit <a href="http://www.pros.com">pros.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/how-to-gain-a-competitive-advantage-with-big-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Can we automate data mining?</title>
		<link>http://www.dataminingblog.com/can-we-automate-data-mining/</link>
		<comments>http://www.dataminingblog.com/can-we-automate-data-mining/#comments</comments>
		<pubDate>Sun, 14 Apr 2013 18:07:01 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1862</guid>
		<description><![CDATA[That&#8217;s a big question! Back in 2006, we started the discussion on Data Mining Research, with the post about the book Java Data Mining. We were fortunate to get opinions from experts and one of the book&#8217;s authors. In 2010, we continued the discussion about specific aspects of data mining which could be automated.
Recently, I [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/AutoDM.png"><img class="alignright size-medium wp-image-1863" title="AutoDM" src="http://www.dataminingblog.com/wp-content/uploads/AutoDM-300x219.png" alt="AutoDM" width="300" height="219" /></a>That&#8217;s a big question! Back in 2006, we started the discussion on Data Mining Research, with the <a href="http://www.dataminingblog.com/garbage-in-garbage-out/">post</a> about the book <em>Java Data Mining</em>. We were fortunate to get opinions from experts and one of the book&#8217;s authors. In 2010, we continued the <a href="http://www.dataminingblog.com/automating-analytics/">discussion</a> about specific aspects of data mining which could be automated.</p>
<p>Recently, I re-launched the <a href="http://www.linkedin.com/groups/Is-it-possible-automate-data-4586163.S.208566696?qid=2f769702-0cb5-4fbb-88c4-86c74c84950c&amp;trk=group_most_recent_rich-0-b-ttl&amp;goback=.gmr_4586163">debate</a> on the <a href="http://www.swiss-analytics.ch">Swiss Association for Analytics</a>. However, I think it is worth a dedicated blog post. To answer this big question, we need to analyze the different phases of data mining and estimate which ones can be automated. For this purpose, I have chosen the <a href="ftp://ftp.software.ibm.com/software/analytics/spss/support/Modeler/Documentation/14/UserManual/CRISP-DM.pdf">CRISP-DM</a> methodology (I guess any other data mining process would lead to similar conclusions).</p>
<p><strong>Business understanding</strong></p>
<p>In this critical step, we transform a business problem into a data  mining one. We need to understand what should be solved and why. Answers  will lead to the following steps. It is clear that this step cannot be  automated for a new project. The data miner has to interact with experts  to define the data mining problem to solve.</p>
<p><strong>Data understanding</strong></p>
<p>This step consists of understanding the data, the way they have been collected, their particularities, etc. Again, the data miner works in collaboration with field experts to derive knowledge useful for preparing the data (next step). This is a manual task that cannot be automated.</p>
<p><strong>Data preparation</strong></p>
<p>In this step, we transform raw data into meaningful information to  mine. An example is outlier detection (and removal). Some companies  argue that their tools can automate this step. This is true to a certain  extent, but there are limitations. Here is a simple example: what is  the threshold for the variable &#8220;age&#8221; to be an outlier? 100, 110, 150  years old? This is problem dependent. The same issue happens for missing  values. Detecting them is often straightforward, but deciding on the  action to take needs manual intervention.</p>
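<p>To make the point concrete, here is a minimal Python sketch. The function name <em>flag_age_outliers</em>, the parameter <em>max_age</em> and the sample ages are made up for illustration. Detection itself is a few lines of code, but the threshold has to be supplied by someone who knows the problem.</p>
<pre>
```python
def flag_age_outliers(ages, max_age):
    """Label each age as 'ok', 'outlier' or 'missing'.

    max_age is the problem-dependent threshold: it must come from the
    data miner, the code cannot guess it.
    """
    labels = []
    for age in ages:
        if age is None:
            # Detecting a missing value is easy; deciding what to do
            # with it (drop, impute, keep) is left to the data miner.
            labels.append("missing")
        elif age > max_age or 0 > age:
            # Above the chosen threshold, or negative.
            labels.append("outlier")
        else:
            labels.append("ok")
    return labels


ages = [34, 105, None, 151, 67]  # hypothetical sample
print(flag_age_outliers(ages, max_age=100))
# ['ok', 'outlier', 'missing', 'outlier', 'ok']
print(flag_age_outliers(ages, max_age=150))
# ['ok', 'ok', 'missing', 'outlier', 'ok']
```
</pre>
<p>The same record (age 105) is an outlier under one threshold and a valid value under the other, which is exactly the decision a tool cannot take alone.</p>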
<p>Another important aspect of data preparation is feature selection and  extraction. While selection can be automated, extraction (through  aggregation) needs understanding of the data. Finally, any data mining  tool can automate the target variable detection. However, the final  choice is left to the data miner, who knows the business problem to  solve.</p>
<p><strong>Modeling</strong></p>
<p>This step is where we apply modeling algorithms to processed data. Among others, it involves selecting a data mining algorithm and tuning its parameters. This is certainly the task that can be the most easily automated. Some vendors claim that their tools can automate the model building process. The concept of testing several algorithms with different sets of parameters (tuning) can be automated to a certain extent. However, it assumes that there are enough data, that the choice of the algorithm is not business dependent (which is usually not the case) and that the evaluation criterion is known (see below).</p>
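<p>As an illustration of what "testing several algorithms with different sets of parameters" can look like, here is a small sketch in plain Python. The two toy forecasting models, the parameter grids and the series are assumptions chosen for brevity, not what any vendor tool does. Note that the evaluation criterion is passed in from outside: the loop is automated, the choice of criterion is not.</p>
<pre>
```python
def moving_average(history, window):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def exp_smoothing(history, alpha):
    """Forecast the next value by simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(series, n_test, candidates, criterion):
    """Score every (algorithm, parameter value) pair on one-step-ahead
    forecasts of the last n_test points; return results, best first."""
    results = []
    start = len(series) - n_test
    for name, (model, param_name, values) in candidates.items():
        for value in values:
            # Each forecast only sees the data before the point it predicts.
            preds = [model(series[:i], **{param_name: value})
                     for i in range(start, len(series))]
            score = criterion(series[start:], preds)
            results.append((score, name, value))
    return sorted(results)


# Hypothetical series and parameter grids.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
candidates = {
    "moving_average": (moving_average, "window", [1, 2, 3]),
    "exp_smoothing": (exp_smoothing, "alpha", [0.2, 0.5, 0.8]),
}
results = grid_search(series, n_test=4, candidates=candidates,
                      criterion=mean_absolute_error)
for score, name, value in results:
    print(f"{name}({value}): MAE = {score:.3f}")
```
</pre>
<p>With only ten points, the ranking is fragile, which is the "enough data" caveat in practice.</p>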
<p><a href="http://www.dataminingblog.com/wp-content/uploads/crisp3.jpg"><img class="aligncenter size-medium wp-image-1914" title="crisp3" src="http://www.dataminingblog.com/wp-content/uploads/crisp3-300x253.jpg" alt="crisp3" width="300" height="253" /></a></p>
<p style="text-align: center;"><em>Cross Industry Standard Process for Data Mining (CRISP-DM)</em></p>
<p><strong>Evaluation</strong></p>
<p>In order to validate our data mining results, we need evaluation criteria. Although applying a criterion can be automated and different modeling algorithms can be compared, the choice of the criterion may be business dependent. In the case of forecasting, for example, different evaluation criteria exist such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE). If we compare different forecasting algorithms on the same time series, we can use RMSE. If the goal is to compare different time series, MASE is more appropriate. This is business dependent and thus difficult to automate.</p>
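<p>For readers who want to see the three criteria side by side, here is a short Python sketch. It assumes the usual definition of MASE, where the MAE of the forecast is divided by the in-sample MAE of the naive forecast (each value predicted by the previous one). The numbers are invented.</p>
<pre>
```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, training):
    """Mean Absolute Scaled Error: MAE divided by the in-sample MAE of
    the naive forecast (each training value predicted by the previous one)."""
    naive_errors = [abs(b - a) for a, b in zip(training, training[1:])]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, predicted) / scale


# Hypothetical series, and the same series expressed in units 1000 times smaller.
training = [100, 110, 105, 115]
actual = [120, 118]
predicted = [116, 121]

for factor in (1, 1000):
    t = [x * factor for x in training]
    a = [x * factor for x in actual]
    p = [x * factor for x in predicted]
    print(f"scale x{factor}: RMSE = {rmse(a, p):.2f}, "
          f"MAE = {mae(a, p):.2f}, MASE = {mase(a, p, t):.2f}")
```
</pre>
<p>RMSE and MAE grow with the scale of the series, while MASE stays essentially the same. This is why MASE is the more natural choice when comparing different time series.</p>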
<p><strong>Deployment</strong></p>
<p>In this phase, the goal is to transform our proof of concept or prototype into an industrialized solution. This step involves transforming our &#8220;one shot&#8221; project into a solution that can work with as few manual interventions as possible. Although standards such as Predictive Model Markup Language (<a href="http://www.dmg.org/v4-1/GeneralStructure.html">PMML</a>) are appearing, this step still requires manual intervention. Questions such as where and how to integrate our data mining process within an overall solution/tool need to be explored.</p>
<p>In conclusion, we have seen that most data mining steps from the CRISP-DM methodology cannot be automated and need manual intervention. Data preparation and modeling, to a certain extent, could be automated. However, as data mining professionals know, most of the effort in a data mining project concerns business and data understanding. Here is an excellent metaphor from Berry and Linoff (re-explained by David S. Coppock):</p>
<p>&#8220;<em>The camera can relieve the photographer from having to set the  shutter speed, aperture and other settings every time a picture is  taken. This makes the process easier for expert photographers and makes  better photography accessible to people who are not experts. But this is  still automating only a small part of the process of producing a  photograph. Choosing the subject, perspective and lighting, getting to  the right place at the right time, printing and mounting, and many other  aspects are all important in producing a good photograph.</em>&#8221;</p>
<p>What about you? Do you think we can automate data mining?</p>
<p>For your information, here are other posts related to this topic:</p>
<p><a href="http://www.allanalytics.com/author.asp?section_id=1408&amp;doc_id=251420">Automation Will Change Data Science Beyond Recognition</a><br />
<a href="http://www.allanalytics.com/author.asp?section_id=1408&amp;doc_id=251426">Data Scientists Will Not Be Replaced by Automation</a><br />
<a href="http://smartdatacollective.com/node/111596">Data Scientist Scarcity: Automation Is the Answer</a><br />
<a href="http://www.information-management.com/news/4584-1.html">Data Mining Automation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/can-we-automate-data-mining/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Data Mining Book Review: Thinking Statistically</title>
		<link>http://www.dataminingblog.com/data-mining-book-review-thinking-statistically/</link>
		<comments>http://www.dataminingblog.com/data-mining-book-review-thinking-statistically/#comments</comments>
		<pubDate>Sat, 06 Apr 2013 19:05:05 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1856</guid>
		<description><![CDATA[A short review for a short book. Thinking Statistically, by Uri Bram,  has less than 50 pages. But short in length doesn&#8217;t mean short in  relevance or quality. Bram has written a concise and comprehensible  book about key statistical concepts: selection bias, endogeneity and  Bayes. This book is a must have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/thinkstat.jpg"><img class="alignleft size-full wp-image-1857" title="thinkstat" src="http://www.dataminingblog.com/wp-content/uploads/thinkstat.jpg" alt="thinkstat" width="100" height="152" /></a>A short review for a short book. Thinking Statistically, by Uri Bram,  has less than 50 pages. But short in length doesn&#8217;t mean short in  relevance or quality. Bram has written a concise and comprehensible  book about key statistical concepts: selection bias, endogeneity and  Bayes. This book is a must have as a repetition tool for both data miners  and people in any scientific field.</p>
<p><a href="http://www.amazon.com/gp/product/1481173502/ref=as_li_tf_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1481173502&amp;linkCode=as2&amp;tag=dataminirese-20">Thinking Statistically</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.com/e/ir?t=dataminirese-20&amp;l=as2&amp;o=1&amp;a=1481173502" border="0" alt="" width="1" height="1" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/data-mining-book-review-thinking-statistically/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Swiss Association for Analytics: Feedback of the First Event</title>
		<link>http://www.dataminingblog.com/swiss-association-for-analytics-feedback-of-the-first-event/</link>
		<comments>http://www.dataminingblog.com/swiss-association-for-analytics-feedback-of-the-first-event/#comments</comments>
		<pubDate>Fri, 29 Mar 2013 18:03:49 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1860</guid>
		<description><![CDATA[On March 14th, the very first event of the Swiss Association for  Analytics took place in Lausanne. In the name of the committee, I would like to thank the participants, the speakers and our sponsor, SAS. More than 45 people attended, which was a big success. We are thus planning for a second event [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/saa_event.JPG"><img class="alignright size-full wp-image-1871" title="saa_event" src="http://www.dataminingblog.com/wp-content/uploads/saa_event.JPG" alt="saa_event" width="240" height="180" /></a>On March 14th, the very first event of the <a href="http://www.swiss-analytics.ch">Swiss Association for  Analytics</a> took place in Lausanne. In the name of the committee, I would like to thank the participants, the speakers and our sponsor, <a href="http://www.sas.com">SAS</a>. More than 45 people attended, which was a big success. We are thus planning for a second event this year! So, stay tuned&#8230;</p>
<p>Here was the program of the first event:</p>
<ul>
<li><strong>Can analysis of customer’s past purchase be used to predict the future ?</strong> - Vincent Schickel, Founder &amp; CEO, Prediggo (20 min)</li>
<li><strong>Analytics @ Nestlé: The Case of Statistical Forecasting</strong> - Marcel Baumgartner, Demand &amp; Supply Planning Expert, Nestlé (45 min)</li>
<li>Break (10 min)</li>
<li><strong>Sponsor presentation</strong> &#8211; SAS (5 min)</li>
<li><strong>Statistical methods for decision making</strong> - Jacques Zuber, HEIG-VD and Statoo Consulting (45 min)</li>
<li><strong>Networking apero</strong></li>
</ul>
<p>Pictures of the event are available <a href="http://event.swiss-analytics.ch">here</a>. To join the association, go to <a href="http://www.swiss-analytics.ch" target="_blank">www.swiss-analytics.ch</a>. If you are interested in giving a talk or sponsoring a future event, please contact us at <a href="mailto:info@swiss-analytics.ch">info@swiss-analytics.ch</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/swiss-association-for-analytics-feedback-of-the-first-event/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Mining Book Review: Data Mining with Rattle and R</title>
		<link>http://www.dataminingblog.com/data-mining-book-review-data-mining-with-rattle-and-r/</link>
		<comments>http://www.dataminingblog.com/data-mining-book-review-data-mining-with-rattle-and-r/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 18:00:16 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1788</guid>
		<description><![CDATA[Do you use an open source language for data mining? If yes, then you certainly use R. The language is powerful but not straightforward to learn. The good news is that Graham Williams has the solution for you: Data Mining with Rattle and R &#8211; The Art of Excavating Data for Knowledge Discovery.
The book starts with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/dmwithrattle.jpg"><img class="alignleft size-full wp-image-1851" title="dmwithrattle" src="http://www.dataminingblog.com/wp-content/uploads/dmwithrattle.jpg" alt="dmwithrattle" width="153" height="232" /></a>Do you use an open source language for data mining? If yes, then you certainly use R. The language is powerful but not straightforward to learn. The good news is that Graham Williams has the solution for you: Data Mining with Rattle and R &#8211; The Art of Excavating Data for Knowledge Discovery.</p>
<p>The book starts with an introduction to data mining and R. Rattle is a graphical interface for data mining using R. Although Rattle is introduced and screenshots appear in the text, you do not need to use the tool to benefit from the book. Note that all figures are in color. In addition to standard data mining algorithms (decision tree, association rules, SVM, etc.), Graham discusses topics such as data preparation, model evaluation and deployment (PMML).</p>
<p>One really good thing about Graham&#8217;s book is that each choice is explained. From data separation to algorithm tuning, everything is justified. It is good to read comprehensive theory before digging into examples and code. The book is an excellent step-by-step tutorial with all the code needed for your projects. Each chapter concludes with a summary of the R commands used.</p>
<p>If you are using R or plan to use R for data mining, you should definitely have this book with you. You may already know or have the book by Luis Torgo, <a href="http://www.dataminingblog.com/data-mining-book-review-data-mining-with-r/">Data Mining with R</a>. These books are complementary. Graham&#8217;s book is a strong starting point for learning data mining using R. Luis&#8217;s book is an excellent continuation, full of case studies and explanations on how to avoid data mining pitfalls.</p>
<p><a href="http://www.amazon.com/gp/product/1441998896/ref=as_li_tf_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1441998896&amp;linkCode=as2&amp;tag=dataminirese-20">Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!)</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.com/e/ir?t=dataminirese-20&amp;l=as2&amp;o=1&amp;a=1441998896" border="0" alt="" width="1" height="1" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/data-mining-book-review-data-mining-with-rattle-and-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Guest Post: Vincent Schickel</title>
		<link>http://www.dataminingblog.com/guest-post-vincent-schickel/</link>
		<comments>http://www.dataminingblog.com/guest-post-vincent-schickel/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 18:00:06 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1769</guid>
		<description><![CDATA[It&#8217;s my pleasure to welcome Vincent Schickel for a guest post on Data Mining Research. Vincent is CEO of his own company, Prediggo, providing recommendations to websites. Today, Vincent writes about knowing the value of your data.
Know the value of your data
Data is all around us, and we are getting overwhelmed with it. Thanks to [...]]]></description>
			<content:encoded><![CDATA[<p><em>It&#8217;s my pleasure to welcome Vincent Schickel for a guest post on Data Mining Research. Vincent is CEO of his own company, Prediggo, providing recommendations to websites. Today, Vincent writes about knowing the value of your data.</em></p>
<p align="left"><strong>Know the value of your data</strong></p>
<p align="left">Data is all around us, and we are getting overwhelmed by it. Thanks to the internet, we have created as much information in the past 15 years as in the more than 100 years before. The question is: what should we do with all this data? What value does it have?</p>
<p align="left">Raw data at its source is usually of little interest and contains too much noise to be useful. A couple of months ago, it was announced at CERN that there is a probability of over 99.9% that the existence of the Higgs particle has finally been proven. This discovery is set to be as big as the discovery of Newton&#8217;s theory, but it did not come cheap. The new particle collider cost billions of Swiss francs to build and produces terabytes of data each day. Data mining was crucial in analyzing this huge amount of data and isolating useful information for scientists to work with.</p>
<p align="left">Data can also be worth more money than most people think. Take Amazon, for example, the pioneer and leader of the e-commerce industry. Amazon makes billions of dollars, but did you know that a few years ago, over 30% of its revenue came from its famous recommender system, &#8220;Customers Also Bought&#8221;, or CAB? CAB is an unsupervised learning algorithm known to scientists as item-based collaborative filtering. It works by analyzing the contents of users&#8217; shopping baskets and computing the similarity between items. Then, when a user looks at a specific product, CAB predicts what that user may like by proposing similar items. CAB can be seen as a personal shopping assistant.</p>
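<p align="left">To make the idea of item-to-item similarity concrete, here is a minimal sketch in Python. It is not Amazon&#8217;s actual implementation: the baskets are invented, the function names <code>item_similarities</code> and <code>also_bought</code> are hypothetical, and cosine similarity over basket co-occurrence is just one common choice of similarity measure.</p>

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between items, based on basket co-occurrence."""
    item_count = defaultdict(int)  # number of baskets containing each item
    pair_count = defaultdict(int)  # number of baskets containing both items
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in pair_count.items():
        score = both / sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def also_bought(sims, item, top_n=3):
    """Return the items most similar to the given one, best first."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical baskets, invented for this example.
baskets = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "bag"],
    ["novel", "bookmark"],
]
sims = item_similarities(baskets)
print(also_bought(sims, "camera"))
```

<p align="left">With these made-up baskets, the memory card ranks first for the camera because the two appear together in two baskets. A production system would also have to deal with popularity bias, sparse data and scale, which this sketch ignores.</p>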
<p align="left">Data has huge value, but unfortunately it rarely comes as low-hanging fruit. This is where data mining comes in and helps us transform this data into gold.</p>
<p align="left">To learn more about Vincent and his company, visit <a href="http://www.prediggo.com">Prediggo</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/guest-post-vincent-schickel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Mining Book Review: Fooled by Randomness</title>
		<link>http://www.dataminingblog.com/data-mining-book-review-fooled-by-randomness/</link>
		<comments>http://www.dataminingblog.com/data-mining-book-review-fooled-by-randomness/#comments</comments>
		<pubDate>Mon, 18 Feb 2013 19:00:23 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1824</guid>
		<description><![CDATA[Today&#8217;s book, Fooled by Randomness &#8211; The Hidden Role of Chance in Life and in the Markets, was written by the author of The Black Swan, Nassim Nicholas Taleb. In your library, it will stand close to Dance with Chance. In the same vein, the role of randomness in everyday life (with a focus [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/fooled1.jpg"><img class="alignleft size-full wp-image-1826" title="fooled" src="http://www.dataminingblog.com/wp-content/uploads/fooled1.jpg" alt="fooled" width="139" height="209" /></a>Today&#8217;s book, Fooled by Randomness &#8211; The Hidden Role of Chance in Life and in the Markets, has been written by the author of <em>The Black Swan</em>, Nassim Nicolas Taleb. In your library, it will stands close to <a href="http://www.dataminingblog.com/data-mining-book-review-dance-with-chance/">Dance with Chance</a>. In the same idea, the role of randomness in everyday life (with a focus in finance) is shown. Taleb&#8217;s concept of &#8220;fooled by randomness&#8221; is similar to the &#8220;illusion of control&#8221; of Dance with Chance (or the opposite).</p>
<p>Taleb warns reader against people or situations that may seem impressive, but are due to luck (randomness). This concept is well summarized by the following quote: &#8220;<em>Mild success can be explainable by skills and labor. Wild success is attributable to variance.</em>&#8221; To demonstrate his ideas, Taleb uses the concept of &#8220;<em>black swans</em>&#8221; (rare events) and shows that we can&#8217;t predict them. One problem is that detecting risks and avoiding them is mediated by the emotional part of the brain, not the rational part (the one which could understand probabilities).</p>
<p>In his book, Taleb has a lot to say about journalists and traders (he is an option trader) for example, and how they are fooled by randomness (but he admits to be himself fooled by randomness: &#8220;<em>I am just like every single character whom I ridiculed in this book.</em>&#8220;). By the way, here is a good quote for traders: &#8220;<em>There is no point searching for patterns that are available to everyone with a brokerage account; once detected, they would be self-canceling</em>&#8220;. He also explains that we are not independence-friendly: &#8220;<em>When viewing two events A and B, it is hard not to assume that A causes B, B causes A, or both cause each other</em>&#8220;.</p>
<p>To conclude, Fooled by Randomness is an excellent reading full of examples where randomness is a trap. Any decision maker and anybody involved with probabilities should read this book.</p>
<p><a href="http://www.amazon.com/gp/product/1400067936/ref=as_li_tf_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1400067936&amp;linkCode=as2&amp;tag=dataminirese-20">Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.com/e/ir?t=dataminirese-20&amp;l=as2&amp;o=1&amp;a=1400067936" border="0" alt="" width="1" height="1" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/data-mining-book-review-fooled-by-randomness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>1st Swiss Association for Analytics Event</title>
		<link>http://www.dataminingblog.com/1st-swiss-association-for-analytics-event/</link>
		<comments>http://www.dataminingblog.com/1st-swiss-association-for-analytics-event/#comments</comments>
		<pubDate>Fri, 08 Feb 2013 12:28:17 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1833</guid>
		<description><![CDATA[
We are proud to announce the 1st Swiss Association for Analytics Event!
When: Thursday, March 14th at 6pm
What: Analytics event and networking organized by the Swiss Association for Analytics
Where: Hotel Continental Lausanne, Place de la Gare 2, 1003 Lausanne (in front of the train station). Parking Simplon is 3 minutes away.
Program of the event

Can analysis of customer’s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://event.swiss-analytics.ch"><img class="aligncenter size-full wp-image-1842" title="SAAevent1banner_small" src="http://www.dataminingblog.com/wp-content/uploads/SAAevent1banner_small.png" alt="SAAevent1banner_small" width="468" height="83" /></a></p>
<p>We are proud to announce the <strong>1st Swiss Association for Analytics Event</strong>!</p>
<p><strong>When:</strong> Thursday, March 14th at 6pm<br />
<strong>What:</strong> Analytics event and networking organized by the Swiss Association for Analytics<br />
<strong>Where:</strong> Hotel Continental Lausanne, Place de la Gare 2, 1003 Lausanne (in front of the train station). Parking Simplon is 3 minutes away.</p>
<p><strong>Program of the event</strong></p>
<ul>
<li><strong>Can analysis of customer’s past purchase be used to predict the future ?</strong> &#8211; Vincent Schickel, Founder &amp; CEO, Prediggo (20 min)</li>
<li><strong>Analytics @ Nestlé: The Case of Statistical Forecasting</strong> &#8211; Marcel Baumgartner, Demand &amp; Supply Planning Expert, Nestlé (45 min)</li>
<li>Break (10 min)</li>
<li><strong>Sponsor presentation by SAS</strong> (5 min)</li>
<li><strong>Competing on Analytics in Sports or What Is All the Fuss About Moneyball</strong> &#8211; Marcel Bieler, Senior Industry Consultant, SAS EMEA Region Risk (45 min)</li>
<li><strong>Networking apero</strong></li>
</ul>
<p>For more information, and to register for this free event: <a href="http://event.swiss-analytics.ch">event.swiss-analytics.ch</a>.</p>
<p>We are looking forward to meeting you there!</p>
<p>This event is sponsored by <a href="http://www.sas.com">SAS</a></p>
<p><a href="http://www.dataminingblog.com/wp-content/uploads/logo_sas.gif"><img class="aligncenter size-full wp-image-1834" title="logo_sas" src="http://www.dataminingblog.com/wp-content/uploads/logo_sas.gif" alt="logo_sas" width="220" height="65" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/1st-swiss-association-for-analytics-event/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Mining Interview: Wil van der Aalst</title>
		<link>http://www.dataminingblog.com/data-mining-interview-wil-van-der-aalst/</link>
		<comments>http://www.dataminingblog.com/data-mining-interview-wil-van-der-aalst/#comments</comments>
		<pubDate>Sat, 02 Feb 2013 12:36:13 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1797</guid>
		<description><![CDATA[Today, Data Mining Research presents an interview with Wil van der Aalst, an expert in process mining. To learn more about him and this emerging field of research, continue reading. Thanks, Wil, for your answers.
Data Mining Research: Could you introduce yourself? What is your journey in the field of data mining?
Wil van der Aalst: Before [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/process1.jpg"><img class="alignright size-full wp-image-1799" title="process" src="http://www.dataminingblog.com/wp-content/uploads/process1.jpg" alt="process" width="266" height="177" /></a><em>Data Mining Research proposes today an interview with Wil van der Aalst, an expert in process mining. To know more about him and this emerging field of research, continue reading. Thanks Wil for your answers.</em></p>
<p><strong>Data Mining Research: Could you introduce yourself? What is your journey in the field of data mining?</strong></p>
<p><strong>Wil van der Aalst:</strong> Before working on process mining, I worked on Petri nets, the modeling and analysis of workflow processes, workflow patterns, and process-aware information systems. I also started the main scientific conference on Business Process Management and I’m the founder and leader of the IEEE Task Force on Process Mining. I got into the topic of process mining because I got bored with purely model-based research. For example, it is unsatisfactory to work on the verification and performance analysis of process models knowing that these models have nothing to do with reality. Process mining provides an important bridge between data mining and business process modeling and analysis. Process mining research at TU/e (Eindhoven University of Technology) started in 1999. At that time, there was little event data available, and the initial process mining techniques were extremely naïve and hence unusable. Over the last decade, event data has become readily available and process mining techniques have matured. Moreover, process mining algorithms have been implemented in various academic and commercial systems. Today, there is an active group of researchers working on process mining, and it has become one of the “hot topics” in BPM research. Moreover, there is huge interest from industry in process mining, and more and more software vendors have started adding process mining functionality to their tools.</p>
<p><a href="http://springer.com/978-3-642-19344-6"><img class="aligncenter size-full wp-image-1800" title="processbook" src="http://www.dataminingblog.com/wp-content/uploads/processbook.jpg" alt="processbook" width="153" height="230" /></a></p>
<p><strong>DMR: What is process mining?</strong></p>
<p><strong>WvdA:</strong> The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today&#8217;s (information) systems. Process mining includes (automated) process discovery (i.e., extracting process models from an event log), conformance checking (i.e., monitoring deviations by comparing model and log), social network/organizational mining, automated construction of simulation models, model extension, model repair, case prediction, and history-based recommendations (cf. <a href="http://www.processmining.org">www.processmining.org</a>).</p>
<p>The most appealing form of process mining is <em>process discovery</em>. A discovery technique takes an event log and produces a model without using any a priori information. For many organizations it is surprising to see that existing techniques are indeed able to discover real processes merely based on example behaviors recorded in event logs. Another form of process mining is <em>conformance checking</em>. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa. A third type of process mining is <em>model enhancement</em>. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a priori model. For instance, by using timestamps in the event log one can extend the model to show bottlenecks, service levels, throughput times, and frequencies.</p>
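<p>As a rough illustration of discovery and conformance checking, the Python sketch below counts the &#8220;directly follows&#8221; relation between activities in a tiny event log and then flags steps of a case that the resulting relation does not allow. The log, the activity names and the frequency threshold are assumptions made up for this example; real discovery algorithms and tools are considerably more elaborate.</p>

```python
from collections import Counter, defaultdict


def traces_by_case(event_log):
    """Group activities per case, ordered by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        cases[case_id].append((timestamp, activity))
    return {cid: [act for _, act in sorted(events)] for cid, events in cases.items()}


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def deviations(trace, allowed):
    """List the steps of a trace that the given relation does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]


# Hypothetical event log: (case id, activity, timestamp).
event_log = [
    ("order-1", "receive", 1), ("order-1", "check", 2), ("order-1", "ship", 3),
    ("order-2", "receive", 1), ("order-2", "check", 3), ("order-2", "ship", 5),
    ("order-3", "receive", 2), ("order-3", "ship", 4),
]
traces = traces_by_case(event_log)
dfg = directly_follows(traces)

# Assumption: treat relations seen at least twice as the "model".
model = {pair for pair, n in dfg.items() if n >= 2}
print(deviations(traces["order-3"], model))
```

<p>Here the third order skips the check step, so its only transition is reported as a deviation from the frequent behavior. Using the same log for both discovery and checking is a simplification made to keep the example short.</p>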
<p><strong>DMR: What are the benefits for companies of applying process mining?</strong></p>
<p><strong>WvdA:</strong> Although event data are omnipresent, organizations lack a good understanding of their actual processes. Management decisions tend to be based on PowerPoint diagrams, local politics, or management dashboards rather than a careful analysis of event data. The knowledge hidden in event logs cannot be turned into actionable information. Advances in data mining made it possible to find valuable patterns in large datasets and to support complex decisions based on such data. However, classical data mining problems such as classification, clustering, regression, association rule learning, and sequence/episode mining are not process-centric. Therefore, Business Process Management (BPM) approaches tend to resort to hand-made models. Process mining research aims to bridge the gap between data mining and BPM. Metaphorically, process mining can be seen as taking X-rays to diagnose/predict problems and recommend treatment.</p>
<p><strong>DMR: What are the future research directions in this field?</strong></p>
<p><strong>WvdA:</strong> The growing maturity of process mining is illustrated by the Process Mining Manifesto recently released by the IEEE Task Force on Process Mining. This manifesto is supported by 53 organizations and 77 process mining experts contributed to it. The active contributions from end-users, tool vendors, consultants, analysts, and researchers illustrate the significance of process mining as a bridge between data mining and business process modeling. The manifesto lists six guiding principles and eleven challenges. As an example, consider Challenge C4 in the manifesto: “Dealing with Concept Drift.” The term concept drift refers to a situation in which the process is changing while we’re analyzing it. For instance, in the beginning of the event log, two activities might be concurrent, whereas later in the log, they become sequential. Processes might change because of periodic or seasonal changes (for example, “in December, there is more demand” or “on Friday afternoon, fewer employees are available”) or changing conditions (“the market is getting more competitive”). Such changes impact processes, and detecting and analyzing them is vital. However, most process-mining techniques analyze processes as if they’re in steady state. This is just one of many open problems in the process mining field.</p>
<p>Given these challenges and the interest from industry, I hope that more and more data miners will start working on this exciting topic. In short: “It’s the Process Stupid!”, so start mining processes rather than data.</p>
<p><a href="http://www.win.tue.nl/ieeetfpm/"><img class="aligncenter size-full wp-image-1802" title="manifesto" src="http://www.dataminingblog.com/wp-content/uploads/manifesto1.png" alt="manifesto" width="182" height="256" /></a></p>
<p><strong>About Wil van der Aalst</strong></p>
<p><em>Prof.dr.ir. Wil van der Aalst is a full professor of Information Systems at the Technische Universiteit Eindhoven (TU/e). Currently he is also an adjunct professor at Queensland University of Technology (QUT) working within the BPM group there. His research interests include workflow management, process mining, Petri nets, business process management, process modeling, and process analysis. More information: <a href="http://www.vdaalst.com">www.vdaalst.com</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/data-mining-interview-wil-van-der-aalst/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Forecasting resources</title>
		<link>http://www.dataminingblog.com/forecasting-resources/</link>
		<comments>http://www.dataminingblog.com/forecasting-resources/#comments</comments>
		<pubDate>Sat, 26 Jan 2013 18:34:32 +0000</pubDate>
		<dc:creator>Sandro Saitta</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dataminingblog.com/?p=1816</guid>
		<description><![CDATA[This is the last post of the series on forecasting. Up to now, subjects  such as varying factors, methods and evaluation criteria have been  studied. This post lists key references in the field of time  series prediction, particularly using Support Vector Regression (SVR). Note that some of these resources are not free:

Time [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dataminingblog.com/wp-content/uploads/articless.jpg"><img class="alignright size-full wp-image-1820" title="articless" src="http://www.dataminingblog.com/wp-content/uploads/articless.jpg" alt="articless" width="275" height="183" /></a>This is the last post of the series on forecasting. Up to now, subjects  such as <a href="http://www.dataminingblog.com/forecasting-varying-factors/">varying factors</a>, <a href="http://www.dataminingblog.com/forecasting-standard-methods/">methods</a> and <a href="http://www.dataminingblog.com/forecasting-evaluation-criteria/">evaluation criteria</a> have been  studied. This post lists key references in the field of time  series prediction, particularly using Support Vector Regression (SVR). Note that some of these resources are not free:</p>
<ul>
<li><a href="http://www.me.utexas.edu/~jensen/ORMM/supplements/units/time_series/time_series.pdf">Time Series and Forecasting (chapter 22)</a>: this book chapter describes different forecasting models in detail and presents evaluation criteria. Although it contains several equations, it also discusses business problems such as planning.</li>
<li><a href="http://">Support Vector Regression for link load prediction</a>: SVR is described in detail, along with specific studies on sensitivity to input data, SVR parameters and training size. Other topics such as complexity and forecast horizon are also analyzed.</li>
<li><a href="http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;arnumber=1716528&amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1716528">Forecasting with computational intelligence &#8211; an evaluation of Support Vector Regression and Artificial Neural Networks for time series predictions</a>: a technical paper explaining what time series are and how SVR and Neural Networks (NN) can be used for forecasting. The use of training, validation and test sets is advised.</li>
<li><a href="http://repository.upenn.edu/cgi/viewcontent.cgi?article=1075&amp;context=marketing_papers">Error measures for generalizing about forecasting methods &#8211; empirical comparisons</a>: an analysis of the different evaluation criteria. It clearly separates two common goals: i) selecting the best forecasting model and ii) calibrating a given forecasting model.</li>
<li><a href="http://www.stefan-rueping.de/publications/rueping-morik-2003-a.pdf">Support Vector Machines and learning about time</a>: a good introductory article that explains several time series principles and presents different ways to approach forecasting.</li>
<li><a href="http://otexts.com/fpp/">Forecasting &#8211; principles and practice</a>: this online textbook is my favorite resource for learning forecasting. It focuses on concepts, key ideas and business challenges regarding time series prediction. If you read only one resource about the principles of forecasting, it should be this one.</li>
<li><a href="http://robjhyndman.com/papers/foresight.pdf">Another look at forecast-accuracy metrics for intermittent demand</a>: a concise article full of advice on choosing the right evaluation criteria for forecasting. If you want to read only one reference about evaluation criteria, I strongly recommend this one.</li>
</ul>
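<p>To give a flavor of what an evaluation criterion looks like in practice, here is a small Python sketch of one scaled error measure, the mean absolute scaled error (MASE). It divides the forecast&#8217;s mean absolute error by the in-sample mean absolute error of the naive &#8220;repeat the last value&#8221; forecast. The function name <code>mase</code> and the numbers are assumptions chosen for illustration; the sketch also assumes a training series that is not constant, so that the scale is not zero.</p>

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error relative to the in-sample naive forecast."""
    # In-sample errors of the naive forecast: each value predicted by the previous one.
    naive_errors = [abs(b - a) for a, b in zip(train, train[1:])]
    scale = sum(naive_errors) / len(naive_errors)
    # Out-of-sample absolute errors of the forecast being evaluated.
    errors = [abs(y - f) for y, f in zip(actual, forecast)]
    return sum(errors) / len(errors) / scale


# Hypothetical intermittent demand series, invented for this example.
train = [0, 2, 0, 1, 0, 3]
actual = [0, 2]
forecast = [1, 1]
print(round(mase(train, actual, forecast), 3))
```

<p>A value below 1 means the forecast beats the naive one on average. Unlike percentage errors, this measure stays defined when the actual demand is zero, which matters for intermittent series.</p>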
<p>Feel free to comment and add your own references!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingblog.com/forecasting-resources/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
