<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;DUIDQHs4fyp7ImA9WhVTFE0.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059</id><updated>2012-02-27T21:39:31.537-08:00</updated><category term="k-nn" /><category term="extract transform load" /><category term="data mining" /><category term="concept mining" /><category term="robots.txt" /><category term="knn" /><category term="importxml" /><category term="etl" /><category term="ajax web scraping scraper" /><category term="tutorial" /><category term="web crawling" /><category term="r" /><category term="text mining" /><category term="text analysis" /><category term="web scraping" /><category term="business intelligence" /><category term="web scraping rapidminer xpath web scrape rapid miner x-path" /><category term="google spreadsheets" /><category term="crawling rules" /><category term="rapid miner" /><category term="extjs ext js tutorial learn help" /><category term="google docs spreadsheets" /><category term="web crawl" /><category term="x-path" /><category term="xpath" /><category term="rapidminer" /><category term="how to scrape ajax web pages" /><category term="rapidminer data mining etl" /><category term="naive bayes" /><title>Vancouver Data Blog by Neil McGuigan</title><subtitle type="html">Some RapidMiner, some JMP, some Google Docs</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" 
href="http://vancouverdata.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>50</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/VancouverData" /><feedburner:info uri="vancouverdata" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>VancouverData</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><entry gd:etag="W/&quot;DUEMRX0-fyp7ImA9WhRaEU0.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6460311686727702164</id><published>2012-02-11T20:36:00.000-08:00</published><updated>2012-02-12T20:34:44.357-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-12T20:34:44.357-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="ajax web scraping scraper" /><title>Less Painful AJAX / Javascript Web Scraping</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6460311686727702164/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2012/02/less-painful-ajax-javascript-web.html#comment-form" title="2 Comments" /><link rel="edit" 
type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6460311686727702164?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6460311686727702164?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/80uYthcCyyM/less-painful-ajax-javascript-web.html" title="Less Painful AJAX / Javascript Web Scraping" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/wB9-rRmjT2E/default.jpg" height="72" width="72" /><thr:total>2</thr:total><content type="html">If you read my previous post, you'll see that scraping ajax pages can be a pain. So I wrote a little Java program to make it easier. It takes a list of URLs to scrape, and will render them in a browser, and save the (normal and ajax) rendered HTML and screenshots to a folder. 

Here's the how-to video:



You need Firefox 3+ installed, as well as Java 1.6. This is a beta project, and no warranty 
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/80uYthcCyyM" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2012/02/less-painful-ajax-javascript-web.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4EQ3c6eCp7ImA9WhRbGE4.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-3679647273351338746</id><published>2012-02-09T16:01:00.000-08:00</published><updated>2012-02-09T17:55:02.910-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-09T17:55:02.910-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="how to scrape ajax web pages" /><title>Web Scraping AJAX Pages</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/3679647273351338746/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2012/02/web-scraping-ajax-pages.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3679647273351338746?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3679647273351338746?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/I3gpmcckeEg/web-scraping-ajax-pages.html" title="Web Scraping AJAX Pages" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><content type="html">This is part four of a series of video tutorials on web scraping and web crawling.

Part 1: Web scraping with Google Spreadsheets and XPath
Part 2: Web Crawling with RapidMiner
Part 3: Web Scraping with RapidMiner and XPath
This post explains how to capture HTML from Ajax/JavaScript-generated pages.

Here is the accompanying video.

The first thing you should know is that it is a major, major 
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/I3gpmcckeEg" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2012/02/web-scraping-ajax-pages.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkAAQnk6fCp7ImA9WhRUGEU.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-8179698928628077273</id><published>2012-01-29T17:59:00.000-08:00</published><updated>2012-01-29T17:59:03.714-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-29T17:59:03.714-08:00</app:edited><title>On Making Videos</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/8179698928628077273/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2012/01/on-making-videos.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/8179698928628077273?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/8179698928628077273?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/wxX3buLBOrs/on-making-videos.html" title="On Making Videos" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">Here is what i use to make my videos:


1. CamStudio. This is a nice free and open-source desktop video capture program. Make sure to use their Lossless Codec, and go with these settings:

Set Keyframes Every: 30 frames
Capture Frames Every: 50 milliseconds
Playback Rate: 20 frames per second
Video Codec: CamStudio Lossless Codec
Quality: 70%


2. Handbrake Video Transcoder. This will help you 
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/wxX3buLBOrs" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2012/01/on-making-videos.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0YGSHw7eyp7ImA9WhRWE0U.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-3878839214082603433</id><published>2011-12-31T19:38:00.003-08:00</published><updated>2011-12-31T19:38:49.203-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-31T19:38:49.203-08:00</app:edited><title>Happy New Year</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/3878839214082603433/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/12/happy-new-year.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3878839214082603433?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3878839214082603433?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/o6utg6or2X0/happy-new-year.html" title="Happy New Year" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">75,000 pageviews this year! Thanks to everyone for visiting. I will post some new material in the new year.

Have a safe and fun 2012!

Neil
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/o6utg6or2X0" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/12/happy-new-year.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEANQnk-eSp7ImA9WhRTFEk.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-5679397571750587883</id><published>2011-11-04T14:39:00.001-07:00</published><updated>2011-11-04T14:39:53.751-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-04T14:39:53.751-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="extjs ext js tutorial learn help" /><title>My new blog about learning ExtJS</title><link rel="related" href="http://extjs-tutorials.blogspot.com/" title="My new blog about learning ExtJS" /><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/5679397571750587883/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/11/extjs-ext-js-learn-tutorial-help.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5679397571750587883?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5679397571750587883?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/o0hLNys3zPU/extjs-ext-js-learn-tutorial-help.html" title="My new blog about learning ExtJS" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">I have a new blog. 
It's about learning to use ExtJS, a great JavaScript library for building rich internet applications. Here it is:

http://extjs-tutorials.blogspot.com/

Check it out. Thanks
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/o0hLNys3zPU" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/11/extjs-ext-js-learn-tutorial-help.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkQNQHk_cCp7ImA9WhdbEk8.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6633676995435803511</id><published>2011-10-09T22:26:00.000-07:00</published><updated>2011-10-09T22:26:31.748-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-09T22:26:31.748-07:00</app:edited><title>How Obama's data-crunching prowess may get him re-elected</title><link rel="related" href="http://www.cnn.com/2011/10/09/tech/innovation/obama-data-crunching-election/index.html?hpt=hp_c1" title="How Obama's data-crunching prowess may get him re-elected" /><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6633676995435803511/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/10/how-obamas-data-crunching-prowess-may.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6633676995435803511?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6633676995435803511?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/kqE_a71PiTI/how-obamas-data-crunching-prowess-may.html" title="How Obama's data-crunching prowess may get him re-elected" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">An article on CNN 
about how the Obama 2012 campaign has hired many data miners and statisticians to help boost fundraising and support.

http://www.cnn.com/2011/10/09/tech/innovation/obama-data-crunching-election/index.html?hpt=hp_c1
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/kqE_a71PiTI" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/10/how-obamas-data-crunching-prowess-may.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0IFQ30_cCp7ImA9WhdbEUw.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-1939800431561533890</id><published>2011-10-08T15:31:00.001-07:00</published><updated>2011-10-08T16:11:52.348-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-08T16:11:52.348-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="text mining" /><category scheme="http://www.blogger.com/atom/ns#" term="text analysis" /><category scheme="http://www.blogger.com/atom/ns#" term="data mining" /><category scheme="http://www.blogger.com/atom/ns#" term="rapidminer" /><category scheme="http://www.blogger.com/atom/ns#" term="r" /><title>Text Analytics with RapidMiner Part 6 of 6 - Applying the Model to New Documents</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/1939800431561533890/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/10/rapidminer-text-mining-r-analytics.html#comment-form" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1939800431561533890?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1939800431561533890?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/OuOOsbQBKFI/rapidminer-text-mining-r-analytics.html" title="Text Analytics with RapidMiner Part 6 of 6 - Applying the Model to New Documents" /><author><name>Neil 
McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/9I0BcMuhPe8/default.jpg" height="72" width="72" /><thr:total>9</thr:total><content type="html">After my last series, I got a lot of questions about how to apply a model to new data, so here is the real final installment in the series.

I show how to save a wordlist and model to the repository, then read them back later and apply them to new documents that RapidMiner hasn't seen before. The model correctly labels 11 of the 12 documents.


&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/OuOOsbQBKFI" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/10/rapidminer-text-mining-r-analytics.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEABRXg_cCp7ImA9WhdWEE8.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6512368803019856563</id><published>2011-09-02T21:06:00.000-07:00</published><updated>2011-09-02T21:05:54.648-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-02T21:05:54.648-07:00</app:edited><title>September sunset</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6512368803019856563/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/09/september-sunset.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6512368803019856563?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6512368803019856563?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/jg-_58jae-A/september-sunset.html" title="September sunset" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-Pbmmxtf1_nc/TmGnorFeRbI/AAAAAAAAAOI/ALClZf1ryCc/s72-c/%253D%253Futf-8%253FB%253FVmFuY291dmVyLTIwMTEwOTAyLTAwMDY3LmpwZw%253D%253D%253F%253D-754650" height="72" width="72" /><thr:total>1</thr:total><content type="html">
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/jg-_58jae-A" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/09/september-sunset.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0cFQn47eyp7ImA9WhdXFUg.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-1128744730928246656</id><published>2011-08-27T20:01:00.002-07:00</published><updated>2011-08-28T11:10:13.003-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-28T11:10:13.003-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="extract transform load" /><category scheme="http://www.blogger.com/atom/ns#" term="data mining" /><category scheme="http://www.blogger.com/atom/ns#" term="rapidminer" /><title>RapidMiner ETL - Transforming Attributes with Functions</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/1128744730928246656/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-transforming-attributes.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1128744730928246656?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1128744730928246656?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/YoQcHcT5JEg/rapidminer-etl-transforming-attributes.html" title="RapidMiner ETL - Transforming Attributes with Functions" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail 
xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/6uBKg9-EMRk/default.jpg" height="72" width="72" /><thr:total>2</thr:total><content type="html">In this video I show how to transform features in RapidMiner using operators such as log, sqrt, absolute value, and multiplying columns.


&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/YoQcHcT5JEg" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-transforming-attributes.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkAMRH88cCp7ImA9WhdXFUg.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6208340498780104596</id><published>2011-08-27T20:01:00.000-07:00</published><updated>2011-08-28T11:06:25.178-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-28T11:06:25.178-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="extract transform load" /><category scheme="http://www.blogger.com/atom/ns#" term="data mining" /><title>RapidMiner ETL - Normalizing, Discretizing, Recoding</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6208340498780104596/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-normalizing-discretizing.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6208340498780104596?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6208340498780104596?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/_zvLLT_WfUQ/rapidminer-etl-normalizing-discretizing.html" title="RapidMiner ETL - Normalizing, Discretizing, Recoding" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="http://img.youtube.com/vi/XfvSIgcTDZs/default.jpg" height="72" width="72" /><thr:total>0</thr:total><content type="html">In this video I show how to normalize an attribute (including z-normalization), how to discretize a column, and how to recode values.
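
The three steps named above can be sketched outside RapidMiner. This is only an illustration in plain Python, not the operators used in the video: the sample values, the choice of the population standard deviation, the three equal-width bins, and the low/medium/high labels are all assumptions made for the example.

```python
import statistics

# Hypothetical numeric attribute (assumed data, not from the video)
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# z-normalization: subtract the mean, divide by the standard deviation
# (population standard deviation is an assumption here)
mean = statistics.fmean(values)
stdev = statistics.pstdev(values)
z_scores = [(v - mean) / stdev for v in values]

# Discretization: three equal-width bins over the observed range
low, high, bin_count = min(values), max(values), 3
width = (high - low) / bin_count
bins = [min(int((v - low) / width), bin_count - 1) for v in values]

# Recoding: map each bin index to a readable label
names = {0: "low", 1: "medium", 2: "high"}
labels = [names[b] for b in bins]

print(z_scores)
print(labels)
```

The call to min() when computing bins keeps the largest value in the last bin instead of opening a new one.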



&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/_zvLLT_WfUQ" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-normalizing-discretizing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkEEQnY-eip7ImA9WhdXE0U.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-4366878170088799710</id><published>2011-08-25T18:18:00.000-07:00</published><updated>2011-08-26T10:43:23.852-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-26T10:43:23.852-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="rapidminer data mining etl" /><title>RapidMiner ETL - Sampling, Selecting Rows, Attributes</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/4366878170088799710/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-sampling-selecting-rows.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4366878170088799710?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4366878170088799710?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/aN44lmPekPA/rapidminer-etl-sampling-selecting-rows.html" title="RapidMiner ETL - Sampling, Selecting Rows, Attributes" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/DtKE2aaRhAU/default.jpg" height="72" width="72" 
/><thr:total>2</thr:total><content type="html">In this video I show how to sample rows, including balancing class labels and bootstrap sampling. I also show how to filter rows by value and select a subset of attributes.
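
For readers who want to see the ideas without opening RapidMiner, here is a minimal sketch in plain Python. The rows, the fixed seed, and the choice to balance by downsampling to the rarest class are assumptions for illustration; the video may balance classes differently.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical rows: (id, class label)
rows = [("a", "yes"), ("b", "yes"), ("c", "yes"), ("d", "yes"),
        ("e", "no"), ("f", "no")]

# Bootstrap sample: draw as many rows as the dataset has, with replacement
bootstrap = rng.choices(rows, k=len(rows))

# Balance class labels by downsampling every class to the rarest class size
by_label = defaultdict(list)
for row in rows:
    by_label[row[1]].append(row)
smallest = min(len(group) for group in by_label.values())
balanced = [row for group in by_label.values()
            for row in rng.sample(group, smallest)]

# Filter rows by value, then keep a subset of attributes (the id column)
yes_ids = [row[0] for row in rows if row[1] == "yes"]

print(bootstrap)
print(balanced)
print(yes_ids)
```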



You can get the dataset here
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/aN44lmPekPA" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-sampling-selecting-rows.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkUMSHcycSp7ImA9WhdXE08.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-566762827856756137</id><published>2011-08-25T17:58:00.000-07:00</published><updated>2011-08-25T17:58:09.999-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-25T17:58:09.999-07:00</app:edited><title>RapidMiner ETL - Combining Datasets</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/566762827856756137/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-combining-datasets.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/566762827856756137?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/566762827856756137?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/ZXBS9x_IpUk/rapidminer-etl-combining-datasets.html" title="RapidMiner ETL - Combining Datasets" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/RioT2Z1QB9s/default.jpg" height="72" width="72" /><thr:total>1</thr:total><content type="html">In this video, I show how to combine multiple datasets into one, and join columns and append rows.
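
As a rough sketch of the two operations (not the RapidMiner operators themselves), joining columns and appending rows can be written in plain Python as follows. The datasets, the shared "id" key, and the inner-join behaviour are assumptions made for the example.

```python
# Two hypothetical datasets that share an "id" attribute
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30.0}, {"id": 2, "total": 12.5}]

# Join columns: match rows on the shared key and merge their attributes
# (an inner join: rows without a partner are dropped)
orders_by_id = {row["id"]: row for row in orders}
joined = [{**c, **orders_by_id[c["id"]]}
          for c in customers if c["id"] in orders_by_id]

# Append rows: stack a dataset with the same attributes under the first one
more_customers = [{"id": 3, "name": "Cy"}]
appended = customers + more_customers

print(joined)
print(appended)
```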



&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/ZXBS9x_IpUk" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-combining-datasets.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C04MRno_cCp7ImA9WhdVE0U.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-4666301277375180679</id><published>2011-08-25T17:33:00.000-07:00</published><updated>2011-09-18T14:39:47.448-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-18T14:39:47.448-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="etl" /><category scheme="http://www.blogger.com/atom/ns#" term="data mining" /><category scheme="http://www.blogger.com/atom/ns#" term="rapidminer" /><category scheme="http://www.blogger.com/atom/ns#" term="business intelligence" /><title>And We're Back. A video series on ETL with RapidMiner</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/4666301277375180679/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-extract-transform-load.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4666301277375180679?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4666301277375180679?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/UnfEGc210hI/rapidminer-etl-extract-transform-load.html" title="And We're Back. 
A video series on ETL with RapidMiner" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><content type="html">Back with some more videos! Sorry for the long wait, and thanks for your patience.

This series is on ETL (Extract, Transform, Load) with RapidMiner.

The first video shows how to combine multiple datasets into one, by joining columns and appending rows.

The second video covers sampling and selecting rows and attributes.

More videos coming soon.
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/UnfEGc210hI" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/08/rapidminer-etl-extract-transform-load.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUQMRXc9fyp7ImA9WhZRFUs.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-1960434316324224700</id><published>2011-04-10T18:13:00.001-07:00</published><updated>2011-04-11T17:16:24.967-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-04-11T17:16:24.967-07:00</app:edited><title>A rainy sunday in downtown Vancouver</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/1960434316324224700/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/04/rainy-sunday-in-downtown-vancouver.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1960434316324224700?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/1960434316324224700?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/0hasLYHMpsQ/rainy-sunday-in-downtown-vancouver.html" title="A rainy sunday in downtown Vancouver" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-2KIu8bmB7e4/TaJV1-JAk7I/AAAAAAAAAMw/pow9K1Km70s/s72-c/%253D%253Futf-8%253FB%253FVmFuY291dmVyLTIwMTEwNDEwLTAwMDM2LmpwZw%253D%253D%253F%253D-739348" height="72" width="72" /><thr:total>2</thr:total><content 
type="html">My blog should look better on mobile devices now.
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/0hasLYHMpsQ" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/04/rainy-sunday-in-downtown-vancouver.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4NRX8-fyp7ImA9WhRbGE4.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6479993879712017327</id><published>2011-04-04T14:53:00.001-07:00</published><updated>2012-02-09T17:56:34.157-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-09T17:56:34.157-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="web scraping rapidminer xpath web scrape rapid miner x-path" /><title>Web Scraping with RapidMiner and XPath</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6479993879712017327/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6479993879712017327?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6479993879712017327?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/4yeVQ0WhfE4/web-scraping-rapidminer-xpath-web.html" title="Web Scraping with RapidMiner and XPath" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/vKW5yd1eUpA/default.jpg" height="72" width="72" /><thr:total>2</thr:total><content 
type="html">In this video I show how to load 500 HTML files from a previous web crawl, loop through them, use XPath to grab values from each page, and put those values in a data table for later analysis.




Part 1: Web scraping with Google Spreadsheets and XPath
Part 2: Web Crawling with RapidMiner
Part 3: Web Scraping with RapidMiner and XPath
Part 4: Web Scraping AJAX Pages
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/4yeVQ0WhfE4" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4MRHkyeSp7ImA9WhRbGE4.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-4947263372591097903</id><published>2011-04-04T14:52:00.000-07:00</published><updated>2012-02-09T17:56:25.791-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-09T17:56:25.791-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="robots.txt" /><category scheme="http://www.blogger.com/atom/ns#" term="web crawling" /><category scheme="http://www.blogger.com/atom/ns#" term="rapidminer" /><category scheme="http://www.blogger.com/atom/ns#" term="rapid miner" /><category scheme="http://www.blogger.com/atom/ns#" term="web crawl" /><category scheme="http://www.blogger.com/atom/ns#" term="crawling rules" /><title>Web Crawling with RapidMiner</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/4947263372591097903/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html#comment-form" title="17 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4947263372591097903?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/4947263372591097903?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/ot2UUXLLkg0/rapidminer-web-crawling-rapid-miner-web.html" title="Web Crawling with RapidMiner" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image 
rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/zMyrw0HsREg/default.jpg" height="72" width="72" /><thr:total>17</thr:total><content type="html">Here is part 2 of my series of videos on web crawling with RapidMiner. In this video I show how to crawl about 500 pages from a site, and discuss user agents, crawling rules, and robot exclusion files.
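
To make the idea of crawling rules and robot exclusion files concrete, here is a small sketch that uses Python's standard urllib.robotparser module rather than RapidMiner. The rules, the user agent string "ExampleCrawler/1.0", and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, given as a list of lines
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical user agent string that a polite crawler would send
agent = "ExampleCrawler/1.0"

# Ask whether this user agent may fetch each URL under the rules
allowed = parser.can_fetch(agent, "http://example.com/index.html")
blocked = parser.can_fetch(agent, "http://example.com/private/page.html")

# Seconds the site asks crawlers to wait between requests
delay = parser.crawl_delay(agent)

print(allowed, blocked, delay)
```

A real crawl would load the rules from the site's own robots.txt (for example with set_url() and read()) instead of a hard-coded list.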




Part 1: Web scraping with Google Spreadsheets and XPath
Part 2: Web Crawling with RapidMiner
Part 3: Web Scraping with RapidMiner and XPath
Part 4: Web Scraping AJAX Pages
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/ot2UUXLLkg0" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4HQngzeyp7ImA9WhZREE8.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6883121452003438732</id><published>2011-04-03T15:13:00.000-07:00</published><updated>2011-04-05T10:18:53.683-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-04-05T10:18:53.683-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="google docs spreadsheets" /><category scheme="http://www.blogger.com/atom/ns#" term="x-path" /><category scheme="http://www.blogger.com/atom/ns#" term="google spreadsheets" /><category scheme="http://www.blogger.com/atom/ns#" term="xpath" /><title>More X-Path Goodness</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6883121452003438732/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/04/more-x-path-goodness.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6883121452003438732?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6883121452003438732?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/KdN4hiMAEiM/more-x-path-goodness.html" title="More X-Path Goodness" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><content type="html">Got a 
RapidMiner crawling/scraping video coming up, but for now, here are some more X-Path ideas to play with:

//*
return all element nodes in the document

//*[contains(., 'Search Text')]
return all elements whose text content contains 'Search Text' (ancestors of the matching text match too). The search is case-sensitive.

//div[@id='div1']/following-sibling::*
return all following siblings of a specific node; append [1] to get only the next sibling (not sure if this works in RapidMiner)

//div[@id='div1']/../
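
If you want to try these ideas outside a spreadsheet or RapidMiner, the sketch below uses Python's built-in xml.etree.ElementTree. That module is an assumption of this example, and it supports only a subset of XPath: contains() and following-sibling are not available, so they are emulated in ordinary Python. The tiny page is built in code and is purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Build a tiny hypothetical page: a body holding div#div1, a p and a span
body = ET.Element("body")
div1 = ET.SubElement(body, "div", {"id": "div1"})
div1.text = "Search Text lives here"
para = ET.SubElement(body, "p")
para.text = "unrelated"
span = ET.SubElement(body, "span")
span.text = "more Search Text"

# //* : every element (the root itself plus all of its descendants)
all_elements = [body] + body.findall(".//*")

# contains(., 'Search Text') is unsupported here, so filter on the text content
matches = [e.tag for e in all_elements
           if "Search Text" in "".join(e.itertext())]

# //div[@id='div1']/.. : the parent of the target node
parent = body.find(".//div[@id='div1']/..")

# following-sibling::* is unsupported too; emulate it via the parent's children
children = list(parent)
following = [e.tag for e in children[children.index(div1) + 1:]]

print(matches)
print(parent.tag)
print(following)
```

Note that the body element is among the matches: the text content of an element includes the text of its descendants.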

&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/KdN4hiMAEiM" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/04/more-x-path-goodness.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4DR3kyfSp7ImA9WhRbGE4.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-3573461486977156623</id><published>2011-02-27T17:36:00.000-08:00</published><updated>2012-02-09T17:56:16.795-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-09T17:56:16.795-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="web scraping" /><category scheme="http://www.blogger.com/atom/ns#" term="google docs spreadsheets" /><category scheme="http://www.blogger.com/atom/ns#" term="importxml" /><category scheme="http://www.blogger.com/atom/ns#" term="xpath" /><title>Web scraping with Google Spreadsheets and XPath</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/3573461486977156623/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/02/how-to-web-scraping-xpath-html-google.html#comment-form" title="16 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3573461486977156623?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3573461486977156623?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/zvOYHkNrHV4/how-to-web-scraping-xpath-html-google.html" title="Web scraping with Google Spreadsheets and XPath" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" 
/></author><thr:total>16</thr:total><content type="html">This is part one of a series of video tutorials on web scraping and web crawling.

In this first video, I show how to grab parts of a web page (scraping) using Google Docs Spreadsheets and XPath.

Google Spreadsheets has a handy function called importXML that reads in a web page. You can then apply an XPath expression to that page to grab parts of it, such as one particular value, or all of the
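
As a rough illustration of what the XPath step does, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a subset of XPath. The table, its class names, and the select_text helper are made up for this example. In a spreadsheet, the equivalent for a page with that hypothetical structure would be something like =importXML(url, "//td[@class='price']").

```python
import xml.etree.ElementTree as ET

# Build a tiny stand-in for a fetched page (hypothetical structure).
root = ET.Element("html")
body = ET.SubElement(root, "body")
table = ET.SubElement(body, "table")
for name, price in [("apple", "1.20"), ("pear", "0.80")]:
    row = ET.SubElement(table, "tr")
    ET.SubElement(row, "td", {"class": "name"}).text = name
    ET.SubElement(row, "td", {"class": "price"}).text = price


def select_text(tree_root, xpath):
    # Return the text of every node matched by the XPath expression.
    return [node.text for node in tree_root.findall(xpath)]


# Grab every price cell, wherever it sits in the document.
prices = select_text(root, ".//td[@class='price']")
print(prices)
```

The same expression with a positional predicate added would narrow the result to one particular value instead of the whole column.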
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/zvOYHkNrHV4" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/02/how-to-web-scraping-xpath-html-google.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0cCSHc6eCp7ImA9Wx9XFE8.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-3703522330207871009</id><published>2011-01-07T10:31:00.000-08:00</published><updated>2011-01-07T10:31:09.910-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-01-07T10:31:09.910-08:00</app:edited><title>A Data Explosion Remakes Retailing</title><link rel="related" href="http://www.nytimes.com/2010/01/03/business/03unboxed.html?hpw" title="A Data Explosion Remakes Retailing" /><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/3703522330207871009/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/01/data-explosion-remakes-retailing.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3703522330207871009?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/3703522330207871009?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/BcxA4IuXErk/data-explosion-remakes-retailing.html" title="A Data Explosion Remakes Retailing" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total><content type="html">From the New York Times:

"Retailing is emerging as a real-world incubator for testing how computer firepower and smart software can be applied to social science — in this case, how variables like household economics and human behavior affect shopping."
http://www.nytimes.com/2010/01/03/business/03unboxed.html
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/BcxA4IuXErk" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/01/data-explosion-remakes-retailing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUUCQnYzeCp7ImA9Wx9XEUo.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-5882826172891170860</id><published>2011-01-03T19:26:00.000-08:00</published><updated>2011-01-04T13:41:03.880-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-01-04T13:41:03.880-08:00</app:edited><title>Computers That Trade on the News</title><link rel="related" href="http://www.nytimes.com/2010/12/23/business/23trading.html?_r=3" title="Computers That Trade on the News" /><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/5882826172891170860/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2011/01/computers-that-trade-on-news.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5882826172891170860?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5882826172891170860?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/qBa-Ujwb68E/computers-that-trade-on-news.html" title="Computers That Trade on the News" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><content type="html">I missed this one around Christmas time...must have been the drive up to Prince George.

In the NY Times:

The number-crunchers on Wall Street are starting to crunch something else: the news.

Math-loving traders are using powerful computers to speed-read news reports, editorials, company Web sites, blog posts and even Twitter messages — and then letting the machines decide what it all means for 
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/qBa-Ujwb68E" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2011/01/computers-that-trade-on-news.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEECQ38-eyp7ImA9Wx9QFUU.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-6060832155594508169</id><published>2010-12-27T17:22:00.000-08:00</published><updated>2010-12-28T17:37:42.153-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-12-28T17:37:42.153-08:00</app:edited><title>10,000 views on my Youtube videos, 5,000 on my blog. Thanks everyone!</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/6060832155594508169/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2010/12/10000-views-on-my-youtube-videos-5000.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6060832155594508169?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/6060832155594508169?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/nOlIzCwX9no/10000-views-on-my-youtube-videos-5000.html" title="10,000 views on my Youtube videos, 5,000 on my blog. Thanks everyone!" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><content type="html">Blown away with the number of visitors!

While I'm here, here's a good article from Wired magazine about AI:

The A.I. Revolution Is On
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/nOlIzCwX9no" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2010/12/10000-views-on-my-youtube-videos-5000.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEINQ3wzeSp7ImA9Wx9RFU8.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-8817141448242410462</id><published>2010-12-16T11:09:00.000-08:00</published><updated>2010-12-16T11:09:52.281-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-12-16T11:09:52.281-08:00</app:edited><title>Next video series: Web crawling and scraping</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/8817141448242410462/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2010/12/next-video-series-web-crawling-and.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/8817141448242410462?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/8817141448242410462?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/rFO13UHNnjM/next-video-series-web-crawling-and.html" title="Next video series: Web crawling and scraping" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><content type="html">I'll be working on a video series on web crawling and scraping over Christmas, for release at the end of December or the first week of January at the latest. 

Web crawling and social network analysis were neck and neck in the poll, with social slightly ahead. I am going to do web crawling first anyway: I'm working on a web crawling project, so it will be fresher in my mind.
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/rFO13UHNnjM" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2010/12/next-video-series-web-crawling-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0ACQns6eip7ImA9Wx9RE0o.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-5834785606712444333</id><published>2010-12-14T16:09:00.000-08:00</published><updated>2010-12-14T16:09:23.512-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-12-14T16:09:23.512-08:00</app:edited><title>How to Filter By Value in RapidMiner</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/5834785606712444333/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2010/12/how-to-filter-by-value-in-rapidminer.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5834785606712444333?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/5834785606712444333?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/BBzN_eGq9JI/how-to-filter-by-value-in-rapidminer.html" title="How to Filter By Value in RapidMiner" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">Use the Filter Examples operator on an exampleset (Data Transformation &amp;gt; Filtering)

Set condition class to attribute_value_filter.

Set parameter string to a condition of the form attribute=value.

example:

Category=healthcare

or

Category=customer service|healthcare

Both attribute and value are case-sensitive
Spaces are allowed
Use the | character for the "or" operator.
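
To make the matching rules concrete, here is a minimal Python sketch that mimics them on a list of rows. It is only an emulation of the behaviour described above, not RapidMiner's implementation; parse_condition, filter_examples, and the sample rows are invented for illustration.

```python
def parse_condition(parameter_string):
    # Split "attribute=value1|value2" at the first "=" only.
    attribute, _, values = parameter_string.partition("=")
    # The | character separates alternative values ("or").
    return attribute, set(values.split("|"))


def filter_examples(examples, parameter_string):
    attribute, allowed = parse_condition(parameter_string)
    # Exact comparison, so both attribute name and value are case-sensitive.
    return [row for row in examples if row.get(attribute) in allowed]


examples = [
    {"Category": "healthcare", "Title": "A"},
    {"Category": "customer service", "Title": "B"},
    {"Category": "Healthcare", "Title": "C"},
    {"Category": "retail", "Title": "D"},
]

kept = filter_examples(examples, "Category=customer service|healthcare")
titles = [row["Title"] for row in kept]
print(titles)
```

Row C is dropped because "Healthcare" differs in case from "healthcare", and row B is kept even though its value contains a space.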

&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/BBzN_eGq9JI" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2010/12/how-to-filter-by-value-in-rapidminer.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0UMRHs5fCp7ImA9Wx9RE0s.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-18800208021422659</id><published>2010-12-14T15:25:00.000-08:00</published><updated>2010-12-14T15:28:05.524-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-12-14T15:28:05.524-08:00</app:edited><title>Custom stemming dictionary</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/18800208021422659/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2010/12/custom-stemming-dictionary.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/18800208021422659?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/18800208021422659?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/K0cN_1S8ajY/custom-stemming-dictionary.html" title="Custom stemming dictionary" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">You can create your own stemming dictionary in RapidMiner. 

Add the Text Processing -&amp;gt; Stemming -&amp;gt; Stem (Dictionary) operator, and choose your dictionary file (plain text). 

Each line of the file should look like this:

stem:inflection
stem:inflection

example:

fish:fished

will turn fished into fish.

You can also use wildcards:

fish:fish.*

will turn fished, fishes, fishing or anything beginning with 
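
For readers who want to try the idea outside RapidMiner, here is a minimal Python sketch of a dictionary stemmer driven by stem:inflection lines. It assumes the right-hand side is a regular expression that must match the whole token, which is a guess about the semantics rather than something taken from RapidMiner; load_rules and stem_token are hypothetical helper names.

```python
import re


def load_rules(lines):
    # Each non-empty line has the form "stem:pattern".
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stem, _, pattern = line.partition(":")
        rules.append((stem, re.compile(pattern)))
    return rules


def stem_token(token, rules):
    # The first rule whose pattern matches the whole token wins.
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    # Tokens without a matching rule are left unchanged.
    return token


rules = load_rules(["fish:fish.*", "run:running"])
stemmed = [stem_token(t, rules) for t in ["fished", "fishing", "running", "runs"]]
print(stemmed)
```

Here "runs" is left alone because the only rule for "run" lists the literal inflection "running" and no wildcard.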
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/K0cN_1S8ajY" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2010/12/custom-stemming-dictionary.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0QNQXc9cSp7ImA9Wx9RE0s.&quot;"><id>tag:blogger.com,1999:blog-2523819181563716059.post-7263068348296461187</id><published>2010-12-14T15:09:00.000-08:00</published><updated>2010-12-14T15:29:50.969-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-12-14T15:29:50.969-08:00</app:edited><title>A regular expression to find "word A near word B" in RapidMiner</title><link rel="replies" type="application/atom+xml" href="http://vancouverdata.blogspot.com/feeds/7263068348296461187/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://vancouverdata.blogspot.com/2010/12/regular-expression-to-find-word-near.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/7263068348296461187?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2523819181563716059/posts/default/7263068348296461187?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/VancouverData/~3/k6rmmOIgLV8/regular-expression-to-find-word-near.html" title="A regular expression to find &quot;word A near word B&quot; in RapidMiner" /><author><name>Neil McGuigan</name><uri>http://www.blogger.com/profile/14122981831780837323</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><content type="html">You can use the Text Processing-&amp;gt;Extract Information operator to match regular expressions. 

If you put the Extract Information operator inside a Process Documents operator, it will add a column to your dataset with the results of the match. Turn on the "add meta information" option on the Process Documents operator. 

Here's a simple regular expression to find a word near another word:

(word1\W+(?
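
The expression above is cut off in this feed, so the following is not the author's pattern. It is a minimal Python sketch of one common way to write a proximity match: the first word, then a bounded number of intervening words, then the second word. The near_pattern helper and its max_gap parameter are names chosen for illustration, and the sketch only covers the first word appearing before the second.

```python
import re


def near_pattern(word_a, word_b, max_gap=5):
    # word_a, then at most max_gap intervening words, then word_b.
    gap = r"\W+(?:\w+\W+){0," + str(max_gap) + r"}?"
    return re.compile(
        r"\b" + re.escape(word_a) + gap + re.escape(word_b) + r"\b",
        re.IGNORECASE,
    )


pattern = near_pattern("data", "mining", max_gap=3)

# Three words sit between "Data" and "mining", so this matches.
hit = pattern.search("Data from websites needs mining before analysis.")

# Nine words sit between the two, so this does not match.
miss = pattern.search("Data that is spread over far too many pages resists mining.")

print(hit.group(0) if hit else None, miss)
```

The constructs used here (word boundaries, non-capturing groups, bounded lazy repetition) are written the same way in Java regular expressions, so the compiled pattern string should carry over, though that has not been checked inside RapidMiner.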
&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/VancouverData/~4/k6rmmOIgLV8" height="1" width="1"/&gt;</content><feedburner:origLink>http://vancouverdata.blogspot.com/2010/12/regular-expression-to-find-word-near.html</feedburner:origLink></entry></feed>

