<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Tim Griffin</title>
	
	<link>http://www.timgriffin.com</link>
	<description>Developer &amp; search marketer</description>
	<lastBuildDate>Sat, 02 Jan 2010 00:32:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/timgriffin" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="timgriffin" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Tracking Google index performance with XML sitemaps</title>
		<link>http://www.timgriffin.com/google-index-performance-sitemaps-xml/</link>
		<comments>http://www.timgriffin.com/google-index-performance-sitemaps-xml/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 00:28:28 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Google Webmaster Tools]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.timgriffin.com/?p=132</guid>
		<description><![CDATA[I recently updated a script that generates sitemap XMLs for a web site I run.  It&#8217;s a local business review directory with hundreds of thousands of pages, but with only around 150k in the Google index.
Like any database driven site, those hundreds of thousands of pages can be boiled down to a few distinct [...]]]></description>
			<content:encoded><![CDATA[<p>I recently updated a script that generates sitemap XMLs for a web site I run.  It&#8217;s a local business review directory with hundreds of thousands of pages, but with only around 150k in the Google index.</p>
<p>Like any database-driven site, those hundreds of thousands of pages boil down to a few distinct templates. For me they are:</p>
<ol>
<li>Home page</li>
<li>&#8220;Article&#8221; pages (FAQs, online help, about us, contact us, etc.)</li>
<li>Locality page (e.g. best rated Sydney metro businesses)</li>
<li>Industry + locality page (e.g. Sydney hairdressers)</li>
<li>Business listing page (e.g. Toni &#038; Guy Bondi Beach)</li>
</ol>
<p>The bottom three templates account for the most unique pages (business listings most of all, then industry + locality pages, then locality pages) and make up 99% of organic traffic to the site.</p>
<p>Currently, my URL structure is pretty systematic, so using Google search filters such as &#8220;site:abc.com inurl:business_listing&#8221; I am able to get the total number of indexed pages on a template-by-template basis.</p>
<p>However, in the next few months I intend to improve some URLs from, say, <code>/business-listing/toni-guy-bondi-beach/12345-54321.html</code> to simply <code>/toni-guy-bondi-beach/</code>, which will make it impossible to track total indexed pages using my current method.</p>
<p>But if you&#8217;re in the same position there is a solution, which is so simple that a post dedicated to it feels like overkill.  Because it&#8217;s a database-driven site, I store all the page aliases (e.g. &#8220;toni-guy-bondi-beach&#8221;) in my locality, industry and business listing database tables.</p>
<p>This allows me to update my script to give each template its own sitemap. Google Webmaster Tools will then show you the total number of URLs in each sitemap vs. the number indexed:</p>
<p><img src="http://www.timgriffin.com/wp-content/uploads/2010/01/Screen-shot-2010-01-02-at-10.37.57-AM.png" alt="" title="Google Webmaster Tools sitemaps screenshot" width="649" height="123" class="alignnone size-full wp-image-131" /></p>
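<p>As a rough sketch of the idea (not the actual script behind the site), one sitemap per template could be generated along these lines. The function names, the <code>aliases_by_template</code> dictionary and the <code>/alias/</code> URL pattern are assumptions made for illustration; in practice the aliases would come from the database tables mentioned above.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def sitemaps_by_template(aliases_by_template, base_url):
    """Build one sitemap per template.

    aliases_by_template is a hypothetical mapping such as
    {"listing": ["toni-guy-bondi-beach"]}; the /alias/ URL pattern
    is an assumption, not the site's confirmed structure.
    """
    result = {}
    for template, aliases in aliases_by_template.items():
        urls = ["%s/%s/" % (base_url, alias) for alias in aliases]
        result[template] = build_sitemap(urls)
    return result
```

<p>Each returned string would be written to its own file and submitted separately. The sitemap protocol caps a single file at 50,000 URLs, so a template with more pages than that would need to be split across several files.</p>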
<p>By dividing the number indexed by the number submitted, I can easily see which templates aren&#8217;t performing as well, then look for factors that might be causing those pages to be considered duplicate content, or for an internal linking issue, etc.</p>
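<p>The division itself is trivial, but for many sitemaps a small helper can rank templates from worst to best. This is only a sketch: the function names are made up, and the counts would have to be copied by hand from the Webmaster Tools sitemaps screen.</p>

```python
def index_ratios(sitemap_counts):
    """Map each template to indexed / submitted.

    sitemap_counts maps a template name to a (submitted, indexed) pair.
    """
    ratios = {}
    for template, (submitted, indexed) in sitemap_counts.items():
        # Guard against an empty sitemap to avoid dividing by zero.
        ratios[template] = indexed / submitted if submitted else 0.0
    return ratios


def worst_first(sitemap_counts):
    """Return template names ordered from lowest to highest index ratio."""
    ratios = index_ratios(sitemap_counts)
    return sorted(ratios, key=ratios.get)
```

<p>The templates at the front of the list returned by <code>worst_first</code> are the ones to investigate first.</p>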
<p>If you&#8217;re interested in learning more about large site indexation, there is an SEOmoz post by Rand on <a href="http://www.seomoz.org/blog/googles-indexation-cap">Google&#8217;s indexation cap</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.timgriffin.com/google-index-performance-sitemaps-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>301 redirects in a large site migration</title>
		<link>http://www.timgriffin.com/301-redirects-large-site-migration/</link>
		<comments>http://www.timgriffin.com/301-redirects-large-site-migration/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 14:17:32 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[htaccess]]></category>

		<guid isPermaLink="false">http://www.timgriffin.com/?p=117</guid>
		<description><![CDATA[Today I migrated one of my sites which has around 150k pages in the Google index and thought I&#8217;d share a relatively quick and easy way to check the migration went smoothly.
Migrating to a new platform or server is always a risky time for any site that relies on organic traffic.  There&#8217;s a big [...]]]></description>
			<content:encoded><![CDATA[<p>Today I migrated one of my sites which has around 150k pages in the Google index and thought I&#8217;d share a relatively quick and easy way to check the migration went smoothly.</p>
<p>Migrating to a new platform or server is always a risky time for any site that relies on organic traffic.  There&#8217;s a big risk of pages going missing and redirects not working properly.  </p>
<p>The new site featured a new design plus a different CMS platform but essentially had an identical URL structure, so ensuring existing URLs still worked was the primary goal.</p>
<p>I wanted to do the following on both the <strong>staging site and post-migration on the live site</strong>:</p>
<ol>
<li>Get the URLs indexed by search engines</li>
<li>Batch test the redirects</li>
</ol>
<h3>Indexed URLs</h3>
<p>On a database driven site with thousands of pages, it&#8217;s not always possible to get a complete list of possible URLs, so we need to prioritise the URLs that search engines are aware of.</p>
<p>For smaller sites (under 1,000 pages), <a href="http://gsitecrawler.com/">GSiteCrawler</a> does a reasonable job.  The downsides are that it puts unnecessary load on your web server and that I find it crashes on larger sites.</p>
<p>My preferred method is to get the list from a search engine index.  Grabbing index data from the major engines can be a hassle, though: scraping them is cumbersome, and sooner or later you get thrown a <a href="http://www.captcha.net/">captcha</a>.</p>
<p>I prefer to use <a href="https://www.majesticseo.com/">Majestic SEO</a> which provides data from a smaller search engine they run.  It uses similar crawl algorithms to Google, so it&#8217;s going to be a very similar dataset, and best of all it&#8217;s free to use on your own site.</p>
<p>Once you&#8217;ve validated your site, go to Domain URLs > Download All and all the URLs you&#8217;ll need to redirect will be in the first column.</p>
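<p>Pulling that first column out of the download can be scripted. The sketch below assumes the export is a comma-separated file with a single header row, which may not match the real file exactly; the function name is made up for illustration.</p>

```python
import csv
import io


def urls_from_export(csv_text, has_header=True):
    """Return the non-empty values from the first column of a CSV export.

    Assumes a comma-separated export with one header row; adjust
    has_header if the real file differs.
    """
    rows = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(rows, None)  # skip the header row if present
    return [row[0].strip() for row in rows if row and row[0].strip()]
```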
<p>Note: I recommend against using the <a href="http://www.sitemaps.org">sitemap XML</a> as it&#8217;s likely to be an incomplete picture.</p>
<h3>Batch testing URLs</h3>
<p>When migrating a site, the errors you don&#8217;t want to see from previously working URLs are 404 Not Found, 401 Unauthorized and 500 Internal Server Error.</p>
<p>I was using a sub-domain for the staging site, so once I had my list of URLs all I needed to do was search and replace &#8220;http://www.&#8221; with &#8220;http://dev.&#8221; in Excel, then pick a good cross section of URLs and run it through an HTTP header checker.</p>
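<p>The same search and replace can be done in a few lines of code instead of Excel. In this sketch the function names are illustrative, and the sample is purely random, which is only a stand-in for a hand-picked cross section of templates.</p>

```python
import random


def to_staging(url, live_prefix="http://www.", staging_prefix="http://dev."):
    """Swap the live host prefix for the staging one; leave other URLs alone."""
    if url.startswith(live_prefix):
        return staging_prefix + url[len(live_prefix):]
    return url


def staging_sample(urls, size, seed=0):
    """Return up to `size` staging URLs chosen at random from the list."""
    staged = [to_staging(url) for url in urls]
    rng = random.Random(seed)  # fixed seed so the same sample can be re-tested
    return rng.sample(staged, min(size, len(staged)))
```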
<p>I ran 500 URLs at a time through my own <a href="http://www.timgriffin.com/free-seo-tools/batch-http-header-check/">batch HTTP header checker</a> and fixed up any pesky 404s I found.</p>
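<p>For anyone who would rather script the check, here is a minimal sketch using only the Python standard library. It is not how the tool linked above works; the names are made up, and it assumes the server answers HEAD requests with the same status it would give a GET.</p>

```python
import urllib.error
import urllib.request

# Statuses a previously working URL should never return after a migration.
BAD_STATUSES = {401, 404, 500}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 301/302 codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head_status(url, timeout=10):
    """Send a HEAD request and return the status code, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 3xx (when not followed), 4xx and 5xx; the code is what we want.
        return error.code


def find_broken(urls, fetch_status=head_status):
    """Return (url, status) pairs for every URL that returns a bad status."""
    broken = []
    for url in urls:
        status = fetch_status(url)
        if status in BAD_STATUSES:
            broken.append((url, status))
    return broken
```

<p>Passing the status lookup in as <code>fetch_status</code> keeps the filtering logic testable without touching the network.</p>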
<p>Post migration, I picked another set of URLs to test and again got positive results.  To be 100% sure, I will be logging into <a href="https://www.google.com/webmasters/tools/">Google Webmaster Tools</a> tomorrow morning to check for 404s.</p>
<p>Good luck with your site migration!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.timgriffin.com/301-redirects-large-site-migration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Analytics &amp; AdWords: 301s not passing gclid</title>
		<link>http://www.timgriffin.com/301s-not-passing-gclid/</link>
		<comments>http://www.timgriffin.com/301s-not-passing-gclid/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 08:43:00 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[AdWords]]></category>
		<category><![CDATA[SEM]]></category>
		<category><![CDATA[htaccess]]></category>

		<guid isPermaLink="false">http://www.timgriffin.com/?p=78</guid>
		<description><![CDATA[A good webmaster will implement 301 or 302 redirects for legacy URLs, but this often results in paid search traffic being mis-categorised.  Whilst your destination URLs will still end up at the expected page, the visit will often be tagged as organic traffic.
Specifically, when you link AdWords and Analytics, Google will append a gclid to destination [...]]]></description>
			<content:encoded><![CDATA[<p>A good webmaster will implement 301 or 302 redirects for legacy URLs, but this often results in paid search traffic being mis-categorised.  Whilst your destination URLs will still end up at the expected page, the visit will often be tagged as organic traffic.</p>
<p>Specifically, when you link AdWords and Analytics, Google will append a <code>gclid</code> to destination URLs which Google uses to match keywords, campaigns and other dimensions to give you richer data in GA.</p>
<div class="ui-state-default ui-corner-all" style="padding: 10px; margin-bottom: 10px;"><strong>e.g.</strong> <code>http://you.com/widgets/?gclid=CYC0uITG354CFYwtpAodiFBdJX</code></div>
<p>However, most webmasters won&#8217;t implement the right directives to pass on the <a href="http://en.wikipedia.org/wiki/Query_string">query string</a>, so when a user clicks an ad whose destination URL gets redirected, the gclid is lost in the redirection.</p>
<h3>Are my redirects AdWords friendly?</h3>
<p>There&#8217;s a very simple way to check if your redirect is AdWords and Analytics friendly: go to the old destination URL with &#8220;?gclid=testing&#8221; appended on the end.  If it redirects to the new URL and keeps the gclid on the end, you don&#8217;t need to be on the phone with your webmaster!</p>
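<p>If there are many old destination URLs to check, the comparison can be automated. The helper below is a sketch with a made-up name: it takes the URL that was requested and the <code>Location</code> header the server sent back, and reports whether the gclid survived. Fetching the header is left out.</p>

```python
from urllib.parse import parse_qs, urlsplit


def keeps_gclid(requested_url, location_header):
    """Return True when the redirect target carries the same gclid as the request."""
    sent = parse_qs(urlsplit(requested_url).query).get("gclid")
    received = parse_qs(urlsplit(location_header).query).get("gclid")
    return sent is not None and sent == received
```

<p>For example, <code>keeps_gclid("http://you.com/widgets/?gclid=testing", "http://you.com/new-widgets/")</code> returns <code>False</code>, which is the case that needs fixing.</p>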
<h3>Fixing with Apache &#038; .htaccess</h3>
<p>If the above tip didn&#8217;t pass on the gclid for you, the good news is there&#8217;s an easy way to fix this in Apache:</p>
<div class="ui-state-default ui-corner-all" style="padding: 10px; margin-bottom: 10px;"><code># incorrect<br />
RewriteRule ^widgets/?$ /new-widgets/<br />
<br />
# correct<br />
RewriteRule ^widgets/?$ /new-widgets/?%{QUERY_STRING}</code></div>
<h3>Further reading on .htaccess</h3>
<ul>
<li><a href="http://www.timgriffin.com/free-seo-tools/batch-http-header-check/">Batch HTTP header check</a></li>
<li><a href="http://www.askapache.com/htaccess/mod_rewrite-tips-and-tricks.html">Advanced .htaccess tips &#038; tricks</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.timgriffin.com/301s-not-passing-gclid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

