<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:series="http://unfoldingneurons.com/" version="2.0">
<channel>
	<title>Comments for mgm technology blog</title>
	
	<link>http://blog.mgm-tp.com</link>
	<description>We discuss software innovation</description>
	<lastBuildDate>Wed, 01 Feb 2012 12:42:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MgmTechBlogComments" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="mgmtechblogcomments" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">MgmTechBlogComments</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Comment on Ultra-Performant Dynamic Websites with Varnish by Dr. Christian Winkler</title>
		<link>http://blog.mgm-tp.com/2012/01/varnish-web-cache/comment-page-1/#comment-3121</link>
		<dc:creator>Dr. Christian Winkler</dc:creator>
		<pubDate>Wed, 01 Feb 2012 12:42:30 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1059#comment-3121</guid>
		<description>Very good question, Roelof.

In fact, regionalization via Postleitzahl (postal code) is handled purely on the client side via a cookie which is evaluated in JavaScript. This cookie contains only information about the address of the store and no other personalized information. It is removed in vcl_recv.

Regionalization of the product line and product attributes is handled differently, by a separate cookie which contains the region of the user (there are only a few regions). Varnish uses this cookie to cache separate versions of the page.</description>
		<content:encoded><![CDATA[<p>Very good question, Roelof.</p>
<p>In fact, regionalization via Postleitzahl (postal code) is handled purely on the client side via a cookie which is evaluated in JavaScript. This cookie contains only information about the address of the store and no other personalized information. It is removed in vcl_recv.</p>
<p>Regionalization of the product line and product attributes is handled differently, by a separate cookie which contains the region of the user (there are only a few regions). Varnish uses this cookie to cache separate versions of the page.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Ultra-Performant Dynamic Websites with Varnish by Roelof</title>
		<link>http://blog.mgm-tp.com/2012/01/varnish-web-cache/comment-page-1/#comment-3120</link>
		<dc:creator>Roelof</dc:creator>
		<pubDate>Wed, 01 Feb 2012 12:18:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1059#comment-3120</guid>
		<description>Thanks. Nice article. I'm wondering how personalization with respect to "Postleitzahl" is handled. This would still leave cacheable pages with regional differentiation. Does that mean the "Postleitzahl" cookie is not removed in vcl_recv? And only for a selection of pages?</description>
		<content:encoded><![CDATA[<p>Thanks. Nice article. I&#8217;m wondering how personalization with respect to &#8220;Postleitzahl&#8221; is handled. This would still leave cacheable pages with regional differentiation. Does that mean the &#8220;Postleitzahl&#8221; cookie is not removed in vcl_recv? And only for a selection of pages?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-3097</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Tue, 17 Jan 2012 15:40:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-3097</guid>
		<description>We are using servers with a single Intel Xeon X5540 CPU, 8 to 16 GB RAM and 2 x 300 GB hard drives.

These are quite low-spec Hadoop servers. Our main bottleneck is RAM, so we are in the process of upgrading the cluster from 8 GB to 16 GB.</description>
		<content:encoded><![CDATA[<p>We are using servers with a single Intel Xeon X5540 CPU, 8 to 16 GB RAM and 2 x 300 GB hard drives.</p>
<p>These are quite low-spec Hadoop servers. Our main bottleneck is RAM, so we are in the process of upgrading the cluster from 8 GB to 16 GB.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Aryan</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-3096</link>
		<dc:creator>Aryan</dc:creator>
		<pubDate>Tue, 10 Jan 2012 09:20:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-3096</guid>
		<description>Hi Peter,

May I know the node configuration you are using, i.e. the RAM, disk space and any other specs?</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>May I know the node configuration you are using, i.e. the RAM, disk space and any other specs?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Getting the most out of LiquiBase by Tomek Kaczanowski</title>
		<link>http://blog.mgm-tp.com/2011/04/data-modeling-part3/comment-page-1/#comment-3095</link>
		<dc:creator>Tomek Kaczanowski</dc:creator>
		<pubDate>Mon, 09 Jan 2012 08:21:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=887#comment-3095</guid>
		<description>I'm in the middle of planning how to use Liquibase for our deployment process, and found this blog post interesting. Thank you for sharing!</description>
		<content:encoded><![CDATA[<p>I&#8217;m in the middle of planning how to use Liquibase for our deployment process, and found this blog post interesting. Thank you for sharing!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-3055</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Thu, 15 Dec 2011 10:04:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-3055</guid>
		<description>Hi Peter,

I am afraid I cannot give you a good answer to your questions. Keeping an index of 30 billion documents in Solr is definitely a challenge. I guess you will need to use something like Solr Cloud. It might or might not work out. This really depends on your index schema.

The only thing I can say is that Solr is able to index 10 million documents per hour. Our highest peak reached 26 million documents in one hour in a single index, although the query time was really bad during this peak.

I would really recommend building a prototype with a single Solr index and trying to find the limits of a single index for your indexing schema and data. From there you can try to scale through multiple index shards.

Regards,
  Peter</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>I am afraid I cannot give you a good answer to your questions. Keeping an index of 30 billion documents in Solr is definitely a challenge. I guess you will need to use something like Solr Cloud. It might or might not work out. This really depends on your index schema.</p>
<p>The only thing I can say is that Solr is able to index 10 million documents per hour. Our highest peak reached 26 million documents in one hour in a single index, although the query time was really bad during this peak.</p>
<p>I would really recommend building a prototype with a single Solr index and trying to find the limits of a single index for your indexing schema and data. From there you can try to scale through multiple index shards.</p>
<p>Regards,<br />
  Peter</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Peter Lu</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-3048</link>
		<dc:creator>Peter Lu</dc:creator>
		<pubDate>Wed, 14 Dec 2011 20:49:47 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-3048</guid>
		<description>Hi Peter,

Great post! Thank you.
I am looking to build an index for transactional data (carts). The source data is in multiple Oracle DBs and, at peak, will be created at up to 10 million rows per hour.

I need the index to a) be kept up to date in near real time, b) be queryable on 10 attributes and return a key and pointer to the original RDBMS instance, and c) keep enough history data to amount to about 30 billion rows.

It looks like Solr is one option, but I am concerned about manageability and the near-real-time updating. What are your thoughts on using an RDBMS table for the index? Is there any other option I should consider?

Thank You,
Peter</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>Great post! Thank you.<br />
I am looking to build an index for transactional data (carts). The source data is in multiple Oracle DBs and, at peak, will be created at up to 10 million rows per hour.</p>
<p>I need the index to a) be kept up to date in near real time, b) be queryable on 10 attributes and return a key and pointer to the original RDBMS instance, and c) keep enough history data to amount to about 30 billion rows.</p>
<p>It looks like Solr is one option, but I am concerned about manageability and the near-real-time updating. What are your thoughts on using an RDBMS table for the index? Is there any other option I should consider?</p>
<p>Thank You,<br />
Peter</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Test-driving XForms with Orbeon by Wade English</title>
		<link>http://blog.mgm-tp.com/2010/09/rethinking-web-forms-xforms-part2/comment-page-1/#comment-2905</link>
		<dc:creator>Wade English</dc:creator>
		<pubDate>Tue, 29 Nov 2011 16:43:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=673#comment-2905</guid>
		<description>Great post!  I'm interested in the custom JSP tag that you had to write for including Orbeon forms in your JSPs. Would it be possible to share the code?</description>
		<content:encoded><![CDATA[<p>Great post!  I&#8217;m interested in the custom JSP tag that you had to write for including Orbeon forms in your JSPs. Would it be possible to share the code?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Realtime Search for Hadoop by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/06/hadoop-log-management-part3/comment-page-1/#comment-2755</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Mon, 14 Nov 2011 10:39:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=572#comment-2755</guid>
		<description>Lucene cannot work with HDFS directly. The index files need to be located on the local HDD. We store only index backups in HDFS.</description>
		<content:encoded><![CDATA[<p>Lucene cannot work with HDFS directly. The index files need to be located on the local HDD. We store only index backups in HDFS.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Realtime Search for Hadoop by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/06/hadoop-log-management-part3/comment-page-1/#comment-2754</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Mon, 14 Nov 2011 10:38:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=572#comment-2754</guid>
		<description>I can try, if you have specific questions.</description>
		<content:encoded><![CDATA[<p>I can try, if you have specific questions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-2753</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Mon, 14 Nov 2011 10:36:33 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-2753</guid>
		<description>It depends on the size of your search index. If the index fits into one core, I would recommend using a dedicated Solr server separated from the Hadoop cluster.

If, on the other hand, the index is too large for a single core and you need some kind of sharding, you might be able to reuse your cluster for Lucene / Solr as well. But first you need to evaluate how your Hadoop cluster is used. If the cluster is also heavily used for Map/Reduce jobs, you will not have enough resources for Solr.

Bottom line: If your Hadoop cluster is primarily used for storage and has only a light Map/Reduce load, you can reuse it for running Solr. In all other cases you are better off with a separate Solr cluster.</description>
		<content:encoded><![CDATA[<p>It depends on the size of your search index. If the index fits into one core, I would recommend using a dedicated Solr server separated from the Hadoop cluster.</p>
<p>If, on the other hand, the index is too large for a single core and you need some kind of sharding, you might be able to reuse your cluster for Lucene / Solr as well. But first you need to evaluate how your Hadoop cluster is used. If the cluster is also heavily used for Map/Reduce jobs, you will not have enough resources for Solr.</p>
<p>Bottom line: If your Hadoop cluster is primarily used for storage and has only a light Map/Reduce load, you can reuse it for running Solr. In all other cases you are better off with a separate Solr cluster.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by Peter Dikant</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-2751</link>
		<dc:creator>Peter Dikant</dc:creator>
		<pubDate>Mon, 14 Nov 2011 10:30:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-2751</guid>
		<description>Hi Ken,

Yes, we evaluated Katta when we started research on this project. At that time we could not get it to run stably. Please keep in mind that this was at the beginning of 2009. We filed bug reports for some of the issues we had and they have been fixed, but we decided at that time that we felt more comfortable with our own solution.

Best regards,
  Peter</description>
		<content:encoded><![CDATA[<p>Hi Ken,</p>
<p>Yes, we evaluated Katta when we started research on this project. At that time we could not get it to run stably. Please keep in mind that this was at the beginning of 2009. We filed bug reports for some of the issues we had and they have been fixed, but we decided at that time that we felt more comfortable with our own solution.</p>
<p>Best regards,<br />
  Peter</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Managing distributed Solr Servers by steve pauley</title>
		<link>http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/comment-page-1/#comment-2742</link>
		<dc:creator>steve pauley</dc:creator>
		<pubDate>Wed, 09 Nov 2011 23:54:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=615#comment-2742</guid>
		<description>We are planning a 10-node Hadoop cluster. We plan to use Lucene/Solr for search. I understand that one creates an MR job to extract the data from Hadoop that users would want to search on and feeds that data to Lucene to index.

What is the best way to set up Lucene / Solr in a Hadoop environment?

1. Should Lucene / Solr have their own servers and storage?
2. Is it best to have Lucene / Solr implemented in the same Hadoop cluster, with machines in the same rack as the Hadoop nodes?

Do you have any exemplar architecture diagrams showing a logical or a physical implementation of Lucene/Solr with Hadoop?

Thanks</description>
		<content:encoded><![CDATA[<p>We are planning a 10-node Hadoop cluster. We plan to use Lucene/Solr for search. I understand that one creates an MR job to extract the data from Hadoop that users would want to search on and feeds that data to Lucene to index.</p>
<p>What is the best way to set up Lucene / Solr in a Hadoop environment?</p>
<p>1. Should Lucene / Solr have their own servers and storage?<br />
2. Is it best to have Lucene / Solr implemented in the same Hadoop cluster, with machines in the same rack as the Hadoop nodes?</p>
<p>Do you have any exemplar architecture diagrams showing a logical or a physical implementation of Lucene/Solr with Hadoop?</p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why XForms might be a Winner by Ryan R</title>
		<link>http://blog.mgm-tp.com/2010/03/rethinking-web-forms-xforms-part1/comment-page-1/#comment-2706</link>
		<dc:creator>Ryan R</dc:creator>
		<pubDate>Wed, 02 Nov 2011 21:35:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=21#comment-2706</guid>
		<description>I agree that the HTML forms are too cumbersome, while the Java forms are too complicated.  Let's hope this bridges the gap.</description>
		<content:encoded><![CDATA[<p>I agree that the HTML forms are too cumbersome, while the Java forms are too complicated.  Let&#8217;s hope this bridges the gap.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Realtime Search for Hadoop by tiru</title>
		<link>http://blog.mgm-tp.com/2010/06/hadoop-log-management-part3/comment-page-1/#comment-2705</link>
		<dc:creator>tiru</dc:creator>
		<pubDate>Wed, 02 Nov 2011 05:59:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=572#comment-2705</guid>
		<description>Hi All,
How can I configure Lucene with Hadoop?
It would be great if anyone could provide steps to configure Lucene with Hadoop HDFS.

cheers
tiru</description>
		<content:encoded><![CDATA[<p>Hi All,<br />
How can I configure Lucene with Hadoop?<br />
It would be great if anyone could provide steps to configure Lucene with Hadoop HDFS.</p>
<p>cheers<br />
tiru</p>
]]></content:encoded>
	</item>
</channel>
</rss>

