<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
	<title>Comments for ramblings of the village idiot</title>
	
	<link>http://www.natecarlson.com</link>
	<description>All geek, most of the time</description>
	<lastBuildDate>Tue, 09 Mar 2010 16:41:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/natecarlson-comments" /><feedburner:info uri="natecarlson-comments" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Comment on Some small wins.. by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/q9P11HggU5M/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Tue, 09 Mar 2010 16:41:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=892#comment-311688</guid>
		<description>I still believe in caffeine - just alternate sources of caffeine.  ;)</description>
		<content:encoded><![CDATA[<p>I still believe in caffeine &#8211; just alternate sources of caffeine.  ;)</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/q9P11HggU5M" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/03/09/some-small-wins/comment-page-1/#comment-311688</feedburner:origLink></item>
	<item>
		<title>Comment on Some small wins.. by CVelo</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/_y9O9t6wMPo/</link>
		<dc:creator>CVelo</dc:creator>
		<pubDate>Tue, 09 Mar 2010 16:40:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=892#comment-311687</guid>
		<description>Mucho congrats on the progress on the caffeine front. ;-)
I always struggle with that ;-)</description>
		<content:encoded><![CDATA[<p>Mucho congrats on the progress on the caffeine front. ;-)<br />
I always struggle with that ;-)</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/_y9O9t6wMPo" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/03/09/some-small-wins/comment-page-1/#comment-311687</feedburner:origLink></item>
	<item>
		<title>Comment on Help! Health insurance – Blue Cross PPO, or Blue Cross HSA? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/A5Pekpq1ZjQ/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Sun, 07 Mar 2010 01:40:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=271#comment-311686</guid>
		<description>Thanks for the opinion Roland.. I think.  ;)</description>
		<content:encoded><![CDATA[<p>Thanks for the opinion Roland.. I think.  ;)</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/A5Pekpq1ZjQ" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2009/11/06/help-blue-cross-ppo-or-blue-cross-hsa/comment-page-1/#comment-311686</feedburner:origLink></item>
	<item>
		<title>Comment on Help! Health insurance – Blue Cross PPO, or Blue Cross HSA? by Roland</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/vutAkyygwLc/</link>
		<dc:creator>Roland</dc:creator>
		<pubDate>Sat, 06 Mar 2010 13:21:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=271#comment-311685</guid>
		<description>nc you come off as pompous.  Your posts appear to be boastful of your current medical situation and ability to easily cover your deductible.  I'm sure that your arrogance rubs those around you the wrong way, perhaps you should ask them about it if you don't believe me.</description>
		<content:encoded><![CDATA[<p>nc you come off as pompous.  Your posts appear to be boastful of your current medical situation and ability to easily cover your deductible.  I&#8217;m sure that your arrogance rubs those around you the wrong way, perhaps you should ask them about it if you don&#8217;t believe me.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/vutAkyygwLc" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2009/11/06/help-blue-cross-ppo-or-blue-cross-hsa/comment-page-1/#comment-311685</feedburner:origLink></item>
	<item>
		<title>Comment on Types of VPN available on Linux by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/TjoIVTmdi4Q/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Wed, 03 Mar 2010 02:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=294#comment-311683</guid>
		<description>Indeed. Have you tried Openswan or strongSwan, or plain ol' racoon?</description>
		<content:encoded><![CDATA[<p>Indeed. Have you tried Openswan or strongSwan, or plain ol&#8217; racoon?</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/TjoIVTmdi4Q" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2005/11/22/types-of-vpn-available-under-linux/comment-page-1/#comment-311683</feedburner:origLink></item>
	<item>
		<title>Comment on Types of VPN available on Linux by MadManBCN</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/LcA9Gk-eZwA/</link>
		<dc:creator>MadManBCN</dc:creator>
		<pubDate>Wed, 03 Mar 2010 02:37:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=294#comment-311682</guid>
		<description>The problem with FreeS/WAN is that development has stopped.</description>
		<content:encoded><![CDATA[<p>The problem with FreeS/WAN is that development has stopped.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/LcA9Gk-eZwA" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2005/11/22/types-of-vpn-available-under-linux/comment-page-1/#comment-311682</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/ZAzRWLaPt2s/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Fri, 26 Feb 2010 15:47:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311681</guid>
		<description>Releases were current at the time of the issues.

For the rebooting box (DR node), it was running ak-2009.09.01.3.0 when we started working on it, and during the rebooting issue ak-2009.09.01.4.0 was released, and we upgraded to it. It didn't coincide with the reboots stopping.

The prod boxes were running ak-2009.09.01.3.0, and still are - we're basically not touching them until we get somewhere on the cases.

Thanks!</description>
		<content:encoded><![CDATA[<p>Releases were current at the time of the issues.</p>
<p>For the rebooting box (DR node), it was running ak-2009.09.01.3.0 when we started working on it, and during the rebooting issue ak-2009.09.01.4.0 was released, and we upgraded to it. It didn&#8217;t coincide with the reboots stopping.</p>
<p>The prod boxes were running ak-2009.09.01.3.0, and still are &#8211; we&#8217;re basically not touching them until we get somewhere on the cases.</p>
<p>Thanks!</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/ZAzRWLaPt2s" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311681</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Henkis</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/yPAKMO0UKqU/</link>
		<dc:creator>Henkis</dc:creator>
		<pubDate>Fri, 26 Feb 2010 14:37:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311680</guid>
		<description>Which Fishworks releases have you had these problems with? Do you have all the issues on the latest (2009.Q3.4.1) release?</description>
		<content:encoded><![CDATA[<p>Which Fishworks releases have you had these problems with? Do you have all the issues on the latest (2009.Q3.4.1) release?</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/yPAKMO0UKqU" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311680</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Bryan Cantrill</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/Jdho1MbldC4/</link>
		<dc:creator>Bryan Cantrill</dc:creator>
		<pubDate>Fri, 26 Feb 2010 05:11:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311678</guid>
		<description>Nate,

I am sympathetic to your plight, but you are suffering from a bad case of hindsight bias:  there is no reason to believe that a disk failure would induce system failure, and trust me that there is nothing in the logs that they were missing.  The missing piece of data -- and I feel I have said this several times over now -- was the console log that you didn't collect (not your fault, of course -- you weren't told to collect it).  Yes, "modern computers" need to have their console logged; how else is one supposed to debug failures in which no operating system dump is taken?  (And yes, such failures exist -- viz. yours.)  The best guess at the moment (and it's always going to remain a guess for the moment) is that the drive failure was inducing an HBA logic failure (that is, a failure in the HBA firmware itself), and that the HBA failure was in turn inducing both an operating system panic (which is -- or was -- our defined failure mode in such a case) and then (we hypothesize) a dump abort. (On the 7210 -- unlike the 7310 and 7410 -- we rely on the same HBA for both system disks and data disks.) We're guessing here, but your drive is making its way back to our team to see if we can reproduce this in house. One final note: because these HBA logic failures have been a real difficulty for us, we have added logic in our Q1 release (due out in the next few weeks) that will reset the IOC on failure.  So when we get your drive, we may well see the system recover now in a way it wouldn't previously; we'll let you know.</description>
		<content:encoded><![CDATA[<p>Nate,</p>
<p>I am sympathetic to your plight, but you are suffering from a bad case of hindsight bias:  there is no reason to believe that a disk failure would induce system failure, and trust me that there is nothing in the logs that they were missing.  The missing piece of data &#8212; and I feel I have said this several times over now &#8212; was the console log that you didn&#8217;t collect (not your fault, of course &#8212; you weren&#8217;t told to collect it).  Yes, &#8220;modern computers&#8221; need to have their console logged; how else is one supposed to debug failures in which no operating system dump is taken?  (And yes, such failures exist &#8212; viz. yours.)  The best guess at the moment (and it&#8217;s always going to remain a guess for the moment) is that the drive failure was inducing an HBA logic failure (that is, a failure in the HBA firmware itself), and that the HBA failure was in turn inducing both an operating system panic (which is &#8212; or was &#8212; our defined failure mode in such a case) and then (we hypothesize) a dump abort. (On the 7210 &#8212; unlike the 7310 and 7410 &#8212; we rely on the same HBA for both system disks and data disks.) We&#8217;re guessing here, but your drive is making its way back to our team to see if we can reproduce this in house. One final note: because these HBA logic failures have been a real difficulty for us, we have added logic in our Q1 release (due out in the next few weeks) that will reset the IOC on failure.  So when we get your drive, we may well see the system recover now in a way it wouldn&#8217;t previously; we&#8217;ll let you know.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/Jdho1MbldC4" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311678</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Bryan Cantrill</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/l6a0QWj4c2E/</link>
		<dc:creator>Bryan Cantrill</dc:creator>
		<pubDate>Fri, 26 Feb 2010 04:51:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311677</guid>
		<description>Jes,

First, sorry to hear that your experience has been rocky. And yes, all of the teams on the project -- support, development, test -- have been enormously overtaxed.  That, I'm afraid, is the curse of hypergrowth, which is exactly what we have experienced over the past year.  And we have experienced that hypergrowth because we were largely right about one critical point in the enterprise storage industry: customers are sick of paying high rents to a couple of companies.  What we attempted to do to address this was very ambitious: build enterprise-grade storage out of commodity parts.  Fourteen months later, I think we're finally approaching that goal -- but as you have seen, it's been a very rocky path (much rockier than we naively thought it would be).  All of which isn't meant to excuse any problems you've had with the product, but rather to give you an honest assessment of how we got here.

Now, that said:  I am quite troubled by the corruption that you believe that you are seeing. Can you please send me a case number (my first name dot my last name at sun dot com) so I can unravel what happened here?  (Jim can attest that my offer is in earnest.)

Finally, there actually &lt;i&gt;is&lt;/i&gt; a mechanism to provide holistic feedback about the product, but you could be forgiven for not noticing it:  in the BUI, there is a feedback link at the bottom of every page.  Mail sent via that link goes straight to the development team, and no one else -- and we have responded to everyone who has gotten in touch with us that way.  (And we've gotten some very useful feedback that way.) We provided that exactly so people like you could unload: we're technical and we'll listen to you -- so please don't hesitate to send us that A4 worth of feedback, with our thanks in advance for doing us the service.</description>
		<content:encoded><![CDATA[<p>Jes,</p>
<p>First, sorry to hear that your experience has been rocky. And yes, all of the teams on the project &#8212; support, development, test &#8212; have been enormously overtaxed.  That, I&#8217;m afraid, is the curse of hypergrowth, which is exactly what we have experienced over the past year.  And we have experienced that hypergrowth because we were largely right about one critical point in the enterprise storage industry: customers are sick of paying high rents to a couple of companies.  What we attempted to do to address this was very ambitious: build enterprise-grade storage out of commodity parts.  Fourteen months later, I think we&#8217;re finally approaching that goal &#8212; but as you have seen, it&#8217;s been a very rocky path (much rockier than we naively thought it would be).  All of which isn&#8217;t meant to excuse any problems you&#8217;ve had with the product, but rather to give you an honest assessment of how we got here.</p>
<p>Now, that said:  I am quite troubled by the corruption that you believe that you are seeing. Can you please send me a case number (my first name dot my last name at sun dot com) so I can unravel what happened here?  (Jim can attest that my offer is in earnest.)</p>
<p>Finally, there actually <i>is</i> a mechanism to provide holistic feedback about the product, but you could be forgiven for not noticing it:  in the BUI, there is a feedback link at the bottom of every page.  Mail sent via that link goes straight to the development team, and no one else &#8212; and we have responded to everyone who has gotten in touch with us that way.  (And we&#8217;ve gotten some very useful feedback that way.) We provided that exactly so people like you could unload: we&#8217;re technical and we&#8217;ll listen to you &#8212; so please don&#8217;t hesitate to send us that A4 worth of feedback, with our thanks in advance for doing us the service.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/l6a0QWj4c2E" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311677</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/9y9Rpg4d7xI/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Thu, 25 Feb 2010 23:36:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311676</guid>
		<description>I've sent them to the support team, who is reviewing them.. here is the relevant section though:

&lt;blockquote&gt;
7901   Tue Jan 19 20:29:56 2010  Audit     Log       minor
       root : Open Session : object = /session/type : value = shell : success
7900   Tue Jan 19 15:32:52 2010  System    Log       critical
       upgrade to version unknown failed
7899   Tue Jan 19 15:32:46 2010  IPMI      Log       critical
       ID =  15c : pre-init timestamp : System ACPI Power State : sys.acpi : S5/ G2: soft-off
7898   Tue Jan 19 15:32:41 2010  Audit     Log       minor
       KCS Command : Set ACPI Power State : system power state = no change : device power state = no change : success
7897   Tue Jan 19 15:32:41 2010  Audit     Log       minor
       KCS Command : Chassis Control : action = power down : success
7896   Tue Jan 19 15:04:52 2010  Audit     Log       minor
       KCS Command : Set SEL Time : time value = 0x4B55CA14 : success
&lt;/blockquote&gt;

The item at 15:04 was the last "normal" message (seems those occur about once per hour?); the next few messages almost make it sound like the system believes it was powered down for an update which failed? The shell session is when I logged in to boot it back up.

Other machine -- I asked if it's possible that it's a bad HBA or a midplane issue (as support had previously suspected but never replaced), but the senior support reps on the phones indicated that it was definitively a drive, and not in another part of the system. No console on the system - not used to needing console logging on modern computers.  ;)

One thing worth noting - I had no idea that the ILOM logs wouldn't be bundled with the support pack (I'd assume you have a way to retrieve them from within the OS), so I've never really looked at them beyond a glancing view, and only on the one that was rebooting itself daily. One interesting thing now that I am looking, is that the system that was rebooting has -far- more events in the SP log than the other two. It's logging things similar to the following on a seemingly regular basis - appears to be 15 minutes after the hour, every hour on a quick lookover:

&lt;blockquote&gt;
27537  Thu Feb 25 17:15:38 2010  Audit     Log       minor
       KCS Command : OEM Set LED Mode : device address = 0x18 : LED = 0x3 : controller's address = 0x20 : HW info = 0x3 : mode = 0x0 : force = false : role = 0x41 : success
27536  Thu Feb 25 17:15:38 2010  Audit     Log       minor
       KCS Command : OEM Set LED Mode : device address = 0x18 : LED = 0x5 : controller's address = 0x20 : HW info = 0x5 : mode = 0x0 : force = false : role = 0x41 : success
27535  Thu Feb 25 17:15:38 2010  Audit     Log       minor
       KCS Command : OEM Set LED Mode : device address = 0x2C : LED = 0x2 : controller's address = 0x20 : HW info = 0x2 : mode = 0x0 : force = false : role = 0x41 : success
&lt;/blockquote&gt;

I see events like this on the other two systems, but extremely infrequently (ie - days between them.)

Regarding my comment on log analysis - that question came up because it was taking the engineers so long to get through the support packs.. I asked if the support reps had any tools to strip out extraneous information from the logs for them, and just leave the "unusual" stuff - the answer from the support manager was no, which is what surprised me. There's a /ton/ of data there, and having to strip out the important parts by hand is time consuming! As I mentioned, the drive errors were in each log that was sent, but were missed by the engineers reviewing them - a proper log analysis program would certainly help with that.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve sent them to the support team, who is reviewing them.. here is the relevant section though:</p>
<blockquote><p>
7901   Tue Jan 19 20:29:56 2010  Audit     Log       minor<br />
       root : Open Session : object = /session/type : value = shell : success<br />
7900   Tue Jan 19 15:32:52 2010  System    Log       critical<br />
       upgrade to version unknown failed<br />
7899   Tue Jan 19 15:32:46 2010  IPMI      Log       critical<br />
       ID =  15c : pre-init timestamp : System ACPI Power State : sys.acpi : S5/ G2: soft-off<br />
7898   Tue Jan 19 15:32:41 2010  Audit     Log       minor<br />
       KCS Command : Set ACPI Power State : system power state = no change : device power state = no change : success<br />
7897   Tue Jan 19 15:32:41 2010  Audit     Log       minor<br />
       KCS Command : Chassis Control : action = power down : success<br />
7896   Tue Jan 19 15:04:52 2010  Audit     Log       minor<br />
       KCS Command : Set SEL Time : time value = 0x4B55CA14 : success
</p></blockquote>
<p>The item at 15:04 was the last &#8220;normal&#8221; message (seems those occur about once per hour?); the next few messages almost make it sound like the system believes it was powered down for an update which failed? The shell session is when I logged in to boot it back up.</p>
<p>Other machine &#8212; I asked if it&#8217;s possible that it&#8217;s a bad HBA or a midplane issue (as support had previously suspected but never replaced), but the senior support reps on the phones indicated that it was definitively a drive, and not in another part of the system. No console on the system &#8211; not used to needing console logging on modern computers.  ;)</p>
<p>One thing worth noting &#8211; I had no idea that the ILOM logs wouldn&#8217;t be bundled with the support pack (I&#8217;d assume you have a way to retrieve them from within the OS), so I&#8217;ve never really looked at them beyond a glancing view, and only on the one that was rebooting itself daily. One interesting thing now that I am looking, is that the system that was rebooting has -far- more events in the SP log than the other two. It&#8217;s logging things similar to the following on a seemingly regular basis &#8211; appears to be 15 minutes after the hour, every hour on a quick lookover:</p>
<blockquote><p>
27537  Thu Feb 25 17:15:38 2010  Audit     Log       minor<br />
       KCS Command : OEM Set LED Mode : device address = 0x18 : LED = 0x3 : controller&#8217;s address = 0x20 : HW info = 0x3 : mode = 0x0 : force = false : role = 0x41 : success<br />
27536  Thu Feb 25 17:15:38 2010  Audit     Log       minor<br />
       KCS Command : OEM Set LED Mode : device address = 0x18 : LED = 0x5 : controller&#8217;s address = 0x20 : HW info = 0x5 : mode = 0x0 : force = false : role = 0x41 : success<br />
27535  Thu Feb 25 17:15:38 2010  Audit     Log       minor<br />
       KCS Command : OEM Set LED Mode : device address = 0x2C : LED = 0x2 : controller&#8217;s address = 0x20 : HW info = 0x2 : mode = 0x0 : force = false : role = 0x41 : success
</p></blockquote>
<p>I see events like this on the other two systems, but extremely infrequently (ie &#8211; days between them.)</p>
<p>Regarding my comment on log analysis &#8211; that question came up because it was taking the engineers so long to get through the support packs.. I asked if the support reps had any tools to strip out extraneous information from the logs for them, and just leave the &#8220;unusual&#8221; stuff &#8211; the answer from the support manager was no, which is what surprised me. There&#8217;s a /ton/ of data there, and having to strip out the important parts by hand is time consuming! As I mentioned, the drive errors were in each log that was sent, but were missed by the engineers reviewing them &#8211; a proper log analysis program would certainly help with that.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/9y9Rpg4d7xI" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311676</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/MBNHsErbhEc/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Thu, 25 Feb 2010 23:25:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311675</guid>
		<description>First of all, thanks for taking the time to comment!

Oh, you won't bore me at all.  ;)  I'm sorry to hear you are having issues too - I was really hoping we were the exception to the rule. I'm also hoping that some people who have had great success will post.

I'm also finding that it seems the support techs are over-busy -- seems like "squeaky wheel gets the grease" in my case as I'm now receiving quite prompt attention. I'm bummed to hear that EMEA doesn't send you directly to US-based techs when needed.. it makes sense to have local high-level support, but send things back to the people who are most familiar in extreme cases.

I'm also feeling like the product is still "beta", but it's not being sold that way.  ;(  As far as not getting influence on the dev team, have you posted your issues online somewhere? It seemed to help me.  ;)

ZFS corruption - that is extremely frightening. I'd love to hear details on what corruption you're seeing and exactly how you trigger it. As far as the logging aspect goes, that is *really* frightening - and unfortunately on par with what I'm finding.

Failover - the issue of not failing over because of service-level issues is something I've found to be very common on active/passive configurations like this.. it's a hard thing to get right, but I'd hope that would be something heavily tested and properly engineered before public release.

My title is meant to be tongue-in-cheek.. obviously the intentions were not to build a product that is disappointing, but that sure does seem to be the way that it's ended up.</description>
		<content:encoded><![CDATA[<p>First of all, thanks for taking the time to comment!</p>
<p>Oh, you won&#8217;t bore me at all.  ;)  I&#8217;m sorry to hear you are having issues too &#8211; I was really hoping we were the exception to the rule. I&#8217;m also hoping that some people who have had great success will post.</p>
<p>I&#8217;m also finding that it seems the support techs are over-busy &#8212; seems like &#8220;squeaky wheel gets the grease&#8221; in my case as I&#8217;m now receiving quite prompt attention. I&#8217;m bummed to hear that EMEA doesn&#8217;t send you directly to US-based techs when needed.. it makes sense to have local high-level support, but send things back to the people who are most familiar in extreme cases.</p>
<p>I&#8217;m also feeling like the product is still &#8220;beta&#8221;, but it&#8217;s not being sold that way.  ;(  As far as not getting influence on the dev team, have you posted your issues online somewhere? It seemed to help me.  ;)</p>
<p>ZFS corruption &#8211; that is extremely frightening. I&#8217;d love to hear details on what corruption you&#8217;re seeing and exactly how you trigger it. As far as the logging aspect goes, that is *really* frightening &#8211; and unfortunately on par with what I&#8217;m finding.</p>
<p>Failover &#8211; the issue of not failing over because of service-level issues is something I&#8217;ve found to be very common on active/passive configurations like this.. it&#8217;s a hard thing to get right, but I&#8217;d hope that would be something heavily tested and properly engineered before public release.</p>
<p>My title is meant to be tongue-in-cheek.. obviously the intentions were not to build a product that is disappointing, but that sure does seem to be the way that it&#8217;s ended up.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/MBNHsErbhEc" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311675</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Jes</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/Ypmvw1YhuiY/</link>
		<dc:creator>Jes</dc:creator>
		<pubDate>Thu, 25 Feb 2010 22:58:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311674</guid>
		<description>You raise two classes of issue: support and technical and I can sympathise with you on both counts as I've had very similar experiences.

Sun support has been absolutely abysmal because 1) the first two levels of support know nothing about the product (not necessarily their fault) but consequently they give incorrect diagnosis, bad advice, and send out engineers to fix things that aren't broken because they don't understand what has gone wrong; 2) there's only 3 (three) people in back line support for the whole of Europe, Middle-East and Africa (yes, just three!). They are so busy that there's no chance they will get time to work on your problem. And they are being unfairly given work that would be better done by the development team. Something that the developers could fix in an hour will take two weeks to investigate by someone not familiar with the source code, and as you can see they simply don't have that time to spend on each problem.

The product itself has been almost as abysmal, and I am sad to say this because I've been a fan of Sun technology. Not that the underlying components are bad: Solaris is great and ZFS is getting there. But the appliance software has so many bugs, even 14 months after release, that it's still only beta quality. Yet there's been no provision within Sun for customers to feed back a holistic view of their system and the support structure is totally inadequate for a new class of product such as the 7000 series. We've been so frustrated it's unbelievable, if only we could talk to someone technical that was willing to listen to us and had the ability to influence the development team. Our list of problems is so large that just the titles fill an A4 page; most of which are serious or very serious. People like Bryan prove that the engineers do care, but ...

For example we've had assurances that Sun will debug the ZFS corruption that occurs when a disk is replaced during a resilver but, 10 months later, the problem still exists and is easily repeatable. Would you trust any data to a system like that, or to a company which on the surface doesn't appear to take it seriously?

Not only that but when you suffer corruption the system almost completely fails to inform you. If you are looking hard you will find a one-line message in the BUI but there's no alert, no SNMP, no email, nothing, not even a permanent record in a log file of what's been corrupted. And it gets worse: the system then forgets that any corruption has happened at all, leaving you with no message in the BUI, just a bunch of files which appear to be fine but when you access them you get corrupt data. In Sun terminology that's both data "corruption" and data "loss".

Another example of a problem: the Q3 update provided the missing functionality to backup and restore the appliance config, yet when you perform a restore it breaks the box so badly you have to boot into a previous firmware revision.

Yet another example: services can fail (including the appliance kit daemon itself) and the head won't fail-over to the other head (if you have a clustered pair of heads).

I could go on for hours about other serious problems but don't want to bore you ;-)

I was going to be charitable and say that your title is misleading ("designed to disappoint"), because on the whole the collection of technology inside the appliance is really good. Yet there are definitely aspects where the design simply hasn't been thought through properly (like the clustering and the reporting of corruption) so you're right.</description>
		<content:encoded><![CDATA[<p>You raise two classes of issue: support and technical and I can sympathise with you on both counts as I&#8217;ve had very similar experiences.</p>
<p>Sun support has been absolutely abysmal because 1) the first two levels of support know nothing about the product (not necessarily their fault) but consequently they give incorrect diagnoses, bad advice, and send out engineers to fix things that aren&#8217;t broken because they don&#8217;t understand what has gone wrong; 2) there are only 3 (three) people in back line support for the whole of Europe, Middle-East and Africa (yes, just three!). They are so busy that there&#8217;s no chance they will get time to work on your problem. And they are being unfairly given work that would be better done by the development team. Something that the developers could fix in an hour will take two weeks to investigate by someone not familiar with the source code, and as you can see they simply don&#8217;t have that time to spend on each problem.</p>
<p>The product itself has been almost as abysmal, and I am sad to say this because I&#8217;ve been a fan of Sun technology. Not that the underlying components are bad: Solaris is great and ZFS is getting there. But the appliance software has so many bugs, even 14 months after release, that it&#8217;s still only beta quality. Yet there&#8217;s been no provision within Sun for customers to feed back a holistic view of their system and the support structure is totally inadequate for a new class of product such as the 7000 series. We&#8217;ve been so frustrated it&#8217;s unbelievable; if only we could talk to someone technical who was willing to listen to us and had the ability to influence the development team. Our list of problems is so large that just the titles fill an A4 page, and most of them are serious or very serious. People like Bryan prove that the engineers do care, but &#8230;</p>
<p>For example we&#8217;ve had assurances that Sun will debug the ZFS corruption that occurs when a disk is replaced during a resilver but, 10 months later, the problem still exists and is easily repeatable. Would you trust any data to a system like that, or to a company which on the surface doesn&#8217;t appear to take it seriously?</p>
<p>Not only that but when you suffer corruption the system almost completely fails to inform you. If you are looking hard you will find a one-line message in the BUI but there&#8217;s no alert, no SNMP, no email, nothing, not even a permanent record in a log file of what&#8217;s been corrupted. And it gets worse: the system then forgets that any corruption has happened at all, leaving you with no message in the BUI, just a bunch of files which appear to be fine but when you access them you get corrupt data. In Sun terminology that&#8217;s both data &#8220;corruption&#8221; and data &#8220;loss&#8221;.</p>
<p>Another example of a problem: the Q3 update provided the missing functionality to backup and restore the appliance config, yet when you perform a restore it breaks the box so badly you have to boot into a previous firmware revision.</p>
<p>Yet another example: services can fail (including the appliance kit daemon itself) and the head won&#8217;t fail-over to the other head (if you have a clustered pair of heads).</p>
<p>I could go on for hours about other serious problems but don&#8217;t want to bore you ;-)</p>
<p>I was going to be charitable and say that your title is misleading (&#8220;designed to disappoint&#8221;), because on the whole the collection of technology inside the appliance is really good. Yet there are definitely aspects where the design simply hasn&#8217;t been thought through properly (like the clustering and the reporting of corruption) so you&#8217;re right.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/Ypmvw1YhuiY" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311674</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Bryan Cantrill</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/p90QrDmzADE/</link>
		<dc:creator>Bryan Cantrill</dc:creator>
		<pubDate>Thu, 25 Feb 2010 01:11:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311670</guid>
		<description>The audit logs from the SP (which are what is needed here) are not available via IPMI; they have to be retrieved manually (i.e., by you).  If you want to send them to me, you certainly may, though I am not optimistic that they will contain a smoking gun.  (Unless, of course, someone did a "stop /SYS" on the SP -- which would be contained in the audit log.)

As for the other machine:  you may indeed have a bad HBA -- which would necessitate replacing the HBA, not the machine.  But we would need a console log to know what happened, and given that we don't seem to have that, it's going to be hard to know definitively.  I'm not sure what you mean about "no system in place to automatically identify anomalies"; I went through your logs and there's very little anomalous about them -- that's part of what makes this mysterious. The log that we need -- that of the SP console at the time of machine reset -- is not something that the appliance can gather for itself...</description>
		<content:encoded><![CDATA[<p>The audit logs from the SP (which are what is needed here) are not available via IPMI; they have to be retrieved manually (i.e., by you).  If you want to send them to me, you certainly may, though I am not optimistic that they will contain a smoking gun.  (Unless, of course, someone did a &#8220;stop /SYS&#8221; on the SP &#8212; which would be contained in the audit log.)</p>
<p>As for the other machine:  you may indeed have a bad HBA &#8212; which would necessitate replacing the HBA, not the machine.  But we would need a console log to know what happened, and given that we don&#8217;t seem to have that, it&#8217;s going to be hard to know definitively.  I&#8217;m not sure what you mean about &#8220;no system in place to automatically identify anomalies&#8221;; I went through your logs and there&#8217;s very little anomalous about them &#8212; that&#8217;s part of what makes this mysterious. The log that we need &#8212; that of the SP console at the time of machine reset &#8212; is not something that the appliance can gather for itself&#8230;</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/p90QrDmzADE" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311670</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/RyADUhuSd_k/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Wed, 24 Feb 2010 23:57:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311669</guid>
		<description>The one that we're wanting new hardware on is the DR unit that was rebooting itself every night - that's had multiple issues flagged by the system as definitive hardware issues, and no root cause in sight. I just had a conference call with the escalation team, and it appears that the DR unit was having LBA issues on a drive that caused itself to reboot, in ways that cannot yet be explained, and are unlike any other issue that's been seen before. To me, that says two things - 1) there is something unique about this system that is unlike any other one you have in the field and 2) if it's going to production it needs to be a new one.  ;)  (I should also note that these logs were in each system dump that was sent to Sun, but apparently there's no system in place to automatically identify anomalies in the logs to the support engineer, so they were missed by each tech that's looked at them up until a few days ago.)

Regarding the system you mentioned - the escalation team also said that it appeared to be a SP-level problem, but not that it was an explicit poweroff command - thanks for that info! They mentioned that they'd be looking at SP logs, but didn't ask me to provide them (I thought they already had them as part of the system dump?), but I'd be happy to grab those logs for you and anyone else that is interested in looking at it. Want me to fire them to the email address you've posted from?</description>
		<content:encoded><![CDATA[<p>The one that we&#8217;re wanting new hardware on is the DR unit that was rebooting itself every night &#8211; that&#8217;s had multiple issues flagged by the system as definitive hardware issues, and no root cause in sight. I just had a conference call with the escalation team, and it appears that the DR unit was having LBA issues on a drive that caused itself to reboot, in ways that cannot yet be explained, and are unlike any other issue that&#8217;s been seen before. To me, that says two things &#8211; 1) there is something unique about this system that is unlike any other one you have in the field and 2) if it&#8217;s going to production it needs to be a new one.  ;)  (I should also note that these logs were in each system dump that was sent to Sun, but apparently there&#8217;s no system in place to automatically identify anomalies in the logs to the support engineer, so they were missed by each tech that&#8217;s looked at them up until a few days ago.)</p>
<p>Regarding the system you mentioned &#8211; the escalation team also said that it appeared to be a SP-level problem, but not that it was an explicit poweroff command &#8211; thanks for that info! They mentioned that they&#8217;d be looking at SP logs, but didn&#8217;t ask me to provide them (I thought they already had them as part of the system dump?), but I&#8217;d be happy to grab those logs for you and anyone else that is interested in looking at it. Want me to fire them to the email address you&#8217;ve posted from?</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/RyADUhuSd_k" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311669</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Bryan Cantrill</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/oP3grNZ8rGs/</link>
		<dc:creator>Bryan Cantrill</dc:creator>
		<pubDate>Wed, 24 Feb 2010 23:15:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311668</guid>
		<description>Nate,

It's not clear (at all) that you need new hardware here -- we need to figure out why the machine is resetting, not throw hardware at it.  As for why the two units failed simultaneously, the only thing I can tell you is that I don't know.  You said that one of the nodes "turned itself off"; assuming that I'm looking at the right data (MD5 hash of the hostname is 3df2a064148a0dc90237d936ec52c8b7) on the right day and time (1/19, 15:32 GMT), it appears that this was the result of an explicit power off.  (Or rather, it is indistinguishable from an explicit power off.)  So at this point, we would need to see the entire audit log for the SP.  Unfortunately, there is not a way to get that via IPMI, so you would need to log into the SP, "cd /SP/logs/event/list" and then type "show".  This should at least tell us what was going on from the SP's perspective, and why/how it decided to power off the appliance.</description>
		<content:encoded><![CDATA[<p>Nate,</p>
<p>It&#8217;s not clear (at all) that you need new hardware here &#8212; we need to figure out why the machine is resetting, not throw hardware at it.  As for why the two units failed simultaneously, the only thing I can tell you is that I don&#8217;t know.  You said that one of the nodes &#8220;turned itself off&#8221;; assuming that I&#8217;m looking at the right data (MD5 hash of the hostname is 3df2a064148a0dc90237d936ec52c8b7) on the right day and time (1/19, 15:32 GMT), it appears that this was the result of an explicit power off.  (Or rather, it is indistinguishable from an explicit power off.)  So at this point, we would need to see the entire audit log for the SP.  Unfortunately, there is not a way to get that via IPMI, so you would need to log into the SP, &#8220;cd /SP/logs/event/list&#8221; and then type &#8220;show&#8221;.  This should at least tell us what was going on from the SP&#8217;s perspective, and why/how it decided to power off the appliance.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/oP3grNZ8rGs" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311668</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/0UB51GKN1Uw/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Wed, 24 Feb 2010 17:36:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311666</guid>
		<description>Makes a lot of sense. I've never been a huge SMB fan, or a huge NFS fan.. network filesystems suck.  ;P  Makes sense on your issue - hopefully Sun will be able to help you out.. I'd love to hear how it goes.</description>
		<content:encoded><![CDATA[<p>Makes a lot of sense. I&#8217;ve never been a huge SMB fan, or a huge NFS fan.. network filesystems suck.  ;P  Makes sense on your issue &#8211; hopefully Sun will be able to help you out.. I&#8217;d love to hear how it goes.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/0UB51GKN1Uw" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311666</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Jim</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/C2JiFYFMTtA/</link>
		<dc:creator>Jim</dc:creator>
		<pubDate>Wed, 24 Feb 2010 15:39:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311665</guid>
		<description>We've talked about using NFS and have had two things that have made us reluctant.  First, up until last year the only storage server we had was a Windows box, which, while it could serve NFS, was serving AFP, which worked just fine for us, other than the fact that the server was slow and management was a pain.  With the new Sun we started using Samba/CIFS because we had some experience with doing our Mac work via Samba mounts, and we initially were really impressed with the Sun CIFS implementation; its handling of resource forks, for example, was done really well, especially as compared with the Linux CIFS servers we had messed with.

And while resource forks have been factored out of our main workflow, knowing that they can be handled well if we need them is very comforting.

So while it is a poor excuse, to some degree, it's momentum.  We're in a bit of a holding pattern right now trying to decide what we want to do, depending on what Sun says on this issue.

We'd also feel a lot more comfortable if we could just know precisely what is causing this.  While I suspect it's this interaction, and heck, there's a chance it's not Sun's fault at all (perhaps the problem is in the Snow Leopard client), this is the kind of thing you really want to know for sure if you are going to trust it with your data, ya know?</description>
		<content:encoded><![CDATA[<p>We&#8217;ve talked about using NFS and have had two things that have made us reluctant.  First, up until last year the only storage server we had was a Windows box, which, while it could serve NFS, was serving AFP, which worked just fine for us, other than the fact that the server was slow and management was a pain.  With the new Sun we started using Samba/CIFS because we had some experience with doing our Mac work via Samba mounts, and we initially were really impressed with the Sun CIFS implementation; its handling of resource forks, for example, was done really well, especially as compared with the Linux CIFS servers we had messed with.  </p>
<p>And while resource forks have been factored out of our main workflow, knowing that they can be handled well if we need them is very comforting.</p>
<p>So while it is a poor excuse, to some degree, it&#8217;s momentum.  We&#8217;re in a bit of a holding pattern right now trying to decide what we want to do, depending on what Sun says on this issue.</p>
<p>We&#8217;d also feel a lot more comfortable if we could just know precisely what is causing this.  While I suspect it&#8217;s this interaction, and heck, there&#8217;s a chance it&#8217;s not Sun&#8217;s fault at all (perhaps the problem is in the Snow Leopard client), this is the kind of thing you really want to know for sure if you are going to trust it with your data, ya know?</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/C2JiFYFMTtA" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311665</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by nc</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/NKlsQ4vqUuw/</link>
		<dc:creator>nc</dc:creator>
		<pubDate>Wed, 24 Feb 2010 15:18:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311664</guid>
		<description>Thanks for the clarification Jim! Yeah, CIFS scares the snot out of me, especially after hearing from our local Sun guy, who's been trying to help us figure out how to make these things perform without SSDs, that there is no way to disable the write caching to memory when using CIFS.

If your clients are OS X, any reason you can't use NFS? We have a team using an OpenSolaris box as a bulk storage server for their HD video editing needs on OS X via NFS, and it's worked out quite well.</description>
		<content:encoded><![CDATA[<p>Thanks for the clarification Jim! Yeah, CIFS scares the snot out of me, especially after hearing from our local Sun guy, who&#8217;s been trying to help us figure out how to make these things perform without SSDs, that there is no way to disable the write caching to memory when using CIFS.</p>
<p>If your clients are OS X, any reason you can&#8217;t use NFS? We have a team using an OpenSolaris box as a bulk storage server for their HD video editing needs on OS X via NFS, and it&#8217;s worked out quite well.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/NKlsQ4vqUuw" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311664</feedburner:origLink></item>
	<item>
		<title>Comment on Sun’s Unified Storage 7210 – designed to disappoint? by Jim</title>
		<link>http://feedproxy.google.com/~r/natecarlson-comments/~3/YHibBpHpg5A/</link>
		<dc:creator>Jim</dc:creator>
		<pubDate>Wed, 24 Feb 2010 15:14:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.natecarlson.com/?p=839#comment-311663</guid>
		<description>Actually, I should have been much more specific, although the intent of my post was more to commiserate about the support than to hash out technical problems.  Obviously the "c" word is a pretty scary one, so I apologize for not being more specific.

I have no reason to think that our problems are related to ZFS.  I realize now that my comments probably implied that, and I apologize.

Our current issue seems likely to revolve around the CIFS server, although we aren't 100% sure it's not an interaction between the client implementation (OS X 10.6) and the Sun CIFS server implementation.  We aren't seeing it from any other OS, but we also aren't seeing this when using a 10.6 client to connect to a Windows-based or Linux-based server.

We found this because we deal with a lot of moving around of filesets that range from ~750MB to 5GB, so we do a lot of automated checksums, and started to see repeatable failures.

Bryan, I'll email you about this.  The very fact that you offered is much appreciated.</description>
		<content:encoded><![CDATA[<p>Actually, I should have been much more specific, although the intent of my post was more to commiserate about the support than to hash out technical problems.  Obviously the &#8220;c&#8221; word is a pretty scary one, so I apologize for not being more specific.</p>
<p>I have no reason to think that our problems are related to ZFS.  I realize now that my comments probably implied that, and I apologize.</p>
<p>Our current issue seems likely to revolve around the CIFS server, although we aren&#8217;t 100% sure it&#8217;s not an interaction between the client implementation (OS X 10.6) and the Sun CIFS server implementation.  We aren&#8217;t seeing it from any other OS, but we also aren&#8217;t seeing this when using a 10.6 client to connect to a Windows-based or Linux-based server.</p>
<p>We found this because we deal with a lot of moving around of filesets that range from ~750MB to 5GB, so we do a lot of automated checksums, and started to see repeatable failures.</p>
<p>Bryan, I&#8217;ll email you about this.  The very fact that you offered is much appreciated.</p>
<img src="http://feeds.feedburner.com/~r/natecarlson-comments/~4/YHibBpHpg5A" height="1" width="1"/>]]></content:encoded>
	<feedburner:origLink>http://www.natecarlson.com/2010/02/23/sun-7210-designed-to-disappoint/comment-page-1/#comment-311663</feedburner:origLink></item>
</channel>
</rss>
