Archived posting to the Leica Users Group, 2009/07/31

Subject: [Leica] Seagate
From: reid at mejac.palo-alto.ca.us (Brian Reid)
Date: Fri, 31 Jul 2009 19:21:42 -0700
References: <200907311902.BSU39530@rg4.comporium.net> <20090731202331.TFS29503.eastrmmtao106.cox.net@eastrmimpo02.cox.net> <200907312232.BSU58581@rg4.comporium.net> <20090731225759.GN30571@jbm.org> <200907312304.BUB58320@rg5.comporium.net> <20090731235858.GP30571@jbm.org>

I get the sense that there are about half a dozen of us here on the LUG who 
are experienced IT professionals who have lived and breathed this 
disk/network/OS/reliability/make-it-work scenario for a long time.

I wonder if we could work collectively to come up with a few specific 
recommendations. I have spent about half of my professional career on the 
design and test of reliable systems, but since I spent several of those 
years working with NASA, I know that I am in the minor leagues as these 
things go. (Anytime you think you know a lot about reliability engineering 
or testing, go spend some time at Johnson Space Center or the Jet Propulsion 
Laboratory and you will change your mind.)

Based entirely on my own experience, without consulting these equally 
talented folks, I have a few comments, in no particular order.

* Reliability is a whole-system issue, not a component issue. You have to 
design under the assumption that components will fail, but it's always 
better to use components that fail less often. Remember the "for want of a 
nail the kingdom was lost" folk tale. A disk is only as reliable as the 
means that you have for getting data off of it. And unless you built it 
yourself, you are dependent on the disk system vendor to make the design 
decisions for the parts of it that are not actually disks.


* Reliability is not cheap. You get what you pay for. I have worked with 
people who have spent their whole professional lives doing reliability 
engineering and reliability analysis. I spent 2 years of my life about 20 
years ago redesigning the NASDAQ stock exchange network so that it would be 
more reliable, and I believe that engineers from Sun Microsystems did 
another redesign a dozen years later to get even more reliability. There 
were no "quick fixes". We had to re-think everything.

* I believe that it is not possible to get full reliability on a Windows 
system no matter what you do. If you store no data on the Windows system, 
using it only as a client to access a reliable server, then you can just 
replace it when it breaks, but trying to build reliability directly into a 
Windows system is fundamentally a lost cause.

* There are no magic brands at consumer price levels. In the consumer space, 
price is everything. If you want to buy Leica-grade components, you have to 
pay Leica-grade prices. If you think that some brand (e.g. Drobo) has been 
good for you, that just means that Drobo had at that time a very successful 
buyer and QA team. These things change, and in the consumer space there is 
no company that has rigid, reviewed reliability standards for its products. 
You have to buy industrial-grade disks for that.

* Because reliability is a whole-system issue, there are whole-system 
vendors that have better reliability records than others. This comes from 
having engineers who know how to design more-reliable systems, purchasing 
agents who know how to buy more-reliable components, QA departments that 
know how to test for reliability, and manufacturing departments that know 
both how to manufacture for reliability and how to incorporate what they 
learn from the QA department to make the product better.

* There are three fundamentally different philosophies of making things 
reliable. I call them
   -- the Phone Company way
   -- the Internet way
   -- the NASA way
  I won't bore you by trying to define or describe them. But I use the 
Internet way in my personal life.

I'd better shut up now.

Brian Reid

Replies:
   Reply from r.s.taylor at comcast.net (Richard Taylor) ([Leica] Seagate)
   Reply from scheng at aotera.org (Spencer Cheng) ([Leica] Seagate)
   Reply from images at comporium.net (Tina Manley) ([Leica] Seagate)
In reply to:
   Message from images at comporium.net (Tina Manley) ([Leica] Seagate)
   Message from kcarney1 at cox.net (Ken Carney) ([Leica] Seagate)
   Message from images at comporium.net (Tina Manley) ([Leica] Seagate)
   Message from jbm at jbm.org (Jeff Moore) ([Leica] Seagate)
   Message from images at comporium.net (Tina Manley) ([Leica] Seagate)
   Message from jbm at jbm.org (Jeff Moore) ([Leica] Seagate)