[SGVLUG] Cluster Filesystems
Max Clark
max at clarksys.com
Sun Jan 8 10:03:55 PST 2006
Chris,
NetApp does support multiple-head and device redundancy via a cluster
software option - this is just not economically feasible for many people.
For our customers with six-figure storage budgets we can sell a wide
selection of options, all with exceptional redundancy and failover - I am
looking for something in the sub-$10,000 range.
A series of ATA/SATA disks in a 4U chassis w/ a Raidcore or 3ware
controller works well for mass storage requirements - however the recovery
time after a disk failure is way too painful (a disk failure on a 4TB
array w/ 500GB drives will take most of the day to rebuild).
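Back-of-the-envelope (the sustained rebuild rate below is just my
assumption for a busy SATA array), the rebuild of a single replaced 500GB
drive alone works out to something like:

    # rough rebuild-time estimate; the 25 MB/s rate is an assumption,
    # arrays still serving I/O can be much slower
    drive_gb = 500
    rebuild_mb_per_sec = 25
    hours = drive_gb * 1024 / rebuild_mb_per_sec / 3600.0
    print("%.1f hours" % hours)   # ~5.7 hours, before any controller overhead

and that is before the controller throttles the rebuild to keep serving
requests, which is how you end up losing most of the day.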
I am looking to build a true COTS-based system with each individual node
having between 1 and 4 hard drives. I want the data to be replicated
across the nodes in the storage cluster so that if one of the nodes
fails the storage is still accessible. I know with CIFS/NFS this does
not equate to automatic failover for the client - but the outage window
to reconnect to a different node is acceptable as long as the storage is
still accessible. While all of the storage nodes will be running Linux,
the clients will be a mix of Linux, Windows, Sun, and OS X - hence the
need for NFS/CIFS.
I know this is possible because Isilon has implemented this system with
a proprietary OS/Filesystem - I just need to know if this functionality
(of the filesystem) exists within the Open Source world.
Thanks,
Max
--
Max Clark
max [at] clarksys.com
http://www.clarksys.com
Chris Smith wrote:
> On 1/7/06, Max Clark <max at clarksys.com> wrote:
>> A recent failure of a customer's NetApp has again left me looking for a
>> better approach to network storage - specifically in the redundancy and
>> replication model. For the sites that can afford it we recommend and
>> sell the Isilon systems which give what I am looking for... multiple
>> nodes striped together to provide a distributed pool of storage that can
>> survive a node failure.
>
> I was pretty sure that NetApp had a way to approximate a functional
> high availability system (something where one node would take over
> the IP of a failed node). It isn't perfect, but it is functional.
>
>> Ideally I'd love to run the Google File System
>> (http://labs.google.com/papers/gfs.html) but I don't think they would
>> give me a copy.
>
> The Google File System would probably not work too well for you
> anyway. It isn't a proper POSIX file system (really, it's an API for
> managing data). It's optimised for a specific problem domain. It makes
> assumptions that are unlikely to be true in the general case (perhaps
> true in your case), like that files are mostly quite large, that one
> doesn't ever need to write to a file with anything other than an
> append, etc.
>
> That said, if you are looking for something like it, you can look at
> the code in the nutch project:
>
> http://lucene.apache.org/nutch/
>
> They have implemented their own data management system, which is based
> on principles similar to the Google File System.
>
>> Which leaves me with AFS and CODA. Can anyone give me
>> real world examples/tips/tricks with these systems?
>
> Don't use CODA for anything serious. AFS is a nice file system, but
> it's not really a cluster filesystem. It does cope somewhat better in
> the event of a server failure, but it is really just a network
> filesystem.
>
>> I would like to be able to take a bunch of 1U single CPU machines with
>> dual 250GB-500GB hard drives and cluster them together as a single NAS
> system supporting NFS/CIFS clients. I figure I should be able to get
> 0.2TB of usable protected storage into a node for ~$800/ea; this would
> mean $5,600 for 1TB of protected storage (assuming parity and n+1).
>
> May I ask why you want to use multiple machines if you're still going
> to present an NFS/CIFS interface? In general, clustered filesystems
> really only make sense if the clients access them via their native
> interface.
>
> If you think about it, a single 4U machine with a nice RAID storage
> system would cover this. Heck, with SATA drives you can actually get
> that kind of storage out of a 1U (4 400GB drives with RAID-5 and you've
> got 1.2TB of storage) and at a bargain basement price (although without
> the same kind of high transaction rate performance you'd expect from
> higher end drives). While not a super high availability system, it'd
> have as good availability as what you are envisioning.
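> As a quick sanity check on that capacity figure - RAID-5 gives you n-1
> drives' worth of usable space, so:
>
>     drives = 4
>     drive_gb = 400
>     usable_gb = (drives - 1) * drive_gb   # one drive's worth goes to parity
>     print(usable_gb)                      # 1200 GB, i.e. the 1.2TB above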
>
> If you are doing NFS/CIFS, you just aren't going to get the kind of
> redundancy you are talking about. If a client is talking to an
> NFS/CIFS server when it dies, there is going to be a service
> interruption (although particularly with UDP NFS you can do some
> clever things to provide a fairly smooth transition). Probably the
> simplest way to do that is to have a designated master which serves an
> NFS/CIFS interface and then use Linux's network block device to RAID
> together the drives on all the other machines.
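> Very roughly, and only as a sketch (the hostnames, port, and device
> names below are made up; each storage node is assumed to be running
> nbd-server against its local disk), the master's side of that could
> look something like:
>
>     import subprocess
>
>     STORAGE_NODES = ["node1", "node2", "node3", "node4"]  # hypothetical hosts
>     NBD_PORT = 2000                                        # hypothetical port
>
>     def run(cmd):
>         print(" ".join(cmd))
>         subprocess.check_call(cmd)
>
>     # attach each node's exported disk as a local block device
>     nbd_devices = []
>     for i, host in enumerate(STORAGE_NODES):
>         dev = "/dev/nbd%d" % i
>         run(["nbd-client", host, str(NBD_PORT), dev])
>         nbd_devices.append(dev)
>
>     # software RAID-5 across the remote block devices (one node's worth
>     # of capacity goes to parity, so any single node can die)
>     run(["mdadm", "--create", "/dev/md0", "--level=5",
>          "--raid-devices=%d" % len(nbd_devices)] + nbd_devices)
>
>     # from here: mkfs /dev/md0, mount it, and export it over NFS/Samba
>
> The master is still a single point of failure for serving, of course -
> the point is just that the data survives any single storage node dying.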
>
>> Thoughts and opinions would be very welcome.
>
> You probably are looking for a clustered file system. The ones that
> come to mind immediately are Lustre, SGI's CXFS, Red Hat's GFS, OCFS,
> PVFS, and PVFS2 (there are others).
>
> We are experimenting with Lustre at Yahoo Research, and I can say that
> the early results show just amazing performance, although you do need
> a really nice switch to sustain it. The down side of Lustre is that it
> only supports Linux clients. CXFS has a fairly broad set of client
> drivers, but I don't know of a Windows client. Same for GFS really. I
> think only PVFS has one -- maybe OCFS has one, but I never looked. PVFS
> and PVFS2 are more geared towards massively parallel computer systems
> (to a certain extent all of the ones I mentioned are, but PVFS and
> PVFS2 especially so), so unless you are working on that kind of system
> you are probably better off with a more general purpose clustered
> filesystem.
>
> --
> Chris
>