[tahoe-dev] Choosing a distributed FS?
Nik
distfs at babel.homelinux.net
Thu Apr 11 02:34:17 UTC 2013
Hi All,
Whilst I am familiar with the general concepts of a distributed
filesystem/datastore, I have only recently started looking for candidate
implementations, and find myself a little overwhelmed by the amount of
choice available.
The elegance of Tahoe-LAFS appeals to me, but I am still unsure how well
my various applications would fit with it or with the other options.
I have 4 possible applications in mind, and was hoping people here could
give me advice to help me narrow down the field of candidates.
Whilst I am definitely not asking people in this forum to recommend some
other product, I am hoping to at least get comments along the lines of:
"Tahoe-LAFS would definitely be a good fit" or "Tahoe-LAFS would definitely
not be a good fit - you could look at XYZ instead."
Of course, if you are comfortable giving more detailed comments on how
different products would fit my needs, I would be very grateful.
I provide IT support for an architecture practice that employs about 40
architects, so we have a set of servers in a computer room, and 40+
workstations on desks, all connected by a gigabit LAN.
We manage the workstations with FOG ( http://www.fogproject.org/ ) which
supports PXE boot, wake-on-lan (WOL), and a task scheduler; so I can
schedule tasks that will perform unattended actions, such as booting all
workstations to a specific PXE-provided OS, and initiating some action.
(Current actions in this vein include overnight virus scanning, for
example.)
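
For anyone unfamiliar with WOL: FOG sends the wake-up packets itself, so nothing below is needed to use it. This is only a minimal sketch of what a "magic packet" looks like (6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast). The function names, the broadcast address and the port are my own illustrative choices, not anything taken from FOG.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the MAC address repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling send_wol() once per workstation MAC address would be the hand-rolled equivalent of a FOG wake-up task.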
So, here are the applications I am considering:
1. We typically use less than half the disk space on the workstations -
so I am considering creating a distributed datastore out of the unused
space, and using it for storage that does not need to be online during
office hours - thereby freeing up considerable space on the servers.
The standard software image we deploy to the workstations is a 60GB
image (of Windows - yuk!).
On the older workstations there may only be 20GB spare, but on many of
the newer ones, there would be 60-140GB, depending on the drive
installed. Assuming an average of 40GB spare, 40x40GB = 1.6TB, which is
not too shabby.
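
That 1.6TB is raw space. With an erasure-coded store, each file is expanded by a factor of (total shares / shares needed), so the usable figure is smaller. Here is a rough sketch of the arithmetic, assuming a 3-of-10 encoding (which I understand to be the Tahoe-LAFS default, but check the configuration) and assuming shares spread evenly across all nodes. The function name is my own.

```python
def usable_capacity_gb(nodes: int, spare_gb_per_node: float,
                       shares_needed: int, shares_total: int) -> float:
    """Usable capacity of a k-of-N erasure-coded grid, in GB.

    Each stored file is expanded by shares_total / shares_needed, so the
    usable space is the raw space divided by that expansion factor.
    """
    if not 0 < shares_needed <= shares_total:
        raise ValueError("need 0 < shares_needed <= shares_total")
    raw_gb = nodes * spare_gb_per_node
    expansion = shares_total / shares_needed
    return raw_gb / expansion


raw = 40 * 40  # 40 workstations x 40GB average spare = 1600GB raw
usable = usable_capacity_gb(40, 40, shares_needed=3, shares_total=10)
print(f"raw: {raw}GB, usable at 3-of-10: {usable:.0f}GB")
```

In practice the older workstations with only 20GB spare would fill up first, so the real usable figure could be lower than this even-spread estimate.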
So my thought was to configure a small Linux image with a distributed
FS installed (Tahoe-LAFS?), and use FOG to boot the workstations to this
image each night - thus giving me roughly 1.6TB of raw storage to use overnight.
For example, this would make an ideal area for disk-based backup of the
servers (fileserver, email server, intranet server, FOG image server).
It could also be a useful place to archive OS images.
The data would normally be large immutable files:
* GB-sized tar archives of full and incremental backup images;
* GB-sized OS image files.
Most data would not be appended to, but would simply be stored, and
possibly deleted after some time.
Tahoe-LAFS seems a good choice here, although I have been looking at Ceph
as well.
2. I am also considering whether to make this distributed FS available
online during the day.
Tahoe-LAFS can support Windows storage nodes (yes?), and so I *could* add
Tahoe-LAFS to our standard workstation image, and thereby have a TB-scale
datastore available during office hours.
I am still not sure this is practical, as the number of running
workstations will vary, and if someone came in over the weekend, we
would have to start most of the workstations (using WOL) to get the
datastore up and running.
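
To put a rough number on "most of the workstations": with k-of-N erasure coding a file is readable as long as at least k of its N share-holding nodes are up. The sketch below assumes each share sits on a different workstation and that workstations are up independently with the same probability. Both are simplifications, and the function name and the example probabilities are my own.

```python
from math import comb


def prob_file_retrievable(shares_needed: int, shares_total: int,
                          p_node_up: float) -> float:
    """Probability that at least shares_needed of shares_total nodes are up.

    Binomial tail: sum over i = k..N of C(N, i) * p^i * (1 - p)^(N - i).
    """
    return sum(
        comb(shares_total, i)
        * p_node_up ** i
        * (1 - p_node_up) ** (shares_total - i)
        for i in range(shares_needed, shares_total + 1)
    )


# Compare a quiet weekend with a normal working day (illustrative values).
for p in (0.1, 0.3, 0.6, 0.9):
    print(f"node up {p:.0%}: file readable "
          f"{prob_file_retrievable(3, 10, p):.3f}")
```

A higher N relative to k tolerates more sleeping workstations, at the cost of more raw disk per stored file.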
And I am concerned that the load of being an active storage node might
slow the workstations down sufficiently to annoy the users.
I would have to investigate the file-save semantics of the applications
we use most (ArchiCAD, SketchUp, MS Word, image-editing), but I think
they are mostly file-replace operations rather than file-append operations.
Does anyone have suggestions or comments on this?
3. I am planning to consolidate onto 3 new servers (HP ProLiant something);
run our existing server processes in container-type VMs (e.g. OpenVZ); and
create a distributed filesystem out of the local disks direct-attached
to the servers (about 350GB each) to store VM images, email inboxes, etc.
I still haven't worked out whether I will assemble the software
components for this myself, or use a pre-assembled solution such as
Proxmox or OpenStack.
The privacy features of Tahoe-LAFS are not so important in this
application, so I am wondering if something like Ceph (or another product)
would be a better fit?
4. I also support a volunteer library.
I have installed a single server and 6 or 7 workstations, all running
Linux.
I am going to create a distributed FS, again using spare space on the
workstations, for storing unattended backups of the server data.
In this instance, I wouldn't even need to boot the workstations to a
different OS. The benefit of a separate OS would be isolation of the
backup data; and the downside would be that the backup data would not be
readily available for immediate recovery of a lost or corrupted file.
Again, Tahoe-LAFS and Ceph seem viable candidates, although there seem to
be countless others which could also be considered.
Thanks in advance for any and all advice and suggestions.
Cheers!
Nik