[tahoe-lafs-trac-stream] [tahoe-lafs] #2106: RAIC behaviour different from RAID behaviour
tahoe-lafs
trac at tahoe-lafs.org
Thu Nov 14 21:03:27 UTC 2013
#2106: RAIC behaviour different from RAID behaviour
----------------------+------------------------
Reporter: sickness | Owner:
Type: defect | Status: new
Priority: normal | Milestone: 1.11.0
Component: code | Version: 1.10.0
Keywords: | Launchpad Bug:
----------------------+------------------------
Let's assume we have a local RAID5 set of 4 identical disks attached to a
controller inside a computer.[[BR]]
This RAID5 level guarantees that if we lose 1 of the 4 disks, we can continue
not only to read but also to write to the set, albeit in degraded mode.[[BR]]
When we replace the failed disk with a new one, the RAID takes care of
repairing the set by syncing the data in the background, and the 4th disk gets
populated again with chunks of our valuable data (not only parity, because we
know that in RAID5 parity is striped, but explaining that is beyond the scope
of this ticket).[[BR]]
starting condition:[[BR]]
DISK1[chunk1] DISK2[chunk2] DISK3[chunk3] DISK4[chunk4] [[BR]]
broken disk:[[BR]]
DISK1[chunk1] DISK2[chunk2] DISK3[chunk3] DISK4[XXXXXX][[BR]]
new disk is put in place:[[BR]]
DISK1[chunk1] DISK2[chunk2] DISK3[chunk3] DISK4[ ][[BR]]
repair rebuilds DISK4's chunk of data by reading the other 3 disks:[[BR]]
DISK1[chunk1] DISK2[chunk2] DISK3[chunk3] DISK4[chunk4][[BR]]
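Roughly, for a single stripe the rebuild is just XOR arithmetic. Here is a
purely illustrative Python sketch (not real RAID code; the chunk values are
made up) of how the lost chunk is recomputed from the surviving three:[[BR]]
{{{
# Illustrative sketch only: for one RAID5 stripe the parity chunk is the XOR
# of the data chunks, so any single lost chunk can be rebuilt from the others.
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

chunk1, chunk2, chunk3 = b"AAAA", b"BBBB", b"CCCC"   # hypothetical data chunks
chunk4 = xor_blocks(chunk1, chunk2, chunk3)          # parity chunk on DISK4

# DISK4 dies; the replacement disk rebuilds its chunk from the other three.
rebuilt_chunk4 = xor_blocks(chunk1, chunk2, chunk3)
assert rebuilt_chunk4 == chunk4

# The same works for a lost data chunk, e.g. DISK1:
rebuilt_chunk1 = xor_blocks(chunk2, chunk3, chunk4)
assert rebuilt_chunk1 == chunk1
}}}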
Now let's assume we have a tahoe-lafs RAIC set of 4 identical servers on a
LAN.[[BR]]
To mimic the RAID5 behaviour we configure it to write 4 shares for every
file, needing only any 3 of them to successfully read the file.[[BR]]
So in this way we have a RAIC that should behave like a RAID5.[[BR]]
We can lose any 1 of these 4 servers and still be able to read the data, and
repair it afterwards.[[BR]]
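For reference, a 3-of-4 encoding like this corresponds to something along these
lines in the [client] section of tahoe.cfg (shares.happy, the number of distinct
servers an upload must reach, is shown here as an assumed choice, not something
fixed by the scenario):[[BR]]
{{{
[client]
# sketch of a 3-of-4 encoding as described above
shares.needed = 3
shares.happy = 4
shares.total = 4
}}}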
But what happens if we actually lose 1 of those 4 servers and then try to
read/repair the data? Or maybe even write new data?[[BR]]
We will end up having ALL 4 shares on just 3 servers. When we rebuild the 4th
server and put it back online, even repairing will not put shares on it,
because the file will be seen as already healthy. But now what if we lose the
one server which actually holds 2 shares of the same file?[[BR]]
starting condition:[[BR]]
SERV1[share1] SERV2[share2] SERV3[share3] SERV4[share4][[BR]]
broken server:[[BR]]
SERV1[share1] SERV2[share2] SERV3[share3] SERV4[XXXXXX][[BR]]
new data is written, or a scheduled repair runs, and we get to this
situation:[[BR]]
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[XXXXXX][[BR]]
new server is put in place:[[BR]]
SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ] [[BR]]
Now if we try to repair, the situation remains the same, because as of now the
repairer DOESN'T know that it has to actually rebalance share4 onto SERV4; it
just tells us the file is healthy.[[BR]]
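To make that distinction concrete, here is a rough sketch (purely illustrative,
not the actual checker code; the share map is the one from the diagram above)
of counting distinct shares versus counting distinct servers:[[BR]]
{{{
# Illustrative only -- not Tahoe's checker logic.
# share_map: share number -> set of servers currently holding that share.
share_map = {
    1: {"SERV1"},
    2: {"SERV2"},
    3: {"SERV3"},
    4: {"SERV1"},   # share4 landed on SERV1 during the degraded write/repair
}
k, n = 3, 4          # need any 3 of 4 shares to read

distinct_shares  = sum(1 for servers in share_map.values() if servers)
distinct_servers = len(set().union(*share_map.values()))

print(distinct_shares == n)   # True  -> every share exists, so "healthy"
print(distinct_servers)       # 3     -> but losing SERV1 alone leaves only 2 shares
}}}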
We can still read and write data, so far so good, right?[[BR]]
But what if SERV1 now suddenly breaks?[[BR]]
SERV1[XXXXXX] SERV2[share2] SERV3[share3] SERV4[ ] [[BR]]
ok we can replace it:[[BR]]
SERV1[ ] SERV2[share2] SERV3[share3] SERV4[ ] [[BR]]
OK, now we have a problem: how can we rebuild if we need 3 of the 4 shares but
have just 2, even though we previously had 4 servers and the file was listed
as "healthy" by the repairer?[[BR]]
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage