[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2472: encrypted cloud database
Tahoe-LAFS
trac at tahoe-lafs.org
Tue Jul 21 18:53:02 UTC 2015
#2472: encrypted cloud database
---------------------+---------------------------
Reporter: zooko | Owner: daira
Type: defect | Status: new
Priority: normal | Milestone: undecided
Component: unknown | Version: 1.10.1
Keywords: | Launchpad Bug:
---------------------+---------------------------
Tahoe-LAFS does a reasonable job of flat-file storage, and of directory
structure. But, kids these days (for the last few decades, I mean) are
really into ''structured storage'', i.e. relational databases, queryable
NoSQL databases, etc.
Here's a proposal for a stab at a "Minimum Viable Product" for an end-to-
end encrypted cloud database. It's extremely simple: store a sqlite db in
Tahoe-LAFS.
This would immediately give off-site storage (possibly even peer-to-peer
if the underlying Tahoe-LAFS grid is a peer-to-peer grid), erasure-coding
for redundancy, and it would also immediately give Tahoe-LAFS's nice
access-control semantics: you can give people read-only access to your
sqlitedb.
A potentially interesting use for this would be to store Tahoe-LAFS caps
in the sqlitedb so that you can query them out. ☺
There are a few important details about how to map sqlite's storage needs
to Tahoe-LAFS's storage offerings for best performance and to retain
Tahoe-LAFS's guarantees about access control and atomicity and so forth. I
looked into it at one point about a year ago, and unfortunately didn't
post notes to the trac, so I don't remember precisely what I decided, but
I think it was that the sqlitedb should be in write-ahead-logging (`WAL`)
mode (https://www.sqlite.org/wal.html), with exclusive locking mode, and
should be stored as a single MDMF file with its segment size set to be
the same as the sqlitedb's page size.
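A minimal sketch of the sqlite side of that setup, using Python's standard
`sqlite3` module. The helper name `open_wal_exclusive`, the file name
`app.db`, the `caps` table and the 8192-byte page size are all made up for
illustration. Setting the MDMF segment size to match is not shown, because
that is a Tahoe-LAFS question this sketch does not try to answer.

```python
import os
import sqlite3
import tempfile


def open_wal_exclusive(path, page_size=4096):
    """Open (or create) a sqlite db in WAL mode with exclusive locking.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    conn = sqlite3.connect(path)
    # page_size only takes effect on a database that has not been created
    # yet, so it has to be set before anything is written.
    conn.execute(f"PRAGMA page_size={int(page_size)}")
    # Ask for exclusive locking before the first WAL-mode access.
    conn.execute("PRAGMA locking_mode=EXCLUSIVE")
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enter WAL mode, got {mode!r}")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn = open_wal_exclusive(os.path.join(d, "app.db"), page_size=8192)
        conn.execute("CREATE TABLE caps (name TEXT PRIMARY KEY, cap TEXT)")
        conn.commit()
        # Report the page size the database actually ended up with.
        print(conn.execute("PRAGMA page_size").fetchone()[0])
        conn.close()
```

The order of the pragmas matters here: the page size cannot be changed once
the db is in WAL mode, and the sqlite docs describe exclusive locking as
something to set before the first WAL-mode access.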
The `-wal` file should probably also be an MDMF, although it would be cool
if sqlite happened to use it in write-once mode, in which case ''maybe''
it could be an immutable.
There's an open issue about whether read-only access to such a DB would
work without `PRAGMA journal_mode=DELETE`. Read
https://www.sqlite.org/wal.html#readonly to see what I mean, and keep in
mind that we're telling the user that they have to set exclusive
locking mode: https://www.sqlite.org/wal.html#noshm
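One way to poke at that open issue locally is sketched below. It is an
experiment, not an answer. It creates a WAL-mode db, closes it cleanly, and
reopens it through a sqlite URI with `mode=ro` and `immutable=1`. Using
`immutable=1` is an assumption on my part about what a read-only reader
could do; it is only safe if nothing changes the file while it is open. The
helper names `make_wal_db` and `open_read_only` are made up.

```python
import os
import pathlib
import sqlite3
import tempfile


def make_wal_db(path):
    """Create a small WAL-mode db and close it cleanly."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    # A clean close of the last connection checkpoints the WAL into the
    # main db file.
    conn.close()


def open_read_only(path):
    """Open an existing db read-only, promising sqlite it will not change."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "ro.db")
        make_wal_db(db)
        ro = open_read_only(db)
        print(ro.execute("SELECT count(*) FROM t").fetchone()[0])
        ro.close()
```

This only covers a db that was closed cleanly by its writer. It says nothing
about a reader that shows up while a `-wal` with unmerged transactions still
exists, which is the harder part of the question.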
With this setup, the cap to the database has to be a cap to the directory
''containing'' the sqlitedb file, not a cap to the sqlitedb file itself.
That's because sqlite needs to access the `-wal` file adjacent to the
sqlitedb file itself.
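To see the adjacent file on a plain local filesystem (no Tahoe-LAFS
involved), a small sketch like the following lists the directory while a
WAL-mode connection is still open. The function name `files_next_to_db` and
the file name `app.db` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def files_next_to_db(dirpath, name="app.db"):
    """Return the directory listing while a WAL-mode connection is open."""
    conn = sqlite3.connect(os.path.join(dirpath, name))
    conn.execute("PRAGMA locking_mode=EXCLUSIVE")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    try:
        # List the directory before closing, while the -wal still exists.
        return sorted(os.listdir(dirpath))
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(files_next_to_db(d))
```

The listing includes `app.db-wal` next to `app.db`, which is the file a
directory cap would have to cover as well.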
A different approach would be to use the older rollback-journal functionality
of sqlite instead of WAL. The trade-offs listed in
https://www.sqlite.org/wal.html make it sound like maybe that would fit
better into Tahoe-LAFS. It might require experimentation and benchmarking
to understand.
But it also requires careful study of things like
https://www.sqlite.org/lockingv3.html#how_to_corrupt and
https://www.sqlite.org/atomiccommit.html#sect_9_0 to figure out whether
Tahoe-LAFS could provide the guarantees that sqlite needs. I think we can, and I
tentatively think that the `WAL` is easier to guarantee than the rollback
journal, because with the rollback journal there is a positive requirement
to ''preserve'' and make ''available'' any hot journal, or else corruption
can result, whereas with a `WAL` a failure of preservation or availability
of the `-wal` just results in rollback, not corruption. I think.
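The `WAL` half of that claim can be tried out on a local filesystem with a
sketch like the one below. It commits one row and checkpoints it, commits a
second row that stays in the `-wal`, then copies only the main db file to
simulate the `-wal` not being preserved. The function name `lose_the_wal`
and the file names are made up. It only covers a `-wal` lost between
checkpoints. It does not cover a checkpoint interrupted partway, and it does
not exercise the rollback-journal case at all.

```python
import os
import shutil
import sqlite3
import tempfile


def lose_the_wal(workdir):
    """Commit two rows, 'lose' the -wal holding the second, reopen the db."""
    src = os.path.join(workdir, "live.db")
    conn = sqlite3.connect(src)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=0")  # checkpoint only by hand
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    # Move everything committed so far into the main db file.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    conn.execute("INSERT INTO t VALUES (2)")
    conn.commit()  # this row now lives only in live.db-wal

    # Copy the main file alone, as if the -wal had not been preserved.
    survivor = os.path.join(workdir, "survivor.db")
    shutil.copyfile(src, survivor)
    conn.close()

    check = sqlite3.connect(survivor)
    try:
        integrity = check.execute("PRAGMA integrity_check").fetchone()[0]
        rows = [r[0] for r in check.execute("SELECT x FROM t ORDER BY x")]
    finally:
        check.close()
    return integrity, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(lose_the_wal(d))
```

In this simple case the copy passes `PRAGMA integrity_check` and holds only
the first row: the second transaction is gone, but the db is not corrupt.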
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2472>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage