[tahoe-dev] Observations on Tahoe performance

Shawn Willden shawn at willden.org
Tue Aug 25 05:47:54 PDT 2009


On Tuesday 25 August 2009 02:59:10 am Brian Warner wrote:
> The actual call that Foolscap makes is a transport.write(), which is
> implemented in Twisted by appending the outbound data to a list and
> marking the socket as writeable (so that select() or poll() will wake up
> the process when that data can become written).

Can't you pass a disk-backed, buffer-like object to transport.write()?  Perhaps 
an mmap object?  If there's a reason that doesn't work, then 
transport.write() needs to either accept a file-like object or implement 
disk-based buffering itself.  Expecting the data to be small enough to 
queue in RAM isn't a good idea, even with ubiquitous virtual memory 
and gigabytes of physical RAM.
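
As a rough sketch of the disk-based buffering idea (this is not Twisted's 
API; the DiskBackedQueue class and its method names are hypothetical, chosen 
only for illustration), pending outbound data could be spooled to a temporary 
file and read back in small chunks whenever the socket becomes writeable:

```python
import tempfile


class DiskBackedQueue:
    """Hypothetical FIFO byte queue that keeps pending data on disk, not in RAM."""

    def __init__(self):
        # Anonymous temporary file, opened in binary read/write mode.
        self._file = tempfile.TemporaryFile()
        self._read_pos = 0   # offset of the next unsent byte
        self._write_pos = 0  # offset where new data is appended

    def append(self, data):
        """Queue bytes for sending; they are written to disk immediately."""
        self._file.seek(self._write_pos)
        self._file.write(data)
        self._write_pos += len(data)

    def pending(self):
        """Number of queued bytes that have not been handed out yet."""
        return self._write_pos - self._read_pos

    def pop_chunk(self, max_bytes):
        """Return up to max_bytes of the oldest queued data."""
        self._file.seek(self._read_pos)
        chunk = self._file.read(min(max_bytes, self.pending()))
        self._read_pos += len(chunk)
        if self._read_pos == self._write_pos:
            # Fully drained: reset the file so it does not grow without bound.
            self._file.seek(0)
            self._file.truncate()
            self._read_pos = self._write_pos = 0
        return chunk
```

With something like this, only the chunk currently being handed to the kernel 
has to live in memory; whether it could be wired into transport.write() 
is exactly the open question.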

> So the kernel will consume 64KB, and the transport's list (in
> userspace/python/Twisted) will consume N/k*1GB. Badness.

Indeed.

> Whereas, if we just put off creating later segments until the earlier
> ones have been retired, we don't consume more than a segment's worth of
> memory at any one time.

Another advantage of delaying segment creation is that it will be needed 
(someday) for streaming uploads, which are an important feature, IMO.  But, 
as you mentioned with the 1.5.0 improvements, it is important to pipeline the 
process and ensure that you always have enough buffered up to keep the 
sending socket busy.
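
A minimal sketch of that trade-off, assuming segments can be produced lazily 
(make_segments, pipelined_send, and the depth parameter are hypothetical 
names, not Tahoe code): a new segment is created only when an earlier one has 
been retired, but a small window stays buffered so the sender never starves.

```python
from collections import deque


def make_segments(data, segment_size):
    """Lazily yield segments; none is created until the consumer asks for it."""
    for offset in range(0, len(data), segment_size):
        yield data[offset:offset + segment_size]


def pipelined_send(segments, send, depth=2):
    """Send segments while holding at most `depth` of them in memory.

    Returns the peak number of segments that were buffered at once.
    """
    source = iter(segments)
    window = deque()
    exhausted = False
    peak = 0
    while True:
        # Refill the window; this is the only place segments get created.
        while not exhausted and len(window) < depth:
            try:
                window.append(next(source))
            except StopIteration:
                exhausted = True
        if not window:
            break
        peak = max(peak, len(window))
        # Retire the oldest segment, freeing a slot for the next one.
        send(window.popleft())
    return peak
```

Here the in-memory bytes object stands in for the file being uploaded; for a 
real streaming upload the generator would read from the input as it arrives. 
Memory use is bounded by depth segments regardless of the total size.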

> We've always had low-memory-footprint as a goal 
> for Tahoe, especially since the previous codebase which it replaced

I think that's a very important goal, especially since using home 
router/access point devices as storage nodes is my strategy for making 
GridBackup usable in homes without always-on desktop computers.  Physical RAM 
is still at a premium on such devices.

	Shawn.

