[tahoe-dev] Fwd: incident report

Iantcho Vassilev ianchov at gmail.com
Thu Nov 15 12:20:00 UTC 2012


COOL!
Very interesting things here.
So the server is perfectly fine even with tahoe on it: load average is below 0.40.

Now what I see is that over the last 3 days the tahoe process grew from
290MB of virtual memory when I started it to 350MB.
Also, I upload a backup from the server itself every night, so I guess
those are the interesting numbers you mentioned.

What do you mean by using a different process for the gateway?
I am starting the tahoe node only, and it is started by ordinary users.
For now I have put a hard memory limit of 500MB on it, so it will be
killed if it continues to grow.
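
Something like this is what I mean by the hard limit (a sketch using
Python's resource module; the equivalent can be done with "ulimit -v" in
the shell that starts the node):

    # Cap the process's address space at 500MB. Allocations beyond the
    # cap fail, so a leaking node dies instead of eating the whole machine.
    import resource

    LIMIT_BYTES = 500 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))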


Iantcho
On Thu, Nov 15, 2012 at 1:57 PM, Zooko Wilcox-O'Hearn <zooko at zooko.com> wrote:
> On Thu, Nov 15, 2012 at 12:11 AM, Iantcho Vassilev <ianchov at gmail.com> wrote:
>> Here is the incident archive as from the server..
>
> Thanks, Iantcho!
>
> This is interesting. Let's see... I see the hostnames of all the
> storage servers from your grid. Oh, I see that Peter Secor is running
> one of them. Cool.
>
> It says that your tahoe-lafs node was overloaded with work so that it
> took 1, 7, or even 10 seconds to do a simple "how long did it take me
> to do nothing?" test which should have taken < 1 second. Was this
> because the server machine was overloaded, or more likely was the
> overload specific to the tahoe-lafs process?
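>
> A rough sketch of that kind of check (just an illustration, not
> necessarily how the node's load monitor is implemented): ask the event
> loop to do nothing after a fixed delay and measure how late the callback
> actually fires.
>
>     import time
>     from twisted.internet import reactor
>
>     def report(scheduled):
>         # How much later than the requested 1 second did we actually run?
>         lag = time.time() - scheduled - 1.0
>         print("event loop lag: %.3f seconds" % lag)
>         reactor.stop()
>
>     reactor.callLater(1.0, report, time.time())
>     reactor.run()
>
> A lag of several seconds means the process was too busy to service its
> event loop promptly.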
>
> It looks like you're using the same node both as a storage server and as
> a gateway to upload and download files. Nothing wrong with that, just
> talking to myself out loud about what I see in here.
>
> Okay, I don't see any unusual things in this log. Below are the
> statistics from the log about the function of your storage server.
>
> My guess is that the memory leak has something to do with this node
> acting as a gateway (uploading and downloading files to remote
> servers) rather than as a server. (Just because the gateway does a lot
> more complicated work than the server does.) That doesn't mean the
> memory leak is okay — I still want to fix it — but maybe you could
> help track it down by using a different process for the gateway and seeing
> whether the gateway or the server is the one that starts using too
> much memory next time around.
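>
> A minimal sketch of that split, assuming the usual tahoe.cfg layout: keep
> one node serving storage only, create a second node for the nightly
> uploads and downloads, and turn storage off in that second node's
> tahoe.cfg:
>
>     [storage]
>     enabled = false
>
> Then whichever of the two processes balloons next time points at either
> the gateway code or the server code.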
>
> Regards,
>
> Zooko
>
> {'counters': {'downloader.bytes_downloaded': 323424955,
>               'downloader.files_downloaded': 54,
>               'mutable.bytes_published': 16519313,
>               'mutable.bytes_retrieved': 53957072,
>               'mutable.files_published': 205,
>               'mutable.files_retrieved': 744,
>               'storage_server.abort': 3,
>               'storage_server.add-lease': 1375,
>               'storage_server.allocate': 1610,
>               'storage_server.bytes_added': 4641488250,
>               'storage_server.close': 1248,
>               'storage_server.get': 40867,
>               'storage_server.read': 7625,
>               'storage_server.readv': 12895,
>               'storage_server.write': 236804,
>               'storage_server.writev': 745,
>               'uploader.bytes_uploaded': 4701532493,
>               'uploader.files_uploaded': 205},
>  'stats': {'chk_upload_helper.active_uploads': 0,
>            'chk_upload_helper.encoded_bytes': 0,
>            'chk_upload_helper.encoding_count': 0,
>            'chk_upload_helper.encoding_size': 0,
>            'chk_upload_helper.encoding_size_old': 0,
>            'chk_upload_helper.fetched_bytes': 0,
>            'chk_upload_helper.incoming_count': 0,
>            'chk_upload_helper.incoming_size': 0,
>            'chk_upload_helper.incoming_size_old': 0,
>            'chk_upload_helper.resumes': 0,
>            'chk_upload_helper.upload_already_present': 0,
>            'chk_upload_helper.upload_need_upload': 0,
>            'chk_upload_helper.upload_requests': 0,
>            'cpu_monitor.15min_avg': 0.011144453469539533,
>            'cpu_monitor.1min_avg': 0.018499974711772733,
>            'cpu_monitor.5min_avg': 0.01586668887236763,
>            'cpu_monitor.total': 2622.33,
>            'load_monitor.avg_load': 0.0435554305712382,
>            'load_monitor.max_load': 1.8420119285583496,
>            'node.uptime': 312675.6783568859,
>            'storage_server.accepting_immutable_shares': 1,
>            'storage_server.allocated': 0,
>            'storage_server.disk_avail': 400010970112,
>            'storage_server.disk_free_for_nonroot': 950010970112,
>            'storage_server.disk_free_for_root': 1004986490880,
>            'storage_server.disk_total': 1090848337920,
>            'storage_server.disk_used': 85861847040,
>            'storage_server.latencies.add-lease.01_0_percentile':
> 0.0001361370086669922,
>            'storage_server.latencies.add-lease.10_0_percentile':
> 0.00014400482177734375,
>            'storage_server.latencies.add-lease.50_0_percentile':
> 0.0005128383636474609,
>            'storage_server.latencies.add-lease.90_0_percentile':
> 0.01019287109375,
>            'storage_server.latencies.add-lease.95_0_percentile':
> 0.019355058670043945,
>            'storage_server.latencies.add-lease.99_0_percentile':
> 0.03629708290100098,
>            'storage_server.latencies.add-lease.99_9_percentile':
> 0.49553394317626953,
>            'storage_server.latencies.add-lease.mean': 0.003135397434234619,
>            'storage_server.latencies.add-lease.samplesize': 1000,
>            'storage_server.latencies.allocate.01_0_percentile':
> 0.00038504600524902344,
>            'storage_server.latencies.allocate.10_0_percentile':
> 0.0006988048553466797,
>            'storage_server.latencies.allocate.50_0_percentile':
> 0.0010409355163574219,
>            'storage_server.latencies.allocate.90_0_percentile':
> 0.015402078628540039,
>            'storage_server.latencies.allocate.95_0_percentile':
> 0.020718097686767578,
>            'storage_server.latencies.allocate.99_0_percentile':
> 0.040006160736083984,
>            'storage_server.latencies.allocate.99_9_percentile':
> 1.5722789764404297,
>            'storage_server.latencies.allocate.mean': 0.005430383205413818,
>            'storage_server.latencies.allocate.samplesize': 1000,
>            'storage_server.latencies.close.01_0_percentile':
> 9.608268737792969e-05,
>            'storage_server.latencies.close.10_0_percentile':
> 0.00010800361633300781,
>            'storage_server.latencies.close.50_0_percentile':
> 0.0002491474151611328,
>            'storage_server.latencies.close.90_0_percentile':
> 0.00026607513427734375,
>            'storage_server.latencies.close.95_0_percentile':
> 0.0002830028533935547,
>            'storage_server.latencies.close.99_0_percentile':
> 0.02466297149658203,
>            'storage_server.latencies.close.99_9_percentile':
> 0.44386720657348633,
>            'storage_server.latencies.close.mean': 0.00146563982963562,
>            'storage_server.latencies.close.samplesize': 1000,
>            'storage_server.latencies.get.01_0_percentile':
> 0.00011420249938964844,
>            'storage_server.latencies.get.10_0_percentile':
> 0.00019693374633789062,
>            'storage_server.latencies.get.50_0_percentile':
> 0.0003380775451660156,
>            'storage_server.latencies.get.90_0_percentile': 0.0416719913482666,
>            'storage_server.latencies.get.95_0_percentile': 0.06571078300476074,
>            'storage_server.latencies.get.99_0_percentile': 0.24586009979248047,
>            'storage_server.latencies.get.99_9_percentile': 2.745851993560791,
>            'storage_server.latencies.get.mean': 0.021070765972137452,
>            'storage_server.latencies.get.samplesize': 1000,
>            'storage_server.latencies.read.01_0_percentile': 1.9073486328125e-05,
>            'storage_server.latencies.read.10_0_percentile':
> 2.002716064453125e-05,
>            'storage_server.latencies.read.50_0_percentile':
> 2.2172927856445312e-05,
>            'storage_server.latencies.read.90_0_percentile':
> 4.601478576660156e-05,
>            'storage_server.latencies.read.95_0_percentile':
> 6.29425048828125e-05,
>            'storage_server.latencies.read.99_0_percentile':
> 0.00017309188842773438,
>            'storage_server.latencies.read.99_9_percentile': 0.01799607276916504,
>            'storage_server.latencies.read.mean': 0.0001399552822113037,
>            'storage_server.latencies.read.samplesize': 1000,
>            'storage_server.latencies.readv.01_0_percentile':
> 0.00012803077697753906,
>            'storage_server.latencies.readv.10_0_percentile':
> 0.00014901161193847656,
>            'storage_server.latencies.readv.50_0_percentile':
> 0.00024318695068359375,
>            'storage_server.latencies.readv.90_0_percentile':
> 0.00039505958557128906,
>            'storage_server.latencies.readv.95_0_percentile':
> 0.00046706199645996094,
>            'storage_server.latencies.readv.99_0_percentile':
> 0.03636002540588379,
>            'storage_server.latencies.readv.99_9_percentile': 1.7137258052825928,
>            'storage_server.latencies.readv.mean': 0.0028906521797180174,
>            'storage_server.latencies.readv.samplesize': 1000,
>            'storage_server.latencies.write.01_0_percentile':
> 5.984306335449219e-05,
>            'storage_server.latencies.write.10_0_percentile':
> 8.296966552734375e-05,
>            'storage_server.latencies.write.50_0_percentile':
> 9.393692016601562e-05,
>            'storage_server.latencies.write.90_0_percentile':
> 0.00013208389282226562,
>            'storage_server.latencies.write.95_0_percentile':
> 0.00014901161193847656,
>            'storage_server.latencies.write.99_0_percentile':
> 0.0001728534698486328,
>            'storage_server.latencies.write.99_9_percentile': 1.1543080806732178,
>            'storage_server.latencies.write.mean': 0.0019269251823425292,
>            'storage_server.latencies.write.samplesize': 1000,
>            'storage_server.latencies.writev.01_0_percentile':
> 0.00039196014404296875,
>            'storage_server.latencies.writev.10_0_percentile':
> 0.0004100799560546875,
>            'storage_server.latencies.writev.50_0_percentile':
> 0.0005409717559814453,
>            'storage_server.latencies.writev.90_0_percentile':
> 0.0009508132934570312,
>            'storage_server.latencies.writev.95_0_percentile':
> 0.0010209083557128906,
>            'storage_server.latencies.writev.99_0_percentile':
> 0.016919851303100586,
>            'storage_server.latencies.writev.99_9_percentile': None,
>            'storage_server.latencies.writev.mean': 0.0010515379425663277,
>            'storage_server.latencies.writev.samplesize': 745,
>            'storage_server.reserved_space': 550000000000,
>            'storage_server.total_bucket_count': 67873}}
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev at tahoe-lafs.org
> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev


More information about the tahoe-dev mailing list