Version 10 (modified by zooko, 15 years ago) (diff) |
---|
StringChain
Sometimes you want to accumulate data from some source while at the same time processing the data that arrived first. The naive way to do it in Python is like this:
def __init__(self): self.accum = '' # Will hold all unprocessed bytes def add_data(self, some_more_data): # some_more_data is a string self.accum += some_more_data def process_some(self, how_much): some = self.accum[:how_much] del self.accum[:how_much]
This works fine as long as the total amount of bytes accumulated and the number of separate add_data() events stay small, but it has O(N2) behavior and has bad performance if those numbers get large. Here are some benchmarks generated by running python -OOu -c 'from stringchain.bench import bench; bench.quick_bench()' as instructed by the README.txt file.
The N: in the left-hand column is how many bytes were in the test dataset. The ave rate: number in the right-hand column is how many bytes per second were processed. "naive" means the string-based idiom sketched above and "strch" means using the StringChain class.
_buildup init_naive N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 890, ave rate: 58350579 N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 265, ave rate: 34800398 N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 79, ave rate: 20745346 N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05, 2th-worst: 0.05, worst: 0.05 (of 5), reps/s: 20, ave rate: 10823850 _buildup init_strch N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 25543, ave rate: 1674043282 N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 14179, ave rate: 1858538925 N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 8016, ave rate: 2101513050 N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 4108, ave rate: 2154215572 _consume init_naive_loaded N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 931, ave rate: 61037862 N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 270, ave rate: 35454393 N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 74, ave rate: 19471963 N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05, 2th-worst: 0.05, worst: 0.06 (of 5), reps/s: 19, ave rate: 10146747 _consume init_strch_loaded N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 4309, ave rate: 282447500 N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 2313, ave rate: 303263357 N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 1186, ave rate: 311159052 N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 606, ave rate: 317814669 _randomy init_naive N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 479, ave rate: 31450561 N: 131072, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 140, ave rate: 18461191 N: 262144, time: best: 0.02, 2th-best: 0.02, ave: 0.02, 2th-worst: 0.03, worst: 0.03 (of 5), reps/s: 42, ave rate: 11127714 N: 524288, time: best: 0.06, 2th-best: 0.07, ave: 0.08, 2th-worst: 0.08, worst: 0.09 (of 5), reps/s: 13, ave rate: 6906341 _randomy init_strch N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 973, ave rate: 63827127 N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 495, ave rate: 64970669 N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00, 2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 239, ave rate: 62913360 N: 524288, time: best: 0.01, 2th-best: 0.01, ave: 0.01, 2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 121, ave rate: 63811569
The naive approach is slower than the StringChain class, and the bigger the dataset the slower it goes. The StringChain class is fast and also it is scalable (with regard to these benchmarks at least...).
Okay how do you use it? It is very simple -- see stringchain/stringchain.py and let me know if that interface doesn't fit your use case.
You can get the package from http://pypi.python.org/pypi/stringchain or with darcs get http://tahoe-lafs.org/source/stringchain/trunk.
It has unit tests. It is in pure Python (it uses collections.deque and string).
Starting Points
- TracGuide -- Built-in Documentation
- The Trac project -- Trac Open Source Project
- Trac FAQ -- Frequently Asked Questions
- TracSupport -- Trac Support
For a complete list of local wiki pages, see TitleIndex.