= !StringChain =

Sometimes you want to accumulate data from some source while at the same time processing the data that arrived first. The naïve way to do it in Python is like this:

{{{
def __init__(self):
    self.accum = '' # Will hold all unprocessed bytes

def add_data(self, some_more_data):
    # some_more_data is a string
    self.accum += some_more_data

def process_some(self, how_much):
    some = self.accum[:how_much]
    del self.accum[:how_much]
}}}

This works fine as long as the total amount of bytes accumulated and the number of separate {{{add_data()}}} events stay small, but it has O(N^2^) behavior and has bad performance if those numbers get large. Here are some benchmarks generated by running {{{python -OOu -c 'from stringchain.bench import bench; bench.quick_bench()'}}} as instructed by [source:README.txt the README.txt file].

The {{{N:}}} in the left-hand column is how many bytes were in the test dataset. The {{{ave rate:}}} number in the right-hand column is how many bytes per second were processed. "naive" means the string-based idiom sketched above and "strch" means using the !StringChain class.

{{{
$ python -OOu -c 'from stringchain.bench import bench; bench.slow_bench()'
impl:  StringChain
task:  _accumulate_then_one_gulp
N:   10000, ns/op: best:    2.79,   2th-best:    3.39, ave:    3.90,   2th-worst:    4.70, worst:    4.82 (of      5)
N:   50000, ns/op: best:    2.76,   2th-best:    2.78, ave:    2.95,   2th-worst:    2.94, worst:    3.36 (of      5)
N:  100000, ns/op: best:    2.33,   2th-best:    2.34, ave:    2.40,   2th-worst:    2.45, worst:    2.54 (of      5)
N:  500000, ns/op: best:    2.03,   2th-best:    2.10, ave:    2.26,   2th-worst:    2.28, worst:    2.70 (of      5)
N: 1000000, ns/op: best:    1.98,   2th-best:    2.01, ave:    2.30,   2th-worst:    2.45, worst:    2.68 (of      5)
N: 5000000, ns/op: best:    2.11,   2th-best:    2.14, ave:    2.18,   2th-worst:    2.20, worst:    2.25 (of      5)
task:  _many_gulps_str
N:   10000, ns/op: best:    5.20,   2th-best:    5.51, ave:    6.60,   2th-worst:    6.10, worst:   10.61 (of      5)
N:   50000, ns/op: best:    3.24,   2th-best:    3.24, ave:    3.42,   2th-worst:    3.42, worst:    3.88 (of      5)
N:  100000, ns/op: best:    2.87,   2th-best:    2.88, ave:    2.96,   2th-worst:    2.93, worst:    3.22 (of      5)
N:  500000, ns/op: best:    2.80,   2th-best:    2.84, ave:    2.90,   2th-worst:    2.95, worst:    3.00 (of      5)
N: 1000000, ns/op: best:    2.70,   2th-best:    2.70, ave:    2.77,   2th-worst:    2.84, worst:    2.88 (of      5)
N: 5000000, ns/op: best:    2.70,   2th-best:    2.71, ave:    2.72,   2th-worst:    2.74, worst:    2.76 (of      5)
task:  _alternate_str
N:   10000, ns/op: best:    6.79,   2th-best:    7.10, ave:    7.84,   2th-worst:    8.61, worst:    8.80 (of      5)
N:   50000, ns/op: best:    4.74,   2th-best:    4.78, ave:    4.83,   2th-worst:    4.88, worst:    4.90 (of      5)
N:  100000, ns/op: best:    4.34,   2th-best:    4.35, ave:    4.50,   2th-worst:    4.53, worst:    4.78 (of      5)
N:  500000, ns/op: best:    3.98,   2th-best:    4.12, ave:    4.25,   2th-worst:    4.36, worst:    4.50 (of      5)
N: 1000000, ns/op: best:    3.96,   2th-best:    4.00, ave:    4.07,   2th-worst:    4.02, worst:    4.36 (of      5)
N: 5000000, ns/op: best:    3.96,   2th-best:    3.97, ave:    4.07,   2th-worst:    4.18, worst:    4.19 (of      5)
impl:  StringIOy
task:  _accumulate_then_one_gulp
N:   10000, ns/op: best:    4.10,   2th-best:    4.29, ave:    4.84,   2th-worst:    5.29, worst:    5.51 (of      5)
N:   50000, ns/op: best:    5.06,   2th-best:    5.60, ave:    5.75,   2th-worst:    6.14, worst:    6.32 (of      5)
N:  100000, ns/op: best:    4.51,   2th-best:    4.52, ave:    4.60,   2th-worst:    4.67, worst:    4.69 (of      5)
N:  500000, ns/op: best:    3.80,   2th-best:    3.90, ave:    4.70,   2th-worst:    5.10, worst:    6.71 (of      5)
N: 1000000, ns/op: best:    4.19,   2th-best:    4.29, ave:    4.69,   2th-worst:    5.08, worst:    5.49 (of      5)
N: 5000000, ns/op: best:    5.27,   2th-best:    5.30, ave:    5.57,   2th-worst:    5.89, worst:    6.00 (of      5)
task:  _many_gulps_str
N:   10000, ns/op: best:    4.70,   2th-best:    5.51, ave:    5.70,   2th-worst:    5.70, worst:    6.91 (of      5)
N:   50000, ns/op: best:   24.28,   2th-best:   24.34, ave:   27.87,   2th-worst:   25.36, worst:   40.90 (of      5)
N:  100000, ns/op: best:   38.64,   2th-best:   38.84, ave:   39.03,   2th-worst:   39.22, worst:   39.38 (of      5)
N:  500000, ns/op: best:  158.61,   2th-best:  159.18, ave:  162.47,   2th-worst:  165.73, worst:  168.40 (of      5)
N: 1000000, ns/op: best:  455.90,   2th-best:  459.57, ave:  471.79,   2th-worst:  488.72, worst:  490.74 (of      5)
N: 5000000, ns/op: best: 2730.39,   2th-best: 2734.04, ave: 2771.03,   2th-worst: 2734.04, worst: 2848.67 (of      3)
task:  _alternate_str
N:   10000, ns/op: best:    6.99,   2th-best:    7.08, ave:    8.10,   2th-worst:    7.70, worst:   11.11 (of      5)
N:   50000, ns/op: best:    8.34,   2th-best:    8.34, ave:    8.52,   2th-worst:    8.38, worst:    9.18 (of      5)
N:  100000, ns/op: best:   22.37,   2th-best:   22.68, ave:   22.82,   2th-worst:   22.90, worst:   23.44 (of      5)
N:  500000, ns/op: best:   74.92,   2th-best:   75.96, ave:   76.61,   2th-worst:   76.87, worst:   78.45 (of      5)
N: 1000000, ns/op: best:  135.24,   2th-best:  136.14, ave:  139.87,   2th-worst:  137.46, worst:  153.85 (of      5)
N: 5000000, ns/op: best: 1000.30,   2th-best: 1003.59, ave: 1076.12,   2th-worst: 1003.59, worst: 1224.47 (of      3)
impl:  Stringy
task:  _accumulate_then_one_gulp
N:   10000, ns/op: best:    2.38,   2th-best:    2.69, ave:    3.01,   2th-worst:    3.00, worst:    4.20 (of      5)
N:   50000, ns/op: best:   15.86,   2th-best:   16.16, ave:   18.65,   2th-worst:   16.34, worst:   28.66 (of      5)
N:  100000, ns/op: best:   26.60,   2th-best:   26.80, ave:   30.51,   2th-worst:   34.00, worst:   35.07 (of      5)
N:  500000, ns/op: best:  102.15,   2th-best:  102.16, ave:  103.20,   2th-worst:  103.29, worst:  105.36 (of      5)
N: 1000000, ns/op: best:  266.58,   2th-best:  266.73, ave:  280.54,   2th-worst:  286.67, worst:  311.59 (of      5)
N: 5000000, ns/op: best: 1543.46,   2th-best: 1554.51, ave: 1553.37,   2th-worst: 1554.51, worst: 1562.16 (of      3)
task:  _many_gulps_str
N:   10000, ns/op: best:    2.60,   2th-best:    2.91, ave:    3.15,   2th-worst:    3.50, worst:    3.72 (of      5)
N:   50000, ns/op: best:   15.06,   2th-best:   15.08, ave:   15.34,   2th-worst:   15.60, worst:   15.86 (of      5)
N:  100000, ns/op: best:   24.20,   2th-best:   24.21, ave:   25.17,   2th-worst:   24.88, worst:   27.80 (of      5)
N:  500000, ns/op: best:   98.93,   2th-best:  101.46, ave:  101.85,   2th-worst:  102.31, worst:  105.05 (of      5)
N: 1000000, ns/op: best:  264.17,   2th-best:  266.87, ave:  268.33,   2th-worst:  270.90, worst:  272.28 (of      5)
N: 5000000, ns/op: best: 1556.56,   1th-best: 1556.56, ave: 1586.65,   1th-worst: 1616.73, worst: 1616.73 (of      2)
task:  _alternate_str
N:   10000, ns/op: best:    4.29,   2th-best:    4.70, ave:    5.48,   2th-worst:    6.01, worst:    6.39 (of      5)
N:   50000, ns/op: best:    5.16,   2th-best:    5.20, ave:    5.38,   2th-worst:    5.44, worst:    5.68 (of      5)
N:  100000, ns/op: best:   20.53,   2th-best:   20.65, ave:   20.77,   2th-worst:   20.71, worst:   21.30 (of      5)
N:  500000, ns/op: best:   81.43,   2th-best:   81.60, ave:   82.35,   2th-worst:   82.54, worst:   84.52 (of      5)
N: 1000000, ns/op: best:  143.76,   2th-best:  146.12, ave:  146.30,   2th-worst:  146.19, worst:  149.30 (of      5)
N: 5000000, ns/op: best:  917.51,   2th-best:  925.97, ave:  923.90,   2th-worst:  925.97, worst:  928.21 (of      3)
}}}

The naive approach is slower than the !StringChain class, and the bigger the dataset the slower it goes. The !StringChain class is fast and also it is scalable (with regard to these benchmarks at least...).

Okay how do you use it? It is very simple -- see [source:stringchain/stringchain.py] and let me know if that interface doesn't fit your use case.

You can get the package from http://pypi.python.org/pypi/stringchain or with {{{darcs get http://tahoe-lafs.org/source/stringchain/trunk}}}.

It has unit tests. It is in pure Python (it uses {{{collections.deque}}} and {{{string}}}).

LICENCE

You may use this package under the GNU General Public License, version 2 or, at
your option, any later version.  You may use this package under the Transitive
Grace Period Public Licence, version 1.0, or at your option, any later version.
(You may choose to use this package under the terms of either licence, at your
option.)  See the file [source:COPYING.GPL] for the terms of the GNU General Public
License, version 2.  See the file [source:COPYING.TGPPL.html] for the terms of the
Transitive Grace Period Public Licence, version 1.0.

== Starting Points ==

 * TracGuide --  Built-in Documentation
 * [http://trac.edgewall.org/ The Trac project] -- Trac Open Source Project
 * [http://trac.edgewall.org/wiki/TracFaq Trac FAQ] -- Frequently Asked Questions
 * TracSupport --  Trac Support

For a complete list of local wiki pages, see TitleIndex.