close Warning: Can't synchronize with repository "(default)" (Unsupported version control system "darcs": Can't find an appropriate component, maybe the corresponding plugin was not enabled? ). Look in the Trac log for more information.

Changes between Version 18 and Version 19 of WikiStart


Ignore:
Timestamp:
2012-09-16 18:48:42 (12 years ago)
Author:
zooko
Comment:

README

Legend:

Unmodified
Added
Removed
Modified
  • WikiStart

    v18 v19  
    1 = !StringChain =
    2 
    3 Sometimes you want to accumulate data from some source while at the same time processing the data that arrived first. The naïve way to do it in Python is like this:
    4 
    5 {{{
    6 def __init__(self):
    7     self.accum = '' # Will hold all unprocessed bytes
    8 
    9 def add_data(self, some_more_data):
    10     # some_more_data is a string
    11     self.accum += some_more_data
    12 
    13 def process_some(self, how_much):
    14     some = self.accum[:how_much]
    15     del self.accum[:how_much]
    16 }}}
    17 
    18 This works fine as long as the total amount of bytes accumulated and the number of separate {{{add_data()}}} events stay small, but it has O(N^2^) behavior and has bad performance if those numbers get large. Here are some benchmarks generated by running {{{python -OOu -c 'from stringchain.bench import bench; bench.quick_bench()'}}} as instructed by [source:README.txt the README.txt file].
    19 
    20 The {{{N:}}} in the left-hand column is how many bytes were in the test dataset. The results are all in {{{ns/op}}}: nanoseconds per byte. "Stringy" means the string-based idiom sketched above and "!StringChain" means using the !StringChain class. "StringIOy" means another implementation using the cStringIO class.
    21 
    22 {{{
    23 $ python -OOu -c 'from stringchain.bench import bench; bench.slow_bench()'
    24 impl:  StringChain
    25 task:  _accumulate_then_one_gulp
    26 N:   10000, ns/op: best:    2.79,   2th-best:    3.39, ave:    3.90,   2th-worst:    4.70, worst:    4.82 (of      5)
    27 N:   50000, ns/op: best:    2.76,   2th-best:    2.78, ave:    2.95,   2th-worst:    2.94, worst:    3.36 (of      5)
    28 N:  100000, ns/op: best:    2.33,   2th-best:    2.34, ave:    2.40,   2th-worst:    2.45, worst:    2.54 (of      5)
    29 N:  500000, ns/op: best:    2.03,   2th-best:    2.10, ave:    2.26,   2th-worst:    2.28, worst:    2.70 (of      5)
    30 N: 1000000, ns/op: best:    1.98,   2th-best:    2.01, ave:    2.30,   2th-worst:    2.45, worst:    2.68 (of      5)
    31 N: 5000000, ns/op: best:    2.11,   2th-best:    2.14, ave:    2.18,   2th-worst:    2.20, worst:    2.25 (of      5)
    32 task:  _many_gulps_str
    33 N:   10000, ns/op: best:    5.20,   2th-best:    5.51, ave:    6.60,   2th-worst:    6.10, worst:   10.61 (of      5)
    34 N:   50000, ns/op: best:    3.24,   2th-best:    3.24, ave:    3.42,   2th-worst:    3.42, worst:    3.88 (of      5)
    35 N:  100000, ns/op: best:    2.87,   2th-best:    2.88, ave:    2.96,   2th-worst:    2.93, worst:    3.22 (of      5)
    36 N:  500000, ns/op: best:    2.80,   2th-best:    2.84, ave:    2.90,   2th-worst:    2.95, worst:    3.00 (of      5)
    37 N: 1000000, ns/op: best:    2.70,   2th-best:    2.70, ave:    2.77,   2th-worst:    2.84, worst:    2.88 (of      5)
    38 N: 5000000, ns/op: best:    2.70,   2th-best:    2.71, ave:    2.72,   2th-worst:    2.74, worst:    2.76 (of      5)
    39 task:  _alternate_str
    40 N:   10000, ns/op: best:    6.79,   2th-best:    7.10, ave:    7.84,   2th-worst:    8.61, worst:    8.80 (of      5)
    41 N:   50000, ns/op: best:    4.74,   2th-best:    4.78, ave:    4.83,   2th-worst:    4.88, worst:    4.90 (of      5)
    42 N:  100000, ns/op: best:    4.34,   2th-best:    4.35, ave:    4.50,   2th-worst:    4.53, worst:    4.78 (of      5)
    43 N:  500000, ns/op: best:    3.98,   2th-best:    4.12, ave:    4.25,   2th-worst:    4.36, worst:    4.50 (of      5)
    44 N: 1000000, ns/op: best:    3.96,   2th-best:    4.00, ave:    4.07,   2th-worst:    4.02, worst:    4.36 (of      5)
    45 N: 5000000, ns/op: best:    3.96,   2th-best:    3.97, ave:    4.07,   2th-worst:    4.18, worst:    4.19 (of      5)
    46 impl:  StringIOy
    47 task:  _accumulate_then_one_gulp
    48 N:   10000, ns/op: best:    4.10,   2th-best:    4.29, ave:    4.84,   2th-worst:    5.29, worst:    5.51 (of      5)
    49 N:   50000, ns/op: best:    5.06,   2th-best:    5.60, ave:    5.75,   2th-worst:    6.14, worst:    6.32 (of      5)
    50 N:  100000, ns/op: best:    4.51,   2th-best:    4.52, ave:    4.60,   2th-worst:    4.67, worst:    4.69 (of      5)
    51 N:  500000, ns/op: best:    3.80,   2th-best:    3.90, ave:    4.70,   2th-worst:    5.10, worst:    6.71 (of      5)
    52 N: 1000000, ns/op: best:    4.19,   2th-best:    4.29, ave:    4.69,   2th-worst:    5.08, worst:    5.49 (of      5)
    53 N: 5000000, ns/op: best:    5.27,   2th-best:    5.30, ave:    5.57,   2th-worst:    5.89, worst:    6.00 (of      5)
    54 task:  _many_gulps_str
    55 N:   10000, ns/op: best:    4.70,   2th-best:    5.51, ave:    5.70,   2th-worst:    5.70, worst:    6.91 (of      5)
    56 N:   50000, ns/op: best:   24.28,   2th-best:   24.34, ave:   27.87,   2th-worst:   25.36, worst:   40.90 (of      5)
    57 N:  100000, ns/op: best:   38.64,   2th-best:   38.84, ave:   39.03,   2th-worst:   39.22, worst:   39.38 (of      5)
    58 N:  500000, ns/op: best:  158.61,   2th-best:  159.18, ave:  162.47,   2th-worst:  165.73, worst:  168.40 (of      5)
    59 N: 1000000, ns/op: best:  455.90,   2th-best:  459.57, ave:  471.79,   2th-worst:  488.72, worst:  490.74 (of      5)
    60 N: 5000000, ns/op: best: 2730.39,   2th-best: 2734.04, ave: 2771.03,   2th-worst: 2734.04, worst: 2848.67 (of      3)
    61 task:  _alternate_str
    62 N:   10000, ns/op: best:    6.99,   2th-best:    7.08, ave:    8.10,   2th-worst:    7.70, worst:   11.11 (of      5)
    63 N:   50000, ns/op: best:    8.34,   2th-best:    8.34, ave:    8.52,   2th-worst:    8.38, worst:    9.18 (of      5)
    64 N:  100000, ns/op: best:   22.37,   2th-best:   22.68, ave:   22.82,   2th-worst:   22.90, worst:   23.44 (of      5)
    65 N:  500000, ns/op: best:   74.92,   2th-best:   75.96, ave:   76.61,   2th-worst:   76.87, worst:   78.45 (of      5)
    66 N: 1000000, ns/op: best:  135.24,   2th-best:  136.14, ave:  139.87,   2th-worst:  137.46, worst:  153.85 (of      5)
    67 N: 5000000, ns/op: best: 1000.30,   2th-best: 1003.59, ave: 1076.12,   2th-worst: 1003.59, worst: 1224.47 (of      3)
    68 impl:  Stringy
    69 task:  _accumulate_then_one_gulp
    70 N:   10000, ns/op: best:    2.38,   2th-best:    2.69, ave:    3.01,   2th-worst:    3.00, worst:    4.20 (of      5)
    71 N:   50000, ns/op: best:   15.86,   2th-best:   16.16, ave:   18.65,   2th-worst:   16.34, worst:   28.66 (of      5)
    72 N:  100000, ns/op: best:   26.60,   2th-best:   26.80, ave:   30.51,   2th-worst:   34.00, worst:   35.07 (of      5)
    73 N:  500000, ns/op: best:  102.15,   2th-best:  102.16, ave:  103.20,   2th-worst:  103.29, worst:  105.36 (of      5)
    74 N: 1000000, ns/op: best:  266.58,   2th-best:  266.73, ave:  280.54,   2th-worst:  286.67, worst:  311.59 (of      5)
    75 N: 5000000, ns/op: best: 1543.46,   2th-best: 1554.51, ave: 1553.37,   2th-worst: 1554.51, worst: 1562.16 (of      3)
    76 task:  _many_gulps_str
    77 N:   10000, ns/op: best:    2.60,   2th-best:    2.91, ave:    3.15,   2th-worst:    3.50, worst:    3.72 (of      5)
    78 N:   50000, ns/op: best:   15.06,   2th-best:   15.08, ave:   15.34,   2th-worst:   15.60, worst:   15.86 (of      5)
    79 N:  100000, ns/op: best:   24.20,   2th-best:   24.21, ave:   25.17,   2th-worst:   24.88, worst:   27.80 (of      5)
    80 N:  500000, ns/op: best:   98.93,   2th-best:  101.46, ave:  101.85,   2th-worst:  102.31, worst:  105.05 (of      5)
    81 N: 1000000, ns/op: best:  264.17,   2th-best:  266.87, ave:  268.33,   2th-worst:  270.90, worst:  272.28 (of      5)
    82 N: 5000000, ns/op: best: 1556.56,   1th-best: 1556.56, ave: 1586.65,   1th-worst: 1616.73, worst: 1616.73 (of      2)
    83 task:  _alternate_str
    84 N:   10000, ns/op: best:    4.29,   2th-best:    4.70, ave:    5.48,   2th-worst:    6.01, worst:    6.39 (of      5)
    85 N:   50000, ns/op: best:    5.16,   2th-best:    5.20, ave:    5.38,   2th-worst:    5.44, worst:    5.68 (of      5)
    86 N:  100000, ns/op: best:   20.53,   2th-best:   20.65, ave:   20.77,   2th-worst:   20.71, worst:   21.30 (of      5)
    87 N:  500000, ns/op: best:   81.43,   2th-best:   81.60, ave:   82.35,   2th-worst:   82.54, worst:   84.52 (of      5)
    88 N: 1000000, ns/op: best:  143.76,   2th-best:  146.12, ave:  146.30,   2th-worst:  146.19, worst:  149.30 (of      5)
    89 N: 5000000, ns/op: best:  917.51,   2th-best:  925.97, ave:  923.90,   2th-worst:  925.97, worst:  928.21 (of      3)
    90 }}}
    91 
    92 The naïve approach is slower than the !StringChain class, and the bigger the dataset the slower it goes. The !StringChain class is fast and also it is scalable (with regard to these benchmarks at least...).
    93 
    94 Okay how do you use it? It is very simple -- see [source:stringchain/stringchain.py] and let me know if that interface doesn't fit your use case.
    95 
    96 You can get the package from http://pypi.python.org/pypi/stringchain or with {{{darcs get http://tahoe-lafs.org/source/stringchain/trunk}}}.
    97 
    98 It has unit tests. It is in pure Python (it uses {{{collections.deque}}} and {{{string}}}).
    99 
    100 LICENCE
    101 
    102 You may use this package under the GNU General Public License, version 2 or, at
    103 your option, any later version.  You may use this package under the Transitive
    104 Grace Period Public Licence, version 1.0, or at your option, any later version.
    105 (You may choose to use this package under the terms of either licence, at your
    106 option.)  See the file [source:COPYING.GPL] for the terms of the GNU General Public
    107 License, version 2.  See the file [source:COPYING.TGPPL.html] for the terms of the
    108 Transitive Grace Period Public Licence, version 1.0.
    109 
    110 ALSO, if you would like to use this software under different terms than the ones offered above, then ask.
     1[source:README.rst README]