[tahoe-lafs-trac-stream] [tahoe-lafs] #2138: file formatting conventions for text files in our source repo

Wed Dec 18 00:04:53 UTC 2013

#2138: file formatting conventions for text files in our source repo
-----------------------------+-----------------------
     Reporter:  zooko        |      Owner:  daira
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:  undecided
    Component:  unknown      |    Version:  1.10.0
   Resolution:               |   Keywords:
Launchpad Bug:               |
-----------------------------+-----------------------
Description changed by zooko:

Old description:

> This makes it so that emacs knows the intended character encoding, BOM,
> end-of-line markers, and standard line-width of these files.
>
> Also this is a form of documentation. It means that you should put only
> utf-8-encoded things into text files, only utf-8-encoded things into
> source code files (and actually you should put only ASCII-encoded things
> except possibly in comments or docstrings!), and that you should line-wrap
> everything at 77 columns wide.
>
> It also specifies that text files should start with a "utf-8 BOM". (Brian
> questions the point of this, and my answer is that it adds information and
> doesn't hurt. Whether that information will ever be useful is an open
> question.)
>
> It also specifies that text files should have unix-style ('\n')
> end-of-line markers, not windows-style or old-macos-style.
>
> I generated this patch by writing and running the following script, and
> then reading the resulting diff to make sure it was correct. I then undid
> the changes that the script had done to the files inside the
> "setuptools-0.6c16dev4.egg" directory before committing the patch.
>
> {{{
> import os
>
> magic_header_line_comment_prefix = {
>     '.py': u"# ",
>     '.rst': u".. ",
>     }
>
> def format():
>     for dirpath, dirnames, filenames in os.walk('.'):
>         for filename in filenames:
>             ext = os.path.splitext(filename)[-1]
>             if ext in ('.py', '.rst'):
>                 fname = os.path.join(dirpath, filename)
>                 info = open(fname, 'rU')
>                 formattedlines = [ line.decode('utf-8') for line in info ]
>                 info.close()
>
>                 if len(formattedlines) == 0:
>                     return
>
>                 outfo = open(fname, 'w')
>                 outfo.write(u"\ufeff".encode('utf-8'))
>
>                 commentsign = magic_header_line_comment_prefix[ext]
>
>                 firstline = formattedlines.pop(0)
>                 while firstline.startswith(u"\ufeff"):
>                     firstline = firstline[len(u"\ufeff"):]
>                 if firstline.startswith(u"#!"):
>                     outfo.write(firstline.encode('utf-8'))
>                     outfo.write(commentsign.encode('utf-8'))
>                     outfo.write("-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n".encode('utf-8'))
>                 else:
>                     outfo.write(commentsign.encode('utf-8'))
>                     outfo.write("-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n".encode('utf-8'))
>                     if (commentsign in firstline) and ("-*-" in firstline) and ("coding:" in firstline):
>                         print "warning there was already a coding line %r in %r"  % (firstline, fname)
>                     else:
>                         outfo.write(firstline.encode('utf-8'))
>
>                 for l in formattedlines:
>                     if (commentsign in l) and ("-*-" in l) and ("coding:" in l):
>                         print "warning there was already a coding line %r in %r"  % (l, fname)
>                     else:
>                         outfo.write(l.encode('utf-8'))
>                 outfo.close()
>
> if __name__ == '__main__':
>     format()
> }}}

New description:

 This makes it so that emacs knows the intended character encoding, BOM,
 end-of-line markers, standard line-width, and tabs-vs-spaces policy for
 these files.

 This is also a form of documentation. It means that you should put only
 utf-8-encoded things into text files, only utf-8-encoded things into
 source code files (and actually you should put only ASCII-encoded things
 except possibly in comments or docstrings!), and that you should
 line-wrap everything at 77 columns wide.

 It also specifies that text files should start with a "utf-8 BOM". (Brian
 questions the point of this, and my answer is that it adds information and
 doesn't hurt. Whether that information will ever be useful is an open
 question.)
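
 For reference, the "utf-8 BOM" is the three bytes EF BB BF, i.e. U+FEFF
 encoded as UTF-8. The following is a minimal Python 3 sketch (the script
 in this ticket is Python 2, and the helper name `has_utf8_bom` is made up
 for illustration) of how the BOM can be detected and how the standard
 `utf-8-sig` codec drops it on decode:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

# U+FEFF encoded as UTF-8 is exactly the BOM constant from the stdlib.
bom = "\ufeff".encode("utf-8")
sample = bom + b"hello\n"

# The plain 'utf-8' codec keeps U+FEFF as an ordinary character;
# 'utf-8-sig' strips a single leading BOM if one is present.
decoded_plain = sample.decode("utf-8")
decoded_sig = sample.decode("utf-8-sig")
```

 Readers that decode with plain `utf-8` will see the extra U+FEFF
 character, which may be relevant to the question of whether the BOM
 "doesn't hurt".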

 It also specifies that text files should have unix-style end-of-line
 markers (i.e. '\n'), not windows-style or old-macos-style.

 For Python source code files, it also specifies that you should not insert
 tab characters (so you should use spaces for Python block structure).
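
 The conventions listed above could be checked mechanically. The following
 is only a sketch in Python 3, not part of the patch: the function name
 `check_conventions`, its `is_python` flag, and the wording of the messages
 are assumptions made for illustration, and line width is counted in code
 points rather than display columns.

```python
import codecs

MAX_WIDTH = 77  # the fill-column named in this ticket

def check_conventions(data: bytes, is_python: bool = False) -> list:
    """Return a list of human-readable problems found in a file's raw bytes."""
    problems = []
    if not data.startswith(codecs.BOM_UTF8):
        problems.append("missing utf-8 BOM")
    try:
        # 'utf-8-sig' strips a leading BOM if there is one.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        problems.append("not valid utf-8")
        return problems
    if "\r" in text:
        # Covers both windows-style (\r\n) and old-macos-style (\r) endings.
        problems.append("non-unix end-of-line marker")
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if len(line) > MAX_WIDTH:
            problems.append("line %d wider than %d columns" % (lineno, MAX_WIDTH))
        if is_python and "\t" in line:
            problems.append("line %d contains a tab" % lineno)
    return problems
```

 For example, `check_conventions(open(path, "rb").read(), is_python=True)`
 would return an empty list for a conforming Python source file.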

 I generated this patch by writing and running the following script, and
 then reading the resulting diff to make sure it was correct. I then undid
 the changes that the script had done to the files inside the
 "setuptools-0.6c16dev4.egg" directory before committing the patch.

 ------- begin appended script::
 {{{
 #!/usr/bin/env python
 # -*- coding: utf-8-with-signature-unix; fill-column: 77 -*-

 import os

 magic_header_line_comment_prefix = {
     '.py': u"# ",
     '.rst': u".. ",
     }

 def format():
     for dirpath, dirnames, filenames in os.walk('.'):
         for filename in filenames:
             ext = os.path.splitext(filename)[-1]
             if ext in ('.py', '.rst'):
                 fname = os.path.join(dirpath, filename)
                 info = open(fname, 'rU')
                 formattedlines = [ line.decode('utf-8') for line in info ]
                 info.close()

                 if len(formattedlines) == 0:
                     continue

                 outfo = open(fname, 'w')
                 outfo.write(u"\ufeff".encode('utf-8'))

                 commentsign = magic_header_line_comment_prefix[ext]

                 firstline = formattedlines.pop(0)
                 while firstline.startswith(u"\ufeff"):
                     firstline = firstline[len(u"\ufeff"):]
                 if firstline.startswith(u"#!"):
                     # Keep the shebang line first, then add the magic header.
                     outfo.write(firstline.encode('utf-8'))
                     outfo.write((commentsign+u"-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n").encode('utf-8'))
                     if ext == '.py':
                         outfo.write((commentsign+u"-*- indent-tabs-mode: nil -*-\n").encode('utf-8'))
                 else:
                     outfo.write((commentsign+u"-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n").encode('utf-8'))
                     if ext == '.py':
                         outfo.write((commentsign+u"-*- indent-tabs-mode: nil -*-\n").encode('utf-8'))
                     # Drop a pre-existing coding line instead of duplicating it.
                     if (firstline.strip().startswith(commentsign)) and ("-*-" in firstline) and ("coding:" in firstline):
                         print "warning there was already a coding line %r in %r"  % (firstline, fname)
                     else:
                         outfo.write(firstline.encode('utf-8'))

                 for l in formattedlines:
                     if (l.strip().startswith(commentsign)) and ("-*-" in l) and ("coding:" in l):
                         print "warning there was already a coding line %r in %r"  % (l, fname)
                     else:
                         outfo.write(l.encode('utf-8'))
                 outfo.close()

 if __name__ == '__main__':
     format()
 }}}
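
 The header logic of the script above can also be written as a pure
 function, which makes it easier to try out without touching any files.
 This is a Python 3 sketch under assumptions, not the code that produced
 the patch: the names `add_header`, `is_coding_line`, `CODING_LINE`,
 `TABS_LINE` and `COMMENT_PREFIX` are invented here, and unlike the script
 it drops an existing coding line silently instead of printing a warning.

```python
CODING_LINE = "-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n"
TABS_LINE = "-*- indent-tabs-mode: nil -*-\n"
COMMENT_PREFIX = {".py": "# ", ".rst": ".. "}

def is_coding_line(line, prefix):
    """Same test the script uses to spot a pre-existing coding line."""
    return (line.strip().startswith(prefix)
            and "-*-" in line and "coding:" in line)

def add_header(lines, ext):
    """Return a new list of text lines with the magic header inserted.

    `lines` are decoded text lines including their trailing newline.
    Any leading U+FEFF on the first line is stripped, because the caller
    is expected to write the BOM itself.
    """
    if not lines:
        return []  # empty files are left alone
    prefix = COMMENT_PREFIX[ext]
    header = [prefix + CODING_LINE]
    if ext == ".py":
        header.append(prefix + TABS_LINE)
    first = lines[0].lstrip("\ufeff")
    rest = [l for l in lines[1:] if not is_coding_line(l, prefix)]
    if first.startswith("#!"):
        # A shebang stays on the first line; the header follows it.
        return [first] + header + rest
    if is_coding_line(first, prefix):
        return header + rest
    return header + [first] + rest
```

 A caller would then write `"\ufeff" + "".join(add_header(lines, ext))`
 encoded as UTF-8, with '\n' line endings.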

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2138#comment:4>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage