#1143 closed defect (fixed)

Double Encoding in HTML in File Names in WUI

Reported by: chrisp Owned by: davidsarah
Priority: major Milestone: 1.10.0
Component: code-frontend-web Version: 1.7.1
Keywords: easy wui html Cc:
Launchpad Bug:

Description

My file "zumby-bumby ; mail blaggy@… < /etc/hosts" in the pubgrid root http://pubgrid.tahoe-lafs.org/uri/URI%3ADIR2%3Actmtx2awdo4xt77x5xxaz6nyxm%3An5t546ddvd6xlv4v6se6sjympbdbvo7orwizuzl42urm73sxazqa/ is listed as "zumby-bumby ; mail blaggy@… &lt; /etc/hosts" in the listing.

That is, the < got converted to &lt; and then that ampersand got converted to &amp;. Thus, we end up with &amp;lt;.

HTML entity-encoding is good because it can stop XSS, but be careful: it increases the size of memory you have to allocate to handle the request. Also, double-encoding is just plain incorrect. Single-encode, and place limits on how much memory you will allocate to do the encoding. One way to do this is to include input size limits as part of your input validation framework.

Change History (9)

comment:1 Changed at 2010-08-04T23:29:22Z by warner

which tools did you use to add and list this file? CLI or WUI?

comment:2 Changed at 2011-08-24T02:58:12Z by davidsarah

  • Component changed from unknown to code-frontend-web
  • Keywords easy wui html added
  • Milestone changed from undecided to 1.10.0
  • Owner changed from nobody to davidsarah
  • Status changed from new to assigned

I've just spotted the likely cause of this bug: at several places in DirectoryAsHTML.render_row, we use T.a(href=...)[html.escape(name)]). This is wrong because nevow already escapes the argument to T.a (if it is a string).

I think it only affects the WUI.

comment:3 Changed at 2012-04-01T03:44:41Z by davidsarah

  • Milestone changed from 1.11.0 to 1.10.0

comment:4 Changed at 2012-10-15T23:31:24Z by zooko

I just used freedomsponsors.org to offer USD 25.00 to whoever fixes this issue: http://www.freedomsponsors.org/core/offer/24/double-encoding-in-html-in-file-names-in-wui?alert=SPONSOR&c=s

comment:5 Changed at 2012-10-16T12:43:31Z by mk.fg

David-Sarah was right, as strings passed as stan in Nevow are automatically escaped, unless raw() marker is used.

Nevow uses it's own escapeToXML method to do that though, which leaves single/double quotes intact (unless string is used as an attribute), so it doesn't match twisted.web.html.escape() 1-to-1, but I think it should be okay, as it doesn't affect rendering.

Fixed now in 1143_double_encoding_html_filenames branch (non-official repo), github pull request 16.

comment:6 follow-ups: Changed at 2012-10-21T16:27:37Z by zooko

Since Nevow's escapeToXML method leaves single/double quotes intact, could that be used to malicious craft input which would confuse the HTML parser by having embedded quote characters?

comment:7 in reply to: ↑ 6 Changed at 2012-10-21T18:15:19Z by mk.fg

Replying to zooko:

Since Nevow's escapeToXML method leaves single/double quotes intact, could that be used to malicious craft input which would confuse the HTML parser by having embedded quote characters?

I don't really see how and don't think I've heard of such things happening, maybe example of what you mean would be helpful?

I can imagine it happening only if malicious person can insert markup somewhere else, i.e. something like this:

<p>filename with <span randomattr="</p>
<p>filename_that_should_be_hidden</p>
<p>">visible_filename_ending</p>

But then again, I think if any tags can be inserted, it'll be something like <script> and the game is over, no amount of quote escaping should make any difference.

Interesting thing to note that the source for this very page contains unescaped quotes in user-submitted content.

comment:8 Changed at 2012-10-25T01:21:50Z by davidsarah

  • Resolution set to fixed
  • Status changed from assigned to closed

Fixed by 1df7f114b7094dab.

comment:9 in reply to: ↑ 6 Changed at 2012-10-25T01:34:02Z by davidsarah

Replying to zooko:

Since Nevow's escapeToXML method leaves single/double quotes intact, could that be used to malicious craft input which would confuse the HTML parser by having embedded quote characters?

No, because the input does not occur in an attribute or other quoted context.

Note: See TracTickets for help on using tickets.