Saturday, December 1, 2007

Why The h Can't Rails Escape HTML Automatically?

We all know about the dangers of cross-site scripting. The solution is simple: if you're not sure the data you're displaying is absolutely safe, run it through a filter that escapes HTML tag characters. Do this with data entered by users, data you get from other applications, etc. Be paranoid.

Rails has an html_escape method that does exactly this. Its alias is h, and you'll see it sprinkled throughout templates:
<%= h foo.bar %>
So where's the wart? The wart is that you, the developer, have to do this. Rails should escape HTML automatically, by default. That's what you want, 99% of the time. Instead, you have to remember the h. You have to put it in manually, again and again. Nothing DRY about that. And, if you forget even once, you've left a security hole in your application.
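
For readers who haven't met it, `h` is an alias for `ERB::Util.html_escape` in Ruby's standard library. Here is a minimal sketch outside Rails of what forgetting it costs. The `render_comment` helper and the sample input are made up for illustration.

```ruby
require 'erb'
include ERB::Util   # makes html_escape and its alias h available

# Hypothetical helper: renders a template that can see the local `comment`.
def render_comment(template_source, comment)
  ERB.new(template_source).result(binding)
end

# Made-up hostile input, for illustration only.
hostile = '<script>alert("xss")</script>'

# Forgetting the h: the script tag reaches the page intact.
puts render_comment('<p><%= comment %></p>', hostile)
# <p><script>alert("xss")</script></p>

# Remembering the h: the markup is neutralized.
puts render_comment('<p><%= h comment %></p>', hostile)
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The two templates differ by a single character, which is why the omission is so easy to miss in review.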

Yes, you can use plugins or alternative template engines to fix this. But many developers will use Rails out of the box, and some of them will forget an h or two. Django escapes HTML automatically now. Rails is nearing 2.0. Time to catch up.

17 comments:

  1. @anon: It's not quite that simple. It's more of a design flaw than a code flaw. The Rails authors don't care that they're generating text, not HTML. This is a bad idea for a web framework, but they're certainly in good company. I seem to recall DHH being down on the idea, but I can't find a reference right now.

    It's really worth looking at what the Django people are doing with auto-escaping.

    History has shown programmers can't be trusted with critical stuff like this. How many web programmers would there be if you had to malloc() and free() in all the scripting languages?

  2. This seems to be an entirely misguided solution. The way to deal with XSS is not to misprint the data in templates but to make sure the data is properly escaped on INPUT (or at least is transformed from that data).

    I could get behind a version of this feature that threw an error instead of escaping the data for you and displaying it. However, this still doesn't address the fundamental issue of security. Outputting unescaped input is only one potential security vulnerability; unescaped input can also cause SQL injection vulnerabilities, or other problems when interacting with other components. The tools should encourage escaping the INPUT (in some fashion), not escaping the OUTPUT.

    IMO the best way to go is some extension of the tainting system. Trying to output a variable that is tainted would result in an error. Preferably the system would be extended so that there are different levels of taint. This would encourage the correct practice rather than potentially encouraging security holes as a result of assuming that the auto-escaping will take care of it.

  3. @truepath: Nope. If you're generating HTML output, your tool should be helping you to generate correct HTML output. It's really that simple.

  4. @truepath:

    Please no. This is a problem of output, not a problem of input. The problem is there only because of the format and grammar of your output, not because of the input.


    If you escape HTML on input you will have problems:
    - when you get the size of the strings
    - when you compare two strings (one from input and one from an internal calculation)
    - when you send this data in a format other than HTML/XML


    PHP made the same mistake when dealing with SQL injection. It used to escape at input (magic_quotes_gpc), and many people ended up with backslashes everywhere (because you do not always want to send your data to SQL) and with bugs (because the raw data had extra backslashes that shouldn't be there).
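
The three problems listed above are easy to reproduce. The following is a small sketch, assuming a hypothetical `store_escaped` step that HTML-escapes data as it arrives; it uses `ERB::Util.html_escape` from Ruby's standard library and a made-up input string.

```ruby
require 'erb'

# Hypothetical "escape on input" step: the value is saved already HTML-escaped.
def store_escaped(raw)
  ERB::Util.html_escape(raw)
end

raw    = 'Tom & Jerry'          # what the user typed
stored = store_escaped(raw)     # what ends up in the database

puts stored                     # Tom &amp; Jerry
puts raw.length                 # 11
puts stored.length              # 15, so length checks now count entity characters
puts stored == 'Tom & Jerry'    # false, comparing with internally computed data fails
puts "Subject: #{stored}"       # Subject: Tom &amp; Jerry (wrong for a plain-text email)
```

Escaping at output time avoids all three, because the stored value stays exactly what the user entered.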

  5. I don't think I was very clear. I certainly don't advocate the horrible PHP system. The short and sweet version of my objection is that unexpected unescaped HTML showing up in a template should throw an error, not be automatically escaped.

    Encouraging programmers to rely on autoescaping is a bad idea because sometimes you use template tags to fill JS datastructures, sometimes later developers will come along and start intentionally outputting HTML from variables exposing the hole you didn't fix. When user supplied data gets to the output without having been deliberately cleansed by the programmer something has gone wrong and we should encourage fixing that not hiding it.

    Ideally we would have a system that went much farther. For instance, one often wants to allow only some HTML tags from the user input (e.g. balanced formatting tags), and this tool has completely abandoned you once you have gone there.

    The system I would like wouldn't do anything dumb like php's choice to always escape the input. Rather it would tag input data as tainted (as well as anything deriving from it) and the output templates would throw an error if such data was ever output into the document. This now could be extended to a much more general system that allows a given piece of data to be declared safe for a certain sort of context (html, js, SQL, whatever extension the user wants) (doing this well might require the template system to realize what kind of context it is in when it evaluates any kind of variable).

    In this sort of system even if you are doing something like posting comments where you WANT the user supplied data to have some tags in it you would still throw an error if the data ever made it to the output without any safety checks.
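
The "error on tainted output" idea described above can be sketched in a few lines of plain Ruby. Everything here (`Tainted`, `TaintError`, `escape_html`, `render_greeting`) is a hypothetical name invented for this illustration, not a Rails or Ruby API, and only the HTML context is modeled. The sketch relies on ERB calling `to_s` on the result of each `<%= %>` expression.

```ruby
require 'erb'

class TaintError < StandardError; end

# Hypothetical wrapper for untrusted data.
class Tainted
  def initialize(value)
    @value = value
  end

  # Deliberate cleansing for the HTML context; returns an ordinary String.
  def escape_html
    ERB::Util.html_escape(@value)
  end

  # ERB calls to_s on every <%= %> result, so an uncleansed value fails loudly.
  def to_s
    raise TaintError, 'tainted value reached output without being cleansed'
  end
end

# Hypothetical helper: renders a template that can see the local `name`.
def render_greeting(template_source, name)
  ERB.new(template_source).result(binding)
end

name = Tainted.new('<b>Bob</b>')

begin
  render_greeting('Hello <%= name %>', name)
rescue TaintError => e
  puts "error: #{e.message}"
end

puts render_greeting('Hello <%= name.escape_html %>', name)
# Hello &lt;b&gt;Bob&lt;/b&gt;
```

Supporting further contexts (JS, SQL) would mean adding one cleansing method per context, which is where the proposal gets harder: the template would also need to know which context it is in.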

  6. You miss a point: Rails has many commands that generate HTML.

    'link_to' for example. If I had to switch off autoescaping every time I used link_to...

    And another one: <%= %> is ERB syntax. ERB is not only used with HTML, so I guess such a feature would be uninteresting to some who use it.

  7. I've never read a critique so blatantly made by someone who had no idea what they're talking about. Your idea would invalidate the systems of helpers, partials, or any other dynamically generated HTML. Simply ridiculous.

  8. This is a design flaw in Django. Think of it this way: Django wastes CPU cycles escaping every field every time it's rendered as HTML, so that I can waste some more to unescape it. The default behavior of any I/O system should be output = input. Django breaks that because it assumes that programmers aren't programmers but little children.

  9. @truepath

    "Those who don't understand Perl are doomed to reinvent it, poorly."

    ;p

  10. FWIW, there was a discussion thread about this on the rails-core mailing list back in 2006, with DHH and other core contributors weighing in.

    You can read the thread and draw your own conclusions: http://lists.rubyonrails.org/pipermail/rails-core/2006-February/000731.html

    A plugin came out of the whole deal but looks like abandonware now: http://rubyforge.org/projects/autoescape/

  11. @sam

    Yes, I was motivated by the taint mode in some other language (perhaps Perl). Maybe I should have cited my inspiration, but I wasn't confident enough of my interpretation of how that language worked to be sure I wasn't just assuming it worked the way I thought it should.

  12. Only 2 problems with rails? Wow. I'm glad that they solved everything after December 2, 2007 12:25 AM.

  13. Rails 3 will autoescape html. :)

  14. ruby script/console
    Loading development environment (Rails 2.1.2)
    >> h "A&B"
    => "A&B"
    >> h "A&B"
    => "A&amp;B"
    >>

  15. Well, that didn't quite show up properly. The point I was trying to make is that you need to make sure to escape once and only once, because double escaping doesn't work. So you have to let the programmer do it. Some help doing it might be OK, though.
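
One form the "help" could take is a marker that records whether a string has already been escaped, so a second pass becomes a no-op. This is a sketch only: `SafeHtml` and `auto_escape` are hypothetical names chosen for illustration, built on `ERB::Util.html_escape` from the standard library.

```ruby
require 'erb'

# Hypothetical marker class: a string that is already valid HTML.
class SafeHtml < String; end

# Escape the value unless it has already been marked as HTML.
def auto_escape(value)
  return value if value.is_a?(SafeHtml)
  SafeHtml.new(ERB::Util.html_escape(value.to_s))
end

once  = auto_escape('A&B')
twice = auto_escape(once)
puts once     # A&amp;B
puts twice    # A&amp;B, the marker prevents a second pass

# Without the marker, escaping twice corrupts the text:
puts ERB::Util.html_escape(ERB::Util.html_escape('A&B'))   # A&amp;amp;B

# Markup produced by trusted code can be marked so it passes through untouched.
link = SafeHtml.new('<a href="/">home</a>')
puts auto_escape(link)   # <a href="/">home</a>
```

Under this scheme a helper that generates markup could return the marked type, so escaping by default would not have to be switched off around every helper call.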

  16. Rails 3 does escaping by default now :)

