The complexity of the web

The webpages below let you explore the complexities of encoding (aka sanitising) and validating data to prevent XSS, the most notorious way for things to go off the rails in modern web applications.

The central language here is HTML, but this language involves a range of other languages and technologies, including

The web browser is the obvious place where these technologies are used, but smartphone apps use them too: often an app is effectively just a dedicated browser for an individual web site. Increasingly, applications on laptops or desktops also use these technologies: so-called Electron apps that can be run on Windows, Linux or macOS are built using HTML and JavaScript.

All the webpages below are totally self-contained: the required JavaScript code is inlined inside the HTML or uses standard APIs provided by any browser, notably the DOM API. Note that the use of inlined JavaScript is considered bad practice as it makes for a messy, hard-to-maintain mixture of JavaScript code and HTML content. Still, for these demo webpages it has advantages: you can easily make a copy of a webpage on your local file system, tinker around with it, and then view - and execute - the result in a browser to see the effect.

Mimicking a web server inside a webpage with JavaScript

The webpages listed below contain JavaScript code that mimics the behaviour of a web server. More specifically, the pages mimic a web template engine that generates a webpage using a web template. The pages contain form fields to supply parameters to these templates and buttons to generate a webpage by inserting these parameters into the template. The resulting HTML is inserted into the webpage using the DOM API so that it is rendered inside the browser.
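
As a rough illustration of this idea, the sketch below fills a tiny template with one parameter and, when run in a browser, inserts the result into the page. The function name renderGreeting, the parameter name and the element id "output" are assumptions made for this example; they are not taken from the demo pages.

```javascript
// Hypothetical template function: the names here are illustrative only.
function renderGreeting(params) {
  // The parameter is spliced into the HTML verbatim, with no encoding at all.
  return `<p>Hello, ${params.name}!</p>`;
}

// In the demo pages a value like this would come from a form field.
const generated = renderGreeting({ name: "<b>Alice</b>" });
console.log(generated);

// In a browser, render the result inside the page via the DOM API.
// "output" is an assumed element id; outside a browser this step is skipped.
if (typeof document !== "undefined") {
  const target = document.getElementById("output");
  if (target) {
    target.innerHTML = generated;
  }
}
```

Because the parameter is not encoded, the <b> tag supplied as the name becomes part of the page's markup instead of being shown as text.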

Two techniques that we use here are interactively changing a webpage and JavaScript template strings:

  1. Changing a webpage using innerHTML and document.write()

    The webpages linked below use two different ways to insert HTML content into a webpage:
    1. an assignment to the property innerHTML of an 'element' of a webpage, and
    2. the method document.write() to change the entire webpage.
    The browser handles execution of scripts differently for these two techniques: scripts in HTML content inserted via an assignment to innerHTML are normally not executed, whereas scripts inserted using document.write() are. (For more explanation, see the discussion in the HTML spec. Note the use of the word 'usually' in this specification.) Still, there are tricks to get scripts executed via an assignment to innerHTML, namely by avoiding explicit <script> tags, as in <img src='http://some.Link/ThatDoesNotExist.jpg' onerror='alert(1)'>. For more explanation, see the discussion of innerHTML in Mozilla's docs of the DOM API.

    The second technique, using document.write(), is more realistic as it demonstrates what would happen if a real web server sent the HTML payload to the browser (as then scripts in the HTML are executed). Still, the first technique, using innerHTML, is useful as it allows you to inspect the HTML that is rendered without execution of scripts.

    WARNING: Both techniques are terribly insecure and error-prone ways to interactively modify a webpage. Normally you would steer clear of using them, but for the demos here they are useful.

  2. JavaScript template strings

    In the JavaScript code we make extensive use of JavaScript template strings. These strings, written between back-quotes, can contain both single and double quotes without any escaping. More importantly, template strings support an infix substitution operation, called string interpolation, by including expressions between special brackets ${ }. Template strings can span multiple lines, i.e. they can contain newline characters without these having to be escaped, which is useful to keep things readable.
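
The two techniques can be sketched together as follows. This is a minimal sketch, not the code of the demo pages: the helper names insertWithInnerHTML and replaceWithDocumentWrite and the variable names are assumptions made for this example.

```javascript
// Hypothetical parameter, standing in for a value read from a form field.
const comment = "<img src='http://some.Link/ThatDoesNotExist.jpg' onerror='alert(1)'>";

// A template string: it spans several lines, mixes single and double quotes
// without escaping, and interpolates the parameter between ${ and }.
const page = `<div class="comment">
  <p>Someone's comment:</p>
  ${comment}
</div>`;

// Technique 1: assign the HTML to the innerHTML property of one element.
function insertWithInnerHTML(element, html) {
  element.innerHTML = html;
}

// Technique 2: replace the entire page using document.write().
function replaceWithDocumentWrite(doc, html) {
  doc.open();
  doc.write(html);
  doc.close();
}

console.log(page);

// In a browser these helpers would be called with real DOM objects, e.g.
//   insertWithInnerHTML(document.getElementById("output"), page);
//   replaceWithDocumentWrite(document, page);
// where "output" is an assumed element id.
```

Passing the element or document in as an argument keeps the helpers easy to try out with a stand-in object before pointing them at a real page.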

The demo pages

  1. A simple demo just to explain the idea

  2. A demo blog page, without any encodings of parameters
    Injecting scripts or causing other problems in this page is easy.

  3. A more defensive variant of the demo page, with some encoding of parameters
    Injecting scripts in this page is (a bit) harder.
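
To give an idea of what encoding a parameter can look like, here is a minimal sketch of HTML encoding. The function encodeHtml is a hypothetical helper written for this example; the encoding actually used in the demo page may differ.

```javascript
// Hypothetical helper: replaces the characters that have a special meaning
// in HTML by the corresponding character references.
function encodeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must come first, so later references are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// An injection attempt that avoids explicit <script> tags.
const payload = "<img src=x onerror='alert(1)'>";
const encoded = encodeHtml(payload);

// The encoded parameter is displayed as text instead of being parsed as a tag.
console.log(`<p>${encoded}</p>`);
```

An encoding like this only protects parameters that end up in HTML text or in quoted attribute values. Parameters inserted into other contexts, such as URLs or JavaScript code, need different encodings, which is one reason why injection may become harder but not impossible.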