This how-to recipe describes how to escape HTML for output, percent-encode URL parameters so they form valid GET parameter input, and sanitize input. It’s not intended to be an exhaustive discussion of the many alternatives available; it is just the way I’ve chosen to do it after researching the options myself.
In my example, I’m using PHP to create HTML for a dropdown menu where one of the selections contains the value:
The "Good", the 'Bad', & the Ugly
If it weren’t for the special characters mucking things up, it would look like this:
<select id="my-dropdown" name="url">
  <option value="The "Good", the 'Bad', & the Ugly">The "Good", the 'Bad', & the Ugly</option>
</select>
Of course, you can tell at a glance that this is a train wreck that will go horribly wrong. The question is “what do I do about it?”
Part 1: Creating escaped HTML output
Let’s say we’re retrieving a plain-text list of phrases to convert into dropdown options. In the JavaScript that receives the data, we need to process these values so we don’t create broken HTML. In our sample phrase, the single and double quotes, plus the ampersand, will cause the <option> tag and its displayed output to be malformed. The way that we correct this in JavaScript is to pass the text through the escapeHtml function below (which is borrowed from mustache.js):
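// Map each character that is unsafe in HTML to its entity.
var entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
  '`': '&#x60;',
  '=': '&#x3D;'
};

// Replace every unsafe character in the string with its entity.
function escapeHtml (string) {
  return String(string).replace(/[&<>"'`=\/]/g, function (s) {
    return entityMap[s];
  });
}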
The resulting string is:
The &quot;Good&quot;, the &#39;Bad&#39;, &amp; the Ugly
This value is safe for the value attribute of the <option> tag, as well as the text between <option> and </option>.
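As a rough sketch of how the escaped value might be used, the snippet below builds the <option> markup from a list of phrases. The buildOptions helper and the optionsHtml variable are my own hypothetical names, not part of mustache.js, and the code assumes the markup is assembled as a string before being added to the page.

```javascript
// Same character-to-entity table that mustache.js uses.
const entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
  '`': '&#x60;',
  '=': '&#x3D;'
};

function escapeHtml(string) {
  return String(string).replace(/[&<>"'`=\/]/g, (s) => entityMap[s]);
}

// Hypothetical helper: turns plain-text phrases into <option> markup,
// escaping each phrase once and using it for both the value attribute
// and the visible text.
function buildOptions(phrases) {
  return phrases
    .map((phrase) => {
      const safe = escapeHtml(phrase);
      return `<option value="${safe}">${safe}</option>`;
    })
    .join('\n');
}

const optionsHtml = buildOptions(['The "Good", the \'Bad\', & the Ugly']);
console.log(optionsHtml);
```

The browser decodes the entities when it parses the markup, so the value submitted with the form is the original phrase, not the escaped one.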
Part 2: Creating percent-encoded URIs
Let’s say we want to take our selected value and pass it as a GET parameter in a URL. If our parameter is named phrase, by just jamming the phrase into the URL we would end up with something like this:
http://myserver.com/page.php?phrase=The "Good", the 'Bad', & the Ugly
Talk about the bad and the ugly. When a value contains spaces or characters that have special meaning in a URI, like the quotes and ampersand here, we need to percent-encode it. But we can’t encode the entire URI, because that would also mangle the separators that give the URI its structure; we only want to encode the query parameter value. A little JavaScript like this will handle it nicely:
url = `http://myserver.com/page.php?phrase=${ encodeURIComponent( dropdown.value ) }`;
If you’re not familiar with them already, the backticks (``) and ${} are part of the ECMAScript 2015 (ES6) standard and are described under template literals. They allow us to embed an arbitrary expression inside the braces, which is where we percent-encode that string using JavaScript’s built-in encodeURIComponent function. The resulting URL looks like this:
http://myserver.com/page.php?phrase=The%20%22Good%22%2C%20the%20'Bad'%2C%20%26%20the%20Ugly
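To illustrate why only the parameter value gets encoded, here is a small sketch that contrasts encodeURIComponent on the value with encodeURI on the whole URL. The variable names (phrase, base, goodUrl, badUrl, roundTrip) are made up for this example, and the phrase is hard-coded in place of dropdown.value.

```javascript
const phrase = 'The "Good", the \'Bad\', & the Ugly';
const base = 'http://myserver.com/page.php';

// Encode only the parameter value, then splice it into the URL.
const goodUrl = `${base}?phrase=${encodeURIComponent(phrase)}`;
console.log(goodUrl);
// http://myserver.com/page.php?phrase=The%20%22Good%22%2C%20the%20'Bad'%2C%20%26%20the%20Ugly

// encodeURI is meant for complete URIs, so it leaves "&" untouched.
// The server would see the ampersand as the start of a second parameter.
const badUrl = encodeURI(`${base}?phrase=${phrase}`);
console.log(badUrl);

// Decoding the encoded value gives back the original phrase.
const roundTrip = decodeURIComponent(goodUrl.split('phrase=')[1]);
console.log(roundTrip === phrase);
```

Note that encodeURIComponent leaves the single quotes alone, since it does not escape the characters - _ . ! ~ * ' ( ), which is why they still appear literally in the encoded URL.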
Part 3: Sanitizing input
PHP offers an entire suite of capabilities for sanitizing and validating user input under the topic “Data Filtering”. Validating input via filter_var with one of the FILTER_VALIDATE_* filters returns the value when it meets the rules for that datatype, and false when it does not. Sanitizing, via the FILTER_SANITIZE_* filters, means quietly removing characters that make the value unsuitable for its datatype. There are a number of predefined filters, and you can use a regular expression (FILTER_VALIDATE_REGEXP) or a callback (FILTER_CALLBACK) to create your own.
Here’s an example of validating vs. sanitizing:
$good_ip = '192.168.1.1';
$bad_ip = '999.1.1.1';

// Validation: filter_var returns the value if it is valid, false otherwise.
$res_good = filter_var( $good_ip, FILTER_VALIDATE_IP ) ? 'true' : 'false';
$res_bad = filter_var( $bad_ip, FILTER_VALIDATE_IP ) ? 'true' : 'false';
echo "Is $good_ip valid: $res_good\n";
echo "Is $bad_ip valid: $res_bad\n";

// Sanitization: characters not allowed in an email address are stripped.
$email = 'name(@somewhere.com)';
echo "$email becomes " . filter_var( $email, FILTER_SANITIZE_EMAIL ) . "\n";
The resulting output is:
Is 192.168.1.1 valid: true
Is 999.1.1.1 valid: false
name(@somewhere.com) becomes name@somewhere.com