Genius Open Source Libraries

Some time ago, Genius Engineering decided to unify the way we encode values that contain user input. We previously depended on the PHP built-in htmlentities() and some simple wrappers around it for our encoding needs, but this function alone can’t safely sanitize tainted data in all contexts. Furthermore, we had no unified view of whether encoding should happen immediately upon receipt of data from the user or when we display that data to the user. The ambiguity of our security arrangement and the lack of encoding functions appropriate for all contexts led the engineering team to look for better options in PHP for preventing cross-site scripting (XSS) and SQL injection vulnerabilities. While there is plenty of information about these issues and what must be done to fix them, there is a distinct dearth of PHP libraries that properly encode strings for every one of these contexts.
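To make the "not safe in all contexts" point concrete, here is a minimal sketch using only PHP built-ins. The payloads and variable names are made up for illustration and are not taken from the Genius code base. It shows two contexts where htmlentities() returns the input unchanged even though the input can break out of the surrounding quotes.

```php
<?php
// Illustrative sketch; payloads and variable names are hypothetical.

// Case 1: a single-quoted HTML attribute. With ENT_COMPAT, htmlentities()
// encodes double quotes but leaves single quotes untouched.
$attrPayload = "' onmouseover='alert(1)";
$attrEncoded = htmlentities($attrPayload, ENT_COMPAT, 'UTF-8');
$attrMarkup  = "<a href='#' title='" . $attrEncoded . "'>link</a>";

// Case 2: a JavaScript string literal. Even with ENT_QUOTES, htmlentities()
// does not touch backslashes, so a trailing backslash escapes the closing quote.
$jsPayload = 'abc\\'; // the letters abc followed by one backslash
$jsEncoded = htmlentities($jsPayload, ENT_QUOTES, 'UTF-8');
$jsMarkup  = "var a = '" . $jsEncoded . "';";

echo $attrMarkup, "\n"; // the title attribute ends early and onmouseover is injected
echo $jsMarkup, "\n";   // the string literal is no longer terminated
```

In both cases the encoder did what it is documented to do; the problem is that HTML entity encoding answers a different question than the one the output context is asking.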

When the right tool for the job doesn’t exist, you build it. We came up with a set of functions to sanitize tainted data in any of the places that it is output to the user. The functions are very straightforward: give them a string and you get back one that is fully escaped. Output from the gosSanitizer functions can be safely used as a double-quoted string in an HTML attribute or JavaScript context, or as a single-quoted string in an SQL context.

// Output an unsafe string, presumably user input
$xss = '<script>alert(\'oh snap\');</script>';
echo 'If you entered your name as ' . $xss . ', we\'d be in trouble.<br />' . "\n";
// Sanitize that string, and output it safely
$htmlContentContext = gosSanitizer::sanitizeForHTMLContent($xss);
echo "But if we sanitize your name, " . $htmlContentContext . ", then all is well.<br />\n";
echo '<h2>HTML Attribute</h2>';
// We can also safely sanitize it for an HTML attribute context
$htmlAttributeContext = gosSanitizer::sanitizeForHTMLAttribute($xss);
echo 'Tainted strings can also be used in an
    <a href="" title="' . $htmlAttributeContext . '">HTML attribute</a>
    context.<br />' . "\n";
echo '<h2>JavaScript string</h2>';
// And we can even make strings used in JavaScript safe
$jsString = '\';alert(1);var b =\'';
echo '<script type="text/javascript">
var a = \'' . $jsString . '\';
var aSafe = \'' . gosSanitizer::sanitizeForJS($jsString) . '\';
</script>' . "\n";
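
For readers who want to see the idea of per-context encoding without installing anything, the following is a rough sketch built only from standard PHP functions. The helper names sketchEscapeHtml() and sketchEscapeJsString() are hypothetical, and this is not how gosSanitizer is implemented; it only illustrates that each output context needs its own escaping rule.

```php
<?php
// Hypothetical helpers for illustration; these are NOT the gosSanitizer
// implementations, only standard PHP calls applied per output context.

// HTML content and quoted attribute values: encode &, <, >, " and '.
function sketchEscapeHtml($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// JavaScript string: json_encode() returns a complete, double-quoted literal.
// The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes, so the
// result cannot close a surrounding <script> element or quoted attribute.
function sketchEscapeJsString($value)
{
    return json_encode(
        (string) $value,
        JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP
    );
}

$name = '<script>alert(\'oh snap\');</script>';
echo '<p>Hello, ' . sketchEscapeHtml($name) . "</p>\n";
echo '<script>var name = ' . sketchEscapeJsString($name) . ";</script>\n";
```

Note that the JavaScript helper emits its own surrounding quotes, unlike the sanitizeForJS() call above, which is placed inside quotes written by the caller.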

We have created a project on Launchpad to host the Genius text sanitizing libraries. The project consists of three modules: Core and Utility, which provide general-purpose support functions, and Sanitizer, which holds the functions used above. All of the Sanitizer functions are static and are accessed through the gosSanitizer class. To use the Genius Sanitizer, you’ll need all three modules: Core, Utility, and Sanitizer itself. All of the Genius modules are loaded using the autoloader defined in Core/, so including this file is all that is needed to use any of the Genius Open Source libraries.

// Include the Genius config file
require_once 'Core/';
// Use gos* classes & functions here
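
If you have not worked with PHP autoloading before, the sketch below shows the general mechanism using spl_autoload_register(). Everything specific in it is an assumption made for illustration: the temporary directory, the Demo module, the gosDemoGreeter class, and the one-class-per-file naming rule are hypothetical and do not describe the actual Core autoloader.

```php
<?php
// Hypothetical sketch: the directory layout, the "gos" prefix rule and the file
// naming below are assumptions for illustration, not the actual Core autoloader.

// Build a throwaway module directory containing one class file.
$baseDir = sys_get_temp_dir() . '/gos_autoload_demo_' . uniqid();
mkdir($baseDir . '/Demo', 0777, true);
file_put_contents(
    $baseDir . '/Demo/gosDemoGreeter.php',
    '<?php class gosDemoGreeter { public static function greet() { return "hello"; } }'
);

// Register a loader that searches each module directory for <ClassName>.php.
spl_autoload_register(function ($class) use ($baseDir) {
    if (strpos($class, 'gos') !== 0) {
        return; // not one of ours; let any other autoloader try
    }
    foreach (glob($baseDir . '/*', GLOB_ONLYDIR) as $moduleDir) {
        $file = $moduleDir . '/' . $class . '.php';
        if (is_file($file)) {
            require_once $file;
            return;
        }
    }
});

// First use of the class triggers the loader; no explicit require is needed.
echo gosDemoGreeter::greet(), "\n";
```

The practical upshot is the one described above: a single include registers the loader, and class files are then pulled in on first use.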

We plan to continue adding modules to the Genius Open Source libraries collection in the future. Keep an eye on this blog for announcements!

Edited 2010-08-30 to reflect prefix change from “sg” to “gos”

  • florin

    When I see how code is mixed up with html elements I feel like pulling my hair. What a retard technology.

    • Drew Stephens

      Indeed, code should be divorced from markup whenever possible, but that makes examples difficult to follow.

  • asdf

    why do so many people do this:

    echo "But if we sanitize your name, " . $htmlContentContext . ", then all is well.\n";

    instead of this:

    echo "But if we sanitize your name, $htmlContentContext, then all is well.\n";

    • Drew Stephens

      Having variables completely separate from the strings they are going into generally makes the action more obvious than interpolation.

      • Stevan Goode

        That’s why we have the {} operators for surrounding variables within a string. This both makes the variables easier to spot as well as containing them separately from the HTML code. It also highlights them as variables in good IDEs.

        Example: echo "But if we sanitize your name, {$htmlContentContext}, then all is well.\n";

        It also means that we can call object methods within the string:

        Example: echo "Hello, {$user->getName()}\n";

        Of course, if you are echoing the output from a function, you still need concatenation, or to assign it to a variable beforehand.

  • stef

    asdf: The reason I use the quotes is personal. The syntax highlighting kicks in with quotes, and variables are visible in the quoted text.

  • Dave

    Drew – this looks like a great library. Thanks for sharing!

  • nikunj shah

    Hi, first of all, thanks for sharing this code with us. It will help developers.
