Remove JavaScript from HTML with PHP (Prevent XSS)
If you accept HTML as user input, it is very important to sanitize it and filter out any malicious JavaScript to prevent cross-site scripting (XSS) attacks. This is especially important if you use a rich text editor (a WYSIWYG editor, for example) on the client side and want to make sure no JavaScript is attached to the HTML it submits.
This PHP function removes JavaScript from a string of HTML, both inside <script> tags and inline on HTML elements. It strips event-handler attributes (onclick, onerror and so on) from any element, as well as javascript: URLs in href attributes. All of this is done with regular expressions.
How to use
- The function sanitizeInput($inputP) takes a single parameter.
- The parameter can be a string or an array of strings.
- If the parameter is a string, the function returns the sanitized string.
- If the parameter is an array of strings, the function returns an array of sanitized strings.
- If the parameter is null or false, it is returned unchanged.
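The string-or-array rules in the list above can be captured by a small generic wrapper. This is a hypothetical helper (applyToStringOrArray is not part of the original code): it applies any per-string cleaner and, like sanitizeInput(), passes null and false through unchanged, returns a string for a string, and returns a re-indexed array for an array.

```php
<?php

// Hypothetical helper: applies a per-string cleaner using the same
// input/output rules as sanitizeInput().
function applyToStringOrArray(callable $clean, $input)
{
    // null and false are returned unchanged.
    if ($input === null || $input === false) {
        return $input;
    }
    // Array in, array out. array_values() re-indexes from 0, matching a loop
    // that appends results with $sanitizedInput[] = ...
    if (is_array($input)) {
        return array_values(array_map($clean, $input));
    }
    // String in, string out.
    return $clean($input);
}
```

For example, applyToStringOrArray('strip_tags', ['<b>a</b>', '<i>b</i>']) returns ['a', 'b'].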
function sanitizeInput($inputP)
{
    // null and false are passed through unchanged; an empty array gives [].
    if ($inputP === null) return null;
    if ($inputP === false) return false;
    if (is_array($inputP) && count($inputP) === 0) return [];

    // Normalise the input to an array so one loop handles both cases.
    $returnType = is_array($inputP) ? "array" : "string";
    $inputArray = is_array($inputP) ? $inputP : [$inputP];
    $sanitizedInput = [];

    foreach ($inputArray as $input) {
        // Remove <script>...</script> blocks. The "s" flag lets "." match
        // newlines, and the lazy ".*?" stops at the first closing tag, so
        // content between two separate script blocks is kept.
        $clean = preg_replace('/<\s*script\b[^>]*>.*?<\s*\/\s*script\s*>/is', '', $input);

        // Remove inline event handlers (onclick, onerror, ...) whose value is
        // double-quoted, single-quoted or unquoted.
        $clean = preg_replace('/\s+on[a-z0-9_-]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', '', $clean);

        // Remove href attributes whose value is a javascript: URL.
        $clean = preg_replace('/\s+href\s*=\s*("\s*javascript:[^"]*"|\'\s*javascript:[^\']*\'|javascript:[^\s>]*)/i', '', $clean);

        $sanitizedInput[] = $clean;
    }

    return $returnType === "string" ? $sanitizedInput[0] : $sanitizedInput;
}
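If you would rather not rely on regular expressions, the same three removals can be sketched with PHP's DOM extension (DOMDocument and DOMXPath), which is enabled by default in most PHP builds. This is an alternative, not the function above: the name sanitizeHtmlWithDom and the wrapper id "sanitize-root" are made up for this sketch. Because the markup is parsed first, attribute quoting, spacing and letter case are handled by the parser rather than by a pattern.

```php
<?php

// Hypothetical alternative to the regex-based sanitizeInput(): removes the same
// three kinds of JavaScript by walking a parsed DOM tree.
function sanitizeHtmlWithDom(string $html): string
{
    // Wrap the fragment in a container <div> so it can be extracted again,
    // and declare UTF-8 so libxml does not assume ISO-8859-1.
    $wrapped = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="sanitize-root">' . $html . '</div></body></html>';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // 1. Remove every <script> element together with its content.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // 2. Remove on* event-handler attributes and href="javascript:..." values.
    foreach ($xpath->query('//*') as $element) {
        foreach (iterator_to_array($element->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $isHandler = strncmp($name, 'on', 2) === 0;
            $isJsHref = $name === 'href'
                && stripos(ltrim($attr->value), 'javascript:') === 0;
            if ($isHandler || $isJsHref) {
                $element->removeAttribute($attr->name);
            }
        }
    }

    // 3. Serialise only the children of the wrapper <div>.
    $root = $xpath->query('//div[@id="sanitize-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

One design difference from the regex version: the result is re-serialised by libxml, so quoting and unclosed tags in the input may come back normalised rather than byte-for-byte identical.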
Here are a few examples of input and output after sanitization:
1. Input = <p> <script>alert("i am trying to hack you")</script> </p>
   Output = <p>  </p>
2. Input = <img src="cat.jpg" onerror="document.querySelector('body').innerHTML=''">
   Output = <img src="cat.jpg">
3. Input = <a href='javascript:alert(1)'>Test</a>
   Output = <a>Test</a>