SoatDev IT Consulting
  • July 21, 2023
  • Rss Fetcher

And here’s how to use them correctly


Sometimes we underestimate the importance of the order of the characters in a regular expression pattern. Sometimes… Okay, let’s face it: most of us have never thought about it at all.

Example

During a code review of a Java project, carried out with the help of Fortify SCA, a Header Manipulation finding came up: one of the typical problems you get when input data is not sanitized.

The code in question looked very similar to the following:

protected void error(HttpServletRequest request, HttpServletResponse response, Error error)
        throws ServletException {
    try {
        String errorMessage = error.getMessage();
        log(errorMessage);

        // Flagged line: the request's Content-Type is copied into the response unchecked
        response.setContentType(request.getContentType());
        response.getWriter().print(errorMessage);
    } catch (Exception e) {
        throw new ServletException(e);
    }
}

The problem is that the Content-Type is taken from the request and copied into the response without any check on its content, which could be dangerous (I will talk about it in detail, maybe in a separate article).

The developer accepted this report and implemented “a particular filter using a RegEx because it is powerful and customizable.”

His solution was, therefore, to create the following method to sanitize that field:

public static String sanitizeContentType(String input) {
    // Each regex backslash is doubled inside a Java string literal
    return input
            .replaceAll("[^a-zA-Z0-9;=-\\\\/]", "")
            .replaceAll("\\s{2,}", " ")
            .replaceAll("\r", "")
            .replaceAll("\n", "");
}

In detail:

  • [^a-zA-Z0-9;=-\\/] is meant to match every character other than semicolons, equals signs, minus signs, slashes, backslashes, digits, and the letters a to z in either case.
  • \s{2,} matches every run of two or more whitespace characters.
  • \r matches the carriage return.
  • \n matches the line feed (new line).
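One detail that is easy to lose when reading these patterns: in Java source, every backslash that belongs to the regex has to be doubled inside the string literal. The following is only a small sketch to make that visible; the class and constant names are hypothetical and not part of the reviewed project.

```java
class EscapingDemo {
    // Java literal with four backslashes -> the regex engine sees two (\\),
    // which inside a character class means one literal backslash.
    static final String CHAR_CLASS = "[^a-zA-Z0-9;=-\\\\/]";

    // Java literal with two backslashes -> the regex engine sees \s
    static final String WHITESPACE_RUN = "\\s{2,}";

    public static void main(String[] args) {
        // Printing a string shows exactly what the regex engine receives
        System.out.println(CHAR_CLASS);
        System.out.println(WHITESPACE_RUN);

        // Collapses the three spaces after the semicolon into one
        System.out.println("text/html;   charset=UTF-8".replaceAll(WHITESPACE_RUN, " "));
    }
}
```

Running it prints `[^a-zA-Z0-9;=-\\/]`, then `\s{2,}`, then `text/html; charset=UTF-8`.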

Since I am not very trusting by nature and, above all, I don’t see why anyone should rewrite something that several more efficient and advanced libraries already do, I decided to run a little test.

Test

As usual, I created a small program to do the tests:

public class RegExSanitizer {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage is: java RegExSanitizer input");
            System.exit(0);
        }

        String input2sanitize = args[0];
        System.out.println("String to sanitize: " + input2sanitize);
        System.out.println("Sanitized string: " + sanitize(input2sanitize));
    }

    // Same chain of replacements as the developer's method
    // (each regex backslash is doubled inside a Java string literal)
    public static String sanitize(String input) {
        return input.replaceAll("[^a-zA-Z0-9;=-\\\\/]", "")
                .replaceAll("\\s{2,}", " ")
                .replaceAll("\r", "")
                .replaceAll("\n", "");
    }
}

Since the function is meant to sanitize input, my first test passed in a rather strange string, though to tell the truth not all that strange.

C:\RegExSanitizer> javac RegExSanitizer.java
C:\RegExSanitizer> java RegExSanitizer "Bob%%0d%00d%0aa<script>alert('document.domain')</script>"
String to sanitize: Bob%%0d%00d%0aa<script>alert('document.domain')</script>
Sanitized string: Bob0d00d0aascript>alertdocumentdomain/script>

The first thing that catches the eye is that the closing angle brackets (>) have not been removed. Not a good start.

I ran some tests with the trusty Regex101, starting from the regex created by the developer and studying the pattern. The nice thing about Regex101 is that it highlights every single sequence and its meaning when you hover the mouse over it. In addition, the EXPLANATION box on the right describes the pattern in detail, point by point.

And that’s exactly how I discovered this:

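=-\\ matches a single character in the range between = (index 61) and \ (index 92) (case sensitive)

That is, the sequence =-\\ matches any character between index 61 and index 92: the hyphen sitting between the equals sign and the backslash is read as a range operator, not as a literal minus sign.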

Looking at the ASCII table, there are several characters between index 61 and index 92, including the right angle bracket at index 62. (Those who work with XSS have probably guessed already, since &#60; and &#62;, the HTML character codes of the angle brackets, show up in certain payloads.)
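To see exactly which characters that accidental range lets through, a small sketch like the one below can list them. The class and method names are hypothetical; the only thing taken from the article is the range itself, written here as a Java literal with doubled backslashes.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Returns every ASCII character (0..127) matched by the given character class
    static String matchedAscii(String charClass) {
        Pattern pattern = Pattern.compile(charClass);
        StringBuilder matched = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                matched.append(c);
            }
        }
        return matched.toString();
    }

    public static void main(String[] args) {
        // The Java literal "[=-\\\\]" is the regex [=-\\]:
        // a range running from '=' (61) to '\' (92)
        System.out.println(matchedAscii("[=-\\\\]"));
    }
}
```

This prints `=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\`, so besides the right angle bracket the range also admits `?`, `@` and `[`, none of which the developer meant to allow.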

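To fix all this, I moved the hyphen to the end of the character class, where it can no longer form a range and is read as a literal minus sign, and put the backslash and the slash first: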

[^\\/a-zA-Z0-9;=-]

And they all lived happily ever after.
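For anyone who wants to check the difference, here is a minimal sketch that runs the test string from earlier through both character classes. The class, constant, and method names are hypothetical, and both patterns are written as Java literals, so each regex backslash is doubled.

```java
class PatternOrderDemo {
    // Developer's pattern: the hyphen between '=' and '\' creates a range (61..92)
    static final String ORIGINAL = "[^a-zA-Z0-9;=-\\\\/]";

    // Reordered pattern: the hyphen comes last, so it is a literal minus sign
    static final String FIXED = "[^\\\\/a-zA-Z0-9;=-]";

    // Removes every character matched by the given negated character class
    static String strip(String input, String pattern) {
        return input.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String input = "Bob%%0d%00d%0aa<script>alert('document.domain')</script>";
        System.out.println("Original: " + strip(input, ORIGINAL));
        System.out.println("Fixed:    " + strip(input, FIXED));
    }
}
```

The first line ends in `script>alertdocumentdomain/script>`, with the closing angle brackets still there. The second ends in `scriptalertdocumentdomain/script`: the brackets are gone, while the slash survives because the developer’s allow list includes it on purpose.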

Conclusion

Regular expressions remain fantastic things and a world to be discovered. Their usefulness is immense, and I would use them even to ask someone for the time.

The fact remains that there are many libraries for sanitizing input that are far more reliable than our own code, and there is probably no need to reinvent the wheel every time.

For heaven’s sake, nothing is perfect. Maybe, even when using one of these libraries, you will find an improperly sanitized input.

And then you deserve the applause, because you won.


Why Sequence Matters in Regular Expressions was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.
