A Quick Guide to RegEx in Ruby

Gabriel Demes
4 min read · Feb 15, 2021

Whether you’re just starting your programming journey or you’re one of the many seasoned developers out there, you have almost certainly encountered this scenario:

You are vigorously typing away at endless lines of code. You arrive at a situation requiring some sort of string manipulation, but you aren’t sure how to approach the problem. Being the savvy developer you are, you begin a Google search: “How to remove all the vowels and blank spaces from a string?” Behold, you are brought to the mighty Stack Overflow, but the top answer is something like this:

string = "This is my string"
string = string.gsub(/[aeiou]|\s/i, '')

In reality, you were looking for something more like this:

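string = "This is my string"
array = ["a", "e", "i", "o", "u", " "]
new_string = ""
string.split("").each do |char|
  # downcase so uppercase vowels are caught too, like the /i flag above
  if array.include?(char.downcase)
    new_string += ""   # vowel or space: add nothing
  else
    new_string += char # anything else: keep the character
  end
end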

So, what gives? I thought Ruby was supposed to be a high-level language that reads like spoken language. What’s with the weird combination of characters and symbols that everyone on Stack Overflow is giving as answers?

The answer? Two words: Regular Expressions!

What are Regular Expressions?

Regular Expressions, Regex or Regexp for short, are sequences of characters that describe patterns to search for within strings. Because Regex is not unique to Ruby, once you learn how to navigate the different symbols and characters used, you can quickly transfer those skills to other languages as a tool for string manipulation, validation, and parsing.

You might be wondering how to tell Regex notation apart from plain old strings and characters. In Ruby, as in several other languages such as JavaScript and Perl, a Regex pattern is written as a set of characters between two forward slashes, occasionally followed by some characters known as flags.

string.gsub(/[aeiou]|\s/i, '')  #Example of a Regex pattern
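In Ruby specifically, the slash form is one of a few ways to build the same pattern. The snippet below is a minimal sketch of three of them; the variable names are illustrative only and not from any library.

```ruby
# Three ways to write the same pattern in Ruby.
literal     = /[aeiou]|\s/i                                # slash literal with the i flag
percent_r   = %r{[aeiou]|\s}i                              # %r literal, handy when the pattern contains slashes
constructed = Regexp.new('[aeiou]|\s', Regexp::IGNORECASE) # built from a string at runtime

string = "This is my string"

# Each pattern removes the same characters: vowels and whitespace.
results = [literal, percent_r, constructed].map { |pattern| string.gsub(pattern, '') }
puts results.inspect # => ["Thssmystrng", "Thssmystrng", "Thssmystrng"]
```

The slash literal is usually the easiest to read; `Regexp.new` is mostly useful when the pattern has to be assembled from other strings.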

Character Classes and Ranges

Character classes and ranges, written as a group of characters within square brackets [], let you find characters in a string that match the character class or fall within a specific range. A ‘^’ at the start of the character class or range matches characters that are not part of that class or range.

Character Class
[aeiou] - will match any vowel
[^aeiou] - will match anything that is not a vowel
Ranges
[A-Z] - will match any uppercase letter
[^A-Z] - will match any character that is not an uppercase letter
[a-z] - will match any lowercase letter
[^a-z] - will match any character that is not a lowercase letter
[0-9] - will match any digit from 0 to 9
[^0-9] - will match any character that is not a digit
Shorthands
\w - [a-zA-Z0-9_] will match any alphanumeric character or underscore
\d - [0-9] will match any digit from 0 to 9
\s - will match any whitespace character (space, tab, newline)
. - will match any character that isn't a newline
Negated Shorthands
\W - will match any character that is not alphanumeric or an underscore
\D - will match any non-digit character
\S - will match any character that is not whitespace
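To see a few of these classes in action, here is a small sketch using `String#scan`, which returns every match as an array. The sample string and variable names are made up for illustration.

```ruby
sample = "Room 42, Floor 7!"

vowels    = sample.scan(/[aeiou]/) # every lowercase vowel, one character per match
uppercase = sample.scan(/[A-Z]/)   # every uppercase letter
digits    = sample.scan(/\d/)      # every digit
non_word  = sample.scan(/\W/)      # everything that is not a letter, digit, or underscore

# A negated class with gsub keeps only what the class would have matched.
digits_only = sample.gsub(/[^0-9]/, '')

puts vowels.inspect      # => ["o", "o", "o", "o"]
puts uppercase.inspect   # => ["R", "F"]
puts digits.inspect      # => ["4", "2", "7"]
puts non_word.inspect    # => [" ", ",", " ", " ", "!"]
puts digits_only         # => 427
```

Notice that each match is a single character, because none of the patterns uses a modifier.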

Modifiers or Quantifiers

The addition of modifiers, sometimes called quantifiers, allows you to match multiple characters at a time. A modifier applies to the element immediately before it; a character class or range without a modifier matches exactly one character.

* - 0 or more times
+ - 1 or more times
? - 0 or 1 time
{4} - exactly 4 times
{4,} - 4 or more times
{,4} - 4 or fewer times
{4,8} - between 4 and 8 times
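A quick way to get a feel for the modifiers is to run them against the same word. This sketch uses `String#[]` with a pattern, which returns the first matched substring (or `nil` when nothing matches); the words and variable names are only for illustration.

```ruby
word = "goooal" # a "g" followed by three "o"s

star     = word[/go*/]     # "g" then 0 or more "o"
plus     = word[/go+/]     # "g" then 1 or more "o"
optional = word[/go?/]     # "g" then 0 or 1 "o"
exactly  = word[/go{2}/]   # "g" then exactly 2 "o"
at_least = word[/go{2,}/]  # "g" then 2 or more "o"
at_most  = word[/go{,2}/]  # "g" then at most 2 "o"
between  = word[/go{1,2}/] # "g" then 1 to 2 "o"

puts [star, plus, optional, exactly, at_least, at_most, between].inspect
# => ["gooo", "gooo", "go", "goo", "gooo", "goo", "goo"]

# The difference between * and + shows up when there is no "o" at all.
no_o = "gal"
puts no_o[/go*/].inspect # => "g"   (zero "o"s is still a match)
puts no_o[/go+/].inspect # => nil   (at least one "o" is required)
```

Modifiers are greedy by default, which is why `*`, `+`, and `{2,}` all grab every available "o".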

Anchors

Regular Expressions will often make use of anchors, which match a position within the string (or line) rather than a character.

String Anchors
\A - will match at the beginning of a string
\Z - will match at the end of a string, or just before a final newline (\z matches only at the very end)
\G - will match where the previous match ended (or where the search started)
Line Anchors
^ - will match at the beginning of a line
$ - will match at the end of a line
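The difference between string anchors and line anchors only shows up when a string has more than one line. The sketch below uses made-up sample strings, plus `\z` (lowercase), which in Ruby matches only at the very end of the string.

```ruby
text = "first line\nsecond line\n"

line_starts  = text.scan(/^\w+/)   # ^ matches at the start of every line
string_start = text.scan(/\A\w+/)  # \A matches only at the start of the string
line_ends    = text.scan(/line$/)  # $ matches at the end of every line
string_end   = text.scan(/line\Z/) # \Z matches only at the end of the string

puts line_starts.inspect  # => ["first", "second"]
puts string_start.inspect # => ["first"]
puts line_ends.inspect    # => ["line", "line"]
puts string_end.inspect   # => ["line"]

# This matters for validation: line anchors accept input as long as
# any one line matches, while string anchors check the whole string.
input  = "hello\n123"
loose  = input.match?(/^\d+$/)    # true, because the second line is all digits
strict = input.match?(/\A\d+\z/)  # false, because the whole string is not all digits
puts loose  # => true
puts strict # => false
```

When the goal is to validate an entire value, such as user input, the string anchors are usually the safer choice.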

Flags or Options

Regular Expressions sometimes have flags or options at the end of the expression, right after the closing ‘/’. These flags affect how Ruby goes about its matching. It is also worth noting that Ruby has fewer flags than some other languages; for example, there is no global ‘g’ flag, since methods such as .gsub and .scan already act on every match.

i - will ignore case when matching
m - will treat newline as a character matched by .
x - will ignore whitespace and comments in the pattern
Example:
"This IS my StRiNg".match?(/string/i) # case insensitive, returns true
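Each flag is easiest to understand by comparing the same pattern with and without it. This is a small sketch with made-up strings; the `phone` pattern is a hypothetical format chosen only to show how `x` lets you spread a pattern over several commented lines.

```ruby
# i: ignore case
without_i = "StRiNg".match?(/string/)  # false, the case differs
with_i    = "StRiNg".match?(/string/i) # true

# m: let . match a newline as well
two_lines = "start\nend"
without_m = two_lines.match?(/start.end/)  # false, . stops at the newline
with_m    = two_lines.match?(/start.end/m) # true

# x: whitespace and comments inside the pattern are ignored
phone = /
  \d{3}  # first three digits
  -      # a literal dash
  \d{4}  # last four digits
/x
with_x = "555-1234".match?(phone) # true

puts [without_i, with_i, without_m, with_m, with_x].inspect
# => [false, true, false, true, true]
```

Flags can also be combined, for example `/pattern/im`.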

Useful tools and Resources

The above examples are meant to be used as a quick guide or refresher to Regex in Ruby. Regex can be very powerful, and there are a lot of quirks and tricks to keep learning. Use the links below for some more useful resources to help you practice and hone your Regex skills.
