What are JavaScript Regular Expressions?
A "regular expression" can almost be considered a language on its own (not Turing complete, of course). The purpose of a regular expression is to find characters within a string based on a pattern that you define.
This is a loaded and confusing topic, but you WILL use regular expressions as a developer. Below is a 10,000 foot summary of regular expressions. If you want more detail, please read my detailed post on them. At this point in your journey, getting deep into regular expressions is probably not the priority. The important thing right now is to know what they are, what they do, and how to read them, not how to write them.
Here is the documentation for regular expressions.
The best example that we can use to explain why regular expressions (often abbreviated as "regex" or "regexp") matter is validation of form data.
Let's say that you have a user registration form for your app, and over the last several weeks, you've been getting a lot of invalid email addresses registering for your app. You of course don't want this. You want valid emails.
To avoid this, you can validate the user's input with a regex prior to registering them. Here is how you might do this.
// Inside a string, the backslash must be doubled: "\\." means a literal dot
const emailValidatorRegex = new RegExp("^.+@.+\\..+$");
const userInput = "invalidemail@g";
const isValid = emailValidatorRegex.test(userInput);
console.log(isValid); // false
^.+@.+\..+$ is considered the regular expression, and all of those symbols represent something very specific. This is by no means the best regex to use for validating emails (it actually overlooks a lot of scenarios), but it is a good place for us to start.
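To make "overlooks a lot of scenarios" concrete, here is a small sketch that runs the same pattern against a few inputs. The sample strings are hypothetical and made up for this example; they are not from any real form.

```javascript
// Same pattern as above. Inside a string, the backslash must be doubled
// ("\\.") so the regex engine sees \. (a literal dot).
const emailPattern = new RegExp("^.+@.+\\..+$");

// Hypothetical inputs, made up for this sketch
const samples = [
  "invalidemail@g",   // no dot after the @
  "user@example.com", // looks like a normal address
  "no-at-sign.com",   // no @ at all
  "not an email@x.y", // clearly wrong, but the pattern accepts it
];

const results = samples.map((input) => emailPattern.test(input));
console.log(results); // [false, true, false, true]
```

The last input passes because .+ accepts any characters at all, spaces included. That is the kind of gap a stricter pattern would need to close.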
Before we explain this pattern, I want to introduce the absolute basics of regular expressions.
No matter what language you're working in, regular expressions follow the same basic structure (the details vary a little between languages). They are built from two kinds of pieces:
- Identifiers
- Quantifiers
Identifiers
These help you identify characters within a string. They can be anything from a single character to a more advanced expression.
For example, to identify a string that has the letter g in it, you can do this:
const regex = new RegExp("g");
const string1 = "my favorite food is steak";
const string2 = "my favorite thing to do is code";
console.log(regex.test(string1)); // false
console.log(regex.test(string2)); // true
You could also check for an entire word.
const regex = new RegExp("favorite");
const string1 = "my favorite food is steak";
const string2 = "my favorite thing to do is code";
console.log(regex.test(string1)); // true
console.log(regex.test(string2)); // true
Regular expressions are case-sensitive, so the following expression won't match.
const regex = new RegExp("FavoritE");
const string1 = "my favorite food is steak";
const string2 = "my favorite thing to do is code";
console.log(regex.test(string1)); // false
console.log(regex.test(string2)); // false
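If you do want to ignore case, JavaScript lets you pass the i flag as a second argument to the RegExp constructor. Here is a minimal sketch reusing the sentence from above; the variable names are mine and only for illustration.

```javascript
const sentence = "my favorite food is steak";

// Without a flag, the match is case-sensitive
const caseSensitive = new RegExp("FavoritE");

// The "i" flag tells the engine to ignore case
const caseInsensitive = new RegExp("FavoritE", "i");

const strictResult = caseSensitive.test(sentence);
const relaxedResult = caseInsensitive.test(sentence);

console.log(strictResult);  // false
console.log(relaxedResult); // true
```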
Identifiers do not have to be letters, numbers, and words. There are "special" identifiers that can identify patterns. Here are a few common examples, but you can find a more exhaustive list in my detailed post on regular expressions.
- [A-Z] - Match any uppercase letter
- [a-z] - Match any lowercase letter
- [0-9] - Match any digit
- [A-Za-z0-9] - Match any letter or digit
- . - Match any character except a line break (wildcard)
- \d - Match any digit (another way to write [0-9])
- \s - Match any white space character
- \w - Match any letter, digit, or underscore (another way to write [A-Za-z0-9_])
- ^ - Indicates the start of the string (or of each line in multiline mode)
- $ - Indicates the end of the string (or of each line in multiline mode)
- (dog|cat) - Matches "dog" OR "cat"
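Here is a quick sketch that tries several of these identifiers with test(). It uses the /.../ shorthand for new RegExp("...") (covered at the end of this post) so the backslashes don't need doubling. All of the sample strings are made up for illustration.

```javascript
const outcomes = {
  digit: /\d/.test("room 42"),            // true: "4" is a digit
  noDigit: /\d/.test("no digits here"),   // false
  whitespace: /\s/.test("nospace"),       // false: no space, tab, or newline
  underscore: /\w/.test("_"),             // true: \w also accepts the underscore
  wildcard: /c.t/.test("cut"),            // true: "." stands in for "u"
  startsWithMy: /^my/.test("oh my"),      // false: "my" is not at the start
  endsWithSteak: /steak$/.test("my favorite food is steak"), // true
  dogOrCat: /(dog|cat)/.test("I have a cat"),  // true
  neither: /(dog|cat)/.test("I have a bird"),  // false
};

console.log(outcomes);
```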
Let's use [A-Za-z] as an example. This matches ALL letters (uppercase AND lowercase).
const regex = new RegExp("[A-Za-z]");
const string1 = "my favorite food is steak 239042038124";
const string2 = "my favorite thing to do is code 23094029340923";
console.log(regex.test(string1)); // true
console.log(regex.test(string2)); // true
Wait a second... If [A-Za-z] matches only letters, then why are the expressions above returning true? So far, we have been using the test() method, which checks whether your regular expression matches ANY PART of a string. But what part did it match? To find out, you can use the exec() method, which returns an array that tells you what was matched in your string.
const regex = new RegExp("[A-Za-z]");
const string1 = "my favorite food is steak 239042038124";
const string2 = "my favorite thing to do is code 23094029340923";
// Using the exec() method
console.log(regex.exec(string1)); // ["m", index: 0, input: "my favorite food is steak 239042038124", groups: undefined]
console.log(regex.exec(string2)); // ["m", index: 0, input: "my favorite thing to do is code 23094029340923", groups: undefined]
In the example above, the first element of the array is the substring that was matched. The index property tells you where in the string the match starts. In this case, we matched the first letter of each string, which has a 0 index. The input property is the original string, and groups holds any named capture groups (but this is an advanced topic we will not be covering).
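Here is a small sketch of reading those values off the result of exec(). The sample string is hypothetical, chosen so that the first letter is not at index 0.

```javascript
const regex = new RegExp("[A-Za-z]");

// Hypothetical sample: the first letter comes after "42 "
const match = regex.exec("42 is my favorite number");

console.log(match[0]);     // "i" (the matched substring)
console.log(match.index);  // 3 (where the match starts)
console.log(match.input);  // "42 is my favorite number"
console.log(match.length); // 1 (only the matched text is a real array element)

// When nothing matches, exec() returns null instead of an array
const noMatch = regex.exec("12345");
console.log(noMatch); // null
```

Because exec() can return null, it is worth checking the result before reading match[0] in real code.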
So... Why did we only match the first letter of each string? Doesn't [A-Za-z] match ALL letters?
Cue quantifiers.
Quantifiers
Here are the most common quantifiers. Each one applies to the identifier directly before it.
- * - Matches 0 or more of the preceding identifier
- + - Matches 1 or more of the preceding identifier
- ? - Matches 0 or 1 of the preceding identifier
- {1} - Matches exactly 1 of the preceding identifier
- {1,} - Matches 1 or more of the preceding identifier (identical to +)
- {2,6} - Matches between 2 and 6 of the preceding identifier
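To see the quantifiers side by side, here is a sketch that applies each one to the same made-up string of digits. The variable names are hypothetical and only there to label the results.

```javascript
const sample = "1234567";

// exec() returns an array; element 0 is the matched text
const exactlyOne = new RegExp("[0-9]{1}").exec(sample)[0];  // "1"
const twoToSix = new RegExp("[0-9]{2,6}").exec(sample)[0];  // "123456"
const oneOrMore = new RegExp("[0-9]+").exec(sample)[0];     // "1234567"
const sameAsPlus = new RegExp("[0-9]{1,}").exec(sample)[0]; // "1234567"
const zeroOrMore = new RegExp("x*").exec(sample)[0];        // "" (zero x's still counts)
const needsAnX = new RegExp("x+").exec(sample);             // null (needs at least one x)

// ? makes the "u" optional, so both spellings pass
const optionalU = new RegExp("colou?r");
console.log(optionalU.test("color"), optionalU.test("colour")); // true true
```

Notice that {2,6} stops at six digits even though more are available, while * happily reports a match of nothing at all.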
And this is how we can fix our code from above to match ALL of the letters. By adding + at the end, we are saying, "match 1 or more letters".
const regex = new RegExp("[A-Za-z]+");
const string1 = "my favorite food is steak 239042038124";
const string2 = "my favorite thing to do is code 23094029340923";
// Using the exec() method
console.log(regex.exec(string1)); // ["my", index: 0, input: "my favorite food is steak 239042038124", groups: undefined]
console.log(regex.exec(string2)); // ["my", index: 0, input: "my favorite thing to do is code 23094029340923", groups: undefined]
You'll notice that the first element of both arrays equals my, which is still not what we are trying to match! The reason: we did not match the spaces between the words!
All you have to do is add a space in your character group (the brackets).
// WE CHANGED THIS LINE - see the space at the end??
const regex = new RegExp("[A-Za-z ]+");
const string1 = "my favorite food is steak 239042038124";
const string2 = "my favorite thing to do is code 23094029340923";
// Using the exec() method
console.log(regex.exec(string1)); // ["my favorite food is steak ", index: 0, input: "my favorite food is steak 239042038124", groups: undefined]
console.log(regex.exec(string2)); // ["my favorite thing to do is code ", index: 0, input: "my favorite thing to do is code 23094029340923", groups: undefined]
Now, our exec() method returns all of the words.
And finally, if we wanted to match the entire string, we could of course just add 0-9 into our character group, but I'm going to do it in a slightly inefficient way to demonstrate something.
// WE CHANGED THIS LINE - see the [0-9]+ at the end??
const regex = new RegExp("[A-Za-z ]+[0-9]+");
const string1 = "my favorite food is steak 239042038124";
const string2 = "my favorite thing to do is code 23094029340923";
// Using the exec() method
console.log(regex.exec(string1)); // ["my favorite food is steak 239042038124", index: 0, input: "my favorite food is steak 239042038124", groups: undefined]
console.log(regex.exec(string2)); // ["my favorite thing to do is code 23094029340923", index: 0, input: "my favorite thing to do is code 23094029340923", groups: undefined]
In this code, we want to match any letter or space (identifier: [A-Za-z ]) 1 or more times (quantifier: +) and then match 1 or more numbers ([0-9]+). If we reversed the strings, our expression would no longer work.
const regex = new RegExp("[A-Za-z ]+[0-9]+");
const string1 = "239042038124 my favorite food is steak";
const string2 = "23094029340923 my favorite thing to do is code";
// Using the exec() method
console.log(regex.exec(string1)); // null
console.log(regex.exec(string2)); // null
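Our exec() call returns null because our regex pattern no longer matches the strings! The pattern requires letters or spaces followed by numbers, and here the numbers come first with nothing but letters and spaces after them.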
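For comparison, here is a sketch of the simpler option mentioned earlier: adding 0-9 to the character group. Because a single group now accepts letters, numbers, and spaces in any order, it should match the whole string whichever way around it is written.

```javascript
// One character group that accepts letters, digits, and spaces
const regex = new RegExp("[A-Za-z0-9 ]+");

const lettersFirst = "my favorite food is steak 239042038124";
const numbersFirst = "239042038124 my favorite food is steak";

const firstResult = regex.exec(lettersFirst)[0];
const secondResult = regex.exec(numbersFirst)[0];

console.log(firstResult);  // "my favorite food is steak 239042038124"
console.log(secondResult); // "239042038124 my favorite food is steak"
```

The trade-off is that this version no longer says anything about order, so use the two-part pattern when the order matters.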
Another way to write a regular expression
So far, we have written them like this:
const regex = new RegExp("[A-Za-z ]+[0-9]+");
You can also write them like this:
const regex = /[A-Za-z ]+[0-9]+/;
From my experience, most developers tend to use the second version.
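One practical difference between the two versions is escaping. In the string version, a backslash has to be doubled, because a single one is consumed by the string before the regex ever sees it. Here is a sketch using the email pattern from the start of this section; the variable names are hypothetical.

```javascript
// Literal form: one backslash is enough
const fromLiteral = /^.+@.+\..+$/;

// String form: the backslash must be doubled
const fromString = new RegExp("^.+@.+\\..+$");

// String form with a single backslash: "\." collapses to ".",
// so the literal dot silently becomes a wildcard
const unescaped = new RegExp("^.+@.+\..+$");

const samePattern = fromLiteral.source === fromString.source;
console.log(samePattern);      // true
console.log(unescaped.source); // "^.+@.+..+$"

// The broken version accepts an address with no dot in it
console.log(fromLiteral.test("user@abc")); // false
console.log(unescaped.test("user@abc"));   // true
```

This is one reason the literal form tends to be easier to read and harder to get wrong. The constructor is still useful when the pattern has to be built from a variable at runtime.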
At this point, we have covered the bare basics of JavaScript regular expressions and for the sake of your sanity and my own, we will stop here. You can learn more about regular expressions in the future, but hopefully, this brief overview gets you to a place where you can recognize what they do and how to read them.