Starting Out with Learning REGEX and Atom

REGEX (Regular Expressions) had always been something that looked alien and seemingly impossible to learn, with all its different rules. Like, what the hell does this mean?

^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$

But necessity forced me to look at it in smaller chunks instead of trying to understand everything at once and write some complicated REGEX statement.

While working for Ascena Retail Group as their Senior Adobe Target Developer, I was sometimes given spreadsheets of data for various campaigns we were running. One of these spreadsheets was a list of style ids they wanted to target. The problem was that, for targeting to work, each id needed a “/” prefix, which the ids in the spreadsheet did not have. The list contained 700 or more items, was updated every week, and covered three different brands (LOFT, Ann Taylor and Lou & Grey). I did NOT want to waste time adding a “/” to every id manually!

If I can save time by having the computer do something, I would much rather do that. That is where REGEX came in: I decided to copy the list of style ids into Atom and do a REGEX search and replace.

Here is a sample of the style ids:

536743
531066
545414
531050
545414

So what we want is a search and replace on this list that prepends a “/” to each id, so we end up with something like /535414.

Finding the Digits

So the first question is: how do we find a digit? The easiest way is \d, which matches a single digit. Another way is the character class [0-9], but \d is shorter and, for plain ids like these, matches exactly the same characters.
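The post does all of this in Atom’s search panel, but the same patterns can also be tried in a short script. Below is a minimal sketch using Python’s built-in re module (Python is my choice for illustration, not something the workflow above uses; the variable names are hypothetical). It checks that \d and [0-9] find the same digits in one of the sample ids. One caveat: in Python, \d on a normal string also matches non-ASCII decimal digits, so the two only agree on plain 0 to 9 text like these ids.

```python
import re

style_id = "536743"

# \d matches a single digit, so findall returns each digit separately
with_d = re.findall(r"\d", style_id)

# [0-9] is a character class covering the same ASCII digits
with_class = re.findall(r"[0-9]", style_id)

print(with_d)                # ['5', '3', '6', '7', '4', '3']
print(with_d == with_class)  # True
```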

Doing the Search/Replace and Grouping with Parentheses

Now we can’t simply put \d in the search field and “/” in the replace field. If we did, EVERY digit would be replaced by a “/” and we would end up with “//////”. What we want is for the id to end up like /535414, as stated above.
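To see that failure concretely, here is a small sketch using Python’s re.sub as a stand-in for Atom’s replace (an assumption for illustration; the id is the post’s example value):

```python
import re

style_id = "535414"

# Replacing each single-digit match with "/" throws the digits away
result = re.sub(r"\d", "/", style_id)
print(result)  # //////
```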

That is where parentheses “()” come in. Parentheses group part of the pattern and save whatever that part matched (a “capture group”), so Atom can reuse it in the replace field. In the replace field, you refer to a group with a dollar sign $ followed by the group’s number, counting opening parentheses from the left. So group one is $1, group two is $2, and so on. For this example we only need one group.
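Group numbering is easier to see with two groups. This is a minimal sketch in Python, with one difference worth flagging: Python’s re.sub writes group references as \1 and \2 in the replacement, where Atom’s replace field uses $1 and $2. The swap below is a made-up example, not part of the style-id task.

```python
import re

# Two groups: group 1 is the first digit, group 2 is the second
match = re.search(r"(\d)(\d)", "536743")
print(match.group(1), match.group(2))  # 5 3

# In a Python replacement string, \1 and \2 play the role of Atom's $1 and $2;
# here they swap the two captured digits
swapped = re.sub(r"(\d)(\d)", r"\2\1", "53")
print(swapped)  # 35
```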

So put (\d) in the search field and /$1 in the replace field. This puts a “/” in front of each digit. But WAIT! That isn’t what we want! If we do just that, we end up with /5/3/5/4/1/4, because \d matches only ONE digit and the slash is prepended to each match.
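The same one-digit-at-a-time problem can be reproduced in Python, assuming re.sub with r"/\1" behaves like Atom’s /$1 for this simple case:

```python
import re

# (\d) captures one digit; the replacement puts "/" before the captured digit.
# Because every digit is its own match, every digit gets a slash.
result = re.sub(r"(\d)", r"/\1", "535414")
print(result)  # /5/3/5/4/1/4
```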

Searching for a Group of Digits

What we need is to treat ALL six digits of the id as a SINGLE group. So how do we do that? Well, we could use (\d\d\d\d\d\d), but that is long and hard to read.

Instead, we can specify how many digits should make up the group. To do that, we put a number in curly braces right after \d, like this: {6}. So (\d{6}) searches for exactly six digits in a row and captures them as a single group.
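A quick way to convince yourself that the long and short forms are the same pattern is to run both. This sketch uses Python’s re.fullmatch (my choice, so that the whole string has to be the six digits; Atom’s search has no equivalent setting and simply looks for six digits anywhere in the text):

```python
import re

style_id = "531066"

# Spelling out six \d tokens and using the {6} quantifier describe the same pattern
long_form = re.fullmatch(r"(\d\d\d\d\d\d)", style_id)
short_form = re.fullmatch(r"(\d{6})", style_id)
print(long_form.group(1), short_form.group(1))  # 531066 531066

# A five-digit string does not satisfy "exactly six digits"
print(re.fullmatch(r"(\d{6})", "53106"))  # None
```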

So in the search field we put

(\d{6})

and in the replace field we put…

/$1
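Put together, the search and replace above can be sketched in Python over the whole sample list. The only assumed difference from Atom is the replacement syntax: \1 here instead of $1.

```python
import re

# The sample style ids from above, one per line, as they would be pasted into Atom
ids = """536743
531066
545414
531050
545414"""

# Each six-digit run is captured as group 1 and written back with "/" in front
prefixed = re.sub(r"(\d{6})", r"/\1", ids)
print(prefixed)
```

This prints each id on its own line with a leading slash, /536743 through /545414.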

Additional Items with REGEX Curly Braces

There are some other options we can add to the curly braces too.

(\d{1,6}) – this will search for a group containing one to six digits; matches “1”, “12”, “123” … “123456”
(\d{1,}) – this will search for a group of one or more digits
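Both brace forms can be compared side by side. This is a small Python sketch on made-up input strings; note how the upper limit of {1,6} splits a seven-digit run, while {1,} keeps it whole.

```python
import re

text = "1 12 123 123456"

# {1,6}: between one and six digits per match
small = re.findall(r"(\d{1,6})", text)
print(small)  # ['1', '12', '123', '123456']

# The upper limit splits a longer run: seven digits become six plus one
split = re.findall(r"(\d{1,6})", "1234567")
print(split)  # ['123456', '7']

# {1,}: one or more digits, with no upper limit
unbounded = re.findall(r"(\d{1,})", "1234567")
print(unbounded)  # ['1234567']
```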

There is also the question mark “?”, though we don’t need it for our example. It matches the preceding character once or not at all, so that character may or may not be there. For example, say we had the text strings “Item: 1” and “Items: 2”. We could do something like this…

(Items?: \d)

That basically says the “s” is optional, so it will find BOTH “Item: 1” and “Items: 2”.
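Here is that optional “s” in action, as a Python sketch. The “Itemz” entry is an invented near-miss added to show what the pattern rejects.

```python
import re

text = "Item: 1, Items: 2, Itemz: 3"

# "s?" makes the s optional, so both the singular and plural labels match;
# "Itemz" does not, because z is neither "s" nor the ":" that must follow
found = re.findall(r"(Items?: \d)", text)
print(found)  # ['Item: 1', 'Items: 2']
```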

Other Options with + or *

Another thing we could do is replace the {6} with a + or *. What do these do? The plus sign + matches one or more of the preceding item, in our case digits, so \d+ means the same as \d{1,}. The asterisk *, on the other hand, matches zero or more.

So we could have…

(\d+) – one or more; (Items?: \d+) would match and group “Items: 23”
(\d*) – zero or more
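The difference between + and * matters for search and replace. Because * also accepts zero digits, \d* can match an empty string, which means a replace can fire at positions where there are no digits at all; + always needs at least one digit, so it is the safer choice for the style ids. A minimal Python sketch (inputs are the post’s sample values):

```python
import re

ids = "536743\n531066\n545414"

# \d+ means one or more digits, so it also catches each six-digit id
plus = re.sub(r"(\d+)", r"/\1", ids)
print(plus.splitlines())  # ['/536743', '/531066', '/545414']

# With +, the count can grow: "Items: 23" is matched as a whole
items = re.findall(r"(Items?: \d+)", "Items: 23")
print(items)  # ['Items: 23']

# * allows zero digits, so \d* can match an empty string while \d+ cannot
print(re.fullmatch(r"\d*", "") is not None)  # True
print(re.fullmatch(r"\d+", "") is not None)  # False
```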

Conclusion

Now these are just very simple examples, but by starting small you build up your knowledge instead of getting confused and frustrated by trying to understand it all at once. You should now be able to make out SOME of what ^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$ does. One caution: not every parenthesis there is a group. The escaped \( and \) match literal “(” and “)” characters, while ( |-) is a real group matching either a space or a hyphen.
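To take the phone-number pattern apart, you can feed it a few strings and see which pass. This is a sketch in Python; the sample numbers are made up, and the comments describe how the pattern treats each one. The unbalanced “(555” case shows how loose the pattern is: \(* and \)* are counted independently.

```python
import re

# The phone-number pattern from the top of the post
PHONE = r"^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$"

samples = [
    "(555) 123-4567",  # \( and \) match the literal parentheses
    "555-123-4567",    # ( |-) is a real group: a space or a hyphen
    "5551234567",      # the * after each part lets it be absent
    "(555 123 4567",   # \(* and \)* are independent, so this unbalanced form also passes
    "55-123-4567",     # fails: \d{3} needs three digits at the start
]

for s in samples:
    print(s, bool(re.search(PHONE, s)))
```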

In a future lesson I will show how to take multiple sets of numbers, format them into an object and remove the newline characters.

You can experiment by downloading the Atom editor, or with an online REGEX tester such as https://regex101.com/.

When I started at Ascena and was working with SiteSpect, it would have come in VERY handy to know REGEX, since SiteSpect uses REGEX to target which web pages a campaign should run on.

Posted in New Developers, REGEX, Text Editors.