Motivation
Every junior developer needs a few pet projects to try out techniques he or she is not yet familiar with.
I’ve always had a hard time with regular expressions: I know that they are useful, but I use them so rarely that I cannot keep the syntax in my head. So I’ve started a little project to make RegEx easier to use.
What are Regular Expressions aka RegEx?
A regular expression (RegEx) is a sequence of characters that describes a pattern to search for in text.
Say you have an input string which contains spaces, a tab and a line break:
input_string = " \tJoernBoegeholz \n"
You will certainly agree that it is not a good idea to use this string as, e.g., a username. If a username is required to log in to a system, users will not remember that they accidentally typed a whitespace character into the form field. So we have to remove the spaces, tabs and line breaks.
output = input_string.replace(" ", "")
output = output.replace("\t", "")
output = output.replace("\n", "")
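This is a bit messy. With RegEx we can use the “\s” metacharacter instead (note the raw string, which keeps Python from interpreting the backslash itself):
import re
output = re.sub(r"\s", "", input_string)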
From the Python documentation:
“When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v].”
Please take this just as an example; in production code you would use “strip()” to remove leading and trailing whitespace.
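To make the difference between the three approaches concrete, here is a small comparison. The sample string with a space inside the name is an assumption made for illustration only; it is not the string used above.

```python
import re

# Hypothetical input: like the example above, but with a space inside the name
input_string = " \tJoern Boegeholz \n"

# Chained replace calls: one call per whitespace character
chained = input_string.replace(" ", "").replace("\t", "").replace("\n", "")

# A single regular expression that covers every whitespace character
with_regex = re.sub(r"\s", "", input_string)

# strip() only touches the beginning and the end of the string
stripped = input_string.strip()

print(repr(chained))     # 'JoernBoegeholz'
print(repr(with_regex))  # 'JoernBoegeholz'
print(repr(stripped))    # 'Joern Boegeholz'
```

Both the chained replace calls and re.sub also remove the space inside the name, whereas strip() keeps it. Which behaviour is wanted depends on the use case.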
OK, here is the catch: I cannot remember the meta-characters. That makes working with RegEx cumbersome for me.
First Step
Each meta-character is represented by a constant. Raw strings are used so that Python does not treat the backslashes as escape sequences.
ANY_CHAR = r'.'
DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'
NON_WHITESPACE = r'\S'
ALPHA = r'[a-zA-Z]'
ALPHANUM = r'\w'
NON_ALPHANUM = r'\W'
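Because the constants are plain strings, they can be passed directly to the functions of the re module. The following is a minimal sketch; the sample text is made up for illustration.

```python
import re

DIGIT = r'\d'
WHITESPACE = r'\s'
ALPHANUM = r'\w'

text = "user_42 logged in"  # hypothetical sample text

# Collect every single digit in the text
digits = re.findall(DIGIT, text)

# Find the position of the first whitespace character
first_space = re.search(WHITESPACE, text).start()

# Remove every "word" character; \w covers letters, digits and the underscore
remainder = re.sub(ALPHANUM, "", text)

print(digits)           # ['4', '2']
print(first_space)      # 7
print(repr(remainder))  # '  '
```

Note that “\w” also matches the underscore, so ALPHANUM is slightly broader than its name suggests.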
Second Step
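We wrap the quantifiers in convenience functions.
def zero_or_more(string):
    # '*' matches the preceding token zero or more times
    return string + '*'
def zero_or_once(string):
    # '?' matches the preceding token zero times or once
    return string + '?'
def one_or_more(string):
    # '+' matches the preceding token one or more times
    return string + '+'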
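The result further down also uses a function called between(2, 4, WHITESPACE), whose definition is not shown in this article. A minimal sketch of how it might look, assuming it simply appends a “{min,max}” quantifier in the same style as the functions above:

```python
WHITESPACE = r'\s'
NON_DIGIT = r'\D'

def one_or_more(string):
    return string + '+'

def between(minimum, maximum, string):
    # Assumed implementation, not taken from the article:
    # '{m,n}' matches the preceding token at least m and at most n times
    return string + '{' + str(minimum) + ',' + str(maximum) + '}'

print(one_or_more(NON_DIGIT))     # \D+
print(between(2, 4, WHITESPACE))  # \s{2,4}
```

All of these helpers append the quantifier to the end of the string, so it binds only to the last token. For a multi-character argument such as 'ab', the caller would have to wrap it in a group like '(?:ab)' first.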
Third Step
As syntactic sugar we introduce a class which encapsulates the pattern:
class Pattern:
    def __init__(self):
        # The regular expression built so far
        self.pattern = ''
    def starts_with(self, start_str):
        self.pattern += start_str
        return self  # returning self allows method chaining
    def followed_by(self, next_string):
        self.pattern += next_string
        return self
    def __str__(self):
        return self.pattern
    def __repr__(self):
        return self.pattern
Result
Instead of writing
pattern = r"\d\D+\s{2,4}"
you can now write
pattern = Pattern()
pattern.starts_with(DIGIT)\
.followed_by(one_or_more(NON_DIGIT))\
.followed_by(between(2, 4, WHITESPACE))
which is more human readable.
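To check that the fluent version really produces the same expression, the pieces can be put together in one runnable snippet. This is only a sketch: the between helper is an assumed implementation (the article does not show it), the test strings are made up, and the class is repeated in shortened form so that the snippet runs on its own.

```python
import re

DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'

def one_or_more(string):
    return string + '+'

def between(minimum, maximum, string):
    # Assumed helper: appends a '{min,max}' quantifier
    return string + '{' + str(minimum) + ',' + str(maximum) + '}'

class Pattern:
    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

pattern = (
    Pattern()
    .starts_with(DIGIT)
    .followed_by(one_or_more(NON_DIGIT))
    .followed_by(between(2, 4, WHITESPACE))
)

# str() hands the assembled expression over to the re module
regex = re.compile(str(pattern))

print(str(pattern))                   # \d\D+\s{2,4}
print(bool(regex.fullmatch("7x  ")))  # True: digit, non-digit, two spaces
print(bool(regex.fullmatch("7x ")))   # False: only one trailing space
```

Note that starts_with, as shown above, only appends its argument and does not add a “^” anchor, so whether the match is anchored depends on the re function used (fullmatch here).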
My first PyPI package
After using
pip install <module_name>
for a couple of years, I wanted to know how to upload a new package to PyPI, the “Python Package Index”, so I’ve written another tutorial:
Distributing your own package on PyPi
At the moment it’s a pet project, but if you are interested, you can install the code via
pip install easy_pattern