Motivation
Every junior developer needs a few pet projects to try out techniques he or she is not yet familiar with.
I’ve always had a hard time with regular expressions (I know they are useful, but I use them so rarely that I can never keep all the syntax in my head), so I’ve started a little project to make RegEx easier to use.
What are Regular Expressions aka RegEx?
A RegEx is a sequence of characters that defines a search pattern for text.
Say you have an input string that contains spaces, a tab and a line break:
input_string = " \tJoernBoegeholz \n"
You will certainly agree that it is not a good idea to use this string as, e.g., a username. If a username is needed to log in to a system, the user will not remember whether he or she accidentally typed a whitespace character into the form field. So we have to remove the spaces, the tab and the line break.
output = input_string.replace(" ", "")
output = output.replace("\t", "")
output = output.replace("\n", "")
This is a bit messy. With RegEx we can use the “\s” metacharacter instead:
import re
output = re.sub(r"\s", "", input_string)
From the Python 2 documentation of the re module:
“When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v].”
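The quoted sentence describes Python 2. A minimal sketch, assuming Python 3, where str patterns are Unicode-aware by default and the re.ASCII flag restores the narrower [ \t\n\r\f\v] set. The no-break space (U+00A0) is just an illustrative input:

```python
import re

# U+00A0 (no-break space) counts as whitespace for str.isspace().
text = "Joern\u00a0Boegeholz"

# Python 3 default for str patterns: \s matches Unicode whitespace.
unicode_result = re.sub(r"\s", "", text)

# With re.ASCII, \s only matches [ \t\n\r\f\v], so U+00A0 survives.
ascii_result = re.sub(r"\s", "", text, flags=re.ASCII)

print(unicode_result)        # JoernBoegeholz
print(ascii_result == text)  # True
```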
Please take this just as an example: in production code you would use “strip()” to remove leading and trailing whitespace.
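A quick sanity check, assuming Python 3, that the three approaches agree on the example input, plus a second (made-up) input showing where strip() behaves differently:

```python
import re

input_string = " \tJoernBoegeholz \n"

# Approach 1: chained str.replace calls, one per unwanted character.
replaced = input_string.replace(" ", "").replace("\t", "").replace("\n", "")

# Approach 2: a single regex substitution using the \s metacharacter.
substituted = re.sub(r"\s", "", input_string)

# Approach 3: strip() removes only leading and trailing whitespace,
# which is enough here because the name itself contains none.
stripped = input_string.strip()

print(replaced, substituted, stripped)  # JoernBoegeholz JoernBoegeholz JoernBoegeholz

# The approaches differ once whitespace appears inside the string.
inner_regex = re.sub(r"\s", "", " Joern Boegeholz ")
inner_strip = " Joern Boegeholz ".strip()
print(inner_regex)  # JoernBoegeholz
print(inner_strip)  # Joern Boegeholz
```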
OK, here is the catch: I cannot remember the metacharacters, which makes working with RegEx cumbersome for me.
First Step
All metacharacters are represented as constants.
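# Raw strings (r'...') keep the backslashes intact for the re module
# and avoid Python's invalid-escape-sequence warnings.
ANY_CHAR = '.'
DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'
NON_WHITESPACE = r'\S'
ALPHA = '[a-zA-Z]'
ALPHANUM = r'\w'
NON_ALPHANUM = r'\W'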
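Because the constants are ordinary strings, they can be handed directly to the re module or concatenated into larger patterns. A small sketch, assuming the raw-string versions of the constants; the sample texts are made up for illustration:

```python
import re

# A subset of the constants above, written as raw strings.
DIGIT = r'\d'
WHITESPACE = r'\s'
ALPHA = '[a-zA-Z]'
ALPHANUM = r'\w'

# Constants can be passed straight to re ...
digits = re.findall(DIGIT, "Room 42, floor 3")
print(digits)  # ['4', '2', '3']

# ... or concatenated: a letter immediately followed by whitespace.
letter_then_space = re.findall(ALPHA + WHITESPACE, "a b c")
print(letter_then_space)  # ['a ', 'b ']

# Note: \w also matches the underscore, so ALPHANUM is a bit broader than its name.
underscore_matches = re.fullmatch(ALPHANUM, "_") is not None
print(underscore_matches)  # True
```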
Second Step
We wrap the quantifiers (multipliers) in convenience functions.
def zero_or_more(string):
    return string + '*'

def zero_or_once(string):
    return string + '?'

def one_or_more(string):
    return string + '+'
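One caveat, shown as a sketch: these helpers simply append the quantifier, so it binds only to the last regex atom. That works for single tokens like DIGIT or ALPHA, but not for multi-character strings. The group() helper below is hypothetical and not part of the original code:

```python
import re

def one_or_more(string):
    return string + '+'

# Hypothetical helper: wrap a pattern in a non-capturing group
# so that a following quantifier applies to the whole thing.
def group(string):
    return '(?:' + string + ')'

# 'ab+' means "a, then one or more b", so it does not match 'abab'.
plain = re.fullmatch(one_or_more('ab'), 'abab')
print(plain is None)  # True

# '(?:ab)+' means "one or more repetitions of ab".
grouped = re.fullmatch(one_or_more(group('ab')), 'abab')
print(grouped is not None)  # True
```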
Third Step
As syntactic sugar we introduce a class which encapsulates the pattern:
class Pattern:
    def __init__(self):
        self.pattern = ''

    # Each builder method returns self so that calls can be chained.
    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

    def __repr__(self):
        return f"Pattern({self.pattern!r})"
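A minimal usage sketch for the class. Since re.compile() expects a string rather than a Pattern object, the builder is converted with str() first; the example pattern and inputs are illustrative assumptions:

```python
import re

class Pattern:
    """Accumulates a regex string step by step (same idea as the class above)."""

    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

DIGIT = r'\d'
ALPHA = '[a-zA-Z]'

def one_or_more(string):
    return string + '+'

# "One or more letters followed by a single digit", e.g. "abc1".
p = Pattern().starts_with(one_or_more(ALPHA)).followed_by(DIGIT)

# re works on strings, so convert the builder explicitly.
regex = re.compile(str(p))
print(str(p))                               # [a-zA-Z]+\d
print(regex.fullmatch("abc1") is not None)  # True
print(regex.fullmatch("1abc") is not None)  # False
```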
Result
Instead of writing
pattern = r"\d\D+\s{2,4}"
you can now write
pattern = Pattern()
pattern.starts_with(DIGIT)\
.followed_by(one_or_more(NON_DIGIT))\
.followed_by(between(2, 4, WHITESPACE))
which is more human-readable.
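The example uses between(), which this post does not define. Here is a hypothetical implementation, assuming the argument order (min, max, pattern) from the call above, together with a check that the builder reproduces the hand-written pattern:

```python
import re

DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'

def one_or_more(string):
    return string + '+'

# Hypothetical: append a bounded quantifier {min,max} to the pattern.
def between(min_count, max_count, string):
    return f'{string}{{{min_count},{max_count}}}'

class Pattern:
    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

# Parentheses allow the chain to span lines without backslashes.
pattern = (
    Pattern()
    .starts_with(DIGIT)
    .followed_by(one_or_more(NON_DIGIT))
    .followed_by(between(2, 4, WHITESPACE))
)

print(str(pattern) == r"\d\D+\s{2,4}")               # True
print(re.match(str(pattern), "7abc   ") is not None)  # True
```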
My first PyPI package
After using
pip install <module_name>
for a couple of years, I wanted to know how to upload a new package to PyPI, the “Python Package Index”, so I’ve written another tutorial:
Distributing your own package on PyPi
At the moment it’s a pet project, but if you are interested, you can install the code via
pip install easy_pattern