<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Regex Archives - Creatronix</title>
	<atom:link href="https://creatronix.de/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>https://creatronix.de/tag/regex/</link>
	<description>My adventures in code &#38; business</description>
	<lastBuildDate>Fri, 07 Oct 2022 05:36:42 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Regular Expressions Demystified &#8211; A Mini DSL for Regex in Python</title>
		<link>https://creatronix.de/regular-expressions-demystified/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Mon, 05 Jun 2017 12:36:31 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[meta character]]></category>
		<category><![CDATA[Regex]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=578</guid>

					<description><![CDATA[<p>Motivation Every Junior Developer needs some pet projects to try out some techniques he or she is not familiar with already. Because I&#8217;ve always had a hard time with regular expressions (I know that they are useful, but I use them so rarely that I cannot get a hold of all the syntax) I&#8217;ve started&#8230;</p>
<p>The post <a href="https://creatronix.de/regular-expressions-demystified/">Regular Expressions Demystified &#8211; A Mini DSL for Regex in Python</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Motivation</h2>
<p>Every junior developer needs a few pet projects to try out techniques he or she is not yet familiar with.</p>
<p>I&#8217;ve always had a hard time with regular expressions: I know that they are useful, but I use them so rarely that I cannot retain all the syntax. So I&#8217;ve started a little project to make RegEx easier to use.</p>
<h2>What are Regular Expressions aka RegEx?</h2>
<p>A regular expression (RegEx) is a sequence of characters that describes a pattern to search for in text.</p>
<p>Say you have an input string which contains spaces, a tab and a line break:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>input_string = " \tJoernBoegeholz \n"</code></pre>
</div>
<p>You will certainly agree that it won&#8217;t be a good idea to use this string as, e.g., a username. If a username is required to log in to a system, users will not remember whether they accidentally typed a whitespace character into the form field. So we have to remove the spaces, the tab and the line break.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>output = input_string.replace(" ", "") 
output = output.replace("\t", "") 
output = output.replace("\n", "")</code></pre>
</div>
<p>This is a bit messy. With RegEx we can use the &#8220;\s&#8221; meta-character instead:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import re
output = re.sub(r"\s", "", input_string)</code></pre>
</div>
<p>From the <a href="https://docs.python.org/2/library/re.html">Python Doc</a>:</p>
<p>&#8220;When the <a class="reference internal" title="re.UNICODE" href="https://docs.python.org/2/library/re.html#re.UNICODE"><code class="xref py py-const docutils literal"><span class="pre">UNICODE</span></code></a> flag is not specified, it matches any whitespace character, this is equivalent to the set <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>.&#8221;</p>
<p>Please take this just as an example: in production code you would use &#8220;strip()&#8221; to remove leading and trailing whitespace.</p>
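<p>The two approaches are not interchangeable, which a short sketch can show. The input below is a variation of the string above, with an inner space added purely for illustration; the variable names are assumptions, not part of the original code.</p>

```python
import re

# Variation of the input above: an inner space was added for illustration.
input_string = " \tJoern Boegeholz \n"

# re.sub with \s removes every whitespace character, including the inner space.
without_any_whitespace = re.sub(r"\s", "", input_string)

# strip() only trims whitespace at the beginning and the end of the string.
trimmed = input_string.strip()

print(repr(without_any_whitespace))  # 'JoernBoegeholz'
print(repr(trimmed))                 # 'Joern Boegeholz'
```

<p>Which one fits depends on whether whitespace inside the value is allowed.</p>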
<p>OK, here is the catch: I cannot remember the meta-characters. That makes working with RegEx cumbersome for me.</p>
<h2>First step</h2>
<p>Each meta-character is represented as a constant.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code># Raw strings keep the backslashes intact for the regex engine
ANY_CHAR = '.'
DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'
NON_WHITESPACE = r'\S'
ALPHA = '[a-zA-Z]'
ALPHANUM = r'\w'
NON_ALPHANUM = r'\W'</code></pre>
</div>
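<p>Because the constants are plain strings, they can be concatenated and passed straight to the <code>re</code> module. A minimal sketch, assuming a made-up sample sentence and only three of the constants:</p>

```python
import re

# Three of the constants from above, written as raw strings.
DIGIT = r'\d'
ALPHA = '[a-zA-Z]'
WHITESPACE = r'\s'

# Concatenation builds the regex "[a-zA-Z]\d\s":
# one letter, one digit, one whitespace character.
pattern = ALPHA + DIGIT + WHITESPACE

# The sample sentence is made up for this sketch.
match = re.search(pattern, "room B7 is free")
print(repr(match.group()))  # 'B7 '
```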
<h2>Second Step</h2>
<p>We wrap the quantifiers in convenience functions.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>def zero_or_more(string):
    return string + '*'

def zero_or_once(string):
    return string + '?'

def one_or_more(string):
    return string + '+'</code></pre>
</div>
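<p>One thing to keep in mind: a quantifier applies only to the element directly in front of it, so these helpers behave as expected for single meta-characters but not for longer strings. The sketch below illustrates this with a hypothetical <code>group</code> helper, which is an assumption for this example and not shown in the post.</p>

```python
import re

def one_or_more(string):
    return string + '+'

def group(string):
    # Hypothetical helper, not part of the post: wraps the string in a
    # non-capturing group so that a quantifier applies to all of it.
    return '(?:' + string + ')'

print(one_or_more('ab'))         # ab+
print(one_or_more(group('ab')))  # (?:ab)+

# 'ab+' means "a followed by one or more b", so 'abab' is not a full match.
ungrouped = re.fullmatch(one_or_more('ab'), 'abab')
# '(?:ab)+' repeats the whole sequence 'ab'.
grouped = re.fullmatch(one_or_more(group('ab')), 'abab')

print(ungrouped is None)    # True
print(grouped is not None)  # True
```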
<h2>Third Step</h2>
<p>As syntactic sugar we introduce a class which encapsulates the pattern:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>class Pattern:

    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

    def __repr__(self):
        return self.pattern</code></pre>
</div>
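<p>To actually match text, the built pattern still has to be handed to the <code>re</code> module. A minimal sketch, assuming a condensed copy of the class above and made-up sample inputs. Note that <code>starts_with</code> as shown does not add a <code>^</code> anchor, so this example relies on <code>match()</code>, which only matches at the beginning of the string.</p>

```python
import re

DIGIT = r'\d'
ALPHA = '[a-zA-Z]'

class Pattern:
    # Condensed copy of the class above, enough for this sketch.
    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

pattern = Pattern().starts_with(DIGIT).followed_by(ALPHA)

# str() yields the plain regex string that re.compile expects.
regex = re.compile(str(pattern))

print(str(pattern))                    # \d[a-zA-Z]
print(regex.match('7up') is not None)  # True
print(regex.match('up7') is not None)  # False
```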
<h2>Result</h2>
<p>Instead of writing</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>pattern = r"\d\D+\s{2,4}"</code></pre>
</div>
<p>you can now write</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>pattern = Pattern()
pattern.starts_with(DIGIT)\
    .followed_by(one_or_more(NON_DIGIT))\
    .followed_by(between(2, 4, WHITESPACE))
</code></pre>
</div>
<p>which is more human readable.</p>
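<p>The <code>between</code> helper used in this example is not defined in the steps above. A minimal sketch of how it could look, assuming it simply appends a <code>{min,max}</code> quantifier; the parameter names are assumptions, and the published package may implement it differently.</p>

```python
import re

DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'

def one_or_more(string):
    return string + '+'

def between(minimum, maximum, string):
    # Assumed implementation, not shown in the post:
    # appends a {min,max} quantifier to the pattern fragment.
    return string + '{' + str(minimum) + ',' + str(maximum) + '}'

built = DIGIT + one_or_more(NON_DIGIT) + between(2, 4, WHITESPACE)

print(built)                      # \d\D+\s{2,4}
print(built == r"\d\D+\s{2,4}")   # True

# One digit, some non-digits, then two spaces: a full match.
print(re.fullmatch(built, "1ab  ") is not None)  # True
```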
<h2>My first PyPI package</h2>
<p>After using</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install &lt;module_name&gt;</code></pre>
</div>
<p>for a couple of years, I wanted to know how I can upload a new package to PyPI or the &#8220;Python Package Index&#8221;, so I&#8217;ve written another tutorial:</p>
<p><a href="https://creatronix.de/distributing-your-own-package-on-pypi/">Distributing your own package on PyPi</a></p>
<p>At the moment it&#8217;s a pet project, but if you are interested, you can install the code via</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install easy_pattern</code></pre>
</div>
<h2>Links</h2>
<p><a href="https://pypi.org/project/easy_pattern/">PyPI</a></p>
<p><a href="https://github.com/jboegeholz/easypattern">GitHub</a></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
