Why not an Expression Query Language?
Regular Expressions are extremely powerful and ugly as all hell.[1] Even with comments and a good RegEx IDE like the Regulator, they're total gibberish. Why not a RegEx 2006 with a more readable syntax?
For instance, take a look at this code snippet from Eric Gunnerson's recent RegEx 101 article[2]:
\d{3} # three digits
- # literal '-'
\d{2} # two digits
- # literal '-'
\d{4} # four digits
$ # end of string
The comments are helpful, but why couldn't those comments be the regular expression? They describe exactly the pattern we're matching, so there's no real reason the parser couldn't compile them directly, or at least convert them to the regex behind the scenes.
The Regulator has a cool Regex Analyzer feature that does something similar; here's what it does with the pattern above:
Any digit
Exactly 3 times
-
Any digit
Exactly 2 times
-
Any digit
Exactly 4 times
$ (anchor to end of string)
This, again, shows exactly what we want to match, but in a more human-readable form. There's no reason this couldn't be the expression itself. Now, of course, it's easier to include a one-line regex inline with your code, but I don't think that's worth the tradeoff. A more verbose Expression Query Language could be included inline, and would be much more readable. If needed, it could live in a separate file. We've got piles of xml, xsd, config, resx, etc. files now, and a regex file or two that was actually readable would be much simpler than including cryptic strings in our code. Why don't we treat these things like small stored procedures?
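To make the idea concrete, here is a minimal Python sketch of a "compiler" that turns an Analyzer-style description into a real regex. Everything in it is an assumption for illustration: the `compile_readable` function, the `ATOMS` vocabulary, and the "Exactly N times" phrasing are made up here, and this is not how the Regulator works internally.

```python
import re

# Hypothetical vocabulary mapping readable phrases to regex fragments.
ATOMS = {
    "any digit": r"\d",
    "any whitespace": r"\s",
    "start of string": "^",
    "end of string": "$",
}

def compile_readable(description):
    """Compile a line-per-token description into a compiled regex."""
    parts = []
    for raw in description.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        key = line.lower()
        count = re.fullmatch(r"exactly (\d+) times", key)
        if count:
            if not parts:
                raise ValueError("a repeat count needs something to repeat")
            # Group the previous fragment so the count applies to all of it.
            parts[-1] = "(?:%s){%s}" % (parts[-1], count.group(1))
        elif key in ATOMS:
            parts.append(ATOMS[key])
        else:
            # Anything unrecognized is treated as literal text.
            parts.append(re.escape(line))
    return re.compile("".join(parts))

ssn_tail = compile_readable("""
    Any digit
    Exactly 3 times
    -
    Any digit
    Exactly 2 times
    -
    Any digit
    Exactly 4 times
    End of string
""")
```

With this sketch, `ssn_tail.search("123-45-6789")` finds a match while `ssn_tail.search("123-45-678")` does not, so the readable description really is the expression.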
I found a thread on the Python newsgroups discussing an improved RegEx syntax. One interesting idea is RegEx Builder (RXB), which lets you build regexes using verbose language. For example, digit + some(whitespace) + exactly('example') would generate \d\s+example.
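A builder in that spirit is easy to sketch in Python with operator overloading. To be clear, this is not RXB's actual API: the `Rx` class and the `some` and `exactly` helpers below are hypothetical names, chosen only to mirror the example expression.

```python
import re

class Rx:
    """A regex fragment that can be concatenated with +."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        return Rx(self.source + other.source)

    def compile(self):
        return re.compile(self.source)

# Named building blocks (hypothetical vocabulary).
digit = Rx(r"\d")
whitespace = Rx(r"\s")

def some(fragment):
    """One or more repetitions of a fragment."""
    return Rx("(?:%s)+" % fragment.source)

def exactly(text):
    """Literal text, with regex metacharacters escaped."""
    return Rx(re.escape(text))

pattern = digit + some(whitespace) + exactly("example")
```

Here `pattern.source` comes out as `\d(?:\s)+example`, which matches the same strings as `\d\s+example`; the extra non-capturing group just keeps `some` safe when it wraps a multi-character fragment.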
Wrappers, utility classes, and copious comments are a step in the right direction, but magic strings like
"\w?<\s?\/?[^\s>]+(\s+[^"'=]+(=("[^"]*")|('[^\']*')|([^\s"'>]*))?)*\s*\/?>"
shouldn't be anywhere near professional development languages circa 2005, especially when compilers are capable of doing things like LINQ. We need an Expression Query Language. How about Language Integrated Expressions (LINE)?
[1] Yes, Jeff, that's an intentional GoogleBomb.
[2] That's a simple RegEx for the purpose of illustration. Read Jeff's post on RegEx Abuse if you don't see the problem. I've written my share of complex regexes, and I bet you have too if you've read this far. Sure, we can write code in assembly language, but it's not productive or maintainable.