Home

Writing

Family

China Stuff

China Photos

Mongolia

Programming
.    IC
.    Regexps
.    catchall.php
.    Elisp
.    Tetrii
.    sudoku
.    Mail aliases
.    AQI history
.    Xipiqi
.    Subway
.    SchoolSite
.    Host change
.    freehostingweb

Harry Potter

Contrarian Ultimate

Hip

Biking

Links

Site Map

Regular Expression searcher

When at Catharon, I wrote a regular expression engine for their environment. It worked, basically, but had a couple problems: first, I changed the standard regexp syntax too much in the name of "improvement"; second, the program was not well written, it had no overall design, and was very ad hoc, and so fragile. Getting it to work was a hassle, and took a long time, as each complicated new RE I tried it with seemed not to work, and to require adjustment.

First, my "improved" RE syntax was really no better, and because it is so far from standard it was too hard to use. And second, for the last few years I have regretted that program, and figured if I ever had time and inspiration I would redo it properly. Now I have.

The main difference now is that the RE is first parsed to make a RE tree, and then this tree is stepped along to build the execution stack, which is then used to generate the solution structure. This has a few advantages: first, it breaks the code up into several independent sections. It is even possible to replace the RE parser with a different one, so if I really want to "improve" it I can support both the traditional syntax and my new one.

Second, the code is far, far simpler and more robust. Once I got it working, it basically works for everything, so I have little fear that each new RE will break it.

Third, it supports a lot more - unicode characters, greedy and lazy evaluation, alternative parsers, to name a few.

And, it is (I think) very well-written and elegant code. The tree and stack walking routines are small and surprisingly simple, and the greedy and lazy search algorithms require almost no coding to work.

With that advertisement, you can now look and judge for yourself. Here are the 4 files: re1.c, re-parse.c, re-debug.c, re.h, or here is the zip'ed archive.

Call or Send Us Mail!
motley


or send email to
image of email address

(sorry, the email addresses
that used to be here have
been disabled due to spam
attacks.)