Using the regularexpression class


Introduction

The regularexpression class provides methods for making comparisons between text and regular expressions.

Regular expressions are complex but powerful. They are used by command-line programs such as grep, sed, and find, and extensively in Perl.


Quick Matches

The regularexpression class provides a static match() method which is useful for a quick yes/no comparison.

#include <rudiments/regularexpression.h>
#include <rudiments/stdio.h>

int main(int argc, const char **argv) {

        const char      *string="void f(int a, bool b) { exit(0); }";
        const char      *pattern="(void|int|bool).*f\\(.*\\) { .* }";

        const char      *matches;
        if (regularexpression::match(string,pattern)) {
                matches="matches";
        } else {
                matches="doesn't match";
        }

        stdoutput.printf("%s\n  %s\n%s\n",string,matches,pattern);
}

Matching Over and Over

For patterns that need to be matched over and over, create an instance of the regularexpression class, compile() the pattern, and call match() to compare it to different strings. The optional study() method can also be used to improve performance with complex patterns.

#include <rudiments/regularexpression.h>
#include <rudiments/stdio.h>

int main(int argc, const char **argv) {

        const char      *pattern="(void|int|bool).*f\\(.*\\) { .* }";
        const char * const strings[]={
                "class t { public: int f(int a); void f(int a, bool b); };",
                "void f(int a, bool b) { exit(0); }",
                "int f(int a) { printf(\"hello\\n\"); }",
                "struct m { int a; int b; int c; };",
                NULL
        };

        regularexpression       re;
        re.compile(pattern);
        re.study();

        for (const char * const *s=strings; *s; s++) {
        
                const char      *matches;
                // use the pattern compiled above rather than passing it in again
                if (re.match(*s)) {
                        matches="does match";
                } else {
                        matches="doesn't match";
                }

                stdoutput.printf("%s\n  %s\n%s\n\n",*s,matches,pattern);
        }
}
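
For readers who want to try the compile-once, match-many idea without rudiments installed, here is a minimal sketch that uses the C++ standard library's std::regex instead. It is an analogue, not the rudiments API: countMatches() is a hypothetical helper introduced only for illustration, and the braces in the pattern are escaped because std::regex's default ECMAScript grammar may reject a bare "{".

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of rudiments): compiles the pattern once,
// then reuses the compiled object to search every subject string.
std::size_t countMatches(const std::string &pattern,
                                const std::vector<std::string> &subjects) {

        // the pattern is compiled exactly once, here
        const std::regex        re(pattern);

        std::size_t     count=0;
        for (const std::string &s : subjects) {
                // regex_search() succeeds if the pattern is found
                // anywhere in the subject, not only at the start
                if (std::regex_search(s,re)) {
                        count++;
                }
        }
        return count;
}
```

Calling countMatches() with the pattern "(void|int|bool).*f\\(.*\\) \\{ .* \\}" and the four strings from the example above should count two of them: the two function definitions. The class declaration and the struct are not counted.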

Multiple Unique Matches

Some patterns return multiple unique matches. The methods getSubstringCount(), getSubstringStart()/getSubstringEnd(), and getSubstringStartOffset()/getSubstringEndOffset() can be used to get information about these matches.

#include <rudiments/regularexpression.h>
#include <rudiments/stdio.h>

int main(int argc, const char **argv) {

        const char      *string="hello there everyone";
        const char      *pattern="(\\w+) (\\w+)";

        regularexpression       re;
        re.compile(pattern);
        re.match(string);

        stdoutput.printf("\"%s\" matches \"%s\" %d times\n",
                                string,pattern,re.getSubstringCount());

        for (int32_t i=0; i<re.getSubstringCount(); i++) {

                stdoutput.printf("  match %d starts at offset %2d: ",
                                        i,re.getSubstringStartOffset(i));
                stdoutput.printf("\"%s\"\n",re.getSubstringStart(i));

                stdoutput.printf("  match %d ends at offset %2d  : ",
                                        i,re.getSubstringEndOffset(i));
                stdoutput.printf("\"%s\"\n",re.getSubstringEnd(i));
        }
}

Note that only unique matches are returned. If the exact same pattern of characters is found at multiple locations in the same string, only the first instance of that pattern is returned.
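
To make the relationship between a matched substring and its start and end offsets concrete, here is a small sketch built on the standard library's std::regex rather than rudiments. The substringinfo struct and the findSubstrings() helper are hypothetical names introduced for illustration. Whether rudiments numbers its substrings in the same way is an assumption that this sketch does not verify.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One matched substring: its text plus offsets into the subject string.
struct substringinfo {
        std::string     text;
        std::size_t     start;  // offset of the first matched character
        std::size_t     end;    // offset one past the last matched character
};

// Hypothetical helper (not part of rudiments): searches subject for pattern
// and returns the whole match followed by each parenthesized group.
std::vector<substringinfo> findSubstrings(const std::string &subject,
                                                const std::string &pattern) {

        std::vector<substringinfo>      result;

        const std::regex        re(pattern);
        std::smatch             m;
        if (std::regex_search(subject,m,re)) {
                // index 0 is the whole match, 1..n are the groups
                for (std::size_t i=0; i<m.size(); i++) {
                        std::size_t     start=
                                static_cast<std::size_t>(m.position(i));
                        std::size_t     len=
                                static_cast<std::size_t>(m.length(i));
                        result.push_back({m.str(i),start,start+len});
                }
        }
        return result;
}
```

For the string "hello there everyone" and the pattern "(\\w+) (\\w+)", this sketch returns three entries: "hello there" spanning offsets 0 to 11, "hello" spanning 0 to 5, and "there" spanning 6 to 11.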