Development/Regex
Development Material: Information posted here is for developer reference only. This material is subject to change and will be technical in nature.
Proposal for Regex Provider Modules in InspIRCd
Introduction
The following document outlines a solution for implementing central regex provider modules for InspIRCd.
Regex?
Regexes (or regular expressions) describe a text pattern by providing rules for matching text. By that definition, the wildcard matching present in the core ircd (for things like glines, channel mode +b, etc.) can be considered a form of regular expression. Usually, though, the term refers to the more powerful form that can, at its worst, look like incomprehensible line noise, but that provides fine-grained control over which text (or users) are blocked or changed and which are not.
Regex Engines
Regexes are not built into the IRCd; an external library is usually needed to use them. There are at least three known implementations that could be used by InspIRCd:
- PCRE (Perl-Compatible Regular Expressions) - this is the current engine used by m_filter_pcre and m_rline.
- POSIX Regexes - present on POSIX-conforming systems (in GNU libc, for example), since the API is part of the POSIX.1-2001 standard.
- TRE - the regex engine used by UnrealIRCd - users migrating from there may be more familiar with the regex syntax and limitations of this engine.
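To give a feel for the kind of engine API a provider would wrap, here is a minimal sketch using the POSIX calls regcomp, regexec and regfree from regex.h. The helper name PosixMatch is an assumption made for illustration and is not part of InspIRCd; PCRE and TRE follow the same compile, execute, free pattern with their own function names.

```cpp
#include <cassert>
#include <regex.h> // POSIX.1-2001 regex API
#include <string>

// Hypothetical helper: returns true if `text` matches the POSIX extended
// regex `pattern`. Returns false if there is no match or if the pattern
// fails to compile.
bool PosixMatch(const std::string& pattern, const std::string& text)
{
    regex_t compiled;

    // Compile the pattern. REG_NOSUB says we only want a yes/no answer,
    // not the positions of sub-matches.
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    // regexec returns 0 on a match and REG_NOMATCH otherwise.
    const bool matched = (regexec(&compiled, text.c_str(), 0, nullptr, 0) == 0);

    // Release the engine resources held by the compiled pattern.
    regfree(&compiled);
    return matched;
}
```

Compiling on every call is wasteful; it is done here only to keep the sketch short. The proposal below compiles once and reuses the compiled regex.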
The Problem
As of 1.2, two modules in the InspIRCd distribution use regular expressions: m_filter_pcre.so and m_rline.so. Both of these use PCRE (Perl-compatible regexes). The problems with this arrangement are easy to see:
- Both modules in the distribution use PCRE independently. If the PCRE API ever changed, or if we wanted to use a new feature of PCRE, both of these modules and any possible future modules would have to be updated. Code duplication is rarely a good thing.
- Both modules are hard-wired to PCRE. If a different regex engine were desired (for example, Unreal users migrating to InspIRCd are likely to be more familiar with the syntax, quirks, and limitations of TRE), both modules would have to be heavily modified to use a different API (and m_filter_pcre would become a misnomer).
Architecture
The solution is to provide a standardized regex provider API to InspIRCd modules, so that the regex engine can be changed merely by changing the name of a module in the config file (e.g., from m_pcre.so to m_tre.so). Modules using the regex engine would need no changes, although in the most extreme cases the regexes passed to them from the config would have to be rewritten for the new engine's syntax.
The Regex object is what will represent a regular expression to InspIRCd. It is an abstract class, so provider modules must derive from it and supply the engine-specific behaviour (i.e., the calls out to the regex engine).
The RegexFactoryRequest is a request to get a Regex object from the provider. It is sent with the string to be made into a regex, and the return value is the regex object.
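As a rough illustration of these two pieces, the sketch below shows one possible shape for the abstract Regex class and the RegexFactoryRequest. Only the names Regex, Match and RegexFactoryRequest come from this proposal; StdRegex, GetRegexString, the result field and HandleRegexFactoryRequest are assumptions made for the example. The C++ standard library's std::regex stands in for a real engine such as PCRE or TRE, and the request carries a typed pointer rather than going through the module request mechanism, so no cast appears here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Abstract representation of a compiled regular expression.
class Regex
{
 protected:
    std::string regex_string; // The pattern this object was created from.
    explicit Regex(const std::string& rx) : regex_string(rx) { }

 public:
    virtual ~Regex() { }
    // Returns true if `text` matches the pattern, false otherwise.
    virtual bool Match(const std::string& text) = 0;
    const std::string& GetRegexString() const { return regex_string; }
};

// Hypothetical provider-side class; std::regex is a stand-in engine.
class StdRegex : public Regex
{
    std::regex compiled; // Compiled once, reused for every Match call.

 public:
    // Throws std::regex_error if the pattern is invalid.
    explicit StdRegex(const std::string& rx) : Regex(rx), compiled(rx) { }

    bool Match(const std::string& text) override
    {
        return std::regex_search(text, compiled);
    }
};

// Hypothetical request: carries the pattern in and the Regex object out.
struct RegexFactoryRequest
{
    std::string regex; // Pattern to compile.
    Regex* result;     // Filled in by the provider; NULL on failure.
    explicit RegexFactoryRequest(const std::string& rx) : regex(rx), result(nullptr) { }
};

// Hypothetical provider-side handler. Here a failed compile is reported
// as a null result; throwing to the caller is the other option discussed.
void HandleRegexFactoryRequest(RegexFactoryRequest& req)
{
    try
    {
        req.result = new StdRegex(req.regex);
    }
    catch (const std::regex_error&)
    {
        req.result = nullptr;
    }
}
```

The caller owns the returned object and deletes it when finished, which is when the provider's destructor releases the engine resources.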
The basic process for a module wishing to use a regular expression goes like this:
- The provider module should be loaded first (i.e., listed first in the configuration), but ideally modules using regexes can use OnLoadModule to notice the provider being loaded later (and should simply do nothing if no provider has been loaded). This is the time for the provider module to initialize the regex engine if necessary.
- Declare use of the interface RegularExpression (prevents the regex module from being unloaded while it's being used - alternatively watch for it unloading in OnUnloadModule), then ask for the module providing the feature RegularExpression. (Alternative: use only an interface, allowing more than 1 regex provider.) The best time to do this is once on loading, and anytime the provider is reloaded (note that any Regex objects must be recreated if that happens).
- With the pointer to the provider module, send a RegexFactoryRequest, passing the regex string to be used. This is where the provider creates an instance of some class derived from Regex that handles the engine's API calls; it is also the time to allocate the engine resources for that specific regex and to compile it. The requesting module will receive a Regex object (this will require a cast) if the creation was successful. An unsuccessful creation should ideally throw an exception, but if this is not possible, we can just return NULL (and maybe set errno to something useful).
- Once a Regex object is received, text can be matched against it by calling the Match method, passing the text to be checked. Match will return true if the string matches, and false if it doesn't. The module can call Match on the same Regex object as often as it likes. The provider is expected to call the appropriate APIs to match the text, and also ensure the regex can be reused later, as often as requested.
- When the module is done with a given Regex object, it can simply delete it. This is the time for the provider module to free up regex engine resources for that regex.
- At the end of it all, when the ircd is shut down or the provider and all modules using it are unloaded, this is the time for the provider to shut down the regex engine if necessary.
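The steps above can be sketched from the consuming module's side. This is only an illustration under stated assumptions: RegexUser, RegexFactory, StdRegexFactory and the OnProviderLoaded/OnProviderUnloaded hooks are hypothetical names standing in for the OnLoadModule/OnUnloadModule handling, and std::regex stands in for the real engine. It shows a module that does nothing while no provider is loaded and recreates its Regex objects whenever a provider appears.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <vector>

class Regex
{
 public:
    virtual ~Regex() { }
    virtual bool Match(const std::string& text) = 0;
};

// Hypothetical stand-in for a provider's derived class.
class StdRegex : public Regex
{
    std::regex compiled;

 public:
    explicit StdRegex(const std::string& rx) : compiled(rx) { }
    bool Match(const std::string& text) override { return std::regex_search(text, compiled); }
};

// Hypothetical stand-in for sending a RegexFactoryRequest to the provider.
// Returns an empty pointer if the pattern does not compile.
typedef std::function<std::unique_ptr<Regex>(const std::string&)> RegexFactory;

std::unique_ptr<Regex> StdRegexFactory(const std::string& rx)
{
    try
    {
        return std::unique_ptr<Regex>(new StdRegex(rx));
    }
    catch (const std::regex_error&)
    {
        return std::unique_ptr<Regex>();
    }
}

// Hypothetical module that uses regexes.
class RegexUser
{
    RegexFactory factory;                         // Empty while no provider is loaded.
    std::vector<std::string> patterns;            // Source strings, kept for recreation.
    std::vector<std::unique_ptr<Regex> > compiled; // Objects handed out by the provider.

    void Rebuild()
    {
        compiled.clear(); // Deleting the objects frees the engine resources.
        if (!factory)
            return; // No provider: simply do nothing.
        for (const std::string& rx : patterns)
            compiled.push_back(factory(rx));
    }

 public:
    void AddPattern(const std::string& rx) { patterns.push_back(rx); Rebuild(); }
    void OnProviderLoaded(RegexFactory f) { factory = f; Rebuild(); }
    void OnProviderUnloaded() { compiled.clear(); factory = nullptr; }

    bool MatchesAny(const std::string& text)
    {
        for (const std::unique_ptr<Regex>& rx : compiled)
            if (rx && rx->Match(text))
                return true;
        return false;
    }
};
```

Keeping the original pattern strings is what makes recreation possible: the Regex objects belong to one provider instance and cannot outlive it.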
Here's an example sequence, using m_pcre as a sample provider, and m_rline as a sample module using regex:
InspIRCd -> Load Module m_pcre
m_pcre -> Initialize libpcre if necessary. Provide Interface RegularExpression and Feature RegularExpression.
InspIRCd -> Load Module m_rline
m_rline -> Use Interface RegularExpression. Find Feature RegularExpression (m_pcre).
Operator -> Issue Command: /rline ^lamebot! 0 Lame Bots
m_rline -> Send RegexFactoryRequest to m_pcre, request Regex object for ^lamebot!
m_pcre -> Create an instance of PCRERegex (derived from Regex). In the constructor, call pcre_compile for ^lamebot!.
m_pcre -> Return the new regex object from the request (this will require a cast).
m_rline -> Receive the regex object from m_pcre (this will require a cast), store in RLine object.
m_rline -> Check new RLine against local users. For each user, call ->Match("nick!ident@host name") on the Regex object.
m_pcre -> Call pcre_exec on the regex, with the given string. Return true if the string matched, false if it didn't.
m_rline -> Handle return value appropriately (QuitUser if the return value was true, in this case.)
m_rline -> Check RLine against later connecting users, as according to normal XLine rules. This works the same as described just previously.
Operator -> Issue Command: /rline ^lamebot!
m_rline -> Delete the RLine for ^lamebot!, which in turn deletes the regex object provided earlier.
m_pcre -> In the PCRERegex destructor, call pcre_free on the regex structure stored in the PCRERegex object.
InspIRCd -> Unload m_pcre - Fails, interface in use by m_rline
InspIRCd -> Unload m_rline
m_rline -> Declare Done with interface RegularExpression
InspIRCd -> Unload m_pcre
m_pcre -> Shutdown libpcre if necessary. Unregister Interface RegularExpression and Feature RegularExpression.
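The ownership and unload rules in this sequence can be modelled in a few lines. The sketch below is an assumption-laden toy, not InspIRCd code: Provider, UseInterface, DoneWithInterface, CanUnload, the RLine struct layout and FindMatchingUsers are hypothetical names, and std::regex again replaces libpcre. A live-object counter makes it visible that deleting the RLine releases the compiled regex, and a use counter models why the provider cannot be unloaded while m_rline still holds the interface.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <vector>

class Regex
{
 public:
    virtual ~Regex() { }
    virtual bool Match(const std::string& text) = 0;
};

// Stand-in for PCRERegex: the constructor plays the role of pcre_compile,
// the destructor the role of pcre_free.
class StdRegex : public Regex
{
    std::regex compiled;

 public:
    static int live; // Number of compiled regexes currently allocated.
    explicit StdRegex(const std::string& rx) : compiled(rx) { ++live; }
    ~StdRegex() override { --live; }
    bool Match(const std::string& text) override { return std::regex_search(text, compiled); }
};
int StdRegex::live = 0;

// Hypothetical model of the provider module and its interface use count.
class Provider
{
    int users;

 public:
    Provider() : users(0) { }
    void UseInterface() { ++users; }
    void DoneWithInterface() { assert(users > 0); --users; }
    bool CanUnload() const { return users == 0; } // False while the interface is in use.
    std::unique_ptr<Regex> Create(const std::string& rx) const
    {
        return std::unique_ptr<Regex>(new StdRegex(rx));
    }
};

// Hypothetical RLine: owns its Regex, so deleting the line frees the regex.
struct RLine
{
    std::string reason;
    std::unique_ptr<Regex> regex;
};

// Returns the "nick!ident@host name" strings that the RLine matches;
// a real module would call QuitUser on each of these users.
std::vector<std::string> FindMatchingUsers(const RLine& line, const std::vector<std::string>& masks)
{
    std::vector<std::string> matched;
    for (const std::string& mask : masks)
        if (line.regex->Match(mask))
            matched.push_back(mask);
    return matched;
}
```

Tying the Regex lifetime to the RLine through a smart pointer means the module never has to remember a separate cleanup step when a line is removed or expires.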

















