Comparison of regular expression engines

From Seo Wiki - Search Engine Optimization and Programming Languages


Libraries

List of regular expression libraries
| Name | Official website | Programming language | Software license |
|---|---|---|---|
| Boost.Regex [a] | Boost C++ Libraries | C++ | Boost Software License |
| Boost.Xpressive | Boost C++ Libraries | C++ | Boost Software License |
| CL-PPCRE | Edi Weitz | Common Lisp | BSD |
| DEELX | RegExLab | C++ | "free for personal use and commercial use" |
| GLib/GRegex [b] | Marco Barisione | C | LGPL |
| GRETA | Microsoft Research | C++ | ? |
| ICU | International Components for Unicode | C/C++/Java | ICU license |
| Jakarta/Regexp | The Apache Jakarta Project | Java | Apache License |
| JRegex | JRegex | Java | BSD |
| Oniguruma | Kosako | C | BSD |
| Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL |
| PCRE | Philip Hazel | C/C++ [c] | BSD |
| Qt/QRegExp | Qt Software | C++ | GNU GPL v3.0 / GNU LGPL v2.1 / Qt Commercial |
| regex (Henry Spencer's regular expression libraries) | ArgList | C | BSD |
| TRE | Ville Laurikari | C | BSD |
| TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPL v1.1 |
| TRegExpr | RegExp Studio | Object Pascal | Freeware |

[a] Formerly called Regex++.

[b] Included in GLib since version 2.13.0.

[c] The C++ bindings were developed by Google and became officially part of PCRE in 2006.

Languages

List of languages coming with regular expression support
| Language | Official website | Software license | Remarks |
|---|---|---|---|
| .NET | MSDN | Proprietary | |
| D | D | Proprietary | |
| Haskell | Haskell.org | BSD3 | |
| Java | Java | GNU General Public License | Regular expressions are written as string literals, so every backslash must be doubled, which hurts readability. |
| JavaScript/ECMAScript | | ? | Limited, but regular expressions are first-class citizens of the language, with a dedicated /pattern/flags literal syntax. |
| Lua | Lua.org | MIT License | Uses a simplified, limited dialect. Can be bound to a more powerful library, such as PCRE, or to an alternative parser such as LPeg. |
| Perl | Perl.com | Artistic License or GNU General Public License | Full support; a central part of the language. |
| PHP | PHP.net | ? | Has two implementations (POSIX ereg and PCRE); PCRE is the faster and more full-featured of the two. |
| Python | python.org | Python Software Foundation License | |
| Ruby | ruby-doc.org | GNU Library General Public License | |
| SAP ABAP | SAP.com | ? | |
| Tcl 8.4 | tcl.tk | Tcl/Tk License (permissive, similar to BSD) | |
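The doubled backslashes mentioned in the Java remark come from writing patterns as ordinary string literals. The sketch below uses Python purely for illustration (the sample text and variable names are made up) to show the same effect and the raw-string literal Python offers to avoid it.

```python
import re

# Ordinary string literal: each regex backslash must be written as "\\"
# so that the pattern string actually contains a backslash.
escaped = re.compile("\\d+\\.\\d+")

# Raw string literal: backslashes are kept as-is, so no doubling is needed.
raw = re.compile(r"\d+\.\d+")

# Both literals produce the same pattern text, hence the same regex.
assert escaped.pattern == raw.pattern
print(raw.findall("pi is about 3.14, e is about 2.72"))  # ['3.14', '2.72']
```

Java has no equivalent raw literal in the versions this table covers, which is why its patterns end up with doubled backslashes.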

Language features

NOTE: An application that uses a regular expression library does not necessarily expose the library's full feature set. For example, GNU grep uses PCRE only in its -P (--perl-regexp) mode; its default basic and extended syntaxes do not support lookahead, even though PCRE does.

Part 1

Language feature comparison (part 1)
| Engine | "+" quantifier | Negated character classes | Non-greedy quantifiers [a] | Shy groups [b] | Lookahead | Lookbehind | Backreferences [c] | >9 indexable captures |
|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| EmEditor | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| GLib/GRegex | ? | ? | ? | ? | ? | ? | ? | ? |
| GNU Grep | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? |
| Haskell | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Java | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| ICU Regex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| JGsoft | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| .NET | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| OmniOutliner 3.6.2 | Yes | Yes | Yes | No | No | No | ? | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Qt/QRegExp | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Ruby | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| TRE | Yes | Yes | Yes | Yes | No | No | Yes | No |
| Vim | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |

  • [a] Non-greedy quantifiers match as few characters as possible, instead of the default as many as possible. Many older, pre-POSIX engines were non-greedy and had no greedy quantifiers at all.
  • [b] Shy groups, also called non-capturing groups, cannot be referred to with backreferences. They are used to speed up matching when the group's content does not need to be accessed later.
  • [c] Backreferences refer to previously matched groups in later parts of the regex and/or in the replacement string (where applicable). For instance, ([ab]+)\1 matches the whole string "abab" but not "abaab".
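The Part 1 features can be tried out with Python's standard re module, which the table lists as supporting all eight columns. The following is a minimal sketch; the sample strings and variable names are made up for illustration, and only documented re functions are used.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy '+' takes as much as possible, then backtracks to the last '>'.
greedy = re.search(r"<.+>", html).group()
# Non-greedy '+?' takes as little as possible: it stops at the first '>'.
lazy = re.search(r"<.+?>", html).group()
# Negated character class: a run of characters other than '<' or '>'.
texts = re.findall(r">([^<>]+)<", html)

# Shy (non-capturing) group: (?:ab) is repeated by '+' but gets no number,
# so the only numbered group is (c).
shy = re.fullmatch(r"(?:ab)+(c)", "ababc")

# Lookbehind (?<=\$) and lookahead (?=\.\d\d) check context without consuming it.
price = re.search(r"(?<=\$)\d+(?=\.\d\d)", "cost: $42.50").group()

# Backreference: \1 must repeat exactly the text captured by group 1.
doubled = re.compile(r"([ab]+)\1")

# More than nine captures: \11 refers to group 11, not group 1 followed by '1'.
eleven = re.compile(r"(.)" * 11 + r"\11")

print(greedy)        # <b>bold</b> and <i>italic</i>
print(lazy)          # <b>
print(texts)         # ['bold', ' and ', 'italic']
print(shy.groups())  # ('c',)
print(price)         # 42
print(bool(doubled.fullmatch("abab")), bool(doubled.fullmatch("abaab")))  # True False
print(bool(eleven.fullmatch("abcdefghijkk")))  # True
```

fullmatch is used for the backreference check because the "abab"/"abaab" example is about the whole string: re.search would still find the substring "aa" inside "abaab".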

Part 2

Language feature comparison (part 2)
| Engine | Directives [a] | Conditionals | Atomic groups [b] | Named capture [c] | Comments | Embedded code | Partial matching | Fuzzy matching | Unicode property support [1] |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes [f] |
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | Yes | No | No |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | ? | No | No |
| EmEditor | Yes | Yes | ? | ? | Yes | No | Yes | No | ? |
| GLib/GRegex | ? | ? | ? | ? | ? | No | Yes | No | Yes [f] |
| GNU Grep | Yes | Yes | ? | Yes | Yes | No | ? | No | No |
| Haskell | ? | ? | ? | ? | ? | No | ? | No | No |
| Java | Yes | Yes | Yes | Yes | No | No | ? | No | Yes |
| ICU Regex | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Yes | ? | Yes |
| .NET | Yes | Yes | Yes | Yes | Yes | No | ? | No | Yes |
| OmniOutliner 3.6.2 | ? | ? | ? | ? | No | No | ? | No | ? |
| PCRE | Yes | Yes | Yes | Yes [d] | Yes | Yes | Yes | No | Yes [f] |
| Perl | Yes | Yes | Yes | Yes [e] | Yes | Yes | No | No | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
| Python | Yes | Yes | No | Yes | Yes | No | No | No | No |
| Qt/QRegExp | No | No | No | No | No | No | Yes | No | Yes |
| Ruby [g] | Yes | No | No | No | Yes | Yes | No | No | No |
| TRE | Yes | No | No | No | Yes | No | No | Yes | ? |
| Vim | Yes | ? | Yes | ? | ? | No | Yes | No | ? |

  • [a] Also known as flag modifiers or option letters. Example pattern: (?i:test)
  • [b] Also called independent sub-expressions.
  • [c] Similar to backreferences, but with names instead of indices.
  • [d] Available as of PCRE 7.0 (as of PCRE 4.0 with the Python-like syntax (?P<name>...)).
  • [e] Available as of Perl 5.9.5.
  • [f] Requires optional Unicode support to be enabled.
  • [g] As of Ruby 1.8. The current development version, Ruby 1.9, has additional features.
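Four of the Part 2 columns that the table marks "Yes" for Python (directives, conditionals, named capture, comments) can be shown with the standard re module. This is a minimal sketch, assuming Python 3.6 or later (needed for group-scoped inline flags such as (?i:...)); the patterns and sample strings are made up for illustration. Note that (?P<name>...) is the "Python-like syntax" mentioned in the PCRE footnote.

```python
import re

# Directive scoped to a group: only "test" is case-insensitive.
scoped = re.compile(r"(?i:test)ing")

# Conditional: (?(1)>) requires '>' only if group 1 ('<') matched.
tag = re.compile(r"(<)?\w+(?(1)>)")

# Named capture, plus a named backreference (?P=q) to the opening quote.
quoted = re.compile(r"(?P<q>['\"])(?P<body>.*?)(?P=q)")

# Comments: verbose mode ignores unescaped whitespace and '#' comments.
version = re.compile(r"""
    (?P<major>\d+)   # major number
    \.               # literal dot
    (?P<minor>\d+)   # minor number
""", re.VERBOSE)

print(bool(scoped.fullmatch("TESTing")), bool(scoped.fullmatch("TESTING")))  # True False
print([bool(tag.fullmatch(s)) for s in ("<b>", "b", "<b")])  # [True, True, False]
print(quoted.search("say 'hi' now").group("body"))  # hi
print(version.search("Ruby 1.9").groupdict())  # {'major': '1', 'minor': '9'}
```

Atomic groups and embedded code are left out because the table lists them as unsupported in Python.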

API features

API feature comparison
| Engine | Native UTF-16 support [a] | Native UTF-8 support [a] | Non-linear input support | Dot-matches-newline option | Anchor-matches-newline option |
|---|---|---|---|---|---|
| Boost.Regex | No | No | Yes | Yes | Yes |
| Boost.Xpressive | ? | ? | ? | ? | ? |
| GLib/GRegex | No | Yes | No | Yes | Yes |
| ICU Regex | Yes | No | No | Yes | ? |
| Java | Yes | ? | ? | Yes | Yes |
| .NET | Yes | No | Yes | Yes | ? |
| PCRE | No | Yes | No | Yes | Yes |
| Qt/QRegExp | Yes | No | No | No | No |
| TRE | No | ? | Yes | Yes | Yes |

  • [a] Native support means that no conversion between UTF-16 and UTF-8 is required, Unicode properties are supported, and the encoding is always available (a platform-dependent wchar_t does not count).
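Python does not appear in this table; its re module is used below purely for illustration, because it exposes the last two options as the flags re.DOTALL and re.MULTILINE. This minimal sketch (sample text made up) shows what each option changes relative to the default behavior.

```python
import re

text = "first line\nsecond line"

# Default: '.' does not match '\n', so '.+' stops at the end of the first line.
dot_default = re.match(r".+", text).group()
# Dot-matches-newline option (re.DOTALL): '.' also matches '\n'.
dot_all = re.match(r".+", text, re.DOTALL).group()

# Default: '^' matches only at the very start of the string.
anchor_default = re.findall(r"^\w+", text)
# Anchor-matches-newline option (re.MULTILINE): '^' also matches after each '\n'.
anchor_multi = re.findall(r"^\w+", text, re.MULTILINE)

print(repr(dot_default))  # 'first line'
print(dot_all == text)    # True
print(anchor_default)     # ['first']
print(anchor_multi)       # ['first', 'second']
```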

See also

External links
