Regexp Utils #652

Mzack9999 (Member) commented May 14, 2025

Mzack9999 self-assigned this May 14, 2025
ehsandeep requested a review from dwisiswant0 May 14, 2025 21:18

dwisiswant0 (Member) commented

afaik, combining regexp2 with the std regexp pkg should be sufficient, since regexp2 already supports Perl5 syntax (lookarounds & backreferences), so there’s really no need for the go-re2 engine. also, the current detectEngine implementation doesn’t seem comprehensive enough to handle non-std-regexp features properly.

tip: i actually built a package called pcregexp that supports lookarounds & backreferences while aiming for compatibility with the std regexp. it uses libpcre2 API bindings under the hood via purego (so there’s no Cgo), and it has a wrapper that automatically selects the appropriate engine based on the regex features being used. it’s NOT fully stable yet, but i’m actively looking for adopters to support its continuous development and testing.
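to illustrate the "std first, Perl-style engine only when needed" idea, here is a minimal sketch. it is NOT how pcregexp or the PR's detectEngine works: `Matcher`, `CompileFunc`, `Compile` and `stubMatcher` are hypothetical names made up for this example. instead of scanning the pattern for unsupported features, it lets the std regexp parser decide and falls back only when compilation fails. the fallback is injected, so a real adapter around regexp2 or a PCRE binding could be plugged in; the stub is only there to keep the sketch runnable with the standard library alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical common interface over regex engines.
// *regexp.Regexp already satisfies it.
type Matcher interface {
	Match(b []byte) bool
}

// CompileFunc is a hypothetical hook for a backtracking engine
// (for example an adapter around regexp2 or a PCRE binding).
type CompileFunc func(pattern string) (Matcher, error)

// Compile tries the standard library first and falls back only when
// the std (RE2-syntax) parser rejects the pattern. It also reports
// which engine was chosen.
func Compile(pattern string, fallback CompileFunc) (Matcher, string, error) {
	re, stdErr := regexp.Compile(pattern)
	if stdErr == nil {
		return re, "std", nil
	}
	if fallback == nil {
		return nil, "", stdErr
	}
	m, err := fallback(pattern)
	if err != nil {
		return nil, "", fmt.Errorf("std: %v; fallback: %w", stdErr, err)
	}
	return m, "fallback", nil
}

// stubMatcher is a placeholder for a real backtracking engine.
// It never matches; it only makes the sketch self-contained.
type stubMatcher struct{}

func (stubMatcher) Match([]byte) bool { return false }

func main() {
	fallback := func(string) (Matcher, error) { return stubMatcher{}, nil }

	// The same four patterns used in the benchmark table.
	patterns := []string{
		`p([a-z]+)ch`,
		`\b\w+@\w+\.\w+\b`,
		`(\w+)\s+\1`,
		`(?<=foo)bar`,
	}
	for _, p := range patterns {
		_, engine, err := Compile(p, fallback)
		fmt.Printf("%-20s engine=%s err=%v\n", p, engine, err)
	}
}
```

with this approach the first two patterns stay on the std engine, while the backreference and lookbehind patterns go to the fallback. two caveats: a pattern that is simply malformed also ends up at the fallback (which then has to report the error), and engines can disagree on semantics even for patterns that both accept, so this is a starting point rather than a complete selection strategy.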

go test -benchmem -run=^$ -bench . benchmark
goos: linux
goarch: amd64
pkg: benchmark
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkMatch/pcregexp/simple-16         	77527322	        15.09 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/simple-16  	 2509765	       496.2 ns/op	      80 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/simple-16  	 2244966	       552.1 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/simple-16         	 4953229	       234.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/simple-16                  	 2469645	       522.0 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/wasilibs-re2/simple-16                	 1229862	       961.8 ns/op	     248 B/op	       7 allocs/op
BenchmarkMatch/pcregexp/email-16                     	81896185	        14.94 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/email-16              	  578250	      1836 ns/op	      64 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/email-16              	 2268996	       516.1 ns/op	      16 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/email-16          	 4037431	       287.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/email-16                   	 2544838	       489.9 ns/op	      16 B/op	       2 allocs/op
BenchmarkMatch/wasilibs-re2/email-16                 	 1000000	      1082 ns/op	     240 B/op	       7 allocs/op
BenchmarkMatch/pcregexp/backreference-16             	76438894	        15.10 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/backreference-16      	 1294224	      1021 ns/op	      80 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/backreference-16      	 2315284	       486.8 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/backreference-16  	 5510665	       226.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/backreference-16           	 2588995	       460.9 ns/op	      28 B/op	       2 allocs/op
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1747392575.123307  145936 re2.cc:237] Error parsing '(\w+)\s+\1': invalid escape sequence: \1
PASS
ok  	benchmark	27.466s
package benchmark_test

import (
	"testing"

	AspieSoft "github.com/AspieSoft/go-regex/v8"
	GRbit "github.com/GRbit/go-pcre"
	dlclark "github.com/dlclark/regexp2"
	scorpionknifes "github.com/scorpionknifes/go-pcre"
	wasilibs "github.com/wasilibs/go-re2"

	"github.com/dwisiswant0/pcregexp"
)

func BenchmarkMatch(b *testing.B) {
	tests := []struct {
		name    string
		pattern string
		text    []byte
	}{
		{"simple", `p([a-z]+)ch`, []byte("peach punch pinch")},
		{"email", `\b\w+@\w+\.\w+\b`, []byte("[email protected]")},
		{"backreference", `(\w+)\s+\1`, []byte("hello hello world")},
		{"lookaround", `(?<=foo)bar`, []byte("foobar")},
	}

	for _, tt := range tests {
		r1 := pcregexp.MustCompile(tt.pattern)

		b.ResetTimer()
		b.Run("pcregexp/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r1.Match(tt.text)
			}
		})

		r2 := dlclark.MustCompile(tt.pattern, 0)

		b.ResetTimer()
		b.Run("dlclark-regexp2/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r2.MatchString(string(tt.text))
			}
		})

		r3, err := AspieSoft.CompTry(tt.pattern)
		if err != nil {
			b.Fatalf("r3: failed to compile pattern: %v", err)
		}

		b.ResetTimer()
		b.Run("AspieSoft-regex/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r3.Match(tt.text)
			}
		})

		r4, err := scorpionknifes.Compile(tt.pattern, 0)
		if err != nil {
			b.Fatalf("r4: failed to compile pattern: %v", err)
		}
		r4Matcher := r4.NewMatcher()

		b.ResetTimer()
		b.Run("scorpionknifes-pcre/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r4Matcher.Match(tt.text, 0)
			}
		})

		r5, err := GRbit.Compile(tt.pattern, 0)
		if err != nil {
			b.Fatalf("r5: failed to compile pattern: %v", err)
		}

		b.ResetTimer()
		b.Run("GRbit-pcre/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r5.MatchWFlags(tt.text, 0)
			}
		})

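		// RE2 rejects backreferences and lookarounds. Compile inside the
		// sub-benchmark so that a failure skips only this entry; calling
		// b.Skipf on the parent would stop BenchmarkMatch entirely and the
		// remaining test cases (e.g. "lookaround") would never run.
		b.Run("wasilibs-re2/"+tt.name, func(b *testing.B) {
			r6, err := wasilibs.Compile(tt.pattern)
			if err != nil {
				b.Skipf("r6: failed to compile pattern: %v", err)
			}

			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				r6.MatchString(string(tt.text))
			}
		})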
	}
}

latest benchstat of pcregexp vs. std regexp: dwisiswant0/pcregexp#5 (comment).