Mastering Regular Expressions: Jeffrey E. F. Friedl
Third Edition
Jeffrey E. F. Friedl
Preface xvii
Table of Contents
Benchmarking 232
Know What You're Measuring 234
Benchmarking with PHP 234
Benchmarking with Java 235
Benchmarking with VB.NET 237
Benchmarking with Ruby 238
Benchmarking with Python 238
Benchmarking with Tcl 239
Common Optimizations 240
No Free Lunch 240
Everyone's Lunch is Different 241
The Mechanics of Regex Application 241
Pre-Application Optimizations 242
Optimizations with the Transmission 246
Optimizations of the Regex Itself 247
Techniques for Faster Expressions 252
Common Sense Techniques 254
Expose Literal Text 255
Expose Anchors 256
Lazy Versus Greedy: Be Specific 256
Split Into Multiple Regular Expressions 257
Mimic Initial-Character Discrimination 258
Use Atomic Grouping and Possessive Quantifiers 259
Lead the Engine to a Match 260
Unrolling the Loop 261
Method 1: Building a Regex From Past Experiences 262
The Real "Unrolling-the-Loop" Pattern 264
Method 2: A Top-Down View 266
Method 3: An Internet Hostname 267
Observations 268
Using Atomic Grouping and Possessive Quantifiers 268
Short Unrolling Examples 270
Unrolling C Comments 272
The Freeflowing Regex 277
A Helping Hand to Guide the Match 277
A Well-Guided Regex is a Fast Regex 279
Wrapup 281
In Summary: Think! 281
7: Perl 283
Regular Expressions as a Language Component 285
Perl's Greatest Strength 286
Perl's Greatest Weakness 286
Perl's Regex Flavor 286
Regex Operands and Regex Literals 288
How Regex Literals Are Parsed 292
Regex Modifiers 292
Regex-Related Perlisms 293
Expression Context 294
Dynamic Scope and Regex Match Effects 295
Special Variables Modified by a Match 299
The qr/…/ Operator and Regex Objects 303
Building and Using Regex Objects 303
Viewing Regex Objects 305
Using Regex Objects for Efficiency 306
The Match Operator 306
Match's Regex Operand 307
Specifying the Match Target Operand 308
Different Uses of the Match Operator 309
Iterative Matching: Scalar Context, with /g 312
The Match Operator's Environmental Relations 316
The Substitution Operator 318
The Replacement Operand 319
The /e Modifier 319
Context and Return Value 321
The Split Operator 321
Basic Split 322
Returning Empty Elements 324
Split's Special Regex Operands 325
Split's Match Operand with Capturing Parentheses 326
Fun with Perl Enhancements 326
Using a Dynamic Regex to Match Nested Pairs 328
Using the Embedded-Code Construct 331
Using local in an Embedded-Code Construct 335
A Warning About Embedded Code and my Variables 338
Matching Nested Constructs with Embedded Code 340
Overloading Regex Literals 341
Problems with Regex-Literal Overloading 344
8: Java 365
Java's Regex Flavor 366
Java Support for \p{…} and \P{…} 369
Unicode Line Terminators 370
Using java.util.regex 371
The Pattern.compile() Factory 372
Pattern's matcher method 373
The Matcher Object 373
Applying the Regex 375
Querying Match Results 376
Simple Search and Replace 378
Advanced Search and Replace 380
In-Place Search and Replace 382
The Matcher's Region 384
Method Chaining 389
Methods for Building a Scanner 389
Other Matcher Methods 392
Other Pattern Methods 394
Pattern's split Method, with One Argument 395
Pattern's split Method, with Two Arguments 396
Additional Examples 397
Adding Width and Height Attributes to Image Tags 397
Validating HTML with Multiple Patterns Per Matcher 399
Parsing Comma-Separated Values (CSV) Text 401
Java Version Differences 401
Differences Between 1.4.2 and 1.5.0 402
Differences Between 1.5.0 and 1.6 403
9: .NET 405
.NET's Regex Flavor 406
Additional Comments on the Flavor 409
Using .NET Regular Expressions 413
Regex Quickstart 413
Package Overview 415
Core Object Overview 416
Core Object Details 418
Creating Regex Objects 419
Using Regex Objects 421
Using Match Objects 427
Using Group Objects 430
Static "Convenience" Functions 431
Regex Caching 432
Support Functions 432
Advanced .NET 434
Regex Assemblies 434
Matching Nested Constructs 436
Capture Objects 437