Commit bdbd617

Modify Post
1 parent 06d6334 commit bdbd617

2 files changed

+64
-25
lines changed

_posts/2022-12-15-python-re.markdown

Lines changed: 62 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -5,46 +5,84 @@ date: 2022-12-15 20:06:11 +0530
55
categories: programming
66
---
77

8-
In python, we import `re` module to work with regular expressions. Using it, we `search` for a pattern which is a `r'raw string'` in a text. It can return `None` if no match is found, otherwise it can return a `match` object.
8+
In Python, we import the `re` module to work with regular expressions. We `search` for a pattern, usually written as a raw string (`r'...'`), in a text with `re.search(pattern, text)`. It returns `None` if no match is found; otherwise it returns a `match` object.
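
A minimal sketch of the `None`-or-match behaviour described above. The sample text and variable names are made up for illustration.

```python
import re

text = "order 66 shipped"  # hypothetical sample text

# A raw string keeps backslashes intact for the regex engine.
found = re.search(r"\d+", text)
missing = re.search(r"xyz", text)

if found is not None:
    print(found.group())  # the matched portion of the text
print(missing)            # None, because nothing matched
```

Checking for `None` before calling `group()` avoids an `AttributeError` when nothing matches.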
99

10-
If a match is found, then `match.group()` will contain the matching portion of the original string. In case we group the regular expressions with a `(` and `)`, then `group(0)` will contain the whole match, while `group(1)`, `group(2)` etc. will contain specific matches. `group(1,2,3)` will return a tuple of them. Another way to get the tuple of the whole match is to simply call `groups()`. `group()` is same as `group(0)`
10+
Regexes can be compiled with `re.compile(pattern)`, which shifts the compilation work to application start time rather than application use time. It returns a compiled pattern object (`re.Pattern` in current Python 3, documented as `RegexObject` in Python 2), from which we can read `regex.pattern` and call `regex.search(text)` to see whether the given text matches the pattern. The compiled `search` also takes a position, so that the search starts from that position.
11+
12+
Most methods can be called either directly on the `re` module or on the object returned by `re.compile()`.
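
As a rough illustration of compiling once and reusing the pattern, including the optional start position, something like this should work (the text and names are invented for the example):

```python
import re

regex = re.compile(r"\d+")      # compiled once, reused many times

text = "a1 b22 c333"            # hypothetical sample text
first = regex.search(text)      # search from the start
later = regex.search(text, 3)   # search from index 3 onwards

print(regex.pattern)            # the original pattern string
print(first.group(), later.group())
```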
1113

12-
A `groupdict()` would give a dictionary instead of a tuple, however, one has to specify the key name in the group matching expression and it gets messy ther, though it has a handy name to refer to.
14+
`findall()` finds all the matches and returns them as a list of strings. If the pattern has one capturing group, the list holds that group's text rather than the whole match; with two or more groups, each element is a tuple that can be indexed. Instead of passing a string literal, one can pass `f.read()` to match against a file's contents. `finditer()` returns an iterator of match objects, not the strings themselves. With a `match` object returned by `finditer()`, one can slice the original string as `text[s:e]`, where `s` is `match.start()` and `e` is `match.end()`.
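
The difference between no groups, one group and several groups is easier to see side by side. This is only a sketch with an invented sample string:

```python
import re

text = "x=1, y=22, z=333"  # hypothetical sample text

no_groups = re.findall(r"\w=\d+", text)       # whole matches as strings
one_group = re.findall(r"\w=(\d+)", text)     # only the captured group
two_groups = re.findall(r"(\w)=(\d+)", text)  # tuples, one item per group

# finditer yields match objects, so positions are available too.
spans = [(m.start(), m.end()) for m in re.finditer(r"\d+", text)]
slices = [text[s:e] for s, e in spans]
```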
1315

14-
`findall()` finds all the matches - it returns a list of matches as strings. In case groupings are used, then each element in the list is a tuple and can be indexed. Instead of passing a string, one can even pass `f.read()` to pass file contents to match. BTW, `findall()` just returns the captured groups, not the whole expression. `finditer()` returns an iterator for match objects, not strings themselves.
1516

1617
Whatever is matched can also be substituted with `sub`. By default the whole match is replaced; the replacement can refer to specific groups with `\1`, `\2` etc., or leave them out, to get a different result.
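
A small sketch of substitution with and without group references (the date string is just an example value):

```python
import re

date = "2022-12-15"  # hypothetical sample value

# Replace every match with a fixed string.
masked = re.sub(r"\d", "#", date)

# Refer to captured groups in the replacement with \1, \2, ...
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)

# Leaving a group out of the replacement drops it from the result.
year_month = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\1-\2", date)
```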
1718

19+
### Repetitions
20+
21+
| `*` | zero or more |
22+
| `+` | one or more |
23+
| `?` | zero or one |
24+
| `{m}` | exactly `m` matches |
25+
| `{m,n}` | between `m` and `n` matches (write it without a space after the comma) |
26+
| `{m,}` | `m` or more matches |
27+
28+
* Adding a `?` at the end of the repetition pattern will make it non-greedy
29+
* `ab??` - `a` followed by zero or one `b` - non-greedy version
30+
* In the non-greedy version, a repetition that may match zero or more times matches as few times as the rest of the pattern allows, starting from zero
31+
* In the non-greedy version, `{m,n}` matches only `m` repetitions when that is enough for the overall match, even though more are available
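
The greedy and non-greedy behaviour listed above can be compared with a short sketch like the following; the sample strings are made up:

```python
import re

html = "<b>bold</b>"  # hypothetical sample text

greedy = re.search(r"<.+>", html).group()   # as much as possible
lazy = re.search(r"<.+?>", html).group()    # as little as possible

digits = "123456"
at_most = re.search(r"\d{2,4}", digits).group()    # takes 4 digits
at_least = re.search(r"\d{2,4}?", digits).group()  # stops at 2 digits
zero = re.search(r"\d*?", digits).group()          # matches zero digits
```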
32+
33+
### Character sets
34+
35+
A character set can be followed by a repetition pattern.
36+
37+
| `[ab]` | `a` or `b` |
38+
| `[^ab]` | neither `a` nor `b` |
39+
| `[a-z]` | a range: `[a-z]` is lower case, `[A-Z]` upper case, `[0-9]` digits |
40+
| `-` | inside `[]`, denotes a range when placed between two ordered characters, and a literal `-` when placed at the start or end |
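
A quick sketch of the character-set rows above, using an invented sample string:

```python
import re

text = "a-b_c 42"  # hypothetical sample text

a_or_b = re.findall(r"[ab]", text)          # 'a' or 'b'
not_ab = re.findall(r"[^ab]", "abcab")      # anything except 'a' or 'b'
lower_runs = re.findall(r"[a-z]+", text)    # runs of lowercase letters
with_hyphen = re.findall(r"[a-z-]+", text)  # trailing '-' is literal
```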
41+
42+
### Escape codes
43+
44+
| `\d` | a digit |
45+
| `\D` | a non-digit |
46+
| `\s` | a white-space |
47+
| `\S` | a non-white-space |
48+
| `\w` | a word character: alphanumeric or underscore |
49+
| `\W` | a non-word character: anything `\w` does not match |
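
To see what each escape code picks out of the same input, a sketch along these lines may help (the sample text is invented):

```python
import re

text = "id_7 costs $3.50\n"  # hypothetical sample text

digits = re.findall(r"\d", text)    # single digits
words = re.findall(r"\w+", text)    # runs of letters, digits, underscore
spaces = re.findall(r"\s", text)    # spaces and the newline
non_word = re.findall(r"\W", text)  # everything \w does not match
```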
50+
51+
### Anchoring
52+
53+
| `^`, `\A` | beginning of the string; `^` (outside a character set) also matches at the beginning of each line with `re.MULTILINE` |
54+
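| `$`, `\Z` | end of the string; `$` also matches just before a trailing newline, and at the end of each line with `re.MULTILINE` |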
55+
| `\b` | matches the empty string at a word boundary, that is, the beginning or end of a word |
56+
| `\B` | matches empty string at non-word boundary |
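
A hedged sketch of the anchors in action, with a made-up two-line string:

```python
import re

text = "cat concat\ncat"  # hypothetical sample text with two lines

starts = re.findall(r"^cat", text)                   # only the very start
each_line = re.findall(r"^cat", text, re.MULTILINE)  # start of every line
whole_word = re.findall(r"\bcat\b", text)            # 'cat' as a whole word
inside = re.findall(r"\Bcat", text)                  # 'cat' not at a word start
```

Here `\Bcat` only finds the `cat` inside `concat`, because the other two occurrences begin at a word boundary.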
57+
58+
### Constraining the search
59+
60+
| `re.match` | match the pattern at the beginning of input |
61+
| `re.fullmatch` | match the whole input with the pattern |
62+
| `regex.search(text, pos)` | search from `pos` in the input (`pos` is accepted by a compiled pattern, not by `re.search`) |
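
The three ways of constraining a search can be compared with a sketch like this. It uses a compiled pattern because the start position is a parameter of the compiled pattern's methods; the sample strings are invented.

```python
import re

regex = re.compile(r"\d+")
text = "42 apples"  # hypothetical sample text

at_start = regex.match(text)           # matches: text begins with digits
whole = regex.fullmatch(text)          # None: text is more than digits
anywhere = regex.search("n=42")        # scans forward until digits appear
from_pos = regex.search("7 and 8", 1)  # skips index 0, so it finds '8'
```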
63+
64+
### Grouping
65+
66+
If a match is found, then `match.group()` will contain the matching portion of the original string. If we group parts of the regular expression with `(` and `)`, then `group(0)` will contain the whole match, while `group(1)`, `group(2)` etc. will contain the text matched by each group. `group(1,2,3)` will return a tuple of them. Another way to get a tuple of all the groups is to simply call `groups()`. `group()` is the same as `group(0)`.
67+
68+
A `groupdict()` gives a dictionary instead of a tuple. However, one has to specify the key name in the group expression, as in `(?P<name>...)`, which makes the pattern messier, though it gives a handy name to refer to.
69+
70+
`(?: )` does not count as a group, which helps when part of the pattern must be grouped but its match should be ignored.
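
A small sketch pulling the grouping calls together. The e-mail-like text and the group names `user` and `host` are hypothetical choices for the example.

```python
import re

m = re.search(r"(\w+)@(\w+)\.(?:com|org)", "mail bob@example.org now")

whole = m.group()     # same as m.group(0): the whole match
first = m.group(1)    # text of the first group
pair = m.group(1, 2)  # tuple of the requested groups
everything = m.groups()  # all groups; (?:...) is not counted

# Named groups use (?P<name>...) and show up in groupdict().
named = re.search(r"(?P<user>\w+)@(?P<host>\w+)", "bob@example.org")
as_dict = named.groupdict()
```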
71+
1872
A note on how to match:
1973

20-
* ordinary characters (alpha-numeric) just match themselves only
74+
### Misc
75+
2176
* `.` matches any single character except newline - however, when used in `[]`, it means a literal dot
22-
* `\w` matches a single alpha-numeric, underscore only
23-
* `\W` matches any non word character - whatever mentioned above
24-
* `\b` matches boundary between word and a non-word (not the non-word character itself)
25-
* `\s` matches a single space character
26-
* `\S` matches a single non-space character
2777
* `\t`, `\n`, `\r` - tab, newline, return
28-
* `\d` matches a single digit
29-
* `^` and `$` match beginning and end respectively
3078
* `\` escapes a special character so that it matches literally, or gives an ordinary character a special meaning (as in `\d`). An unknown escape made of `\` and an ASCII letter makes the engine raise an error
31-
* `+` matches one or more occurences of a pattern before it
32-
* `*` matches zero or more occurences of a pattern before it
33-
* `?` matches zero or one occurences of a pattern before it
34-
* `+` and `*` are greedy, they match as much as possible
35-
* `[]` matches any specific set of characters mentioned inside it.
36-
* `-` within `[]` matches a range if specified in between two ordered characters, specifies itself when at the end
37-
* `^` within `[]` inverts the match
38-
* `(?: )` will not count as a group - in case if some match has to be ignored.
39-
* when `?` follows a `*` or `+`, it becomes non-greedy.
40-
* `{n,}` at the end of a matching sequence will match `n` or more occurrences of it.
4179
* the value of a variable can be included literally in a regular expression by embedding `re.escape(var)` in the pattern
42-
* `(?=)` and `(?!)` are positive and negative look-ahead matches - from what I understood, it just peeps, but does not move past it for matching..
80+
* `(?=...)` and `(?!...)` are positive and negative look-ahead assertions - they only peek at what follows and do not consume it, so matching continues from the same position.
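
The last two points can be sketched as below. The variable `var` and the sample strings are invented, and the extra `|\d` in the negative look-ahead is an assumption added so that a number is not split when the engine backtracks.

```python
import re

var = "3.50"  # hypothetical input containing a special character
text = "3.50 or 3x50"

unescaped = re.findall(var, text)           # '.' matches any character
escaped = re.findall(re.escape(var), text)  # '.' is matched literally

sizes = "10px 20em 30px"
# Look-ahead checks what follows without consuming it.
before_px = re.findall(r"\d+(?=px)", sizes)
not_px = re.findall(r"\d+(?!px|\d)", sizes)

# The match ends before 'px', showing that 'px' was not consumed.
end = re.search(r"\d+(?=px)", sizes).end()
```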
4381

4482
Some flags that can be passed are:
4583

4684
* `re.IGNORECASE` - to ignore the case
4785
* `re.DOTALL` - let the `.` match the newline also
4886
* `re.MULTILINE` - `^` and `$` will match at each line instead of only the real beginning and end
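
A brief sketch of the three flags, including combining them with `|`. The sample text is made up.

```python
import re

text = "Alpha\nbeta\nALPHA"  # hypothetical three-line sample

any_case = re.findall(r"alpha", text, re.IGNORECASE)
dot_default = re.search(r"Alpha.beta", text)         # '.' stops at newline
dot_all = re.search(r"Alpha.beta", text, re.DOTALL)  # '.' matches newline
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^a\w+$", text, re.IGNORECASE | re.MULTILINE)
```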
4987

50-
https://www.debuggex.com/ - can be used to debug regular expressions in python
88+
[debuggex](https://www.debuggex.com/) - can be used to debug regular expressions in python

_posts/2023-09-30-python-standard-library-reference.markdown

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -155,4 +155,5 @@ Different action arguments:
155155
| `:>` | right aligned |
156156
| `:^` | center aligned |
157157
| `:<num>{b|f|o|x|X}` | print the number in binary, float, octal, lower hex, upper hex respectively |
158-
158+
| `{!r}` | use `repr` method to display |
159+
| `{!s}` | use `str` method to display |
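
As a short illustration of the two conversion flags, here is a sketch using a `datetime.date` value, chosen only because its `repr` and `str` differ visibly:

```python
from datetime import date

d = date(2023, 9, 30)  # hypothetical sample value

as_repr = "{!r}".format(d)  # calls repr(d)
as_str = "{!s}".format(d)   # calls str(d)

word = "hi"
quoted = f"{word!r}"        # the same flags work in f-strings
plain = f"{word!s}"
```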
