if not re.match('^[a-z0-9_]+$', user):
    raise SomeException("invalid username")
As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
Yes, fullmatch() will help, and so will \Z. It's just that it is so easy to forget...
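To make the pitfall concrete, here is a minimal sketch comparing the three spellings mentioned above. The helper names (`is_valid_loose`, `is_valid_fullmatch`, `is_valid_z`) are made up for illustration and are not from the original snippet.

```python
import re

def is_valid_loose(user):
    # `$` matches at the very end of the string AND just before a
    # trailing newline, so "john\n" slips through.
    return re.match(r"^[a-z0-9_]+$", user) is not None

def is_valid_fullmatch(user):
    # fullmatch() requires the pattern to consume the entire string.
    return re.fullmatch(r"[a-z0-9_]+", user) is not None

def is_valid_z(user):
    # `\Z` matches only at the true end of the string.
    return re.match(r"^[a-z0-9_]+\Z", user) is not None

for check in (is_valid_loose, is_valid_fullmatch, is_valid_z):
    print(check.__name__, check("john"), check("john\n"))
```

Only the first variant should report "john\n" as valid; the other two reject it while still accepting "john".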
Regex is one of those things where I have to look the symbols up every time to remind myself what they mean, and by the time I need this info again I've forgotten it all.
I can't think of anywhere else in general programming where we have something so terse and symbol-heavy.
35.3.3 The ‘rx’ Structured Regexp Notation
------------------------------------------
As an alternative to the string-based syntax, Emacs provides the
structured ‘rx’ notation based on Lisp S-expressions. This notation is
usually easier to read, write and maintain than regexp strings, and can
be indented and commented freely. It requires a conversion into string
form since that is what regexp functions expect, but that conversion
typically takes place during byte-compilation rather than when the Lisp
code using the regexp is run.
Here is an ‘rx’ regexp that matches a block comment in the C
programming language:
     (rx "/*"                    ; Initial /*
         (zero-or-more
          (or (not "*")          ; Either non-*,
              (seq "*"           ;   or * followed by
                   (not "/"))))  ;   non-/
         (one-or-more "*")       ; At least one star,
         "/")                    ; and the final /
or, using shorter synonyms and written more compactly,
     (rx "/*"
         (* (| (not "*")
               (: "*" (not "/"))))
         (+ "*") "/")
In conventional string syntax, it would be written
     "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
Of course, it does have one disadvantage. As the manual says:
The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
most interactive situations where a regexp is requested, such as when
running ‘query-replace-regexp’ or in variable customization.
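For comparison, Python has no ‘rx’ equivalent in the standard library, but `re.VERBOSE` gets part of the way there by allowing whitespace and comments inside the pattern. The following is a rough translation of the C block comment regexp above, not something taken from the manual; the name `C_BLOCK_COMMENT` is just for illustration.

```python
import re

C_BLOCK_COMMENT = re.compile(
    r"""
    /\*                 # initial /*
    (?:
        [^*]            # either a non-*,
      | \*[^/]          # or a * followed by a non-/
    )*
    \*+                 # at least one star,
    /                   # and the final /
    """,
    re.VERBOSE,
)

match = C_BLOCK_COMMENT.search("int x; /* counter */ int y;")
print(match.group(0) if match else None)
```

It is still a string, so it does not get the structural checking that S-expressions do, but it can at least be indented and commented.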
Raku has also advanced the state of the art considerably.
* running a regex not in multi-line mode
* on input that was presumably split from multiple lines, or within a line of multi-line input
* wherein I care whether the line in question is the last line of input without a trailing newline
* but I didn't check, or `.strip()` or anything
I can't say I recall ever being bitten by this.
And there is also nothing here to justify \A over ^.
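On the \A versus ^ point, a small sketch of where the two actually differ in Python: only under `re.MULTILINE`. The sample text and variable names are invented for the example.

```python
import re

text = "first\nsecond"

# Without re.MULTILINE, ^ anchors only at the start of the string,
# exactly like \A, so "second" is not found.
caret_default = re.search(r"^second", text)

# With re.MULTILINE, ^ also matches right after each newline...
caret_multiline = re.search(r"^second", text, re.MULTILINE)

# ...while \A keeps meaning "start of the string" regardless of flags.
a_multiline = re.search(r"\Asecond", text, re.MULTILINE)

print(caret_default, caret_multiline, a_multiline)
```

So if the flag is never set, ^ and \A behave the same; \A only buys you protection against someone adding the flag later.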
And it is the same in Perl: from `man perlre`:
    ^   Match the beginning of the string (or line, if /m is used)
$foo =~ /regex/
$result = $foo =~ /regex/
if ($foo =~ /regex/) {whatever;}
while (/regex/) {whatever;}
The captures ($1, $2, etc.) are global and usable wherever you need them.
In this particular case the default is that $ matches the end of a string without a newline but you can include it anytime you need to:
$foo =~ /regex$/ # end of string without newline
$foo =~ /regex$/m # end of string with newline
The Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html