flufluflufluffy 20 hours ago
The vast majority of the time I use ^/$, I actually want the behavior of matching start/end of lines. If I had some multi-line text, and only wanted to update or do something with the actual beginning or end of the entire text, I’d typically just do it manually.
theamk 19 hours ago
A lot of the time I want to check for a valid identifier:

    if not re.match('^[a-z0-9_]+$', user):
        raise SomeException("invalid username")
As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
extraduder_ire 15 hours ago
Shouldn't you use the match returned from the string? Or use .fullmatch() (added in 3.4) to match the whole string.
theamk 10 hours ago
In general no, you should not use the match from the string. If you are getting input from a user, you want more complex processing (like stripping all whitespace), and if you are getting input from API calls, you want to either use the specified name as-is, or fail.

Yes, fullmatch() will help, and so will \Z. It's just that it is so easy to forget...
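
A minimal sketch of the difference, using only the standard `re` module. The helper names (`is_valid_loose`, `is_valid_strict`, `is_valid_anchored`) are made up for illustration and are not from the code above:

```python
import re

def is_valid_loose(user: str) -> bool:
    # '$' also matches just before a single trailing newline,
    # so "john\n" slips through.
    return re.match(r'^[a-z0-9_]+$', user) is not None

def is_valid_strict(user: str) -> bool:
    # fullmatch() requires the pattern to cover the entire string.
    return re.fullmatch(r'[a-z0-9_]+', user) is not None

def is_valid_anchored(user: str) -> bool:
    # '\Z' matches only at the very end of the string.
    return re.match(r'[a-z0-9_]+\Z', user) is not None
```

All three accept "john"; only the loose version accepts "john\n".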

Joker_vD 2 days ago
Regular expressions as we basically know them today were made for ed. In that context, '$' absolutely had to match the terminating newline or it would've been completely useless.
seanwilson 19 hours ago
I wish one of those regex libraries that replaces the regex symbols with human-readable words would become standard. Or do they not work well?

Regex is one of those things where I have to look up to remind myself what the symbols are, and by the time I need this info again I've forgotten it all.

I can't think of anywhere else in general programming where we have something so terse and symbol heavy.

db48x 19 hours ago
It’s been done. Emacs, for example, has rx notation. From the manual:

    35.3.3 The ‘rx’ Structured Regexp Notation
    ------------------------------------------
    
    As an alternative to the string-based syntax, Emacs provides the
    structured ‘rx’ notation based on Lisp S-expressions.  This notation is
    usually easier to read, write and maintain than regexp strings, and can
    be indented and commented freely.  It requires a conversion into string
    form since that is what regexp functions expect, but that conversion
    typically takes place during byte-compilation rather than when the Lisp
    code using the regexp is run.
    
       Here is an ‘rx’ regexp that matches a block comment in the C
    programming language:
    
         (rx "/*"                    ; Initial /*
             (zero-or-more
              (or (not "*")          ;  Either non-*,
                  (seq "*"           ;  or * followed by
                       (not "/"))))  ;     non-/
             (one-or-more "*")       ; At least one star,
             "/")                    ; and the final /
    
    or, using shorter synonyms and written more compactly,
    
         (rx "/*"
             (* (| (not "*")
                   (: "*" (not "/"))))
             (+ "*") "/")
    
    In conventional string syntax, it would be written
    
         "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
Of course, it does have one disadvantage. As the manual says:

       The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
    most interactive situations where a regexp is requested, such as when
    running ‘query-replace-regexp’ or in variable customization.
Raku also has advanced the state of the art considerably.
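
For comparison, Python has no rx equivalent in the standard library, but `re.VERBOSE` allows a conventional regex to be indented and commented in a similar spirit. A rough sketch of the same C block comment pattern follows; the name `C_BLOCK_COMMENT` is just an illustrative choice:

```python
import re

# Same pattern as the Emacs example, in Python syntax.
# With re.VERBOSE, unescaped whitespace is ignored and '#' starts a comment.
C_BLOCK_COMMENT = re.compile(r"""
    /\*             # initial /*
    (?:
        [^*]        # either a non-*,
      | \*[^/]      # or a * followed by a non-/
    )*
    \*+             # at least one star,
    /               # and the final /
""", re.VERBOSE)

found = C_BLOCK_COMMENT.search("int x; /* a * b */ int y;")
```

Here `found.group()` is the comment text `/* a * b */`.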
zahlman 11 hours ago
For this to matter, it seems that I would have to be in the situation of:

* running a regex not in multi-line mode

* on input that was presumably split from multiple lines, or within a line of multi-line input

* wherein I care whether the line in question is the last line of input without a trailing newline

* but I didn't check, or `.strip()` or anything

I can't say I recall ever being bitten by this.

And there is also nothing here to justify \A over ^.

eviks 2 days ago
so why \A instead of ^?
tkocmathla 24 hours ago
\A always matches the start of the string, but in multiline mode, ^ will match both the start of the string and the start of each line:

https://docs.python.org/3/library/re.html#re.MULTILINE
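
A small sketch of that difference with the standard `re` module; the sample text and variable names are arbitrary:

```python
import re

text = "first\nsecond"

# Without re.MULTILINE, '^' matches only at the start of the string.
plain_caret = re.findall(r'^\w+', text)

# With re.MULTILINE, '^' also matches right after each newline...
multi_caret = re.findall(r'^\w+', text, re.MULTILINE)

# ...while '\A' still matches only at the very start of the string.
multi_start = re.findall(r'\A\w+', text, re.MULTILINE)
```

So `\A` keeps its meaning even if someone later adds the MULTILINE flag to the call.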

svilen_dobrev 21 hours ago
it's in the spec. Since forever, like v 1.3? don't remember.

And it is the same in Perl: from `man perlre`:

   ^   Match the beginning of the string  (or line, if /m is used)
autoexec 2 days ago
I've said it before and I'll say it again, I'd like Python a lot more if it abandoned re and handled regex like perl did.
edflsafoiewq 17 hours ago
I've never used perl. What's the difference?
autoexec 11 hours ago
It doesn't need an import at all. It's just a normal part of the language's syntax and can be used just about anywhere:

    $foo =~ /regex/
    $result = $foo =~ /regex/
    if ($foo =~ /regex/) {whatever;}
    while (/regex/) {whatever;}
The captures ($1, $2, etc.) are global and usable wherever you need them.

In this particular case, by default $ matches at the end of the string or just before a final newline, and with /m it matches before any newline. Use \z if you need the absolute end of the string:

   $foo =~ /regex$/  # end of string, or just before a final newline
   $foo =~ /regex$/m # end of any line
instig007 18 hours ago
ABC: Always. Build on. Parser Combinators.

Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html

az09mugen 24 hours ago
They could simply advise using word boundaries '\b' instead.
notpushkin 16 hours ago
Which would also match whitespace in addition to the \n they’re trying to avoid matching?
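
A quick sketch of what happens in Python, assuming the same username character class used earlier in the thread. `\b` is zero-width, so it does not consume whitespace itself, but it does not reject trailing whitespace or a newline either:

```python
import re

pattern = re.compile(r'\b[a-z0-9_]+\b')

# re.match() anchors only at the start, and '\b' is satisfied by any
# word/non-word transition, so trailing junk is not rejected.
accepts_newline = pattern.match("john\n") is not None
accepts_space = pattern.match("john smith") is not None

# fullmatch() is what rejects the trailing newline.
rejects_newline = pattern.fullmatch("john\n") is None
```

In both match() cases the matched text is just "john", which is why the check passes even though the input contains more.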