How can I run my own instance of this

97 2020-02-28 04:02

As an aside, before I explain why (or) and (**) are broken in sre->procedure: there is a "Fix exponential explosion in backtrack compilation" commit to irregex by Peter Bex, dated "Dec 5, 2016".
https://github.com/ashinn/irregex/commit/a16ffc86eca15fca9e40607d41de3cea9cf868f1
It only came to my attention because it contains the current implementation of the (+) branch of sre->procedure. While "define * in terms of +, instead of vice versa" is a fine idea, you still need a working (+). This (+), however, also takes a light-hearted comedy approach to the "POSIX leftmost, longest semantics" guaranteed by the documentation. As explained in >>95, the fixed range (** 1 1 ...) is only there to force irregex to use sre->procedure. In each test the input is a run of n "a" characters, so a leftmost, longest match has to cover the longest prefix that splits into pieces of length 3 and 5.

$ guile -l irregex.scm
[...]
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (define (inout re n)
   (let* ((sin  (string-join (make-list n "a") ""))
          (sout (imsis re sin)))
      (simple-format #t  " in ~A ~A\nout ~A ~A\n" (string-length sin) sin (string-length sout) sout)))
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 8)
 in 8 aaaaaaaa
out 6 aaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 9)
 in 9 aaaaaaaaa
out 9 aaaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 10)
 in 10 aaaaaaaaaa
out 9 aaaaaaaaa

The same class of test also fails when the shorter alternative, which is a prefix of the longer one, comes second:

scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 9)
 in 9 aaaaaaaaa
out 8 aaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 10)
 in 10 aaaaaaaaaa
out 10 aaaaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 11)
 in 11 aaaaaaaaaaa
out 10 aaaaaaaaaa
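
For what it's worth, all six lengths above are what you get from a loop that tries the alternatives in the order written, commits to the first one that still fits, and never reconsiders. This is only a model that reproduces the numbers, not a reading of the irregex source. greedy_len is a made-up helper that works on lengths alone.

```shell
# Hypothetical model, NOT irregex code: ordered choice with no backtracking.
# Usage: greedy_len N LEN1 LEN2 ...
#   N     length of the all-"a" input
#   LENi  lengths of the alternatives, in the order they appear in (or ...)
greedy_len () {
    local n=$1; shift
    local pos=0 moved len
    while :; do
        moved=0
        for len in "$@"; do
            # Commit to the first alternative that still fits.
            if [ $((pos + len)) -le "$n" ]; then
                pos=$((pos + len))
                moved=1
                break
            fi
        done
        # Stop once no alternative fits at the current position.
        [ "$moved" -eq 1 ] || break
    done
    echo "$pos"
}

for n in 8 9 10; do echo "(or aaa aaaaa) n=$n -> $(greedy_len "$n" 3 5)"; done
for n in 9 10 11; do echo "(or aaaaa aaa) n=$n -> $(greedy_len "$n" 5 3)"; done
```

This prints 6, 9, 9 for the first ordering and 8, 10, 10 for the second, the same lengths as the sessions above.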

Contrast this with grep, whose authors appear to actually know what they're doing. The {1,1} is only there to mirror the (** 1 1 ...) wrapper used in the irregex tests.

$ g () {
> local sin sout len
> len () { echo "$1" | gawk '{ print length ($0) }'; }
> sin=$(printf "%$2s" "" | tr ' ' 'a')
> echo " in $(len "$sin") $sin"
> sout=$(echo "$sin" | grep -E -oe "$1")
> echo "out $(len "$sout") $sout"
> }
$ g '(aaa|aaaaa)+{1,1}' 8
 in 8 aaaaaaaa
out 8 aaaaaaaa
$ g '(aaa|aaaaa)+{1,1}' 9
 in 9 aaaaaaaaa
out 9 aaaaaaaaa
$ g '(aaa|aaaaa)+{1,1}' 10
 in 10 aaaaaaaaaa
out 10 aaaaaaaaaa

And with reversed alternatives:

$ g '(aaaaa|aaa)+{1,1}' 9
 in 9 aaaaaaaaa
out 9 aaaaaaaaa
$ g '(aaaaa|aaa)+{1,1}' 10
 in 10 aaaaaaaaaa
out 10 aaaaaaaaaa
$ g '(aaaaa|aaa)+{1,1}' 11
 in 11 aaaaaaaaaaa
out 11 aaaaaaaaaaa
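
The grep lengths are the ones leftmost, longest matching calls for: the longest prefix of the input that can be cut into pieces of length 3 and 5, regardless of the order of the alternatives. A minimal sketch that computes this from lengths alone, with longest_len being a made-up helper and not part of grep or irregex:

```shell
# Hypothetical helper: longest prefix of N characters that is a
# concatenation of one or more pieces with the given lengths.
# Usage: longest_len N LEN1 LEN2 ...
longest_len () {
    local n=$1; shift
    local reach=" 0 " best=0 i=1 len
    while [ "$i" -le "$n" ]; do
        for len in "$@"; do
            # Position i is reachable if some piece ends there and
            # starts at an already reachable position.
            case $reach in
                *" $((i - len)) "*)
                    reach="$reach$i "
                    best=$i
                    break ;;
            esac
        done
        i=$((i + 1))
    done
    # 0 means no piece fits at all, i.e. no match.
    echo "$best"
}

for n in 8 9 10 11; do echo "n=$n -> $(longest_len "$n" 3 5)"; done
```

For n = 8, 9, 10, 11 this prints 8, 9, 10, 11 (3+5, 3+3+3, 5+5, 3+3+5), and swapping the order of the lengths changes nothing.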

Great commit you have there.

>>96
Indeed, I hear entomologists can be quite peculiar people.
