[ prog / sol / mona ]

prog


How can I run my own instance of this

100 2020-03-01 11:17

[1/5]
OK, here is the cold, hard truth. The (or) branch of sre->procedure looks like this:

((or)
 (case (length (cdr sre))
   ((0) (lambda (cnk init src str i end matches fail) (fail)))
   ((1) (rec (cadr sre)))
   (else
    (let* ((first (rec (cadr sre)))
           (rest (lp (sre-alternate (cddr sre))
                     (+ n (sre-count-submatches (cadr sre)))
                     flags
                     next)))
      (lambda (cnk init src str i end matches fail)
        (first cnk init src str i end matches
               (lambda ()
                 (rest cnk init src str i end matches fail))))))))

It tries the first alternative, and if this fails it tries the (or) of the other alternatives. There is not even an attempt at the "POSIX leftmost, longest semantics". Since the (or) might be the entire regex, this is enough to make the sre->procedure call break the same semantics. Furthermore, in the general case where we do not restrict ourselves to alternatives of simple structure to construct a minimal >>91 failure case, the trick of reordering the alternatives cannot save the current (or). Consider "(aaa)*(ba)*" and "(aaaaa)*(ab)*". If we put the 3-group first, we can test on "aaaaaabab":

$ guile -l irregex.scm 
[...]
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (imsis "((aaa)*(ba)*|(aaaaa)*(ab)*){1,1}" "aaaaaabab")
$1 = "aaaaaaba"

which leaves off the final "b". The second branch would have taken one more character:

scheme@(guile-user)> (imsis "((aaaaa)*(ab)*){1,1}" "aaaaaabab")
$2 = "aaaaaabab"

If we put the 5-group first, we can test on "aaaaaababa":

scheme@(guile-user)> (imsis "((aaaaa)*(ab)*|(aaa)*(ba)*){1,1}" "aaaaaababa")
$3 = "aaaaaabab"

which leaves off the final "a". The second branch would again have taken one more character:

scheme@(guile-user)> (imsis "((aaa)*(ba)*){1,1}" "aaaaaababa")
$4 = "aaaaaababa"

The test string can be grown to arbitrary size by prefixing groups of 15 'a's. There is no trick that can substitute in the general case for the (or) properly searching its match space and returning the match candidates in non-increasing order of length.

301


VIP:

do not edit these