
Does a duplicate name imply a duplicate CapturingGroupNumber? #2

Closed
msaboff opened this issue May 24, 2022 · 3 comments

Comments

@msaboff

msaboff commented May 24, 2022

The current spec ties the group number (aka ParenIndex) to the GroupName through the abstract operation CountLeftCapturingParensBefore. What semantics do you propose for ParenIndex with respect to subsequent instances of a duplicate GroupName? I could see two possibilities:

  1. Sever the one-to-one relationship between a GroupName and a unique GroupNumber. Instead, ParenIndex continues to be counted left to right, so a duplicated name maps to several ParenIndex values. This could allow a developer to determine which of the named groups matched. It does, however, add semantic confusion.
  2. Maintain the one-to-one relationship between a GroupName and a unique GroupNumber. Since GroupNumber is called ParenIndex in the spec, that name would no longer be a good description, and there would be gaps in the sequence of valid ParenIndex values. This raises the question of whether ParenIndex values would be compressed and what to assign to NcapturingParens. For example, would
    /(?<day>[0-9]{1,2})-(?<month>1[0-2]|[1-9])-(?<year>[1-9][0-9]{0,3})|(?<month>1[0-2]|[1-9])\/(?<day>[0-9]{1,2})\/(?<year>[1-9][0-9]{0,3})/u
    have three or six NcapturingParens, and would the ParenIndex for "day" always be 1?
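Option 1 amounts to keeping the existing left-to-right count of capturing parentheses and letting one name map to several indices. What follows is a minimal sketch of that bookkeeping, not spec text or an engine API: `numberCaptureGroups` is a hypothetical helper. It assumes a syntactically valid pattern without the `v` flag (so no nested character classes), and it does not validate anything. Applied to the date pattern above, it counts six capturing groups, with "day" at indices 1 and 5. Option 2 would instead allot one number per distinct name (three).

```javascript
// Hypothetical helper: numbers capturing groups left to right (the counting
// behind CountLeftCapturingParensBefore) and records every number at which
// each group name occurs. Assumes a valid pattern without the `v` flag.
function numberCaptureGroups(source) {
  const namesToNumbers = new Map(); // name -> [ParenIndex, ...]
  let count = 0; // capturing groups seen so far
  let inClass = false; // inside [...], "(" is a literal character
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. \( or \/
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch !== "(") continue;
    if (source[i + 1] !== "?") {
      count++; // plain unnamed capturing group
      continue;
    }
    // "(?<name>" captures; "(?<=" and "(?<!" are lookbehinds and do not.
    if (source[i + 2] === "<" && source[i + 3] !== "=" && source[i + 3] !== "!") {
      count++;
      const name = source.slice(i + 3, source.indexOf(">", i + 3));
      if (!namesToNumbers.has(name)) namesToNumbers.set(name, []);
      namesToNumbers.get(name).push(count);
    }
    // Every other "(?" form ((?:, (?=, (?!) is non-capturing.
  }
  return { count, namesToNumbers };
}

const dateRe = String.raw`(?<day>[0-9]{1,2})-(?<month>1[0-2]|[1-9])-(?<year>[1-9][0-9]{0,3})|(?<month>1[0-2]|[1-9])\/(?<day>[0-9]{1,2})\/(?<year>[1-9][0-9]{0,3})`;
const { count, namesToNumbers } = numberCaptureGroups(dateRe);
console.log(count); // 6
console.log(JSON.stringify(namesToNumbers.get("day"))); // [1,5]
console.log(JSON.stringify(namesToNumbers.get("month"))); // [2,4]
console.log(namesToNumbers.size); // 3 distinct names
```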
@bakkot
Collaborator

bakkot commented May 24, 2022

The spec text in the draft PR takes the first option. I think that is less confusing than making it impossible to determine group numbers by just counting left to right.

In practice, I've generally found people tend not to mix group numbers and group names, so it's probably not terribly confusing either way.

@nicolo-ribaudo
Member

nicolo-ribaudo commented Jun 21, 2022

Another point in favor of the current behavior is that it makes refactoring a regexp to add named capturing groups safer: if two groups with the same name shared one index, adding the names would renumber the groups and break existing consumers, because it would change the result of .match.
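To illustrate the refactoring argument, here is a hedged sketch built on a hypothetical day/month pattern that does not come from the thread. Under the left-to-right numbering adopted here, naming the groups, even with a name repeated across alternatives, is expected to leave the positional .match result unchanged, while the name resolves to whichever duplicate participated. Duplicate named groups require an engine that implements this proposal, so the named pattern is built inside a `try`, and the comparison is skipped wherever the syntax is rejected.

```javascript
// Hypothetical pattern: "day-month" or "month/day". Existing consumers
// read the result by position, match[1] .. match[4].
const unnamed = /([0-9]{1,2})-([0-9]{1,2})|([0-9]{1,2})\/([0-9]{1,2})/u;

// The same pattern after the refactor, reusing each name once per alternative.
// Engines without duplicate-named-group support throw a SyntaxError here.
let named = null;
try {
  named = new RegExp(
    String.raw`(?<day>[0-9]{1,2})-(?<month>[0-9]{1,2})|(?<month>[0-9]{1,2})\/(?<day>[0-9]{1,2})`,
    "u",
  );
} catch (err) {
  if (!(err instanceof SyntaxError)) throw err;
}

// Positional view of a match: drop index 0 (the whole match);
// non-participating groups (undefined) serialize as null.
const positions = (m) => JSON.stringify(m.slice(1));

for (const input of ["24-5", "5/24"]) {
  const before = input.match(unnamed);
  console.log(input, positions(before));
  if (named) {
    const after = input.match(named);
    // Left-to-right numbering: same positions as before the refactor.
    console.log(positions(after) === positions(before)); // true
    // The name picks up whichever "day" group actually matched.
    console.log(after.groups.day); // 24
  }
}
```

With the unnamed pattern, "24-5" yields `["24","5",null,null]` and "5/24" yields `[null,null,"5","24"]`; if a shared index were assigned per name instead, those arrays would shrink and any consumer that reads match[3] or match[4] would break.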

@bakkot
Collaborator

bakkot commented Aug 1, 2022

This proposal advanced to stage 3 with the behavior specified to be the first option above.

@bakkot bakkot closed this as completed Aug 1, 2022
3 participants