Skip to content

op.c: Make ifdef structure of argument to S_op_clear_gv() less objectionable to syntax parsers #23255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: blead
Choose a base branch
from

Conversation

leonerd
Copy link
Contributor

@leonerd leonerd commented May 4, 2025

The prior structure of this syntax made it look to tree-sitter-based parsers as if two separate functions were being defined, though only the second of which was ever finished. This resulted in the entire rest of the file appearing to be nested inside the first function, meaning no top-level function analysis was performed on them. This has various knock-on effects on code analysis in tooling such as editors or browsers.

For example, see the two separate fold regions in this editor screenshot:
Screenshot_2025-05-04_09-48-40

When this function is folded, it appears to be a 15800-line long function that consumes the entire rest of the file:
Screenshot_2025-05-04_09-49-07

The fix for it moves the #ifdef syntax inside the argument parentheses to ensure that only one function appears to be created, so the entire rest of the file is parsed as normal.

  • This set of changes does not require a perldelta entry.

…ionable to syntax parsers

The prior structure of this syntax made it look to tree-sitter-based
parsers as if two separate functions were being defined, though only the
second of which was ever finished. This resulted in the entire rest of
the file appearing to be nested inside the first function, meaning no
top-level function analysis was performed on them. This has various
knock-on effects on code analysis in tooling such as editors or
browsers.

The fix for it moves the `#ifdef` syntax inside the argument parentheses
to ensure that only one function appears to be created, so the entire
rest of the file is parsed as normal.
@bulk88
Copy link
Contributor

bulk88 commented May 4, 2025

CPP #ifdefs, inside func call decls or inside a CPP macro invoke call site. This blows up most if not all MSVCs versions ever published, and is super not portable, considering P5P never prohibited ( aka support terminated) commercial Unix OS closed source C compilers, which can get as quirky/strange, just like MSVC gets quirky very easily. My IDEs/syntax high lighters don't freak at those double prototypes.

The first line is just a weird pre-declaration of the function with a missing ;. The IDE should resync at the next line. If its a "code intelligence tool" and not a light weight syntax highlighter, it needs FULL ACCESS to all the cmd line -D and perl core .h files anyways, and needs to send the whole file through the CPP from .c to .i stage before doing its analysis report.

Alternative, 1, just make a one off union type, or 2, declare a void * and cast correctly, in macros, at all caller sites and the callee body site. Alt 1 is noisy and verbose in callers to do all the union struct formalities, unless macros are used. Or heck, just make a CPP macro token, that resolves to string PADOFFSET or string SV * and throw that in the src code's prototype.

If no-threads AV*/GV* logic, vs ithread-only pad logic, is too verbose and too special cased in too many places, some new CPP macros in perl.h/etc need to be written to make everything more readable and more compact on the screen.

@xenu
Copy link
Member

xenu commented May 4, 2025

CPP #ifdefs, inside func call decls or inside a CPP macro invoke call site.

I believe only the last one is indeed forbidden by the C standard:

The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms
the list of arguments for the function-like macro. The individual arguments within the list are
separated by comma preprocessing tokens, but comma preprocessing tokens between matching
inner parentheses do not separate arguments. If there are sequences of preprocessing tokens within
the list of arguments that would otherwise act as preprocessing directives,220) the behavior is
undefined.

There's no macro call here, and indeed, the snippet from this PR compiles even on the oldest version of MSVC that I have installed (2005).

Of course, our codebase is full of macros, many of them look like normal functions (embed.h aliases), so it's very easy to run into this issue.

Copy link
Contributor

@tonycoz tonycoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a reasonable change.

Another approach might be something like:

#ifdef USE_ITHREADS
typedef PADOFFSET SVOP_svref_t;
#define SVOP_svref_t(o) (cPADOPx(o)->op_padix)
#define MDEREF_svref(pitem) ((pitem)->pad_offset)
#else
typedef SV *SVOP_svref_t;
#define SVOP_svref_t(o) (cSVOPx(o)->op_sv)
#define MDEREF_svref(pitem) ((pitem)->sv)
#endif

though that makes more macro hell (and may make the mderef calls harder to check the correctness of, since pitem is normally item++ there)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants