Skip to content

fix include #2604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

877509395
Copy link

@877509395 877509395 commented Aug 14, 2021

Fix lex "include [ \t]filename" handle. To support more spaces/tabs after include key word. Before this change, code supports only one space after include. but lex support more.

@martinhsv
Copy link
Contributor

I like the simplification of leaving out regex.h.

But I don't think you need to make a copy. For example, for the first of the two sections, wouldn't this suffice?:

char *tmpStr = yytext + strlen("include");
const char *file = strtok ( tmpStr, " \t");

@877509395
Copy link
Author

877509395 commented Aug 26, 2021

strtok(str) will set the next byte of last byte of token to '\0' if token found, so it may change the str. If str contains multiple [ \t], and we need enumerate all possible tokens(call strtok multiple times), multiple bytes would be set to 0. below is a example.


#include <string.h>
int main () {
  char str[80] = "This is - www.runoob.com - website";
  const char s[2] = "-";
  int i=0;
  char *token;
  int length;
  length = strlen(str);
  token = strtok(str, s);
  while( token != NULL ) {
    i++;
    printf( "token%d =%s\n", i,token );
    printf( "token%d address=%p\n", i, token );
    token = strtok(NULL, s);
    printf("\n");
  }
  printf( "str =%s\n",str);
  printf( "str =%p\n", str );
  return(0);
}

the output is:
token1 =This is
token1 address=0x7ffe0e856810

token2 = www.runoob.com
token2 address=0x7ffe0e856819

token3 = website
token3 address=0x7ffe0e85682a

str =This is
str =0x7ffe0e856810
~
From the output, str is changed to "This is", the next byte of "This is" changed to 0. every token address is between str and str+length;

For our case, strtok only is called only once, and based on the below grammar, it is NOT possible there is a [ \t]+ in "CONFIG_VALUE_PATH", so strtok will not change CONFIG_VALUE_PATH LUCKILY;
"CONFIG_VALUE_PATH [0-9A-Za-z_/.-*:]+"
{CONFIG_INCLUDE}[ \t]+{CONFIG_VALUE_PATH}
{CONFIG_INCLUDE}[ \t]+["]{CONFIG_VALUE_PATH}["]

so below change can suffice.
char *tmpStr = yytext + strlen("include");
const char *file = strtok ( tmpStr, " \t");

But I think it is a good practice to first strdup and then free for strtok:

  1. strtok may do harm to input.
  2. In case for some reason in the future, CONFIG_VALUE_PATH can contain space/tab.
  3. hidden logic(can not contain space/tab in CONFIG_VALUE_PATH) of the code need to dig out

@martinhsv
Copy link
Contributor

Ok, good point. I agree that it's unwise to use strtok here without a copy.

Is it worth considering using strspn instead? It's a little more narrowly focussed on what is needed here.

char *tmpStr = yytext + strlen("include");
const char file = tmpStr + strspn ( tmpStr, " \t");

@877509395
Copy link
Author

877509395 commented Aug 27, 2021

Perfect, strspn exactly matches this context. I think I should go through libc.pdf to know every function.


remove tmpStr, merge two lines into one line:

const char *file = yytext + strlen("include")+ strspn ( yytext + strlen("include"), " \t");

Is it OK to remove tmpStr?

@877509395
Copy link
Author

877509395 commented Aug 27, 2021

#include <stdio.h>

#include <string.h>
int main(int argc, char**argv)
{
  const char *yytext = "include lj lj ljsdfj";
  const char *file = yytext + strlen("include")+ strspn ( yytext + strlen("include"), " \t");
  printf("%s", file);
}
the output is:
[root@localhost strcspn]# ./a.out
lj lj ljsdfj

@877509395
Copy link
Author

BTW,
Same issue also seems exists in many other line. for example,
{CONFIG_DIR_AUDIT_LOG2}[ \t]+{CONFIG_VALUE_PATH} { return p::make_CONFIG_DIR_AUDIT_LOG2(strchr(yytext, ' ') + 1, *driver.loc.back());
I have not check it carefully now.

@877509395 877509395 closed this Aug 27, 2021
@877509395 877509395 deleted the lexhandle branch August 27, 2021 02:55
@877509395
Copy link
Author

877509395 commented Aug 27, 2021

misoperation close this PR and delete lexhandle

@877509395 877509395 restored the lexhandle branch August 27, 2021 05:52
@martinhsv
Copy link
Contributor

Regarding collapsing the code change to a single line, I might lean towards leaving it as two lines, because:

  • allocating an extra pointer on the stack doesn't really create extra cost or risk and
  • it may make the code a little bit easier to read and
  • it avoids duplicating code strlen("include") ... including having a hard-coded string specified twice; it's generally a good coding practice to avoid such duplication

My preference on that isn't particularly strong, however,

@877509395
Copy link
Author

reasonable

@877509395
Copy link
Author

will submit PR tomorrow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants