Skip to content

Conversation

@smallzhongfeng
Copy link
Contributor

@smallzhongfeng smallzhongfeng commented Jan 22, 2022

What changes were proposed in this pull request?

Unpacking an archive s3a://zjx/python3.6.9.zip#python3 from /tmp/spark-d861d788-bd72-4d3c-88fc-8f79b30b081d/python3.6.9.zip to /opt/spark/work-dir/./python3 Exception in thread "main" java.io.IOException: Cannot run program "python3/bin/python3": error=13, Permission denied at java.base/java.lang.ProcessBuilder.start(Unknown Source) at java.base/java.lang.ProcessBuilder.start(Unknown Source) at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)

When we set parameter "--archives s3a://zjx/python3.6.9.zip" to submit the Spark job, driver will unzip it, but it will lost executable permission after unzipping, so we should keep those permission. In this PR, I add the executable permission for all file after unzipping. It may cost some time to unzip that, but the time cost is controllable. In the code submitted this time, I keep the permission information of the file in a collection, and then grant the permission to the file one by one after decompression. However, this is only for UNIX platforms. Zip decompression for non UNIX platforms skips the logic of granting permission. In this way, zip decompression can be realized normally.

Why are the changes needed?

When we submit a py job and want to use our own version of Python,we may add "--archives s3a://zjx/python3.6.9.zip" ,after driver unzip this file, and want to execute the program using python3, it will report permission denied. So we should add executable permission to those script. This keeps the permissions of tar and ZIP files the same.

Does this PR introduce any user-facing change?

After the fix, the zip package submitted by the user for Python will run normally.

How was this patch tested?

This scheme is already being deployed in production.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a StringBuffer (StringBuilder is preferred anyway), just use an interpolated string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good suggestion. I'm just worried about concurrency.

Copy link
Contributor

@LuciferYang LuciferYang Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be no concurrency issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a different code path for unzipping only here and not elsewhere?
I'm also concerned about adding the platform dependency on unzip just for this.
Why does the exec bit need to be set - wouldn't you run a script as python script.py anyway to be more sure about the python being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the issue of my premis, https://issues.apache.org/jira/browse/SPARK-37677, The submission method I mentioned is not supported after the zip file is unzipped.

Copy link
Contributor Author

@smallzhongfeng smallzhongfeng Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Do you mean that the absolute path of the source file is different from other decompression methods? This can use relative paths. I use absolute paths for easy understanding at that time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's an existing unZip method. At least, why not change that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing zip decompression method can not retain the original permission information of the file, but can only retain the read-write permission, so the user's zip file has no execution permission after decompression. The current example is that there is no python or python3 command after decompression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, would that be helpful in all unzip scenarios? rather than have two different implementations
Adding a dependency on the OS package is probably OK but not sure
Can you add python/python3 to the execution? that seems more standard but not sure if it is possible

Copy link
Contributor Author

@smallzhongfeng smallzhongfeng Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#35278, this PR is to add execution permissions to python/python3, but the suggestion is to retain the permissions of the original file instead of adding permissions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that it won't support Windows. It's generally not a good practice to rely on external problems that has to be manually installed. If unzip is not installed, it will just fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the unzip method provided by the previous fileutil by the window operating system can retain the permissions of the original file, so I can use the judgment statement to execute the shell command when the Linux operating system is found, and the window operating system retains the original method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, this way of relying on external installation also draws lessons from the decompression method of untar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon I have revised it again. Could you review it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think untar uses Java library not the external ones. I still think we should avoid relying on external programs. Users can use tar.gz if they want to keep file permissions

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to mkdir dest directory again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't create a new folder at the beginning, but the file is missing a level-1 directory, which is inconsistent with the original directory layer after decompression, so here is to create a folder first and then extract it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does these blank line is related to this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be caused by opening the code specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • unzip
  • Just suggestion that it's better with shell errorMsg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your advice. This is consistent with the exception thrown by untar decompression. But I do need to change the command name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about the case when unzip is not installed in Linux. Is there any other ways to avoid to rely on external programs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the zip file is decompressed by creating a new file, the permission cannot be saved, but the metadata of the permission can be saved when the unzip command is executed on Linux, but I don't know what special processing is done by the unzip command in Linux. Moreover, I found many API attempts of zip, and found that they were similar to the unzip of JDK, and there was no way to save the permission information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen many JDK bugs and found that the unzip decompression permission problem seems to have lasted for a long time, such as https://bugs.openjdk.java.net/browse/JDK-6194856. Using java APIs doesn't seem to solve this problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon @srowen Hello, I have found a way to store permissions for files, but this requires injecting two dependency projects, is this ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 agree with @HyukjinKwon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blank line should not be deleted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore to import org.apache.spark.internal.{config, Logging}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@LuciferYang LuciferYang Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log should be printed after successful decompression. It doesn't seem appropriate to move it in finally

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to chmod again after unzip? I'm not sure

@smallzhongfeng
Copy link
Contributor Author

@HyukjinKwon @LuciferYang @yikf I used the zipfile under the common package, ZipArchiveEntry.getUnixMode() can expose the permissions of the file, so I use the collection to store the permissions of the file. When the file has been created, give each file the original permissions. Could you review it ?

@LuciferYang
Copy link
Contributor

LuciferYang commented Jan 27, 2022

@zhongjingxiong Please check the failure of GA(Build and test)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we refer to other references of codes? e.g., https://cs.android.com/android/platform/superproject/+/master:tools/tradefederation/core/common_util/com/android/tradefed/util/ZipUtil2.java;drc=master;l=44

It has a bunch of corner cases handling. .e.g., we will have to skip this logic if the platform is non-Unix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your suggestion is right, we really can not cover all scenarios, I refer to, for the logic of granting file permissions added judgment, for non-UNIX platforms, skip this logic.

@smallzhongfeng
Copy link
Contributor Author

smallzhongfeng commented Jan 28, 2022

@zhongjingxiong Please check the failure of GA(Build and test)

@LuciferYang Okay, compiled and tested.

@LuciferYang
Copy link
Contributor

Does this issue only occur in k8s?

@smallzhongfeng
Copy link
Contributor Author

Not only in k8s scenarios, it also works when we add zip files via sparkContext.addFile().

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR. Please update your PR title and description according to your code change, @zhongjingxiong .

@smallzhongfeng smallzhongfeng changed the title [SPARK-37677][CORE] Use the shell command to decompress the ZIP file [SPARK-37677][CORE] Decompress the ZIP file and retain the original file permissions Feb 5, 2022
@smallzhongfeng
Copy link
Contributor Author

smallzhongfeng commented Feb 8, 2022

@dongjoon-hyun @HyukjinKwon Yes, I changed the title and explained this change. Can you give me some suggestions?

@HyukjinKwon
Copy link
Member

Hey @zhongjingxiong we should make sure covering all cases (see #35278 (comment)) since this replaces an existing functionality. Every case we didn't cover here is the regression.

@smallzhongfeng
Copy link
Contributor Author

@HyukjinKwon I think this is an improvement. At least on the UNIX platform, it can realize zip decompression normally, because the zip decompression method provided by Apache common only exposes the file permissions, which is the difference from the original zip decompression method provided by Apache Hadoop. At the same time, due to the platform restrictions, for the non UNIX platform such as windows, We skipped the process of granting file permissions. Once we skipped this logic, the code I wrote is similar to the zip decompression code originally provided by Hadoop.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants