[WIP] Deploy the assembly as an artifact #53351
Conversation
@dongjoon-hyun hello, can you guide me on how you do the releases, so I can attach the tgz/zip to the deployment properly, please? This would be a great enhancement to the release.
dongjoon-hyun left a comment
As you know, the Apache Spark community intentionally dropped the assembly approach at Apache Spark 2.0.0 due to many issues, including security, @rmannibucau.
Let me close this, because this is a significant regression from the Apache Spark community's perspective. We can continue our discussion on the closed PR.
Personally, I don't recommend
@dongjoon-hyun the assembly is what is proposed on the Apache Spark website download page, so Spark didn't drop anything; it only dropped the automation and the publication to Central, which is negative from a user standpoint and leads to issues in downstream usage and automation, since the download URLs are not stable (they would be if served from Central). Also note that, from a security standpoint, it is not worse than any Apache Spark distro (from the tgz in the download area to the Docker image), by design. So overall, from my window, I don't see why fixing this convenience deliverable wouldn't help the community; it doesn't hurt Spark more than today, since the bundles are archived automatically anyway and must be "immutable" (in spirit, since nothing is ever truly immutable). Can you please revise it, since it doesn't impact the Spark project beyond having to push the binary(ies) to Nexus?
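Roughly, "pushing the binary(ies) to Nexus" could look like the following sketch; the coordinates, file name, and repository details are assumptions, not the actual Spark release procedure:

```bash
# Hypothetical sketch: attach a pre-built distribution tgz to a staging repository
# with the stock maven-deploy-plugin. Coordinates, file name, and repository details
# are placeholders, not the real Spark release setup.
mvn deploy:deploy-file \
  -Dfile=spark-4.1.0-bin-hadoop3.tgz \
  -DgroupId=org.apache.spark \
  -DartifactId=spark-assembly_2.13 \
  -Dversion=4.1.0 \
  -Dpackaging=tgz \
  -Dclassifier=dist \
  -DrepositoryId=apache.releases.https \
  -Durl=https://repository.apache.org/service/local/staging/deploy/maven2
```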
Could you give me the specific link to that part of the Apache Spark website?
I thought you were trying to build a fat jar like Apache Spark 1.6.x. Did I understand your question correctly?
@dongjoon-hyun this is what I'm referring to: https://spark.apache.org/downloads.html (you know, the latest and previous releases do not use the same link, and archives.apache.org is not considered stable, so both cases are broken for consuming the zip/tgz).
No, I'm rather trying to build a custom distro to use on the local machines of ops to interact with a Spark cluster, but I need to add a bunch of jars and props.
@dongjoon-hyun any hope we work on that issue so the assembly is consumable with the maven dependency plugin "natively" (see the sketch below for what I mean)? Happy to adjust the PR once I know how you release it (if it's manual, it is fine to keep it closed and just add it to the release steps for me, if you prefer).
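To make "consumable with the maven dependency plugin natively" concrete, here is a sketch of how a published assembly tgz could be fetched; the coordinate is an assumption, since no such artifact is published today (which is the point of this PR):

```bash
# Hypothetical consumption of a published distribution tgz via the Maven dependency
# plugin, using a stable, versioned coordinate instead of a download-page URL.
# The coordinate below is an assumption, not an existing artifact.
mvn dependency:copy \
  -Dartifact=org.apache.spark:spark-assembly_2.13:4.1.0:tgz:dist \
  -DoutputDirectory=target/spark-dist
```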
It doesn't make sense to me, because an Apache Spark Standalone Cluster already has the full distribution to launch.
FYI, Apache Spark provides official
This is exactly the point of having a custom distribution: kind of including the application inside it. Why do you have the Spark shell in the Spark distribution? It is an application, so by your statement it must not be there; this is exactly the same. Ultimately, I do not want to have to rely on the internet or a random network config (think enterprise) to download the application, so I want to bundle it upfront.
This is what I'm using for most application use cases, but I need the custom distro case for human interaction (Spark SQL), once again just to keep it simple.
Look at the Apache Iceberg case: you just need the jars to use it with the Apache Spark SQL shell, so why can't we make a distribution with the application pre-bundled to make it easier? It would also enable customizing some library versions (Parquet) that conflict and force you to use userClassPathFirst, which has other side effects. So yes, this is needed to cover all usages, even if most automated usages are, as you say, done through other ways. Side note, just out of curiosity: why do you fight so much over something trivial to do? Is there a blocker in the release process for deploying the zip/tgz? We do it in plenty of Apache projects and everyone is happy.
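To illustrate the Iceberg case: today it typically means pulling the jars at launch time and, when versions conflict, reaching for the userClassPathFirst workaround mentioned above. A minimal sketch, where the runtime version is only an example:

```bash
# Illustrative only: Iceberg pulled into the Spark SQL shell at launch time via
# --packages; the userClassPathFirst flags are the workaround mentioned above for
# conflicting libraries, not a recommendation. The version is an example.
./bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.6.1 \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true
```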
No, you don't need to do that for your applications, @rmannibucau.
Please submit your application according to the Apache Spark community guidelines. For the following question,
let's talk about the applications (including yours). Since it should be built and deployed independently, you can see that it's only
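For context, the pattern referred to here is building the application separately and submitting it to an existing cluster; a minimal, hypothetical spark-submit sketch, where the master URL, class name, and jar path are placeholders:

```bash
# Hypothetical example: the application jar is built and versioned on its own, then
# submitted to an existing cluster instead of being baked into the Spark distribution.
./bin/spark-submit \
  --master spark://spark-master.example.internal:7077 \
  --deploy-mode cluster \
  --class com.example.MyIcebergJob \
  /opt/jobs/my-iceberg-job-1.0.0.jar
```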
@dongjoon-hyun ok, let's step back, because for me an application can be interactive or not, but you differentiate both very strictly in your answer, so let's focus only on the interactive case please (the SQL shell). How launching
If it helps, here is the current, not convenient at all, way to set up the connection (some props truncated; a hypothetical sketch follows below).
Side note: in reality there are way more properties and even a custom catalog.
One very unsatisfying solution is
If there is something easier I'm happy to use it instead; the idea was to quickly get feedback on the Iceberg data without having to write an app for that, using the SQL shell.
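Since the properties above were truncated, here is a purely hypothetical sketch of the kind of launch command being described (an Iceberg catalog configured flag by flag on the SQL shell); every catalog name, URI, and value is a placeholder:

```bash
# Hypothetical reconstruction of the kind of setup described above; every catalog
# name, URI, and property value is a placeholder, not the author's real configuration.
./bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.6.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.mycatalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.mycatalog.type=rest \
  --conf spark.sql.catalog.mycatalog.uri=https://catalog.example.internal \
  --conf spark.sql.defaultCatalog=mycatalog
```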
What changes were proposed in this pull request?
Re-enable assembly artifacts. Note that I'm not 100% sure of the release process, so maybe I missed something.
Not sure about the link with ./dev/make-distribution.sh.
Why are the changes needed?
Being able to download the Spark distro (ideally any flavor, but the base one is the most important) makes it possible to provide a custom distro with pre-packaged bundles (like Apache Iceberg, for example); see the build sketch below.
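For reference, a custom distribution along these lines can be built with the in-tree script; a minimal sketch, where the distribution name and the Maven profiles are illustrative choices:

```bash
# Sketch of building a custom distribution tarball from a Spark checkout; the
# distribution name and the Maven profiles are illustrative choices.
./dev/make-distribution.sh --name custom-iceberg --tgz -Phive -Phive-thriftserver -Pkubernetes
```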
The --packages option can be tricky in some environments or with some dependencies.
Does this PR introduce any user-facing change?
A zip gets published; nothing breaks.
How was this patch tested?
Not tested in the release process.
Was this patch authored or co-authored using generative AI tooling?
Nope.