Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Apr 17, 2020

What changes were proposed in this pull request?

This PR aims to upgrade Kafka library to 2.5.0 for Apache Spark 3.1.0.

Why are the changes needed?

Apache Kafka 2.5.0 client has improvements and bug fixes like KAFKA-9241

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the Jenkins with the existing tests.

@SparkQA
Copy link

SparkQA commented Apr 17, 2020

Test build #121386 has finished for PR 28235 at commit 85c8370.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Apr 17, 2020

Just curious, did you hit the issue KAFKA-9241 in production or likewise?

Because if KAFKA-9241 is worth to consider as outstanding issue, then we probably may want to get this in Spark 3.0.0 as well. If the affected version on KAFKA-9241 is correct - KIP-368 seemed to be introduced in Kafka 2.2 so likely - then it would affect starting from Spark 3.0.0, not Spark 2.x.

(Yeah I also waited for Kafka 2.5.0 to remove the usage of poll(0) - SPARK-28367 so the change itself is welcome.)

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Apr 17, 2020

cc. @gaborgsomogyi as the author of #25135 (SPARK-28367)

@SparkQA

This comment has been minimized.

@HyukjinKwon
Copy link
Member

retest this please

Copy link
Contributor

@gaborgsomogyi gaborgsomogyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (pending tests)

@SparkQA

This comment has been minimized.

@gaborgsomogyi
Copy link
Contributor

gaborgsomogyi commented Apr 17, 2020

@dongjoon-hyun BTW, don't we need to modify this?

<!-- Kafka embedded server uses Zookeeper 3.4.7 API -->

2.5.0 uses 3.5.7 zookeeper...

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Apr 17, 2020

Thank you, @HeartSaVioR , @HyukjinKwon , @gaborgsomogyi .
To @HeartSaVioR , I didn't hit KAFKA-9241 yet personally. I'm only aware of the situation.
To @gaborgsomogyi , I updated the test dependency, too.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 Apr 17, 2020
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 Apr 17, 2020
@SparkQA

This comment has been minimized.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Apr 17, 2020

The failure in the above SBT Jenkins run seems to be a known flaky test, CliSuite.path command. And, all Kafka tests passed in the same run.

@SparkQA

This comment has been minimized.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Apr 18, 2020

It seems that our Maven Jenkins build grows again. The master branch maven Java/Scala only build takes 6 hr 16 min. So, PR builder takes more time.

There is another timeout failure in another PR, too.

I will re-trigger this PR after midnight (PST).

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 Apr 18, 2020
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA

This comment has been minimized.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 Apr 18, 2020
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 Apr 18, 2020
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Apr 18, 2020

Test build #121457 has finished for PR 28235 at commit 7f47570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2020

Test build #121455 has finished for PR 28235 at commit 6d5785b.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2020

Test build #121456 has finished for PR 28235 at commit 7f47570.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 Apr 18, 2020
@dongjoon-hyun
Copy link
Member Author

Retest this please.

1 similar comment
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Apr 19, 2020

Test build #121462 has finished for PR 28235 at commit 7f47570.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2020

Test build #121463 has finished for PR 28235 at commit 7f47570.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The above job passed all Scala/Java/Python tests. It's timeout at R testing which is irrelevant.

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Apr 19, 2020

Test build #121472 has finished for PR 28235 at commit 7f47570.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31464][BUILD][SS][test-maven] Upgrade Kafka to 2.5.0 [SPARK-31464][BUILD][SS] Upgrade Kafka to 2.5.0 Apr 19, 2020
@dongjoon-hyun
Copy link
Member Author

As I wrote in the PR description, the following is the current status.

@dongjoon-hyun
Copy link
Member Author

Hi, @viirya , @HyukjinKwon , @maropu .
Could you review this PR, please?

  • For me, the test coverage looks good to me because Maven build is only different at Java/Scala part. I'd like to merge this to master.
  • Like PR builders, the Jenkins Maven jobs also looks a little on the edge (~ 6 hours 16 mins). I guess it will hit timeout in the future. However, as of now, this PR doesn't add another tests. So, merging this PR will be okay for now.

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @viirya .
Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-KAFKA-2.5 branch April 19, 2020 17:51
@dongjoon-hyun
Copy link
Member Author

Thank you all again!

@maropu
Copy link
Member

maropu commented Apr 20, 2020

Thanks for pinging me, @dongjoon-hyun, late LGTM.

holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
### What changes were proposed in this pull request?

This PR aims to upgrade Kafka library to 2.5.0 for Apache Spark 3.1.0.

### Why are the changes needed?

Apache Kafka 2.5.0 client has improvements and bug fixes like [KAFKA-9241](https://issues.apache.org/jira/browse/KAFKA-9241)
- https://downloads.apache.org/kafka/2.5.0/RELEASE_NOTES.html

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

- [x] SBT apache#28235 (comment)
- [x] Maven apache#28235 (comment) (All Scala/Java/Python/R UT tests passed. It's timeout during R installation testing which is already covered by SBT.)

Closes apache#28235 from dongjoon-hyun/SPARK-KAFKA-2.5.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
### What changes were proposed in this pull request?

This PR aims to upgrade Kafka library to 2.5.0 for Apache Spark 3.1.0.

### Why are the changes needed?

Apache Kafka 2.5.0 client has improvements and bug fixes like [KAFKA-9241](https://issues.apache.org/jira/browse/KAFKA-9241)
- https://downloads.apache.org/kafka/2.5.0/RELEASE_NOTES.html

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

- [x] SBT apache#28235 (comment)
- [x] Maven apache#28235 (comment) (All Scala/Java/Python/R UT tests passed. It's timeout during R installation testing which is already covered by SBT.)

Closes apache#28235 from dongjoon-hyun/SPARK-KAFKA-2.5.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Dec 7, 2023
…m the `streaming-kafka-0-10` and `sql-kafka-0-10` modules

### What changes were proposed in this pull request?
SPARK-31464 | #28235 specifies that the Zookeeper version used for testing in the `streaming-kafka-0-10` and `sql-kafka-0-10` modules is 3.5.7, while at that time, the default Zookeeper version used by Spark was 3.4.14.

In SPARK-45956, the default Zookeeper version used by Spark was upgraded to 3.9.1, which is higher than both 3.5.7 and 3.6.4, the latter being used by the Kafka 3.4.1 embedded server. This PR try to make the `streaming-kafka-0-10` and `sql-kafka-0-10` modules also use the project's default Zookeeper version(3.9.1) for testing, rather than relying on a special version.

### Why are the changes needed?
Make the `streaming-kafka-0-10` and `sql-kafka-0-10` modules use the default Zookeeper version of Spark for testing.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44230 from LuciferYang/kafka-zk.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
…m the `streaming-kafka-0-10` and `sql-kafka-0-10` modules

### What changes were proposed in this pull request?
SPARK-31464 | apache#28235 specifies that the Zookeeper version used for testing in the `streaming-kafka-0-10` and `sql-kafka-0-10` modules is 3.5.7, while at that time, the default Zookeeper version used by Spark was 3.4.14.

In SPARK-45956, the default Zookeeper version used by Spark was upgraded to 3.9.1, which is higher than both 3.5.7 and 3.6.4, the latter being used by the Kafka 3.4.1 embedded server. This PR try to make the `streaming-kafka-0-10` and `sql-kafka-0-10` modules also use the project's default Zookeeper version(3.9.1) for testing, rather than relying on a special version.

### Why are the changes needed?
Make the `streaming-kafka-0-10` and `sql-kafka-0-10` modules use the default Zookeeper version of Spark for testing.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#44230 from LuciferYang/kafka-zk.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants