[SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner #43675

Closed
wants to merge 5 commits into from

LuciferYang
Contributor

@LuciferYang LuciferYang commented Nov 6, 2023

What changes were proposed in this pull request?

This PR refactors StorageUtils#bufferCleaner as follows:

  • Change the type of bufferCleaner from DirectBuffer => Unit to ByteBuffer => Unit
  • Call unsafe.invokeCleaner directly instead of through reflection
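
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the object name BufferCleanerSketch and the dispose helper are hypothetical, and it assumes a JDK (9 or later) on which sun.misc.Unsafe#invokeCleaner is available.

```scala
import java.nio.ByteBuffer
import sun.misc.Unsafe

// Hypothetical wrapper object, used only for illustration.
object BufferCleanerSketch {
  // sun.misc.Unsafe has no public constructor; the usual workaround is to
  // read its private static singleton field "theUnsafe".
  private val unsafe: Unsafe = {
    val field = classOf[Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[Unsafe]
  }

  // Typed against ByteBuffer, so sun.nio.ch.DirectBuffer is never referenced,
  // and invokeCleaner is called directly rather than through Method.invoke.
  val bufferCleaner: ByteBuffer => Unit =
    (buffer: ByteBuffer) => unsafe.invokeCleaner(buffer)

  // invokeCleaner rejects non-direct buffers, so guard before calling it.
  def dispose(buffer: ByteBuffer): Unit =
    if (buffer != null && buffer.isDirect) bufferCleaner(buffer)
}
```

A caller would pass any buffer to BufferCleanerSketch.dispose; heap buffers are ignored, and a direct buffer must not be used again after it has been cleaned.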

Why are the changes needed?

  1. Since Scala 2.13.9, -release is recommended over -target for compilation. However, because the sun.nio.ch package is not exported, its classes become invisible during Java cross-compilation, for example when building or testing with Java 21 and -release:17. After this PR, the following compilation errors no longer occur when building the core module with Java 21 and -release:17:
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala:71: object security is not a member of package sun
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: object nio is not a member of package sun
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:200: not found: type DirectBuffer
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:206: not found: type DirectBuffer
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:220: not found: type DirectBuffer
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: Unused import
  2. Calling unsafe.invokeCleaner directly performs better than the reflective call: it is at least 30% faster

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manually checked building the core module using Java 21 with -release:17; the compilation failures listed above no longer appear

Note: Other classes are still invisible under this configuration, which needs to be fixed in a follow-up

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft November 6, 2023 13:38
@github-actions github-actions bot added the CORE label Nov 6, 2023
@LuciferYang LuciferYang changed the title Refactor StorageUtils#bufferCleaner to avoid directly using classes under the sun package [CORE] Refactor StorageUtils#bufferCleaner to avoid directly using classes under the sun package Nov 6, 2023
@LuciferYang
Contributor Author

Test first

buffer: DirectBuffer => cleanerMethod.invoke(unsafe, buffer)
private val bufferCleaner: ByteBuffer => Unit = {
val cleanerClass = Utils.classForName("jdk.internal.ref.Cleaner")
val directBufferClass = Utils.classForName("sun.nio.ch.DirectBuffer")
Contributor Author

I think this class could also be java.nio.DirectByteBuffer, because MappedByteBuffer has only two subclasses, java.nio.DirectByteBuffer and java.nio.DirectByteBufferR, and java.nio.DirectByteBufferR itself inherits from java.nio.DirectByteBuffer. Which one is better?
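
The class hierarchy described in this comment can be inspected with a small sketch like the one below. The object and method names are hypothetical, and the class names it prints are JDK implementation details that could differ on other JVMs.

```scala
import java.nio.{ByteBuffer, MappedByteBuffer}

// Hypothetical helper for looking at the runtime classes of direct buffers.
object DirectBufferHierarchySketch {
  // Returns "<runtime class> extends <its superclass>" for the given buffer.
  def describe(buffer: ByteBuffer): String = {
    val cls = buffer.getClass
    s"${cls.getName} extends ${cls.getSuperclass.getName}"
  }

  def main(args: Array[String]): Unit = {
    val direct = ByteBuffer.allocateDirect(8)
    // A read-only view of a direct buffer is backed by the read-only subclass.
    val readOnly = direct.asReadOnlyBuffer()
    println(describe(direct))
    println(describe(readOnly))
    println(s"direct is a MappedByteBuffer: ${direct.isInstanceOf[MappedByteBuffer]}")
  }
}
```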

@LuciferYang LuciferYang changed the title [CORE] Refactor StorageUtils#bufferCleaner to avoid directly using classes under the sun package [SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner Nov 8, 2023
@LuciferYang LuciferYang changed the title [SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner [SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner to avoid directly using classes under the sun package Nov 8, 2023
@LuciferYang LuciferYang marked this pull request as ready for review November 8, 2023 03:33
Member

@dongjoon-hyun dongjoon-hyun left a comment

Do you think there is a chance of performance difference, @LuciferYang ?

@LuciferYang
Contributor Author

Do you think there is a chance of performance difference, @LuciferYang ?

cleanMethod.invoke(cleanerMethod.invoke(buffer)) vs cleanerMethod.invoke(unsafe, buffer)?

Although the former is officially recommended over reflection, I previously ran some microbenchmarks on Java 8 and found no significant performance difference between the two. If necessary, I can write more cases to compare them on Java 17.

@LuciferYang LuciferYang marked this pull request as draft November 9, 2023 05:47
@LuciferYang
Contributor Author

Converting to draft first; let me check the performance.

MethodHandles.privateLookupIn(cleanerClass, MethodHandles.lookup())
val cleanMethod: MethodHandle =
cleanerLookup.findVirtual(cleanerClass, "clean", MethodType.methodType(classOf[Unit]))
buffer: ByteBuffer => cleanMethod.invoke(cleanerMethod.invoke(buffer))
Contributor Author

@LuciferYang LuciferYang Nov 9, 2023

The new method is indeed slower than the old method because there are two calls to the method handle. Let me think about what to do. @dongjoon-hyun

Contributor Author

Let me test on several more CPU models and report back later.

Contributor Author

@dongjoon-hyun

I wrote a microbenchmark to test the initialization and invocation of bufferCleaner, observing their performance under different CPU models, including AMD EPYC 7763, E5-2673, 8171M, and E5-2673. The test data shows the following:

  1. Initializing bufferCleaner with the MethodHandle approach is 60%~70% slower than the base implementation

  2. Invoking through the MethodHandle performs basically the same as the base implementation, with a per-call latency difference of ~1ns

  3. The current PR implementation is more than 10 times faster than the base implementation at initializing bufferCleaner

  4. Invoking the current PR implementation is at least 30% faster than invoking the base implementation
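
A rough way to compare the invocation cost of the reflective call and the direct call is sketched below. This is not the benchmark used for the numbers above: the object name and the averageNanos helper are hypothetical, and a plain System.nanoTime loop is far less reliable than a proper harness such as JMH, so its output should be treated as indicative only.

```scala
import java.lang.reflect.Method
import java.nio.ByteBuffer
import sun.misc.Unsafe

// Hypothetical timing harness, for illustration only.
object CleanerTimingSketch {
  private val unsafe: Unsafe = {
    val field = classOf[Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[Unsafe]
  }

  // Reflective variant: the Method is looked up once, and every clean
  // goes through Method.invoke.
  private val cleanerMethod: Method =
    classOf[Unsafe].getMethod("invokeCleaner", classOf[ByteBuffer])

  val reflective: ByteBuffer => Unit =
    (buffer: ByteBuffer) => { cleanerMethod.invoke(unsafe, buffer); () }

  // Direct variant: a plain method call on the Unsafe instance.
  val direct: ByteBuffer => Unit =
    (buffer: ByteBuffer) => unsafe.invokeCleaner(buffer)

  // Average nanoseconds per clean call; buffer allocation is not timed.
  def averageNanos(cleaner: ByteBuffer => Unit, iterations: Int): Double = {
    var total = 0L
    var i = 0
    while (i < iterations) {
      val buffer = ByteBuffer.allocateDirect(64)
      val start = System.nanoTime()
      cleaner(buffer)
      total += System.nanoTime() - start
      i += 1
    }
    total.toDouble / iterations
  }

  def main(args: Array[String]): Unit = {
    // Warm up both paths so the JIT has a chance to compile them.
    averageNanos(reflective, 20000)
    averageNanos(direct, 20000)
    println(f"reflective: ${averageNanos(reflective, 100000)}%.1f ns/call")
    println(f"direct:     ${averageNanos(direct, 100000)}%.1f ns/call")
  }
}
```

The absolute numbers depend heavily on the JVM version, the warm-up, and the CPU model, which is why the comparison above was repeated on several machines.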

Contributor Author

In addition, sun.misc.Unsafe is exported, so it can still be used directly now

@LuciferYang LuciferYang changed the title [SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner to avoid directly using classes under the sun package [SPARK-45830][CORE] Refactor StorageUtils#bufferCleaner Nov 10, 2023
@LuciferYang LuciferYang marked this pull request as ready for review November 10, 2023 03:30
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @LuciferYang .
Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun
Member

BTW, cc @rednaxelafx , too.

@LuciferYang
Contributor Author

Thanks @dongjoon-hyun ~
