-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDDS-1512. Implement DoubleBuffer in OzoneManager. #810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bharatviswa504
commented
May 9, 2019
- In this Jira added double buffer implementation. (Not integrated to OM, this will be done in further jira's)
- Added a few response classes to give an idea, how this will be integrated with OM.
|
cc @anuengineer (I am not able to add you in reviewer, tagging here to notify about this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits:
- The license should be the same with other exists file.
- Should better add blank line at the end of file.
- Need to delete the redundancy blank line.
|
Thank You @jiwq for the review. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
/retest |
linyiqun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minors review comments, please have a look, @bharatviswa504
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to separate the daemon thread from the OM double buffer, so that this class will be more flexible to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion here is to Make Deamon thread logic in a separate class, and instantiate that class in OMDoubleBuffer and start the thread in OMDoubleBuffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I mean not to start flushTransactions thread in OMDoubleBuffer. It's would be better to let the caller of function flushTransactions outside of the class OMDoubleBuffer. How the flush transactions behavior should be triggered by outside not by itself I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought process of doing this way is, background flush will be continuously happening if currentBuffer has any entries so that we don't wait for the buffer to be full. So, that when a restart happens we don't need to apply the transactions again to OM DB. And in this way we keep disks busy.
In the other way as suggested if we do, when the buffer is full, swap and start flush, and keep further transactions adding to the currentBuffer, but if flush is taking much time, and this buffer happened to be full, we should wait until the first flush to be completed, and we might block all further transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make this bounded and have a force sync function like edits double buffer did? Unbounded queue is uncontrollable, we might pass the init buffer size as a bounded value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added using the bounded buffer as TODO.
So your suggestion is in add check if it is full have a flush() and then add to buffer?
If we do this way that will penalize the transaction where when buffer is full, we need to wait for ongoing flush to be completed and then add this entry in to buffer. And also this will block all further requests received after buffer is full.
Let me know if I am missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can setReadyBuffer if we check the buffer if full, so this will not block requests long time. Then background will flush ready buffer. Only when the ready buffer is flushing meanwhile current buffer is full, the request will be blocked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can setReadyBuffer if we check the buffer if full, so this will not block requests long time. Then background will flush ready buffer. Only when the ready buffer is flushing meanwhile current buffer is full, the request will be blocked.
One disadvantage I see is if flush is taking a too long time, and currentBuffer can grow very high and may cause OOM. So, if we use a fixed size buffer we might not see but we will block further transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the tradeoffs of both approaches. Why don't benchmark this and see the maximum length this queue grows under load?. One suggestion based on @linyiqun thought is that it would be good to have a counter to see the max length of this queue. So we can at least identify and track this issue if it really pops up in the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unbounded nature will be a problem only if the disk is really slow i.e. flush call blocks. Since we are always having a flush operation outstanding.
So the tradeoff is unbounded queue vs blocking RPC handlers. I am okay with unbounded queue to start with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Since the Error handler is same you can put both addToRockDB and commitBatchOperation in a single try block and follow with the exception block, do change the error message to be more specific and leave the code as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this try because it is running inside forEach.
Modified the logic, and moved common code to a new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we make this a Debug instead of info. There will be many in OM. But on the other hand, this might be ok, not sure. @arp7 What do think ? will this flood the logs and slow us down? Based on Namenode experience what would be your call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this should be a debug log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the tradeoffs of both approaches. Why don't benchmark this and see the maximum length this queue grows under load?. One suggestion based on @linyiqun thought is that it would be good to have a counter to see the max length of this queue. So we can at least identify and track this issue if it really pops up in the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bharatviswa504 why is this change a part of the DoubleBuffer patch? This patch should just introduce the DoubleBuffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added them for testing purpose to test DoubleBuffer Implementation.
Actual usage and the classes will be changed further according to needs of HA in further Jira.
In this PR #827 handled bucket related operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This should be moved to a subsequent patch right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added them for testing purpose to test DoubleBuffer Implementation.
Actual usage and the classes will be changed further according to needs of HA in further Jira.
In this PR #827 handled bucket related operations.
arp7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing, a few early comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this should be a debug log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bharatviswa504 can you add a slightly more descriptive comment. Maybe 3-4 lines on the purpose of the double buffer and some high level implementation note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short description of this flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this flag, as sync to DB is also happening in the same thread.
Removed the usage of the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is spinning the CPU in a tight loop. It should wait until the current flush completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing to disk is happening this while loop, so during flush, this while loop will not be iterating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will spin in a tight loop otherwise E.g. when system is idle. That will consume CPU and power. It will be better to use a wait and notify model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Updated the code to use wait, notify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not assume we are using RocksDB here. We may need to update the function name to addToDbBatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the method name to addToDbBatch as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss the synchronization. It probably needs to be fixed.
f9c7a98 to
4719943
Compare
|
💔 -1 overall
This message was automatically generated. |
|
Thank You @arp7 for the review and offline discussion. |
Thank You @anuengineer for the review. |
|
💔 -1 overall
This message was automatically generated. |
|
/retest |
|
Test failures are not related to this patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If multiple test methods are calling setup(), it should probably be a @Before method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short javadoc to each method to summarize what it is testing? It is a little hard to understand what the test is doing from reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need javadoc here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move tests which depend on OM response classes to a separate test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Thank You @arp7 for the review. |
|
💔 -1 overall
This message was automatically generated. |
|
Test failures are fixed in the last commit. Now tests are passing. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
+1 LGTM. |
|
Thank You, everyone, for the review. @linyiqun provided an explanation for doing this way. If you have any more comments, we can open new Jira and address them. |
Currently there was a bug when registering a timer with a very short amount of delay, it might not be invoked since it depends on the creation of the run loop. This patch fixed the problem by double checking the ready timers when run loop is created (listener is registered.) Author: xinyuiscool <[email protected]> Reviewers: Prateek M <[email protected]> Closes apache#810 from xinyuiscool/SAMZA-1707