HADOOP-16294: Enable access to input options by DistCp subclasses #796
💔 -1 overall
This message was automatically generated.

}

/**
 * Returns the input options
needs a "." at the end or javadoc is unhappy

-1, compile failing

💔 -1 overall
This message was automatically generated.

Fixed (noslowerdna@0301c83), I didn't realize that the trunk code had changed from the version I'd been working with. We would want subclasses to have access to the

💔 -1 overall
This message was automatically generated.

yetus isn't reviewing this again, is it?

...even if yetus is silent, patch LGTM. @noslowerdna once you are happy that these changes are working for what you are doing with distcp, I'm happy to merge it in

Thank you @steveloughran - I'm happy with this patch.

+1 committed to branch-3.2+, can go earlier if you want
HADOOP-16294
Adds a protected getter for the DistCpOptions, so that a subclass does not need to keep its own copy of the inputOptions supplied to its constructor. This matters when the subclass overrides the createInputFileListing method with logic similar to the original implementation, i.e. calling CopyListing#buildListing with a path and the input options.
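
The pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop classes: `Options`, `BaseCopier`, `SortedCopier`, the static `buildListing` helper and the getter name `getInputOptions` are all hypothetical stand-ins for DistCpOptions, DistCp, a DistCp subclass and CopyListing#buildListing, and the actual signatures in trunk may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DistCpGetterSketch {

    /** Hypothetical stand-in for DistCpOptions; holds only the source paths. */
    static final class Options {
        final List<String> sourcePaths;

        Options(List<String> sourcePaths) {
            this.sourcePaths = sourcePaths;
        }
    }

    /** Hypothetical stand-in for CopyListing#buildListing(path, options). */
    static List<String> buildListing(String listingPath, Options options) {
        List<String> entries = new ArrayList<>();
        for (String source : options.sourcePaths) {
            entries.add(listingPath + " <- " + source);
        }
        return entries;
    }

    /** Hypothetical stand-in for DistCp. */
    static class BaseCopier {
        private final Options inputOptions;

        BaseCopier(Options inputOptions) {
            this.inputOptions = inputOptions;
        }

        /** Returns the input options. */
        protected Options getInputOptions() {
            return inputOptions;
        }

        protected List<String> createInputFileListing(String listingPath) {
            return buildListing(listingPath, inputOptions);
        }
    }

    /** A subclass that reuses the parent's options instead of storing a copy. */
    static class SortedCopier extends BaseCopier {
        SortedCopier(Options inputOptions) {
            super(inputOptions);
        }

        @Override
        protected List<String> createInputFileListing(String listingPath) {
            // Same call as the base class, reached through the protected getter.
            List<String> entries = buildListing(listingPath, getInputOptions());
            Collections.sort(entries);
            return entries;
        }
    }

    public static void main(String[] args) {
        Options options = new Options(Arrays.asList("/data/b", "/data/a"));
        BaseCopier copier = new SortedCopier(options);
        for (String entry : copier.createInputFileListing("listing.seq")) {
            System.out.println(entry);
        }
    }
}
```

Without the getter, `SortedCopier` would have to store the constructor argument in a second field of its own, which is the duplication this change is meant to avoid.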