This repository was archived by the owner on Jan 29, 2022. It is now read-only.

Commit b78919d

Version 1.2.0-rc1
1 parent c60259a · commit b78919d

File tree: 3 files changed (+36 / -35 lines)


README.md

Lines changed: 17 additions & 16 deletions
@@ -2,10 +2,11 @@
 
 ##Purpose
 
-The MongoDB Connector for Hadoop is a library which allows MongoDB (or backup files in its data format, BSON) to be used as an input source, or output destination, for Hadoop MapReduce tasks. It is designed to allow greater flexibility and performance and make it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem.
+The MongoDB Connector for Hadoop is a library which allows MongoDB (or backup files in its data format, BSON) to be used as an input source, or output destination, for Hadoop MapReduce tasks. It is designed to allow greater flexibility and performance and make it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem.
 
 Current stable release: **1.1**
-Current unstable release: **1.2.0-rc0**
+
+Current unstable release: **1.2.0-rc1**
 
 ## Features
 
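Since this commit only bumps the release string, the README description above is unchanged. For orientation, here is a minimal sketch (not part of this commit) of pointing a job at MongoDB through the `mongo.input.uri` / `mongo.output.uri` properties, the same keys that `testing/run_treasury.py` below passes to `hadoop` with `-D`. The jar name and mongod host are hypothetical; the `mongo_hadoop.yield_historical.*` collections are the ones the test script uses.

```python
# Sketch only (not part of this commit): point a MapReduce job at MongoDB by
# setting the connector's URI properties. The jar name and mongod host are
# hypothetical; the "mongo.input.uri" / "mongo.output.uri" keys are the same
# ones testing/run_treasury.py passes to hadoop with -D.
cmd = [
    "hadoop", "jar", "treasury-example.jar",  # hypothetical example jar
    "-D", "mongo.input.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.in",
    "-D", "mongo.output.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.out",
]
# run_treasury.py executes commands like this via subprocess.call(..., shell=True)
print(" ".join(cmd))
```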
@@ -19,7 +20,7 @@ Current unstable release: **1.2.0-rc0**
 ## Download
 
 * 0.20.x
-
+
 * [core](https://s3.amazonaws.com/drivers.mongodb.org/hadoop/mongo-hadoop-core_0.20.205.0-1.1.0.jar)
 * [pig support](https://s3.amazonaws.com/drivers.mongodb.org/hadoop/mongo-hadoop-pig_0.20.205.0-1.1.0.jar)
 * [hive support](https://s3.amazonaws.com/drivers.mongodb.org/hadoop/mongo-hadoop-hive_0.20.205.0-1.1.0.jar)
@@ -84,7 +85,7 @@ After successfully building, you must copy the jars to the lib directory on each
 Does **not** support Hadoop Streaming.
 
 Build using `"1.0"` or `"1.0.x"`
-
+
 * ###Apache Hadoop 1.1
 Includes support for Hadoop Streaming.
 
@@ -93,38 +94,38 @@ After successfully building, you must copy the jars to the lib directory on each
 
 * ###Apache Hadoop 0.20.*
 Does **not** support Hadoop Streaming
-
+
 Includes Pig 0.9.2.
-
+
 Build using `"0.20"` or `"0.20.x"`
-
+
 * ###Apache Hadoop 0.23
 Includes Pig 0.9.2.
-
+
 Includes support for Streaming
-
+
 Build using `"0.23"` or `"0.23.x"`
 
 * ###Apache Hadoop 0.21
 Includes Pig 0.9.1
-
+
 Includes support for Streaming
-
+
 Build using `"0.21"` or `"0.21.x"`
 
 * ###Cloudera Distribution for Hadoop Release 3
 This is derived from Apache Hadoop 0.20.2 and includes custom patches.
-
+
 Includes support for streaming and Pig 0.8.1.
 
 Build with `"cdh3"`
 
 * ###Cloudera Distribution for Hadoop Release 4
-
+
 This is the newest release from Cloudera which is based on Apache Hadoop 2.0. The newer MR2/YARN APIs are not yet supported, but MR1 is still fully compatible.
-
+
 Includes support for Streaming and Pig 0.11.1.
-
+
 Build with `"cdh4"`
 
 ## Configuration
@@ -161,7 +162,7 @@ For examples on using Pig with the MongoDB Connector for Hadoop, also refer to t
 
 ## Notes for Contributors
 
-If your code introduces new features, please add tests that cover them if possible and make sure that the existing test suite still passes. If you're not sure how to write a test for a feature or have trouble with a test failure, please post on the google-groups with details and we will try to help.
+If your code introduces new features, please add tests that cover them if possible and make sure that the existing test suite still passes. If you're not sure how to write a test for a feature or have trouble with a test failure, please post on the google-groups with details and we will try to help.
 
 ### Maintainers
 Mike O'Brien ([email protected])

project/MongoHadoopBuild.scala

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ import AssemblyKeys._
 object MongoHadoopBuild extends Build {
 
   lazy val buildSettings = Seq(
-    version := "1.2.0-rc0",
+    version := "1.2.0-rc1",
     crossScalaVersions := Nil,
     crossPaths := false,
     organization := "org.mongodb"
@@ -331,7 +331,7 @@ object MongoHadoopBuild extends Build {
     println("*** Adding Hive Dependency for Version '%s'".format(hiveVersion))
 
     Seq(
-      "org.apache.hive" % "hive-serde" % hiveVersion,
+      "org.apache.hive" % "hive-serde" % hiveVersion,
       "org.apache.hive" % "hive-exec" % hiveVersion
     )
   }
@@ -366,7 +366,7 @@ object Resolvers {
 object Dependencies {
   val mongoJavaDriver = "org.mongodb" % "mongo-java-driver" % "2.11.3"
   val hiveSerDe = "org.apache.hive" % "hive-serde" % "0.10.0"
-  val hiveExec = "org.apache.hive" % "hive-exec" % "0.10.0"
+  val hiveExec = "org.apache.hive" % "hive-exec" % "0.10.0"
   val junit = "junit" % "junit" % "4.10" % "test"
   val flume = "com.cloudera" % "flume-core" % "0.9.4-cdh3u3"
   val casbah = "org.mongodb" %% "casbah" % "2.3.0"

testing/run_treasury.py

Lines changed: 16 additions & 16 deletions
@@ -23,8 +23,8 @@
 CLEANUP_TMP=os.environ.get('CLEANUP_TMP', True)
 HADOOP_HOME=os.environ['HADOOP_HOME']
 HADOOP_RELEASE=os.environ.get('HADOOP_RELEASE',None)
-AWS_SECRET=os.environ.get('AWS_SECRET',None)
-AWS_ACCESSKEY=os.environ.get('AWS_ACCESSKEY',None)
+AWS_SECRET=os.environ.get('AWS_SECRET',None)
+AWS_ACCESSKEY=os.environ.get('AWS_ACCESSKEY',None)
 TEMPDIR=os.environ.get('TEMPDIR','/tmp')
 USE_ASSEMBLY=os.environ.get('USE_ASSEMBLY', True)
 num_runs = 0
@@ -35,7 +35,7 @@
 
 #declare -a job_args
 #cd ..
-VERSION_SUFFIX = "1.2.0-rc0"
+VERSION_SUFFIX = "1.2.0-rc1"
 
 
 def generate_id(size=6, chars=string.ascii_uppercase + string.digits):
@@ -66,7 +66,7 @@ def generate_jar_name(prefix, version_suffix):
 streaming_jar_name = generate_jar_name("mongo-hadoop-streaming", VERSION_SUFFIX);
 
 # result set for sanity check#{{{
-check_results = [ { "_id": 1990, "count": 250, "avg": 8.552400000000002, "sum": 2138.1000000000004 },
+check_results = [ { "_id": 1990, "count": 250, "avg": 8.552400000000002, "sum": 2138.1000000000004 },
     { "_id": 1991, "count": 250, "avg": 7.8623600000000025, "sum": 1965.5900000000006 },
     { "_id": 1992, "count": 251, "avg": 7.008844621513946, "sum": 1759.2200000000005 },
     { "_id": 1993, "count": 250, "avg": 5.866279999999999, "sum": 1466.5699999999997 },
@@ -87,7 +87,7 @@ def generate_jar_name(prefix, version_suffix):
     { "_id": 2008, "count": 251, "avg": 3.6642629482071714, "sum": 919.73 },
     { "_id": 2009, "count": 250, "avg": 3.2641200000000037, "sum": 816.0300000000009 },
     { "_id": 2010, "count": 189, "avg": 3.3255026455026435, "sum": 628.5199999999996 } ]#}}}
-
+
 def compare_results(collection, reference=check_results):
     output = list(collection.find().sort("_id"))
     if len(output) != len(reference):
@@ -98,7 +98,7 @@ def compare_results(collection, reference=check_results):
         #round to account for slight changes due to precision in case ops are run in different order.
         if doc['_id'] != reference[i]['_id'] or \
            doc['count'] != reference[i]['count'] or \
-           round(doc['avg'], 7) != round(reference[i]['avg'], 7):
+           round(doc['avg'], 7) != round(reference[i]['avg'], 7):
             print "docs do not match", doc, reference[i]
             return False
     return True
@@ -177,16 +177,16 @@ def runjob(hostname, params, input_collection='mongo_hadoop.yield_historical.in'
         cmd.append("-D")
         cmd.append(key + "=" + val)
 
-
-    #if it's not set, assume that the test is
+
+    #if it's not set, assume that the test is
     # probably setting it in some other property (e.g. multi collection)
     if input_collection:
         cmd.append("-D")
         if type(input_collection) == type([]):
            input_uri = " ".join('mongodb://%s/%s?readPreference=%s' % (hostname, x, readpref) for x in input_collection)
            input_uri = '"' + input_uri + '"'
        else:
-           input_uri = 'mongodb://%s%s/%s?readPreference=%s' % (input_auth + "@" if input_auth else '', hostname, input_collection, readpref)
+           input_uri = 'mongodb://%s%s/%s?readPreference=%s' % (input_auth + "@" if input_auth else '', hostname, input_collection, readpref)
        cmd.append("mongo.input.uri=%s" % input_uri)
 
    cmd.append("-D")
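For illustration (not part of the commit), the `input_uri` expression in `runjob()` above produces URIs like the following; the hostname is hypothetical, and the `user:password@` prefix only appears when `input_auth` is given.

```python
# Illustration only: what the input_uri expression in runjob() above builds.
# The hostname and credentials are made-up values for the example.
hostname = "localhost:27017"
input_collection = "mongo_hadoop.yield_historical.in"
readpref = "primary"

for input_auth in (None, "test_user:test_pw"):
    input_uri = 'mongodb://%s%s/%s?readPreference=%s' % (
        input_auth + "@" if input_auth else '', hostname, input_collection, readpref)
    print(input_uri)

# Prints:
#   mongodb://localhost:27017/mongo_hadoop.yield_historical.in?readPreference=primary
#   mongodb://test_user:test_pw@localhost:27017/mongo_hadoop.yield_historical.in?readPreference=primary
```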
@@ -225,13 +225,13 @@ def runbsonjob(input_path, params, hostname,
 
    print cmd
    subprocess.call(' '.join(cmd), shell=True)
-
+
 
 def runstreamingjob(hostname, params, input_collection='mongo_hadoop.yield_historical.in',
                    output_collection='mongo_hadoop.yield_historical.out',
                    readpref="primary",
                    input_auth=None,
-                   output_auth=None,
+                   output_auth=None,
                    inputpath='file://' + os.path.join(TEMPDIR, 'in'),
                    outputpath='file://' + os.path.join(TEMPDIR, 'out'),
                    inputformat='com.mongodb.hadoop.mapred.MongoInputFormat',
@@ -250,9 +250,9 @@ def runstreamingjob(hostname, params, input_collection='mongo_hadoop.yield_histo
    cmd += ["-inputformat",inputformat]
    cmd += ["-outputformat",outputformat]
    cmd += ["-io", 'mongodb']
-   input_uri = 'mongodb://%s%s/%s?readPreference=%s' % (input_auth + "@" if input_auth else '', hostname, input_collection, readpref)
+   input_uri = 'mongodb://%s%s/%s?readPreference=%s' % (input_auth + "@" if input_auth else '', hostname, input_collection, readpref)
    cmd += ['-jobconf', "mongo.input.uri=%s" % input_uri]
-   output_uri = "mongo.output.uri=mongodb://%s%s/%s" % (output_auth + "@" if output_auth else '', hostname, output_collection)
+   output_uri = "mongo.output.uri=mongodb://%s%s/%s" % (output_auth + "@" if output_auth else '', hostname, output_collection)
    cmd += ['-jobconf', output_uri]
    cmd += ['-jobconf', 'stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver']
 
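As a rough sketch of what the streaming flags above assemble into (not part of this commit; the jar name and host are hypothetical, and the `-mapper`/`-reducer` arguments a real streaming job needs are omitted):

```python
# Sketch only: a flattened view of the streaming arguments built by
# runstreamingjob() above. The jar name and host are hypothetical, and the
# -mapper/-reducer script arguments a real job requires are left out.
streaming_cmd = [
    "hadoop", "jar", "mongo-hadoop-streaming-assembly.jar",  # hypothetical jar name
    "-io", "mongodb",
    "-inputformat", "com.mongodb.hadoop.mapred.MongoInputFormat",
    "-jobconf", "mongo.input.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.in?readPreference=primary",
    "-jobconf", "mongo.output.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.out",
    "-jobconf", "stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver",
]
print(" ".join(streaming_cmd))
```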
@@ -518,7 +518,7 @@ def test_treasury(self):
            #logging.info(doc['_id'], "was on shard ", self.shard1.name, "now on ", self.shard2.name)
            #print "inserting", doc
            destination['mongo_hadoop']['yield_historical.in'].insert(doc, safe=True)
-
+
        PARAMETERS = DEFAULT_PARAMETERS.copy()
        PARAMETERS['mongo.input.split.allow_read_from_secondaries'] = 'true'
        PARAMETERS['mongo.input.split.read_from_shards'] = 'true'
@@ -574,7 +574,7 @@ def setUp(self):
        self.temp_outdir = tempfile.mkdtemp(prefix='hadooptest_', dir=TEMPDIR)
        mongo_manager.mongo_dump(self.server_hostname, "mongo_hadoop",
                                 "yield_historical.in", self.temp_outdir)
-
+
 
    def tearDown(self):
        logging.info("TestStaticBSON teardown")
@@ -756,7 +756,7 @@ def test_treasury(self):
        logging.info("Testing standalone with authentication on")
        x = self.server.connection()['admin'].add_user("test_user","test_pw", roles=["clusterAdmin", "readWriteAnyDatabase"])
        PARAMETERS = DEFAULT_PARAMETERS.copy()
-       PARAMETERS['mongo.auth.uri'] = 'mongodb://%s:%s@%s/admin' % ('test_user', 'test_pw', self.server_hostname)
+       PARAMETERS['mongo.auth.uri'] = 'mongodb://%s:%s@%s/admin' % ('test_user', 'test_pw', self.server_hostname)
        runjob(self.server_hostname, PARAMETERS)
 
        server_connection = self.server.connection()
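To make the authenticated test configuration above concrete (illustration only; the host value is hypothetical), the `mongo.auth.uri` it builds looks like this:

```python
# Illustration only: the authenticated admin URI built by the test above,
# using its test_user/test_pw credentials and a hypothetical host.
server_hostname = "localhost:27017"  # assumed value for the example
auth_uri = 'mongodb://%s:%s@%s/admin' % ('test_user', 'test_pw', server_hostname)
print(auth_uri)  # mongodb://test_user:test_pw@localhost:27017/admin
```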
