Closed
Changes from all commits
1721 commits
e07baf1
[SPARK-17001][ML] Enable standardScaler to standardize sparse vectors…
srowen Aug 27, 2016
095862a
[SPARK-17271][SQL] Planner adds un-necessary Sort even if child order…
tejasapatil Aug 28, 2016
1a48c00
[BUILD] Closes some stale PRs.
srowen Aug 29, 2016
08913ce
fixed a typo
Aug 29, 2016
6a0fda2
[SPARKR][MINOR] Fix LDA doc
junyangq Aug 29, 2016
48caec2
[SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hiv…
Aug 29, 2016
736a791
[SPARK-16581][SPARKR] Make JVM backend calling functions public
shivaram Aug 29, 2016
48b459d
[SPARK-17301][SQL] Remove unused classTag field from AtomicType base …
JoshRosen Aug 30, 2016
8fb445d
[SPARK-17303] Added spark-warehouse to dev/.rat-excludes
frreiss Aug 30, 2016
94922d7
[SPARK-17289][SQL] Fix a bug to satisfy sort requirements in partial …
maropu Aug 30, 2016
bca79c8
[SPARK-17234][SQL] Table Existence Checking when Index Table with the…
gatorsmile Aug 30, 2016
2d76cb1
[SPARK-17276][CORE][TEST] Stop env params output on Jenkins job page
keypointt Aug 30, 2016
befab9c
[SPARK-17264][SQL] DataStreamWriter should document that it only supp…
srowen Aug 30, 2016
d4eee99
[MINOR][DOCS] Fix minor typos in python example code
silentsokolov Aug 30, 2016
2720925
[MINOR][MLLIB][SQL] Clean up unused variables and unused import
keypointt Aug 30, 2016
4b4e329
[SPARK-5682][CORE] Add encrypted shuffle in spark
Aug 30, 2016
fb20084
[SPARK-17304] Fix perf. issue caused by TaskSetManager.abortIfComplet…
JoshRosen Aug 30, 2016
02ac379
[SPARK-17314][CORE] Use Netty's DefaultThreadFactory to enable its fa…
zsxwing Aug 30, 2016
f7beae6
[SPARK-17243][WEB UI] Spark 2.0 History Server won't load with very l…
ajbozarth Aug 30, 2016
231f973
[SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with …
zsxwing Aug 31, 2016
d92cd22
[SPARK-15985][SQL] Eliminate redundant cast from an array without nul…
kiszk Aug 31, 2016
fa63479
[SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command …
zjffdu Aug 31, 2016
12fd0cd
[SPARK-17180][SPARK-17309][SPARK-17323][SQL] create AlterViewAsComman…
cloud-fan Aug 31, 2016
9953442
[MINOR][SPARKR] Verbose build comment in WINDOWS.md rather than promo…
HyukjinKwon Aug 31, 2016
0611b3a
[SPARK-17320] add build_profile_flags entry to mesos build module
Aug 31, 2016
9bcb33c
[SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor…
zsxwing Aug 31, 2016
5d84c7f
[SPARK-17332][CORE] Make Java Loggers static members
srowen Aug 31, 2016
50bb142
[SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be …
HyukjinKwon Aug 31, 2016
d375c8a
[SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite
zsxwing Aug 31, 2016
2f9c273
[SPARK-16581][SPARKR] Fix JVM API tests in SparkR
shivaram Aug 31, 2016
d008638
[SPARKR][MINOR] Fix windowPartitionBy example
junyangq Sep 1, 2016
7a5000f
[SPARK-17241][SPARKR][MLLIB] SparkR spark.glm should have configurabl…
keypointt Sep 1, 2016
aaf632b
revert PR#10896 and PR#14865
cloud-fan Sep 1, 2016
21c0a4f
[SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with …
zsxwing Sep 1, 2016
536fa91
[SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code cha…
srowen Sep 1, 2016
a18c169
[SPARK-16283][SQL] Implements percentile_approx aggregation function …
clockfly Sep 1, 2016
dd859f9
fixed typos
Sep 1, 2016
1f06a5b
[SPARK-17353][SPARK-16943][SPARK-16942][SQL] Fix multiple bugs in CRE…
gatorsmile Sep 1, 2016
8e740ae
[SPARK-17257][SQL] the physical plan of CREATE TABLE or CTAS should t…
cloud-fan Sep 1, 2016
adaaffa
[SPARK-17271][SQL] Remove redundant `semanticEquals()` from `SortOrder`
tejasapatil Sep 1, 2016
a0aac4b
[SPARK-16533][CORE] resolve deadlocking in driver when executors die
angolon Sep 1, 2016
2be5f8d
[SPARK-17263][SQL] Add hexadecimal literal parsing
hvanhovell Sep 1, 2016
3893e8c
[SPARK-17331][CORE][MLLIB] Avoid allocating 0-length arrays
srowen Sep 1, 2016
edb4573
[SPARK-16533][HOTFIX] Fix compilation on Scala 2.10.
Sep 1, 2016
473d786
[SPARK-16926] [SQL] Remove partition columns from partition metadata.
bchocho Sep 1, 2016
e388bd5
[SPARK-16732][SQL] Remove unused codes in subexpressionEliminationFor…
Sep 1, 2016
d314677
[SPARK-16461][SQL] Support partition batch pruning with `<=>` predica…
HyukjinKwon Sep 1, 2016
15539e5
[SPARK-17355] Workaround for HIVE-14684 / HiveResultSetMetaData.isSig…
JoshRosen Sep 1, 2016
03d77af
[SPARK-16525] [SQL] Enable Row Based HashMap in HashAggregateExec
ooq Sep 1, 2016
5bea875
[SPARK-16619] Add shuffle service metrics entry in monitoring docs
lovexi Sep 2, 2016
06e3398
[SPARK-16302][SQL] Set the right number of partitions for reading dat…
lianhuiwang Sep 2, 2016
f2d6e2e
[SPARK-16926][SQL] Add unit test to compare table and partition colum…
bchocho Sep 2, 2016
2ab8dbd
[SPARK-17342][WEBUI] Style of event timeline is broken
sarutak Sep 2, 2016
0f30cde
[SPARK-16883][SPARKR] SQL decimal type is not properly cast to number…
wangmiao1981 Sep 2, 2016
6969dcc
[SPARK-15509][ML][SPARKR] R MLlib algorithms should support input col…
keypointt Sep 2, 2016
a3097e2
[SQL][DOC][MINOR] Add (Scala-specific) and (Java-specific)
jaceklaskowski Sep 2, 2016
7ee24da
[SPARK-17352][WEBUI] Executor computing time can be negative-number b…
sarutak Sep 2, 2016
247a4fa
[SPARK-16935][SQL] Verification of Function-related ExternalCatalog APIs
gatorsmile Sep 2, 2016
806d8a8
[SPARK-16984][SQL] don't try whole dataset immediately when first par…
Sep 2, 2016
6bcbf9b
[SPARK-17351] Refactor JDBCRDD to expose ResultSet -> Seq[Row] utilit…
JoshRosen Sep 2, 2016
ea66228
[SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkCont…
zjffdu Sep 2, 2016
812333e
[SPARK-17376][SPARKR] Spark version should be available in R
felixcheung Sep 2, 2016
419eefd
[SPARKR][DOC] regexp_extract should doc that it returns empty string …
felixcheung Sep 2, 2016
e79962f
[SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rol…
Sep 2, 2016
eac1d0e
[SPARK-17376][SPARKR] followup - change since version
felixcheung Sep 2, 2016
ed9c884
[SPARK-17230] [SQL] Should not pass optimized query into QueryExecuti…
Sep 2, 2016
a2c9acb
[SPARK-16334] Reusing same dictionary column for decoding consecutive…
sameeragarwal Sep 2, 2016
e6132a6
[SPARK-17298][SQL] Require explicit CROSS join for cartesian products
srinathshankar Sep 2, 2016
d2fde6b
[SPARKR][MINOR] Fix docs for sparkR.session and count
junyangq Sep 3, 2016
7a8a81d
[SPARK-17363][ML][MLLIB] fix MultivariantOnlineSummerizer.numNonZeros
WeichenXu123 Sep 3, 2016
97da410
[SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect…
CodingCat Sep 3, 2016
a8a35b3
[MINOR][SQL] Not dropping all necessary tables
techaddict Sep 3, 2016
c2a1576
[SPARK-17335][SQL] Fix ArrayType and MapType CatalogString.
hvanhovell Sep 3, 2016
abb2f92
[SPARK-17315][SPARKR] Kolmogorov-Smirnov test SparkR wrapper
junyangq Sep 3, 2016
e9b58e9
[SPARK-16829][SPARKR] sparkR sc.setLogLevel doesn't work
wangmiao1981 Sep 3, 2016
6b156e2
[SPARK-17324][SQL] Remove Direct Usage of HiveClient in InsertIntoHiv…
gatorsmile Sep 4, 2016
e75c162
[SPARK-17308] Improved the spark core code by replacing all pattern m…
shiv4nsh Sep 4, 2016
cdeb97a
[SPARK-17311][MLLIB] Standardize Python-Java MLlib API to accept opti…
srowen Sep 4, 2016
1b001b5
[MINOR][ML][MLLIB] Remove work around for breeze sparse matrix.
yanboliang Sep 4, 2016
c1e9a6d
[SPARK-17393][SQL] Error Handling when CTAS Against the Same Data Sou…
gatorsmile Sep 5, 2016
3ccb23e
[SPARK-17394][SQL] should not allow specify database in table/view na…
cloud-fan Sep 5, 2016
6d86403
[SPARK-17072][SQL] support table-level statistics generation and stor…
Sep 5, 2016
8d08f43
[SPARK-17279][SQL] better error message for exceptions during ScalaUD…
cloud-fan Sep 6, 2016
afb3d5d
[SPARK-17369][SQL] MetastoreRelation toJSON throws AssertException du…
clockfly Sep 6, 2016
64e826f
[SPARK-17358][SQL] Cached table(parquet/orc) should be shard between …
watermen Sep 6, 2016
c0ae6bc
[SPARK-17361][SQL] file-based external table without path should not …
cloud-fan Sep 6, 2016
6f13aa7
[SPARK-17356][SQL] Fix out of memory issue when generating JSON for T…
clockfly Sep 6, 2016
39d538d
[MINOR][ML] Correct weights doc of MultilayerPerceptronClassification…
yanboliang Sep 6, 2016
bc2767d
[SPARK-17374][SQL] Better error messages when parsing JSON using Data…
clockfly Sep 6, 2016
f7e26d7
[SPARK-16922] [SPARK-17211] [SQL] make the address of values portable…
Sep 6, 2016
6c08dbf
[SPARK-17378][BUILD] Upgrade snappy-java to 1.1.2.6
a-roberts Sep 6, 2016
7775d9f
[SPARK-17299] TRIM/LTRIM/RTRIM should not strips characters other tha…
techaddict Sep 6, 2016
8bbb08a
[MINOR] Remove unnecessary check in MLSerDe
zhengruifeng Sep 6, 2016
29cfab3
[SPARK-17110] Fix StreamCorruptionException in BlockManager.getRemote…
JoshRosen Sep 6, 2016
4f769b9
[SPARK-17296][SQL] Simplify parser join processing.
hvanhovell Sep 6, 2016
0bd00ff
[SPARK-15891][YARN] Clean up some logging in the YARN AM.
Sep 6, 2016
175b434
[SPARK-17316][CORE] Fix the 'ask' type parameter in 'removeExecutor'
zsxwing Sep 6, 2016
c07cbb3
[SPARK-17371] Resubmitted shuffle outputs can get deleted by zombie m…
ericl Sep 6, 2016
a40657b
[SPARK-17408][TEST] Flaky test: org.apache.spark.sql.hive.StatisticsS…
gatorsmile Sep 7, 2016
d6eede9
[SPARK-17238][SQL] simplify the logic for converting data source tabl…
cloud-fan Sep 7, 2016
eb1ab88
[SPARK-17372][SQL][STREAMING] Avoid serialization issues by using Arr…
tdas Sep 7, 2016
9fccde4
[SPARK-16785] R dapply doesn't return array or raw columns
clarkfitzg Sep 7, 2016
3ce3a28
[SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffe…
lw-lin Sep 7, 2016
6b41195
[SPARK-17339][SPARKR][CORE] Fix some R tests and use Path.toUri in Sp…
HyukjinKwon Sep 7, 2016
6f4aecc
[SPARK-17427][SQL] function SIZE should return -1 when parameter is null
adrian-wang Sep 7, 2016
76ad89e
[MINOR][SQL] Fixing the typo in unit test
Sep 7, 2016
649fa4b
[SPARK-17370] Shuffle service files not invalidated when a slave is lost
ericl Sep 7, 2016
b230fb9
[SPARK-17052][SQL] Remove Duplicate Test Cases auto_join from HiveCom…
gatorsmile Sep 7, 2016
3ced39d
[SPARK-17432][SQL] PreprocessDDL should respect case sensitivity when…
cloud-fan Sep 8, 2016
f0d21b7
[SPARK-17442][SPARKR] Additional arguments in write.df are not passed…
felixcheung Sep 8, 2016
78d5d4d
[SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and tes…
HyukjinKwon Sep 8, 2016
722afbb
[SPARK-17405] RowBasedKeyValueBatch should use default page size to p…
ericl Sep 8, 2016
92ce8d4
[SPARK-15487][WEB UI] Spark Master UI to reverse proxy Application an…
Sep 9, 2016
65b814b
[SPARK-17456][CORE] Utility for parsing Spark versions
jkbradley Sep 9, 2016
2ed6012
[SPARK-17464][SPARKR][ML] SparkR spark.als argument reg should be 0.1…
yanboliang Sep 9, 2016
7098a12
Streaming doc correction.
Sep 9, 2016
a3981c2
[SPARK-17433] YarnShuffleService doesn't handle moving credentials le…
Sep 9, 2016
f7d2143
[SPARK-17354] [SQL] Partitioning by dates/timestamps should work with…
HyukjinKwon Sep 9, 2016
3354917
[SPARK-15453][SQL] FileSourceScanExec to extract `outputOrdering` inf…
tejasapatil Sep 10, 2016
1fec3ce
[SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank
Sep 10, 2016
bcdd259
[SPARK-15509][FOLLOW-UP][ML][SPARKR] R MLlib algorithms should suppor…
yanboliang Sep 10, 2016
6ea5055
[SPARK-17396][CORE] Share the task support between UnionRDD instances.
rdblue Sep 10, 2016
71b7d42
[SPARK-16445][MLLIB][SPARKR] Fix @return description for sparkR mlp s…
keypointt Sep 10, 2016
29ba957
[SPARK-17389][ML][MLLIB] KMeans speedup with better choice of k-means…
srowen Sep 11, 2016
180796e
[SPARK-17439][SQL] Fixing compression issues with approximate quantil…
thunterdb Sep 11, 2016
bf22217
[SPARK-17330][SPARK UT] Clean up spark-warehouse in UT
Sep 11, 2016
c76baff
[SPARK-17336][PYSPARK] Fix appending multiple times to PYTHONPATH fro…
BryanCutler Sep 11, 2016
883c763
[SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init ste…
yanboliang Sep 11, 2016
767d480
[SPARK-17415][SQL] Better error message for driver-side broadcast joi…
sameeragarwal Sep 11, 2016
72eec70
[SPARK-17486] Remove unused TaskMetricsUIData.updatedBlockStatuses field
JoshRosen Sep 12, 2016
cc87280
[SPARK-17171][WEB UI] DAG will list all partitions in the graph
cenyuhai Sep 12, 2016
4efcdb7
[SPARK-17447] Performance improvement in Partitioner.defaultPartition…
codlife Sep 12, 2016
b3c2291
[SPARK-16992][PYSPARK] use map comprehension in doc
gsemet Sep 12, 2016
8087ecf
[SPARK CORE][MINOR] fix "default partitioner cannot partition array k…
WeichenXu123 Sep 12, 2016
1742c3a
[SPARK-17503][CORE] Fix memory leak in Memory store when unable to ca…
clockfly Sep 12, 2016
3d40896
[SPARK-17483] Refactoring in BlockManager status reporting and block …
JoshRosen Sep 12, 2016
7c51b99
[SPARK-14818] Post-2.0 MiMa exclusion and build changes
JoshRosen Sep 12, 2016
f9c580f
[SPARK-17485] Prevent failed remote reads of cached blocks from faili…
JoshRosen Sep 12, 2016
a91ab70
[SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec
Sep 12, 2016
46f5c20
[BUILD] Closing some stale PRs and ones suggested to be closed by com…
HyukjinKwon Sep 13, 2016
3f6a2bb
[SPARK-17515] CollectLimit.execute() should perform per-partition limits
JoshRosen Sep 13, 2016
4ba63b1
[SPARK-17142][SQL] Complex query triggers binding error in HashAggreg…
jiangxb1987 Sep 13, 2016
72edc7e
[SPARK-17531] Don't initialize Hive Listeners for the Execution Client
brkyvz Sep 13, 2016
37b93f5
[SPARK-17530][SQL] Add Statistics into DESCRIBE FORMATTED
gatorsmile Sep 13, 2016
a454a4d
[SPARK-17317][SPARKR] Add SparkR vignette
junyangq Sep 14, 2016
def7c26
[SPARK-17449][DOCUMENTATION] Relation between heartbeatInterval and…
jagadeesanas2 Sep 14, 2016
b5bfcdd
[SPARK-17525][PYTHON] Remove SparkContext.clearFiles() from the PySpa…
sjakthol Sep 14, 2016
18b4f03
[CORE][DOC] remove redundant comment
wangmiao1981 Sep 14, 2016
4cea9da
[SPARK-17480][SQL] Improve performance by removing or caching List.le…
seyfe Sep 14, 2016
dc0a4c9
[SPARK-17445][DOCS] Reference an ASF page as the main place to find t…
srowen Sep 14, 2016
52738d4
[SPARK-17409][SQL] Do Not Optimize Query in CTAS More Than Once
gatorsmile Sep 14, 2016
6d06ff6
[SPARK-17514] df.take(1) and df.limit(1).collect() should perform the…
JoshRosen Sep 14, 2016
a79838b
[MINOR][SQL] Add missing functions for some options in SQLConf and us…
HyukjinKwon Sep 14, 2016
040e469
[SPARK-10747][SQL] Support NULLS FIRST|LAST clause in ORDER BY
xwu0226 Sep 14, 2016
ff6e4cb
[SPARK-17511] Yarn Dynamic Allocation: Avoid marking released contain…
kishorvpatil Sep 14, 2016
e33bfae
[SPARK-17463][CORE] Make CollectionAccumulator and SetAccumulator's v…
zsxwing Sep 14, 2016
dbfc7aa
[SPARK-17472] [PYSPARK] Better error message for serialization failur…
ericl Sep 14, 2016
bb32294
[SPARK-17465][SPARK CORE] Inappropriate memory management in `org.apa…
Sep 14, 2016
6a6adb1
[SPARK-17440][SPARK-17441] Fixed Multiple Bugs in ALTER TABLE
gatorsmile Sep 15, 2016
d15b4f9
[SPARK-17507][ML][MLLIB] check weight vector size in ANN
WeichenXu123 Sep 15, 2016
f893e26
[SPARK-17524][TESTS] Use specified spark.buffer.pageSize
a-roberts Sep 15, 2016
647ee05
[SPARK-17521] Error when I use sparkContext.makeRDD(Seq())
codlife Sep 15, 2016
ad79fc0
[SPARK-17406][WEB UI] limit timeline executor events
cenyuhai Sep 15, 2016
71a6582
[SPARK-17536][SQL] Minor performance improvement to JDBC batch inserts
Sep 15, 2016
2ad2769
[SPARK-17406][BUILD][HOTFIX] MiMa excludes fix
srowen Sep 15, 2016
b479278
[SPARK-17451][CORE] CoarseGrainedExecutorBackend should inform driver…
tejasapatil Sep 15, 2016
0ad8eeb
[SPARK-17379][BUILD] Upgrade netty-all to 4.0.41 final for bug fixes
a-roberts Sep 15, 2016
5b8f737
[SPARK-17547] Ensure temp shuffle data file is cleaned up after error
JoshRosen Sep 15, 2016
d403562
[SPARK-17114][SQL] Fix aggregates grouped by literals with empty input
hvanhovell Sep 15, 2016
fe76739
[SPARK-17429][SQL] use ImplicitCastInputTypes with function Length
cenyuhai Sep 15, 2016
a6b8182
[SPARK-17364][SQL] Antlr lexer wrongly treats full qualified identifi…
clockfly Sep 15, 2016
1202075
[SPARK-17484] Prevent invalid block locations from being reported aft…
JoshRosen Sep 15, 2016
b72486f
[SPARK-17458][SQL] Alias specified for aggregates in a pivot are not …
aray Sep 15, 2016
b2e2726
[SPARK-17543] Missing log4j config file for tests in common/network-…
jagadeesanas2 Sep 16, 2016
fc1efb7
[SPARK-17534][TESTS] Increase timeouts for DirectKafkaStreamSuite tests
a-roberts Sep 16, 2016
a425a37
[SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid OOM when conve…
clockfly Sep 16, 2016
dca771b
[SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
rxin Sep 16, 2016
b9323fc
[SPARK-17561][DOCS] DataFrameWriter documentation formatting problems
srowen Sep 16, 2016
39e2bad
[SPARK-17549][SQL] Only collect table size stat in driver for cached …
Sep 16, 2016
69cb049
Correct fetchsize property name in docs
darabos Sep 17, 2016
f15d41b
[SPARK-17567][DOCS] Use valid url to Spark RDD paper
keypointt Sep 17, 2016
25cbbe6
[SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously …
willb Sep 17, 2016
9dbd4b8
[SPARK-17529][CORE] Implement BitSet.clearUntil and use it during mer…
Sep 17, 2016
bbe0b1d
[SPARK-17575][DOCS] Remove extra table tags in configuration document
phalodi Sep 17, 2016
86c2d39
[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.leng…
HyukjinKwon Sep 17, 2016
8faa521
[SPARK-17491] Close serialization stream to fix wrong answer bug in p…
JoshRosen Sep 17, 2016
3a3c9ff
[SPARK-17518][SQL] Block Users to Specify the Internal Data Source Pr…
gatorsmile Sep 18, 2016
3fe630d
[SPARK-17541][SQL] fix some DDL bugs about table management when same…
cloud-fan Sep 18, 2016
5d3f461
[SPARK-17506][SQL] Improve the check double values equality rule.
jiangxb1987 Sep 18, 2016
342c0e6
[SPARK-17546][DEPLOY] start-* scripts should use hostname -f
srowen Sep 18, 2016
7151011
[SPARK-17586][BUILD] Do not call static member via instance reference
HyukjinKwon Sep 18, 2016
1dbb725
[SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null value…
lw-lin Sep 18, 2016
8f0c35a
[SPARK-17571][SQL] AssertOnQuery.condition should always return Boole…
petermaxlee Sep 18, 2016
d720a40
[SPARK-17297][DOCS] Clarify window/slide duration as absolute time, n…
srowen Sep 19, 2016
cdea1d1
[SPARK-17473][SQL] fixing docker integration tests error due to diffe…
sureshthalamati Sep 19, 2016
80d6655
[SPARK-17438][WEBUI] Show Application.executorLimit in the applicatio…
zsxwing Sep 19, 2016
e063206
[SPARK-16439] [SQL] bring back the separator in SQL UI
Sep 19, 2016
d810415
[SPARK-17100] [SQL] fix Python udf in filter on top of outer join
Sep 19, 2016
e719b1c
[SPARK-17160] Properly escape field names in code-generated error mes…
JoshRosen Sep 20, 2016
26145a5
[SPARK-17163][ML] Unified LogisticRegression interface
sethah Sep 20, 2016
be9d57f
[SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
petermaxlee Sep 20, 2016
f039d96
Revert "[SPARK-17513][SQL] Make StreamExecution garbage-collect its m…
cloud-fan Sep 20, 2016
4a426ff
[SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext
apetresc Sep 20, 2016
d5ec5db
[SPARK-17502][SQL] Fix Multiple Bugs in DDL Statements on Temporary V…
gatorsmile Sep 20, 2016
eb004c6
[SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable
cloud-fan Sep 20, 2016
a6aade0
[SPARK-15698][SQL][STREAMING] Add the ability to remove the old Metad…
jerryshao Sep 20, 2016
9ac68db
[SPARK-17549][SQL] Revert "[] Only collect table size stat in driver …
yhuai Sep 20, 2016
7e418e9
[SPARK-17611][YARN][TEST] Make shuffle service test really test auth.
Sep 20, 2016
976f3b1
[SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
petermaxlee Sep 21, 2016
1ea4991
[MINOR][BUILD] Fix CheckStyle Error
weiqingy Sep 21, 2016
e48ebc4
[SPARK-15698][SQL][STREAMING][FOLLW-UP] Fix FileStream source and sin…
jerryshao Sep 21, 2016
61876a4
[CORE][DOC] Fix errors in comments
wangmiao1981 Sep 21, 2016
d3b8869
[SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports ad…
yanboliang Sep 21, 2016
7654385
[SPARK-17595][MLLIB] Use a bounded priority queue to find synonyms in…
willb Sep 21, 2016
3977223
[SPARK-17617][SQL] Remainder(%) expression.eval returns incorrect res…
clockfly Sep 21, 2016
28fafa3
[SPARK-17599] Prevent ListingFileCatalog from failing if path doesn't…
brkyvz Sep 21, 2016
b366f18
[SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Posi…
Sep 21, 2016
57dc326
[SPARK-17219][ML] Add NaN value handling in Bucketizer
Sep 21, 2016
25a020b
[SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-…
HyukjinKwon Sep 21, 2016
dd7561d
[CORE][MINOR] Add minor code change to TaskState and Task
erenavsarogullari Sep 21, 2016
248922f
[SPARK-17590][SQL] Analyze CTE definitions at once and allow CTE subq…
viirya Sep 21, 2016
d7ee122
[SPARK-17418] Prevent kinesis-asl-assembly artifacts from being publi…
JoshRosen Sep 21, 2016
b4a4421
[SPARK-11918][ML] Better error from WLS for cases like singular input
srowen Sep 21, 2016
2cd1bfa
[SPARK-4563][CORE] Allow driver to advertise a different network addr…
Sep 21, 2016
9fcf1c5
[SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task.
squito Sep 21, 2016
8c3ee2b
[SPARK-17512][CORE] Avoid formatting to python path for yarn and meso…
jerryshao Sep 21, 2016
7cbe216
[SPARK-17569] Make StructuredStreaming FileStreamSource batch generat…
brkyvz Sep 22, 2016
c133907
[SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and…
yanboliang Sep 22, 2016
6902eda
[SPARK-17315][FOLLOW-UP][SPARKR][ML] Fix print of Kolmogorov-Smirnov …
yanboliang Sep 22, 2016
3497ebe
[SPARK-17627] Mark Streaming Providers Experimental
marmbrus Sep 22, 2016
8bde03b
[SPARK-17494][SQL] changePrecision() on compact decimal should respec…
Sep 22, 2016
b50b34f
[SPARK-17609][SQL] SessionCatalog.tableExists should not check temp view
cloud-fan Sep 22, 2016
cb324f6
[SPARK-17425][SQL] Override sameResult in HiveTableScanExec to make R…
watermen Sep 22, 2016
3a80f92
[SPARK-17492][SQL] Fix Reading Cataloged Data Sources without Extendi…
gatorsmile Sep 22, 2016
de7df7d
[SPARK-17625][SQL] set expectedOutputAttributes when converting Simpl…
wzhfy Sep 22, 2016
646f383
[SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.
frreiss Sep 22, 2016
72d9fba
[SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for AFTSurv…
WeichenXu123 Sep 22, 2016
8a02410
[SQL][MINOR] correct the comment of SortBasedAggregationIterator.safe…
cloud-fan Sep 22, 2016
17b72d3
[SPARK-17365][CORE] Remove/Kill multiple executors together to reduce…
Sep 22, 2016
9f24a17
Skip building R vignettes if Spark is not built
shivaram Sep 22, 2016
85d609c
[SPARK-17613] S3A base paths with no '/' at the end return empty Data…
brkyvz Sep 22, 2016
3cdae0f
[SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python pr…
zsxwing Sep 22, 2016
0d63487
[SPARK-17616][SQL] Support a single distinct aggregate combined with …
hvanhovell Sep 22, 2016
f4f6bd8
[SPARK-16240][ML] ML persistence backward compatibility for LDA
GayathriMurali Sep 22, 2016
a166196
[SPARK-17569][SPARK-17569][TEST] Make the unit test added for work again
brkyvz Sep 22, 2016
79159a1
[SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec
Sep 23, 2016
a4aeb76
[SPARK-17639][BUILD] Add jce.jar to buildclasspath when building.
Sep 23, 2016
947b8c6
[SPARK-16719][ML] Random Forests should communicate fewer trees on ea…
jkbradley Sep 23, 2016
62ccf27
[SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStre…
zsxwing Sep 23, 2016
5c5396c
[BUILD] Closes some stale PRs
HyukjinKwon Sep 23, 2016
8 changes: 8 additions & 0 deletions .gitignore
@@ -17,11 +17,13 @@
.idea/
.idea_modules/
.project
.pydevproject
.scala_dependencies
.settings
/lib/
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
build/*.jar
build/apache-maven*
build/scala*
@@ -72,7 +74,13 @@ metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

.Rproj.user
51 changes: 51 additions & 0 deletions .travis.yml
@@ -0,0 +1,51 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Spark provides this Travis CI configuration file to help contributors
# check Scala/Java style conformance and JDK7/8 compilation easily
# while preparing their pull requests.
# - Scalastyle is executed during `maven install` implicitly.
# - Java Checkstyle is executed by `lint-java`.
# See the related discussion here.
# https://github.com/apache/spark/pull/12980

# 1. Choose OS (Ubuntu 14.04.3 LTS Server Edition 64bit, ~2 CORE, 7.5GB RAM)
sudo: required
dist: trusty

# 2. Choose language and target JDKs for parallel builds.
language: java
jdk:
- oraclejdk7
- oraclejdk8

# 3. Setup cache directory for SBT and Maven.
cache:
  directories:
  - $HOME/.sbt
  - $HOME/.m2

# 4. Turn off notifications.
notifications:
  email: false

# 5. Run maven install before running lint-java.
install:
- export MAVEN_SKIP_RC=1
- build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install

# 6. Run lint-java.
script:
- dev/lint-java
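
To reproduce these Travis CI checks locally before opening a pull request, one can run the same two steps by hand; this is a sketch that copies the commands and profile flags verbatim from the configuration above (a local machine will behave differently from the ~2-core Travis VM):

```bash
# Run the same Maven install (Scalastyle runs implicitly) and then Java Checkstyle
export MAVEN_SKIP_RC=1
build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
dev/lint-java
```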
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
3 changes: 2 additions & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
(The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
(The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
(Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
(BSD licence) sbt and sbt-launch-lib.bash
(BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(MIT License) blockUI (http://jquery.malsup.com/block/)
(MIT License) RowsGroup (http://datatables.net/license/mit)
(MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
13 changes: 5 additions & 8 deletions NOTICE
@@ -1,5 +1,5 @@
Apache Spark
Copyright 2014 The Apache Software Foundation.
Copyright 2014 and onwards The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
@@ -12,7 +12,9 @@ Common Development and Distribution License 1.0
The following components are provided under the Common Development and Distribution License 1.0. See project link for details.

(CDDL 1.0) Glassfish Jasper (org.mortbay.jetty:jsp-2.1:6.1.14 - http://jetty.mortbay.org/project/modules/jsp-2.1)
(CDDL 1.0) JAX-RS (https://jax-rs-spec.java.net/)
(CDDL 1.0) Servlet Specification 2.5 API (org.mortbay.jetty:servlet-api-2.5:6.1.14 - http://jetty.mortbay.org/project/modules/servlet-api-2.5)
(CDDL 1.0) (GPL2 w/ CPE) javax.annotation API (https://glassfish.java.net/nonav/public/CDDL+GPL.html)
(COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0) (GNU General Public Library) Streaming API for XML (javax.xml.stream:stax-api:1.0-2 - no url defined)
(Common Development and Distribution License (CDDL) v1.0) JavaBeans Activation Framework (JAF) (javax.activation:activation:1.1 - http://java.sun.com/products/javabeans/jaf/index.jsp)

@@ -22,15 +24,10 @@ Common Development and Distribution License 1.1

The following components are provided under the Common Development and Distribution License 1.1. See project link for details.

(CDDL 1.1) (GPL2 w/ CPE) org.glassfish.hk2 (https://hk2.java.net)
(CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3 (javax.xml.bind:jaxb-api:2.2.2 - https://jaxb.dev.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) JAXB RI (com.sun.xml.bind:jaxb-impl:2.2.3-1 - http://jaxb.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.8 - https://jersey.dev.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.9 - https://jersey.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-guice (com.sun.jersey.contribs:jersey-guice:1.9 - https://jersey.java.net/jersey-contribs/jersey-guice/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.8 - https://jersey.dev.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.9 - https://jersey.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.8 - https://jersey.dev.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.9 - https://jersey.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) Jersey 2 (https://jersey.java.net)

========================================================================
Common Public License 1.0
2 changes: 2 additions & 0 deletions R/.gitignore
@@ -4,3 +4,5 @@
lib
pkg/man
pkg/html
SparkR.Rcheck/
SparkR_*.tar.gz
12 changes: 6 additions & 6 deletions R/DOCUMENTATION.md
@@ -1,12 +1,12 @@
# SparkR Documentation

SparkR documentation is generated using in-source comments annotated using using
`roxygen2`. After making changes to the documentation, to generate man pages,
SparkR documentation is generated using in-source comments annotated with
[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation, to generate man pages,
you can run the following from an R console in the SparkR home directory

library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))

```R
library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))
```
You can verify if your changes are good by running

R CMD check pkg/
32 changes: 18 additions & 14 deletions R/README.md
@@ -1,12 +1,13 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### Installing sparkR

Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
```
build/mvn -DskipTests -Psparkr package

```bash
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR
@@ -37,8 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```
If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```R
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
@@ -55,23 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis

#### Generating documentation

The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/spark-submit <filename> <args>`. For example:

./bin/spark-submit examples/src/main/r/dataframe.R

You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```bash
./bin/spark-submit examples/src/main/r/dataframe.R
```
You can also run the unit tests for SparkR by running the commands below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
```bash
R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```

### Running on YARN

The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
```
```bash
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```
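
For readers who want to combine the YARN submission above with the driver/executor memory options mentioned earlier in this README, here is a hedged illustration; `--driver-memory` and `--executor-memory` are standard `spark-submit` flags, not something introduced by this diff:

```bash
# Hypothetical example: submit the R example to YARN with explicit resource settings
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn \
  --driver-memory 2g \
  --executor-memory 2g \
  examples/src/main/r/dataframe.R
```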
32 changes: 31 additions & 1 deletion R/WINDOWS.md
@@ -4,10 +4,40 @@ To build SparkR on Windows, the following steps are required

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.

2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.

3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.

4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`

5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run

```bash
mvn.cmd -DskipTests -Psparkr package
```

`.\build\mvn` is a shell script so `mvn.cmd` should be used directly on Windows.

## Unit tests

To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not have Apache Hadoop installed already:

1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and `mkdir hadoop`.

2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).

3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.

4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.

5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:

```
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
```

64 changes: 64 additions & 0 deletions R/check-cran.sh
@@ -0,0 +1,64 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -o pipefail
set -e

FWDIR="$(cd `dirname $0`; pwd)"
pushd $FWDIR > /dev/null

if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
# if system wide R_HOME is not found, then exit
if [ ! `command -v R` ]; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`

CRAN_CHECK_OPTIONS="--as-cran"

if [ -n "$NO_TESTS" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests"
fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz

popd > /dev/null
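
A usage sketch for the new check script, based only on the environment variables it reads above (`R_HOME`, `NO_TESTS`, `NO_MANUAL`):

```bash
# Full CRAN check (builds docs, the source package, then runs R CMD check --as-cran)
./R/check-cran.sh

# Skip running tests and building the manual
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh

# Use a specific R installation instead of the system-wide one
R_HOME=/home/username/R ./R/check-cran.sh
```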
30 changes: 28 additions & 2 deletions R/create-docs.sh
@@ -17,17 +17,26 @@
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html

set -o pipefail
set -e

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# Required for setting SPARK_SCALA_VERSION
. "${SPARK_HOME}"/bin/load-spark-env.sh

echo "Using Scala $SPARK_SCALA_VERSION"

pushd $FWDIR

# Install the package (this will also generate the Rd files)
@@ -43,4 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
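
To exercise the updated script end to end, the packages named in its header comment must be installed first; a sketch (the CRAN mirror URL is illustrative):

```bash
# Install the packages the script requires, then build the docs and vignettes
Rscript -e 'install.packages(c("devtools", "knitr", "rmarkdown"), repos="https://cran.r-project.org")'
./R/create-docs.sh
# HTML docs:  $SPARK_HOME/R/pkg/html
# Vignettes:  $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
```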