spark-ml-learn

A translation of the official Spark MLlib documentation, with supplementary notes.

- MLlib: RDD-based API Guide (the RDD-based MLlib, in the org.apache.spark.mllib package)
  - Basic data types
  - Basic statistics
    - Summary statistics
    - Correlations
    - Stratified sampling (not fully understood yet)
    - Hypothesis testing
    - Streaming Significance Testing
    - Random data generation
    - Kernel density estimation (not fully understood yet)
  - Classification and Regression
  - Optimization (developer)
- MLlib: Main Guide (the DataFrame-based MLlib, in the org.apache.spark.ml package)
  - Basic statistics