Author:Mark Hall <mhall{[at]}pentaho.com>
Category:Distributed
Changes:Fixed a bug in the conversion of sparse instances to sparse Vectors.
Date:2019-03-11
Depends:weka (>=3.9.1), distributedWekaBase (>=1.1.17)
Description:Development continuation of distributedWekaSpark that is compatible with Spark 2.x libraries. For Spark 1.6 compatibility, use distributedWekaSparkDev instead. Adds access to source files (CSV, Parquet and Avro) via Spark DataFrames, and adds support for both desktop and distributed execution of MLlib algorithms. Provides Spark wrappers for the classes in distributedWekaBase. Includes generic Spark 2.4.0 libraries. To run against Hadoop/HDFS, it is necessary to delete all the libraries, except those listed in the wekaSpark.props file, in ${user.home}/wekafiles/distributedWekaSpark2Dev/lib and copy in the libraries from the 'jars' directory of your Spark 2.x distribution.
License:GPL 3
Maintainer:Mark Hall <mhall{[at]}pentaho.com>
MessageToDisplayOnInstall:Includes generic Spark 2.4.0 libraries, which are sufficient for running in local mode on the local filesystem out of the box. To run against Hadoop/HDFS, it is necessary to delete all the libraries, except those listed in the wekaSpark.props file, in ${user.home}/wekafiles/distributedWekaSpark2Dev/lib and copy in the libraries from the 'jars' directory of your Spark 2.x distribution.
PackageURL:http://downloads.sourceforge.net/weka/weka-packages/distributedWekaSpark2Dev1.1.4.zip
Precludes:distributedWekaSpark,distributedWekaSparkDev
URL:http://markahall.blogspot.co.nz/2017/07/integrating-spark-mllib-into-weka.html
Version:1.1.4