public class HDFSUtils
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static java.lang.String | WEKA_LIBRARIES_LOCATION - The default location in HDFS to place weka.jar and other libraries for inclusion in the classpath of the hadoop nodes |
static java.lang.String | WEKA_TEMP_DISTRIBUTED_CACHE_FILES - Staging location for non library files to be distributed to the nodes by the distributed cache |
static java.lang.String | WINDOWS_ACCESSING_HADOOP_ON_LINUX_SYS_PROP - Users need to set HADOOP_ON_LINUX environment variable to "true" if accessing a *nix Hadoop cluster from Windows so that we can post-process the job classpath in the Configuration file to use ':' rather than ';' as separators. |
Constructor and Description |
---|
HDFSUtils() |
Modifier and Type | Method and Description |
---|---|
static void | addFilesClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) - Adds a set of files in HDFS to the classpath for hadoop nodes (via the DistributedCache) |
static void | addFilesToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) - Adds a set of files to the distributed cache for the supplied Configuration |
static void | addFileToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) - Adds a file in HDFS to the classpath for hadoop nodes (via the DistributedCache) |
static java.lang.String | addFileToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) - Adds a file to the distributed cache for the supplied Configuration |
static void | addWekaInstalledFilesToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) - Add a list of files, relative to the root of the Weka installation directory in HDFS (i.e. |
static void | checkForWindowsAccessingHadoopOnLinux(org.apache.hadoop.conf.Configuration conf) - If accessing Hadoop running on a *nix system from Windows then we have to post-process the classpath setup for the job because it will contain ';' rather than ':' as the separator. |
static void | copyFilesToWekaHDFSInstallationDirectory(java.util.List<java.lang.String> localFiles, HDFSConfig config, Environment env, boolean overwrite) - Copy a set of local files into the Weka installation directory in HDFS |
static void | copyToHDFS(java.lang.String localFile, java.lang.String hdfsPath, HDFSConfig config, Environment env, boolean overwrite) - Copy a local file into HDFS |
static void | deleteDirectory(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) - Delete a directory in HDFS |
static void | deleteFile(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) - Delete a file in HDFS |
static void | main(java.lang.String[] args) |
static void | moveInHDFS(java.lang.String source, java.lang.String target, HDFSConfig config, Environment env) - Move a file from one location to another in HDFS |
static java.lang.String | resolvePath(java.lang.String path, Environment env) - Utility method to resolve all environment variables in a given path |
static void | serializeObjectToDistributedCache(java.lang.Object toSerialize, HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String fileNameInCache, Environment env) - Serializes the given object into a file in the staging area in HDFS and then adds that file to the distributed cache for the configuration |
public static final java.lang.String WEKA_LIBRARIES_LOCATION
public static final java.lang.String WINDOWS_ACCESSING_HADOOP_ON_LINUX_SYS_PROP
public static final java.lang.String WEKA_TEMP_DISTRIBUTED_CACHE_FILES
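The method summary describes resolvePath as resolving all environment variables in a given path. As a rough illustration of that kind of substitution, the sketch below replaces `${name}` placeholders using a plain map. The class `PathResolutionSketch`, its `resolve` method, the `${name}` placeholder syntax, and the rule that unknown variables are left unchanged are all assumptions made for this example; it does not use the Environment class and is not the HDFSUtils implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of HDFSUtils.
class PathResolutionSketch {

  // Matches placeholders of the assumed form ${name}
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  /** Replaces every ${name} in path with vars.get(name), if present. */
  static String resolve(String path, Map<String, String> vars) {
    Matcher m = VAR.matcher(path);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = vars.get(m.group(1));
      // Assumption of this sketch: unknown variables are left as they are
      String replacement = (value != null) ? value : m.group(0);
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> vars = new HashMap<>();
    vars.put("base", "/user/weka");
    vars.put("name", "iris");
    // prints /user/weka/data/iris.arff
    System.out.println(resolve("${base}/data/${name}.arff", vars));
  }
}
```

Leaving unknown placeholders in place makes a missing variable visible in the resulting path rather than silently producing an empty segment.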
public static java.lang.String resolvePath(java.lang.String path, Environment env)
path - the path in HDFS
env - environment variables to use

public static void moveInHDFS(java.lang.String source, java.lang.String target, HDFSConfig config, Environment env) throws java.io.IOException
source - the source path in HDFS
target - the target path in HDFS
config - the HDFSConfig with connection details
env - environment variables
java.io.IOException - if a problem occurs

public static void copyToHDFS(java.lang.String localFile, java.lang.String hdfsPath, HDFSConfig config, Environment env, boolean overwrite) throws java.io.IOException
localFile - the path to the local file
hdfsPath - the destination path in HDFS
config - the HDFSConfig containing connection details
env - environment variables
overwrite - true if the destination should be overwritten (if it already exists)
java.io.IOException - if a problem occurs

public static void copyFilesToWekaHDFSInstallationDirectory(java.util.List<java.lang.String> localFiles, HDFSConfig config, Environment env, boolean overwrite) throws java.io.IOException
localFiles - a list of local files to copy
config - the HDFSConfig containing connection details
env - environment variables
overwrite - true if the destination file should be overwritten (if it exists already)
java.io.IOException - if a problem occurs

public static void checkForWindowsAccessingHadoopOnLinux(org.apache.hadoop.conf.Configuration conf)
conf - the Configuration to fix up.

public static void addFileToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) throws java.io.IOException
hdfsConfig - the HDFSConfig object with host and port set
conf - the Configuration object that will be changed by this operation
path - the path to the file (in HDFS) to be added to the classpath for hadoop nodes
env - any environment variables
java.io.IOException - if a problem occurs

public static void addFilesClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) throws java.io.IOException
hdfsConfig - the HDFSConfig object with host and port set
conf - the Configuration object that will be changed by this operation
paths - a list of paths (in HDFS) to be added to the classpath for hadoop nodes
env - any environment variables
java.io.IOException - if a problem occurs

public static void addWekaInstalledFilesToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) throws java.io.IOException
hdfsConfig -
conf -
paths - a list of paths (relative to the Weka installation root in HDFS) to add to the classpath for mappers and reducers
env -
java.io.IOException
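The description of checkForWindowsAccessingHadoopOnLinux says that a job classpath built on Windows contains ';' where a *nix cluster expects ':'. The sketch below shows only that string rewrite, under stated assumptions: the class `ClasspathSeparatorSketch` and its methods are hypothetical, the flag is passed in as a plain string rather than read from HADOOP_ON_LINUX, and classpath entries are assumed to be HDFS paths that contain neither ';' nor a Windows drive letter. How HDFSUtils reads and writes the classpath in the Configuration is not shown here.

```java
// Hypothetical helper for illustration only; not part of HDFSUtils.
class ClasspathSeparatorSketch {

  /** Rewrites a ';'-separated classpath so that it uses ':' instead. */
  static String toUnixSeparators(String classpath) {
    if (classpath == null || classpath.isEmpty()) {
      return classpath;
    }
    // Assumes no entry legitimately contains ';'
    return classpath.replace(';', ':');
  }

  /**
   * Applies the rewrite only when the flag value is exactly "true",
   * mirroring the documented HADOOP_ON_LINUX setting. Treating any
   * other value as "do nothing" is an assumption of this sketch.
   */
  static String fixIfRequested(String classpath, String hadoopOnLinuxFlag) {
    return "true".equals(hadoopOnLinuxFlag) ? toUnixSeparators(classpath) : classpath;
  }

  public static void main(String[] args) {
    String fromWindows = "/opt/weka/weka.jar;/opt/weka/lib/extra.jar";
    // prints /opt/weka/weka.jar:/opt/weka/lib/extra.jar
    System.out.println(fixIfRequested(fromWindows, "true"));
  }
}
```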
public static void deleteDirectory(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) throws java.io.IOException
hdfsConfig - the HDFSConfig to use with connection details set
conf - the Configuration object
path - the path to delete
env - environment variables
java.io.IOException - if a problem occurs

public static void deleteFile(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) throws java.io.IOException
hdfsConfig - the HDFSConfig to use with connection details set
conf - the Configuration object
path - the path to delete
env - environment variables
java.io.IOException - if a problem occurs

public static void serializeObjectToDistributedCache(java.lang.Object toSerialize, HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String fileNameInCache, Environment env) throws java.io.IOException
toSerialize - the object to serialize
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
fileNameInCache - the file name only for the serialized object in the cache
env - environment variables
java.io.IOException - if a problem occurs

public static java.lang.String addFileToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env) throws java.io.IOException
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
path - the path to the file to add. This can be a local file, in which case it is first staged in HDFS, or a file in HDFS.
env - environment variables
java.io.IOException - if a problem occurs

public static void addFilesToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env) throws java.lang.Exception
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
paths - a list of paths to add to the distributed cache. These can be local files, in which case they are first staged in HDFS, or files in HDFS, or a mixture of both.
env - environment variables
java.io.IOException - if a problem occurs
java.lang.Exception
public static void main(java.lang.String[] args)
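
serializeObjectToDistributedCache is described above as serializing an object into a file in the HDFS staging area before adding that file to the distributed cache. The sketch below illustrates only the serialization step with standard Java object streams, using a local temporary file as a stand-in for the staging location. The class `SerializationStagingSketch` and its methods are hypothetical, and whether HDFSUtils uses these exact stream classes is an assumption; no HDFS or distributed cache call is made.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of HDFSUtils.
class SerializationStagingSketch {

  /** Writes a Serializable object to the given file. */
  static void writeObject(Object toSerialize, Path target) throws IOException {
    try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(Files.newOutputStream(target)))) {
      oos.writeObject(toSerialize);
    }
  }

  /** Reads back an object previously written by writeObject. */
  static Object readObject(Path source) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(Files.newInputStream(source)))) {
      return ois.readObject();
    }
  }

  /** Stages the object in a local temp file, reads it back, then cleans up. */
  static Object roundTrip(Object toSerialize) {
    Path staged = null;
    try {
      staged = Files.createTempFile("staged-object", ".ser");
      writeObject(toSerialize, staged);
      return readObject(staged);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    } finally {
      if (staged != null) {
        try {
          Files.deleteIfExists(staged);
        } catch (IOException ignored) {
          // Best-effort cleanup of the temporary file
        }
      }
    }
  }

  public static void main(String[] args) {
    Object restored = roundTrip(new ArrayList<>(Arrays.asList("weka", "hadoop")));
    // prints [weka, hadoop]
    System.out.println(restored);
  }
}
```

A task reading such a file would need the serialized class available on its classpath, which is the kind of setup the classpath methods in this class are documented to provide.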