public class HDFSUtils
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | WEKA_LIBRARIES_LOCATION<br>The default location in HDFS to place weka.jar and other libraries for inclusion in the classpath of the Hadoop nodes |
| static java.lang.String | WEKA_TEMP_DISTRIBUTED_CACHE_FILES<br>Staging location for non-library files to be distributed to the nodes by the distributed cache |
| static java.lang.String | WINDOWS_ACCESSING_HADOOP_ON_LINUX_SYS_PROP<br>Users need to set the HADOOP_ON_LINUX environment variable to "true" if accessing a *nix Hadoop cluster from Windows so that we can post-process the job classpath in the Configuration file to use ':' rather than ';' as separators. |
| Constructor and Description |
|---|
| HDFSUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static void | addFilesClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env)<br>Adds a set of files in HDFS to the classpath for Hadoop nodes (via the DistributedCache) |
| static void | addFilesToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env)<br>Adds a set of files to the distributed cache for the supplied Configuration |
| static void | addFileToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env)<br>Adds a file in HDFS to the classpath for Hadoop nodes (via the DistributedCache) |
| static java.lang.String | addFileToDistributedCache(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env)<br>Adds a file to the distributed cache for the supplied Configuration |
| static void | addWekaInstalledFilesToClasspath(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.util.List<java.lang.String> paths, Environment env)<br>Add a list of files, relative to the root of the Weka installation directory in HDFS (i.e. |
| static void | checkForWindowsAccessingHadoopOnLinux(org.apache.hadoop.conf.Configuration conf)<br>If accessing Hadoop running on a *nix system from Windows then we have to post-process the classpath setup for the job because it will contain ';' rather than ':' as the separator. |
| static void | copyFilesToWekaHDFSInstallationDirectory(java.util.List<java.lang.String> localFiles, HDFSConfig config, Environment env, boolean overwrite)<br>Copy a set of local files into the Weka installation directory in HDFS |
| static void | copyToHDFS(java.lang.String localFile, java.lang.String hdfsPath, HDFSConfig config, Environment env, boolean overwrite)<br>Copy a local file into HDFS |
| static void | deleteDirectory(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env)<br>Delete a directory in HDFS |
| static void | deleteFile(HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String path, Environment env)<br>Delete a file in HDFS |
| static void | main(java.lang.String[] args) |
| static void | moveInHDFS(java.lang.String source, java.lang.String target, HDFSConfig config, Environment env)<br>Move a file from one location to another in HDFS |
| static java.lang.String | resolvePath(java.lang.String path, Environment env)<br>Utility method to resolve all environment variables in a given path |
| static void | serializeObjectToDistributedCache(java.lang.Object toSerialize, HDFSConfig hdfsConfig, org.apache.hadoop.conf.Configuration conf, java.lang.String fileNameInCache, Environment env)<br>Serializes the given object into a file in the staging area in HDFS and then adds that file to the distributed cache for the configuration |
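The summary for resolvePath says it resolves all environment variables in a given path. As a rough illustration of that kind of substitution, the stdlib-only sketch below replaces `${NAME}` placeholders using a plain map. The `${NAME}` placeholder syntax, the class name `PathResolverSketch`, the use of a `Map` in place of an Environment, and the decision to leave unknown variables untouched are all assumptions made for illustration; this is not the HDFSUtils implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for variable resolution in a path (not the Weka code).
final class PathResolverSketch {

    // Matches placeholders of the assumed form ${NAME}.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${NAME} in path with vars.get(NAME), if present. */
    static String resolve(String path, Map<String, String> vars) {
        Matcher m = VAR.matcher(path);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption of this sketch: unknown variables are left as-is.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("user.name", "alice");
        System.out.println(resolve("/users/${user.name}/weka", vars));
    }
}
```

Leaving unknown placeholders in place keeps the sketch total; a real resolver might instead report the missing variable as an error.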
public static final java.lang.String WEKA_LIBRARIES_LOCATION
public static final java.lang.String WINDOWS_ACCESSING_HADOOP_ON_LINUX_SYS_PROP
public static final java.lang.String WEKA_TEMP_DISTRIBUTED_CACHE_FILES
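The description of WINDOWS_ACCESSING_HADOOP_ON_LINUX_SYS_PROP mentions post-processing the job classpath so that it uses ':' rather than ';' as the separator. The sketch below shows only that string rewrite, gated on a "true" flag. The class `ClasspathSeparatorSketch`, the method `fixClasspath`, and the way the flag is passed in are hypothetical; the source does not say which Configuration property holds the classpath, so the sketch works on a plain string.

```java
// Hypothetical illustration of the ';' to ':' classpath rewrite (not the Weka code).
final class ClasspathSeparatorSketch {

    /**
     * Returns the classpath with ';' replaced by ':' when the flag is "true"
     * (case-insensitive); otherwise returns the classpath unchanged.
     */
    static String fixClasspath(String classpath, String hadoopOnLinuxFlag) {
        if (classpath == null || !"true".equalsIgnoreCase(hadoopOnLinuxFlag)) {
            return classpath;
        }
        return classpath.replace(';', ':');
    }

    public static void main(String[] args) {
        // Example paths are made up for this sketch.
        String fromWindowsClient = "/opt/weka/weka.jar;/opt/weka/lib/extra.jar";
        System.out.println(fixClasspath(fromWindowsClient, "true"));
    }
}
```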
public static java.lang.String resolvePath(java.lang.String path,
Environment env)
Parameters:
path - the path in HDFS
env - environment variables to use

public static void moveInHDFS(java.lang.String source,
java.lang.String target,
HDFSConfig config,
Environment env)
throws java.io.IOException
Parameters:
source - the source path in HDFS
target - the target path in HDFS
config - the HDFSConfig with connection details
env - environment variables
Throws:
java.io.IOException - if a problem occurs

public static void copyToHDFS(java.lang.String localFile,
java.lang.String hdfsPath,
HDFSConfig config,
Environment env,
boolean overwrite)
throws java.io.IOException
Parameters:
localFile - the path to the local file
hdfsPath - the destination path in HDFS
config - the HDFSConfig containing connection details
env - environment variables
overwrite - true if the destination should be overwritten (if it already exists)
Throws:
java.io.IOException - if a problem occurs

public static void copyFilesToWekaHDFSInstallationDirectory(java.util.List<java.lang.String> localFiles,
HDFSConfig config,
Environment env,
boolean overwrite)
throws java.io.IOException
Parameters:
localFiles - a list of local files to copy
config - the HDFSConfig containing connection details
env - environment variables
overwrite - true if the destination file should be overwritten (if it exists already)
Throws:
java.io.IOException - if a problem occurs

public static void checkForWindowsAccessingHadoopOnLinux(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - the Configuration to fix up.

public static void addFileToClasspath(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.lang.String path,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig - the HDFSConfig object with host and port set
conf - the Configuration object that will be changed by this operation
path - the path to the file (in HDFS) to be added to the classpath for Hadoop nodes
env - any environment variables
Throws:
java.io.IOException - if a problem occurs

public static void addFilesClasspath(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.util.List<java.lang.String> paths,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig - the HDFSConfig object with host and port set
conf - the Configuration object that will be changed by this operation
paths - a list of paths (in HDFS) to be added to the classpath for Hadoop nodes
env - any environment variables
Throws:
java.io.IOException - if a problem occurs

public static void addWekaInstalledFilesToClasspath(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.util.List<java.lang.String> paths,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig -
conf -
paths - a list of paths (relative to the Weka installation root in HDFS) to add to the classpath for mappers and reducers
env -
Throws:
java.io.IOException

public static void deleteDirectory(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.lang.String path,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig - the HDFSConfig to use with connection details set
conf - the Configuration object
path - the path to delete
env - environment variables
Throws:
java.io.IOException - if a problem occurs

public static void deleteFile(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.lang.String path,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig - the HDFSConfig to use with connection details set
conf - the Configuration object
path - the path to delete
env - environment variables
Throws:
java.io.IOException - if a problem occurs

public static void serializeObjectToDistributedCache(java.lang.Object toSerialize,
HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.lang.String fileNameInCache,
Environment env)
throws java.io.IOException
Parameters:
toSerialize - the object to serialize
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
fileNameInCache - the file name only for the serialized object in the cache
env - environment variables
Throws:
java.io.IOException - if a problem occurs

public static java.lang.String addFileToDistributedCache(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.lang.String path,
Environment env)
throws java.io.IOException
Parameters:
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
path - the path to the file to add. This can be a local file, in which case it is first staged in HDFS, or a file in HDFS.
env - environment variables
Throws:
java.io.IOException - if a problem occurs

public static void addFilesToDistributedCache(HDFSConfig hdfsConfig,
org.apache.hadoop.conf.Configuration conf,
java.util.List<java.lang.String> paths,
Environment env)
throws java.lang.Exception
Parameters:
hdfsConfig - the hdfs configuration to use
conf - the job configuration to configure
paths - a list of paths to add to the distributed cache. These can be local files, in which case they are first staged in HDFS, or files in HDFS, or a mixture of both.
env - environment variables from the distributed cache to a client via standard Java file IO)
Throws:
java.io.IOException - if a problem occurs
java.lang.Exception

public static void main(java.lang.String[] args)
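
serializeObjectToDistributedCache is described above as serializing an object into a file in the staging area and then adding that file to the distributed cache. The sketch below illustrates only the first half, using standard Java serialization and a local temporary directory in place of the HDFS staging area. The class `SerializeToStagingSketch`, its method names, and the file name `model.ser` are hypothetical; nothing here touches HDFS or the distributed cache, and it is not the HDFSUtils implementation.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical local stand-in for "serialize to a staging file" (not the Weka code).
final class SerializeToStagingSketch {

    /** Writes the object to stagingDir/fileName and returns the file's path. */
    static Path serializeToStaging(Object toSerialize, Path stagingDir, String fileName)
            throws IOException {
        Files.createDirectories(stagingDir);
        Path target = stagingDir.resolve(fileName);
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(target))) {
            oos.writeObject(toSerialize);
        }
        return target;
    }

    /** Reads a previously serialized object back, as a consumer of the file would. */
    static Object readBack(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A local temp directory plays the role of the staging area in this sketch.
        Path staging = Files.createTempDirectory("staging");
        Object payload = new ArrayList<>(Arrays.asList("a", "b"));
        Path file = serializeToStaging(payload, staging, "model.ser");
        System.out.println(readBack(file));
    }
}
```

Passing only a file name, rather than a full path, mirrors the fileNameInCache parameter, which is documented as the file name only for the serialized object in the cache.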