This helper script makes parallel code portable between an MPI cluster computer and ordinary machines, allowing the same code to run parallel functions on Linux, Mac OS, and Windows.

Linux and Mac OS are POSIX compliant and can therefore use fork() to run parallel tasks, with significant speed benefits and ease of coding; this feature does not exist on Windows. Instead, Windows machines can use a PSOCK cluster to do parallel computing. The commands for a PSOCK cluster are similar to those for an MPI cluster, but differ in syntax and in some arguments. Thus we have three ways to do parallel computing, with different syntax on a Linux or Mac OS machine vs. a Windows machine vs. an MPI cluster. This helper script unifies the syntax so the same code is portable, and suitably parallel, across all platforms.
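For a concrete sense of the divergence this script papers over, here is a minimal sketch in base R of the FORK and PSOCK routes (the MPI route via Rhpc is omitted). The helper lets you write a single call instead of this branch:

```r
library(parallel)

square <- function(x) x^2

if (.Platform$OS.type == "windows") {
  # Windows has no fork(): start a PSOCK cluster of socket-connected workers
  cl <- makePSOCKcluster(2)
  res <- parLapply(cl, 1:4, square)
  stopCluster(cl)
} else {
  # POSIX systems: fork() hands each worker a copy of the parent's memory
  res <- mclapply(1:4, square, mc.cores = 2)
}
unlist(res)  # 1 4 9 16
```

Note that on the PSOCK route the workers start with empty environments, which is why objects must be exported to them; forked workers inherit everything automatically.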
The Rhpc package is necessary to run the code with MPI; it is used in place of the standard snow packages because it supports long vectors. The version patched by me is required, but CRAN maintains the official Rhpc release, which contains documentation and links to the maintainer.
The remaining functionality is provided by the parallel package,
which is shipped with R by default.
Please take a look through the CRAN High-Performance and Parallel Computing task view to get an overview of all the possible ways to use high performance computing with R. Some of the functionality of these packages may be incorporated at a later date...if I find myself using them regularly.
To install the package, you'll need the devtools package developed by Hadley Wickham.
Next, install the package directly from GitHub
And you should be good to go!
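The two steps above can be run from an R console; the GitHub repository path below is a placeholder, since the actual user/repo slug does not appear in this README:

```r
# Step 1: install devtools from CRAN
install.packages("devtools")

# Step 2: install this package straight from GitHub
# ("user/repo" is a placeholder for the actual repository path)
devtools::install_github("user/repo")
```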
With the package installed, it should be loaded as any other package would:
Loading the package into the namespace automatically creates a PSOCK cluster if run on Windows, and an MPI cluster if MPI is detected. On a Unix-like machine (e.g. OS X or Linux), FORK style parallelization is used. Under all methods, the default configuration utilizes the maximum number of CPUs available to R.
The c.config() function accepts a numerical argument and attempts to set the number of cores available to R equal to that argument.
If the number of cores specified is less than 1, 1 core will be used.
If the number of cores specified is greater than the number available, the maximum number available will be used.
Note that this function does nothing for MPI clusters; the user should specify the number of CPUs or "slots" available to R in the machine file parsed by the MPI launcher.
Note also that changing the number of cores for a PSOCK style cluster, i.e. when running code on Windows, will close the existing cluster and start a new one with the specified number of cores.
Thus, any objects exported to the original cluster will have to be re-exported.
When utilizing FORK style parallelization, i.e. running code on Unix-like systems, the number of cores can be changed at any point without any issues.
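A hypothetical session illustrating the behaviour described above (c.config() is this package's function, and the core counts are made up for illustration):

```r
library(parallel)
detectCores()  # suppose this reports 8

c.config(4)    # R now uses 4 cores
c.config(0)    # fewer than 1 requested: 1 core is used
c.config(999)  # more than available: capped at 8
# On Windows (PSOCK) each call replaces the running cluster, so objects
# previously sent with c.export() must be re-exported afterwards.
```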
The script comes with 7 wrapper functions as of right now:
These functions are only useful when using cluster style parallelization. While using FORK, these commands do absolutely nothing, which allows the script to be completely portable across platforms. So always code as if you need to export objects to workers.
c.export("obj1", "obj2", ..., push = TRUE, clear = FALSE)

Exports the named objects to all cluster nodes by default. With push = FALSE, the object names are instead added to a list of objects to be exported by a subsequent call with push = TRUE, along with any other objects named in that call. With clear = TRUE, the names in this stored export list are cleared before anything else is done.
var <- 1:10
c.export("var")               # Exports 'var' to all nodes immediately

var <- 1:10
c.export("var", push = FALSE) # Adds 'var' to list of objects to be exported later
NOTE: This process is unnecessary if you are not using PSOCK or MPI, but it is harmless under FORK, so leaving it in keeps the code portable.
c.eval( expr )

Evaluates the expression on every worker. For example, c.eval( vec <- 1:10 ) creates a vector named vec from 1 to 10 on every worker. This is a wrapper around clusterCall; it is not a wrapper for a serial call, which does not do the same thing.
- A parallel apply: uses parLapply when forking is not available. An additional wrapper function was written around mclapply for forking that results in a parallel equivalent of the apply function, which is not included in the parallel package.
- A load-balanced parallel apply: uses parApplyLB when forking is not available. An additional wrapper function was written around mclapply(preschedule = FALSE) for forking that results in a load-balanced parallel equivalent of the apply function, which is not included in the parallel package.
- A parallel lapply: uses mclapply when forking is available, parLapply when forking is not available.
- A load-balanced parallel lapply: uses mclapply(preschedule = FALSE) when forking is available, parLapplyLB when forking is not available.
- A parallel sapply: uses parLapply when forking is not available. An additional wrapper was written around mclapply that results in a parallel version of sapply, which is not included in the built-in parallel package.
- A load-balanced parallel sapply: uses parLapplyLB when forking is not available. An additional wrapper was written around mclapply(preschedule = FALSE) that results in a load-balanced parallel version of sapply, which is not included in the built-in parallel package.
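As a rough illustration of what the load-balanced variants unify, the two underlying base-R routes can be compared directly. This sketch uses only the parallel package, not this package's wrappers; note that the actual argument name in mclapply is mc.preschedule:

```r
library(parallel)

slow_square <- function(x) x^2

# Cluster route (works everywhere, including Windows):
# parLapplyLB hands tasks out dynamically as workers free up
cl <- makePSOCKcluster(2)
lb_res <- parLapplyLB(cl, 1:4, slow_square)
stopCluster(cl)

# Forking route (Unix-alikes): mc.preschedule = FALSE dispatches jobs
# one at a time instead of pre-splitting them across workers
fork_res <- mclapply(1:4, slow_square, mc.cores = 1, mc.preschedule = FALSE)

identical(lb_res, fork_res)  # TRUE
```

Load balancing pays off when individual tasks take very different amounts of time; for uniform tasks, prescheduling has lower overhead.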