This helper package makes parallel R code portable: the same code runs on an MPI cluster and on Linux, Mac OS, and Windows machines.
Linux and Mac OS are POSIX compliant, so they can use 'fork()' for parallel tasks, which gives significant speed benefits and simple code. Windows does not provide 'fork()'. Instead, Windows machines can use a PSOCK cluster for parallel computing. This helper package unifies the syntax so that code stays portable and runs in parallel on all platforms.
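The platform split described above can be sketched in a few lines of base R. This relies only on .Platform$OS.type, which is "unix" on Linux and Mac OS and "windows" on Windows. The helper default_cluster_type() is hypothetical and not part of ptools. It only maps each platform to one of the backend types that p_config() accepts.

```r
# Hypothetical helper (not part of ptools): choose a parallel backend by OS.
default_cluster_type <- function() {
  # .Platform$OS.type is "unix" on Linux and Mac OS, "windows" on Windows
  if (.Platform$OS.type == "unix") "FORK" else "PSOCK"
}

cluster_type <- default_cluster_type()
print(cluster_type)
```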
Requires: All of the functionality is provided by the parallel package, which ships with R by default. Look through that package's documentation first; you may decide that ptools is unnecessary for your use case.
Installing
Packages are hosted at bamonroe.com/drat.
Load the library and configure as desired:
Once installed, ptools is loaded like any other package:
library(ptools)
A configuration command must be run before any function is run in parallel:
p_config(type = <type>, host_list = list(), cores = <cores>)
The p_config() function accepts a type argument, either "FORK" or "PSOCK"; a host_list, which is a list named by hostname with elements equal to each core for that hostname; and/or a numeric value for cores.
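For PSOCK clusters, base R's parallel::makeCluster() accepts a character vector of host names with one entry per worker. The sketch below shows one way a named host list could be expanded into that form. It assumes that each element of host_list is the number of workers to start on that host, which is an interpretation and not confirmed ptools behaviour. The hostname "node01" is made up, so only the localhost entries are actually started.

```r
library(parallel)

# Assumption: each element gives the number of workers to start on that host.
host_list <- list(localhost = 2L, node01 = 4L)  # "node01" is a made-up hostname

# makeCluster() wants one host name per worker, so repeat each name per core.
hosts <- rep(names(host_list), times = unlist(host_list, use.names = FALSE))
print(hosts)

# Start only the local workers so the sketch runs on a single machine.
local_hosts <- hosts[hosts == "localhost"]
cl <- makeCluster(local_hosts, type = "PSOCK")
n_workers <- length(cl)
stopCluster(cl)
```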
If the number of cores specified is less than 1, 1 core will be used. If the number of cores specified is greater than the number available, the maximum number available will be used.
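The clamping rule above amounts to bounding the requested value between 1 and the number of available cores. The helper clamp_cores() below is a hypothetical sketch of that rule, not ptools code. It uses parallel::detectCores() for the upper bound and falls back to 1 if that function returns NA, which it can do on some platforms.

```r
# Hypothetical sketch of the core-count rule (not the ptools source).
clamp_cores <- function(cores, available = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; treat that as one core
  if (is.na(available)) available <- 1L
  as.integer(max(1L, min(cores, available)))
}

clamp_cores(0, available = 8)   # returns 1
clamp_cores(4, available = 8)   # returns 4
clamp_cores(64, available = 8)  # returns 8
```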
When utilizing FORK style parallelization, i.e. running code on Unix-like systems, the number of cores can be changed at any point without any issues.
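One reason changing the core count is harmless under FORK is that parallel::mclapply() forks its workers on each call, so the core count is just a per-call argument. A PSOCK cluster, by contrast, is a fixed set of running R processes, and resizing it means stopping it and starting a new one. The sketch below records process IDs to show this. With mc.cores = 1, mclapply() runs in the current R process. The guard is there because mclapply() cannot fork on Windows.

```r
library(parallel)

# Forking is unavailable on Windows, where mclapply() needs mc.cores = 1.
n <- if (.Platform$OS.type == "unix") 2L else 1L

# The same call with different core counts; nothing has to be torn down in between.
pids_one  <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores = 1L))
pids_many <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores = n))

length(unique(pids_one))   # 1: with one core the work stays in this R process
length(unique(pids_many))  # at most n distinct worker processes
```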
Utility Functions:
The package currently provides the following utility wrapper functions:
p_cores()
- returns the number of cores in use.
p_library("libname1", "libname2", ...)
- loads all listed libraries on the host and on all cluster nodes, if any exist.
- for fork() implementations, it can safely replace library()
- all library names need to be quoted, unlike library()
- can take multiple library names at once, unlike library()
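The following base-R sketch approximates what the two utilities above do on a PSOCK cluster. For a cluster, the number of cores in use is the number of worker processes, which is length(cl). Loading libraries means attaching them on the host and then on every worker through clusterCall(). The loop is a hypothetical stand-in for p_library(), not its source. It uses library(..., character.only = TRUE), which is the base-R way to attach a package whose name is held in a character string.

```r
library(parallel)

# A small PSOCK cluster of two local R worker processes.
cl <- makeCluster(2L, type = "PSOCK")

# For a cluster, the cores in use correspond to the worker processes.
n_workers <- length(cl)

# Hypothetical stand-in for p_library(): attach quoted package names on the
# host, then on every worker. character.only = TRUE lets library() take a string.
libs <- c("tools", "compiler")
for (lib in libs) library(lib, character.only = TRUE)
invisible(clusterCall(cl, function(pkgs) {
  for (p in pkgs) library(p, character.only = TRUE)
  NULL
}, libs))

# Ask each worker whether both packages are now on its search path.
attached_on_workers <- unlist(clusterCall(cl, function(pkgs) {
  all(paste0("package:", pkgs) %in% search())
}, libs))

stopCluster(cl)
```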
Cluster Only Functions:
These functions are only useful with cluster-style parallelization. However, they do nothing at all under FORK, which keeps scripts completely portable across platforms. It is therefore best to write code as if objects always need to be exported to workers.
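The "no-op under FORK" idea can be illustrated with base R. Forked children inherit a copy of the parent's memory, so they already see every object. A PSOCK worker is a fresh R process that needs clusterExport(). The helper export_objects() below is hypothetical and mirrors the behaviour described for p_export(). It is not the ptools source.

```r
library(parallel)

# Hypothetical helper: does nothing under FORK, exports under PSOCK.
export_objects <- function(type, cl, names) {
  if (type == "FORK") return(invisible(NULL))  # children inherit parent memory
  clusterExport(cl, names, envir = .GlobalEnv)
  invisible(NULL)
}

offset <- 100L
type <- if (.Platform$OS.type == "unix") "FORK" else "PSOCK"
cl <- if (type == "PSOCK") makeCluster(2L, type = "PSOCK") else NULL

# Written once, correct on both backends.
export_objects(type, cl, "offset")

result <- if (type == "FORK") {
  unlist(mclapply(1:3, function(i) i + offset, mc.cores = 2L))
} else {
  parSapply(cl, 1:3, function(i) i + offset)
}
if (!is.null(cl)) stopCluster(cl)
```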
p_export("obj1", "obj2", ...)
- exports the named objects to all cluster nodes by default.
- EXAMPLE:
var <- 1:10
p_export("var")
- NOTE: this step is unnecessary unless you are using PSOCK or MPI.
p_eval(expression)
- evaluates the expression on every worker
- EXAMPLE:
p_eval(vec <- 1:10)
creates a vector named vec from 1 to 10 on every worker
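In base R, the PSOCK counterpart of this example is parallel::clusterEvalQ(). It evaluates an unevaluated expression in the global environment of every worker and returns one result per worker. The sketch below creates vec on each worker and then uses it there. This is shown as an approximation of the PSOCK case and not as the ptools implementation.

```r
library(parallel)

cl <- makeCluster(2L, type = "PSOCK")

# Evaluate the assignment on every worker; each worker gets its own vec.
invisible(clusterEvalQ(cl, vec <- 1:10))

# vec now exists on the workers and can be used there.
worker_sums <- unlist(clusterEvalQ(cl, sum(vec)))

stopCluster(cl)
```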
c_call(FUN)
- a wrapper for clusterCall. It is not a parallel version of the serial call(), which does something different.
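For comparison, parallel::clusterCall() runs a function once on every worker, passes any extra arguments to it, and returns a list with one element per worker. Base R's call() only constructs an unevaluated call object, which has to be evaluated separately. The sketch below shows both. The function bodies are illustrative only.

```r
library(parallel)

cl <- makeCluster(2L, type = "PSOCK")

# One result per worker: each worker reports its own process ID.
worker_pids <- clusterCall(cl, function() Sys.getpid())
# Extra arguments after the function are passed on to it.
scaled <- clusterCall(cl, function(x, k) x * k, 1:3, 2)

stopCluster(cl)

# By contrast, call() only builds an unevaluated expression.
unevaluated <- call("sum", 1:3)
eval(unevaluated)  # 6
```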
Apply family of functions
p_apply(X, MARGIN, FUN)
- The wrapped apply function. Uses parApply for PSOCK. For forking, an additional wrapper function was written around mclapply, giving a parallel equivalent of apply that the parallel package does not include.
- X, MARGIN, and FUN are exactly as they are in the serial apply.
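A fork-based parallel apply can be built by turning each row or column into a task for mclapply() and then combining the results with simplify2array(). The sketch below follows that approach. fork_apply() is hypothetical, handles only two-dimensional input with MARGIN 1 or 2, and is not the ptools implementation.

```r
library(parallel)

# Hypothetical mclapply-based stand-in for apply() on a matrix.
fork_apply <- function(X, MARGIN, FUN, ..., cores = 2L) {
  stopifnot(length(dim(X)) == 2L, MARGIN %in% c(1, 2))
  # mclapply() cannot fork on Windows, so fall back to one core there
  if (.Platform$OS.type != "unix") cores <- 1L
  idx <- seq_len(dim(X)[MARGIN])
  # One task per row (MARGIN = 1) or per column (MARGIN = 2)
  slice <- if (MARGIN == 1) function(i) X[i, ] else function(i) X[, i]
  res <- mclapply(idx, function(i) FUN(slice(i), ...), mc.cores = cores)
  # Scalar results become a vector; equal-length vectors become matrix columns
  simplify2array(res)
}

m <- matrix(1:6, nrow = 2)
fork_apply(m, 1, sum)    # 9 12, as apply(m, 1, sum)
fork_apply(m, 2, range)  # same as apply(m, 2, range)
```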
p_applyLB(X, MARGIN, FUN)
- The wrapped apply function with load balancing. Uses parApplyLB for PSOCK. For forking, an additional wrapper function was written around mclapply(mc.preschedule = FALSE), giving a parallel equivalent of apply that the parallel package does not include.
- X, MARGIN, and FUN are exactly as they are in the serial apply.
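On a PSOCK cluster, a load-balanced row-wise apply can be assembled from parallel::parLapplyLB(). In recent R versions, this function hands tasks to workers as they become free. psock_row_applyLB() is a hypothetical sketch limited to rows of a matrix. It is not how ptools implements p_applyLB().

```r
library(parallel)

# Hypothetical load-balanced apply() over matrix rows for a PSOCK cluster.
psock_row_applyLB <- function(cl, X, FUN) {
  rows <- lapply(seq_len(nrow(X)), function(i) X[i, ])  # one task per row
  simplify2array(parLapplyLB(cl, rows, FUN))
}

cl <- makeCluster(2L, type = "PSOCK")
m <- matrix(1:6, nrow = 2)
row_sums <- psock_row_applyLB(cl, m, sum)  # matches apply(m, 1, sum)
stopCluster(cl)
```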
p_lapply(X, FUN)
- The wrapped lapply function. Uses mclapply for FORK and parLapply for PSOCK.
- X and FUN are exactly as they are in the serial lapply.
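The dispatch described for p_lapply() can be sketched as a single function that calls mclapply() under FORK and parLapply() under PSOCK. Both return a list just as lapply() does. par_lapply() and its type and cl arguments are hypothetical names used only for this illustration.

```r
library(parallel)

# Hypothetical dispatcher in the spirit of p_lapply().
par_lapply <- function(X, FUN, type, cl = NULL, cores = 2L) {
  if (type == "FORK") {
    mclapply(X, FUN, mc.cores = cores)  # forked workers, created per call
  } else {
    parLapply(cl, X, FUN)               # persistent PSOCK workers
  }
}

type <- if (.Platform$OS.type == "unix") "FORK" else "PSOCK"
cl <- if (type == "PSOCK") makeCluster(2L, type = "PSOCK") else NULL
squares <- par_lapply(1:5, function(x) x^2, type = type, cl = cl)
if (!is.null(cl)) stopCluster(cl)
```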
p_lapplyLB(X, FUN)
- The wrapped lapply function with load balancing. Uses mclapply(mc.preschedule = FALSE) with FORK and parLapplyLB with PSOCK.
- X and FUN are exactly as they are in the serial lapply.
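The FORK side of load balancing comes down to mclapply()'s mc.preschedule argument. With the default TRUE, the input is split into one chunk per core before any work starts. With FALSE, one job is forked per element, with at most mc.cores running at once. This suits tasks whose run times vary a lot. The uneven workload in slow_task() is made up for illustration.

```r
library(parallel)

# Made-up uneven workload: task i sleeps i/50 seconds.
slow_task <- function(i) { Sys.sleep(i / 50); i }
cores <- if (.Platform$OS.type == "unix") 2L else 1L  # no forking on Windows

# Default: X is split into `cores` fixed chunks up front.
prescheduled <- mclapply(1:6, slow_task, mc.cores = cores)
# Load-balanced: one forked job per element, started as slots free up.
balanced <- mclapply(1:6, slow_task, mc.cores = cores, mc.preschedule = FALSE)
```

Both calls return results in the order of the input, so only the scheduling differs.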
p_sapply(X, FUN)
- The wrapped sapply function. Uses parSapply with PSOCK. For FORK, an additional wrapper was written around mclapply, giving a parallel version of sapply that the built-in parallel package does not include.
- X and FUN are exactly as they are in the serial sapply.
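In base R, sapply() is essentially lapply() followed by simplify2array(), with character input reused as names. A fork-based version can therefore wrap mclapply() in the same way. fork_sapply() is a hypothetical sketch of that idea and not the ptools source.

```r
library(parallel)

# Hypothetical sapply() equivalent built on mclapply().
fork_sapply <- function(X, FUN, ..., cores = 2L) {
  if (.Platform$OS.type != "unix") cores <- 1L  # no forking on Windows
  res <- mclapply(X, FUN, ..., mc.cores = cores)
  # Like sapply(USE.NAMES = TRUE): name results after character input
  if (is.character(X) && is.null(names(res))) names(res) <- X
  simplify2array(res, higher = FALSE)
}

fork_sapply(1:4, function(x) x * 10)     # 10 20 30 40
fork_sapply(c("a", "bb", "ccc"), nchar)  # named integer vector: a = 1, bb = 2, ccc = 3
```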
p_sapplyLB(X, FUN)
- The wrapped sapply function with load balancing. Uses parSapplyLB with PSOCK. For FORK, an additional wrapper was written around mclapply(mc.preschedule = FALSE), giving a parallel version of sapply that the built-in parallel package does not include.
- X and FUN are exactly as they are in the serial sapply.
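On the PSOCK side, parallel::parSapplyLB() is the load-balanced counterpart of parSapply(). Both return the same simplified result in input order; they differ only in how tasks are scheduled across workers. The sleep-based workload below is made up to mimic uneven task lengths.

```r
library(parallel)

cl <- makeCluster(2L, type = "PSOCK")

# Made-up uneven workload: larger values take longer.
uneven <- c(5, 1, 4, 1, 3)
work <- function(s) { Sys.sleep(s / 50); s * 2 }

static   <- parSapply(cl, uneven, work)    # fixed split of tasks across workers
balanced <- parSapplyLB(cl, uneven, work)  # tasks handed out as workers free up

stopCluster(cl)
```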