Brian Albert Monroe 940ffe486d
fix: update syntax for exists function
3 months ago

README.md

This helper package makes code portable to an MPI cluster computer and allows the same code to be used for parallel functions on Linux, Mac OS, and Windows.

Linux and Mac OS are POSIX compliant and can therefore use 'fork()' to run parallel tasks, with significant speed benefits and ease of coding. This feature does not exist on Windows. Instead, Windows machines can use a PSOCK cluster for parallel computing. This helper package unifies the syntax so that the same code is portable and suitably parallel across all platforms.
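The platform split described above can be sketched with the parallel package alone. This is a minimal illustration and not the ptools source: `portable_lapply` is a hypothetical helper name, and the two-core default is an arbitrary assumption.

```r
library(parallel)

# Hypothetical helper (not part of ptools): fork on Unix-like systems,
# fall back to a PSOCK cluster elsewhere (e.g. Windows).
portable_lapply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "unix") {
    # fork()-based: workers inherit the parent's objects automatically
    mclapply(X, FUN, mc.cores = cores)
  } else {
    # socket-based: start fresh R worker processes, stop them when done
    cl <- makeCluster(cores, type = "PSOCK")
    on.exit(stopCluster(cl))
    parLapply(cl, X, FUN)
  }
}

squares <- portable_lapply(1:4, function(i) i^2)
```

The caller writes one line regardless of platform, which is the kind of portability the package aims for.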

Requires:

The entire functionality is provided by the parallel package, which ships with R by default. Take a look through that documentation; you may decide that ptools is unnecessary for your use case.

Installing

Packages are hosted at bamonroe.com/drat.

Load the library and configure as desired:

With the package installed, it should be loaded as any other package would:

library(ptools)

A configuration command needs to be run before any functions are run in parallel:

p_config(type = <type>, host_list = list(), cores = <cores>)

The p_config() function accepts a type argument, either "FORK" or "PSOCK"; a host_list, which is a list named by hostname with elements equal to each core for the hostname; and/or a numeric value for cores.

If the number of cores specified is less than 1, 1 core will be used. If the number of cores specified is greater than the number available, the maximum number available will be used.
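The clamping rule above can be sketched as follows. `clamp_cores` is a hypothetical helper, not a ptools function, and the sketch assumes that "the number available" means what `parallel::detectCores()` reports.

```r
library(parallel)

# Hypothetical helper: keep a requested core count within [1, available].
clamp_cores <- function(requested, available = detectCores()) {
  # detectCores() can return NA on some platforms; assume one core then
  if (is.na(available)) available <- 1L
  as.integer(max(1, min(requested, available)))
}

clamp_cores(0, available = 4)  # 1: fewer than 1 requested
clamp_cores(8, available = 4)  # 4: more than available requested
clamp_cores(2, available = 4)  # 2: request honoured as-is
```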

When utilizing FORK style parallelization, i.e. running code on Unix-like systems, the number of cores can be changed at any point without any issues.

Utility Functions:

The package currently comes with the following wrapper functions:

p_cores()
  • returns the number of cores in use.
p_library("libname1", "libname2", ...)
  • loads all listed libraries on the host and on all cluster nodes, if any exist.
  • for fork() implementations, it can safely replace library()
  • all library names need to be quoted, unlike library()
  • can take multiple library names at once, unlike library()
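Loading quoted library names on the host and on every cluster node can be sketched with the parallel package as below. This is an illustration under assumptions, not the ptools implementation: `attach_pkgs` and `load_everywhere` are hypothetical names, and the cluster object is passed explicitly here, whereas ptools manages it through p_config().

```r
library(parallel)

# Hypothetical helper: attach packages given as character strings.
attach_pkgs <- function(pkgs) {
  for (pkg in pkgs) library(pkg, character.only = TRUE)
  invisible(NULL)
}

# Hypothetical helper: attach on the host, then on every worker.
load_everywhere <- function(cl, ...) {
  pkgs <- c(...)
  attach_pkgs(pkgs)
  clusterCall(cl, attach_pkgs, pkgs)
  invisible(pkgs)
}

cl <- makeCluster(2, type = "PSOCK")
load_everywhere(cl, "tools", "splines")

# Check on each worker that the packages are now on the search path
loaded <- clusterCall(cl, function() {
  all(c("package:tools", "package:splines") %in% search())
})
stopCluster(cl)
```

Quoting the names is what makes it possible to pass them to the workers as plain data, which is why the names must be strings here, unlike in library().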

Cluster Only Functions:

These functions are only useful when using cluster-style parallelization. However, they do absolutely nothing under FORK, which keeps a script completely portable across platforms. So always code as if you need to export objects to workers.

p_export("obj1", "obj2", ...)

Export the named objects to all cluster nodes by default.

  • EXAMPLE:
   var <- 1:10
   p_export("var")
  • NOTE: This process is unnecessary if you are not using PSOCK or MPI.
p_eval(expression)
  • evaluates the expression on every worker
  • EXAMPLE:
    p_eval(vec <- 1:10)

Creates a vector named vec, from 1 to 10, on every worker.

c_call(FUN)
  • a wrapper for clusterCall. It is not a parallel version of the serial call(), which does something different.
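For reference, the three cluster-only operations correspond to the following calls in the parallel package. This sketch uses parallel directly on an explicitly created two-worker PSOCK cluster (the worker count is an arbitrary assumption); it shows the underlying idea and is not the ptools code.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Export: copy `var` from the calling environment to every worker
var <- 1:10
clusterExport(cl, "var", envir = environment())

# Eval: run an expression on every worker; `vec` now exists on each one
clusterEvalQ(cl, vec <- var * 2)

# Call: run the same function once on every worker, collect the results
sums <- clusterCall(cl, function() sum(vec))

stopCluster(cl)
```

Under fork(), none of these steps is needed because forked workers inherit the parent's objects, which is why the ptools equivalents can safely do nothing in that mode.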

Apply family of functions

p_apply(X, MARGIN, FUN)
  • The wrapped apply function. Utilizes parApply for PSOCK. For FORK, an additional wrapper around mclapply provides a parallel equivalent of apply, which the parallel package does not include.
  • X, MARGIN, and FUN are exactly as they are in the serial version apply.
p_applyLB(X, MARGIN, FUN)
  • The wrapped apply function with load balancing. Utilizes parApplyLB for PSOCK. For FORK, an additional wrapper around mclapply(mc.preschedule = FALSE) provides a parallel equivalent of apply, which the parallel package does not include.
  • X, MARGIN, and FUN are exactly as they are in the serial version apply.
p_lapply(X, FUN)
  • The wrapped lapply function. Utilizes mclapply for FORK, parLapply for PSOCK.
  • X and FUN are exactly as they are in the serial version lapply.
p_lapplyLB(X, FUN)
  • The wrapped lapply function with load balancing. Utilizes mclapply(mc.preschedule = FALSE) with FORK, parLapplyLB with PSOCK.
  • X and FUN are exactly as they are in the serial version lapply.
p_sapply(X, FUN)
  • The wrapped sapply function. Utilizes parSapply with PSOCK. For FORK, an additional wrapper around mclapply provides a parallel version of sapply, which the built-in parallel package does not include.
  • X and FUN are exactly as they are in the serial version sapply.
p_sapplyLB(X, FUN)
  • The wrapped sapply function with load balancing. Utilizes parSapplyLB with PSOCK. For FORK, an additional wrapper around mclapply(mc.preschedule = FALSE) provides a parallel version of sapply, which the built-in parallel package does not include.
  • X and FUN are exactly as they are in the serial version sapply.
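The fork-side wrappers described above can be approximated as follows. This is a rough sketch under assumptions, not the ptools source: `mc_sapply_sketch` and `mc_apply_sketch` are hypothetical names, the apply version only handles matrices with MARGIN 1 or 2, and names are not preserved the way sapply() preserves them.

```r
library(parallel)

# mclapply() relies on fork(); use a single core where fork() is unavailable
cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Hypothetical sapply-style wrapper around mclapply().
# balance = TRUE sets mc.preschedule = FALSE, so tasks are handed out
# one at a time instead of being split into fixed chunks up front.
mc_sapply_sketch <- function(X, FUN, balance = FALSE) {
  pieces <- mclapply(X, FUN, mc.cores = cores, mc.preschedule = !balance)
  simplify2array(pieces)
}

# Hypothetical apply-style wrapper for matrices (MARGIN 1 = rows, 2 = cols).
mc_apply_sketch <- function(X, MARGIN, FUN) {
  stopifnot(is.matrix(X), MARGIN %in% c(1, 2))
  idx <- seq_len(dim(X)[MARGIN])
  pieces <- mclapply(idx, function(i) {
    if (MARGIN == 1) FUN(X[i, ]) else FUN(X[, i])
  }, mc.cores = cores)
  simplify2array(pieces)
}

squares <- mc_sapply_sketch(1:5, function(i) i^2, balance = TRUE)
m <- matrix(1:6, nrow = 2)
row_sums <- mc_apply_sketch(m, 1, sum)
```

Load balancing tends to pay off when individual tasks take very different amounts of time; with uniform tasks, the prescheduled default usually has less overhead.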