[UNMAINTAINED] See ptools for new package
This repo is archived. You can view files and clone it, but cannot push or open issues/pull-requests.
 
 
This package makes parallel code portable: the same code runs on an MPI cluster and, via the appropriate backend, on Linux, macOS, and Windows.

Linux and macOS are POSIX-compliant and can therefore use 'fork()' to run parallel tasks with significant speed benefits and little extra code; this feature does not exist on Windows. Instead, Windows machines can use a PSOCK cluster for parallel computing. The commands for a PSOCK cluster are similar to those for an MPI cluster, but differ in syntax and in some arguments. This gives three ways to do parallel computing, each with its own syntax: fork on Linux or macOS, PSOCK on Windows, and MPI on a cluster. This package unifies the syntax so the same code is portable, and suitably parallel, across all platforms.

#### Requires:

The Rhpc package is necessary to run the code with MPI; it is used in place of the standard Rmpi and snow packages because it supports long vectors. The version patched by me is required, but CRAN maintains the official Rhpc release, which contains documentation and links to the maintainer. The remaining functionality is provided by the parallel package, which ships with R by default.

Please take a look through the CRAN High-Performance and Parallel Computing task view to get an overview of all the possible ways to use high performance computing with R. Some of the functionality of these packages may be incorporated at a later date... if I find myself using them regularly.

Installing

To install the package, you'll need the devtools package developed by Hadley Wickham.

install.packages("devtools")

Next, install the package directly from GitHub

library("devtools")
install_github("bamonroe/ctools")

And you should be good to go!

Load the library and configure as desired:

With the package installed, it is loaded like any other package:

library("ctools")

Loading the package automatically creates a PSOCK cluster on Windows, or an MPI cluster if MPI is detected. On Unix-like systems, e.g. macOS and Linux, FORK-style parallelization is used instead. Under all methods, the default configuration uses the maximum number of CPUs available to R.
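The selection logic described above can be sketched in plain R. This is not the ctools implementation; in particular, the MPI check via an OpenMPI environment variable is only an illustrative assumption:

```r
# Illustrative backend selection, mirroring the description above.
# NOT the ctools internals, just a sketch of the decision.
backend <- if (nzchar(Sys.getenv("OMPI_COMM_WORLD_SIZE"))) {
  "MPI"    # launched under mpiexec, so an MPI cluster is available
} else if (.Platform$OS.type == "windows") {
  "PSOCK"  # Windows has no fork(); fall back to a socket cluster
} else {
  "FORK"   # Unix-alikes (Linux, macOS) can fork()
}
print(backend)
```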

c.config(<integer>)

The c.config() function accepts a numerical argument and attempts to set the number of cores available to R equal to the argument. If the number of cores specified is less than 1, 1 core will be used. If the number of cores specified is greater than the number available, the maximum number available will be used.

Note that this function does nothing for MPI clusters; the user should specify the number of CPUs or "slots" available to R in the machine file passed to mpiexec. Note also that changing the number of cores for a PSOCK cluster, i.e. when running code on Windows, will close the existing cluster and start a new one with the specified number of cores. Thus, any objects exported to the original cluster will have to be re-exported. With FORK-style parallelization, i.e. on Unix-like systems, the number of cores can be changed at any point without any issues.
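The clamping behaviour can be sketched as follows. `clamp_cores` is a hypothetical helper written for illustration, not the actual ctools internals:

```r
# Hypothetical sketch of how c.config() clamps its argument;
# not the actual ctools implementation.
clamp_cores <- function(requested, available = parallel::detectCores()) {
  max(1L, min(as.integer(requested), available))
}
clamp_cores(0,  available = 8)  # fewer than 1 requested -> uses 1
clamp_cores(4,  available = 8)  # within range           -> uses 4
clamp_cores(99, available = 8)  # more than available    -> uses 8
```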

Utility Functions:

The package currently provides the following wrapper functions:

c.cores()
  • returns the number of cores in use.
c.library("libname1","libname2",...)
  • loads all listed libraries on the host and, if they exist, on all cluster nodes.
  • for fork() implementations, it can safely replace library()
  • all library names must be quoted, unlike with library()
  • can take multiple library names at once, unlike library()

Cluster Only Functions:

These functions are only useful with cluster-style parallelization. However, they do absolutely nothing under FORK, which keeps scripts completely portable across platforms. So write your code as if you need to export objects to workers.

c.export("obj1","obj2",..., push = TRUE, clear = FALSE )

Export the named objects to all cluster nodes by default. If push = FALSE, the object names are added to a list of objects to be exported by a subsequent call to c.export with push = TRUE, along with any other objects named in that call. If clear = TRUE, the stored export list is cleared before anything else is done.

  • EXAMPLE 1:

    var <- 1:10
    c.export("var")
    
  • EXAMPLE 2:

    var <- 1:10
    c.export("var", push = F) # Adds 'var' to list of objects to be exported later
    
  • NOTE: This process is unnecessary if you are not using PSOCK or MPI.

c.eval(expression)
  • evaluates the expression on every worker

  • EXAMPLE:

    c.eval( vec <- 1:10 )
    
    This creates a vector named vec containing 1 to 10 on every worker.

c.call(FUN)
  • a wrapper for Rhpc_worker_call and clusterCall. It is not a wrapper for the serial function call(), which does something different.
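For example, assuming the package is loaded and a cluster backend is active (the return shape follows clusterCall: a list with one element per worker):

```r
library("ctools")
# Ask every worker for its process id; returns a list with one
# element per worker (per clusterCall / Rhpc_worker_call semantics).
pids <- c.call(function() Sys.getpid())
```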

Apply family of functions

c.apply(X,MARGIN,FUN)
  • The wrapped apply function. Utilizes Rhpc_apply with MPI, parLapply when forking is not available. An additional wrapper function was written around mclapply for forking that results in a parallel equivalent of the apply function not included in the parallel package.
  • X, MARGIN, and FUN are exactly as they are in the serial version apply.
c.applyLB(X,MARGIN,FUN)
  • The wrapped apply function with load balancing. Utilizes Rhpc_applyLB with MPI, parApplyLB when forking is not available. An additional wrapper function was written around mclapply(preschedule = FALSE) for forking that results in a load-balanced parallel equivalent of the apply function not included in the parallel package.
  • X, MARGIN, and FUN are exactly as they are in the serial version apply.
c.lapply(X,FUN)
  • The wrapped lapply function. Utilizes Rhpc_lapply with MPI, mclapply when forking is available, parLapply when forking is not available.
  • X and FUN are exactly as they are in the serial version lapply.
c.lapplyLB(X,FUN)
  • The wrapped lapply function with load balancing. Utilizes Rhpc_lapplyLB with MPI, mclapply(preschedule = FALSE) when forking is available, parLapplyLB when forking is not available.
  • X and FUN are exactly as they are in the serial version lapply.
c.sapply(X,FUN)
  • The wrapped sapply function. Utilizes Rhpc_sapply with MPI, parLapply when forking is not available. An additional wrapper was written around mclapply that results in a parallel version of sapply not included in the built-in parallel package.
  • X and FUN are exactly as they are in the serial version sapply.
c.sapplyLB(X,FUN)
  • The wrapped sapply function with load balancing. Utilizes Rhpc_sapplyLB with MPI, parLapplyLB when forking is not available. An additional wrapper was written around mclapply(preschedule = FALSE) that results in a parallel version of sapply not included in the built-in parallel package.
  • X and FUN are exactly as they are in the serial version sapply.
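Putting the pieces together, a portable script might look like the sketch below. It assumes the package installs and loads successfully; the worker function and data are invented for illustration, while the ctools calls are those documented above:

```r
library("ctools")       # picks FORK, PSOCK, or MPI automatically at load
c.config(4)             # request 4 cores (clamped to what is available)
c.library("stats")      # load a package on the host and all workers

# Invented example workload: a small simulation per input value.
slow_fit <- function(i) {
  mean(rnorm(1e4, mean = i))
}

inputs <- 1:100
c.export("slow_fit")                  # no-op under FORK; needed for PSOCK/MPI
res <- c.lapplyLB(inputs, slow_fit)   # load-balanced parallel lapply
```

The same script runs unchanged on a laptop under FORK or PSOCK and on an MPI cluster, which is the portability the package is built around.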