Overview
ggplot2.SparkR is an R package for scalable visualization of big data stored in Spark DataFrames.
It extends the original ggplot2 package and handles both R data.frame and Spark DataFrame inputs with no modifications to the original API.
Installation
SparkR Installation
Build Spark
Build Spark with Maven and include the -Psparkr profile to build the R package. For example, to use the default Hadoop versions you can run:
build/mvn -DskipTests -Psparkr package
Using SparkR from RStudio
If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
More details are available at https://github.com/apache/spark/tree/master/R
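The setup above only works if SPARK_HOME points at a Spark build that includes the R package. As a minimal sketch, the check below fails early with a readable message when that is not the case. The helper name sparkr_lib_path is hypothetical (it is not part of SparkR), and it assumes the built package lives in R/lib/SparkR under the Spark directory, as the .libPaths() call above implies.

```r
# Hypothetical helper, not part of SparkR: returns the library directory
# that should contain the SparkR package for a given Spark installation.
sparkr_lib_path <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  if (!nzchar(spark_home)) {
    stop("SPARK_HOME is not set; point it at your Spark installation")
  }
  lib_dir <- file.path(spark_home, "R", "lib")
  # Assumption: a build with the sparkr profile places the package here.
  if (!dir.exists(file.path(lib_dir, "SparkR"))) {
    stop("SparkR not found under ", lib_dir,
         "; was Spark built with the sparkr profile?")
  }
  lib_dir
}

# Usage (commented out so the sketch runs without a Spark installation):
# .libPaths(c(sparkr_lib_path(), .libPaths()))
# library(SparkR)
```

Calling the helper before library(SparkR) turns a missing or incomplete build into an explicit error rather than a generic "there is no package called SparkR" message.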
ggplot2.SparkR Installation
Get the development version from GitHub:
# install.packages("devtools")
devtools::install_github("PAPL-SKKU/ggplot2.SparkR")
Getting Started
Mailing List
If you have any problems or questions, please post them to our Google group, ggplot2.SparkR, or send an email to ggplot2-sparkr@googlegroups.com.
You must be a member to post messages, but anyone can read the archived discussions.