ggplot2.SparkR

Welcome to ggplot2.SparkR


Overview

ggplot2.SparkR is an R package for scalable visualization of big data stored in Spark DataFrames.

It extends the original ggplot2 package and handles both R data.frames and Spark DataFrames without any changes to the original API.

Installation

SparkR Installation

Build Spark

Build Spark with Maven and include the -Psparkr profile to build the R package. For example, to use the default Hadoop version, run:

  build/mvn -DskipTests -Psparkr package

Using SparkR from RStudio

To use SparkR from RStudio or another R frontend, set the environment variables that point SparkR to your Spark installation. For example:

# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")

More details are available at https://github.com/apache/spark/tree/master/R

ggplot2.SparkR Installation

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("PAPL-SKKU/ggplot2.SparkR")

Getting Started
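
The following is a minimal sketch of a first plot, not an official example. It assumes the Spark 1.x SparkR session API used in the installation steps above (sparkR.init, sparkRSQL.init, createDataFrame), that ggplot2.SparkR exports the usual ggplot(), aes() and geom_bar() functions, and that geom_bar() is among the layers it supports for Spark DataFrames. The SPARK_HOME path is a placeholder, and mtcars is used only because it ships with base R.

```r
# Point R at the Spark installation (placeholder path; adjust to your setup)
Sys.setenv(SPARK_HOME = "/path/to/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)
library(ggplot2.SparkR)

# Start a local Spark context and an SQL context (Spark 1.x style API)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Convert a small local data.frame into a Spark DataFrame
df <- createDataFrame(sqlContext, mtcars)

# Assumed usage: the same ggplot2 call for a Spark DataFrame...
ggplot(df, aes(x = cyl)) + geom_bar()

# ...and for an ordinary R data.frame
ggplot(mtcars, aes(x = cyl)) + geom_bar()

# Stop the Spark context when finished
sparkR.stop()
```

If the package behaves as the overview describes, both ggplot() calls should produce the same bar chart, with the aggregation for the first one done in Spark rather than in local R memory.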

Mailing List

If you have any problems or questions, please post them to our Google Group, ggplot2.SparkR,

or send an email to ggplot2-sparkr@googlegroups.com

You must be a member to post messages, but anyone can read the archived discussions.