An implementation of stats::cor(), which returns a correlation data frame rather than a matrix. See details below. Additional adjustment include the use of pairwise deletion by default.

correlate(
  x,
  y = NULL,
  use = "pairwise.complete.obs",
  method = "pearson",
  diagonal = NA,
  quiet = FALSE
)

Arguments

x

a numeric vector, matrix or data frame.

y

NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).

use

an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".

method

a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.

diagonal

Value (typically numeric or NA) to set the diagonal to

quiet

Set as TRUE to suppress message about method and use parameters.

Value

A correlation data frame cor_df

Details

  • A tibble (see tibble)

  • An additional class, "cor_df"

  • A "term" column

  • Standardized variances (the matrix diagonal) set to missing values by default (NA) so they can be ignored in calculations.

As of version 0.4.3, the first column of a cor_df object is named "term". In previous versions this first column was named "rowname".

Examples

if (FALSE) { correlate(iris) } correlate(iris[-5])
#> #> Correlation method: 'pearson' #> Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 4 x 5 #> term Sepal.Length Sepal.Width Petal.Length Petal.Width #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 Sepal.Length NA -0.118 0.872 0.818 #> 2 Sepal.Width -0.118 NA -0.428 -0.366 #> 3 Petal.Length 0.872 -0.428 NA 0.963 #> 4 Petal.Width 0.818 -0.366 0.963 NA
correlate(mtcars)
#> #> Correlation method: 'pearson' #> Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 11 x 12 #> term mpg cyl disp hp drat wt qsec vs am #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 mpg NA -0.852 -0.848 -0.776 0.681 -0.868 0.419 0.664 0.600 #> 2 cyl -0.852 NA 0.902 0.832 -0.700 0.782 -0.591 -0.811 -0.523 #> 3 disp -0.848 0.902 NA 0.791 -0.710 0.888 -0.434 -0.710 -0.591 #> 4 hp -0.776 0.832 0.791 NA -0.449 0.659 -0.708 -0.723 -0.243 #> 5 drat 0.681 -0.700 -0.710 -0.449 NA -0.712 0.0912 0.440 0.713 #> 6 wt -0.868 0.782 0.888 0.659 -0.712 NA -0.175 -0.555 -0.692 #> 7 qsec 0.419 -0.591 -0.434 -0.708 0.0912 -0.175 NA 0.745 -0.230 #> 8 vs 0.664 -0.811 -0.710 -0.723 0.440 -0.555 0.745 NA 0.168 #> 9 am 0.600 -0.523 -0.591 -0.243 0.713 -0.692 -0.230 0.168 NA #> 10 gear 0.480 -0.493 -0.556 -0.126 0.700 -0.583 -0.213 0.206 0.794 #> 11 carb -0.551 0.527 0.395 0.750 -0.0908 0.428 -0.656 -0.570 0.0575 #> # … with 2 more variables: gear <dbl>, carb <dbl>
if (FALSE) { # Also supports DB backend and collects results into memory library(sparklyr) sc <- spark_connect(master = "local") mtcars_tbl <- copy_to(sc, mtcars) mtcars_tbl %>% correlate(use = "pairwise.complete.obs", method = "spearman") spark_disconnect(sc) }