An implementation of stats::cor(), which returns a correlation data frame rather than a matrix. See details below. Additional adjustment include the use of pairwise deletion by default.
Usage
correlate(
  x,
  y = NULL,
  use = "pairwise.complete.obs",
  method = "pearson",
  diagonal = NA,
  quiet = FALSE
)Arguments
- x
- a numeric vector, matrix or data frame. 
- y
- NULL(default) or a vector, matrix or data frame with compatible dimensions to- x. The default is equivalent to- y = x(but more efficient).
- use
- an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings - "everything",- "all.obs",- "complete.obs",- "na.or.complete", or- "pairwise.complete.obs".
- method
- a character string indicating which correlation coefficient (or covariance) is to be computed. One of - "pearson"(default),- "kendall", or- "spearman": can be abbreviated.
- diagonal
- Value (typically numeric or NA) to set the diagonal to 
- quiet
- Set as TRUE to suppress message about - methodand- useparameters.
Details
This function returns a correlation matrix as a correlation data frame in the following format:
- A tibble (see - tibble)
- An additional class, "cor_df" 
- A "term" column 
- Standardized variances (the matrix diagonal) set to missing values by default ( - NA) so they can be ignored in calculations.
The use argument and its possible values are inherited from stats::cor():
- "everything": NAs will propagate conceptually, i.e. a resulting value will be NA whenever one of its contributing observations is NA 
- "all.obs": the presence of missing observations will produce an error 
- "complete.obs": correlations will be computed from complete observations, with an error being raised if there are no complete cases. 
- "na.or.complete": correlations will be computed from complete observations, returning an NA if there are no complete cases. 
- "pairwise.complete.obs": the correlation between each pair of variables is computed using all complete pairs of those particular variables. 
As of version 0.4.3, the first column of a cor_df object is named "term".
In previous versions this first column was named "rowname".
There is a ggplot2::autoplot() method for quickly visualizing the
correlation matrix, for more information see autoplot.cor_df().
Examples
if (FALSE) {
correlate(iris)
}
correlate(iris[-5])
#> Correlation computed with
#> • Method: 'pearson'
#> • Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 4 × 5
#>   term         Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <chr>               <dbl>       <dbl>        <dbl>       <dbl>
#> 1 Sepal.Length       NA          -0.118        0.872       0.818
#> 2 Sepal.Width        -0.118      NA           -0.428      -0.366
#> 3 Petal.Length        0.872      -0.428       NA           0.963
#> 4 Petal.Width         0.818      -0.366        0.963      NA    
correlate(mtcars)
#> Correlation computed with
#> • Method: 'pearson'
#> • Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 11 × 12
#>    term     mpg    cyl   disp     hp    drat     wt    qsec     vs      am
#>    <chr>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
#>  1 mpg   NA     -0.852 -0.848 -0.776  0.681  -0.868  0.419   0.664  0.600 
#>  2 cyl   -0.852 NA      0.902  0.832 -0.700   0.782 -0.591  -0.811 -0.523 
#>  3 disp  -0.848  0.902 NA      0.791 -0.710   0.888 -0.434  -0.710 -0.591 
#>  4 hp    -0.776  0.832  0.791 NA     -0.449   0.659 -0.708  -0.723 -0.243 
#>  5 drat   0.681 -0.700 -0.710 -0.449 NA      -0.712  0.0912  0.440  0.713 
#>  6 wt    -0.868  0.782  0.888  0.659 -0.712  NA     -0.175  -0.555 -0.692 
#>  7 qsec   0.419 -0.591 -0.434 -0.708  0.0912 -0.175 NA       0.745 -0.230 
#>  8 vs     0.664 -0.811 -0.710 -0.723  0.440  -0.555  0.745  NA      0.168 
#>  9 am     0.600 -0.523 -0.591 -0.243  0.713  -0.692 -0.230   0.168 NA     
#> 10 gear   0.480 -0.493 -0.556 -0.126  0.700  -0.583 -0.213   0.206  0.794 
#> 11 carb  -0.551  0.527  0.395  0.750 -0.0908  0.428 -0.656  -0.570  0.0575
#> # … with 2 more variables: gear <dbl>, carb <dbl>
#> # ℹ Use `colnames()` to see all variable names
if (FALSE) {
# Also supports DB backend and collects results into memory
library(sparklyr)
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)
mtcars_tbl %>%
  correlate(use = "pairwise.complete.obs", method = "spearman")
spark_disconnect(sc)
}
