Soil data has its specificities, but generally speaking we can get away with treating it like a spatial dataset for a significant part of the analysis workflow. This is why in this tutorial, we will present tools to read and process spatial data.

A lot (most?) of soil data is stored as using spatial formats – for better or worse. While the soil profile is a specific case (we can consider it as a set of horizons generated from a spatial point located at the surface of our planet, soil surveys, landsacpe attributes and enviornemental covariates are best represented as spatial datasets.

When reading such data, the main study cases are:

  • Text file formats, such as txt or csv
  • Spatial file formats, such as ESRI Shapefiles or Geopackages
  • Database (spatial or not), such as Postgres, PostGIS, etc.

In this tutorial, we will mostly cover the two first cases.

Packages

In the old days…

The “classic approach” is a collection of 4 packages:

  • sp: provides a set of dedicated classes for the different vector and raster datatypes , and some analysis tools (overlays, etc.),
  • rgdal: input/output library built over GDAL/OGR to read and write spatial data,
  • rgeos: geometry library built over GEOS for all geometry operations (union, intersections, buffer, etc).

The bleeding edge ⚡

Recently (this year), a new package has been introduced to manipulate vector data: sf (for Simple Features). It’s a pretty interesting improvement on sp/rgdal/rgeos, and we will focus on it in this tutorial.

In a few words, here’s what is interesting about sf:

  • It’s providing users with one unique class for all vector types,
  • It’s based on Simple Features, a formal standard (ISO 19125-1:2004) widely used in the GIS world that describes how objects in the real world can be represented,
  • The main class provided by sf is a data.frame – which means that a lot of data analysis methods are readily available,
  • It combines the capabilities of sp, rgdal, and rgeos under one unique package,
  • It is easier to install on some platforms than rgdal,
  • It is much faster, and scales better than sp and rgdal — the upcoming version will include spatial indexing!

Simple Features?

Quick definition

According to the standard:

“A simple feature is defined by the OpenGIS Abstract specification to have both spatial and non-spatial attributes. Spatial attributes are geometry valued, and simple features are based on 2D geometry with linear interpolation between vertices.”

Basically, spatial is not that special anymore: spatial attributes can be described as a geometry, which is just another attribute of the dataset.

Several types of geometry can be implemented using this standard: POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION. There’s also more fancy stuff like CURVEPOLYGON, etc.

Implementation in R

Simple Features are represented using WKT/WKB (Well Known Text/Well Known Binary). It looks like:

## POINT(0 1)
## MULTIPOINT(1 6, 2 7, 3 8, 4 9, 5 10)
## MULTIPOLYGON(((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1), (5 5, 5 6, 6 6, 6 5, 5 5)), ((12 12, 22 12, 22 22, 12 22, 12 12), (13 13, 13 14, 14 14, 14 13, 13 13)), ((24 24, 34 24, 34 34, 24 34, 24 24)))

Simple Features are represented by sf as a data.frame. The spatial attributes of the Simple Features (or geometry) are represenetd using WKT. Technically speaking, there are 3 types of classes implemented by the sf package:

  • sf is the data.frame with spatial attributes,
  • sfc is the column storing the geometries of the different records in the data.frame,
  • sfg is the geometry of each individual record.
Illustration from Edzer Pebesma

Illustration from Edzer Pebesma

The functions provided by sf are prefixed by st_ — similarly to PostGIS. That makes it easy to search for them on the command line too.

Reading and writing spatial datasets

Loading spatial data from text files

It is very often that point data (such as profiles) is exchanged using a text delimited format such as CSV.

Reading text files

CSV and TXT files can be read using the classic read.csv, read.delim, etc., in base R. however, the readr package — from the tidyverse — does provide interesting alternatives that are much faster in read and write.

library(tidyverse)

df <- read_csv('./path/to/my/file.csv')

Converting data.frame to Simple Features

Let’s take the example dataset meuse from the sp package. meuse is a data.frame — similar to what could have been read from a CSV file using read_csv:

data('meuse', package = "sp")
head(meuse)
##        x      y cadmium copper lead zinc  elev       dist   om ffreq soil
## 1 181072 333611    11.7     85  299 1022 7.909 0.00135803 13.6     1    1
## 2 181025 333558     8.6     81  277 1141 6.983 0.01222430 14.0     1    1
## 3 181165 333537     6.5     68  199  640 7.800 0.10302900 13.0     1    1
## 4 181298 333484     2.6     81  116  257 7.655 0.19009400  8.0     1    2
## 5 181307 333330     2.8     48  117  269 7.480 0.27709000  8.7     1    2
## 6 181390 333260     3.0     61  137  281 7.791 0.36406700  7.8     1    2
##   lime landuse dist.m
## 1    1      Ah     50
## 2    1      Ah     30
## 3    1      Ah    150
## 4    0      Ga    270
## 5    0      Ah    380
## 6    0      Ga    470

The easiest way to modifiy this data.frame so that its coordinates (x and y) are used to generate a geometry column is using st_as_sf (“as simple feature”):

ms <- st_as_sf(
  meuse, 
  coords = c('x', 'y'),
  crs = "+init=epsg:28992"
)

ms
## Simple feature collection with 155 features and 12 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 178605 ymin: 329714 xmax: 181390 ymax: 333611
## epsg (SRID):    28992
## proj4string:    +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.4171,50.3319,465.5524,-0.398957388243134,0.343987817378283,-1.87740163998045,4.0725 +units=m +no_defs
## First 20 features:
##    cadmium copper lead zinc  elev       dist   om ffreq soil lime landuse
## 1     11.7     85  299 1022 7.909 0.00135803 13.6     1    1    1      Ah
## 2      8.6     81  277 1141 6.983 0.01222430 14.0     1    1    1      Ah
## 3      6.5     68  199  640 7.800 0.10302900 13.0     1    1    1      Ah
## 4      2.6     81  116  257 7.655 0.19009400  8.0     1    2    0      Ga
## 5      2.8     48  117  269 7.480 0.27709000  8.7     1    2    0      Ah
## 6      3.0     61  137  281 7.791 0.36406700  7.8     1    2    0      Ga
## 7      3.2     31  132  346 8.217 0.19009400  9.2     1    2    0      Ah
## 8      2.8     29  150  406 8.490 0.09215160  9.5     1    1    0      Ab
## 9      2.4     37  133  347 8.668 0.18461400 10.6     1    1    0      Ab
## 10     1.6     24   80  183 9.049 0.30970200  6.3     1    2    0       W
## 11     1.4     25   86  189 9.015 0.31511600  6.4     1    2    0      Fh
## 12     1.8     25   97  251 9.073 0.22812300  9.0     1    1    0      Ag
## 13    11.2     93  285 1096 7.320 0.00000000 15.4     1    1    1       W
## 14     2.5     31  183  504 8.815 0.11393200  8.4     1    1    0      Ah
## 15     2.0     27  130  326 8.937 0.16833600  9.1     1    1    0      Ah
## 16     9.5     86  240 1032 7.702 0.00000000 16.2     1    1    1       W
## 17     7.0     74  133  606 7.160 0.01222430 16.0     1    1    1       W
## 18     7.1     69  148  711 7.100 0.01222430 16.0     1    1    1       W
## 19     8.7     69  207  735 7.020 0.00000000 13.7     1    1    1       W
## 20    12.9     95  284 1052 6.860 0.00000000 14.8     1    1    1    <NA>
##    dist.m             geometry
## 1      50 POINT(181072 333611)
## 2      30 POINT(181025 333558)
## 3     150 POINT(181165 333537)
## 4     270 POINT(181298 333484)
## 5     380 POINT(181307 333330)
## 6     470 POINT(181390 333260)
## 7     240 POINT(181165 333370)
## 8     120 POINT(181027 333363)
## 9     240 POINT(181060 333231)
## 10    420 POINT(181232 333168)
## 11    400 POINT(181191 333115)
## 12    300 POINT(181032 333031)
## 13     20 POINT(180874 333339)
## 14    130 POINT(180969 333252)
## 15    220 POINT(181011 333161)
## 16     10 POINT(180830 333246)
## 17     10 POINT(180763 333104)
## 18     10 POINT(180694 332972)
## 19     10 POINT(180625 332847)
## 20     10 POINT(180555 332707)

There is a simple plotting funtion including in sf. It is very similar to the old sp::spplot:

plot(ms)
## Warning: plotting the first 10 out of 12 attributes; use max.plot = 12 to
## plot all

Loading spatial data formats

A wide selecton of spatial data formats can be read using the st_read command (it’s using GDAL/OGR behind the scenes). Unlike readOGR from the rgdal package, generally the command can guess which driver it should use.

file_name <- system.file("shape/nc.shp", package="sf")
nc <- st_read(file_name)
## Reading layer `nc' from data source `/usr/local/lib/R/site-library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
print(nc)
## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## First 20 features:
##     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188
## 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508
## 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
## 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
## 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286
## 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420
## 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968
## 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612
## 11 0.114     1.352  1838    1838     Caswell 37033  37033       17  1035
## 12 0.153     1.616  1839    1839  Rockingham 37157  37157       79  4449
## 13 0.143     1.663  1840    1840   Granville 37077  37077       39  1671
## 14 0.109     1.325  1841    1841      Person 37145  37145       73  1556
## 15 0.072     1.085  1842    1842       Vance 37181  37181       91  2180
## 16 0.190     2.204  1846    1846     Halifax 37083  37083       42  3608
## 17 0.053     1.171  1848    1848  Pasquotank 37139  37139       70  1638
## 18 0.199     1.984  1874    1874      Wilkes 37193  37193       97  3146
## 19 0.081     1.288  1880    1880     Watauga 37189  37189       95  1323
## 20 0.063     1.000  1881    1881  Perquimans 37143  37143       72   484
##    SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1      1      10  1364     0      19 MULTIPOLYGON(((-81.47275543...
## 2      0      10   542     3      12 MULTIPOLYGON(((-81.23989105...
## 3      5     208  3616     6     260 MULTIPOLYGON(((-80.45634460...
## 4      1     123   830     2     145 MULTIPOLYGON(((-76.00897216...
## 5      9    1066  1606     3    1197 MULTIPOLYGON(((-77.21766662...
## 6      7     954  1838     5    1237 MULTIPOLYGON(((-76.74506378...
## 7      0     115   350     2     139 MULTIPOLYGON(((-76.00897216...
## 8      0     254   594     2     371 MULTIPOLYGON(((-76.56250762...
## 9      4     748  1190     2     844 MULTIPOLYGON(((-78.30876159...
## 10     1     160  2038     5     176 MULTIPOLYGON(((-80.02567291...
## 11     2     550  1253     2     597 MULTIPOLYGON(((-79.53050994...
## 12    16    1243  5386     5    1369 MULTIPOLYGON(((-79.53050994...
## 13     4     930  2074     4    1058 MULTIPOLYGON(((-78.74912261...
## 14     4     613  1790     4     650 MULTIPOLYGON(((-78.80680084...
## 15     4    1179  2753     6    1492 MULTIPOLYGON(((-78.49252319...
## 16    18    2365  4463    17    2980 MULTIPOLYGON(((-77.33220672...
## 17     3     622  2275     4     933 MULTIPOLYGON(((-76.29892730...
## 18     4     200  3725     7     222 MULTIPOLYGON(((-81.02056884...
## 19     1      17  1775     1      33 MULTIPOLYGON(((-81.80622100...
## 20     1     230   676     0     310 MULTIPOLYGON(((-76.48052978...
plot(nc)
## Warning: plotting the first 10 out of 14 attributes; use max.plot = 14 to
## plot all

plot(nc['AREA'])

Its counterpart is st_write and is just as easy to use:

st_write(nc, "nc.shp")
st_write(nc, "nc.shp", delete_layer = TRUE)

# Note that write_st erases existing layers silently:
write_sf(nc, "nc.shp")

The drivers available on your machine can be printed using the st_drivers command:

my_drivers <- st_drivers()
head(my_drivers)
##                          name                                  long_name
## PCIDSK                 PCIDSK                       PCIDSK Database File
## netCDF                 netCDF                 Network Common Data Format
## JP2OpenJPEG       JP2OpenJPEG JPEG-2000 driver based on OpenJPEG library
## PDF                       PDF                             Geospatial PDF
## ESRI Shapefile ESRI Shapefile                             ESRI Shapefile
## MapInfo File     MapInfo File                               MapInfo File
##                write  copy is_raster is_vector
## PCIDSK          TRUE FALSE      TRUE      TRUE
## netCDF          TRUE  TRUE      TRUE      TRUE
## JP2OpenJPEG    FALSE  TRUE      TRUE      TRUE
## PDF             TRUE  TRUE      TRUE      TRUE
## ESRI Shapefile  TRUE FALSE     FALSE      TRUE
## MapInfo File    TRUE FALSE     FALSE      TRUE

The case of databases

GDAL/OGR can read data from databases — so sf does too:

# Using GDAL
meuse <- st_read("PG:dbname=postgis", "meuse")

# Using the Postgres package
library(RPostgreSQL)

# Connect to a PostGIS instance
conn <- dbConnect(PostgreSQL(), dbname = "postgis")

# Use this conection to make a query
x <- st_read_db(conn, "meuse", query = "SELECT * FROM meuse LIMIT 3;")
x <- st_read_db(conn, table = "public.meuse")

print(st_crs(x)) # SRID resolved by the database, not by GDAL!

# Close the DB connection
dbDisconnect(conn)

# Write spatial data as SQLite database
sf::write_sf(nc, 'nc.sqlite')

Manipulating Simple Features

Very roughly, the sf package provides two types of functions:

  1. Spatial data operators (read, write, buffer, intersection, etc), prefixed by st_:
##  [1] "st_agr<-.sf"          "st_agr.sf"            "st_as_sf.sf"         
##  [4] "st_bbox.sf"           "st_boundary.sf"       "st_buffer.sf"        
##  [7] "st_cast.sf"           "st_centroid.sf"       "st_convex_hull.sf"   
## [10] "st_coordinates.sf"    "st_crs<-.sf"          "st_crs.sf"           
## [13] "st_difference.sf"     "st_geometry<-.sf"     "st_geometry.sf"      
## [16] "st_intersection.sf"   "st_is.sf"             "st_line_merge.sf"    
## [19] "st_make_valid.sf"     "st_polygonize.sf"     "st_precision.sf"     
## [22] "st_segmentize.sf"     "st_set_precision.sf"  "st_simplify.sf"      
## [25] "st_split.sf"          "st_sym_difference.sf" "st_transform.sf"     
## [28] "st_triangulate.sf"    "st_union.sf"          "st_voronoi.sf"       
## [31] "st_zm.sf"
  1. A suite of dplyr verbs addapted for the sf data:
##  [1] "aggregate.sf"                "cbind.sf"                   
##  [3] "coerce,oldClass,S3-method"   "coerce,sf,Spatial-method"   
##  [5] "coerce,Spatial,sf-method"    "initialize,oldClass-method" 
##  [7] "merge.sf"                    "plot.sf"                    
##  [9] "print.sf"                    "rbind.sf"                   
## [11] "[.sf"                        "[[<-.sf"                    
## [13] "$<-.sf"                      "show,oldClass-method"       
## [15] "slotsFromS3,oldClass-method"

tidy verbs

sf is very well integrated with dplyr and the tidyverse: remember that Simple Features as implemented by sf are fundamentally just data.frames!

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Example: joining non-spatial attributes
# 
additional_data <- data.frame(
  CNTY_ID = nc$CNTY_ID,
  my_data <- runif(nrow(nc))
)

left_join(nc, additional_data, by = 'CNTY_ID')
## Simple feature collection with 100 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## First 20 features:
##     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188
## 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508
## 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
## 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
## 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286
## 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420
## 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968
## 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612
## 11 0.114     1.352  1838    1838     Caswell 37033  37033       17  1035
## 12 0.153     1.616  1839    1839  Rockingham 37157  37157       79  4449
## 13 0.143     1.663  1840    1840   Granville 37077  37077       39  1671
## 14 0.109     1.325  1841    1841      Person 37145  37145       73  1556
## 15 0.072     1.085  1842    1842       Vance 37181  37181       91  2180
## 16 0.190     2.204  1846    1846     Halifax 37083  37083       42  3608
## 17 0.053     1.171  1848    1848  Pasquotank 37139  37139       70  1638
## 18 0.199     1.984  1874    1874      Wilkes 37193  37193       97  3146
## 19 0.081     1.288  1880    1880     Watauga 37189  37189       95  1323
## 20 0.063     1.000  1881    1881  Perquimans 37143  37143       72   484
##    SID74 NWBIR74 BIR79 SID79 NWBIR79 my_data....runif.nrow.nc..
## 1      1      10  1364     0      19                0.008945796
## 2      0      10   542     3      12                0.293739612
## 3      5     208  3616     6     260                0.277374958
## 4      1     123   830     2     145                0.813574215
## 5      9    1066  1606     3    1197                0.260427771
## 6      7     954  1838     5    1237                0.724405893
## 7      0     115   350     2     139                0.906092151
## 8      0     254   594     2     371                0.949040221
## 9      4     748  1190     2     844                0.073144469
## 10     1     160  2038     5     176                0.754675027
## 11     2     550  1253     2     597                0.286000621
## 12    16    1243  5386     5    1369                0.100053522
## 13     4     930  2074     4    1058                0.954068775
## 14     4     613  1790     4     650                0.415607119
## 15     4    1179  2753     6    1492                0.455102418
## 16    18    2365  4463    17    2980                0.971055656
## 17     3     622  2275     4     933                0.583987980
## 18     4     200  3725     7     222                0.962204624
## 19     1      17  1775     1      33                0.761702403
## 20     1     230   676     0     310                0.714508535
##                          geometry
## 1  MULTIPOLYGON(((-81.47275543...
## 2  MULTIPOLYGON(((-81.23989105...
## 3  MULTIPOLYGON(((-80.45634460...
## 4  MULTIPOLYGON(((-76.00897216...
## 5  MULTIPOLYGON(((-77.21766662...
## 6  MULTIPOLYGON(((-76.74506378...
## 7  MULTIPOLYGON(((-76.00897216...
## 8  MULTIPOLYGON(((-76.56250762...
## 9  MULTIPOLYGON(((-78.30876159...
## 10 MULTIPOLYGON(((-80.02567291...
## 11 MULTIPOLYGON(((-79.53050994...
## 12 MULTIPOLYGON(((-79.53050994...
## 13 MULTIPOLYGON(((-78.74912261...
## 14 MULTIPOLYGON(((-78.80680084...
## 15 MULTIPOLYGON(((-78.49252319...
## 16 MULTIPOLYGON(((-77.33220672...
## 17 MULTIPOLYGON(((-76.29892730...
## 18 MULTIPOLYGON(((-81.02056884...
## 19 MULTIPOLYGON(((-81.80622100...
## 20 MULTIPOLYGON(((-76.48052978...

sf is also providing verbs that are adapted to working with geometries. For example summarise will also operate a spatial union:

nc %>% 
  mutate(
    group = sample(LETTERS[1:3], size = n(), replace = TRUE),
    area = st_area(.), # Note the computation of area on the fly
    sqrt_area = sqrt(area)
  ) %>% 
  group_by(group) %>% 
  summarise(mean_area = mean(sqrt_area)) %>% 
  plot
## Warning in Math.units(area): Operation sqrt not meaningful for units

Projection

The main functions are st_crs, which retrieves the Coordinate Reference System (CRS) of the sf object, and st_transform, which allows to re-project the dataset into a different CRS.

The CRS can be specified as an EPSG number (that is: an integer), or as a character string follwoing the Proj.4 standard.

# Retrieve projection information
st_crs(nc)
## $epsg
## [1] 4267
## 
## $proj4string
## [1] "+proj=longlat +datum=NAD27 +no_defs"
## 
## attr(,"class")
## [1] "crs"
# Test whether data is projected or not
st_is_longlat(nc)
## [1] TRUE
# Projection
# 

# Using a Proj.4 string
nc_p <- st_transform(nc, crs = "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")
# using an EPSG code
nc_p <- st_transform(nc, crs = 32119)

st_crs(nc_p)
## $epsg
## [1] 32119
## 
## $proj4string
## [1] "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## 
## attr(,"class")
## [1] "crs"

Geometry

The sf library wraps the capabilities of the GEOS (Geometry Engine Open Source) library — which is a very powerful piece of software used throughout the GIS industry, notably in PostGIS. The geometry functions in sf are prefixed using st_ to mimic the PostGIS counterpart functions.

# Area of features
st_area(nc)
## Units: m^2
##   [1] 1137388604  611077263 1423489919  694546292 1520740530  967727952
##   [7]  615942210  903650119 1179347051 1232769242 1136287383 1524655447
##  [13] 1427072339 1085967384  718190417 1894045426  524418608 1986966652
##  [19]  812293946  626337436  439717946  640703090  863300187 1276536764
##  [25] 1084141138 1697937846 1109973986 1800646340 1036414044  770546970
##  [31] 1423171283  585232978 1311628320 1225109692  800263351 1186420419
##  [37] 2194592880 1179139746 1550312884  690606074  665152821 1457875905
##  [43] 1340538971 1005743617  988324117 1163892506 2019778925 1810512144
##  [49]  944425951 1350102629 1685154840 1068192109 1691487743 2082117118
##  [55] 1447094476  943881740 2045540782 1420919020  707671880  653368537
##  [61] 1471077596 1436135571 1550981080 1186028306  788535203 1265443511
##  [67] 1829439189 1447873644  918216585 1312723474 1043728456  961860121
##  [73]  781910957 1046066043  986730350  917752685  601326678 1321916259
##  [79] 2437929994  833538942 1210326313 1738550168 1228689799 1648412173
##  [85] 1400582685  995107225 1678077407 2071844302 1228274216  519198288
##  [91] 1784932020  808600209 1978619669 2439553215 1264050755 2288682896
##  [97] 2181174167 2450241815  430695499 2165843695
# Distances between features
st_distance(nc_p[1:3,], nc_p[3:6,])
## Units: m
##          [,1]     [,2]     [,3]     [,4]
## [1,] 25652.53 440561.1 299772.2 361529.5
## [2,]     0.00 409428.5 268944.3 332589.8
## [3,]     0.00 367555.5 227017.5 290297.1
# Sanity check on the geometries
st_is_valid(nc[1:3,])
## [1] TRUE TRUE TRUE

The commands st_intersects, st_disjoint, st_touches, st_crosses, st_within, st_contains, st_overlaps, st_equals, st_covers, st_covered_by, st_equals_exact and st_is_within_distance are logical tests. They return a sparse matrix with matching (TRUE) indexes, or a full logical matrix:

# Intersects
st_intersects(nc_p[1:5,], nc_p[1:4,])
## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] 1 2 3
## 
## [[3]]
## [1] 2 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## integer(0)
st_intersects(nc_p[1:5,], nc_p[1:4,], sparse = FALSE)
##       [,1]  [,2]  [,3]  [,4]
## [1,]  TRUE  TRUE FALSE FALSE
## [2,]  TRUE  TRUE  TRUE FALSE
## [3,] FALSE  TRUE  TRUE FALSE
## [4,] FALSE FALSE FALSE  TRUE
## [5,] FALSE FALSE FALSE FALSE

The commands st_buffer, st_boundary, st_convexhull, st_union_cascaded, st_simplify, st_triangulate, st_polygonize, st_centroid, st_segmentize, and st_union return new geometries, e.g.:

buf <- st_buffer(nc_p[c(1, 2, 14),], dist = 30000)
plot(buf, border = 'red')
## Warning: plotting the first 10 out of 14 attributes; use max.plot = 14 to
## plot all

The st_buffer function computes a buffer outwards or inwards from features. Bonus trick: if you have invalid fatures, try to make a buffer of size zero

fix <- st_buffer(corrupted, dist = 0)

Commands st_intersection, st_union, st_difference, st_sym_difference return new geometries that are a function of pairs of geometries:

# Intersection
i <- st_intersection(nc_p[1:5,], nc_p[1:4,])
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
plot(i)
## Warning: plotting the first 9 out of 28 attributes; use max.plot = 28 to
## plot all

# Union
u <- st_union(nc)
plot(u)

Coercion to and from the sp classes

Note that you can convert sf objects back to either data.frame or one of the old sp classes very easily:

# Convert to data.frame
as.data.frame(nc) %>% head
##    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 2 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 3 0.143     1.630  1828    1828       Surry 37171  37171       86  3188
## 4 0.070     2.968  1831    1831   Currituck 37053  37053       27   508
## 5 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
## 6 0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
##   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1     1      10  1364     0      19 MULTIPOLYGON(((-81.47275543...
## 2     0      10   542     3      12 MULTIPOLYGON(((-81.23989105...
## 3     5     208  3616     6     260 MULTIPOLYGON(((-80.45634460...
## 4     1     123   830     2     145 MULTIPOLYGON(((-76.00897216...
## 5     9    1066  1606     3    1197 MULTIPOLYGON(((-77.21766662...
## 6     7     954  1838     5    1237 MULTIPOLYGON(((-76.74506378...
# convert to one of the sp classes
as(nc, "Spatial") %>% summary
##                   Length                    Class                     Mode 
##                      100 SpatialPolygonsDataFrame                       S4

ggplot2 compatibility

In the upcoming version of ggplot2 (note that I’m running the development version), sf object can be plotted very easily using geom_sf — so the incredible power of this plotting sytem can be takn advantage of:

library(ggplot2)
packageVersion('ggplot2')
## [1] '2.2.1.9000'
# Unprojected data
ggplot(nc) + 
  geom_sf(aes(fill = AREA))

# Facetting (on projected data this time!)

nc_grouped <- nc_p %>% 
  mutate(
    # Creating a (random) group attribute
    group = sample(LETTERS[1:3], size = n(), replace = TRUE)
  ) 

ggplot(nc_grouped) +
  geom_sf(aes(fill = sqrt(AREA))) +
  facet_wrap(~group, ncol = 2)

# Going full fancy
p <- nc_grouped %>% 
  select(-group) %>% 
  ggplot() + 
    geom_sf(colour = "grey70")

p +
  geom_sf(data = nc_grouped, aes(fill = sqrt(AREA))) +
  facet_wrap(~group, ncol = 2) +
  theme_bw()