For this module you need to have the following libraries installed and loaded:

Data file needed is RSpatialDataTypes.zip.

## 1. Conceptualizing a spatial vector objects in R

In vector GIS we deal with, points, lines, and polygons, like so:

### Exercise 1

Discuss with your neighbor: What information do we need to store in order to define points, lines, polygons in geographic space?

# - lat/lon coordinates
# - projection
# - what type (point/line/poly)
# - if polygon, is it a hole or not
# - attribute data
# * ... ?

There are currently two main approaches in R to handle geographic vector data.

### The sp package

The first package to provide classes and methods for spatial data types in R is called sp1. Development of the sp package began in the early 2000s in an attempt to standardize how spatial data would be treated in R and to allow for better interoperability between different analysis packages that use spatial data. The package (first release on CRAN in 2005) provides classes and methods to create points, lines, polygons, and grids and to operate on them. About 350 of the spatial analysis packages use the spatial data types that are implemented in sp i.e. they “depend” on the sp package and many more are indirectly dependent.

The foundational structure for any spatial object in sp is the Spatial class. It has two “slots” (new-style S4 class objects in R have pre-defined components called slots):

• a bounding box

• a CRS class object to define the Coordinate Reference System

This basic structure is then extended, depending on the characteristics of the spatial object (point, line, polygon).

To build up a spatial object in sp we could follow these steps:

I. Create geometric objects (topology)

Points (which may have 2 or 3 dimensions) are the most basic spatial data objects. They are generated out of either a single coordinate or a set of coordinates, like a two-column matrix or a dataframe with a column for latitude and one for longitude.
Lines are generated out of Line objects. A Line object is a spaghetti collection of 2D coordinates2 and is generated out of a two-column matrix or a dataframe with a column for latitude and one for longitude. A Lines object is a list of one or more Line objects, for example all the contours at a single elevation.
Polygons are generated out of Polygon objects. A Polygon object is a spaghetti collection of 2D coordinates with equal first and last coordinates and is generated out of a two-column matrix or a dataframe with a column for latitude and one for longitude. A Polygons object is a list of one or more Polygon objects, for example islands belonging to the same country.

See here for a very simple example for how to create a Line object:

ln <- Line(matrix(runif(6), ncol=2))
str(ln)
Formal class 'Line' [package "sp"] with 1 slot
..@ coords: num [1:3, 1:2] 0.248 0.993 0.536 0.772 0.67 ...

See here for a very simple example for how to create a Lines object:

lns <- Lines(list(ln), ID = "a") # this contains just one Line!
str(lns)
Formal class 'Lines' [package "sp"] with 2 slots
..@ Lines:List of 1
.. ..$:Formal class 'Line' [package "sp"] with 1 slot .. .. .. ..@ coords: num [1:3, 1:2] 0.248 0.993 0.536 0.772 0.67 ... ..@ ID : chr "a" 1. Create spatial objects Spatial* object (* stands for Points, Lines, or Polygons). This step adds the bounding box (automatically) and the slot for the Coordinate Reference System or CRS (which needs to be filled with a value manually). SpatialPoints can be directly generated out of the coordinates. SpatialLines and SpatialPolygons objects are generated using lists of Lines or Polygons objects respectively (more below). See here for how to create a SpatialLines object: sp_lns <- SpatialLines(list(lns)) str(sp_lns) Formal class 'SpatialLines' [package "sp"] with 3 slots ..@ lines :List of 1 .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
.. .. .. ..@ Lines:List of 1
.. .. .. .. ..$:Formal class 'Line' [package "sp"] with 1 slot .. .. .. .. .. .. ..@ coords: num [1:3, 1:2] 0.248 0.993 0.536 0.772 0.67 ... .. .. .. ..@ ID : chr "a" ..@ bbox : num [1:2, 1:2] 0.248 0.261 0.993 0.772 .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:2] "x" "y"
.. .. ..$: chr [1:2] "min" "max" ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot .. .. ..@ projargs: chr NA 1. Add attributes (Optional:) Add a data frame with attribute data, which will turn your Spatial* object into a Spatial*DataFrame object. The points in a SpatialPoints object may be associated with a row of attributes to create a SpatialPointsDataFrame object. The coordinates and attributes may, but do not have to be keyed to each other using ID values. SpatialLinesDataFrame and SpatialPolygonsDataFrame objects are defined using SpatialLines and SpatialPolygons objects and data frames. The ID fields are here required to match the data frame row names. See here for how to create a SpatialLinesDataframe: dfr <- data.frame(id = "a", use = "road", cars_per_hour = 10) # note how we use the ID from above! sp_lns_dfr <- SpatialLinesDataFrame(sp_lns, dfr, match.ID = "id") str(sp_lns_dfr) Formal class 'SpatialLinesDataFrame' [package "sp"] with 4 slots ..@ data :'data.frame': 1 obs. of 3 variables: .. ..$ id           : Factor w/ 1 level "a": 1
.. ..$use : Factor w/ 1 level "road": 1 .. ..$ cars_per_hour: num 10
..@ lines      :List of 1
.. ..$:Formal class 'Lines' [package "sp"] with 2 slots .. .. .. ..@ Lines:List of 1 .. .. .. .. ..$ :Formal class 'Line' [package "sp"] with 1 slot
.. .. .. .. .. .. ..@ coords: num [1:3, 1:2] 0.248 0.993 0.536 0.772 0.67 ...
.. .. .. ..@ ID   : chr "a"
..@ bbox       : num [1:2, 1:2] 0.248 0.261 0.993 0.772
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$: chr [1:2] "x" "y" .. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA

A number of spatial methods are available for the classes in sp. Among the ones I use more frequently are:

function and what it does
bbox() returns the bounding box coordinates
proj4string() sets or retrieves projection attributes using the CRS object.
CRS() creates an object of class of coordinate reference system arguments
spplot() plots a separate map of all the attributes unless specified otherwise
coordinates() set or retrieve the spatial coordinates. For spatial polygons it returns the centroids.
over(a, b) used for example to retrieve the polygon or grid indices on a set of points
spsample() sampling of spatial points within the spatial extent of objects

### The sf package

The second package, first released on CRAN in late October 2016, is called sf3. It implements a formal standard called “Simple Features” that specifies a storage and access model of spatial geometries (point, line, polygon). A feature geometry is called simple when it consists of points connected by straight line pieces, and does not intersect itself. This standard has been adopted widely, not only by spatial databases such as PostGIS, but also more recent standards such as GeoJSON.

If you work with PostGis or GeoJSON you may have come across the WKT (well-known text) format, for example like these:

POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))

sf implements this standard natively in R. Data are structured and conceptualized very differently from the sp approach.

In sf spatial objects are stored as a simple data frame with a special column that contains the information for the geographic coordinates. That special column is a list with the same length as the number of rows in the data frame. Each of the individual list elements then can be of any length needed to hold the coordinates that correspond to an individual feature.

To create a spatial object manually the basic steps would be:

I. Create geometric objects (topology)

Geometric objects (simple features) can be created from a numeric vector, matrix or a list with the coordinates. They are called sfg objects for Simple Feature Geometry.

See here for an example of how a LINESTRING sfg object is created:

lnstr_sfg <- st_linestring(matrix(runif(6), ncol=2))
class(lnstr_sfg)
[1] "XY"         "LINESTRING" "sfg"       
1. Combine all individual single feature objects for the special column.

In order to work our way towards a data frame for all features we create what is called an sfc object with all individual features, which stands for Simple Feature Collection. The sfc object also holds the bounding box and the projection information.

See here for an example of how a sfc object is created:

(lnstr_sfc <- st_sfc(lnstr_sfg)) # just one feature here
Geometry set for 1 feature
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 0.4720064 ymin: 0.01635082 xmax: 0.905913 ymax: 0.2816711
epsg (SRID):    NA
proj4string:    NA
class(lnstr_sfc) 
[1] "sfc_LINESTRING" "sfc"           

We now combine the dataframe with the attributes and the simple feature collection. See here how its done.

(lnstr_sf <- st_sf(dfr , lnstr_sfc))
Simple feature collection with 1 feature and 3 fields
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 0.4720064 ymin: 0.01635082 xmax: 0.905913 ymax: 0.2816711
epsg (SRID):    NA
proj4string:    NA
id  use cars_per_hour                      lnstr_sfc
1  a road            10 LINESTRING(0.47200637566857...
class(lnstr_sf)
[1] "sf"         "data.frame"

There are many methods available in the sf package, to find out use methods(class="sp")

Here are some of the other highlights of sf you might be interested in:

• provides fast I/O, particularly relevant for large files

(I did a quick microbenchmarking: st_read() took 23.1749 milliseconds and readOGR() took 462.1470 milliseconds for the philly shapefile.)

• directly reads from and writes to spatial databases such as PostGIS

• stay tuned for a new ggplot release that will be able to read and plot the sf format without the need of conversion to a data frame, like the sp format

Note that sp and sf are not the only way spatial objects are conceptualized in R. Other spatial packages may use their own class definitions for spatial data (for example spatstat). Usuallly you can find functions that convert sp and increasingly sf objects to and from these formats.

### Exercise 2

Similarly to the example above generate a Point object in R. Use both, the sp and the sf “approach”.

1. Create a matrix pts of random numbers with two columns and as many rows as you like. These are your points.
2. Create a dataframe attrib_df with the same number of rows as your pts matrix and a column that holds an attribute. You can make up any attribute.
3. Use the appropriate commands and pts to create
• a SpatialPointsDataFrame and
• an sf object with a gemoetry column of class sfc_POINT.
1. Try to subset your spatial object using the attribute you have added and the way you are used to from regular data frames.
2. How do you determine the bounding box of your spatial object?

Try before you peek!

pts <- matrix(runif(20), ncol=2) # the matrix with the points
attrib_df <- data.frame(an_attribute = rep(LETTERS[1:5], each = 2)) # attribute table

## sp approach ##
pts_sp <- SpatialPoints(pts) # create sp object
pts_spdf <- SpatialPointsDataFrame(pts_sp, attrib_df) # add attributes
summary(pts_spdf)

## sf approach ##
pts_sfg_list <- lapply(seq_len(nrow(pts)), function(i) st_point(pts[i,])) # a simple feature geometry
pts_sfc <- st_sfc(pts_sfg_list)     # a simple feature collection
pts_sf <- st_sf(pts_sfc, attrib_df) # an sf object
pts_sf

# Some subsetting with sp:
pts_spdf$an_attribute # column with attribute only -- this is a vecor subset(pts_spdf, an_attribute == "A") # subset with attribute A -- this is an SP object # Some subsetting with sf: pts_sf$an_attribute # column with attribute only -- this is a vecor
subset(pts_sf, an_attribute == "A")  # subset with attribute A -- this is an SF object

# bounding box:
bbox(pts_spdf)
st_bbox(pts_sf)

# 2. Creating a spatial object from a lat/lon table

Often in your research might have a spreadsheet that contains latitude, longitude and perhaps some attribute values. You know how to read the spreadsheet into a data frame with read.table or read.csv. We can then very easily convert the table into a spatial object in R.

A SpatialPointsDataFrame object can be created directly from a table by specifying which columns contain the coordinates. This can be done in one step by using the coordinates() function. As mentioned above this function can be used not only to retrieve spatial coordinates but also to set them, which is done in R fashion with:

coordinates(myDataframe) <- value

value can have different forms – in this context needs to be a character vector which specifies the data frame’s columns for the longitude and latitude (x,y) coordinates.

If we use this on a data frame it automatically converts the data frame object into a SpatialPointsDataFrame object.

An sf object can be created from a data frame in a similarly easy way. We take advantage of the st_as_sf() function which converts any foreign object into an sf object. Similarly to above, it requires an argument coords, which in the case of point data needs to be a vector that specifies the data frame’s columns for the longitude and latitude (x,y) coordinates.

my_sf_object <- st_as_sf(myDataframe, coords)

Note that coordinates() replaces the original data frame, while st_as_sf() creates a new object and leaves the original data frame untouched.

### Exercise 3

1. Download and unzip RSpatialDataTypes.zip
2. Use read.csv() to read PhiladelphiaZIPHousing.csv into a dataframe in R and name it ph_df.
3. Use head() to examine the first few lines of the dataframe. What information does it contain?
4. Use class() to examine which object class the table belongs to.
5. Convert the ph_df data frame into an sf object with st_as_sf()
6. Convert the ph_df data frame into a spatial object with using the coordinates function.
7. Use class(ph_df)again to examine which object class the table belongs to now.

Try before you peek!

ph_df <- read.csv("~/Desktop/RSpatialDataTypes/PhiladelphiaZIPHousing.csv")
class(ph_df)

# sp
ph_sf <- st_as_sf(ph_df , coords = c("lon", "lat"))
class(ph_sf)

# sf
coordinates(ph_df) <- c("lon", "lat")
class(ph_df) # !!

### A brief, but important word about projection.

Note that both the SpatialPointsDataFrame and the sf POINTS object you just created do not have a projection defined. It is ok to plot, but be aware that for any meaningful spatial operation you will need to define a projection.

This is how it’s done:

is.projected(ph_df) # see if a projection is defined
proj4string(ph_df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") # this is WGS84
is.projected(ph_df) # voila! hm. wait a minute..

# For the sf object you want to use
st_crs(ph_sf)
st_crs(ph_sf) <- 4326 # we can use EPSG as numeric here
st_crs(ph_sf)

### How to work with rgdal

In order to read spatial data into R and turn them into Spatial* family objects we rely on the rgdal package. It provides us direct access to the powerful GDAL library from within R.

We can read in and write out spatial data using:

readOGR() and writeOGR() (for vector)
readGDAL() and writeGDAL() (for raster/grids)

The parameters provided for each function vary depending on the exact spatial file type you are reading. We will take an ESRI shapefile as an example. A shapefile - as you know - consists of various files of the same name, but with different extensions. They should all be in one directory and that is what R expects.

When reading in a shapefile, readOGR() requires the following two arguments:

datasource name (dsn)  # the path to the folder that contains the files
# this is a path to the folder, not a filename!
layer name (layer)     # the shapefile name WITHOUT extension
# this is not a path but just the name of the file!

Setting these arguments correctly can be cause of much headache for beginners, so let me spell it out:

• Firstly, you obviously need to know the name of shapefile.

• Secondly, you need to know the name and location of the folder that contains all the shapefile parts.

• Lastly, readOGR only reads the file and dumps it on your screen. But similarly to reading in csv tables you want to actually work with the file, so you need to assign it to an R object.

For example:

• I have a shapefile called myShapefile.shp and all its associated files (like .dbf, .prj, .shx, …) in a directory called myShapefileDir in my desktop folder,
• I have my R working directory set to my desktop folder,
• I want to assign the shape file to an R object called myShape.

Then my command to read this shapefile would look like this:

myShape <- readOGR(dsn = "myShapefileDir", layer = "myShapefile")

or in short:

myShape <- readOGR("myShapefileDir", "myShapefile")

Now let’s do this.

#### Exercise 4

1. Load the rgdal package.
2. Determine the location of the folder enclosing the PhillyTotalPopHHinc shapefile.
3. Read PhillyTotalPopHHinc into an object called philly. Make sure you provide the appropriate directory information.
4. Examine the object, for example with summary() or class()
5. Plot it.
6. Take a look at the column names of the attribute data with names()
7. Take a look at the attribute data with head()
8. Select a subset of polygons with a median household income (medHHinc) of over 60000.
9. Add that to the plot. In red.

Try before you peek!

library(rgdal)
# side note: unlike read.csv readOGR does not understand the ~ as valid element of a path. This (on Mac) will not work:
summary(philly)
class(philly)
names(philly)
plot(philly)
philly_rich <- subset(philly, medHHinc > 60000)
plot(philly_rich, add=T, col="red")

GDAL supports over 200 raster formats and vector formats. Use ogrDrivers() and gdalDrivers() (without arguments) to find out which formats your rgdal install can handle.

### How to do this in sf

sf also relies on GDAL, but we don’t need to load a separate R library to read data in. We can use st_read(), which simply takes the path of the directory with the shapefile as argument.

So let’s do the same as above using the sf package.

# read in

# take a look at what we've got
names(philly_sf)
# note the added geometry column, as compared to:
names(philly)

# plot works differently here:
plot(philly_sf)
# to do the same as above we need to directly print the geometry column
st_geometry(philly_sf)        # use this method to retreive geometry
plot(st_geometry(philly_sf))
# subset the familar way
philly_sf_rich <- subset(philly_sf, medHHinc > 60000)
plot(st_geometry(philly_sf_rich), add=T, col="red")

# 4. Raster data

Dealing with raster data and map algebra deserves its own separate workshop, so this is just to acknowledge that you can work with raster data in R as well.

Raster files, as you probably know, have a much more compact data structure than vectors. Because of their regular structure the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster is defined by:

• a CRS
• coordinates of its origin
• a distance or cell size in each direction
• a dimension or numbers of cells in each direction
• an array of cell values

Given this structure, coordinates for any cell can be computed and don’t need to be stored.

In sp the GridTopology class is the key element of raster representations4. It contains

• the center coordinate pair of the south-west raster cell,
• the two cell sizes in the metric of the coordinates, giving the step to successive centres, and
• the numbers of cells for each dimension.

A simple grid can be built like this:

# specify the grid topology with the following parameters:
# - the smallest coordinates for each dimension, here: 0,0
# - cell size in each dimension, here: 1,1
# - number of cells in each dimension, here: 5,5
gtopo <- GridTopology(c(0,0), c(1,1), c(5,5)) # create the grid
datafr <- data.frame(runif(25)) # make up some data
SpGdf <- SpatialGridDataFrame(gtopo, datafr) # create the grid data frame
summary(SpGdf)

A very good alternative is the raster package, which works slightly differently.
The raster package is a major extension of spatial data classes to access large rasters and in particular to process very large files. It includes object classes for RasterLayer, RasterStacks, and RasterBricks, functions for converting among these classes, and operators for computations on the raster data. Conversion from sp type objects into raster type objects is easy.

If we wanted to do the same as above, namely creating the same raster object from scratch we would do the following:

# specify the RasterLayer with the following parameters:
# - minimum x coordinate (left border)
# - minimum y coordinate (bottom border)
# - maximum x coordinate (right border)
# - maximum y coordinate (top border)
# - resolution (cell size) in each dimension
r <- raster(xmn=-0.5, ymn=-0.5, xmx=4.5, ymx=4.5, resolution=c(1,1))
r

So here we have created an object of type RasterLayer, as compared to above, where we created an object of type GridTopology.

Compare this to the output from above and note something important here: Different from the grid object we generated from scratch, this raster object has a CRS defined! If the crs argument is missing when creating the Raster object, the x coordinates are within -360 and 360 and the y coordinates are within -90 and 90, the WGS84 projection is used by default!

Good to know.

To add some values to the cells we could the following. Be aware that different from the GridTopology object above, which we converted to a SpatialGridDataFrame when adding values, this object here remains a RasterLayer.

class(r)
r <- setValues(r, runif(25))
class(r)
plot(r); points(coordinates(r), pch=3)

(See the rasterVis package for more advanced plotting of Raster* objects.)

RasterLayer objects can also be created from a matrix.

class(volcano)
volcano.r <- raster(volcano)
class(volcano.r)

To read in a raster file we can use readGDAL() from the sp package, which requires only the filename of the raster as argument.

The respective function in raster package is called raster().

#### Exercise 5

1. Load the raster library
2. Read in the DEM using the raster() function
3. Examine by typing the name you gave the DEM
4. Extract contour lines and plot them with contour()

Try before you peek!

library(raster)
dem.r <- raster("~/Desktop/RSpatialDataTypes/DEM_10m/bushkill_pa.dem")
dem.r
contour(dem.r)

There are currently over 170 R packages on CRAN for reading, visualising, and analysing (geographical) spatial data. I highly recommend taking a look at that page if you are exploring spatial analysis with R.

1. R Bivand (2011) Introduction to representing spatial objects in R

2. Coordinates should be of type double and will be promoted if not.

3. E. Pebesma & R. Bivand (2016)Spatial data in R: simple features and future perspectives

4. There is also a SpatialPixels object which stores grid topology and coordinates of the actual points.