Find and Replace in R, Part 1: recode in the library car

Background

Recoding a variable in R has never seemed like a straightforward task to me. In the next few entries, I will present the do’s and dont’s of different find/replace (recoding) solutions in R. Today: car::recode.

1. Recoding vectors:

Load the library car, and create a vector of random integers from 1 to 4:

library(car)
x <- sample(1:4, size = 10, replace = T)
x

##  [1] 4 2 3 1 1 1 4 1 1 3

Simply, the arguments of recode are:

  • The vector we wish to change
  • A list of the recodes, enclosed in single or double quotes, and separated by semi-colons.

Let’s start by recoding 1 2 3 4 in x to 2 5 8 10:

x1 <- recode(x, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
x1

##  [1] 10  5  8  2  2  2 10  2  2  8

The same is true to recode 1 2 3 4 to A B C D:

x2 <- recode(x, "1 = "A" ; 2 = "B" ; 3 = "C" ; 4 = "D"")
x2

##  [1] "D" "B" "C" "A" "A" "A" "D" "A" "A" "C"

To recode 1 2 3 4 to 0 0 5 5, we can use several different approaches to achieve the same result:

x3.1 <- recode(x, '1 = 0 ; 2 = 0 ; 3 = 5 ; 4 = 5')     # specify individual values
x3.2 <- recode(x, 'c(1, 2) = 0 ; c(3, 4) = 5')         # vectors of values
x3.3 <- recode(x, 'lo : 2 = 0 ; 3 : hi = 5')           # lo and hi
x3.4 <- recode(x, '1 : 2 = 0 ; else = 5')              # specify some values and else for remaining values

Where all solutions give:

##  [1] 5 0 5 0 0 0 5 0 0 5

If we only need to change some of the values, only those values need to be specified. For example, to recode 1 2 3 4 to 1 2 3 5:

x4 <- recode(x, "4=5")
x4

##  [1] 5 2 3 1 1 1 5 1 1 3

We can do the same with character vectors. Let’s create one with some fish species:

y <- sample(c("Perch", "Goby", "Trout", "Salmon"), size = 10, replace = T)
y

##  [1] "Goby"   "Salmon" "Trout"  "Perch"  "Salmon" "Perch"  "Trout"
##  [8] "Goby"   "Perch"  "Salmon"

We wish to recode each fish into its biological order, i.e. Perch, Goby, Trout, Salmon will become Perciform, Perciform, Salmonid, Salmonid:

y1 <- recode(y, 'c("Perch", "Goby") = "Perciform" ; c("Trout", "Salmon") = "Salmonid"')
y1

##  [1] "Perciform" "Salmonid"  "Salmonid"  "Perciform" "Salmonid"
##  [6] "Perciform" "Salmonid"  "Perciform" "Perciform" "Salmonid"

2. Recoding Data Frames

Let’s create a simple dataframe, z:

z <- data.frame(Letter = sample(factor(letters[1:4]), size = 10, replace = T),
                Count1 = sample(c(1:4), size = 10, replace = T),
                Count2 = sample(c(1:4), size = 10, replace = T),
                Count3 = sample(c(1:4), size = 10, replace = T))
z

##    Letter Count1 Count2 Count3
## 1       a      1      2      3
## 2       a      2      1      3
## 3       a      3      4      4
## 4       c      3      3      2
## 5       b      1      2      4
## 6       a      4      3      3
## 7       d      3      4      3
## 8       b      3      2      4
## 9       a      4      3      4
## 10      b      3      3      2

In this data frame, we wish to recode 1 2 3 4 values to 2 5 8 10. Let’s do this by recoding the data frame to create a new object, z1:

z1 <- recode(z, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")

## Error: (list) object cannot be coerced to type 'double'

z1

## Error: object 'z1' not found

No, this didn’t work. It seems that recode only works with vectors.

We could try a for loop across all columns:

z1 <- z
for (i in 1:ncol(z)) z1[, i] <- recode(z1[, i], "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")

gives:

z1

##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5

By far, the best workaround is to use lapply

z1 <- lapply(z, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"))
z1

## $Letter
##  [1] a a a c b a d b a b
## Levels: a b c d
##
## $Count1
##  [1]  2  5  8  8  2 10  8  8 10  8
##
## $Count2
##  [1]  5  2 10  8  5  8 10  5  8  8
##
## $Count3
##  [1]  8  8 10  5 10  8  8 10 10  5
##

But now we have lost our data frame! Let’s fix this with data.frame():

z1 <- data.frame(z1)
z1

##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5

Conclusions

car::recode is a simple and powerful find/replace solution. However, it can be frustrating to write out recode lists when there are many values that need to be changed. In the next entry, I’ll present some alternative recoding solutions using gsub, indexing and others.

Creating a large scale map using ggplot2: a step by step guide.

* UPDATED 2nd SEPTEMBER 2013 *

Whenever I have needed a map in a presentation, I have always slipped in a cheeky screenshot from Google and prayed that the copyright police won’t burst into the door and arrest me (talks are stressful enough). However, I was determined to change this, and after extensive trawling of the web, I’ve put together a quick template to make a simple large scale map for talks, presentations and posters using GIS shape files and ggplot2.

Here is the finished object (full code at the end of this page):

Europe Map

I started by downloading this excellent world map template from thematicmapping, and loading it into R using the maptools package:

library(maptools) 
gpclibPermit() 
## [1] TRUE 
worldmap <- readShapeSpatial("world_borders.shp") 

*** NB: You will require the library gpclib to read in the shapefile – if gpclibPermit() returns FALSE, then try installing gpclib again. As of Sep 2013, gpclib was not working in R v3.0.1. Argh. ***

The fortify function in ggplot2 will convert this to a data frame:

library(ggplot2)
worldmap <- fortify(worldmap) 
## Using CAT to define regions. 

Let’s have a quick look:

head(worldmap) 
## long lat order hole piece group id 
## 1 -69.88 12.41 1 FALSE 1 1.1 1 
## 2 -69.95 12.44 2 FALSE 1 1.1 1 
## 3 -70.06 12.53 3 FALSE 1 1.1 1 
## 4 -70.06 12.54 4 FALSE 1 1.1 1 
## 5 -70.06 12.54 5 FALSE 1 1.1 1 
## 6 -70.06 12.62 6 FALSE 1 1.1 1 

Essentially, this data frame is a list of paths/polygons (each one is part of a unique ‘group’) that we can plot using geom_path and geom_polygon. There are more than 400,000 points in this map which makes plotting quite slow, so I created a smaller data frame, worldmap2, to create rough maps for initial tweaking and formatting.

worldmap2 <- worldmap[seq(1, nrow(worldmap), 200), ] 

A quick plot using geom_polygon, creating groups based on the variable ‘group’:

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon() 

Grouped Map

The same with geom_path:

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_path() 

Path map

And using both (Put geom_path second in order to plot the borders):

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon() + 
  geom_path() 

Group and Path Map

Changing the colour of the polygons and path is straightforward:

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "darkseagreen") + 
  geom_path(colour = "grey40") 

Coloured map and path

Creating a blue blackground required delving further into the mysterious world of theme(). I also changed the colour of the plot border and major grid lines to reduce the contrast slightly:

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "darkseagreen") + 
  geom_path(colour = "grey40") + 
  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"),
       panel.grid.major = element_line(colour = "grey90")) 

unnamed-chunk-11

I wanted a map of Northern Europe, so I defined my latitude and longitude limits, and used coord_cartesian to zoom in onto the graph. With this mapfile, you can change latlimits and loglimits to your specific region of interest:

latlimits <- c(40, 75) 
longlimits <- c(-25, 50) 

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "darkseagreen") + 
  geom_path(colour = "grey40") +
  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"), 
       panel.grid.major = element_line(colour = "grey90")) +
  coord_cartesian(xlim = longlimits, ylim = latlimits) 

unnamed-chunk-12

The scales are still more suited to the full graph, so now we can define our scale to be more relevant.

ggplot(worldmap2, aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "darkseagreen") + 
  geom_path(colour = "grey40") +
  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"), 
       panel.grid.major = element_line(colour = "grey90")) + 
  scale_y_continuous(breaks = seq(40, 70, 10)) + 
  scale_x_continuous(breaks = seq(-20, 40, 20)) +
  coord_cartesian(xlim = longlimits, ylim = latlimits) 

unnamed-chunk-13

Now time to customise the graph a little more: removing the axis titles, relabelling the axes, removing the axis ticks, and making the text larger. Then we can use the original worldmap data frame to produce the final product:

# Final code:

library(maptools)
require(gpclib)
gpclibPermit()
worldmap <- readShapeSpatial("world_borders.shp")

library(ggplot2)
worldmap <- fortify(worldmap)

latlimits <- c(40, 75) 
longlimits <- c(-25, 50) 

ggplot(worldmap, aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "darkseagreen") + 
  geom_path(colour = "grey40") + 
  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"), 
       panel.grid.major = element_line(colour = "grey90"),
       panel.grid.minor = element_blank(),
       axis.ticks = element_blank(), 
       axis.text.x = element_text (size = 14, vjust = 0), 
       axis.text.y = element_text (size = 14, hjust = 1.3)) + 
  coord_cartesian(xlim = longlimits, ylim = latlimits) + 
  scale_y_continuous(breaks=seq(40,70,10), labels = c("40ºN", "50ºN", "60ºN", "70ºN")) + 
  scale_x_continuous(breaks=seq(-20,40,20), labels = c("20ºW", "0º" ,"20ºE", "40ºE")) + 
  labs(y="",x="") 

Europe Map