Integers in R are stored in 32 bits. Out of the box, the largest integer R can handle is 2147483647 (2^31 - 1). Sometimes this isn’t enough. I’ve run into this issue a few times when dealing with large ID numbers. Here’s an example ID from Twitter: 912411948163137536. Let’s see if it’s larger than R can handle:

912411948163137536 > .Machine$integer.max
## [1] TRUE

If we try to store this number as an integer, R warns us that it does not fit within the 32-bit integer range and returns NA, because the value is not available as an integer.

as.integer(912411948163137536)
## Warning: NAs introduced by coercion to integer range
## [1] NA
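
The same limit shows up in integer arithmetic. As a minimal sketch in base R (the variable names are only illustrative), adding 1 to the largest integer overflows to NA, while the same sum computed as a double is fine:

```r
# The largest 32-bit signed integer is 2^31 - 1
largest <- .Machine$integer.max
largest == 2^31 - 1

# Integer arithmetic past the limit overflows to NA (R also emits a warning)
overflowed <- suppressWarnings(largest + 1L)
is.na(overflowed)

# The same sum as a double is represented exactly
as.numeric(largest) + 1 == 2^31
```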

Reading

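Like R, readr cannot parse integers larger than 2147483647 as integers. parse_integer (the parser readr::read_csv uses for integer columns) returns NA:

library(readr)

# A string holding an integer too large for 32 bits
s <- "21474836470"
parse_integer(s)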

## [1] NA
## attr(,"problems")
## # A tibble: 1 x 4
##     row   col   expected      actual
##   <int> <int>      <chr>       <chr>
## 1     1    NA an integer 21474836470

Following the advice of Win-Vector, we can read large integers as character vectors.

# Write a test file (writeLines works on any platform, no shell needed)
writeLines(c("big", s), "in.csv")

# Read the integer data as character
my_data <- read_csv(
    "in.csv",
    col_types = cols(
        big = col_character()
    )
)

# Show we got the right value
my_data$big[1] == s

## [1] TRUE

Writing

The bit64 library facilitates working with 64-bit integers in R.

library(bit64)

# We can store `s` as a 64-bit integer
i <- as.integer64(s)
as.character(i) == s

## [1] TRUE

It looks like readr::write_csv can handle integer64 objects:

# Write out an integer64 object
my_data$big <- as.integer64(my_data$big)
write_csv(my_data, "out.csv")

# Make sure it was written properly
text <- readLines("out.csv")
text[2] == s

## [1] TRUE

Answer

Thanks to Hadley Wickham for a pragmatic answer: “use doubles”.

There is an important caveat associated with using doubles. Win-Vector notes:

IEEE 754 doubles define a 53 bit mantissa (separate from the sign and exponent), so with a proper floating point implementation we expect a double can faithfully represent an integer range of -2^53 through 2^53. But only as long as you don’t accidentally convert to or round-trip through a string/character type.
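
The limit in the quote can be checked directly in base R. This is a small sketch rather than part of the quoted advice; it relies on as.character() keeping at most 15 significant digits for doubles, and sprintf("%.0f", ...) is shown as one way to print every digit.

```r
# Below 2^53, consecutive whole numbers are distinct doubles
2^53 - 1 == 2^53        # FALSE

# At 2^53 the spacing between doubles becomes 2, so adding 1 is lost
2^53 + 1 == 2^53        # TRUE

# Converting to character keeps at most 15 significant digits by default,
# so a 16-digit value does not survive a round trip through a string
x <- 2^53
round_trip <- as.numeric(as.character(x))
round_trip == x         # FALSE

# sprintf() with "%.0f" prints every digit of the double
sprintf("%.0f", x)      # "9007199254740992"
```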

Here is an example where using doubles can get one into trouble:

> library(bit64)
> library(readr)

> # Set up CSV
> a <- as.integer64(2) ^ 53
> b <- a + 1
> text <- paste0("x\n", a, "\n", b, "\n")
> cat(text)
x
9007199254740992
9007199254740993

> # Read CSV
> data <- read_csv(text, col_types = cols(x = col_double()))

> # Show that b was not read properly
> as.integer64(data$x)
integer64
[1] 9007199254740992 9007199254740992

My plan moving forward is to use doubles and check that abs(x) <= as.integer64(2) ^ 53.
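
One way to sketch that check in base R is below. The helper name is_exact_integer is hypothetical, and the strict bound (< rather than <=) is a conservative assumption: once a value such as 9007199254740993 has been read as a double it is stored as 2^53, so a double equal to 2^53 cannot be told apart from its neighbour.

```r
# Hypothetical helper (not from readr or bit64): TRUE where a double holds a
# whole number strictly inside (-2^53, 2^53)
is_exact_integer <- function(x) {
  is.finite(x) & x == trunc(x) & abs(x) < 2^53
}

ids <- c(21474836470, 2^53 - 1, 2^53, 912411948163137536)
is_exact_integer(ids)
# [1]  TRUE  TRUE FALSE FALSE
```

Values that fail the check are candidates for reading as character or integer64 instead.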

Conclusion

The safest approach seems to be reading and writing string representations when integers cannot be stored in 32 bits.
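
As a minimal sketch of that approach using only base R (read.csv and write.csv stand in for the readr functions used above; the temporary file and variable names are illustrative), the ID stays a character vector on the way out to the file and back:

```r
# A large ID kept as a string from the start
ids <- data.frame(big = "912411948163137536", stringsAsFactors = FALSE)

# Write the strings as-is; quote = FALSE keeps the digits bare in the file
path <- tempfile(fileext = ".csv")
write.csv(ids, path, row.names = FALSE, quote = FALSE)

# Read the column back as character so no numeric parsing happens
back <- read.csv(path, colClasses = c(big = "character"))

identical(back$big, ids$big)
```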

readr::write_csv appears capable of writing integer64 objects, but can we be certain this will always work?

References