-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
Adding or changing a column using [[<- seems to bring a data.table into a broken state. set() fails afterwards, and printing the data.table changes its truelength().
The following is with GitHub HEAD. Some versions seem to segfault here, so don't run this if you want to keep your R session:
library("data.table")
#> data.table 1.16.99 IN DEVELOPMENT built 2025-01-18 17:55:52 UTC; user using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) :
#> It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.
truelength(dt)
#> [1] 1026
dt
#> a b
#> <num> <num>
#> 1: 1 2
truelength(dt) # this changed after printing for some reason
#> [1] 2
> set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) :
#> Internal error in assign: input dt has not been allocated enough column slots. l=2, tl=2, adding 1. Please report to the data.table issues tracker.Using using a column name instead of an index in [[ also fails:
dt <- data.table(a = 1, b = 2)
dt[["b"]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) :
#> It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.Adding a new column, instead of changing an existing one, gives a different error message:
dt <- data.table(a = 1, b = 2)
dt[[3]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) :
#> This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.
dt
#> a b V3
#> <num> <num> <num>
#> 1: 1 2 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) :
#> Internal error in assign: input dt has not been allocated enough column slots. l=3, tl=3, adding 1. Please report to the data.table issues tracker.When using CRAN data.table, instead of the current GitHub version, the first two examples cause segfaults:
library("data.table")
#> data.table 1.16.4 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#>
#> *** caught segfault ***
#> address 0x140000014, cause 'memory not mapped'
#>
#> Traceback:
#> 1: set(dt, , "c", 1)
#>
#> Possible actions:
#> 1: abort (with core dump, if enabled)
#> 2: normal R exit
#> 3: exit R without saving workspace
#> 4: exit R saving workspaceMaybe the operation corrupts memory in some way and that the new GitHub version just happens to not segfault because memory is laid out differently for some unrelated reason? Maybe there are two underlying bugs, one of which is already fixed on GitHub.
Using $ does not seem to cause this issues:
dt <- data.table(a = 1, b = 2)
dt$b <- 2
set(dt, , "c", 1)
dt
#> a b c
#> <num> <num> <num>
#> 1: 1 2 1sessionInfo():
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.0
(or GitHub data.table version in the first example)