Skip to content

Broken behaviour after [[<-; segfaults on CRAN version #6734

@mb706

Description

@mb706

Adding or changing a column using [[<- seems to bring a data.table into a broken state. set() fails afterwards, and printing the data.table changes its truelength().

The following is with GitHub HEAD. Some versions seem to segfault here, so don't run this if you want to keep your R session:

library("data.table")
#> data.table 1.16.99 IN DEVELOPMENT built 2025-01-18 17:55:52 UTC; user using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.
truelength(dt)
#> [1] 1026
dt
#>        a     b
#>    <num> <num>
#> 1:     1     2
truelength(dt)  # this changed after printing for some reason
#> [1] 2
> set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   Internal error in assign: input dt has not been allocated enough column slots. l=2, tl=2, adding 1. Please report to the data.table issues tracker.

Using using a column name instead of an index in [[ also fails:

dt <- data.table(a = 1, b = 2)
dt[["b"]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.

Adding a new column, instead of changing an existing one, gives a different error message:

dt <- data.table(a = 1, b = 2)
dt[[3]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.
dt
#>        a     b    V3
#>    <num> <num> <num>
#> 1:     1     2     2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   Internal error in assign: input dt has not been allocated enough column slots. l=3, tl=3, adding 1. Please report to the data.table issues tracker.

When using CRAN data.table, instead of the current GitHub version, the first two examples cause segfaults:

library("data.table")
#> data.table 1.16.4 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#> 
#>  *** caught segfault ***
#> address 0x140000014, cause 'memory not mapped'
#> 
#> Traceback:
#>  1: set(dt, , "c", 1)
#> 
#> Possible actions:
#> 1: abort (with core dump, if enabled)
#> 2: normal R exit
#> 3: exit R without saving workspace
#> 4: exit R saving workspace

Maybe the operation corrupts memory in some way and that the new GitHub version just happens to not segfault because memory is laid out differently for some unrelated reason? Maybe there are two underlying bugs, one of which is already fixed on GitHub.

Using $ does not seem to cause this issues:

dt <- data.table(a = 1, b = 2)
dt$b <- 2
set(dt, , "c", 1)
dt
#>        a     b     c
#>    <num> <num> <num>
#> 1:     1     2     1

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.0

(or GitHub data.table version in the first example)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions