Channel: Append an object to a list in R in amortized constant time, O(1)? - Stack Overflow

Answer by JanKanis for Append an object to a list in R in amortized constant time, O(1)?


In the other answers, only the list approach results in O(1) appends, but it produces a deeply nested list structure rather than a plain single list. I have used the data structures below; they support amortized O(1) appends and allow the result to be converted back to a plain list.

    expandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            return(b)
        }

        methods
    }

and

    linkedList <- function() {
        head <- list(0)
        length <- 0

        methods <- list()

        methods$add <- function(val) {
            length <<- length + 1
            head <<- list(head, val)
        }

        methods$as.list <- function() {
            b <- vector('list', length)
            # Walk a local cursor from the newest node back to the oldest
            h <- head
            # rev(seq_len(length)) is empty for an empty list, unlike length:1
            for(i in rev(seq_len(length))) {
                b[[i]] <- h[[2]]
                h <- h[[1]]
            }
            return(b)
        }

        methods
    }

Use them as follows:

    > l <- expandingList()
    > l$add("hello")
    > l$add("world")
    > l$add(101)
    > l$as.list()
    [[1]]
    [1] "hello"

    [[2]]
    [1] "world"

    [[3]]
    [1] 101

These solutions could be expanded into full objects that support all list-related operations by themselves, but that will remain as an exercise for the reader.
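As one possible direction for that exercise, here is a minimal sketch that adds two accessors to the expanding list. The names expandingList2, size and get are hypothetical and not part of the original answer; the growth logic is the same doubling scheme as in expandingList.

```r
# Sketch only: expandingList2, size() and get() are illustrative names.
expandingList2 <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    len <- 0

    methods <- list()

    methods$add <- function(val) {
        if (len == capacity) {
            # Double the buffer, as expandingList does
            buffer <<- c(buffer, vector('list', capacity))
            capacity <<- capacity * 2
        }
        len <<- len + 1
        buffer[[len]] <<- val
    }

    # Number of items appended so far
    methods$size <- function() len

    methods$get <- function(i) {
        # Refuse indices outside the filled part of the buffer
        stopifnot(i >= 1, i <= len)
        buffer[[i]]
    }

    methods$as.list <- function() buffer[seq_len(len)]

    methods
}

l <- expandingList2(capacity = 2)
for (x in c("a", "b", "c")) l$add(x)
l$size()
l$get(3)
```

The small starting capacity is only there to force one doubling in the example.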

Another variant for a named list:

    namedExpandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        names <- character(capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            names <<- c(names, character(capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(name, val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
            names[length] <<- name
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            names(b) <- names[0:length]
            return(b)
        }

        methods
    }
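A short usage sketch for the named variant. The definition is repeated only so that the snippet runs on its own; the example names ("greeting", "answer") are made up for illustration.

```r
# Definition repeated from the answer so this snippet is self-contained.
namedExpandingList <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    names <- character(capacity)
    length <- 0
    methods <- list()
    methods$double.size <- function() {
        buffer <<- c(buffer, vector('list', capacity))
        names <<- c(names, character(capacity))
        capacity <<- capacity * 2
    }
    methods$add <- function(name, val) {
        if(length == capacity) {
            methods$double.size()
        }
        length <<- length + 1
        buffer[[length]] <<- val
        names[length] <<- name
    }
    methods$as.list <- function() {
        b <- buffer[0:length]
        names(b) <- names[0:length]
        return(b)
    }
    methods
}

nl <- namedExpandingList()
nl$add("greeting", "hello")
nl$add("answer", 42)

res <- nl$as.list()
names(res)    # the names given to add(), in insertion order
res$answer    # elements can be looked up by name in the plain list
```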

Benchmarks

Performance comparison using @phonetagger's code (which is based on @Cron Arconis' code). I have also added a better_env_as_container and changed env_as_container_ a bit. The original env_as_container_ was broken and did not actually store all the numbers.

    library(microbenchmark)

    lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(lab)]] <- obj}

    ### Store list inside new environment
    envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj}

    env2list <- function(env, len) {
        l <- vector('list', len)
        for (i in 1:len) {
            l[[i]] <- env[[as.character(i)]]
        }
        l
    }

    envl2list <- function(env, len) {
        l <- vector('list', len)
        for (i in 1:len) {
            l[[i]] <- env[[paste(as.character(i), 'L', sep='')]]
        }
        l
    }

    runBenchmark <- function(n) {
        microbenchmark(times = 5,
            env_with_list_ = {
                listptr <- new.env(parent=globalenv())
                listptr$list <- NULL
                for(i in 1:n) {envAppendList(listptr, i)}
                listptr$list
            },
            c_ = {
                a <- list(0)
                for(i in 1:n) {a = c(a, list(i))}
            },
            list_ = {
                a <- list(0)
                for(i in 1:n) {a <- list(a, list(i))}
            },
            by_index = {
                a <- list(0)
                for(i in 1:n) {a[length(a) + 1] <- i}
                a
            },
            append_ = {
                a <- list(0)
                for(i in 1:n) {a <- append(a, i)}
                a
            },
            env_as_container_ = {
                listptr <- new.env(hash=TRUE, parent=globalenv())
                for(i in 1:n) {lPtrAppend(listptr, i, i)}
                envl2list(listptr, n)
            },
            better_env_as_container = {
                env <- new.env(hash=TRUE, parent=globalenv())
                for(i in 1:n) env[[as.character(i)]] <- i
                env2list(env, n)
            },
            linkedList = {
                a <- linkedList()
                for(i in 1:n) { a$add(i) }
                a$as.list()
            },
            inlineLinkedList = {
                a <- list()
                for(i in 1:n) { a <- list(a, i) }
                b <- vector('list', n)
                head <- a
                for(i in n:1) {
                    b[[i]] <- head[[2]]
                    head <- head[[1]]
                }
            },
            expandingList = {
                a <- expandingList()
                for(i in 1:n) { a$add(i) }
                a$as.list()
            },
            inlineExpandingList = {
                l <- vector('list', 10)
                cap <- 10
                len <- 0
                for(i in 1:n) {
                    if(len == cap) {
                        l <- c(l, vector('list', cap))
                        cap <- cap*2
                    }
                    len <- len + 1
                    l[[len]] <- i
                }
                l[1:len]
            }
        )
    }

    # We need to repeatedly add an element to a list. With normal list concatenation
    # or element setting this would lead to a large number of memory copies and a
    # quadratic runtime. To prevent that, this function implements a bare bones
    # expanding array, in which list appends are (amortized) constant time.
    expandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            return(b)
        }

        methods
    }

    linkedList <- function() {
        head <- list(0)
        length <- 0

        methods <- list()

        methods$add <- function(val) {
            length <<- length + 1
            head <<- list(head, val)
        }

        methods$as.list <- function() {
            b <- vector('list', length)
            h <- head
            for(i in length:1) {
                b[[i]] <- head[[2]]
                head <- head[[1]]
            }
            return(b)
        }

        methods
    }

    # We need to repeatedly add an element to a list. With normal list concatenation
    # or element setting this would lead to a large number of memory copies and a
    # quadratic runtime. To prevent that, this function implements a bare bones
    # expanding array, in which list appends are (amortized) constant time.
    namedExpandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        names <- character(capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            names <<- c(names, character(capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(name, val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
            names[length] <<- name
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            names(b) <- names[0:length]
            return(b)
        }

        methods
    }

result:

    > runBenchmark(1000)
    Unit: microseconds
                        expr       min        lq      mean    median        uq       max neval
              env_with_list_  3128.291  3161.675  4466.726  3361.837  3362.885  9318.943     5
                          c_  3308.130  3465.830  6687.985  8578.913  8627.802  9459.252     5
                       list_   329.508   343.615   389.724   370.504   449.494   455.499     5
                    by_index  3076.679  3256.588  5480.571  3395.919  8209.738  9463.931     5
                     append_  4292.321  4562.184  7911.882 10156.957 10202.773 10345.177     5
           env_as_container_ 24471.511 24795.849 25541.103 25486.362 26440.591 26511.200     5
     better_env_as_container  7671.338  7986.597  8118.163  8153.726  8335.659  8443.493     5
                  linkedList  1700.754  1755.439  1829.442  1804.746  1898.752  1987.518     5
            inlineLinkedList  1109.764  1115.352  1163.751  1115.631  1206.843  1271.166     5
               expandingList  1422.440  1439.970  1486.288  1519.728  1524.268  1525.036     5
         inlineExpandingList   942.916   973.366  1002.461  1012.197  1017.784  1066.044     5

    > runBenchmark(10000)
    Unit: milliseconds
                        expr        min         lq       mean     median         uq        max neval
              env_with_list_ 357.760419 360.277117 433.810432 411.144799 479.090688 560.779139     5
                          c_ 685.477809 734.055635 761.689936 745.957553 778.330873 864.627811     5
                       list_   3.257356   3.454166   3.505653   3.524216   3.551454   3.741071     5
                    by_index 445.977967 454.321797 515.453906 483.313516 560.374763 633.281485     5
                     append_ 610.777866 629.547539 681.145751 640.936898 760.570326 763.896124     5
           env_as_container_ 281.025606 290.028380 303.885130 308.594676 314.972570 324.804419     5
     better_env_as_container  83.944855  86.927458  90.098644  91.335853  92.459026  95.826030     5
                  linkedList  19.612576  24.032285  24.229808  25.461429  25.819151  26.223597     5
            inlineLinkedList  11.126970  11.768524  12.216284  12.063529  12.392199  13.730200     5
               expandingList  14.735483  15.854536  15.764204  16.073485  16.075789  16.081726     5
         inlineExpandingList  10.618393  11.179351  13.275107  12.391780  14.747914  17.438096     5

    > runBenchmark(20000)
    Unit: milliseconds
                        expr         min          lq       mean      median          uq         max neval
              env_with_list_ 1723.899913 1915.003237 1921.23955 1938.734718 1951.649113 2076.910767     5
                          c_ 2759.769353 2768.992334 2810.40023 2820.129738 2832.350269 2870.759474     5
                       list_    6.112919    6.399964    6.63974    6.453252    6.910916    7.321647     5
                    by_index 2163.585192 2194.892470 2292.61011 2209.889015 2436.620081 2458.063801     5
                     append_ 2832.504964 2872.559609 2983.17666 2992.634568 3004.625953 3213.558197     5
           env_as_container_  573.386166  588.448990  602.48829  597.645221  610.048314  642.912752     5
     better_env_as_container  154.180531  175.254307  180.26689  177.027204  188.642219  206.230191     5
                  linkedList   38.401105   47.514506   46.61419   47.525192   48.677209   50.952958     5
            inlineLinkedList   25.172429   26.326681   32.33312   34.403442   34.469930   41.293126     5
               expandingList   30.776072   30.970438   34.45491   31.752790   38.062728   40.712542     5
         inlineExpandingList   21.309278   22.709159   24.64656   24.290694   25.764816   29.158849     5

I have added linkedList and expandingList, and an inlined version of both. inlineLinkedList is basically a copy of list_, but it also converts the nested structure back into a plain list. Beyond that, the difference between the inlined and non-inlined versions is due to the overhead of the function calls.

All variants of expandingList and linkedList show O(1) append performance, with the benchmark time scaling linearly with the number of items appended. linkedList is slower than expandingList, and the function call overhead is also visible. So if you really need all the speed you can get (and want to stick to R code), use an inlined version of expandingList.
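The linear scaling is what the doubling strategy predicts: each reallocation copies the current buffer, and the copied sizes form a geometric series. The sketch below counts how many element copies the doubling step of expandingList would perform for n appends. countCopies is a hypothetical helper written for this illustration; it models only the c(buffer, ...) copies and ignores any copying R may do internally.

```r
# Hypothetical helper: simulate the doubling of expandingList and count
# how many list slots get copied when the buffer is reallocated.
countCopies <- function(n, capacity = 10) {
    len <- 0
    copies <- 0
    for (i in seq_len(n)) {
        if (len == capacity) {
            # c(buffer, vector('list', capacity)) copies the whole old buffer
            copies <- copies + capacity
            capacity <- capacity * 2
        }
        len <- len + 1
    }
    copies
}

for (n in c(1000, 10000, 20000)) {
    cat(n, "appends:", countCopies(n), "copied slots,",
        countCopies(n) / n, "per append\n")
}
```

Under this model the total number of copied slots stays below 2n, so the cost per append is bounded by a constant even though individual appends occasionally trigger a large copy.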

I've also had a look at the C implementation of R, and both approaches should be O(1) append for any size up until you run out of memory.

I have also changed env_as_container_. The original version would store every item under index "i", overwriting the previously appended item. The better_env_as_container I have added is very similar to env_as_container_ but without the deparse stuff. Both exhibit O(1) performance, but they have an overhead that is quite a bit larger than the linked/expanding lists.
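To illustrate the overwriting problem, here is a small sketch. brokenAppend is a hypothetical reconstruction that assumes the original helper built its key with deparse(substitute(lab)); fixedAppend matches the lPtrAppend used in the benchmark above.

```r
# Assumed shape of the broken helper: substitute() captures the argument
# expression, so the key is the literal text of the caller's variable name.
brokenAppend <- function(env, lab, obj) {
    env[[deparse(substitute(lab))]] <- obj
}

# As in the benchmark: deparse() of the evaluated integer gives keys like "1L".
fixedAppend <- function(env, lab, obj) {
    env[[deparse(lab)]] <- obj
}

broken <- new.env(hash = TRUE)
fixed <- new.env(hash = TRUE)
for (i in 1:5) {
    brokenAppend(broken, i, i)
    fixedAppend(fixed, i, i)
}

ls(broken)   # one key only: every append overwrote the previous one
ls(fixed)    # one key per appended item
```

The "L" suffix in the fixed keys is why envl2list pastes 'L' onto the index when reading the items back, and why better_env_as_container can use as.character(i) instead.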

Memory overhead

In the C implementation of R there is an overhead of 4 words and 2 ints per allocated object. The linkedList approach allocates one list of length two per append, for a total of (4*8 + 4 + 4 + 2*8 =) 56 bytes per appended item on 64-bit computers (excluding memory allocation overhead, so probably closer to 64 bytes). The expandingList approach uses one word per appended item, plus a copy when doubling the vector length, so a total memory usage of up to 16 bytes per item. Since the memory is all in one or two objects, the per-object overhead is insignificant. I haven't looked deeply into the env memory usage, but I think it will be closer to linkedList.
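The arithmetic above can be written out as a quick check. This only restates the figures from the paragraph (8-byte words and 4-byte ints on a 64-bit build); the variable names are illustrative, and it does not measure actual memory use.

```r
word <- 8   # bytes per pointer-sized word on a 64-bit build
int  <- 4   # bytes per int

# Per-object overhead quoted above: 4 words and 2 ints
header <- 4 * word + 2 * int

# linkedList: one length-two list per append (header plus two slots)
linkedPerItem <- header + 2 * word

# expandingList: one slot per item, up to two slots per item after doubling
expandingMin <- 1 * word
expandingMax <- 2 * word

c(linked = linkedPerItem, expandingMin = expandingMin, expandingMax = expandingMax)
```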

