Append an object to a list in R in amortized constant time, O(1)? - Stack Overflow

Answer by phonetagger for Append an object to a list in R in amortized constant time, O(1)?


The OP (in the updated April 2012 revision of the question) wants to know whether there is a way to add to a list in amortized constant time, as can be done, for example, with a C++ vector<> container. The best answer(s?) here so far only show the relative execution times of the various solutions for a single, fixed problem size. They do not directly address any solution's algorithmic efficiency. Comments below many of the answers do discuss the algorithmic efficiency of some of the solutions, but in every case to date (as of April 2015) they come to the wrong conclusion.

Algorithmic efficiency describes how an algorithm's cost grows as the problem size grows. That cost can be measured in time (execution time) or in space (amount of memory consumed). Running a performance test of various solutions for a fixed-size problem does not reveal the solutions' growth rates. The OP wants to know whether there is a way to append objects to an R list in "amortized constant time". What does that mean? To explain, let me first describe "constant time":

  • Constant or O(1) growth:

    If the time required to perform a given task stays the same as the size of the problem doubles, then we say the algorithm exhibits constant time growth, or, in "Big O" notation, O(1) time growth. When the OP says "amortized" constant time, he means constant time on average over a long sequence of operations. A single operation may occasionally take much longer than normal, for example when a preallocated buffer is exhausted and must be resized to a larger one. As long as the long-term average cost per operation stays constant, we still call it O(1).

    For comparison, I will also describe "linear time" and "quadratic time":

  • Linear or O(n) growth:

    If the time required to perform a given task doubles as the size of the problem doubles, then we say the algorithm exhibits linear time, or O(n) growth.

  • Quadratic or O(n²) growth:

    If the time required to perform a given task grows with the square of the problem size (so that doubling the problem size quadruples the time), then we say the algorithm exhibits quadratic time, or O(n²) growth.

There are many other efficiency classes of algorithms; I defer to the Wikipedia article for further discussion.
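The O(1) bullet above mentions the usual way to get amortized constant-time appends: preallocate a buffer and, when it fills up, resize it to a larger one. Below is a minimal R sketch of that idea. It assumes geometric (here, doubling) growth, the kind of policy C++ vector<> implementations typically use. make_growable_list, push, as_list and capacity are hypothetical names for this illustration. They are not part of base R or of any answer on this page. Each resize copies O(n) elements, but it happens only after the capacity has doubled, so the average cost per append stays constant. Whether R copies items on the ordinary assignments inside the closure depends on its internal reference counting. Treat this as an illustration of the strategy, not a benchmarked implementation.

```r
# Hypothetical growable list: preallocated storage that doubles in capacity
# whenever it fills up, so appends are amortized O(1).
make_growable_list <- function(initial_capacity = 16L) {
  items <- vector("list", initial_capacity)  # preallocated slots, all NULL
  n <- 0L                                    # number of slots in use

  push <- function(obj) {
    if (n == length(items)) {
      # Buffer exhausted: double its capacity (an O(n) copy, but a rare one).
      length(items) <<- max(1L, 2L * length(items))
    }
    n <<- n + 1L
    items[n] <<- list(obj)  # list(obj) so a NULL obj is stored, not deleted
    invisible(NULL)
  }

  as_list <- function() items[seq_len(n)]  # only the slots in use
  capacity <- function() length(items)

  list(push = push, as_list = as_list, capacity = capacity)
}

gl <- make_growable_list(initial_capacity = 2L)
for (i in 1:10) gl$push(i)
gl$capacity()  # 16: the buffer doubled from 2 to 4, 8 and then 16
```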

I thank @CronAcronis for his answer, as I am new to R and it was nice to have a fully-constructed block of code for doing a performance analysis of the various solutions presented on this page. I am borrowing his code for my analysis, which I duplicate (wrapped in a function) below:

```r
library(microbenchmark)

### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {
  # deparse(substitute(lab)) turns the *expression* passed as lab into a name.
  # Called as lPtrAppend(listptr, i, i) in the loop below, that name is always
  # "i", so every call overwrites the same binding and the environment ends up
  # holding a single entry rather than n entries.
  lstptr[[deparse(substitute(lab))]] <- obj
}

### Store list inside new environment
envAppendList <- function(lstptr, obj) {
  lstptr$list[[length(lstptr$list) + 1]] <- obj
}

runBenchmark <- function(n) {
  microbenchmark(times = 5,
    env_with_list_ = {
      listptr <- new.env(parent = globalenv())
      listptr$list <- NULL
      for (i in 1:n) {envAppendList(listptr, i)}
      listptr$list
    },
    c_ = {
      a <- list(0)
      for (i in 1:n) {a <- c(a, list(i))}
    },
    list_ = {
      a <- list(0)
      for (i in 1:n) {a <- list(a, list(i))}
    },
    by_index = {
      a <- list(0)
      for (i in 1:n) {a[length(a) + 1] <- i}
      a
    },
    append_ = {
      a <- list(0)
      for (i in 1:n) {a <- append(a, i)}
      a
    },
    env_as_container_ = {
      listptr <- new.env(parent = globalenv())
      for (i in 1:n) {lPtrAppend(listptr, i, i)}
      listptr
    }
  )
}
```

The results posted by @CronAcronis definitely seem to suggest that the a <- list(a, list(i)) method is fastest, at least for a problem size of 10000, but the results for a single problem size do not address the growth of the solution. For that, we need to run a minimum of two profiling tests, with differing problem sizes:

```
> runBenchmark(2e+3)
Unit: microseconds
              expr       min        lq      mean    median       uq       max neval
    env_with_list_  8712.146  9138.250 10185.533 10257.678 10761.33 12058.264     5
                c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738     5
             list_   854.110   913.407  1064.463   914.167  1301.50  1339.132     5
          by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363     5
           append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560     5
 env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502     5
> runBenchmark(2e+4)
Unit: milliseconds
              expr         min         lq        mean    median          uq         max neval
    env_with_list_  534.955014  550.57150  550.329366  553.5288  553.955246  558.636313     5
                c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706     5
             list_    8.746356    8.79615    9.162577    8.8315    9.601226    9.837655     5
          by_index  953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200     5
           append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874     5
 env_as_container_  204.134468  205.35348  208.011525  206.4490  208.279580  215.841129     5
```

First of all, a word about the min/lq/mean/median/uq/max values. Since we are performing the exact same task in each of the 5 runs, in an ideal world we could expect every run to take exactly the same amount of time. But the first run is normally biased toward longer times, because the code being tested is not yet loaded into the CPU's cache. After the first run we would expect fairly consistent times, but our code may occasionally be evicted from the cache by timer tick interrupts or other hardware interrupts unrelated to the code being tested. Testing each snippet 5 times lets the code load into the cache during the first run and gives each snippet 4 chances to run to completion without interference from outside events. For this reason, and because we run the exact same code under the exact same input conditions each time, we will use only the 'min' times to compare the various code options.
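The same "keep the fastest run" rule can be applied without the microbenchmark package. The sketch below uses base R's system.time; min_elapsed and by_index_loop are hypothetical names for this illustration. system.time has coarse resolution, so for very fast code the minimum can come out as 0. When microbenchmark is used, the equivalent numbers are in the min column of summary() applied to its result.

```r
# Hypothetical helper: run f() `times` times and keep only the fastest run,
# on the reasoning that slower runs were disturbed by a cold cache or by
# unrelated interrupts rather than by the code being measured.
min_elapsed <- function(f, times = 5L) {
  elapsed <- vapply(seq_len(times),
                    function(k) system.time(f())[["elapsed"]],
                    numeric(1))
  min(elapsed)  # seconds
}

# The by_index loop from the benchmark above, wrapped as a function.
by_index_loop <- function(n = 2000L) {
  a <- list(0)
  for (i in 1:n) {a[length(a) + 1] <- i}
  a
}

min_elapsed(by_index_loop)
```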

Note that I chose to first run with a problem size of 2000 and then 20000, so my problem size increased by a factor of 10 from the first run to the second.
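One way to connect a tenfold growth in problem size to a growth class is to assume that a single append costs roughly proportional to n^p when the list already holds n items. The total time for n appends then grows roughly like n^(p+1). Here c is an unknown constant and p is the assumed per-append growth exponent. This is an idealization that ignores fixed overhead and cache effects.

```latex
T(n) \approx c\, n^{p+1}
\quad\Longrightarrow\quad
\frac{T(20000)}{T(2000)} \approx 10^{p+1} =
\begin{cases}
10,  & p = 0 \quad \text{(O(1) per append)} \\
100, & p = 1 \quad \text{(O(n) per append)}
\end{cases}
\qquad
p \approx \frac{\log\left(T(20000)/T(2000)\right)}{\log 10} - 1
```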

Performance of the list solution: O(1) (constant time)

Let's first look at the growth of the list_ solution, since we can tell right away that it's the fastest solution in both profiling runs. In the first run, it took 854 microseconds (0.854 milliseconds) to perform 2000 "append" tasks. In the second run, it took 8.746 milliseconds to perform 20000 "append" tasks. A naïve observer would say, "Ah, the list solution exhibits O(n) growth, since as the problem size grew by a factor of ten, so did the time required to execute the test." The problem with that analysis is that the OP wants the growth rate of a single object insertion, not the growth rate of the overall problem. Per insertion, the list solution took about 0.43 microseconds in both runs (854 µs / 2000 ≈ 0.427 µs, and 8746 µs / 20000 ≈ 0.437 µs). The cost of a single append therefore did not grow with the problem size. So the list solution provides exactly what the OP wants: a method of appending objects to a list in O(1) time.
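It helps to see why each step of the list_ solution is cheap. list(a, list(i)) does not copy the elements already collected. Instead, it allocates a new two-element list whose first element refers to the previous a, so the cost of a step does not depend on how many items came before it. One consequence worth knowing is that the result is a nested pair structure rather than a flat list: length(a) is always 2. A flat list can be recovered with a single O(n) pass at the end. flatten_appended is a hypothetical helper written for this illustration, and it assumes the same list(0) starting value as the benchmark.

```r
# Rebuild, for a small n, the structure the list_ benchmark creates.
n <- 5L
a <- list(0)
for (i in 1:n) {
  a <- list(a, list(i))  # (everything so far, newest item): always 2 elements
}
length(a)  # 2, however many items were appended

# Hypothetical helper: unwind the nesting once, newest item first,
# to recover a flat list of the n appended items (the initial 0 is dropped).
flatten_appended <- function(a, n) {
  out <- vector("list", n)
  for (k in rev(seq_len(n))) {
    out[[k]] <- a[[2]][[1]]  # the item appended at step k
    a <- a[[1]]              # the structure as it was before step k
  }
  out
}

flat <- flatten_appended(a, n)
unlist(flat)  # 1 2 3 4 5
```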

Performance of the other solutions

None of the other solutions come even close to the speed of the list solution, but it is informative to examine them anyway:

Most of the other solutions appear to be O(n) per append. Take the by_index solution, which is very popular judging by how often I find it in other SO posts. It took 11.66 milliseconds to append 2000 objects and 954 milliseconds to append ten times that many. The overall time grew by a factor of about 82, on the order of 100. A naïve observer might therefore say, "Ah, the by_index solution exhibits O(n²) growth, since as the problem size grew by a factor of ten, the time required to execute the test grew by roughly a factor of 100." As before, this analysis is flawed, because the OP is interested in the growth of a single object insertion. Divide the overall time growth by the growth in problem size, and the time to append a single object grew by a factor of only about 8. That is on the order of 10, not 100, in line with the tenfold growth of the problem size. So appending with the by_index solution is O(n) per object, and building the whole list this way is O(n²) overall. None of the solutions listed exhibits O(n²) growth for appending a single object.
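The same division can be applied to every solution at once. The sketch below re-enters the 'min' times reported above, converting the n = 20000 run from milliseconds to microseconds. It then computes the average time per append at each problem size and takes their ratio. A ratio near 1 means the per-append cost stayed constant (O(1)). A ratio near 10, the factor by which the problem grew, means the per-append cost grew linearly (O(n)). By this measure list_ comes out at about 1.02 and by_index at about 8.2; env_as_container_ also lands near 1. The conclusions are only as reliable as the two runs behind them.

```r
# Minimum times (microseconds) reported above for n = 2000 and n = 20000.
# The n = 20000 run was reported in milliseconds, so it is converted here.
n1 <- 2000
n2 <- 20000
min_n1_us <- c(env_with_list_ = 8712.146, c_ = 13407.657, list_ = 854.110,
               by_index = 11656.866, append_ = 15986.712,
               env_as_container_ = 19777.559)
min_n2_us <- 1000 * c(env_with_list_ = 534.955014, c_ = 1448.014870,
                      list_ = 8.746356, by_index = 953.989076,
                      append_ = 1634.151839, env_as_container_ = 204.134468)

growth <- data.frame(
  per_append_n1_us = min_n1_us / n1,  # average cost of one append, n = 2000
  per_append_n2_us = min_n2_us / n2   # average cost of one append, n = 20000
)
# Ratio near 1: per-append cost is constant, O(1).
# Ratio near n2 / n1 = 10: per-append cost grows linearly, O(n).
growth$ratio <- growth$per_append_n2_us / growth$per_append_n1_us
print(round(growth, 3))
```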

