matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against alternative methods.
The alternative method is hist(), wrapped as below:
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
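As a sanity check on the wrapper, the following sketch compares its counts with a base-R reference built from findInterval() and tabulate(). The helper binCounts_ref() is hypothetical and is not how binCounts() is implemented; it only assumes left-closed bins [bx[i], bx[i+1]), which is what right = FALSE requests. The data are small and purely illustrative.

```r
# Wrapper from the report: hist() reduced to returning the bin counts.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
}

# Hypothetical base-R reference (an assumption, not matrixStats code):
# count values falling in the left-closed bins [bx[i], bx[i+1]).
binCounts_ref <- function(x, bx) {
  idx <- findInterval(x, bx)                # i such that bx[i] <= x < bx[i+1]
  idx <- idx[idx >= 1L & idx < length(bx)]  # drop values outside the break range
  tabulate(idx, nbins = length(bx) - 1L)
}

set.seed(1)
x <- runif(1000, min = 0, max = 10)
bx <- c(-1, seq(from = 0, to = 10, length.out = 21), 11)

counts_hist <- binCounts_hist(x, bx = bx)
counts_ref <- binCounts_ref(x, bx = bx)

# Both approaches should agree bin by bin on this data.
stopifnot(all(counts_hist == counts_ref))
```

If matrixStats is installed, binCounts(x, bx = bx) can be compared against the same reference in the same way.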
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "integer"
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 807852 43.2 1315788 70.3 1315788 70.3
Vcells 1498901 11.5 8388608 64.0 3562027 27.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
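The break vector used in the setup can be inspected on its own. The sketch below rebuilds it and derives the number of bins and the inner bin width. The report does not say why the breaks are padded with -1 and xmax + 1; a plausible reading is that the padding guarantees every value lies inside the break range, which hist() requires. The variable names n_breaks, n_bins, inner_width and avg_per_inner_bin are introduced here for illustration only.

```r
nx <- 1e+05
xmax <- 0.01 * nx                                     # upper limit of the simulated data
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)  # nb + 1 breaks give nb inner bins
bx <- c(-1, bx, xmax + 1)                             # pad with one outer bin on each side

n_breaks <- length(bx)        # nb + 3 breaks in total
n_bins <- n_breaks - 1L       # nb + 2 bins in total
inner_width <- xmax / nb      # width of each inner bin
avg_per_inner_bin <- nx / nb  # average count per inner bin for uniform double data

print(c(n_breaks = n_breaks, n_bins = n_bins))
print(c(inner_width = inner_width, avg_per_inner_bin = avg_per_inner_bin))
```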
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 5.9754 | 6.43200 | 6.878917 | 6.5684 | 6.68900 | 13.1917 |
| 2 | hist | 11.5389 | 12.30655 | 12.916438 | 12.4926 | 12.62675 | 19.1192 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 1.931067 | 1.913332 | 1.877685 | 1.901924 | 1.887689 | 1.449336 |
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
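The relative-time panel can be reproduced from the absolute one by dividing each statistic by the corresponding binCounts value. The sketch below does this for three columns copied from the integer+unsorted table; abs_ms and rel are illustrative names, not objects created by the report.

```r
# Absolute times in milliseconds, copied from the integer+unsorted table
# (only min, median and max are included to keep the example short).
abs_ms <- rbind(
  binCounts = c(min = 5.9754, median = 6.5684, max = 13.1917),
  hist      = c(min = 11.5389, median = 12.4926, max = 19.1192)
)

# Divide every column by the binCounts row to obtain relative times.
rel <- sweep(abs_ms, 2, abs_ms["binCounts", ], "/")
print(round(rel, 6))
```

The hist row of rel matches the relative values reported in the table for these columns.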
> x <- sort(x)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 969795 51.8 1954774 104.4 1315788 70.3
Vcells 1824343 14.0 8388608 64.0 8386590 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.2297 | 1.56915 | 2.051464 | 1.65505 | 1.72965 | 7.8569 |
| 2 | hist | 3.7337 | 4.12480 | 4.656045 | 4.51275 | 4.63205 | 13.4317 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 3.036269 | 2.628684 | 2.269621 | 2.726655 | 2.678027 | 1.709542 |
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "double"
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
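Both data sets are drawn with the same seed, so the integer data shown earlier are these doubles truncated toward zero by the storage-mode assignment. A minimal sketch, using the leading values as displayed (and rounded) by str(); x_int is an illustrative name:

```r
# Leading values as displayed by str() for the double data.
x <- c(722.11, 285.54, 591.33, 3.42, 349.14)

# Changing the storage mode to integer drops the fractional part.
x_int <- x
storage.mode(x_int) <- "integer"
print(x_int)
```

The result corresponds to the leading values of the integer run: 722, 285, 591, 3, 349.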
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 969930 51.8 1954774 104.4 1954774 104.4
Vcells 1874747 14.4 8388608 64.0 8386890 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 10.5039 | 11.7464 | 12.25321 | 12.11045 | 12.20440 | 19.0747 |
| 2 | hist | 11.6019 | 13.5030 | 14.08809 | 13.69295 | 13.91275 | 20.3897 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 1.104533 | 1.149544 | 1.149747 | 1.130672 | 1.139978 | 1.068939 |
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 969996 51.9 1954774 104.4 1954774 104.4
Vcells 1875303 14.4 8388608 64.0 8388323 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.1680 | 2.76695 | 3.048870 | 2.86545 | 3.03605 | 8.1864 |
| 2 | hist | 4.3223 | 4.88730 | 5.577062 | 5.15950 | 5.44775 | 13.3829 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 |
| 2 | hist | 3.700599 | 1.766313 | 1.829223 | 1.80059 | 1.794355 | 1.634772 |
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
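To compare the four scenarios side by side, the median times from the tables above can be collected into one data frame. This is only a summary of numbers already reported; the data frame medians and its column hist_over_binCounts are illustrative names.

```r
# Median times in milliseconds, copied from the four benchmark tables.
medians <- data.frame(
  data      = c("integer+unsorted", "integer+sorted", "double+unsorted", "double+sorted"),
  binCounts = c(6.5684, 1.65505, 12.11045, 2.86545),
  hist      = c(12.4926, 4.51275, 13.69295, 5.15950)
)

# Ratio of the hist() median to the binCounts() median, per scenario.
medians$hist_over_binCounts <- round(medians$hist / medians$binCounts, 2)
print(medians)
```

On these runs the hist() median is between roughly 1.1 and 2.7 times the binCounts() median, with the smallest gap on double+unsorted data and the largest on integer+sorted data.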
R Under development (unstable) (2023-11-06 r85483 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.1.0 ggplot2_3.4.4
[4] knitr_1.45 R.devices_2.17.1 R.utils_2.12.2
[7] R.oo_1.25.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.4 cli_3.6.1 rlang_1.1.2 xfun_0.41
[5] labeling_0.4.3 glue_1.6.2 colorspace_2.1-0 markdown_1.11
[9] scales_1.2.1 fansi_1.0.5 R.cache_0.16.0 grid_4.4.0
[13] munsell_0.5.0 tibble_3.2.1 R.rsp_0.45.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.4.0 pkgconfig_2.0.3 farver_2.1.1
[21] digest_0.6.33 R6_2.5.1 utf8_1.2.4 pillar_1.9.0
[25] magrittr_2.0.3 withr_2.5.2 tools_4.4.0 gtable_0.3.4
Total processing time was 8.67 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2023-11-07 04:52:01 (+0100 UTC). Powered by RSP.