matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against an alternative method, binCounts_hist(), which is based on graphics::hist() and defined as below:
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
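For reference, the following base-R sketch shows what both functions are expected to count: the number of values in each half-open bin [bx[j], bx[j + 1]), which is the right = FALSE convention as we understand binCounts()'s documentation. The helper count_bins() is hypothetical and written only for illustration. It is not how matrixStats implements binCounts().

```r
# Hypothetical helper (not part of matrixStats): count x per half-open bin
# [bx[j], bx[j + 1]), i.e. the right = FALSE convention.
count_bins <- function(x, bx) {
  nbins <- length(bx) - 1L
  # findInterval() returns j with bx[j] <= x < bx[j + 1];
  # 0 means x < bx[1] and length(bx) means x >= bx[length(bx)]
  idx <- findInterval(x, bx)
  # Drop values outside all bins, then count occurrences of each bin index
  tabulate(idx[idx >= 1L & idx <= nbins], nbins = nbins)
}

# Same wrapper as in the report, calling graphics::hist() directly
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

x <- c(0, 0.5, 1, 1.5, 2, 2.5, 3.9)
bx <- c(0, 1, 2, 4)
count_bins(x, bx)
#> [1] 2 2 3
binCounts_hist(x, bx)
#> [1] 2 2 3
```

With right = FALSE and include.lowest = TRUE, hist() closes the last bin as [bx[n - 1], bx[n]], whereas count_bins() leaves it half-open, so the two can differ only for values equal to the final break.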
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' holds "integer" in this part of the report
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
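The break vector above has nb + 3 breaks, that is nb + 2 bins: the nb bins spanning [0, xmax] plus two padding bins, [-1, 0) and [xmax, xmax + 1). A scaled-down base-R sketch of the same construction follows; nx, nb and the seed are illustrative values chosen here, not the ones used in the benchmark. Because every x lies below xmax + 1, no value sits on the final break, so hist()'s closed last bin does not affect the counts.

```r
# Scaled-down replica of the report's break construction; nx, nb and the
# seed are illustrative values, not the ones used in the benchmark.
set.seed(1)
nx <- 1000
xmax <- 0.01 * nx                                     # 10
x <- runif(nx, min = 0, max = xmax)
nb <- 5
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)  # 0 2 4 6 8 10
bx <- c(-1, bx, xmax + 1)                             # -1 0 2 4 6 8 10 11
nbins <- length(bx) - 1L                              # nb + 2 = 7 bins

# Half-open counts [bx[j], bx[j + 1]); runif() never returns its endpoints,
# so every x lands in one of the inner nb bins
counts <- tabulate(findInterval(x, bx), nbins = nbins)
sum(counts) == nx                                     # TRUE
counts[c(1, nbins)]                                   # 0 0: padding bins are empty
```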
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833705 44.6    1621234 86.6  1206634 64.5
Vcells 1549062 11.9    8388608 64.0  3586643 27.4
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 6.1714 | 7.88065 | 9.091921 | 8.40505 | 8.99735 | 19.9068
hist | 12.0584 | 13.88825 | 15.687789 | 14.66350 | 15.50745 | 57.9829

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 1.953917 | 1.762323 | 1.725465 | 1.744606 | 1.723558 | 2.912718
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
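Each entry in the relative panel is the corresponding time in the top panel divided by the binCounts() time in the same column, so the binCounts() row is 1 by construction. A minimal base-R sketch of that arithmetic, using the millisecond values from the integer+unsorted table above (the variable names are illustrative):

```r
# Millisecond summaries copied from the integer+unsorted table
ms <- rbind(
  binCounts = c(min = 6.1714, lq = 7.88065, mean = 9.091921,
                median = 8.40505, uq = 8.99735, max = 19.9068),
  hist      = c(min = 12.0584, lq = 13.88825, mean = 15.687789,
                median = 14.66350, uq = 15.50745, max = 57.9829)
)

# Divide every column by the binCounts() entry of that column
rel <- sweep(ms, 2, ms["binCounts", ], FUN = "/")
round(rel, 3)
```

The ratios agree with the bottom panel to about six decimals; any residual difference comes from the printed millisecond values being rounded.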
> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014744 54.2    1621234 86.6  1621234 86.6
Vcells 1895068 14.5    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.5290 | 2.16265 | 3.241491 | 2.59395 | 2.86575 | 14.4017
hist | 5.0762 | 6.07795 | 7.668135 | 6.72570 | 7.48010 | 41.7290

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000
hist | 3.319948 | 2.810418 | 2.36562 | 2.592841 | 2.610172 | 2.897505
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' holds "double" in this part of the report
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
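Both data sets start from set.seed(48879), and the integer values printed earlier are these double values truncated toward zero, which is what storage.mode(x) <- "integer" does (it coerces like as.integer()). A small check, using the first five values as printed by str(); str() rounds them, so they stand in for the full-precision draws:

```r
# First five double values as printed (rounded) by str() above
x_dbl <- c(722.11, 285.54, 591.33, 3.42, 349.14)
x_int <- x_dbl
storage.mode(x_int) <- "integer"   # coerces like as.integer(): truncates toward 0
x_int
#> [1] 722 285 591   3 349
identical(x_int, as.integer(x_dbl))
#> [1] TRUE
```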
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016259 54.3    1621234 86.6  1621234 86.6
Vcells 1952991 15.0    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 11.1848 | 12.11090 | 13.17789 | 12.57105 | 13.52485 | 22.9117
hist | 13.1387 | 15.06135 | 17.22445 | 16.53210 | 17.53320 | 32.5319

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 1.174692 | 1.243619 | 1.307072 | 1.315093 | 1.296369 | 1.419882
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016271 54.3    1621234 86.6  1621234 86.6
Vcells 1953511 15.0    8388608 64.0  8388219 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 2.0868 | 3.98335 | 5.621486 | 4.44715 | 5.24885 | 31.1366
hist | 5.0087 | 6.76295 | 8.472206 | 7.52250 | 8.99830 | 39.1896

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 2.400182 | 1.697805 | 1.507111 | 1.691533 | 1.714337 | 1.258635
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
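Comparing the sorted and unsorted tables shows that sorting x speeds up both methods. A short base-R computation of the median-time ratio (unsorted / sorted) per method and storage mode, using the medians copied from the four tables above (the column labels are illustrative):

```r
# Median times in milliseconds, copied from the four tables above
med <- rbind(
  binCounts = c(int_unsorted = 8.40505, int_sorted = 2.59395,
                dbl_unsorted = 12.57105, dbl_sorted = 4.44715),
  hist      = c(int_unsorted = 14.66350, int_sorted = 6.72570,
                dbl_unsorted = 16.53210, dbl_sorted = 7.52250)
)

# Speedup from sorting x: median(unsorted) / median(sorted), per method and mode
sort_gain <- cbind(
  integer = med[, "int_unsorted"] / med[, "int_sorted"],
  double  = med[, "dbl_unsorted"] / med[, "dbl_sorted"]
)
round(sort_gain, 2)
```

On these medians, sorting makes binCounts() about 3.2 times (integer) and 2.8 times (double) faster, compared with about 2.2 times for hist() in both modes. These ratios come from a single run on the platform listed below.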
R Under development (unstable) (2024-09-02 r87090 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.4.0 ggplot2_3.5.1
[4] knitr_1.48 R.devices_2.17.2 R.utils_2.12.3
[7] R.oo_1.26.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.3 rlang_1.1.4 xfun_0.47
[5] labeling_0.4.3 glue_1.7.0 colorspace_2.1-1 markdown_1.13
[9] scales_1.3.0 fansi_1.0.6 R.cache_0.16.0 grid_4.5.0
[13] munsell_0.5.1 tibble_3.2.1 R.rsp_0.46.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.5.0 pkgconfig_2.0.3 farver_2.1.2
[21] digest_0.6.37 R6_2.5.1 utf8_1.2.4 pillar_1.9.0
[25] magrittr_2.0.3 withr_3.0.1 tools_4.5.0 gtable_0.3.5
Total processing time was 11.33 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2024-09-03 18:07:00 (+0200 UTC). Powered by RSP.