[matrixStats]: Benchmark report

---------------------------------------

# binCounts() benchmarks

This report benchmarks the performance of binCounts() against alternative methods.

## Alternative methods

* hist()

as below

```r
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
```

## Data type "integer"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833705 44.6    1621234 86.6  1206634 64.5
Vcells 1549062 11.9    8388608 64.0  3586643 27.4
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|      mean|   median|       uq|     max|
|:--|:---------|-------:|--------:|---------:|--------:|--------:|-------:|
|1  |binCounts |  6.1714|  7.88065|  9.091921|  8.40505|  8.99735| 19.9068|
|2  |hist      | 12.0584| 13.88825| 15.687789| 14.66350| 15.50745| 57.9829|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.953917| 1.762323| 1.725465| 1.744606| 1.723558| 2.912718|

_Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,integer,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014744 54.2    1621234 86.6  1621234 86.6
Vcells 1895068 14.5    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 1.5290| 2.16265| 3.241491| 2.59395| 2.86575| 14.4017|
|2  |hist      | 5.0762| 6.07795| 7.668135| 6.72570| 7.48010| 41.7290|

|   |expr      |      min|       lq|    mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|-------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.00000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.319948| 2.810418| 2.36562| 2.592841| 2.610172| 2.897505|

_Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,integer,sorted,benchmark.png)

## Data type "double"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016259 54.3    1621234 86.6  1621234 86.6
Vcells 1952991 15.0    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+unsorted data.
The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|     mean|   median|       uq|     max|
|:--|:---------|-------:|--------:|--------:|--------:|--------:|-------:|
|1  |binCounts | 11.1848| 12.11090| 13.17789| 12.57105| 13.52485| 22.9117|
|2  |hist      | 13.1387| 15.06135| 17.22445| 16.53210| 17.53320| 32.5319|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.174692| 1.243619| 1.307072| 1.315093| 1.296369| 1.419882|

_Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,double,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016271 54.3    1621234 86.6  1621234 86.6
Vcells 1953511 15.0    8388608 64.0  8388219 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 2.0868| 3.98335| 5.621486| 4.44715| 5.24885| 31.1366|
|2  |hist      | 5.0087| 6.76295| 8.472206| 7.52250| 8.99830| 39.1896|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 2.400182| 1.697805| 1.507111| 1.691533| 1.714337| 1.258635|

_Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,double,sorted,benchmark.png)

## Appendix

### Session information

```r
R Under development (unstable) (2024-09-02 r87090 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=C    LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C   LC_NUMERIC=C
[5] LC_TIME=C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.4.0 ggplot2_3.5.1
[4] knitr_1.48            R.devices_2.17.2  R.utils_2.12.3
[7] R.oo_1.26.0           R.methodsS3_1.8.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5     cli_3.6.3      rlang_1.1.4      xfun_0.47
 [5] labeling_0.4.3  glue_1.7.0     colorspace_2.1-1 markdown_1.13
 [9] scales_1.3.0    fansi_1.0.6    R.cache_0.16.0   grid_4.5.0
[13] munsell_0.5.1   tibble_3.2.1   R.rsp_0.46.0     base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.5.0 pkgconfig_2.0.3  farver_2.1.2
[21] digest_0.6.37   R6_2.5.1       utf8_1.2.4       pillar_1.9.0
[25] magrittr_2.0.3  withr_3.0.1    tools_4.5.0      gtable_0.3.5
```

Total processing time was 11.33 secs.

### Reproducibility

To reproduce this report, do:

```r
html <- matrixStats:::benchmark('binCounts')
```

[RSP]: https://cran.r-project.org/package=R.rsp
[matrixStats]: https://cran.r-project.org/package=matrixStats

[StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?"
[StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?"
[StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix"

---------------------------------------

Copyright Henrik Bengtsson. Last updated on 2024-09-03 18:07:00 (+0200 UTC). Powered by [RSP].