[matrixStats]: Benchmark report

---------------------------------------

# binCounts() benchmarks

This report benchmarks the performance of binCounts() against alternative methods.

## Alternative methods

* hist()

as below

```r
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
```

## Data type "integer"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  827516 44.2    1294397 69.2  1294397 69.2
Vcells 1537519 11.8    8388608 64.0  3770071 28.8
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|       lq|      mean|   median|       uq|     max|
|:--|:---------|------:|--------:|---------:|--------:|--------:|-------:|
|1  |binCounts | 4.7201|  6.13755|  6.836211|  6.54710|  6.71110| 15.4372|
|2  |hist      | 8.9065| 11.31465| 12.251769| 12.53195| 12.76785| 18.4921|

|   |expr      |     min|       lq|     mean|   median|       uq|      max|
|:--|:---------|-------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.00000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.88693| 1.843513| 1.792187| 1.914122| 1.902497| 1.197892|

_Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,integer,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 1006376 53.8    2069815 110.6  1294397 69.2
Vcells 1879259 14.4    8388608  64.0  8381817 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 0.9862| 1.48735| 1.884892| 1.61415| 1.72045|  7.1963|
|2  |hist      | 3.7436| 4.37895| 4.850464| 4.56450| 4.77025| 10.8815|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.795985| 2.944129| 2.573338| 2.827804| 2.772676| 1.512096|

_Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,integer,sorted,benchmark.png)

## Data type "double"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007882 53.9    2069815 110.6  2069815 110.6
Vcells 1937167 14.8    8388608  64.0  8385326  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+unsorted data.
The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|     mean|   median|      uq|     max|
|:--|:---------|-------:|--------:|--------:|--------:|-------:|-------:|
|1  |binCounts | 10.2775| 11.49410| 12.03821| 11.80905| 11.9789| 17.2332|
|2  |hist      | 12.8235| 13.21065| 13.85622| 13.60840| 13.7925| 18.8832|

|   |expr      |      min|       lq|    mean|  median|       uq|      max|
|:--|:---------|--------:|--------:|-------:|-------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.00000| 1.00000| 1.000000| 1.000000|
|2  |hist      | 1.247726| 1.149342| 1.15102| 1.15237| 1.151399| 1.095745|

_Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,double,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007909 53.9    2069815 110.6  2069815 110.6
Vcells 1937712 14.8    8388608  64.0  8387357  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 1.1107| 2.45555| 2.966433| 2.66765| 2.76275|  7.7006|
|2  |hist      | 4.3027| 4.75330| 5.283610| 5.10245| 5.29500| 10.3809|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.873863| 1.935737| 1.781132| 1.912713| 1.916569| 1.348064|

_Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,double,sorted,benchmark.png)

## Appendix

### Session information

```r
R version 4.4.0 beta (2024-04-09 r86391 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C
[5] LC_TIME=C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.3.0     ggplot2_3.5.0
[4] knitr_1.46            R.devices_2.17.2      R.utils_2.12.3
[7] R.oo_1.26.0           R.methodsS3_1.8.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.2        rlang_1.1.3      xfun_0.43
 [5] labeling_0.4.3   glue_1.7.0       colorspace_2.1-0 markdown_1.12
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.4.0
[13] munsell_0.5.1    tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3
[17] lifecycle_1.0.4  compiler_4.4.0   pkgconfig_2.0.3  farver_2.1.1
[21] digest_0.6.35    R6_2.5.1         utf8_1.2.4       pillar_1.9.0
[25] magrittr_2.0.3   withr_3.0.0      tools_4.4.0      gtable_0.3.4
```

Total processing time was 8.57 secs.

### Reproducibility

To reproduce this report, do:

```r
html <- matrixStats:::benchmark('binCounts')
```

[RSP]: https://cran.r-project.org/package=R.rsp
[matrixStats]: https://cran.r-project.org/package=matrixStats

[StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?"
[StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?"
[StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix"

---------------------------------------

Copyright Henrik Bengtsson. Last updated on 2024-04-10 21:56:19 (+0200 UTC). Powered by [RSP].