[matrixStats]: Benchmark report

---------------------------------------

# binCounts() benchmarks

This report benchmarks the performance of binCounts() against alternative methods.

## Alternative methods

* hist()

as below

```r
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
```

## Data type "integer"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  832976 44.5    1602994 85.7  1238948 66.2
Vcells 1550951 11.9    8388608 64.0  3747923 28.6
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|      lq|     mean|   median|       uq|     max|
|:--|:---------|-------:|-------:|--------:|--------:|--------:|-------:|
|1  |binCounts |  6.0648|  6.5985|  7.15556|  6.67395|  6.88525| 14.4020|
|2  |hist      | 10.8401| 12.5433| 13.15207| 12.76730| 13.05785| 20.3017|

|   |expr      |     min|       lq|     mean|   median|       uq|      max|
|:--|:---------|-------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.00000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.78738| 1.900932| 1.838021| 1.913005| 1.896496| 1.409644|

_Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,integer,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014082 54.2    1602994 85.7  1602994 85.7
Vcells 1897117 14.5    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean| median|      uq|     max|
|:--|:---------|------:|-------:|--------:|------:|-------:|-------:|
|1  |binCounts | 1.1791| 1.43800| 1.981658| 1.6019| 1.67240|  6.9819|
|2  |hist      | 3.6734| 4.15625| 4.655433| 4.3932| 4.60495| 10.7287|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.115427| 2.890299| 2.349262| 2.742493| 2.753498| 1.536645|

_Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,integer,sorted,benchmark.png)

## Data type "double"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015597 54.3    1602994 85.7  1602994 85.7
Vcells 1955040 15.0    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+unsorted data.
The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|     mean|   median|       uq|     max|
|:--|:---------|-------:|--------:|--------:|--------:|--------:|-------:|
|1  |binCounts |  9.9433| 11.51275| 12.05104| 11.91325| 12.07075| 17.6203|
|2  |hist      | 11.2240| 13.21570| 13.97113| 13.55680| 13.93655| 20.2583|

|   |expr      |    min|       lq|    mean|  median|       uq|      max|
|:--|:---------|------:|--------:|-------:|-------:|--------:|--------:|
|1  |binCounts | 1.0000| 1.000000| 1.00000| 1.00000| 1.000000| 1.000000|
|2  |hist      | 1.1288| 1.147919| 1.15933| 1.13796| 1.154572| 1.149714|

_Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,double,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015609 54.3    1602994 85.7  1602994 85.7
Vcells 1955560 15.0    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 0.7632| 2.69325| 3.262256| 2.79545| 3.11185|  9.5363|
|2  |hist      | 4.4010| 5.07225| 5.598413| 5.37100| 5.68055| 11.0009|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 5.766509| 1.883319| 1.716117| 1.921337| 1.825457| 1.153582|

_Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,double,sorted,benchmark.png)

## Appendix

### Session information

```r
R Under development (unstable) (2025-01-06 r87534 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default
  LAPACK version 3.12.0

locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C
[5] LC_TIME=C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.5.0    ggplot2_3.5.1
[4] knitr_1.49           R.devices_2.17.2     R.utils_2.12.3
[7] R.oo_1.27.0          R.methodsS3_1.8.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.3        rlang_1.1.4      xfun_0.50
 [5] labeling_0.4.3   glue_1.8.0       colorspace_2.1-1 markdown_1.13
 [9] scales_1.3.0     R.cache_0.16.0   grid_4.5.0       munsell_0.5.1
[13] evaluate_1.0.1   tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3
[17] lifecycle_1.0.4  compiler_4.5.0   pkgconfig_2.0.3  farver_2.1.2
[21] digest_0.6.37    R6_2.5.1         pillar_1.10.1    magrittr_2.0.3
[25] withr_3.0.2      tools_4.5.0      gtable_0.3.6
```

Total processing time was 8.72 secs.

### Reproducibility

To reproduce this report, do:

```r
html <- matrixStats:::benchmark('binCounts')
```

[RSP]: https://cran.r-project.org/package=R.rsp
[matrixStats]: https://cran.r-project.org/package=matrixStats

[StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?"
[StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?"
[StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix"

---------------------------------------
Copyright Henrik Bengtsson. Last updated on 2025-01-07 19:51:39 (+0100 UTC). Powered by [RSP].