[matrixStats]: Benchmark report

---------------------------------------

# binCounts() benchmarks

This report benchmarks the performance of binCounts() against alternative methods.

## Alternative methods

* hist(), wrapped as binCounts_hist() below

```r
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
```

## Data type "integer"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833928 44.6    1622094 86.7  1206194 64.5
Vcells 1549561 11.9    8388608 64.0  2908138 22.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |       min|        lq|     mean|    median|      uq|     max|
|:--|:---------|---------:|---------:|--------:|---------:|-------:|-------:|
|1  |binCounts |  6.830401|  8.638901| 10.16872|  9.267901| 10.4501| 23.7555|
|2  |hist      | 12.593701| 15.121951| 16.50035| 15.608601| 16.7102| 33.0765|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.843772| 1.750449| 1.622658| 1.684157| 1.599047| 1.392372|

_Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,integer,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014991 54.3    1622094 86.7  1622094 86.7
Vcells 1895623 14.5    8388608 64.0  8386337 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |      min|       lq|     mean|   median|       uq|     max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|-------:|
|1  |binCounts | 1.256401| 2.066951| 2.879923| 2.548052| 2.817652| 21.3416|
|2  |hist      | 3.816601| 5.192551| 7.086675| 6.426251| 7.236751| 28.0253|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.037725| 2.512179| 2.460717| 2.522026| 2.568363| 1.313177|

_Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,integer,sorted,benchmark.png)

## Data type "double"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016506 54.3    1622094 86.7  1622094 86.7
Vcells 1953546 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+unsorted data.
The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|     mean|   median|       uq|     max|
|:--|:---------|-------:|--------:|--------:|--------:|--------:|-------:|
|1  |binCounts | 10.0117| 12.03810| 14.02472| 13.50955| 14.63490| 32.6383|
|2  |hist      | 11.9117| 15.38895| 16.74322| 16.38320| 17.34775| 30.2571|

|   |expr      |      min|       lq|     mean|   median|       uq|       max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|---------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.0000000|
|2  |hist      | 1.189778| 1.278354| 1.193836| 1.212713| 1.185369| 0.9270428|

_Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,double,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016518 54.3    1622094 86.7  1622094 86.7
Vcells 1954066 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |      min|       lq|     mean|   median|       uq|     max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|-------:|
|1  |binCounts | 1.744401| 3.355101| 4.668619| 4.183101| 4.877601| 15.7201|
|2  |hist      | 4.885101| 6.319501| 7.422926| 7.030701| 8.067950| 21.4517|

|   |expr      |      min|      lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|-------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.00000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 2.800446| 1.88355| 1.589962| 1.680739| 1.654082| 1.364603|

_Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,double,sorted,benchmark.png)

## Appendix

### Session information

```r
R Under development (unstable) (2024-09-06 r87103 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C
[5] LC_TIME=C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.4.1    ggplot2_3.5.1
[4] knitr_1.48           R.devices_2.17.2     R.utils_2.12.3
[7] R.oo_1.26.0          R.methodsS3_1.8.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.3        rlang_1.1.4      xfun_0.47
 [5] labeling_0.4.3   glue_1.7.0       colorspace_2.1-1 markdown_1.13
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.5.0
[13] munsell_0.5.1    tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3
[17] lifecycle_1.0.4  compiler_4.5.0   pkgconfig_2.0.3  farver_2.1.2
[21] digest_0.6.37    R6_2.5.1         utf8_1.2.4       pillar_1.9.0
[25] magrittr_2.0.3   withr_3.0.1      tools_4.5.0      gtable_0.3.5
```

Total processing time was 11.7 secs.

### Reproducibility

To reproduce this report, do:

```r
html <- matrixStats:::benchmark('binCounts')
```

[RSP]: https://cran.r-project.org/package=R.rsp
[matrixStats]: https://cran.r-project.org/package=matrixStats

[StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?"
[StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?"
[StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix"

---------------------------------------

Copyright Henrik Bengtsson. Last updated on 2024-09-07 04:09:29 (+0200 UTC). Powered by [RSP].