[matrixStats]: Benchmark report --------------------------------------- # binCounts() benchmarks This report benchmark the performance of binCounts() against alternative methods. ## Alternative methods * hist() as below ```r > hist <- graphics::hist > binCounts_hist <- function(x, bx, right = FALSE, ...) { + hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts + } ``` ## Data type "integer" ### Non-sorted simulated data ```r > set.seed(48879) > nx <- 1e+05 > xmax <- 0.01 * nx > x <- runif(nx, min = 0, max = xmax) > storage.mode(x) <- mode > str(x) int [1:100000] 722 285 591 3 349 509 216 91 150 383 ... > nb <- 10000 > bx <- seq(from = 0, to = xmax, length.out = nb + 1L) > bx <- c(-1, bx, xmax + 1) ``` ### Results ```r > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 819837 43.8 1348511 72.1 1348511 72.1 Vcells 1524459 11.7 8388608 64.0 3585137 27.4 > stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms") ``` _Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._ | |expr | min| lq| mean| median| uq| max| |:--|:---------|-------:|-------:|---------:|-------:|--------:|-------:| |1 |binCounts | 5.5582| 6.5005| 7.036635| 6.5931| 6.79815| 13.6604| |2 |hist | 11.2025| 12.3327| 12.804920| 12.4359| 12.70115| 19.0652| | |expr | min| lq| mean| median| uq| max| |:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:| |1 |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| |2 |hist | 2.015491| 1.897193| 1.819751| 1.886199| 1.868324| 1.395655| _Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._ ![](figures/binCounts,integer,unsorted,benchmark.png) ### Sorted simulated data ```r > x <- sort(x) ``` ```r > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 983450 52.6 1977214 105.6 1348511 72.1 Vcells 1853100 14.2 8388608 64.0 8387036 64.0 > stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms") ``` _Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._ | |expr | min| lq| mean| median| uq| max| |:--|:---------|------:|-------:|--------:|-------:|-------:|-------:| |1 |binCounts | 1.1696| 1.35795| 1.894568| 1.55545| 1.60035| 7.7298| |2 |hist | 3.7095| 4.13090| 4.709910| 4.40185| 4.63760| 10.4031| | |expr | min| lq| mean| median| uq| max| |:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:| |1 |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| |2 |hist | 3.171597| 3.042012| 2.486007| 2.829953| 2.897866| 1.345843| _Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._ ![](figures/binCounts,integer,sorted,benchmark.png) ## Data type "double" ### Non-sorted simulated data ```r > set.seed(48879) > nx <- 1e+05 > xmax <- 0.01 * nx > x <- runif(nx, min = 0, max = xmax) > storage.mode(x) <- mode > str(x) num [1:100000] 722.11 285.54 591.33 3.42 349.14 ... > nb <- 10000 > bx <- seq(from = 0, to = xmax, length.out = nb + 1L) > bx <- c(-1, bx, xmax + 1) ``` ### Results ```r > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 983576 52.6 1977214 105.6 1977214 105.6 Vcells 1903489 14.6 8388608 64.0 8387036 64.0 > stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms") ``` _Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._ | |expr | min| lq| mean| median| uq| max| |:--|:---------|-------:|--------:|--------:|--------:|--------:|-------:| |1 |binCounts | 10.8493| 11.40195| 11.95255| 11.73145| 11.88470| 17.3304| |2 |hist | 12.3958| 13.25920| 13.92087| 13.64710| 13.79745| 19.6925| | |expr | min| lq| mean| median| uq| max| |:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:| |1 |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| |2 |hist | 1.142544| 1.162889| 1.164678| 1.163292| 1.160942| 1.136298| _Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._ ![](figures/binCounts,double,unsorted,benchmark.png) ### Sorted simulated data ```r > x <- sort(x) ``` ```r > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 983642 52.6 1977214 105.6 1977214 105.6 Vcells 1904045 14.6 8388608 64.0 8388565 64.0 > stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms") ``` _Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._ | |expr | min| lq| mean| median| uq| max| |:--|:---------|------:|-------:|--------:|-------:|-------:|-------:| |1 |binCounts | 0.8767| 2.91920| 3.434081| 3.11045| 3.31895| 9.3621| |2 |hist | 4.2918| 4.82315| 5.577651| 5.15055| 5.51220| 17.7258| | |expr | min| lq| mean| median| uq| max| |:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:| |1 |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| |2 |hist | 4.895403| 1.652216| 1.624205| 1.655886| 1.660826| 1.893357| _Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds._ ![](figures/binCounts,double,sorted,benchmark.png) ## Appendix ### Session information ```r R Under development (unstable) (2023-12-09 r85665 ucrt) Platform: x86_64-w64-mingw32/x64 Running under: Windows Server 2022 x64 (build 20348) Matrix products: default locale: [1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8 [3] LC_MONETARY=C LC_NUMERIC=C [5] LC_TIME=C time zone: Europe/Berlin tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] microbenchmark_1.4.10 matrixStats_1.2.0 ggplot2_3.4.4 [4] knitr_1.45 R.devices_2.17.1 R.utils_2.12.3 [7] R.oo_1.25.0 R.methodsS3_1.8.2 loaded via a namespace (and not attached): [1] vctrs_0.6.5 cli_3.6.2 rlang_1.1.2 xfun_0.41 [5] labeling_0.4.3 glue_1.6.2 colorspace_2.1-0 markdown_1.12 [9] scales_1.3.0 fansi_1.0.6 R.cache_0.16.0 grid_4.4.0 [13] munsell_0.5.0 tibble_3.2.1 R.rsp_0.45.0 base64enc_0.1-3 [17] lifecycle_1.0.4 compiler_4.4.0 pkgconfig_2.0.3 farver_2.1.1 [21] digest_0.6.33 R6_2.5.1 utf8_1.2.4 pillar_1.9.0 [25] magrittr_2.0.3 withr_2.5.2 tools_4.4.0 gtable_0.3.4 ``` Total processing time was 8.56 secs. ### Reproducibility To reproduce this report, do: ```r html <- matrixStats:::benchmark('binCounts') ``` [RSP]: https://cran.r-project.org/package=R.rsp [matrixStats]: https://cran.r-project.org/package=matrixStats [StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?" [StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?" [StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix" --------------------------------------- Copyright Henrik Bengtsson. Last updated on 2023-12-11 22:25:22 (+0100 UTC). Powered by [RSP].