[matrixStats]: Benchmark report

---------------------------------------

# binCounts() benchmarks

This report benchmarks the performance of binCounts() against alternative methods.

## Alternative methods

* hist()

as below

```r
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
```

## Data type "integer"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  807852 43.2    1315788 70.3  1315788 70.3
Vcells 1498901 11.5    8388608 64.0  3562027 27.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|       lq|      mean|  median|       uq|     max|
|:--|:---------|-------:|--------:|---------:|-------:|--------:|-------:|
|1  |binCounts |  5.9754|  6.43200|  6.878917|  6.5684|  6.68900| 13.1917|
|2  |hist      | 11.5389| 12.30655| 12.916438| 12.4926| 12.62675| 19.1192|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.931067| 1.913332| 1.877685| 1.901924| 1.887689| 1.449336|

_Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,integer,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  969795 51.8    1954774 104.4  1315788 70.3
Vcells 1824343 14.0    8388608  64.0  8386590 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 1.2297| 1.56915| 2.051464| 1.65505| 1.72965|  7.8569|
|2  |hist      | 3.7337| 4.12480| 4.656045| 4.51275| 4.63205| 13.4317|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 3.036269| 2.628684| 2.269621| 2.726655| 2.678027| 1.709542|

_Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,integer,sorted,benchmark.png)

## Data type "double"

### Non-sorted simulated data

```r
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
```

### Results

```r
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  969930 51.8    1954774 104.4  1954774 104.4
Vcells 1874747 14.4    8388608  64.0  8386890  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+unsorted data.
The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |     min|      lq|     mean|   median|       uq|     max|
|:--|:---------|-------:|-------:|--------:|--------:|--------:|-------:|
|1  |binCounts | 10.5039| 11.7464| 12.25321| 12.11045| 12.20440| 19.0747|
|2  |hist      | 11.6019| 13.5030| 14.08809| 13.69295| 13.91275| 20.3897|

|   |expr      |      min|       lq|     mean|   median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|--------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.000000| 1.000000| 1.000000|
|2  |hist      | 1.104533| 1.149544| 1.149747| 1.130672| 1.139978| 1.068939|

_Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds._

![](figures/binCounts,double,unsorted,benchmark.png)

### Sorted simulated data

```r
> x <- sort(x)
```

```r
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  969996 51.9    1954774 104.4  1954774 104.4
Vcells 1875303 14.4    8388608  64.0  8388323  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
```

_Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times._

|   |expr      |    min|      lq|     mean|  median|      uq|     max|
|:--|:---------|------:|-------:|--------:|-------:|-------:|-------:|
|1  |binCounts | 1.1680| 2.76695| 3.048870| 2.86545| 3.03605|  8.1864|
|2  |hist      | 4.3223| 4.88730| 5.577062| 5.15950| 5.44775| 13.3829|

|   |expr      |      min|       lq|     mean|  median|       uq|      max|
|:--|:---------|--------:|--------:|--------:|-------:|--------:|--------:|
|1  |binCounts | 1.000000| 1.000000| 1.000000| 1.00000| 1.000000| 1.000000|
|2  |hist      | 3.700599| 1.766313| 1.829223| 1.80059| 1.794355| 1.634772|

_Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses.
Times are in milliseconds._

![](figures/binCounts,double,sorted,benchmark.png)

## Appendix

### Session information

```r
R Under development (unstable) (2023-11-06 r85483 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=C                  LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                 LC_NUMERIC=C
[5] LC_TIME=C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.1.0     ggplot2_3.4.4
[4] knitr_1.45            R.devices_2.17.1      R.utils_2.12.2
[7] R.oo_1.25.0           R.methodsS3_1.8.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.4      cli_3.6.1        rlang_1.1.2      xfun_0.41
 [5] labeling_0.4.3   glue_1.6.2       colorspace_2.1-0 markdown_1.11
 [9] scales_1.2.1     fansi_1.0.5      R.cache_0.16.0   grid_4.4.0
[13] munsell_0.5.0    tibble_3.2.1     R.rsp_0.45.0     base64enc_0.1-3
[17] lifecycle_1.0.4  compiler_4.4.0   pkgconfig_2.0.3  farver_2.1.1
[21] digest_0.6.33    R6_2.5.1         utf8_1.2.4       pillar_1.9.0
[25] magrittr_2.0.3   withr_2.5.2      tools_4.4.0      gtable_0.3.4
```

Total processing time was 8.67 secs.

### Reproducibility

To reproduce this report, do:

```r
html <- matrixStats:::benchmark('binCounts')
```

[RSP]: https://cran.r-project.org/package=R.rsp
[matrixStats]: https://cran.r-project.org/package=matrixStats

[StackOverflow:colMins?]: https://stackoverflow.com/questions/13676878 "Stack Overflow: fastest way to get Min from every column in a matrix?"
[StackOverflow:colSds?]: https://stackoverflow.com/questions/17549762 "Stack Overflow: Is there such 'colsd' in R?"
[StackOverflow:rowProds?]: https://stackoverflow.com/questions/20198801/ "Stack Overflow: Row product of matrix and column sum of matrix"

---------------------------------------

Copyright Henrik Bengtsson. Last updated on 2023-11-07 04:52:01 (+0100 UTC). Powered by [RSP].