matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against an alternative method, a wrapper around hist(), defined as below:
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
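Before timing the two approaches, it can help to confirm that they return the same counts. The following is a minimal sketch on hypothetical toy data (`x_toy` and `bx_toy` are illustrative names, not part of the benchmark). It compares against matrixStats::binCounts() only if the package is installed.

```r
# Wrapper from the report: bin counts via hist().
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

# Toy data (hypothetical, not the benchmark data): five values, three bins.
x_toy <- c(0.5, 1.0, 1.5, 2.5, 2.9)
bx_toy <- c(0, 1, 2, 3)

# With right = FALSE the bins are left-closed, so 1.0 falls in the second bin.
counts_hist <- binCounts_hist(x_toy, bx = bx_toy)
print(counts_hist)

# Cross-check against binCounts() when matrixStats is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  counts_bc <- matrixStats::binCounts(x_toy, bx = bx_toy)
  stopifnot(all(counts_bc == counts_hist))
}
```

The two functions are only directly comparable when every value lies inside the range of the breaks, which holds for this toy data.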
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
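The last step above pads the breaks with one extra bin on each side. The sketch below repeats the setup at a smaller, hypothetical scale (`nx = 1000`, `nb = 100`) to show the resulting layout: `nb + 2` bins, with the two outer bins expected to be empty because the data are drawn from within `[0, xmax]`. The report does not state why the padding is used, so this only illustrates its effect.

```r
set.seed(48879)
nx <- 1000                      # smaller than the report's 1e+05 (assumption)
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)

nb <- 100                       # smaller than the report's 10000 (assumption)
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)       # pad with one outer bin on each side

counts <- hist(x, breaks = bx, right = FALSE, include.lowest = TRUE,
               plot = FALSE)$counts

length(counts)                  # number of bins, nb + 2
counts[c(1, length(counts))]    # counts in the two outer (padding) bins
sum(counts)                     # every value is counted exactly once
```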
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 832976 44.5 1602994 85.7 1238948 66.2
Vcells 1550951 11.9 8388608 64.0 3747923 28.6
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 6.0648 | 6.5985 | 7.15556 | 6.67395 | 6.88525 | 14.4020
hist | 10.8401 | 12.5433 | 13.15207 | 12.76730 | 13.05785 | 20.3017

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 1.78738 | 1.900932 | 1.838021 | 1.913005 | 1.896496 | 1.409644
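The relative times in the bottom panel appear to be each timing divided by the corresponding binCounts() timing. A short sketch of that arithmetic for the median column, using the values copied from the top panel (the variable names are illustrative):

```r
# Median timings in milliseconds, copied from the top panel
# (integer+unsorted data).
median_ms <- c(binCounts = 6.67395, hist = 12.76730)

# Relative time: each entry divided by the binCounts() entry.
rel <- median_ms / median_ms[["binCounts"]]
print(round(rel, 6))
```

The `hist` entry matches the 1.913005 reported in the bottom panel, so hist() took roughly 1.9 times as long as binCounts() at the median in this run.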
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014082 54.2 1602994 85.7 1602994 85.7
Vcells 1897117 14.5 8388608 64.0 8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.1791 | 1.43800 | 1.981658 | 1.6019 | 1.67240 | 6.9819
hist | 3.6734 | 4.15625 | 4.655433 | 4.3932 | 4.60495 | 10.7287

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 3.115427 | 2.890299 | 2.349262 | 2.742493 | 2.753498 | 1.536645
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015597 54.3 1602994 85.7 1602994 85.7
Vcells 1955040 15.0 8388608 64.0 8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 9.9433 | 11.51275 | 12.05104 | 11.91325 | 12.07075 | 17.6203
hist | 11.2240 | 13.21570 | 13.97113 | 13.55680 | 13.93655 | 20.2583

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.0000 | 1.000000 | 1.00000 | 1.00000 | 1.000000 | 1.000000
hist | 1.1288 | 1.147919 | 1.15933 | 1.13796 | 1.154572 | 1.149714
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015609 54.3 1602994 85.7 1602994 85.7
Vcells 1955560 15.0 8388608 64.0 8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 0.7632 | 2.69325 | 3.262256 | 2.79545 | 3.11185 | 9.5363
hist | 4.4010 | 5.07225 | 5.598413 | 5.37100 | 5.68055 | 11.0009

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 5.766509 | 1.883319 | 1.716117 | 1.921337 | 1.825457 | 1.153582
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
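To compare the four scenarios side by side, the median timings from the tables above can be collected into one data frame. This is a small sketch for reading the results, not part of the original benchmark code. The column name `hist_vs_binCounts` is made up for illustration.

```r
# Median timings in milliseconds, copied from the four tables in this report.
medians <- data.frame(
  data      = c("integer+unsorted", "integer+sorted",
                "double+unsorted", "double+sorted"),
  binCounts = c(6.67395, 1.6019, 11.91325, 2.79545),
  hist      = c(12.76730, 4.3932, 13.55680, 5.37100)
)

# Ratio of the hist() median to the binCounts() median, per scenario.
medians$hist_vs_binCounts <- round(medians$hist / medians$binCounts, 2)
print(medians)
```

In this run, the median time of hist() was between about 1.1 and 2.7 times that of binCounts(), with the largest gap on sorted integer data.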
R Under development (unstable) (2025-01-06 r87534 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
LAPACK version 3.12.0
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.5.0 ggplot2_3.5.1
[4] knitr_1.49 R.devices_2.17.2 R.utils_2.12.3
[7] R.oo_1.27.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.3 rlang_1.1.4 xfun_0.50
[5] labeling_0.4.3 glue_1.8.0 colorspace_2.1-1 markdown_1.13
[9] scales_1.3.0 R.cache_0.16.0 grid_4.5.0 munsell_0.5.1
[13] evaluate_1.0.1 tibble_3.2.1 R.rsp_0.46.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.5.0 pkgconfig_2.0.3 farver_2.1.2
[21] digest_0.6.37 R6_2.5.1 pillar_1.10.1 magrittr_2.0.3
[25] withr_3.0.2 tools_4.5.0 gtable_0.3.6
Total processing time was 8.72 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2025-01-07 19:51:39 (+0100 UTC). Powered by RSP.