matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against alternative methods.
The alternative method here is graphics::hist(), wrapped to return bin counts as below:
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
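With right = FALSE, the hist() wrapper counts the values of x that fall in each left-closed bin [bx[j], bx[j+1]). Because of include.lowest = TRUE, the last bin is also closed on the right. binCounts() is called with its default right = FALSE, which is assumed here to follow the same left-closed convention. The sketch below expresses that counting in base R with findInterval() and tabulate(). The helper count_bins_sketch() is hypothetical: it is an illustrative stand-in and not how binCounts() is implemented. On a small example, it gives the same counts as the wrapper:

```r
# Hypothetical helper (illustration only, not matrixStats code): count values in
# left-closed bins [bx[j], bx[j + 1]), assuming all x lie in [min(bx), max(bx)).
count_bins_sketch <- function(x, bx) {
  idx <- findInterval(x, bx)            # idx == j when bx[j] <= x < bx[j + 1]
  tabulate(idx, nbins = length(bx) - 1L)
}

# Same wrapper as in the report
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

x  <- c(0.5, 1.5, 1.5, 2, 3.7)
bx <- c(0, 1, 2, 3, 4)
counts_sketch <- count_bins_sketch(x, bx)
counts_hist   <- binCounts_hist(x, bx)
print(counts_sketch)  # [1] 1 2 1 1  (the value 2 falls in [2, 3), not [1, 2))
```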
Integer data, generated with mode = "integer":
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
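The breaks are nb + 1 equally spaced points from 0 to xmax, plus one extra break on each side. That gives nb + 3 breaks and nb + 2 bins. The two outer bins, [-1, 0) and [xmax, xmax + 1), stay empty, because runif() never returns its endpoints and integer truncation keeps every value at or below xmax - 1. Presumably the padding is there to keep every value of x away from the outermost breaks, where edge handling could differ between methods. The report repeats this setup for integer and double data. The loop over mode below is a sketch of that repetition, and the findInterval() counting is an illustrative stand-in, not binCounts():

```r
for (mode in c("integer", "double")) {
  set.seed(48879)
  nx <- 1e+05
  xmax <- 0.01 * nx
  x <- runif(nx, min = 0, max = xmax)
  storage.mode(x) <- mode          # "integer" or "double", as in the report

  nb <- 10000
  bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
  bx <- c(-1, bx, xmax + 1)        # one padding break on each side

  # Illustrative counting (not binCounts()): bin j is [bx[j], bx[j + 1])
  counts <- tabulate(findInterval(x, bx), nbins = length(bx) - 1L)

  stopifnot(
    length(bx) == nb + 3L,         # nb + 1 inner breaks plus 2 padding breaks
    length(counts) == nb + 2L,
    counts[1] == 0L,               # [-1, 0) is empty since x >= 0
    counts[nb + 2L] == 0L,         # [xmax, xmax + 1) is empty since x < xmax
    sum(counts) == nx              # every value lands in some bin
  )
  cat(mode, ": ", length(bx), " breaks, ", length(counts), " bins\n", sep = "")
}
```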
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833928 44.6    1622094 86.7  1206194 64.5
Vcells 1549561 11.9    8388608 64.0  2908138 22.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 6.830401 | 8.638901 | 10.16872 | 9.267901 | 10.4501 | 23.7555
hist | 12.593701 | 15.121951 | 16.50035 | 15.608601 | 16.7102 | 33.0765

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 1.843772 | 1.750449 | 1.622658 | 1.684157 | 1.599047 | 1.392372
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
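The relative panel is the millisecond panel divided, column by column, by the binCounts() row. As a result, binCounts() is 1 throughout, and a value above 1 means hist() was slower for that statistic. Recomputing the ratios from the rounded millisecond values for integer+unsorted data reproduces the reported ratios to within about 1e-5. The numbers below are copied from the tables; only the division is new:

```r
# Millisecond statistics copied from the top panel (integer+unsorted data)
ms <- rbind(
  binCounts = c(min = 6.830401, lq = 8.638901, mean = 10.16872,
                median = 9.267901, uq = 10.4501, max = 23.7555),
  hist      = c(min = 12.593701, lq = 15.121951, mean = 16.50035,
                median = 15.608601, uq = 16.7102, max = 33.0765)
)

# Relative times: divide each column by the binCounts() value in that column
rel <- sweep(ms, 2, ms["binCounts", ], FUN = "/")
print(round(rel, 6))

# Bottom-panel values reported for hist(), in the same column order
reported <- c(1.843772, 1.750449, 1.622658, 1.684157, 1.599047, 1.392372)
stopifnot(all(abs(rel["hist", ] - reported) < 1e-5))
```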
> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014991 54.3    1622094 86.7  1622094 86.7
Vcells 1895623 14.5    8388608 64.0  8386337 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.256401 | 2.066951 | 2.879923 | 2.548052 | 2.817652 | 21.3416
hist | 3.816601 | 5.192551 | 7.086675 | 6.426251 | 7.236751 | 28.0253

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 3.037725 | 2.512179 | 2.460717 | 2.522026 | 2.568363 | 1.313177
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
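Comparing the median times of the unsorted and sorted integer runs shows how much sorting x helped each method in these particular runs. The arithmetic below uses only the medians reported in the integer+unsorted and integer+sorted tables. No new measurements are involved:

```r
# Median times (ms) copied from the integer+unsorted and integer+sorted tables
median_ms <- rbind(
  unsorted = c(binCounts = 9.267901, hist = 15.608601),
  sorted   = c(binCounts = 2.548052, hist = 6.426251)
)

# Speedup from sorting for each method (> 1 means sorted data was faster)
speedup <- median_ms["unsorted", ] / median_ms["sorted", ]
print(round(speedup, 2))

# binCounts() gains more from sorting than hist(), so the hist/binCounts
# median ratio grows from the unsorted to the sorted case
ratio <- median_ms[, "hist"] / median_ms[, "binCounts"]
print(round(ratio, 3))
```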
Double data, generated with mode = "double":
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
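Both data types come from the same seed and the same runif() call. The integer data is therefore the double data after storage.mode() coercion, which truncates each value toward zero. The first values printed by str() show this: 722.11 becomes 722 and 3.42 becomes 3. A small check on those printed (display-rounded) values:

```r
# First values shown by str() for the double data (rounded for display)
x_double <- c(722.11, 285.54, 591.33, 3.42, 349.14)

x_int <- x_double
storage.mode(x_int) <- "integer"  # coercion drops the fractional part
print(x_int)
```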
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016506 54.3    1622094 86.7  1622094 86.7
Vcells 1953546 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 10.0117 | 12.03810 | 14.02472 | 13.50955 | 14.63490 | 32.6383
hist | 11.9117 | 15.38895 | 16.74322 | 16.38320 | 17.34775 | 30.2571

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0000000
hist | 1.189778 | 1.278354 | 1.193836 | 1.212713 | 1.185369 | 0.9270428
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016518 54.3    1622094 86.7  1622094 86.7
Vcells 1954066 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.744401 | 3.355101 | 4.668619 | 4.183101 | 4.877601 | 15.7201
hist | 4.885101 | 6.319501 | 7.422926 | 7.030701 | 8.067950 | 21.4517

expr | min | lq | mean | median | uq | max
---|---|---|---|---|---|---
binCounts | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000
hist | 2.800446 | 1.88355 | 1.589962 | 1.680739 | 1.654082 | 1.364603
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
Session information:
R Under development (unstable) (2024-09-06 r87103 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.4.1 ggplot2_3.5.1
[4] knitr_1.48 R.devices_2.17.2 R.utils_2.12.3
[7] R.oo_1.26.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.3 rlang_1.1.4 xfun_0.47
[5] labeling_0.4.3 glue_1.7.0 colorspace_2.1-1 markdown_1.13
[9] scales_1.3.0 fansi_1.0.6 R.cache_0.16.0 grid_4.5.0
[13] munsell_0.5.1 tibble_3.2.1 R.rsp_0.46.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.5.0 pkgconfig_2.0.3 farver_2.1.2
[21] digest_0.6.37 R6_2.5.1 utf8_1.2.4 pillar_1.9.0
[25] magrittr_2.0.3 withr_3.0.1 tools_4.5.0 gtable_0.3.5
Total processing time was 11.7 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2024-09-07 04:09:29 (+0200 UTC). Powered by RSP.