matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against an alternative method, hist(), defined as below.
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
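As a quick sanity check on the wrapper above, the following sketch compares its counts against a base-R reference built from findInterval() and tabulate(). The helper binCounts_ref() and the toy inputs x_demo and bx_demo are hypothetical names introduced here for illustration; they are not part of matrixStats, and the sketch assumes left-closed bins [bx[i], bx[i+1]), which is what right = FALSE requests.

```r
# Wrapper from the report: bin counts via graphics::hist()
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

# Hypothetical base-R reference (not matrixStats code):
# findInterval() returns i such that bx[i] <= x < bx[i + 1],
# and tabulate() counts how often each bin index occurs.
binCounts_ref <- function(x, bx) {
  tabulate(findInterval(x, bx), nbins = length(bx) - 1L)
}

# Toy data: the value 1 sits exactly on a break and belongs to the bin [1, 2)
x_demo <- c(0.5, 1, 1.5, 2.5)
bx_demo <- c(0, 1, 2, 3)

counts_hist <- binCounts_hist(x_demo, bx = bx_demo)
counts_ref <- binCounts_ref(x_demo, bx = bx_demo)

# Both approaches should agree on this input
stopifnot(all(counts_hist == counts_ref))
print(counts_hist)
```

If matrixStats is installed, binCounts(x_demo, bx = bx_demo) can be added to the comparison in the same way.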
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
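The report does not say why the breaks are padded with -1 and xmax + 1. One plausible reason is that hist() stops with an error when some value of x falls outside the range of the breaks, so padding guarantees that the breaks span all of x. A minimal sketch with made-up toy values (x_demo, bx_demo, and bx_pad are illustrative names, not from the report):

```r
x_demo <- c(-0.5, 0.5, 1.5)
bx_demo <- c(0, 1, 2)

# Without padding, -0.5 lies below the first break and hist() signals an error
res <- tryCatch(
  graphics::hist(x_demo, breaks = bx_demo, right = FALSE,
                 include.lowest = TRUE, plot = FALSE)$counts,
  error = function(e) "error"
)

# Padding both ends, in the same way as the report, makes the breaks span x
bx_pad <- c(-1, bx_demo, max(bx_demo) + 1)
counts_pad <- graphics::hist(x_demo, breaks = bx_pad, right = FALSE,
                             include.lowest = TRUE, plot = FALSE)$counts

# Padding adds one bin at each end
n_bins <- length(bx_pad) - 1L
print(res)
print(counts_pad)
```

By the same arithmetic, the report's nb + 1 = 10001 breaks become 10003 breaks after padding, that is, 10002 bins.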
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  827516 44.2    1294397 69.2  1294397 69.2
Vcells 1537519 11.8    8388608 64.0  3770071 28.8
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 4.7201 | 6.13755 | 6.836211 | 6.54710 | 6.71110 | 15.4372 |
2 | hist | 8.9065 | 11.31465 | 12.251769 | 12.53195 | 12.76785 | 18.4921 |

 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2 | hist | 1.88693 | 1.843513 | 1.792187 | 1.914122 | 1.902497 | 1.197892 |
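The relative-time panel can be reproduced from the absolute panel by dividing each column by its value in the binCounts row. The sketch below does this for the min and median columns only, with the numbers copied from the integer+unsorted table; the data frame names times and rel are illustrative and do not come from the report's code.

```r
# Min and median times (ms) copied from the integer+unsorted table
times <- data.frame(
  expr = c("binCounts", "hist"),
  min = c(4.7201, 8.9065),
  median = c(6.54710, 12.53195)
)

# Reference row: binCounts
ref <- times[times$expr == "binCounts", c("min", "median")]

# Divide each timing column by the reference value
rel <- times
rel$min <- times$min / ref$min
rel$median <- times$median / ref$median
print(rel)
```

Because the table values are rounded, the ratios agree with the report's relative panel only up to rounding (roughly 1.89 for min and 1.91 for median).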
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 1006376 53.8    2069815 110.6  1294397 69.2
Vcells 1879259 14.4    8388608  64.0  8381817 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 0.9862 | 1.48735 | 1.884892 | 1.61415 | 1.72045 | 7.1963 |
2 | hist | 3.7436 | 4.37895 | 4.850464 | 4.56450 | 4.77025 | 10.8815 |

 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2 | hist | 3.795985 | 2.944129 | 2.573338 | 2.827804 | 2.772676 | 1.512096 |
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
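The variable mode is not defined anywhere in the visible transcript. Judging from the two str() outputs, it presumably holds "integer" in the first run and "double" in this one; that is an assumption, not something the report states. The sketch below shows what the integer coercion does to the first five values printed by str() above (x_demo and x_int are illustrative names):

```r
# First five simulated values as printed by str() in the double run
x_demo <- c(722.11, 285.54, 591.33, 3.42, 349.14)

# Assumed value of `mode` for the integer run (not shown in the transcript)
mode <- "integer"

# Changing the storage mode to integer drops the fractional part
x_int <- x_demo
storage.mode(x_int) <- mode
str(x_int)
```

The coerced values match the leading entries of the integer run's str() output, which supports the assumption that both runs start from the same simulated data.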
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007882 53.9    2069815 110.6  2069815 110.6
Vcells 1937167 14.8    8388608  64.0  8385326  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 10.2775 | 11.49410 | 12.03821 | 11.80905 | 11.9789 | 17.2332 |
2 | hist | 12.8235 | 13.21065 | 13.85622 | 13.60840 | 13.7925 | 18.8832 |

 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 1.000000 | 1.000000 | 1.00000 | 1.00000 | 1.000000 | 1.000000 |
2 | hist | 1.247726 | 1.149342 | 1.15102 | 1.15237 | 1.151399 | 1.095745 |
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007909 53.9    2069815 110.6  2069815 110.6
Vcells 1937712 14.8    8388608  64.0  8387357  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 1.1107 | 2.45555 | 2.966433 | 2.66765 | 2.76275 | 7.7006 |
2 | hist | 4.3027 | 4.75330 | 5.283610 | 5.10245 | 5.29500 | 10.3809 |

 | expr | min | lq | mean | median | uq | max |
---|---|---|---|---|---|---|---|
1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2 | hist | 3.873863 | 1.935737 | 1.781132 | 1.912713 | 1.916569 | 1.348064 |
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
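To compare the four scenarios side by side, the median times from the tables above can be collected into one data frame. This is only a convenience sketch: the numbers are copied from the report, while the data frame medians and its column names are made up here.

```r
# Median times (ms) copied from the four benchmark tables
medians <- data.frame(
  data = c("integer+unsorted", "integer+sorted",
           "double+unsorted", "double+sorted"),
  binCounts = c(6.54710, 1.61415, 11.80905, 2.66765),
  hist = c(12.53195, 4.56450, 13.60840, 5.10245)
)

# How many times slower hist() is than binCounts(), by median
medians$hist_over_binCounts <- round(medians$hist / medians$binCounts, 2)
print(medians)
```

In these runs binCounts() has the lower median in all four scenarios, with the smallest margin on double+unsorted data (about 1.15 times) and the largest on integer+sorted data (about 2.83 times).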
R version 4.4.0 beta (2024-04-09 r86391 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.3.0 ggplot2_3.5.0
[4] knitr_1.46 R.devices_2.17.2 R.utils_2.12.3
[7] R.oo_1.26.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.2 rlang_1.1.3 xfun_0.43
[5] labeling_0.4.3 glue_1.7.0 colorspace_2.1-0 markdown_1.12
[9] scales_1.3.0 fansi_1.0.6 R.cache_0.16.0 grid_4.4.0
[13] munsell_0.5.1 tibble_3.2.1 R.rsp_0.46.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.4.0 pkgconfig_2.0.3 farver_2.1.1
[21] digest_0.6.35 R6_2.5.1 utf8_1.2.4 pillar_1.9.0
[25] magrittr_2.0.3 withr_3.0.0 tools_4.4.0 gtable_0.3.4
Total processing time was 8.57 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2024-04-10 21:56:19 (+0200 UTC). Powered by RSP.