matrixStats: Benchmark report
This report benchmarks the performance of binCounts() against an alternative method, hist(), wrapped as below:
> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+ hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
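As a small sanity check of the wrapper above, the following sketch compares the hist()-based counts with a base R cross-check built from findInterval() and tabulate(). The toy vectors `x` and `bx` and the names `counts_hist`, `counts_ref` and `counts_bc` are made up for illustration. The findInterval() approach is only a stand-in and is not claimed to be how binCounts() is implemented. The comparison with matrixStats::binCounts() runs only if the package is installed.

```r
# Toy data (hypothetical): five values and three left-closed bins.
# All values lie strictly below max(bx), where the approaches could
# treat the upper boundary differently.
x <- c(0.5, 1.2, 1.7, 2.0, 2.9)
bx <- c(0, 1, 2, 3)

# Same wrapper as in the report.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right,
                 include.lowest = TRUE, plot = FALSE)$counts
}
counts_hist <- binCounts_hist(x, bx = bx)

# Base R cross-check: findInterval() maps each value to the index of its
# left-closed bin, and tabulate() counts how often each index occurs.
counts_ref <- tabulate(findInterval(x, bx), nbins = length(bx) - 1L)
stopifnot(all(counts_hist == counts_ref))

# Optional comparison with binCounts(), only if matrixStats is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  counts_bc <- matrixStats::binCounts(x, bx = bx, right = FALSE)
  stopifnot(all(counts_bc == counts_hist))
}
```

If the two methods disagreed on the counts, the timing comparison would not be like for like, so a check of this kind is worth running before benchmarking.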
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' is "integer" in this section
> str(x)
int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
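The bin-edge construction above is easier to follow at a small scale. The sketch below repeats the same two steps with made-up values (`nb <- 4L`, `xmax <- 10`) and a hypothetical name `bx_padded`. Reading the extra outer edges as padding that keeps every value strictly inside the range of the breaks is an interpretation, not something the report states.

```r
# Hypothetical small-scale version of the report's bin-edge setup.
nb <- 4L      # number of inner bins (the report uses 10000)
xmax <- 10    # upper limit of the data (the report uses 0.01 * nx)

# nb + 1 equally spaced edges define nb inner bins on [0, xmax].
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)

# One extra edge on each side adds two outer bins.
bx_padded <- c(-1, bx, xmax + 1)

n_inner <- length(bx) - 1L          # number of inner bins
n_total <- length(bx_padded) - 1L   # inner bins plus the two outer bins
print(bx_padded)
```

With the report's values, the same steps give nb + 3 edges and therefore nb + 2 bins.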
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  819837 43.8    1348511 72.1  1348511 72.1
Vcells 1524459 11.7    8388608 64.0  3585137 27.4
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 5.5582 | 6.5005 | 7.036635 | 6.5931 | 6.79815 | 13.6604 |
| 2 | hist | 11.2025 | 12.3327 | 12.804920 | 12.4359 | 12.70115 | 19.0652 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 2.015491 | 1.897193 | 1.819751 | 1.886199 | 1.868324 | 1.395655 |
Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
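The relative panel of the integer+unsorted table is consistent with dividing each column of the absolute times by the binCounts row. The sketch below redoes that arithmetic from the numbers printed in the table. The names `abs_ms` and `rel` are made up for illustration, and whether the report computes its relative panel exactly this way is an assumption.

```r
# Absolute times in milliseconds, copied from the integer+unsorted table.
abs_ms <- matrix(
  c(5.5582, 6.5005, 7.036635, 6.5931, 6.79815, 13.6604,
    11.2025, 12.3327, 12.804920, 12.4359, 12.70115, 19.0652),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("binCounts", "hist"),
                  c("min", "lq", "mean", "median", "uq", "max"))
)

# Divide every column by its binCounts entry, so the reference row is all ones.
rel <- sweep(abs_ms, MARGIN = 2, STATS = abs_ms["binCounts", ], FUN = "/")
print(round(rel, 6))
```

Because each statistic is scaled separately, the ratio of maxima (about 1.4) can differ considerably from the ratio of medians (about 1.9).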
> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  983450 52.6    1977214 105.6  1348511 72.1
Vcells 1853100 14.2    8388608  64.0  8387036 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.1696 | 1.35795 | 1.894568 | 1.55545 | 1.60035 | 7.7298 |
| 2 | hist | 3.7095 | 4.13090 | 4.709910 | 4.40185 | 4.63760 | 10.4031 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 3.171597 | 3.042012 | 2.486007 | 2.829953 | 2.897866 | 1.345843 |
Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
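Sorting the input changes the timings but should leave the counts unchanged. The sketch below checks that for the hist()-based wrapper on a scaled-down copy of the integer data. The smaller sizes (`nx <- 1e4`, `nb <- 1000`), the explicit `"integer"` storage mode, and the names `counts_unsorted` and `counts_sorted` are assumptions made to keep the example fast and self-contained.

```r
# Same wrapper as in the report.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right,
                 include.lowest = TRUE, plot = FALSE)$counts
}

# Scaled-down version of the report's data setup (assumed sizes).
set.seed(48879)
nx <- 1e4
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- "integer"   # truncates to whole numbers, as in the integer case

nb <- 1000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)

# Count the same values in original order and in sorted order.
counts_unsorted <- binCounts_hist(x, bx = bx)
counts_sorted <- binCounts_hist(sort(x), bx = bx)

# Every value falls in exactly one bin, and ordering does not matter.
stopifnot(identical(counts_unsorted, counts_sorted))
stopifnot(sum(counts_unsorted) == nx)
```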
> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' is "double" in this section
> str(x)
num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  983576 52.6    1977214 105.6  1977214 105.6
Vcells 1903489 14.6    8388608  64.0  8387036  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 10.8493 | 11.40195 | 11.95255 | 11.73145 | 11.88470 | 17.3304 |
| 2 | hist | 12.3958 | 13.25920 | 13.92087 | 13.64710 | 13.79745 | 19.6925 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 1.142544 | 1.162889 | 1.164678 | 1.163292 | 1.160942 | 1.136298 |
Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  983642 52.6    1977214 105.6  1977214 105.6
Vcells 1904045 14.6    8388608  64.0  8388565  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.
| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 0.8767 | 2.91920 | 3.434081 | 3.11045 | 3.31895 | 9.3621 |
| 2 | hist | 4.2918 | 4.82315 | 5.577651 | 5.15055 | 5.51220 | 17.7258 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | binCounts | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | hist | 4.895403 | 1.652216 | 1.624205 | 1.655886 | 1.660826 | 1.893357 |
Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
R Under development (unstable) (2023-12-09 r85665 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)
Matrix products: default
locale:
[1] LC_COLLATE=C LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.2.0 ggplot2_3.4.4
[4] knitr_1.45 R.devices_2.17.1 R.utils_2.12.3
[7] R.oo_1.25.0 R.methodsS3_1.8.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.2 rlang_1.1.2 xfun_0.41
[5] labeling_0.4.3 glue_1.6.2 colorspace_2.1-0 markdown_1.12
[9] scales_1.3.0 fansi_1.0.6 R.cache_0.16.0 grid_4.4.0
[13] munsell_0.5.0 tibble_3.2.1 R.rsp_0.45.0 base64enc_0.1-3
[17] lifecycle_1.0.4 compiler_4.4.0 pkgconfig_2.0.3 farver_2.1.1
[21] digest_0.6.33 R6_2.5.1 utf8_1.2.4 pillar_1.9.0
[25] magrittr_2.0.3 withr_2.5.2 tools_4.4.0 gtable_0.3.4
Total processing time was 8.56 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('binCounts')
Copyright Henrik Bengtsson. Last updated on 2023-12-11 22:25:22 (+0100 UTC). Powered by RSP.