binCounts() benchmarks

matrixStats: Benchmark report



This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

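The following alternative, a thin wrapper around graphics::hist() that returns only the bin counts, is used: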

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
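For reference, the counting that both functions perform can be sketched in base R with findInterval() and tabulate(). This is a minimal sketch, assuming half-open bins [bx[i], bx[i+1]) as selected by right = FALSE; binCounts_findInterval() is a hypothetical helper introduced only for illustration and is not how matrixStats implements binCounts().

```r
# Hypothetical helper (not part of matrixStats): count x in the
# half-open bins [bx[i], bx[i+1]), i = 1, ..., length(bx) - 1.
binCounts_findInterval <- function(x, bx) {
  nb <- length(bx) - 1L
  # findInterval() returns i such that bx[i] <= x < bx[i+1];
  # values below bx[1] map to 0 and values >= bx[nb+1] map to nb+1.
  idx <- findInterval(x, bx)
  # tabulate() ignores indices outside 1..nb (and NAs), so
  # out-of-range values are simply not counted.
  tabulate(idx, nbins = nb)
}

# The hist()-based alternative from above, repeated so this block runs on its own.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

x <- c(0, 0.5, 1, 1.5, 2, 2.5)
bx <- c(0, 1, 2, 3)
binCounts_findInterval(x, bx)
binCounts_hist(x, bx)
```

Both calls return the counts 2 2 2. One difference worth knowing: with right = FALSE and include.lowest = TRUE, hist() closes the last bin on the right, so a value equal to max(bx) is counted there (binCounts_hist(3, bx) gives 0 0 1), whereas the half-open sketch drops it (binCounts_findInterval(3, bx) gives 0 0 0). In the benchmarks below the breaks are padded up to xmax + 1, above every simulated value, so this edge case does not arise.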

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "integer"
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
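To see what this setup produces, the same steps can be run at a toy scale. The values below (nb = 4, xmax = 10 and the three example numbers) are chosen only for illustration. They show that storage.mode(x) <- "integer" truncates the uniform draws toward zero, which is why str(x) prints 722 here but 722.11 for the double data further down, and that the two padding breaks add one extra bin on each side of the nb equal-width bins.

```r
# Toy-scale version of the setup above (illustrative values only).
x <- c(722.11, 285.54, 3.42)
storage.mode(x) <- "integer"   # coerces like as.integer(): truncates toward zero
x

nb <- 4
xmax <- 10
# nb + 1 equally spaced breaks: 0.0 2.5 5.0 7.5 10.0
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
# Pad with one break below and one above: -1.0 0.0 2.5 5.0 7.5 10.0 11.0
bx <- c(-1, bx, xmax + 1)
# Number of bins = number of breaks minus one = nb + 2
length(bx) - 1L
```

With the report's nb <- 10000, the same construction yields 10,002 bins.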

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833705 44.6    1621234 86.6  1206634 64.5
Vcells 1549062 11.9    8388608 64.0  3586643 27.4
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows times relative to binCounts().

       expr     min       lq      mean   median       uq     max
1 binCounts  6.1714  7.88065  9.091921  8.40505  8.99735 19.9068
2      hist 12.0584 13.88825 15.687789 14.66350 15.50745 57.9829

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.953917 1.762323 1.725465 1.744606 1.723558 2.912718
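The bottom panel is obtained from the top panel by dividing every column by the corresponding binCounts value, so an entry above 1 means hist() was slower by that factor. Using the integer+unsorted timings from the table above, the arithmetic can be checked in base R:

```r
# Timings (ms) copied from the integer+unsorted table above.
timings <- data.frame(
  expr   = c("binCounts", "hist"),
  min    = c(6.1714, 12.0584),
  lq     = c(7.88065, 13.88825),
  mean   = c(9.091921, 15.687789),
  median = c(8.40505, 14.66350),
  uq     = c(8.99735, 15.50745),
  max    = c(19.9068, 57.9829)
)

# One row per method, one column per summary statistic.
m <- as.matrix(timings[, -1])
rownames(m) <- timings$expr

# Divide every column by the binCounts row: binCounts becomes 1 and
# the hist row gives how many times slower hist() was.
rel <- sweep(m, 2L, m["binCounts", ], "/")
round(rel, 6)
```

The rounded hist row reproduces the relative panel above (1.953917, 1.762323, ..., 2.912718).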

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014744 54.2    1621234 86.6  1621234 86.6
Vcells 1895068 14.5    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")
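Bin counting is order-independent: sorting x only permutes the input, so every bin count stays the same, and the speed-ups seen on sorted data presumably come from how the input is traversed rather than from a different result. A quick check on data simulated as in this report, using a base-R findInterval()/tabulate() counter as a stand-in (count_bins() is a hypothetical helper for illustration, not matrixStats code):

```r
# Same simulation as the integer case above.
set.seed(48879)
nx <- 1e+05
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- "integer"
nb <- 10000
bx <- c(-1, seq(from = 0, to = xmax, length.out = nb + 1L), xmax + 1)

# Stand-in bin counter: half-open bins [bx[i], bx[i+1]).
count_bins <- function(x, bx) tabulate(findInterval(x, bx), nbins = length(bx) - 1L)

counts_unsorted <- count_bins(x, bx)
counts_sorted   <- count_bins(sort(x), bx)
identical(counts_unsorted, counts_sorted)  # TRUE: order does not affect counts
sum(counts_sorted) == nx                   # TRUE: all values fall inside the padded range
```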

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows times relative to binCounts().

       expr    min      lq     mean  median      uq     max
1 binCounts 1.5290 2.16265 3.241491 2.59395 2.86575 14.4017
2      hist 5.0762 6.07795 7.668135 6.72570 7.48010 41.7290

       expr      min       lq    mean   median       uq      max
1 binCounts 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000
2      hist 3.319948 2.810418 2.36562 2.592841 2.610172 2.897505

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "double"
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016259 54.3    1621234 86.6  1621234 86.6
Vcells 1952991 15.0    8388608 64.0  8385753 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows times relative to binCounts().

       expr     min       lq     mean   median       uq     max
1 binCounts 11.1848 12.11090 13.17789 12.57105 13.52485 22.9117
2      hist 13.1387 15.06135 17.22445 16.53210 17.53320 32.5319

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.174692 1.243619 1.307072 1.315093 1.296369 1.419882

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016271 54.3    1621234 86.6  1621234 86.6
Vcells 1953511 15.0    8388608 64.0  8388219 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows times relative to binCounts().

       expr    min      lq     mean  median      uq     max
1 binCounts 2.0868 3.98335 5.621486 4.44715 5.24885 31.1366
2      hist 5.0087 6.76295 8.472206 7.52250 8.99830 39.1896

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 2.400182 1.697805 1.507111 1.691533 1.714337 1.258635

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2024-09-02 r87090 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default


locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.4.0     ggplot2_3.5.1        
[4] knitr_1.48            R.devices_2.17.2      R.utils_2.12.3       
[7] R.oo_1.26.0           R.methodsS3_1.8.2    

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.3        rlang_1.1.4      xfun_0.47       
 [5] labeling_0.4.3   glue_1.7.0       colorspace_2.1-1 markdown_1.13   
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.5.0      
[13] munsell_0.5.1    tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.5.0   pkgconfig_2.0.3  farver_2.1.2    
[21] digest_0.6.37    R6_2.5.1         utf8_1.2.4       pillar_1.9.0    
[25] magrittr_2.0.3   withr_3.0.1      tools_4.5.0      gtable_0.3.5    

Total processing time was 11.33 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')
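matrixStats:::benchmark() is reached with ::: because it is not exported, so its interface may change between releases. For a quick standalone re-run of a single cell of this report (integer, unsorted data), something along the following lines should work, assuming the matrixStats and microbenchmark packages are installed; it repeats the setup shown above, and absolute timings will depend on the machine.

```r
# Standalone sketch of one benchmark cell (integer, unsorted data).
# Assumes the matrixStats and microbenchmark packages are installed.
set.seed(48879)
nx <- 1e+05
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- "integer"
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)

# The hist()-based alternative used in this report.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

# Only time the methods when both packages are available.
if (requireNamespace("matrixStats", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  stats <- microbenchmark::microbenchmark(
    binCounts = matrixStats::binCounts(x, bx = bx),
    hist      = binCounts_hist(x, bx = bx),
    unit = "ms"
  )
  print(summary(stats))
}
```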

Copyright Henrik Bengtsson. Last updated on 2024-09-03 18:07:00 (UTC+02:00). Powered by RSP.