binCounts() benchmarks

matrixStats: Benchmark report


This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

The alternative method is graphics::hist(), wrapped so that it returns only the bin counts:

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
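To make explicit what is being counted, here is a hypothetical base-R reference (not how matrixStats implements binCounts()). It assigns each value to a left-closed bin [bx[j], bx[j+1]) with findInterval() and counts the bin indices with tabulate(). It also closes the last bin on the right (rightmost.closed = TRUE), mirroring hist(right = FALSE, include.lowest = TRUE) as used in binCounts_hist(). The name binCounts_findInterval() is illustrative only.

```r
# Hypothetical base-R reference, NOT the matrixStats implementation of binCounts().
# Counts x in the left-closed bins [bx[j], bx[j+1]), j = 1, ..., length(bx) - 1;
# the last bin is also closed on the right, as in hist(right = FALSE, include.lowest = TRUE).
binCounts_findInterval <- function(x, bx) {
  x <- x[is.finite(x)]                          # drop NA/NaN/Inf before binning
  j <- findInterval(x, bx, rightmost.closed = TRUE)
  j <- j[j >= 1L & j < length(bx)]              # drop values below bx[1] or above the last break
  tabulate(j, nbins = length(bx) - 1L)          # integer vector, one count per bin
}

# Same wrapper as in this report, written with an explicit namespace.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
}

x <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
bx <- c(0, 1, 2, 3)
binCounts_findInterval(x, bx)   # 2 2 3: [0,1) holds 0, 0.5; [1,2) holds 1, 1.5; [2,3] holds 2, 2.5, 3
binCounts_hist(x, bx)           # 2 2 3
```

Note that hist() shifts its breaks by a tiny "fuzz" to absorb floating-point error, so the two can disagree for values lying within that tolerance of a break.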

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "integer"
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
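The break points can be checked on their own. This base-R sketch recomputes them from the setup lines above: nb + 1 equally spaced points on [0, xmax] give nb inner bins of width xmax / nb, and the two padding points add one bin below 0 and one above xmax, for nb + 2 bins in total. The variable names n_bins and inner_width are introduced here for illustration.

```r
# Break points as constructed in this report (base R only).
nx <- 1e+05
xmax <- 0.01 * nx                                   # 1000
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)                           # pad with one extra break on each side

n_bins <- length(bx) - 1L                           # nb inner bins + 2 padding bins = 10002
inner_width <- diff(bx)[2]                          # approximately xmax / nb = 0.1
stopifnot(!is.unsorted(bx, strictly = TRUE))        # breaks must be strictly increasing
c(n_bins = n_bins, inner_width = inner_width)
```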

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  832976 44.5    1602994 85.7  1238948 66.2
Vcells 1550951 11.9    8388608 64.0  3747923 28.6
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq       max
1 binCounts     6.0648    6.5985   7.15556   6.67395   6.88525   14.4020
2 hist         10.8401   12.5433  13.15207  12.76730  13.05785   20.3017

  expr             min        lq      mean    median        uq       max
1 binCounts    1.00000  1.000000  1.000000  1.000000  1.000000  1.000000
2 hist         1.78738  1.900932  1.838021  1.913005  1.896496  1.409644
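The relative panel appears to be the millisecond panel divided, column by column, by the binCounts() row, so binCounts() is 1 everywhere and values above 1 mean hist() was slower. The base-R sketch below reproduces the relative panel from the times printed above. Any differences in the last digit would come from rounding of the printed times; the names times, stat_cols and relative are illustrative.

```r
# Millisecond statistics copied from the integer + unsorted table above.
times <- data.frame(
  expr   = c("binCounts", "hist"),
  min    = c(6.0648, 10.8401),
  lq     = c(6.5985, 12.5433),
  mean   = c(7.15556, 13.15207),
  median = c(6.67395, 12.76730),
  uq     = c(6.88525, 13.05785),
  max    = c(14.4020, 20.3017)
)

stat_cols <- setdiff(names(times), "expr")

# Divide each statistic by the binCounts() value in the same column (row 1).
relative <- times
relative[stat_cols] <- lapply(times[stat_cols], function(col) col / col[1])
print(relative, digits = 7)
```

With this construction the relative mean is the ratio of the two mean times, not the mean of per-run ratios.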

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014082 54.2    1602994 85.7  1602994 85.7
Vcells 1897117 14.5    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq       max
1 binCounts     1.1791   1.43800  1.981658    1.6019   1.67240    6.9819
2 hist          3.6734   4.15625  4.655433    4.3932   4.60495   10.7287

  expr             min        lq      mean    median        uq       max
1 binCounts   1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
2 hist        3.115427  2.890299  2.349262  2.742493  2.753498  1.536645

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "double"
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
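Both data types are generated from the same seed and range, so the integer data set should be exactly this double vector truncated toward zero; the two str() outputs agree (722.11 becomes 722, 3.42 becomes 3). A base-R check, assuming R's default random-number generator settings; xi is an illustrative name:

```r
# Same seed, size and range as in both "Non-sorted simulated data" sections.
set.seed(48879)
x <- runif(1e+05, min = 0, max = 1000)

xi <- x
storage.mode(xi) <- "integer"   # coercion to integer truncates toward zero

head(xi, 5)                     # per the integer str() output above: 722 285 591 3 349
all(xi == trunc(x))             # TRUE: every integer value is the truncated double value
```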

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015597 54.3    1602994 85.7  1602994 85.7
Vcells 1955040 15.0    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq       max
1 binCounts     9.9433  11.51275  12.05104  11.91325  12.07075   17.6203
2 hist         11.2240  13.21570  13.97113  13.55680  13.93655   20.2583

  expr             min        lq      mean    median        uq       max
1 binCounts     1.0000  1.000000   1.00000   1.00000  1.000000  1.000000
2 hist          1.1288  1.147919   1.15933   1.13796  1.154572  1.149714

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1015609 54.3    1602994 85.7  1602994 85.7
Vcells 1955560 15.0    8388608 64.0  8387727 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq       max
1 binCounts     0.7632   2.69325  3.262256   2.79545   3.11185    9.5363
2 hist          4.4010   5.07225  5.598413   5.37100   5.68055   11.0009

  expr             min        lq      mean    median        uq       max
1 binCounts   1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
2 hist        5.766509  1.883319  1.716117  1.921337  1.825457  1.153582

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2025-01-06 r87534 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default
  LAPACK version 3.12.0

locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.5.0    ggplot2_3.5.1       
[4] knitr_1.49           R.devices_2.17.2     R.utils_2.12.3      
[7] R.oo_1.27.0          R.methodsS3_1.8.2   

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.3        rlang_1.1.4      xfun_0.50       
 [5] labeling_0.4.3   glue_1.8.0       colorspace_2.1-1 markdown_1.13   
 [9] scales_1.3.0     R.cache_0.16.0   grid_4.5.0       munsell_0.5.1   
[13] evaluate_1.0.1   tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.5.0   pkgconfig_2.0.3  farver_2.1.2    
[21] digest_0.6.37    R6_2.5.1         pillar_1.10.1    magrittr_2.0.3  
[25] withr_3.0.2      tools_4.5.0      gtable_0.3.6    

Total processing time was 8.72 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')

Copyright Henrik Bengtsson. Last updated on 2025-01-07 19:51:39 (+0100 UTC). Powered by RSP.