binCounts() benchmarks

matrixStats: Benchmark report



This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

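The alternative method is a thin wrapper around hist() from the graphics package that returns only the bin counts: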

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
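To make explicit what both methods count, here is a minimal pure-R sketch. It assumes that binCounts(x, bx = bx) with its default right = FALSE counts values in the half-open bins [bx[j], bx[j+1]). The helper binCounts_ref() is hypothetical and not part of matrixStats; it is built on base R's findInterval() and tabulate() and is meant only as a correctness reference, not as a faster alternative.

```r
## Hypothetical reference implementation (not part of matrixStats).
## Counts x in the half-open bins [bx[j], bx[j + 1]), j = 1, ..., length(bx) - 1.
binCounts_ref <- function(x, bx) {
  stopifnot(!is.unsorted(bx, strictly = TRUE))  # edges must be strictly increasing
  nbins <- length(bx) - 1L
  ## findInterval() returns j with bx[j] <= x < bx[j + 1], 0 for x < bx[1],
  ## and length(bx) for x >= bx[length(bx)].
  j <- findInterval(x, bx)
  ## tabulate() ignores indices outside 1..nbins, so values below the first
  ## edge or at/above the last edge are not counted.
  tabulate(j, nbins = nbins)
}

## Copy of the report's wrapper around graphics::hist().
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

x <- c(0, 0.5, 1, 1.5, 2, 3)
bx <- c(0, 1, 2, 3)
binCounts_ref(x, bx)              # 2 2 1 (x = 3 sits on the last edge)
binCounts_hist(x, bx)             # 2 2 2 (include.lowest puts x = 3 in the last bin)
binCounts_ref(x, c(-1, bx, 4))    # 0 2 2 1 1
binCounts_hist(x, c(-1, bx, 4))   # 0 2 2 1 1
```

The two approaches differ only for a value lying exactly on the last edge, because hist() with right = FALSE and include.lowest = TRUE closes the final bin on the right. Adding an outer edge beyond the data, similar in spirit to the report's xmax + 1 padding, makes the counts agree.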

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "integer"
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
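The integer and double sections build the same inputs and differ only in the storage mode of x. As a sketch, the steps above can be collected into a hypothetical helper, simulate_bincounts_data() (not part of matrixStats or of the report), whose default arguments are taken from the transcript, so either variant can be regenerated outside the report.

```r
## Hypothetical helper (not part of matrixStats): regenerates the simulated
## data and padded bin edges used in this report for a given storage mode.
simulate_bincounts_data <- function(mode = c("integer", "double"),
                                    nx = 1e5, nb = 10000, seed = 48879) {
  mode <- match.arg(mode)
  set.seed(seed)
  xmax <- 0.01 * nx
  x <- runif(nx, min = 0, max = xmax)
  storage.mode(x) <- mode      # "integer" truncates the draws towards zero
  bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
  bx <- c(-1, bx, xmax + 1)    # one extra bin on each side, empty for these data
  list(x = x, bx = bx)
}

d <- simulate_bincounts_data("integer")
str(d$x)
length(d$bx) - 1L    # number of bins: nb + 2 = 10002
```

Because coercion to "integer" truncates the uniform draws, the integer data take only the values 0, ..., 999, while the 10000 inner bins have width 0.1; in the integer case most inner bins are therefore empty.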

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  833928 44.6    1622094 86.7  1206194 64.5
Vcells 1549561 11.9    8388608 64.0  2908138 22.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr       min        lq     mean    median      uq     max
1 binCounts  6.830401  8.638901 10.16872  9.267901 10.4501 23.7555
2      hist 12.593701 15.121951 16.50035 15.608601 16.7102 33.0765

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.843772 1.750449 1.622658 1.684157 1.599047 1.392372
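The bottom panel of each table appears to be the top panel divided, column by column, by the binCounts row. As a small worked check, the base-R sketch below recomputes the relative panel of the integer+unsorted table; the data frame ms is only a transcription of the millisecond values above, and the last digit may differ slightly because the displayed times are rounded.

```r
## Millisecond summaries transcribed from the integer+unsorted table.
ms <- data.frame(
  expr   = c("binCounts", "hist"),
  min    = c(6.830401, 12.593701),
  lq     = c(8.638901, 15.121951),
  mean   = c(10.16872, 16.50035),
  median = c(9.267901, 15.608601),
  uq     = c(10.4501, 16.7102),
  max    = c(23.7555, 33.0765)
)

## Relative times: divide every statistic by the binCounts (first) row.
rel <- ms
rel[-1] <- lapply(ms[-1], function(col) col / col[1])
print(rel, digits = 7)
```

For example, the median ratio 15.608601 / 9.267901 is about 1.684, i.e. hist() took roughly 68% longer than binCounts() at the median on this data.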

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1014991 54.3    1622094 86.7  1622094 86.7
Vcells 1895623 14.5    8388608 64.0  8386337 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr      min       lq     mean   median       uq     max
1 binCounts 1.256401 2.066951 2.879923 2.548052 2.817652 21.3416
2      hist 3.816601 5.192551 7.086675 6.426251 7.236751 28.0253

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 3.037725 2.512179 2.460717 2.522026 2.568363 1.313177

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "double"
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016506 54.3    1622094 86.7  1622094 86.7
Vcells 1953546 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min       lq     mean   median       uq     max
1 binCounts 10.0117 12.03810 14.02472 13.50955 14.63490 32.6383
2      hist 11.9117 15.38895 16.74322 16.38320 17.34775 30.2571

       expr      min       lq     mean   median       uq       max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000
2      hist 1.189778 1.278354 1.193836 1.212713 1.185369 0.9270428

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1016518 54.3    1622094 86.7  1622094 86.7
Vcells 1954066 15.0    8388608 64.0  8388300 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr      min       lq     mean   median       uq     max
1 binCounts 1.744401 3.355101 4.668619 4.183101 4.877601 15.7201
2      hist 4.885101 6.319501 7.422926 7.030701 8.067950 21.4517

       expr      min      lq     mean   median       uq      max
1 binCounts 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000
2      hist 2.800446 1.88355 1.589962 1.680739 1.654082 1.364603

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2024-09-06 r87103 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default


locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.5.0 matrixStats_1.4.1    ggplot2_3.5.1       
[4] knitr_1.48           R.devices_2.17.2     R.utils_2.12.3      
[7] R.oo_1.26.0          R.methodsS3_1.8.2   

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.3        rlang_1.1.4      xfun_0.47       
 [5] labeling_0.4.3   glue_1.7.0       colorspace_2.1-1 markdown_1.13   
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.5.0      
[13] munsell_0.5.1    tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.5.0   pkgconfig_2.0.3  farver_2.1.2    
[21] digest_0.6.37    R6_2.5.1         utf8_1.2.4       pillar_1.9.0    
[25] magrittr_2.0.3   withr_3.0.1      tools_4.5.0      gtable_0.3.5    

Total processing time was 11.7 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')

Copyright Henrik Bengtsson. Last updated on 2024-09-07 04:09:29 (+0200 UTC). Powered by RSP.