binCounts() benchmarks

matrixStats: Benchmark report


This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

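The only alternative considered is a thin wrapper around graphics::hist() that returns the bin counts, defined as follows: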

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
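The timings are only meaningful if both functions return the same counts, so a quick agreement check on toy data can be useful before benchmarking. The following is a minimal sketch, not part of the original report; the toy vectors `x` and `bx` and the names `counts_bc` and `counts_hist` are made up for illustration. The toy data deliberately avoid the final break, because `include.lowest = TRUE` makes hist() include a value equal to the last break when `right = FALSE`; agreement at that boundary is not checked here.

```r
library(matrixStats)

# Wrapper from the report, repeated so that the example is self-contained
hist <- graphics::hist
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
}

# Hypothetical toy data: five values falling into four unit-width bins
x <- c(0.5, 1.2, 1.7, 2.1, 3.9)
bx <- c(0, 1, 2, 3, 4)

counts_bc <- binCounts(x, bx = bx)         # bins are [a, b) since right = FALSE
counts_hist <- binCounts_hist(x, bx = bx)

print(counts_bc)
print(counts_hist)

# Both methods should assign every value to the same bin
stopifnot(all(counts_bc == counts_hist))
```

With these inputs each method should report one value in the first bin, two in the second, and one in each of the last two.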

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' is "integer" in this section
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  819837 43.8    1348511 72.1  1348511 72.1
Vcells 1524459 11.7    8388608 64.0  3585137 27.4
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min      lq      mean  median       uq     max
1 binCounts  5.5582  6.5005  7.036635  6.5931  6.79815 13.6604
2      hist 11.2025 12.3327 12.804920 12.4359 12.70115 19.0652

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 2.015491 1.897193 1.819751 1.886199 1.868324 1.395655

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
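The relative times in the bottom panel of the table above appear to be each statistic divided by the corresponding value for binCounts(). The short sketch below checks this for the median column, using the two median values copied from the top panel; the variable names `median_ms` and `rel` are illustrative only.

```r
# Median times in milliseconds, copied from the top panel of the
# integer+unsorted table
median_ms <- c(binCounts = 6.5931, hist = 12.4359)

# Relative time: each method's median divided by the binCounts() median
rel <- median_ms / median_ms[["binCounts"]]

print(round(rel, 6))
```

The ratio for hist comes out at roughly 1.886, matching the median entry in the bottom panel, so hist() took a little under twice as long as binCounts() on this data.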

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  983450 52.6    1977214 105.6  1348511 72.1
Vcells 1853100 14.2    8388608  64.0  8387036 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 1.1696 1.35795 1.894568 1.55545 1.60035  7.7298
2      hist 3.7095 4.13090 4.709910 4.40185 4.63760 10.4031

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 3.171597 3.042012 2.486007 2.829953 2.897866 1.345843

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode  # 'mode' is "double" in this section
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)

Results

> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  983576 52.6    1977214 105.6  1977214 105.6
Vcells 1903489 14.6    8388608  64.0  8387036  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min       lq     mean   median       uq     max
1 binCounts 10.8493 11.40195 11.95255 11.73145 11.88470 17.3304
2      hist 12.3958 13.25920 13.92087 13.64710 13.79745 19.6925

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.142544 1.162889 1.164678 1.163292 1.160942 1.136298

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  983642 52.6    1977214 105.6  1977214 105.6
Vcells 1904045 14.6    8388608  64.0  8388565  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 0.8767 2.91920 3.434081 3.11045 3.31895  9.3621
2      hist 4.2918 4.82315 5.577651 5.15055 5.51220 17.7258

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 4.895403 1.652216 1.624205 1.655886 1.660826 1.893357

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2023-12-09 r85665 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default


locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.2.0     ggplot2_3.4.4        
[4] knitr_1.45            R.devices_2.17.1      R.utils_2.12.3       
[7] R.oo_1.25.0           R.methodsS3_1.8.2    

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.2        rlang_1.1.2      xfun_0.41       
 [5] labeling_0.4.3   glue_1.6.2       colorspace_2.1-0 markdown_1.12   
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.4.0      
[13] munsell_0.5.0    tibble_3.2.1     R.rsp_0.45.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.4.0   pkgconfig_2.0.3  farver_2.1.1    
[21] digest_0.6.33    R6_2.5.1         utf8_1.2.4       pillar_1.9.0    
[25] magrittr_2.0.3   withr_2.5.2      tools_4.4.0      gtable_0.3.4    

Total processing time was 8.56 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')
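The command above relies on an internal (unexported) function, as the `:::` indicates. As a rough alternative, the integer, unsorted case can be approximated with a standalone script assembled from the steps shown in this report. This is only a sketch under stated assumptions: `mode` is set explicitly because the report does not show where it is defined, and `times = 20L` is an arbitrary choice to keep the run short (microbenchmark defaults to 100 evaluations per expression). Timings will differ from the tables above depending on hardware and package versions.

```r
library(matrixStats)
library(microbenchmark)

# Alternative method from the report: bin counts via graphics::hist()
hist <- graphics::hist
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
}

# Assumption: the report sets 'mode' once per data type ("integer" or "double")
mode <- "integer"

# Simulated data, as in the report
set.seed(48879)
nx <- 1e+05
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- mode

# Bin boundaries: nb equally wide bins on [0, xmax], padded on both sides
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)

# Benchmark both methods; 'times' is reduced here (assumption, see above)
stats <- microbenchmark(
  binCounts = binCounts(x, bx = bx),
  hist      = binCounts_hist(x, bx = bx),
  times = 20L,
  unit = "ms"
)
print(stats)
```

Setting `mode <- "double"` or adding `x <- sort(x)` before the benchmark call should give the other three scenarios covered in the report.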

Copyright Henrik Bengtsson. Last updated on 2023-12-11 22:25:22 (+0100 UTC). Powered by RSP.