binCounts() benchmarks

matrixStats: Benchmark report


This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

The only alternative considered is a thin wrapper around graphics::hist(), defined as follows:

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
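
As a quick sanity check on the wrapper above, the following sketch compares it with a reference counter built from base R's findInterval() and tabulate(). The helper name binCounts_ref and the toy data are assumptions made for illustration; this is not how matrixStats implements binCounts(). With right = FALSE the bins are left-closed, and include.lowest = TRUE makes hist() treat the last bin as closed on both sides, which the reference mirrors with rightmost.closed = TRUE.

```r
# Hypothetical reference counter using base R only (not the matrixStats code).
binCounts_ref <- function(x, bx) {
  # Index i such that bx[i] <= x < bx[i + 1]; the last bin is closed on the right.
  idx <- findInterval(x, bx, rightmost.closed = TRUE)
  tabulate(idx, nbins = length(bx) - 1L)
}

# The hist()-based wrapper from the report.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right, include.lowest = TRUE,
                 plot = FALSE)$counts
}

# Toy data with values sitting exactly on the breaks.
x <- c(0.5, 1, 1.5, 2, 2.5, 3)
bx <- c(0, 1, 2, 3)

counts_ref <- binCounts_ref(x, bx)
counts_hist <- binCounts_hist(x, bx)
print(rbind(ref = counts_ref, hist = counts_hist))

# Both approaches should agree on every bin.
stopifnot(all(counts_ref == counts_hist))
```

Values equal to an inner break land in the bin to their right, and the value equal to the last break is kept in the last bin.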

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> mode <- "integer"
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
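
The setup above can be replayed at a smaller scale to see what the break vector looks like. The sketch below uses the same construction with smaller nx and nb (these reduced sizes are assumptions chosen for speed, not values from the report) and base R's hist() only. One plausible reading of the extra breaks at -1 and xmax + 1 is that they keep every value strictly inside the outer breaks, so no observation is dropped; the check on sum(counts) illustrates that.

```r
set.seed(48879)
nx <- 1000L                      # the report uses 1e5
xmax <- 0.01 * nx                # 10 here, 1000 in the report
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- "integer"     # truncates, leaving whole numbers below xmax

nb <- 100L                       # the report uses 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)        # pad with one extra bin on each side

counts <- graphics::hist(x, breaks = bx, right = FALSE,
                         include.lowest = TRUE, plot = FALSE)$counts

# nb inner bins plus the two padding bins; every value is counted once.
stopifnot(length(counts) == nb + 2L, sum(counts) == nx)

# Integer data can occupy at most xmax distinct bins, so most bins are empty.
n_nonempty <- sum(counts > 0)
print(c(bins = length(counts), nonempty = n_nonempty))
```

In the integer case most bins are empty by construction, since the data take at most xmax distinct values while there are nb + 2 bins.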

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  827516 44.2    1294397 69.2  1294397 69.2
Vcells 1537519 11.8    8388608 64.0  3770071 28.8
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min       lq      mean   median       uq     max
1 binCounts 4.7201  6.13755  6.836211  6.54710  6.71110 15.4372
2      hist 8.9065 11.31465 12.251769 12.53195 12.76785 18.4921

       expr     min       lq     mean   median       uq      max
1 binCounts 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.88693 1.843513 1.792187 1.914122 1.902497 1.197892
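
The numbers in the bottom panel are consistent with dividing each column of the top panel by the binCounts row. The sketch below redoes that arithmetic for two columns, with the timings copied from the integer+unsorted table; the object names are illustrative, and this is not necessarily the code the report generator uses.

```r
# Min and median timings (ms) copied from the integer+unsorted table.
times_ms <- rbind(
  binCounts = c(min = 4.7201, median = 6.54710),
  hist      = c(min = 8.9065, median = 12.53195)
)

# Divide every column by the corresponding binCounts entry.
rel <- sweep(times_ms, 2, times_ms["binCounts", ], "/")
print(round(rel, 6))
```

A relative time of about 1.9 for hist means that, in this run, the hist()-based wrapper took roughly 1.9 times as long as binCounts() at the median.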

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 1006376 53.8    2069815 110.6  1294397 69.2
Vcells 1879259 14.4    8388608  64.0  8381817 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 0.9862 1.48735 1.884892 1.61415 1.72045  7.1963
2      hist 3.7436 4.37895 4.850464 4.56450 4.77025 10.8815

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 3.795985 2.944129 2.573338 2.827804 2.772676 1.512096

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> mode <- "double"
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)

Results

> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007882 53.9    2069815 110.6  2069815 110.6
Vcells 1937167 14.8    8388608  64.0  8385326  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min       lq     mean   median      uq     max
1 binCounts 10.2775 11.49410 12.03821 11.80905 11.9789 17.2332
2      hist 12.8235 13.21065 13.85622 13.60840 13.7925 18.8832

       expr      min       lq    mean  median       uq      max
1 binCounts 1.000000 1.000000 1.00000 1.00000 1.000000 1.000000
2      hist 1.247726 1.149342 1.15102 1.15237 1.151399 1.095745

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1007909 53.9    2069815 110.6  2069815 110.6
Vcells 1937712 14.8    8388608  64.0  8387357  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 1.1107 2.45555 2.966433 2.66765 2.76275  7.7006
2      hist 4.3027 4.75330 5.283610 5.10245 5.29500 10.3809

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 3.873863 1.935737 1.781132 1.912713 1.916569 1.348064

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
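
To compare the four cases side by side, the median timings from the tables above can be collected into one data frame. The sketch below does this with values copied from this report; the column names are illustrative assumptions. It computes the hist-to-binCounts ratio for each case and how much faster each method ran on sorted input than on unsorted input. These figures come from a single run on one machine, so they are indicative rather than general.

```r
# Median timings (ms) copied from the four tables in this report.
med <- data.frame(
  type      = c("integer", "integer", "double", "double"),
  sorted    = c(FALSE, TRUE, FALSE, TRUE),
  binCounts = c(6.54710, 1.61415, 11.80905, 2.66765),
  hist      = c(12.53195, 4.56450, 13.60840, 5.10245)
)

# How many times slower hist() is than binCounts() in each case.
med$hist_vs_binCounts <- med$hist / med$binCounts

# Speedup from sorting, per data type and method (rows are in matching order).
unsorted <- med[!med$sorted, ]
sorted <- med[med$sorted, ]
speedup <- data.frame(
  type      = unsorted$type,
  binCounts = unsorted$binCounts / sorted$binCounts,
  hist      = unsorted$hist / sorted$hist
)

print(med)
print(speedup)
```

In this run, binCounts() was a little over four times faster on sorted input for both data types, while the hist()-based wrapper gained less than a factor of three.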

Appendix

Session information

R version 4.4.0 beta (2024-04-09 r86391 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default


locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.3.0     ggplot2_3.5.0        
[4] knitr_1.46            R.devices_2.17.2      R.utils_2.12.3       
[7] R.oo_1.26.0           R.methodsS3_1.8.2    

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.2        rlang_1.1.3      xfun_0.43       
 [5] labeling_0.4.3   glue_1.7.0       colorspace_2.1-0 markdown_1.12   
 [9] scales_1.3.0     fansi_1.0.6      R.cache_0.16.0   grid_4.4.0      
[13] munsell_0.5.1    tibble_3.2.1     R.rsp_0.46.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.4.0   pkgconfig_2.0.3  farver_2.1.1    
[21] digest_0.6.35    R6_2.5.1         utf8_1.2.4       pillar_1.9.0    
[25] magrittr_2.0.3   withr_3.0.0      tools_4.4.0      gtable_0.3.4    

Total processing time was 8.57 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')

Copyright Henrik Bengtsson. Last updated on 2024-04-10 21:56:19 (+0200 UTC). Powered by RSP.