binCounts() benchmarks

matrixStats: Benchmark report


This report benchmarks the performance of binCounts() against alternative methods.

Alternative methods

binCounts() is compared against a wrapper around graphics::hist() that returns only the bin counts:

> hist <- graphics::hist
> binCounts_hist <- function(x, bx, right = FALSE, ...) {
+     hist(x, breaks = bx, right = right, include.lowest = TRUE, plot = FALSE)$counts
+ }
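The same left-closed bin counting can also be sketched in base R with findInterval() and tabulate(), which makes the bin definition explicit. This is a hypothetical helper: binCounts_findInterval() is not part of matrixStats and was not benchmarked in this report. It assumes the bins [bx[i], bx[i+1]) that binCounts() documents for right = FALSE, and (bx[i], bx[i+1]] for right = TRUE.

```r
# Hypothetical base-R bin counter (not part of matrixStats, not benchmarked here).
binCounts_findInterval <- function(x, bx, right = FALSE) {
  nbins <- length(bx) - 1L
  # left.open = FALSE: i such that bx[i] <= x < bx[i + 1]
  # left.open = TRUE:  i such that bx[i] <  x <= bx[i + 1]
  # 0 or length(bx) means the value lies outside all bins; NA stays NA.
  idx <- findInterval(x, bx, left.open = right)
  idx <- idx[!is.na(idx) & idx >= 1L & idx <= nbins]
  # Count how many values fall into each bin index 1..nbins.
  tabulate(idx, nbins = nbins)
}
```

For example, binCounts_findInterval(c(0.5, 1, 1.5, 2, 3), bx = c(0, 1, 2, 3)) counts 1, 2 and 1 values in [0, 1), [1, 2) and [2, 3); the value 3 equals the last edge and is not counted. Note that binCounts_hist() above uses include.lowest = TRUE, so hist() would count a value equal to the last edge; the padded edges used in this report keep all data away from that boundary.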

Data type “integer”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "integer"
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
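The edge vector bx has nb + 1 evenly spaced edges on [0, xmax], padded with one extra edge on each side, giving nb + 3 edges and nb + 2 bins. A small sketch, not part of the original report, that rebuilds the edges and checks that every simulated value lands in one of the bins:

```r
# Rebuild the bin edges used in the report (nx = 1e5, nb = 10000).
nx <- 1e5
xmax <- 0.01 * nx
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)  # nb + 1 edges on [0, xmax]
bx <- c(-1, bx, xmax + 1)                             # one padding edge per side

n_bins <- length(bx) - 1L   # nb inner bins plus 2 padding bins
cat("edges:", length(bx), " bins:", n_bins, "\n")

# Same draws as in the report; runif() returns values strictly inside (0, xmax),
# and integer truncation only moves them down towards 0, so none are lost.
set.seed(48879)
x <- runif(nx, min = 0, max = xmax)
idx <- findInterval(x, bx)   # bin index i with bx[i] <= x < bx[i + 1]
stopifnot(all(idx >= 1L & idx <= n_bins))
```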

Results

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  807852 43.2    1315788 70.3  1315788 70.3
Vcells 1498901 11.5    8388608 64.0  3562027 27.2
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min       lq      mean  median       uq     max
1 binCounts  5.9754  6.43200  6.878917  6.5684  6.68900 13.1917
2      hist 11.5389 12.30655 12.916438 12.4926 12.62675 19.1192

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.931067 1.913332 1.877685 1.901924 1.887689 1.449336
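Each relative value in the bottom panel is the corresponding statistic in the top panel divided by the binCounts() value in the same column; for example, 11.5389 / 5.9754 ≈ 1.931067 for the minimum. A minimal sketch of that normalization, using the millisecond values from the table above (relative_times() is a hypothetical helper, not a matrixStats or microbenchmark function):

```r
# Top panel of the table above (times in milliseconds), copied from the report.
times_ms <- data.frame(
  expr   = c("binCounts", "hist"),
  min    = c(5.9754, 11.5389),
  lq     = c(6.43200, 12.30655),
  mean   = c(6.878917, 12.916438),
  median = c(6.5684, 12.4926),
  uq     = c(6.68900, 12.62675),
  max    = c(13.1917, 19.1192)
)

# Hypothetical helper: divide every numeric column by the value in the
# reference row, so the reference method becomes 1 and larger means slower.
relative_times <- function(tbl, ref = 1L) {
  num <- vapply(tbl, is.numeric, logical(1))
  tbl[num] <- lapply(tbl[num], function(col) col / col[ref])
  tbl
}

rel <- relative_times(times_ms)
print(rel, digits = 7)
```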

Figure: Benchmarking of binCounts() and hist() on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  969795 51.8    1954774 104.4  1315788 70.3
Vcells 1824343 14.0    8388608  64.0  8386590 64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on integer+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 1.2297 1.56915 2.051464 1.65505 1.72965  7.8569
2      hist 3.7337 4.12480 4.656045 4.51275 4.63205 13.4317

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 3.036269 2.628684 2.269621 2.726655 2.678027 1.709542

Figure: Benchmarking of binCounts() and hist() on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Data type “double”

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- "double"
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
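Both data sets use the same seed, so the integer values shown by str() in the "integer" section (722, 285, 591, 3, ...) are these double draws coerced with storage.mode(x) <- "integer", which truncates toward zero like as.integer(). A small check on the first four values printed above; since str() rounds for display, these are rounded stand-ins rather than the exact draws:

```r
# First four values as displayed by str() above (rounded for display).
x_dbl <- c(722.11, 285.54, 591.33, 3.42)

# Coerce the storage mode as the report does for the "integer" case;
# the fractional part is dropped (truncation toward zero).
x_int <- x_dbl
storage.mode(x_int) <- "integer"

print(x_int)
```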

Results

> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  969930 51.8    1954774 104.4  1954774 104.4
Vcells 1874747 14.4    8388608  64.0  8386890  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+unsorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr     min      lq     mean   median       uq     max
1 binCounts 10.5039 11.7464 12.25321 12.11045 12.20440 19.0747
2      hist 11.6019 13.5030 14.08809 13.69295 13.91275 20.3897

       expr      min       lq     mean   median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2      hist 1.104533 1.149544 1.149747 1.130672 1.139978 1.068939

Figure: Benchmarking of binCounts() and hist() on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  969996 51.9    1954774 104.4  1954774 104.4
Vcells 1875303 14.4    8388608  64.0  8388323  64.0
> stats <- microbenchmark(binCounts = binCounts(x, bx = bx), hist = binCounts_hist(x, bx = bx), unit = "ms")

Table: Benchmarking of binCounts() and hist() on double+sorted data. The top panel shows times in milliseconds and the bottom panel shows relative times.

       expr    min      lq     mean  median      uq     max
1 binCounts 1.1680 2.76695 3.048870 2.86545 3.03605  8.1864
2      hist 4.3223 4.88730 5.577062 5.15950 5.44775 13.3829

       expr      min       lq     mean  median       uq      max
1 binCounts 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000
2      hist 3.700599 1.766313 1.829223 1.80059 1.794355 1.634772

Figure: Benchmarking of binCounts() and hist() on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2023-11-06 r85483 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default


locale:
[1] LC_COLLATE=C                 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=C                LC_NUMERIC=C                
[5] LC_TIME=C                   

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.10 matrixStats_1.1.0     ggplot2_3.4.4        
[4] knitr_1.45            R.devices_2.17.1      R.utils_2.12.2       
[7] R.oo_1.25.0           R.methodsS3_1.8.2    

loaded via a namespace (and not attached):
 [1] vctrs_0.6.4      cli_3.6.1        rlang_1.1.2      xfun_0.41       
 [5] labeling_0.4.3   glue_1.6.2       colorspace_2.1-0 markdown_1.11   
 [9] scales_1.2.1     fansi_1.0.5      R.cache_0.16.0   grid_4.4.0      
[13] munsell_0.5.0    tibble_3.2.1     R.rsp_0.45.0     base64enc_0.1-3 
[17] lifecycle_1.0.4  compiler_4.4.0   pkgconfig_2.0.3  farver_2.1.1    
[21] digest_0.6.33    R6_2.5.1         utf8_1.2.4       pillar_1.9.0    
[25] magrittr_2.0.3   withr_2.5.2      tools_4.4.0      gtable_0.3.4    

Total processing time was 8.67 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts')
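To time a single case without the RSP report framework, the integer+unsorted comparison can be run directly. This is a sketch that assumes the matrixStats and microbenchmark packages are installed; it reuses the report's data-generation code with the mode variable replaced by a literal, and the timings will differ from machine to machine.

```r
library(matrixStats)     # binCounts()
library(microbenchmark)  # microbenchmark()

# Alternative from the report: hist()-based bin counting.
binCounts_hist <- function(x, bx, right = FALSE, ...) {
  graphics::hist(x, breaks = bx, right = right,
                 include.lowest = TRUE, plot = FALSE)$counts
}

# Same simulated data and bin edges as the report.
set.seed(48879)
nx <- 1e5
xmax <- 0.01 * nx
x <- runif(nx, min = 0, max = xmax)
storage.mode(x) <- "integer"   # use "double" for the double-precision case
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)

# Sanity check before timing: both methods should give the same counts.
stopifnot(all(binCounts(x, bx = bx) == binCounts_hist(x, bx = bx)))

stats <- microbenchmark(
  binCounts = binCounts(x, bx = bx),
  hist      = binCounts_hist(x, bx = bx),
  times = 100L
)
print(stats, unit = "ms")
```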

Copyright Henrik Bengtsson. Last updated on 2023-11-07 04:52:01 (+0100 UTC). Powered by RSP.