R Under development (unstable) (2025-02-03 r87683 ucrt) -- "Unsuffered Consequences"
Copyright (C) 2025 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Umpire)
> set.seed(97531)
>
> ## Need to generate a gene expression data set to get started.
> ## So, we must start with a clinical version of a cancer engine.
>
> ## The logical 'isWeighted' argument determines whether prevalences are (approximately) equal or not
> ce <- ClinicalEngine(20, 4, FALSE)
> summary(ce) # prevalences should be approximately equal
A 'CancerEngine' using the cancer model:
--------------
Clinical Simulation Model (Raw), a CancerModel object constructed via:
CancerModel(name = "Clinical Simulation Model (Raw)", nPossible = NP, 
    nPattern = nClusters, HIT = hitfn, SURV = SURV, OUT = OUT, 
    survivalModel = survivalModel, prevalence = Prevalence(isWeighted, 
        nClusters))

Pattern prevalences:
[1] 0.2570758 0.2349797 0.2682044 0.2397401
Survival effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.42116 -0.21347 -0.19315 -0.14185 -0.09388  0.23610 
Outcome effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.42465 -0.16789 -0.07517  0.02297  0.20212  0.59215 
--------------

Base expression given by:
An Engine with 10 components.

Altered expression given by:
An Engine with 10 components.

> round(ce@cm@prevalence, 2) # good
[1] 0.26 0.23 0.27 0.24
> ce <- ClinicalEngine(20, 4, TRUE)
> summary(ce) # prevalences should be varied
A 'CancerEngine' using the cancer model:
--------------
Clinical Simulation Model (Raw), a CancerModel object constructed via:
CancerModel(name = "Clinical Simulation Model (Raw)", nPossible = NP, 
    nPattern = nClusters, HIT = hitfn, SURV = SURV, OUT = OUT, 
    survivalModel = survivalModel, prevalence = Prevalence(isWeighted, 
        nClusters))

Pattern prevalences:
[1] 0.1746455 0.2192310 0.2631088 0.3430147
Survival effects:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.039196  0.004398  0.046317  0.104108  0.159882  0.393073 
Outcome effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.50799 -0.19352 -0.05556 -0.01792  0.20335  0.41844 
--------------

Base expression given by:
An Engine with 10 components.

Altered expression given by:
An Engine with 10 components.

> round(ce@cm@prevalence, 2) # good
[1] 0.17 0.22 0.26 0.34
>
> nComponents(ce)
[1] 10
> N <- nrow(ce) # fixed!
> N # should equal 20, as requested by the user
[1] 20
>
> ## Now generate a data set
> dset <- rand(ce, 300)
> class(dset)
[1] "list"
> names(dset)
[1] "clinical" "data"    
> summary(dset$clinical)
 CancerSubType    Outcome         LFU          Event        
 Min.   :1.000   Bad :156   Min.   : 0.00   Mode :logical  
 1st Qu.:2.000   Good:144   1st Qu.:10.00   FALSE:93       
 Median :3.000              Median :18.00   TRUE :207      
 Mean   :2.797              Mean   :22.84                  
 3rd Qu.:4.000              3rd Qu.:33.00                  
 Max.   :4.000              Max.   :71.00                  
> dim(dset$data) # 20 features, 300 samples
[1]  20 300
>
> ## Must add noise before making a mixed-type engine
> cnm <- ClinicalNoiseModel(N) # default shape and scale
> noisy <- blur(cnm, dset$data)
>
> ## Now we set the data types
> dt <- makeDataTypes(dset$data, 1/3, 1/3, 1/3, 0.3, range = c(3, 9))
> cp <- dt$cutpoints
> type <- sapply(cp, function(X) { X$Type })
> table(type)
type
      continuous          nominal          ordinal symmetric binary 
               6                3                6                5 
> sum(is.na(type))
[1] 0
> length(type)
[1] 20
> class(dt$binned)
[1] "data.frame"
> dim(dt$binned)
[1] 300  20
> summary(dt$binned)
       V1             V2        V3     V4          V5             V6     
 Min.   :0.00   Min.   :0.00   A:47   A:92   Min.   :0.00   A      :56  
 1st Qu.:0.00   1st Qu.:0.00   B:61   B:68   1st Qu.:0.00   C      :48  
 Median :0.00   Median :0.00   C:60   C:66   Median :0.00   H      :41  
 Mean   :0.11   Mean   :0.14   D:45   D:74   Mean   :0.13   D      :38  
 3rd Qu.:0.00   3rd Qu.:0.00   E:55          3rd Qu.:0.00   G      :36  
 Max.   :1.00   Max.   :1.00   F:32          Max.   :1.00   F      :29  
                                                            (Other):52  
 V7           V8           V9             V10             V11    
 A: 83   T      :50   Min.   :3.149   Min.   :4.283   I      :40  
 B:119   Z      :40   1st Qu.:4.268   1st Qu.:5.486   A      :38  
 C: 98   S      :37   Median :4.600   Median :5.777   D      :35  
         R      :30   Mean   :4.594   Mean   :5.804   F      :35  
         X      :30   3rd Qu.:4.885   3rd Qu.:6.192   H      :35  
         Y      :30   Max.   :5.910   Max.   :6.860   G      :34  
         (Other):83                                   (Other):83  
      V12              V13             V14         V15         V16        V17   
 Min.   : 3.940   Min.   :0.00   Min.   :1.971   R:38   Min.   :3.518   A:25  
 1st Qu.: 6.993   1st Qu.:0.00   1st Qu.:4.265   S:28   1st Qu.:4.949   B:37  
 Median : 7.760   Median :0.00   Median :5.314   T:37   Median :5.398   C:38  
 Mean   : 7.717   Mean   :0.18   Mean   :5.195   U:65   Mean   :5.385   D:40  
 3rd Qu.: 8.501   3rd Qu.:0.00   3rd Qu.:6.181   V:41   3rd Qu.:5.834   E:42  
 Max.   :10.524   Max.   :1.00   Max.   :8.306   W:55   Max.   :7.495   F:53  
                                                 X:36                   G:65  
 V18         V19             V20      
 R:37   Min.   :4.126   Min.   :0.00  
 S:57   1st Qu.:5.479   1st Qu.:0.00  
 T:28   Median :5.857   Median :0.00  
 U:36   Mean   :5.865   Mean   :0.35  
 V:53   3rd Qu.:6.274   3rd Qu.:1.00  
 W:49   Max.   :7.550   Max.   :1.00  
 X:40                                 
>
> ## Use the pieces from above to create an MTE.
> mte <- MixedTypeEngine(ce, noise = cnm, cutpoints = dt$cutpoints)
> # and generate some data
> R <- rand(mte, 20)
> summary(R)
         Length Class      Mode
binned   20     data.frame list
clinical  4     data.frame list
>
> proc.time()
   user  system elapsed 
   0.87    0.14    1.00 