R Under development (unstable) (2026-01-18 r89306 ucrt) -- "Unsuffered Consequences" Copyright (C) 2026 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > library( "sampleSelection" ) Loading required package: maxLik Loading required package: miscTools Please cite the 'maxLik' package as: Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1. If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site: https://r-forge.r-project.org/projects/maxlik/ > suppressPackageStartupMessages( library( "mvtnorm" ) ) > options( digits = 2 ) > > nObs <- 300 > betaS <- c( 1, 1, -1 ) > betaO <- c( 10, 4 ) > rho <- 0.4 > sigma <- 5 > bound <- c(-Inf,5,15,Inf) > > > set.seed(123) > dat <- data.frame( x1 = rnorm( nObs ), x2 = rnorm( nObs ) ) > vcovMat <- matrix( c( 1, rho*sigma, rho*sigma, sigma^2 ), nrow = 2 ) > eps <- rmvnorm( nObs, sigma = vcovMat ) > dat$epsS <- eps[,1] > dat$epsO <- eps[,2] > dat$yS <- with( dat, betaS[1] + betaS[2] * x1 + betaS[3] * x2 + epsS ) > 0 > # table(dat$yS) > dat$yOu <- with( dat, betaO[1] + betaO[2] * x1 + epsO ) > dat$yOu[ !dat$yS ] <- NA > hist( dat$yOu ) > dat$yO <- cut( dat$yOu, bound ) > > YS <- dat$yS > XS <- cbind( 1, dat$x1, dat$x2 ) > YO <- as.numeric( dat$yO ) > XO <- cbind( 1, dat$x1 ) > > > start <- c( betaS, betaO, log( sqrt( sigma ) ), atanh( rho ) ) > # the correct starting value of logSigma would be: log( sigma ) > names( start ) <- c( "betaS0", "betaS1", "betaS2", "betaO0", "betaO2", + "logSigma", "atanhRho" ) > > res <- selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = bound, + start = start, printLevel = 1 ) Tobit 2 model 0 invalid observations YO lower upper count 1 (-Inf,5] -Inf 5 26 2 (5,15] 5 15 130 3 (15, Inf] 15 Inf 53 YO observed: 209 times; not observed: 91 times: Number of intervals: 3 YO YS 1 2 3 0 0 0 0 91 1 26 130 53 0 Boundaries: [1] -Inf 5 15 Inf Initial values: (Intercept) x1 x2 (Intercept) x1 logSigma 1.00 1.00 -1.00 10.00 4.00 0.80 atanhRho 0.42 -------------- successive function values within relative tolerance limit (reltol) 13 iterations estimate: 0.98 0.97 -1.3 10 2.7 1.6 0.3 Function value: -275 > > print( res ) Call: selection(selection = yS ~ x1 + x2, outcome = yO ~ x1, data = dat, start = start, boundaries = bound, printLevel = 1) Coefficients: S:(Intercept) S:x1 S:x2 O:(Intercept) O:x1 0.982 0.967 -1.286 10.241 2.659 logSigma atanhRho sigma sigmaSq rho 1.631 0.298 5.108 26.088 0.290 > print( round( coef( res ), 2 ) ) S:(Intercept) S:x1 S:x2 O:(Intercept) O:x1 0.98 0.97 -1.29 10.24 2.66 logSigma atanhRho sigma sigmaSq rho 1.63 0.30 5.11 26.09 0.29 > print( round( coef( summary( res ) ), 2 ) ) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.98 0.11 9.05 0.00 x1 0.97 0.15 6.48 0.00 x2 -1.29 0.12 -10.63 0.00 (Intercept) 10.24 0.67 15.33 0.00 x1 2.66 0.59 4.49 0.00 logSigma 1.63 0.07 21.82 0.00 atanhRho 0.30 0.36 0.82 0.41 sigma 5.11 0.38 13.38 0.00 sigmaSq 26.09 3.90 6.69 0.00 rho 0.29 0.33 0.87 0.38 > print( res$start ) (Intercept) x1 x2 (Intercept) x1 logSigma 1.00 1.00 -1.00 10.00 4.00 0.80 atanhRho 0.42 > print( summary( res ) ) -------------------------------------------- Tobit 2 model with interval outcome (sample selection model) Maximum Likelihood estimation BHHH maximisation, 13 iterations Return code 8: successive function values within relative tolerance limit (reltol) Log-Likelihood: -275 300 observations (91 censored and 209 observed) Intervals of the dependent variable of the outcome equation: YO lower upper count 1 (-Inf,5] -Inf 5 26 2 (5,15] 5 15 130 3 (15, Inf] 15 Inf 53 7 free parameters (df = 293) Probit selection equation: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.982 0.109 9.05 < 2e-16 *** x1 0.967 0.149 6.48 3.8e-10 *** x2 -1.286 0.121 -10.63 < 2e-16 *** Outcome equation: Estimate Std. Error t value Pr(>|t|) (Intercept) 10.241 0.668 15.33 <2e-16 *** x1 2.659 0.592 4.49 1e-05 *** Error terms: Estimate Std. Error t value Pr(>|t|) logSigma 1.6307 0.0747 21.82 < 2e-16 *** atanhRho 0.2982 0.3619 0.82 0.41 sigma 5.1077 0.3817 13.38 < 2e-16 *** sigmaSq 26.0884 3.8992 6.69 1.1e-10 *** rho 0.2897 0.3315 0.87 0.38 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 -------------------------------------------- > > > # tests with automatically generated starting values (ML estimation) > resMl <- selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = bound, + start = "ml", printLevel = 1 ) Tobit 2 model 0 invalid observations YO lower upper count 1 (-Inf,5] -Inf 5 26 2 (5,15] 5 15 130 3 (15, Inf] 15 Inf 53 YO observed: 209 times; not observed: 91 times: Number of intervals: 3 YO YS 1 2 3 0 0 0 0 91 1 26 130 53 0 Boundaries: [1] -Inf 5 15 Inf Initial values: (Intercept) x1 x2 (Intercept) x1 logSigma 0.98 0.97 -1.29 10.40 3.74 4.28 atanhRho 0.24 -------------- successive function values within relative tolerance limit (reltol) 21 iterations estimate: 0.98 0.97 -1.3 10 2.7 1.6 0.3 Function value: -275 > print( resMl ) Call: selection(selection = yS ~ x1 + x2, outcome = yO ~ x1, data = dat, start = "ml", boundaries = bound, printLevel = 1) Coefficients: S:(Intercept) S:x1 S:x2 O:(Intercept) O:x1 0.982 0.967 -1.286 10.240 2.660 logSigma atanhRho sigma sigmaSq rho 1.631 0.299 5.108 26.089 0.290 > print( round( coef( resMl ), 2 ) ) S:(Intercept) S:x1 S:x2 O:(Intercept) O:x1 0.98 0.97 -1.29 10.24 2.66 logSigma atanhRho sigma sigmaSq rho 1.63 0.30 5.11 26.09 0.29 > print( round( coef( summary( resMl ) ), 2 ) ) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.98 0.11 9.05 0.00 x1 0.97 0.15 6.48 0.00 x2 -1.29 0.12 -10.64 0.00 (Intercept) 10.24 0.67 15.33 0.00 x1 2.66 0.59 4.49 0.00 logSigma 1.63 0.07 21.82 0.00 atanhRho 0.30 0.36 0.83 0.41 sigma 5.11 0.38 13.38 0.00 sigmaSq 26.09 3.90 6.69 0.00 rho 0.29 0.33 0.88 0.38 > print( resMl$start ) (Intercept) x1 x2 (Intercept) x1 logSigma 0.98 0.97 -1.29 10.40 3.74 4.28 atanhRho 0.24 > print( summary( resMl ) ) -------------------------------------------- Tobit 2 model with interval outcome (sample selection model) Maximum Likelihood estimation BHHH maximisation, 21 iterations Return code 8: successive function values within relative tolerance limit (reltol) Log-Likelihood: -275 300 observations (91 censored and 209 observed) Intervals of the dependent variable of the outcome equation: YO lower upper count 1 (-Inf,5] -Inf 5 26 2 (5,15] 5 15 130 3 (15, Inf] 15 Inf 53 7 free parameters (df = 293) Probit selection equation: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.982 0.109 9.05 < 2e-16 *** x1 0.967 0.149 6.48 3.8e-10 *** x2 -1.286 0.121 -10.64 < 2e-16 *** Outcome equation: Estimate Std. Error t value Pr(>|t|) (Intercept) 10.240 0.668 15.33 <2e-16 *** x1 2.660 0.592 4.49 1e-05 *** Error terms: Estimate Std. Error t value Pr(>|t|) logSigma 1.6308 0.0747 21.82 < 2e-16 *** atanhRho 0.2988 0.3619 0.83 0.41 sigma 5.1077 0.3817 13.38 < 2e-16 *** sigmaSq 26.0890 3.8997 6.69 1.1e-10 *** rho 0.2902 0.3314 0.88 0.38 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 -------------------------------------------- > > > # tests with incorrectly specified starting values > try( selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = bound, + start = "wrong" ) ) Error in tobit2Intfit(YS, XS, YO, XO, boundaries = boundaries, start = start, : argument 'start' must be "ml", "2step", or a numeric vector > try( selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = bound, + start = rep( 1, 11 ) ) ) Error in tobit2Intfit(YS, XS, YO, XO, boundaries = boundaries, start = start, : The vector of starting values has an incorrect length. Number of parameters: 7. Length of provided vector: 11 > > # tests with incorrectly specified boundaries > try( selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = 1:6, + start = start ) ) Error in tobit2Intfit(YS, XS, YO, XO, boundaries = boundaries, start = start, : argument 'boundaries' must have (number of intervals + 1 = 4) elements but it has 6 elements > try( selection( yS ~ x1 + x2, yO ~ x1, data = dat, boundaries = 4:1, + start = start ) ) Error in tobit2Intfit(YS, XS, YO, XO, boundaries = boundaries, start = start, : the boundaries in the vector definded by argument 'boundaries' must be in ascending order > > proc.time() user system elapsed 7.70 0.29 7.98