Several studies have shown that twin birth contributes substantially to infant and child mortality, mainly in resource-poor countries. The excess mortality rates among twins call for statistical modeling research to identify their main causes. In studies involving multiple individuals from the same family, the fundamental independence assumption of classical statistical modeling is not plausible. In addition, previous studies have indicated that ignoring sampling weights when analyzing data collected under a complex survey design can introduce serious bias. This study aims to fill these methodological gaps by combining the dependence induced by twin birth with a gamma frailty model in order to correctly identify the determinants of infant mortality among twins in Ethiopia. We compiled all available data from the 2016 Ethiopia Demographic and Health Survey, comprising 908 children (454 pairs of twins), and incorporated the survey sampling weights in the analysis. To identify predictors and to assess the presence and significance of frailty, semiparametric univariate, bivariate shared, and correlated gamma frailty models were fitted. The likelihood ratio test was used to test the significance of the frailty term in the model. We found that sex of the child, birth order among twins, preceding birth interval, and succeeding birth interval are significantly associated with twin infant mortality. The results further confirmed the significance of the shared frailty term, which accounts for unobserved heterogeneity.
In this article, we propose a general framework to learn optimal treatment rules for type 2 diabetes (T2D) patients using electronic health records (EHRs). We first propose a joint modeling approach to characterize patients’ pretreatment conditions using longitudinal markers from EHRs. The estimation accounts for informative measurement times using inverse-intensity weighting methods. The predicted latent processes in the joint model are used to divide patients into a finite number of subgroups such that, within each group, patients share similar health profiles in EHRs. Within each patient group, we estimate optimal individualized treatment rules by extending a matched learning method to handle multicategory treatments using a one-versus-one approach. Each matched learning step for two treatments is implemented by a weighted support vector machine with matched pairs of patients. We apply our method to estimate optimal treatment rules for T2D patients in a large sample of EHRs from the Ohio State University Wexner Medical Center. We demonstrate the utility of our method in selecting the optimal treatments from four classes of drugs and in achieving better control of glycated hemoglobin than any one-size-fits-all rule.
We consider a default Bayesian approach to multiple testing of equality of two binomial proportions. While our approach is motivated by a scenario where one proportion corresponds to an experimental condition and the other to a control, we find it is also reasonable for comparing two proportions in general. We consider a selection of priors under the alternative(s), including the intrinsic prior and a newly proposed “mode-based” Beta prior, and investigate their properties in terms of certain desirable characteristics that we specify for default priors. We also develop priors for the hyperparameters based on the conventional hyperprior used for normal means multiple testing. In addition, we consider a computationally more efficient empirical Bayes approach using the intrinsic prior and the proposed Beta prior. We use repeated simulation and real data sets to evaluate and illustrate the approach, and compare certain frequentist characteristics of the results based on the intrinsic and mode-based Beta priors using full Bayes and empirical Bayes approaches. Additionally, the results from the Bayesian approach are compared with a commonly used frequentist procedure using conventional thresholds in the respective settings. Overall, we find that the proposed mode-based Beta prior is a suitable default prior for multiple testing of equality of two proportions.
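As a rough illustration of the kind of quantity such a Bayesian test works with, the sketch below computes a Bayes factor for equality of two binomial proportions. It is only a sketch under stated assumptions: it uses a plain conjugate Beta(a, b) prior (uniform by default) in place of the intrinsic or mode-based Beta priors discussed in the abstract, and the function names `bf01` and `posterior_null_prob` are hypothetical.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01(x1, n1, x2, n2, a=1.0, b=1.0):
    """Bayes factor of H0: p1 == p2 against H1: p1 and p2 unrestricted.

    Assumption (not from the abstract): a Beta(a, b) prior on the common
    proportion under H0 and independent Beta(a, b) priors on p1 and p2
    under H1. The binomial coefficients cancel in the ratio.
    """
    log_m0 = log_beta(a + x1 + x2, b + n1 + n2 - x1 - x2) - log_beta(a, b)
    log_m1 = (log_beta(a + x1, b + n1 - x1) - log_beta(a, b)
              + log_beta(a + x2, b + n2 - x2) - log_beta(a, b))
    return exp(log_m0 - log_m1)


def posterior_null_prob(bf, prior_null=0.5):
    """Posterior probability of H0 given a Bayes factor and a prior null probability."""
    return prior_null * bf / (prior_null * bf + 1.0 - prior_null)


# Example: 5/10 successes in both arms versus 1/20 against 19/20.
similar = bf01(5, 10, 5, 10)
different = bf01(1, 20, 19, 20)
```

In a multiple testing setting, one such Bayes factor would be computed per pair of proportions, with the prior null probability either given its own hyperprior (full Bayes) or estimated from the data (empirical Bayes).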
We consider a class of regime-switching autoregressive models for nonnegative integer-valued time series in which the conditional distribution given past information is Poisson. In models of this type, the link between the conditional variance (which equals the conditional mean for the Poisson distribution) and its past values, as well as the past observed values of the process, may differ according to the state of an unobservable (hidden) Markov chain. We study the stationarity and ergodicity of Markov-switching Poisson generalized autoregressive heteroscedastic (MS-PGARCH) models, and give a condition on the parameters under which an MS-PGARCH process can be approximated by a geometrically ergodic process. Under this condition we discuss maximum likelihood estimation for MS-PGARCH models. Simulation studies and an application to modelling financial count time series are presented to support our methodology.
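To make the regime-switching mechanism concrete, the following sketch simulates a two-state process of this general type. The specific recursion `lam_t = omega[s_t] + alpha[s_t] * x_{t-1} + beta[s_t] * lam_{t-1}` is an assumed first-order form chosen for illustration, not necessarily the exact specification in the paper, and all parameter values and function names are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_ms_pgarch(n, omega, alpha, beta, trans, seed=0):
    """Simulate n counts from a two-state Markov-switching Poisson GARCH(1,1)-type model.

    omega, alpha, beta: per-regime coefficients (length-2 sequences).
    trans[i][j]: probability of moving from hidden state i to state j.
    """
    rng = random.Random(seed)
    state = 0
    # Start from the regime-0 stationary mean (an arbitrary initialisation).
    lam = omega[0] / (1.0 - alpha[0] - beta[0])
    x = poisson_draw(lam, rng)
    states, counts = [], []
    for _ in range(n):
        # Hidden Markov chain step: stay with probability trans[state][state].
        if rng.random() >= trans[state][state]:
            state = 1 - state
        # Regime-dependent intensity recursion (assumed form).
        lam = omega[state] + alpha[state] * x + beta[state] * lam
        x = poisson_draw(lam, rng)
        states.append(state)
        counts.append(x)
    return states, counts


# Hypothetical parameters: a calm regime (0) and a busy regime (1).
states, counts = simulate_ms_pgarch(
    500,
    omega=(0.5, 2.0), alpha=(0.2, 0.4), beta=(0.3, 0.4),
    trans=((0.95, 0.05), (0.10, 0.90)),
    seed=1,
)
```

Because the intensity depends on the entire past regime path through the recursion, the likelihood is not available in a simple closed form, which is one reason an approximation by a more tractable process is useful.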
We consider the quantile regression model with covariate measurement error. To deal with the measurement error, we extend the simulation-extrapolation (SIMEX) method to quantile regression. The proposed SIMEX estimator corrects the bias caused by the measurement error and does not require the regression error and the measurement error to have the same distribution. The asymptotic distribution of the proposed estimator is derived. The finite sample performance of the proposed method is investigated by a simulation study. A real dataset from the Framingham Heart Study is analyzed to illustrate the proposed method.
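The two SIMEX steps (simulate extra measurement error, then extrapolate back to the no-error case) can be sketched as follows. This is a simplified illustration, not the estimator proposed in the abstract: to stay dependency-free it uses an ordinary least-squares slope as the naive estimator instead of a quantile regression fit, assumes the measurement error standard deviation `sigma_u` is known, and uses a three-point quadratic extrapolation. All function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (stand-in for a quantile regression fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simex_slope(w, y, sigma_u, n_rep=50, rng=None):
    """SIMEX-corrected slope using lambda in {0, 1, 2} and quadratic extrapolation."""
    rng = rng or random.Random(0)
    means = []
    for lam in (0.0, 1.0, 2.0):
        if lam == 0.0:
            means.append(ols_slope(w, y))  # naive estimate, no added noise
            continue
        scale = (lam ** 0.5) * sigma_u
        # Simulation step: inflate the error variance to (1 + lam) * sigma_u**2.
        estimates = [
            ols_slope([wi + scale * rng.gauss(0.0, 1.0) for wi in w], y)
            for _ in range(n_rep)
        ]
        means.append(sum(estimates) / n_rep)
    y0, y1, y2 = means
    # Extrapolation step: the quadratic through (0, y0), (1, y1), (2, y2)
    # evaluated at lambda = -1 equals 3*y0 - 3*y1 + y2.
    return 3.0 * y0 - 3.0 * y1 + y2


def simulate(n=2000, beta=2.0, sigma_u=0.5, seed=1):
    """Generate y from the true covariate x and an error-prone surrogate w = x + u."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
    return w, y


w, y = simulate()
naive = ols_slope(w, y)
corrected = simex_slope(w, y, sigma_u=0.5, rng=random.Random(2))
```

In this toy setting the naive slope is attenuated toward zero, and the extrapolated value should lie closer to the true slope of 2, although the quadratic extrapolant removes only part of the bias.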
Left-truncated and interval-censored data occur commonly and some approaches have been proposed in the literature for their analysis. However, most of the existing methods are based on the conditional likelihood given left-truncation times, which can be inefficient since the information in the marginal likelihood of the truncation times is ignored. To address this, in this paper, a pairwise pseudo-likelihood augmented estimation approach is proposed under the additive hazards model that can fully make use of all available information. The derived estimator is shown to be consistent and asymptotically normal, and simulation studies suggest that the proposed method works well and provides a substantial efficiency gain over the conditional approach. In addition, the method is applied to a set of real data arising from an AIDS cohort study.
Independence analysis is an indispensable step before regression analysis for identifying the essential factors that influence the objects of study. With many applications in machine learning, medical learning, and a variety of other disciplines, statistical methods for measuring the relationship between random variables have been well studied in vector spaces. However, few methods have been developed to verify the relation between random elements in metric spaces. In this paper, we present a novel index called metric distributional discrepancy (MDD) to measure the dependence between a random element $X$ and a categorical variable $Y$, which is applicable to medical images and related variables. The MDD statistic can be viewed as the distance between the conditional distribution of $X$ given each class of $Y$ and the unconditional distribution of $X$. MDD enjoys some significant merits compared to other dependence measures. For instance, MDD is zero if and only if $X$ and $Y$ are independent. The MDD test is distribution-free, since no assumption is made on the distribution of the random elements. Furthermore, the MDD test is robust to heavy-tailed data and potential outliers. We demonstrate the validity of our theory and the properties of the MDD test through several numerical experiments and a real data analysis.
To date, only lower and upper bounds for the optimal configuration of a Square Array (A2) Group Testing (GT) algorithm are known. We establish exact analytical formulae and provide two applications of our result. First, we compare the A2 GT scheme to several other classical GT schemes in terms of the gain per specimen attained at the optimal configuration. Second, operating under an objective Bayesian framework with a loss designed to attain its minimum at the optimal configuration, we suggest the preferred choice of the group size under a natural minimal assumption, namely that the prior information regarding the prevalence suggests that grouping and applying A2 is better than individual testing. The same suggestion is provided for the Minimax strategy.
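For intuition about what "optimal configuration" means here, the sketch below evaluates the expected number of tests per specimen for a basic n-by-n square array scheme and finds the best n by brute-force search. It assumes a perfect test, independent specimens with common prevalence p, and the simplest version of A2 in which all n rows and n columns are tested and every specimen at the intersection of a positive row and a positive column is then retested individually. It is a numerical stand-in, not the exact analytical formulae established in the paper, and the function names are hypothetical.

```python
def a2_tests_per_specimen(n, p):
    """Expected tests per specimen for an n x n array under the assumptions above.

    2n pooled tests are spread over n*n specimens (2/n each). A specimen is
    retested if it is positive (probability p), or if it is negative but both
    its row and its column contain at least one other positive specimen.
    """
    q = 1.0 - p
    return 2.0 / n + p + q * (1.0 - q ** (n - 1)) ** 2


def best_array_size(p, n_max=200):
    """Array dimension n in [2, n_max] minimising the expected tests per specimen."""
    return min(range(2, n_max + 1), key=lambda n: a2_tests_per_specimen(n, p))


# Low prevalence: pooling should beat one test per specimen.
n_low = best_array_size(0.01)
cost_low = a2_tests_per_specimen(n_low, 0.01)

# High prevalence: even the best array needs more than one test per specimen.
n_high = best_array_size(0.5)
cost_high = a2_tests_per_specimen(n_high, 0.5)
```

A value below 1 means the array scheme is expected to use fewer tests than individual testing, which is the situation the minimal assumption on the prior is meant to capture.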
Modeling a continuous response on a large-scale network is an important task that has become increasingly common in practice. This paper proposes a novel network vector autoregressive moving average (NARMA) model, which accounts for both the ultra-high-dimensional response vector and the effects of the network structure. Compared with the network vector autoregressive (NAR, [26]) model, our proposed model takes into account the lagged innovations and the corresponding network effect. With more parameters considered and a moving average term incorporated, the proposed NARMA model can fit the data more closely and accurately, and thus performs better than the NAR model. A modified least squares estimator for the NARMA model is introduced, and its consistency properties are fully investigated. Finally, we demonstrate the superiority of the proposed NARMA model by investigating the financial contagion among S&P500 index constituents.