VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research
Abstract: Screening biomolecular markers from high-dimensional biological data is a long-standing task in biomedical translational research. Because it combines feature shrinkage with biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) is one of the most popular methods for clinical biomarker development. In practice, however, applying LASSO to omics data with high dimensionality and small sample size often yields an excessive number of predictive variables, which leads to overfitting. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics data. Using a bagging strategy in combination with either a parametric method or an inflection point search method, VSOLassoBag integrates and votes on the variables generated by multiple LASSO models to determine the optimal candidates. Applying VSOLassoBag to both simulated and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, compared with multiple existing algorithms, VSOLassoBag shows comparable performance under different scenarios while selecting fewer features. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that provides multithreading computing configurations.
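The bagging-and-voting idea described in the abstract can be illustrated with a minimal sketch. This is not the VSOLassoBag implementation, and it is written in Python rather than R so that it depends only on the standard library. It fits a LASSO model by coordinate descent on each bootstrap resample and counts how often each variable receives a non-zero coefficient. All names (`lasso_cd`, `lasso_bag_votes`), the penalty `lam`, the synthetic data, and the fixed 80% selection-frequency cutoff are assumptions made for illustration. In particular, the fixed cutoff is a simple stand-in for the parametric and inflection point search methods named in the abstract, which are not reproduced here.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def lasso_bag_votes(X, y, lam, n_bags=30, seed=0):
    """Fit LASSO on bootstrap resamples; count non-zero coefficients per variable."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                votes[j] += 1
    return votes


# Synthetic data (assumption): only variables 0 and 1 carry signal.
data_rng = random.Random(1)
N, P, N_BAGS = 80, 10, 30
X = [[data_rng.gauss(0, 1) for _ in range(P)] for _ in range(N)]
y = [3.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0, 0.1) for row in X]

votes = lasso_bag_votes(X, y, lam=0.5, n_bags=N_BAGS)
# Fixed frequency cutoff: an illustrative stand-in, not the paper's selection rule.
selected = [j for j in range(P) if votes[j] / N_BAGS >= 0.8]
print("votes per variable:", votes)
print("selected variables:", selected)
```

Variables that appear in most bootstrap models are kept and the rest are discarded, which is how voting can produce a smaller and more stable feature set than a single LASSO fit.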