Comparison of the performance of SARIMA, GAM and LSTM models in predicting hemorrhagic fever with renal syndrome
Abstract

Objective: To compare the performance of the seasonal autoregressive integrated moving average model (SARIMA), the generalized additive model (GAM) and the long short-term memory neural network (LSTM) in fitting and predicting the incidence of hemorrhagic fever with renal syndrome (HFRS), and so provide a reference for optimizing HFRS prediction models.

Methods: Monthly HFRS incidence data from 2004 to 2017 for the whole country and for the nine provinces with the highest HFRS incidence (Heilongjiang, Shaanxi, Jilin, Liaoning, Shandong, Hebei, Jiangxi, Zhejiang and Hunan) were collected from the Public Health Science Data Center (https://www.phsciencedata.cn/). Data from 2004 to 2016 were used as training data, and data from January to December 2017 as test data. SARIMA, GAM and LSTM models of HFRS incidence were fitted to the training data for the whole country and for each of the nine provinces. The fitted models were then used to predict HFRS incidence from January to December 2017, and the predictions were compared with the test data. The mean absolute percentage error (MAPE) was used to evaluate fitting and prediction accuracy: MAPE < 20% was rated good, 20% to 50% acceptable, and > 50% poor.

Results: In terms of overall fitting and prediction performance, SARIMA was the optimal model for the whole country and for Heilongjiang, Shaanxi, Jilin, Liaoning and Jiangxi (MAPE = 19.68%, 20.48%, 44.25%, 19.59%, 23.82% and 35.29%, respectively); performance was good for the whole country and Jilin and acceptable for the others. GAM was the optimal model for Shandong and Zhejiang (MAPE = 18.29% and 21.25%, respectively), with good performance for Shandong and acceptable performance for Zhejiang. LSTM was the optimal model for Hebei and Hunan (MAPE = 26.52% and 22.69%, respectively), with acceptable performance in both. Considering fitting alone, GAM achieved the highest accuracy on the national data (MAPE = 10.44%); considering prediction alone, LSTM achieved the highest accuracy on the national data (MAPE = 12.23%).

Conclusions: SARIMA, GAM and LSTM can each serve as the optimal model for fitting HFRS incidence, but the optimal model differs considerably between regions. When HFRS prediction models are built in the future, as many candidate models as possible should be screened to ensure high fitting and prediction accuracy.
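Assuming the standard definition, the MAPE criterion used in the Methods is MAPE = (100/n) × Σ |y_t − ŷ_t| / |y_t|, where y_t is the observed and ŷ_t the fitted or predicted monthly incidence over the n months scored. The Python sketch below computes it for one series and maps the result to the good/acceptable/poor bands given in the abstract. It is illustrative only: the function names are hypothetical, rejecting months with zero observed incidence is an assumption (the abstract does not say how such months were handled), and because the abstract does not say how fitting and prediction errors were combined into the "overall" MAPE, no such combination is attempted here.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (hypothetical helper).

    MAPE = 100 / n * sum(|y_t - yhat_t| / |y_t|) over the n months scored.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        # Assumption: a month with zero observed incidence makes the
        # percentage error undefined, so it is rejected rather than skipped.
        if y == 0:
            raise ValueError("MAPE is undefined when an observed value is 0")
        total += abs(y - y_hat) / abs(y)
    return 100.0 * total / len(actual)


def rate(mape_pct: float) -> str:
    """Map a MAPE (in percent) to the rating bands stated in the abstract."""
    if mape_pct < 20.0:
        return "good"        # MAPE < 20%
    if mape_pct <= 50.0:
        return "acceptable"  # 20% to 50%
    return "poor"            # MAPE > 50%


# Placeholder numbers for illustration only, not data from the study.
observed = [2.0, 4.0]
fitted = [1.0, 5.0]
score = mape(observed, fitted)
print(score, rate(score))  # prints: 37.5 acceptable
```

As a check against the reported figures, rate(19.68) returns "good" and rate(44.25) returns "acceptable", matching the ratings the abstract gives for the national SARIMA model and for Shaanxi.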
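The evaluation design, a chronological hold-out with 2004 to 2016 for training and the 12 months of 2017 for testing followed by choosing a best model per region, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the series values and the MAPE figures passed to pick_best are placeholders rather than results from the study, and treating the candidate with the lowest MAPE as "optimal" is an assumed reading of the selection rule. It shows that splitting by calendar year leaves 13 × 12 = 156 training months ahead of the 12 test months.

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]  # (year, month), e.g. (2017, 1) for January 2017


def month_range(first_year: int, last_year: int) -> List[Month]:
    """All calendar months from January of first_year to December of last_year."""
    return [(y, m) for y in range(first_year, last_year + 1) for m in range(1, 13)]


def split_train_test(
    series: Dict[Month, float], test_year: int = 2017
) -> Tuple[List[Tuple[Month, float]], List[Tuple[Month, float]]]:
    """Chronological hold-out: months before test_year train, test_year months test."""
    months = sorted(series)  # (year, month) tuples sort chronologically
    train = [(k, series[k]) for k in months if k[0] < test_year]
    test = [(k, series[k]) for k in months if k[0] == test_year]
    return train, test


def pick_best(mape_by_model: Dict[str, float]) -> str:
    """Assumed selection rule: the candidate with the lowest MAPE is 'optimal'."""
    return min(mape_by_model, key=lambda name: mape_by_model[name])


# Placeholder incidence values (all 1.0); the real series comes from the
# Public Health Science Data Center and is not reproduced here.
series = {month: 1.0 for month in month_range(2004, 2017)}
train, test = split_train_test(series)
print(len(train), len(test))  # prints: 156 12

# Hypothetical MAPE values for one region, NOT results from the study.
print(pick_best({"SARIMA": 30.0, "GAM": 25.0, "LSTM": 28.0}))  # prints: GAM
```

Splitting by calendar year rather than at random keeps every test month later than every training month, which is what a genuine forecast of 2017 incidence requires.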