Article

Classification and Identification of Strong Convective Weather Based on the LightGBM Algorithm

  • 刘新伟 ,
  • 黄武斌 ,
  • 蒋盈沙 ,
  • 郭润霞 ,
  • 黄玉霞 ,
  • 宋强 ,
  • 杨勇
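  • <sup>1.</sup>Lanzhou Central Meteorological Observatory, Lanzhou 730020, Gansu; <sup>2.</sup>Key Laboratory of Land Surface Process and Climate Change in Cold and Arid Regions, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, Gansu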

Received: 2020-05-18

  Published online: 2021-08-28

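Funding

Forecaster Special Project of the China Meteorological Administration (CMAYBY2020-134); General Program of the Gansu Meteorological Bureau (Ms2020-06); Innovation Team Project of the Gansu Meteorological Bureau (GSQXCXTD-2020-01); Key Research and Development Program of Gansu Province (20YF3FA012)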

Study of the Classified Identification of the Strong Convective Weathers Based on the LightGBM Algorithm

  • Xinwei LIU ,
  • Wubin HUANG ,
  • Yingsha JIANG ,
  • Runxia GUO ,
  • Yuxia HUANG ,
  • Qiang SONG ,
  • Yong YANG
  • <sup>1.</sup>Lanzhou Central Meteorological Observatory, Lanzhou 730020, Gansu, China; <sup>2.</sup>Key Laboratory of Land Surface Process and Climate Change in Cold and Arid Regions, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, Gansu, China

Received date: 2020-05-18

  Online published: 2021-08-28

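Abstract

Strong convective weather gives rise to many kinds of disastrous weather, but because it develops abruptly and on small spatial scales, it remains difficult to warn of and forecast accurately in operational meteorology. Based on the LightGBM (Light Gradient Boosting Machine) algorithm, this study uses C-band radar echo products and surface observations from three regions of Gansu to build a LightGBM model that classifies and identifies three main types of strong convective weather: hail, thunderstorm gales, and short-time heavy precipitation. On the 2011-2017 training set the LightGBM model performs well, with an overall misclassification rate of only 4.9%. In the independent test on 2018 samples, the overall misclassification rate across the three types of strong convective weather and non-convective weather is 7.0%; for the three types of strong convective weather, the mean Probability of Detection (POD) is 86.4%, the mean Critical Success Index (CSI) is 64.3%, and the mean False Alarm Ratio (FAR) is 29.0%. Short-time heavy precipitation has the lowest misclassification rate, the highest POD and CSI, and the lowest FAR, while the misclassification rates and scores for thunderstorm gales and hail are close to each other. The LightGBM model built in this study therefore classifies strong convective weather well, provides automated warning of the three main types of strong convective weather for the first time, and has broad prospects for application in future automated meteorological operations.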
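The abstract reports POD, CSI and FAR without defining them. As a reading aid, the block below writes out the conventional contingency-table definitions for a single weather class; the symbols H (hits), M (misses) and F (false alarms) are notation assumed here, and the paper's exact per-class counting may differ.

```latex
% Conventional verification scores for one class (assumed notation):
%   H = events of the class that were identified as that class (hits)
%   M = events of the class that were identified as something else (misses)
%   F = other events that were identified as the class (false alarms)
\begin{align}
  \mathrm{POD} &= \frac{H}{H + M}, &
  \mathrm{FAR} &= \frac{F}{H + F}, &
  \mathrm{CSI} &= \frac{H}{H + M + F}.
\end{align}
% For a single class these three scores are linked by
\begin{equation}
  \frac{1}{\mathrm{CSI}} = \frac{1}{\mathrm{POD}} + \frac{1}{1 - \mathrm{FAR}} - 1 .
\end{equation}
```

Under these definitions the identity holds exactly per class, but only approximately for averages over classes. Inserting the reported test-set means (POD 86.4%, FAR 29.0%) gives a CSI of about 63.9%, close to the reported mean of 64.3%.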

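Cite this article

Xinwei LIU, Wubin HUANG, Yingsha JIANG, Runxia GUO, Yuxia HUANG, Qiang SONG, Yong YANG. Study of the Classified Identification of the Strong Convective Weathers Based on the LightGBM Algorithm[J]. Plateau Meteorology, 2021, 40(4): 909-918. DOI: 10.7522/j.issn.1000-0534.2020.00075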

Abstract

Strong convective weather can cause serious disasters, but its abrupt onset and small spatial scale make it difficult to warn of and forecast accurately. This study builds a model based on the LightGBM (Light Gradient Boosting Machine) algorithm using C-band radar products and in-situ observations, and uses it to identify and classify three main kinds of strong convective weather (hail, thunderstorm gales and short-time strong precipitation). Evaluation shows that, for the training set from 2011 to 2017, the LightGBM model performs well, with an overall false identification rate of 4.9%. Among the three kinds of strong convective weather, short-time strong precipitation has the lowest false identification rate (6.2%) and hail the highest (about 14.4%). The false identification rate for non-convective weather is only 3.6%. On the training set, the model has a mean probability of detection (POD) of 88.8%, a mean critical success index (CSI) of 73.9% and a mean false alarm ratio (FAR) of 18.8% across the three kinds of strong convective weather; short-time strong precipitation has the highest POD and CSI and the lowest FAR. The model is then applied to an independent testing set from 2018. There, it has an overall false identification rate of 7.0% over the three kinds of strong convective weather and non-convective weather, with a mean POD of 86.4%, a mean CSI of 64.3% and a mean FAR of 29.0%. As in the training set, short-time strong precipitation has the lowest false identification rate, the highest POD and CSI, and the lowest FAR, while the scores for thunderstorm gales and hail are similar to each other. The LightGBM model built in this study therefore classifies strong convective weather well; it provides automated early warning of the three main kinds of strong convective weather for the first time, and is suitable for future automated identification systems in operational meteorology.
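To show how the scores quoted above relate to a four-class confusion matrix, here is a minimal Python sketch. The class labels, the function names and every count in `CONFUSION` are hypothetical values chosen for illustration, not data from the paper. It also assumes the usual definitions, POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms), CSI = hits / (hits + misses + false alarms), which the abstract does not spell out.

```python
from typing import Dict, List

# Hypothetical class labels (assumption): three convective types plus "none".
CLASSES = ["hail", "thunderstorm_gale", "short_time_precip", "non_convective"]

# Hypothetical confusion matrix, NOT the paper's data.
# Rows = observed class, columns = class assigned by the model.
CONFUSION: List[List[int]] = [
    [40, 5, 2, 3],
    [4, 45, 3, 8],
    [1, 2, 90, 7],
    [5, 8, 10, 767],
]


def class_scores(cm: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs-rest POD, FAR and CSI for class index k.

    Assumes class k has at least one hit, so no denominator is zero.
    """
    hits = cm[k][k]
    misses = sum(cm[k]) - hits                      # observed k, labelled otherwise
    false_alarms = sum(row[k] for row in cm) - hits  # labelled k, observed otherwise
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "CSI": hits / (hits + misses + false_alarms),
    }


def overall_error_rate(cm: List[List[int]]) -> float:
    """Share of all samples that fall off the diagonal (misclassified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return (total - correct) / total


if __name__ == "__main__":
    for k, name in enumerate(CLASSES[:3]):  # score the convective classes only
        s = class_scores(CONFUSION, k)
        print(f"{name}: POD={s['POD']:.3f} FAR={s['FAR']:.3f} CSI={s['CSI']:.3f}")
    print(f"overall error rate: {overall_error_rate(CONFUSION):.3f}")
```

Averaging the per-class values over the three convective classes would give "mean" scores of the kind the abstract reports, while the overall error rate also counts the non-convective row.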
