Application of R software in data analysis of environmental epidemiology: health effects of air pollution

    • Abstract:
      Objective To explore the application of the R tidyverse packages in data processing for environmental epidemiology, to implement individual-level air pollution exposure assessment based on personal address information, and to share experience in using the tidyverse.
      Methods Cardiovascular and cerebrovascular mortality data for Nanjing from 2017 to 2019 were simulated by computer, and meteorological and environmental pollutant monitoring data for the same period were obtained online. The dplyr package was used to filter, join, and summarize the data; the tidyr package was used to reshape and convert them; and the purrr package was used for iteration. Exposure at the nearest monitoring station and inverse distance weighted exposure were calculated from latitude and longitude coordinates.
      Results Meteorological and environmental pollutant monitoring data were obtained in batches by web scraping with the rvest package. The data were cleaned with the tidyr and purrr packages, and spatial data were processed with the geosphere package. Individual exposure was assessed by the nearest-station method and by inverse distance weighted interpolation.
      Conclusion Compared with base R, the tidyverse offers consistent syntax and efficient data processing, and is easy to learn. Using the tidyverse for data cleaning, summary statistics, exposure calculation, and other data processing tasks can effectively improve efficiency in environmental epidemiological studies. This study provides R code based on the tidyverse for data processing tasks such as inverse distance weighting, and implements a method for assessing individual daily air pollutant exposure, offering an effective tool for air pollutant exposure assessment.
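
The two exposure measures named in the abstract can be illustrated with a minimal sketch. This is not the study's own code. The station table, the coordinates, the PM2.5 values, and the column names below are all hypothetical, and the distance power of 2 in the weights is an assumed, commonly used choice that the abstract does not specify. The sketch computes the great-circle distance from one address to each station with `geosphere::distHaversine`, takes the value at the closest station as the nearest-station exposure, and takes the weighted mean with weights 1/d^2 as the inverse distance weighted exposure.

```r
library(dplyr)
library(geosphere)

# Hypothetical monitoring stations (illustrative coordinates and values, not real data)
stations <- tibble(
  station = c("A", "B", "C"),
  lon     = c(118.75, 118.80, 118.90),
  lat     = c(32.05, 32.10, 32.00),
  pm25    = c(40, 50, 60)
)

# Hypothetical residential address of one individual, as c(longitude, latitude)
home <- c(118.78, 32.06)

exposure <- stations %>%
  mutate(
    # Haversine (great-circle) distance in metres from the address to each station
    dist_m = distHaversine(cbind(lon, lat), home),
    # Inverse distance weights; the power of 2 is an assumption
    w = 1 / dist_m^2
  ) %>%
  summarise(
    nearest_station = station[which.min(dist_m)],
    nearest_pm25    = pm25[which.min(dist_m)],
    idw_pm25        = sum(w * pm25) / sum(w)
  )

print(exposure)
```

For a full cohort, the same calculation would be repeated for each individual and each day, for example with `purrr::map()` over addresses after joining daily station records. An address that coincides exactly with a station gives a distance of zero and an infinite weight, so that case would need separate handling.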

       
