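To extract microseismic events from microseismic time-series data more quickly and accurately, and to improve the efficiency of capturing anomalous events, this paper proposes a method that combines an autoencoder based on multi-scale fusion convolution and dilated convolution (multi-scale fusion convolution and dilated convolutions auto encoder, MDCAE) with dynamic time warping that fuses volatility and a window constraint (constraints dynamic time warping for fusing volatility, CDTW-Vol). An MDCAE-based feature extraction method converts waveform signals into low-dimensional feature signals. The concept of microseismic waveform volatility is introduced, and the improved DTW algorithm measures the similarity between feature signals; the resulting similarity matrix is then clustered with k-medoids to obtain the clustering result. Experiments on microseismic monitoring data sets from working faces 501 and 802 of a mining area verify the accuracy and generalization of the proposed method. The proposed clustering method achieves a silhouette coefficient of 89% and a Rand index of 90%, and its clustering accuracy is 57% higher than that of the ordinary k-medoids algorithm, providing a new approach for capturing anomalous events in microseismic systems.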
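The similarity-then-clustering stage described above can be illustrated with a minimal sketch. It assumes the low-dimensional feature signals are already available as plain 1-D sequences (the MDCAE encoder is not reproduced), uses a Sakoe-Chiba band as one common form of window constraint, and leaves out the volatility term because the abstract does not give its definition. The function names `constrained_dtw`, `farthest_first_init`, and `k_medoids` are hypothetical and chosen for illustration only.

```python
import math


def constrained_dtw(a, b, window):
    """DTW distance between 1-D sequences a and b inside a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no warping path exists.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= w are filled; the rest stay at infinity.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def farthest_first_init(dist, k):
    """Pick k initial medoids: the most central item, then the farthest ones."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][c] for c in medoids)))
    return medoids


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = farthest_first_init(dist, k)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Toy feature signals (made-up values): two low-amplitude and two high-amplitude shapes.
features = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0, 0],
    [5, 5, 6, 7, 6, 5],
    [5, 6, 7, 6, 5, 5],
]
dist = [[constrained_dtw(x, y, window=2) for y in features] for x in features]
medoids, labels = k_medoids(dist, k=2)
```

In this toy input the two time-shifted copies of each shape fall into the same cluster, because the band still allows a one-step shift. A volatility-aware variant would change how the local cost `d` is computed; that part is left open here.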
Classification is one of the fundamental problems in data mining, machine learning, and related fields. However, most classification methods focus only on vector-valued samples and pay little attention to set-valued data samples, which are widespread in practice. This paper proposes an unsupervised clustering algorithm based on the Wasserstein distance (Wk-means): an entropy-regularized optimal transport model measures the distance between set-valued data points, and this distance is combined with the idea of clustering to obtain a Wk-means method applicable to set-valued data. To verify its effectiveness, experiments are first conducted on several public data sets. The results confirm that Wk-means performs well on set-valued data with many samples, many categories, and many features, and statistical tests show that it differs significantly from the other algorithms. Wk-means is then applied to the Fuyang River water quality data set. The results likewise show that, compared with traditional data clustering algorithms, Wk-means partitions water quality categories more accurately and runs more efficiently. The proposed Wk-means algorithm performs well on the classification of set-valued water quality data and can provide valuable decision support for environmental monitoring and management.
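The distance used between set-valued samples can be illustrated with a minimal sketch of entropy-regularized optimal transport solved by Sinkhorn iterations. It assumes each sample is a finite set of points with uniform weights and a squared Euclidean ground cost. The names `sinkhorn_cost` and `sq_euclidean`, the regularization value, and the toy point sets are illustrative assumptions, not the paper's configuration. The cluster-center update of Wk-means is not described in the abstract and is not reproduced here.

```python
import math


def sq_euclidean(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def sinkhorn_cost(X, Y, reg=1.0, n_iter=200):
    """Transport cost <P, C> of the entropy-regularized plan between point sets X and Y.

    Both sets carry uniform weights. A small `reg` relative to the costs can
    underflow exp(-C / reg); a log-domain variant would be needed in that case.
    """
    n, m = len(X), len(Y)
    a = [1.0 / n] * n  # uniform weights on X
    b = [1.0 / m] * m  # uniform weights on Y
    C = [[sq_euclidean(x, y) for y in Y] for x in X]
    K = [[math.exp(-c / reg) for c in row] for row in C]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is P[i][j] = u[i] * K[i][j] * v[j]; return its cost under C.
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))


# Three toy set-valued samples (made-up values): A and B are close, far is distant.
A = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
B = [(0.1, 0.0), (0.0, 0.9), (1.0, 0.1)]
far = [(3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]

d_ab = sinkhorn_cost(A, B)
d_af = sinkhorn_cost(A, far)
```

A pairwise matrix of such costs could then feed a clustering step, so that nearby sets such as `A` and `B` end up together while `far` is separated.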