ZHONG Yuan, Andy Hsitien Shen, ZHANG Zhiqing, YE Min, HAN Yu. Application of Machine Learning Algorithms in the Geographical Origin Determination of Peridot[J]. Journal of Gems & Gemmology, 2023, 25(6): 65-75. DOI: 10.15964/j.cnki.027jgg.2023.06.006

Application of Machine Learning Algorithms in the Geographical Origin Determination of Peridot

Abstract: Element plotting, the method commonly used to trace the geographic origin of gemstones, has several limitations: the choice of elements is subjective, the method depends on the original sample set, and the fields of multiple origins overlap in two-dimensional plots. Machine learning (ML) algorithms are already widely used in classification tasks such as medical diagnosis and crop traceability. Among them, linear discriminant analysis (LDA) has been studied fairly often for gemstone origin determination, whereas other algorithms have received comparatively little attention. This study takes peridot samples from three localities as an example: Damaping (大麻坪), Hebei; Yiqisong (意气松), Jilin; and Changyon County (长渊郡), Democratic People's Republic of Korea. Based on laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) data, with data processing and modelling done in Python, it analyses how the choice of elements affects LDA performance. The results show that selecting elements that are weakly correlated with one another and whose distributions differ strongly between localities improves model accuracy. An LDA model built on 10 variables (Mn, Zn, Na, Al, Sc, V, Cr, P, Ti, REE) reached a cross-validation accuracy of 0.889, better than a model built on all elements above the detection limit. Using the same 10 variables, the study then compared five algorithms: LDA, support vector machine (SVM), decision tree, random forest, and back-propagation neural network. The non-linear algorithms generally achieved higher accuracy, and among them SVM showed good overall performance.
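The selection criterion in the abstract (elements that are weakly correlated with each other but differ strongly between localities) can be illustrated with a minimal sketch. The abstract does not state the exact procedure the authors used, so everything here is an assumption: the greedy filter, the one-way ANOVA F ratio as the measure of between-locality difference, the thresholds `min_f` and `max_abs_corr`, the placeholder variable names `E1` to `E4`, and the synthetic data, which are not measurements from the study.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def f_ratio(values, labels):
    """One-way ANOVA F ratio: between-group variance over within-group variance."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_elements(data, labels, min_f=10.0, max_abs_corr=0.8):
    """Greedy filter (illustrative thresholds, not taken from the paper).

    Rank variables by F ratio, drop those below min_f, then keep a variable
    only if it is weakly correlated with every variable already kept.
    """
    scores = {name: f_ratio(vals, labels) for name, vals in data.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = []
    for name in ranked:
        if scores[name] < min_f:
            continue
        if all(abs(pearson(data[name], data[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Synthetic demonstration data: three hypothetical localities, 30 samples each.
rng = random.Random(0)
labels = [loc for loc in ("A", "B", "C") for _ in range(30)]

e1 = [rng.gauss(mu, 1.0) for mu in (10, 20, 30) for _ in range(30)]  # separates localities
e2 = [2 * v + rng.gauss(0, 0.1) for v in e1]                         # near-duplicate of E1
e3 = [rng.gauss(5, 1.0) for _ in labels]                             # no locality signal
e4 = [rng.gauss(mu, 1.0) for mu in (30, 10, 20) for _ in range(30)]  # separates localities differently

data = {"E1": e1, "E2": e2, "E3": e3, "E4": e4}
selected = select_elements(data, labels)
print(selected)
```

In this toy setup the filter keeps only one of the two redundant variables `E1` and `E2`, keeps `E4`, and drops `E3`, which carries no locality signal. The retained variables would then be the inputs to a classifier such as LDA or SVM.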

     
