Application of Machine Learning Algorithms in the Geographical Origin Determination of Peridot

ZHONG Yuan, Andy Hsitien Shen, ZHANG Zhiqing, YE Min, HAN Yu

Citation: ZHONG Yuan, Andy Hsitien Shen, ZHANG Zhiqing, YE Min, HAN Yu. Application of Machine Learning Algorithms in the Geographical Origin Determination of Peridot[J]. Journal of Gems & Gemmology, 2023, 25(6): 65-75. DOI: 10.15964/j.cnki.027jgg.2023.06.006

Details
    About the author:

    ZHONG Yuan (1996-), male, master's degree, mainly engaged in research on methods for determining the geographical origin of gemstones. E-mail: zhoungy1024@qq.com

    Corresponding author:

    Andy Hsitien Shen (1962-), male, professor, mainly engaged in research on gem deposits and the physical and chemical properties of gemstones. E-mail: ahshen@foxmail.com

  • CLC number: P595; TP181; TS93

  • Abstract:

    Element plotting, the method most commonly used to trace the geographical origin of gemstones, has clear limitations: the choice of elements is subjective, the method depends on original reference samples, and the fields of multiple origins overlap in two-dimensional plots. Machine learning (ML) algorithms are widely used in classification tasks such as medical diagnosis and crop traceability. Among them, linear discriminant analysis (LDA) has been studied fairly extensively for gemstone origin determination, whereas other algorithms have received comparatively little attention. Taking peridot samples from three origins (Damaping, Hebei; Yiqisong, Jilin; Changyon District, Democratic People's Republic of Korea) as an example, this study processed and modeled laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) data in Python and analyzed how element selection affects linear discrimination. The results show that selecting elements that are weakly correlated with one another and whose distributions differ markedly between origins improves model accuracy. An LDA model built on 10 components (Mn, Zn, Na, Al, Sc, V, Cr, P, Ti and REE) achieved a cross-validation accuracy of 0.889, better than a model built on all elements above the detection limit. On the basis of these 10 components, the discrimination performance of different ML algorithms (LDA, support vector machine (SVM), decision tree, random forest and back-propagation neural network) was compared. The non-linear algorithms generally achieved higher accuracy, among which SVM performed comparatively well overall.

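  • Figure 9. Accuracy of six machine learning models on the peridot training set and testing set

    Note: BPNN results fluctuate from one training run to the next, so training was repeated 50 times and the mean training-set and testing-set accuracies are reported; 1σ = 0.01 for the training-set accuracy and 1σ = 0.015 for the testing-set accuracy.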

    Figure 1. Locations of the origins of the peridot samples: Damaping, Hebei Province and Yiqisong, Jilin Province, China, and Changyon District, DPRK

    Note: Produced from the National Platform for Common Geospatial Information Services (Tianditu), map review number GS(2023)336; the base map is unmodified.

    Figure 2. Some of the double-sided polished peridot samples from Damaping, Hebei Province

    Figure 3. Faceted peridot samples from the three origins (left three: Damaping, Hebei Province; middle three: Yiqisong, Jilin Province; right three: Changyon District, DPRK)

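    Figure 4. Gaussian kernel density estimates of 16 components for the peridot samples from the three origins

    Note: Plotted with kdeplot from the Seaborn library, using the kernel density estimation method of Scott et al. [16]; for the few samples with P and Ca contents below the detection limit, the raw value was replaced by 1/10 of the detection limit.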
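The density curves described for Figure 4 can be approximated without Seaborn. The sketch below is a minimal pure-Python Gaussian kernel density estimate that uses Scott's rule of thumb for the bandwidth (h = s·n^(-1/5) for one-dimensional data, the rule kdeplot applies by default through bw_method="scott"). The function names and the sample values are illustrative assumptions, not the authors' code or data.

```python
import math
import statistics


def scott_bandwidth(values):
    """Scott's rule for 1-D data: h = s * n**(-1/5), s = sample standard deviation."""
    n = len(values)
    return statistics.stdev(values) * n ** (-1 / 5)


def kde_at(values, x):
    """Evaluate the Gaussian kernel density estimate of `values` at point x."""
    h = scott_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


# Hypothetical Zn contents (10^-6) for one origin; NOT data from the paper.
zn = [42.4, 47.9, 51.3, 54.5, 55.0, 58.2, 61.7, 66.0]

step = 0.1
grid = [20 + step * i for i in range(1001)]      # evaluation points from 20 to 120
density = [kde_at(zn, x) for x in grid]

# A density should integrate to about 1; a Riemann sum is a quick sanity check.
area = sum(density) * step
print(f"bandwidth = {scott_bandwidth(zn):.2f}, area under curve = {area:.3f}")
```

Plotting `density` against `grid` for each origin on the same axes gives the kind of overlay used to judge how well a single component separates the origins.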

    Figure 5. Accuracy of LDA models built from combinations of different numbers of components

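    Figure 6. Average improvement of each component in combinations of different sizes

    Note: The horizontal axis n is the number of components in a combination. The vertical axis Δx_n is the effect on model accuracy of adding component x to n-1 components to form an n-component combination: a positive value means that the accuracy averaged over all such combinations increases, and a negative value means that it decreases. Δx_n is called the average improvement of component x in n-component combinations.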
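One way to read the definition of the average improvement Δx_n is as the mean change in accuracy over every (n-1)-component subset that does not yet contain x. The sketch below follows that reading, which is an assumption; the paper's exact averaging scheme is not given here. `average_improvement`, `toy_accuracy` and the `gain` values are hypothetical names and numbers for illustration. In practice the `accuracy` callable would fit an LDA model on the given components and return its cross-validation accuracy.

```python
from itertools import combinations
from statistics import mean


def average_improvement(x, n, components, accuracy):
    """Mean change in accuracy when x is added to every (n-1)-subset lacking x."""
    others = [c for c in components if c != x]
    deltas = []
    for subset in combinations(others, n - 1):
        base = accuracy(frozenset(subset))             # n-1 components
        extended = accuracy(frozenset(subset) | {x})   # same subset plus x
        deltas.append(extended - base)
    return mean(deltas)


# Toy accuracy function (assumption): every component contributes a fixed
# amount, so the expected average improvement is known in advance.
gain = {"Mn": 0.10, "Zn": 0.05, "Na": -0.02, "Al": 0.00}


def toy_accuracy(subset):
    return 0.5 + sum(gain[c] for c in subset)


delta_mn_3 = average_improvement("Mn", 3, list(gain), toy_accuracy)
delta_na_2 = average_improvement("Na", 2, list(gain), toy_accuracy)
print(f"Mn, n=3: {delta_mn_3:+.3f}; Na, n=2: {delta_na_2:+.3f}")
```

With a real classifier the improvement depends on which components are already present, which is why the values are averaged over all subsets rather than read from a single combination.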

    Figure 7. Sum of the average improvements of each component

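    Figure 8. Ideal boundary and LDA boundary for binary classification problems in two-dimensional space

    Note: A1: roughly left-right symmetric data, randomly generated with make_classification from the datasets module of scikit-learn; B1: two interleaved crescent-shaped groups, randomly generated with make_moons; C1: two groups distributed as concentric circles, randomly generated with make_circles; A2/B2/C2: ideal classification boundaries; A3/B3/C3: classification boundaries obtained by LDA for the three datasets. The accuracy ACC is the fraction of all samples assigned to the correct class; correctly classified points are marked "○" and misclassified points are marked "×".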
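The contrast shown in panels C1 to C3 can be reproduced in a few lines. The sketch below generates concentric-circle data with make_circles and compares the training accuracy of LDA with that of an RBF-kernel support vector machine. The sample size, noise level, ring ratio and random seed are assumptions chosen for illustration, and the SVC comparison is added here rather than taken from the figure.

```python
from sklearn.datasets import make_circles
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Two concentric rings, as in panel C1 (n_samples, noise, factor and seed assumed).
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
svc = SVC(kernel="rbf").fit(X, y)

# ACC = fraction of samples assigned to their true class (training data here).
acc_lda = lda.score(X, y)
acc_svc = svc.score(X, y)
print(f"LDA ACC = {acc_lda:.2f}, SVC-RBF ACC = {acc_svc:.2f}")
```

Because LDA can only draw a straight line through the plane, it cannot separate an inner ring from an outer one, whereas a kernel method can learn a closed boundary.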

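    Table 1. Conventional gemmological characteristics of the peridot samples from the three origins

    | Property | Damaping, Hebei | Yiqisong, Jilin [10] | DPRK [10] |
    |---|---|---|---|
    | Number of samples | 62 | 100 | 100 |
    | Color | Yellowish green | Yellowish green | Brownish green |
    | Transparency | Transparent | Transparent | Transparent |
    | Pleochroism | Weak, light yellowish green to yellowish green | Weak, light yellowish green to yellowish green | Weak, light brownish green to brownish green |
    | Refractive index | 1.650~1.690 | 1.654~1.695 | 1.654~1.694 |
    | Birefringence | 0.034~0.039 | 0.035~0.038 | 0.036~0.038 |
    | Relative density | 3.26~3.38 | 3.33~3.36 | 3.33~3.38 |
    | Inclusions | "Lily pad" inclusions; partially healed fissures; chromite and diopside inclusions; brownish-black staining | "Lily pad" inclusions; partially healed fissures; chromite, diopside, enstatite and lizardite (seen in only one sample) inclusions; brown staining | "Lily pad" inclusions; partially healed fissures and smoke-like, veil-like inclusions; chromite and diopside inclusions; brown staining |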

    Table 2. Interfaces and parameter settings used for the six machine learning algorithms

    | Machine learning algorithm | scikit-learn interface called | Parameters set at instantiation |
    |---|---|---|
    | LDA: linear discriminant analysis | LinearDiscriminantAnalysis | default |
    | SVC-RBF: support vector machine with a Gaussian (RBF) kernel | SVC | kernel='rbf' |
    | SVC-Laplc: support vector machine with a Laplacian kernel | SVC, laplacian_kernel | kernel='precomputed' |
    | DTC: decision tree | DecisionTreeClassifier | max_depth=3, min_samples_leaf=1, min_samples_split=13 |
    | RFC: random forest classifier | RandomForestClassifier | n_estimators=66, max_depth=8, max_features=4, min_samples_split=4 |
    | BPNN: back-propagation neural network | MLPClassifier | hidden_layer_sizes=(20, 20), activation='relu', alpha=0.5, max_iter=1000 |
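The settings in Table 2 map directly onto scikit-learn constructors. The sketch below instantiates the six models with those parameters and scores them on a synthetic three-class data set with 10 features and 262 samples (62 + 100 + 100, matching the sample counts in Table 1). The synthetic data, the 70/30 split and every `random_state` value are assumptions added for reproducibility; they are not stated in the table, and the scores say nothing about the peridot data.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 10-component, 3-origin data set (NOT the paper's data).
X, y = make_classification(n_samples=262, n_features=10, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "SVC-RBF": SVC(kernel="rbf"),
    "SVC-Laplc": SVC(kernel="precomputed"),
    "DTC": DecisionTreeClassifier(max_depth=3, min_samples_leaf=1,
                                  min_samples_split=13, random_state=0),
    "RFC": RandomForestClassifier(n_estimators=66, max_depth=8, max_features=4,
                                  min_samples_split=4, random_state=0),
    "BPNN": MLPClassifier(hidden_layer_sizes=(20, 20), activation="relu",
                          alpha=0.5, max_iter=1000, random_state=0),
}

scores = {}
for name, model in models.items():
    if name == "SVC-Laplc":
        # A precomputed kernel needs Gram matrices: train-vs-train for fitting,
        # test-vs-train for scoring.
        model.fit(laplacian_kernel(X_train, X_train), y_train)
        scores[name] = model.score(laplacian_kernel(X_test, X_train), y_test)
    else:
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

The Laplacian kernel is handled separately because SVC has no built-in option for it, so the kernel matrix is computed with laplacian_kernel and passed in through kernel='precomputed'.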

    Table 3. Major components of the peridot samples from the three origins measured by LA-ICP-MS (wB/%)

    | Component | Damaping, Hebei | Yiqisong, Jilin [10] | DPRK [10] |
    |---|---|---|---|
    | MgO | 47.83~51.27 (49.64) | 48.86~51.02 (49.97) | 46.06~50.79 (49.33) |
    | FeO | 7.83~10.64 (8.72) | 7.97~9.92 (8.60) | 8.05~12.54 (9.33) |
    | SiO2 | 39.24~42.57 (40.73) | 39.59~41.46 (40.56) | 39.20~41.38 (40.40) |
    | Fo | 88.90~91.90 (91.00) | 89.80~91.80 (91.20) | 86.80~91.80 (90.40) |
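The Fo row of Table 3 can be cross-checked against the MgO and FeO rows. The sketch below assumes the standard definition of the forsterite number, Fo = 100·Mg/(Mg + Fe) in molar proportions, which the table itself does not spell out. The function name is hypothetical, and applying the formula to mean oxide contents only approximates the mean of the per-sample Fo values.

```python
# Molar masses in g/mol, from standard atomic weights.
M_MGO = 24.305 + 15.999   # MgO
M_FEO = 55.845 + 15.999   # FeO


def forsterite_number(mgo_wt, feo_wt):
    """Fo = 100 * Mg / (Mg + Fe), using molar proportions from oxide wt%."""
    mg = mgo_wt / M_MGO
    fe = feo_wt / M_FEO
    return 100.0 * mg / (mg + fe)


# Mean MgO and FeO contents (wt%) taken from Table 3.
means = {
    "Damaping, Hebei": (49.64, 8.72),
    "Yiqisong, Jilin": (49.97, 8.60),
    "DPRK": (49.33, 9.33),
}
fo = {origin: forsterite_number(mgo, feo) for origin, (mgo, feo) in means.items()}
for origin, value in fo.items():
    print(f"{origin}: Fo = {value:.1f}")
```

Rounded to one decimal place, the three results agree with the tabulated mean Fo values of 91.00, 91.20 and 90.40.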

    Table 4. Trace elements of the peridot samples from the three origins measured by LA-ICP-MS (/10⁻⁶)

    | Element | Damaping, Hebei | Yiqisong, Jilin [10] | DPRK [10] |
    |---|---|---|---|
    | Li | 1.08~3.41 (1.62) | 0.93~2.20 (1.41) | 1.03~3.87 (1.82) |
    | Be* | bdl~0.75 (0.16) | bdl~0.57 (0.05) | bdl~0.27 (0.03) |
    | Na | 12.50~99.60 (36.50) | 4.64~77.80 (30.40) | 16.10~125.90 (71.60) |
    | Al | 20.50~198.01 (85.20) | 35.02~129.10 (54.20) | 55.20~261.03 (118.02) |
    | P* | 67.20~221.30 (130.31) | bdl~218.05 (94.40) | bdl~273.09 (105.10) |
    | K* | bdl~7.48 (1.33) | bdl~50.30 (4.41) | bdl~32.90 (5.02) |
    | Ca* | bdl~852.10 (413.30) | bdl~774.32 (340.20) | bdl~1097.20 (489.11) |
    | Sc | 2.04~4.94 (3.06) | 2.25~5.84 (4.00) | 2.99~6.73 (4.33) |
    | Ti* | 1.66~41.30 (13.30) | bdl~27.70 (7.28) | 2.31~42.10 (15.90) |
    | V | 1.07~5.33 (2.93) | 1.43~4.15 (2.50) | 1.34~5.91 (3.31) |
    | Cr | 34.90~259.04 (111.20) | 36.80~200.00 (92.40) | 28.70~207.20 (126.15) |
    | Mn | 932.30~1149.20 (1023.10) | 940.30~1109.20 (1004.10) | 950.10~1500.20 (1101.30) |
    | Co | 132.20~146.10 (138.30) | 127.20~144.10 (136.20) | 127.10~148.10 (137.40) |
    | Ni | 2494.30~3319.20 (3004.90) | 2706.20~3313.30 (2979.10) | 2254.20~3129.39 (2855.10) |
    | Cu* | 0.38~2.86 (1.32) | bdl~6.68 (1.51) | bdl~23.80 (2.37) |
    | Zn | 37.00~64.60 (48.60) | 36.30~54.70 (43.80) | 42.40~111.10 (54.50) |
    | Ga* | bdl~0.18 (0.05) | bdl~0.31 (0.06) | bdl~0.45 (0.11) |
    | Rb* | bdl~0.12 (0.02) | bdl~0.38 (0.06) | bdl~0.27 (0.05) |
    | Sr* | bdl~0.04 (0.01) | bdl~0.13 (0.02) | bdl~0.59 (0.03) |
    | Y* | bdl~0.07 (0.02) | bdl~0.08 (0.02) | bdl~0.11 (0.04) |
    | Zr* | bdl~0.22 (0.04) | bdl~0.16 (0.03) | bdl~1.27 (0.08) |
    | Nb* | bdl~0.05 (0.01) | bdl~0.05 (0.01) | bdl~0.56 (0.01) |
    | Ag* | bdl~0.06 (0.01) | bdl~0.15 (0.02) | bdl~0.07 (0.02) |
    | Cd* | bdl~0.26 (0.03) | bdl~0.55 (0.09) | bdl~0.34 (0.04) |
    | Sn* | bdl~2.52 (1.56) | bdl~6.28 (2.07) | bdl~8.34 (2.71) |
    | Sb* | bdl~8.05 (0.38) | bdl~1.31 (0.17) | bdl~0.26 (0.04) |
    | Cs* | bdl~0.05 (0.01) | bdl~0.15 (0.02) | bdl~0.08 (0.02) |
    | REE* | 0.01~0.25 (0.04) | 0.01~0.85 (0.16) | 0.01~0.62 (0.12) |

    Note: Other elements were mostly below the detection limit and are not listed. Values in parentheses in Tables 3 and 4 are the means of all samples; for samples below the detection limit, the raw values reported by the software were used. REE denotes the total content of the rare earth elements La~Lu, with values below the detection limit set to 1/10 of the detection limit. bdl: below the detection limit. * indicates that some samples were below the detection limit for that element.
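Replacing values below the detection limit with 1/10 of that limit, as described for REE, is a simple element-wise substitution. The sketch below shows one way to do it and then reports the range and mean in the "min~max (mean)" form used in the table. The function names, the detection limit and the sample values are hypothetical and are not taken from the paper.

```python
def substitute_below_detection(values, detection_limit, fraction=0.1):
    """Replace values below the detection limit with fraction * detection_limit."""
    return [v if v >= detection_limit else fraction * detection_limit
            for v in values]


def summarize(values):
    """Return (minimum, maximum, mean), the three numbers reported per table cell."""
    return min(values), max(values), sum(values) / len(values)


# Hypothetical REE totals (10^-6) and detection limit; NOT data from the paper.
ree_raw = [0.03, 0.25, 0.12, 0.05, 0.40]
ree = substitute_below_detection(ree_raw, detection_limit=0.1)

low, high, avg = summarize(ree)
print(f"{low:.2f}~{high:.2f} ({avg:.2f})")
```

Keeping the substitution in its own function makes it easy to apply a different rule to other components, since the table note describes the raw software values being used elsewhere.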
  • [1] Schwieger R. Diagnostic features and heat treatment of Kashmir sapphires[J]. Gems & Gemology, 1990, 26(4): 267-280.

    [2] Zhang Y Y, Chen M H, Ye S, et al. Research of geographical origin of sapphire based on three-dimensional fluorescence spectroscopy: A case study in Sri Lanka and Laos sapphires[J]. Spectroscopy and Spectral Analysis, 2022, 42(5): 1508-1513. (in Chinese)

    [3] Abduriyim A. Geographic origin determination of colored gemstones[J]. Gems & Gemology, 2011, 47(2): 114-116.

    [4] Abduriyim A, Kitawaki H. Applications of laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) to gemology[J]. Gems & Gemology, 2006, 42(2): 98-118.

    [5] Xiang F, Wang C S, Jiang Z D, et al. Rare-earth element characters of jadewares of Jinsha site in Chengdu and its significance for indicating material source[J]. Journal of Earth Sciences and Environment, 2008, 30(1): 54-56. (in Chinese)

    [6] Aggarwal R, Sounderajah V, Martin G, et al. Diagnostic accuracy of deep learning in medical imaging: A systematic review and meta-analysis[J]. NPJ Digital Medicine, 2021, 4(1): 1-23. doi: 10.1038/s41746-020-00373-5

    [7] Kabir M H, Guindo M L, Chen R, et al. Geographic origin discrimination of millet using Vis-NIR spectroscopy combined with machine learning techniques[J]. Foods, 2021, 10(11): 2767-2778. doi: 10.3390/foods10112767

    [8] Shen A H, Blodgett T E, Shigley J. Country-of-origin determination of modern gem peridots from LA-ICP-MS trace-element chemistry and linear discriminant analysis (LDA)[C]//Geological Society of America Abstracts. Denver: Geological Society of America, 2013: 525.

    [9] Giuliani G, Caumon G, Rakotosamizanany S, et al. Classification chimique des corindons par analyse factorielle discriminante: Application à la typologie des gisements de rubis et saphirs[J]. Revue de Gemmologie, 2014(188): 14-22.

    [10] Zhang Z, Ye M, Shen A H. Characterisation of peridot from China's Jilin Province and from North Korea[J]. The Journal of Gemmology, 2019, 36(5): 436-446. doi: 10.15506/JoG.2019.36.5.436

    [11] Kochelek K A, McMillan N J, McManus C E, et al. Provenance determination of sapphires and rubies using laser-induced breakdown spectroscopy and multivariate analysis[J]. American Mineralogist, 2015, 100(8): 1921-1931.

    [12] Burges C J C. A tutorial on support vector machines for pattern recognition[J]. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167.

    [13] Maimon O Z, Rokach L. Data mining with decision trees: Theory and applications[M]. Singapore: World Scientific, 2014.

    [14] Zhou Z H. Machine learning[M]. Beijing: Tsinghua University Press, 2016. (in Chinese)

    [15] Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61(1): 85-117.

    [16] Scott D W, Tapia R A, Thompson J R. Kernel density estimation revisited[J]. Nonlinear Analysis: Theory, Methods & Applications, 1977, 1(4): 339-372.

    [17] De Hoog J C M, Gall L, Cornell D H. Trace-element geochemistry of mantle olivine and application to mantle petrogenesis and geothermobarometry[J]. Chemical Geology, 2010, 270(1): 196-215.

Publication history
  • Received: 2023-10-29
  • Published: 2023-11-29
