Origin Determination of Precious Gemstones on the Application of Big Data

Abstract: In the gem trade, geographic provenance can add significant value to precious gemstones. Tracing the origin of precious gemstones such as ruby, sapphire and emerald is not only a core competency of international gemmological laboratories but also of considerable research significance, shedding light on differences between localities, deposit characteristics, mineralization types, and ore-forming and prospecting patterns. Since the Gübelin Gem Lab in Switzerland pioneered origin determination of coloured gemstones in the 1950s, research on the origin of precious gemstones has steadily deepened. Its basis is that emeralds from deposits formed in different ore-forming settings differ significantly in their conventional gemmological properties, inclusions, spectroscopic features and chemical composition [1-9]. Consequently, origin determination currently rests on experienced gemmologists' integrated assessment of these four aspects. However, with the explosive growth of origin research and the comprehensive, systematic reporting of data on gemstones from different localities, traditional methods have gradually lost their discriminating power: similar characteristic inclusions, comparable spectral patterns and overlapping ranges of chemical composition are now commonplace, calling the reliability of traditional origin determination into question. Against this background, the chemical fingerprint of a gemstone is becoming increasingly important for origin determination. As microbeam analytical techniques for minerals have matured, the precision of trace-element measurements in gemstones has steadily improved, and such high-precision compositional data are strongly indicative of origin.
Traditional treatment of chemical composition relies mainly on low-dimensional binary and ternary discrimination diagrams, but the extensive overlap of data points makes these diagrams inadequate for today's ever-growing compositional datasets. Machine learning methods, which can efficiently mine high-dimensional compositional data, therefore offer a way to overcome the current difficulties in gemstone origin determination. In this study, 303 emerald samples were collected from 12 localities (Fig. 1): the western emerald belt (WEB) in Colombia, Kafubu in Zambia, Itabira, Carnaíba and Socotó in Brazil, Panjshir in Afghanistan, Swat in Pakistan, Shakiso in Ethiopia, Malysheva in Russia, Gwantu in Nigeria, Mananjary in Madagascar, and Dayakou in Yunnan, China. First, the conventional gemmological properties, inclusions, ultraviolet-visible-near-infrared (UV-Vis-NIR), infrared (IR) and Raman spectra, and major- and trace-element contents of the emeralds from each locality were systematically measured using conventional origin-determination methods, and chemical composition databases for the three precious gemstones were compiled. Multiple machine learning methods were then applied to these compositional data, yielding three efficient and accurate origin-determination models. The systematic study shows that emeralds worldwide display three distinct UV-Vis-NIR absorption patterns (Fig. 2). A study of deuterium-bearing (heavy) water in the structural channels of emerald further shows that IR absorption in the 2 600–2 850 cm⁻¹ range also divides emeralds worldwide into three groups (Fig. 3), a novel finding with strong implications for emerald provenance. An emerald chemical composition database of 2 753 analyses from 45 mining areas in 22 countries was compiled, including 425 analyses obtained in this study.
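The abstract does not state how the absorption in the 2 600–2 850 cm⁻¹ window is quantified or what separates the three groups. As a hedged illustration only, the sketch below extracts two generic descriptors from an IR spectrum restricted to that window: the wavenumber of maximum absorbance, and the trapezoidal integrated absorbance, optionally after subtracting a straight baseline drawn between the first and last in-window points. The function names and the baseline choice are hypothetical, and no grouping thresholds are implied.

```python
from typing import List, Sequence, Tuple


def _window(wavenumbers: Sequence[float], absorbance: Sequence[float],
            lo: float, hi: float) -> List[Tuple[float, float]]:
    """(wavenumber, absorbance) pairs inside [lo, hi], sorted by wavenumber.

    Sorting lets the functions accept spectra exported in either order.
    """
    return sorted((w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi)


def peak_position(wavenumbers: Sequence[float], absorbance: Sequence[float],
                  lo: float = 2600.0, hi: float = 2850.0) -> float:
    """Wavenumber (cm^-1) of the strongest absorbance inside the window."""
    pts = _window(wavenumbers, absorbance, lo, hi)
    if not pts:
        raise ValueError("no data points inside the window")
    return max(pts, key=lambda p: p[1])[0]


def integrated_absorbance(wavenumbers: Sequence[float], absorbance: Sequence[float],
                          lo: float = 2600.0, hi: float = 2850.0,
                          baseline: bool = False) -> float:
    """Trapezoidal area under the absorbance curve inside the window.

    If baseline is True, a straight line joining the first and last
    in-window points is subtracted first (a hypothetical, simple choice).
    """
    pts = _window(wavenumbers, absorbance, lo, hi)
    if len(pts) < 2:
        raise ValueError("need at least two data points inside the window")
    if baseline:
        (w_start, a_start), (w_end, a_end) = pts[0], pts[-1]
        slope = (a_end - a_start) / (w_end - w_start)
        pts = [(w, a - (a_start + slope * (w - w_start))) for w, a in pts]
    # Sum of trapezoids between neighbouring points.
    return sum(0.5 * (a0 + a1) * (w1 - w0)
               for (w0, a0), (w1, a1) in zip(pts, pts[1:]))
```

Points outside the window are ignored, so a full-range spectrum can be passed in directly; descriptors like these could then be compared across localities or fed to a classifier alongside chemical data.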
Based on this database, three mature and efficient machine learning methods (random forest, support vector machine and extreme gradient boosting) were used to mine the emerald compositional data and build origin-determination models. Of these, the random forest (RF) and extreme gradient boosting (XGBoost) models adapted best to the emerald data and performed best. Among the 22 RF models, the highest accuracy and F1 score reach 99.5% (Fig. 4). Model RF-EM1-1320 distinguishes 11 origins with 100% accuracy (Fig. 5), and model RF-EM12-135 achieves 99.1% accuracy using only five elements. Decoding of the high-dimensional element information by the models further shows that V/Cr and the alkali metals (Li, Na, Rb, Cs) are strongly linked to origin: their feature weights rank among the highest in every model, reflecting the composition and source of the parental fluids during mineralization. These findings show that big data and machine learning offer high accuracy, high efficiency and broad applicability for gemstone origin determination, with great potential. Applying them to gemstone provenance research is a pioneering extension of origin-determination technology and opens a new perspective for gemmological research.
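The abstract does not include the models' code or hyperparameters. To make the random-forest idea and the reported metrics concrete, here is a minimal pure-Python sketch, not the authors' implementation: each tree is grown on a bootstrap sample, tries a random subset of about √(number of features) elements at every split, picks the Gini-optimal threshold, and the forest predicts by majority vote. Feature importance is taken as the normalised Gini decrease per feature, one common way to obtain feature weights of the kind described above. The class and function names, the defaults (25 trees, depth 6) and the use of macro-averaged F1 are assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple, Union

Row = Sequence[float]
# A tree node is either a leaf label or (feature index, threshold, left, right).
Node = Union[str, Tuple[int, float, "Node", "Node"]]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X: List[Row], y: List[str], depth: int, max_depth: int,
               n_sub: int, rng: random.Random, importance: List[float]) -> Node:
    """Grow one CART-style tree, trying only n_sub random features per split."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    parent = gini(y)
    best = None  # (size-weighted Gini decrease, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_sub):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = len(y) * parent - len(left) * gini(left) - len(right) * gini(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None or best[0] <= 1e-12:  # no useful split: make a leaf
        return majority(y)
    gain, f, t = best
    importance[f] += gain  # impurity-decrease importance

    def grow(idx: List[int]) -> Node:
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_sub, rng, importance)

    return (f, t,
            grow([i for i, row in enumerate(X) if row[f] <= t]),
            grow([i for i, row in enumerate(X) if row[f] > t]))


def predict_tree(node: Node, row: Row) -> str:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bootstrap-aggregated trees with random feature subsets and majority vote."""

    def __init__(self, n_trees: int = 25, max_depth: int = 6, seed: int = 0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees: List[Node] = []
        self.importance: List[float] = []

    def fit(self, X: List[Row], y: List[str]) -> "RandomForest":
        n, n_feat = len(X), len(X[0])
        n_sub = max(1, int(math.sqrt(n_feat)))  # sqrt(#features) per split
        self.importance, self.trees = [0.0] * n_feat, []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0,
                                         self.max_depth, n_sub, self.rng, self.importance))
        total = sum(self.importance) or 1.0
        self.importance = [v / total for v in self.importance]  # sums to 1
        return self

    def predict(self, X: List[Row]) -> List[str]:
        return [majority([predict_tree(t, row) for t in self.trees]) for row in X]


def accuracy_and_macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Overall accuracy and the unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        f1s.append(2 * tp / (2 * tp + fp + fn))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return acc, sum(f1s) / len(f1s)
```

In this sketch a row would hold the element concentrations of one analysis (a derived ratio such as V/Cr can be appended as an extra column) and the label would be its locality. For real datasets, tested libraries such as scikit-learn's `RandomForestClassifier` and `SVC`, or the xgboost package's `XGBClassifier`, are the usual choice. Ranking elements by importance is one way to explore small element subsets; the abstract does not say how the five elements of RF-EM12-135 were chosen.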
