Citation: WANG Wenchuan, ZHU Quanyin, SUN Jizhou, MA Jialin. Multi-label and multi-level Chinese patent classification based on semantic matching[J]. Microelectronics & Computer, 2022, 39(4): 91-99. DOI: 10.19304/J.ISSN1000-7180.2021.1083

Multi-label and multi-level Chinese patent classification based on semantic matching

  • Abstract: Since the 14th Five-Year Plan called for protecting and encouraging more high-value domestic patents, applications for innovative interdisciplinary and cross-domain patents have surged, and so has the demand for automatic classification methods that assist manual classification. At present, Chinese patents are classified mainly by examiners, who manually match the submitted patent content against the International Patent Classification (IPC) table, which is inefficient. Existing automatic methods mainly extract text-structure features and semantic features from a patent and match both directly against the labels in the IPC table by similarity. They ignore the semantic information in the explanatory text of each classification label, which easily leads to ambiguous classification. This paper therefore proposes a multi-label, multi-level Chinese patent classification method based on semantic matching, which recasts the traditional text classification problem as a text matching problem over semantic features. The method extracts semantic features for each label at each level of the IPC table (section, class, subclass, main group, and subgroup), extracts semantic features from the text of published patents, and matches the two to classify patents automatically. In comparison experiments on the same dataset, the proposed method achieved better results than the models it was compared with.
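The idea of treating classification as matching can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify how semantic features are learned, so a bag-of-words cosine similarity stands in for them here. The label descriptions in `IPC_TREE`, the function names, and the threshold are hypothetical, and only two of the five IPC levels (section and class) are shown.

```python
import math
from collections import Counter

# Hypothetical label descriptions, loosely modelled on IPC entries.
# They are illustrative only and are not copied from the official IPC table.
IPC_TREE = {
    "G": {"desc": "physics computing data measuring", "children": {
        "G06": {"desc": "computing data processing", "children": {}},
        "G01": {"desc": "measuring testing", "children": {}},
    }},
    "H": {"desc": "electricity communication circuits", "children": {
        "H04": {"desc": "electric communication technique transmission",
                "children": {}},
    }},
}


def embed(text):
    """Stand-in for a learned semantic encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match_labels(text, tree, threshold=0.15):
    """Return every label whose description matches the text.

    Multi-label: all labels at a level that reach the threshold are kept.
    Multi-level: children are examined only under a matched parent.
    """
    vec = embed(text)
    matched = []
    for code, node in tree.items():
        if cosine(vec, embed(node["desc"])) >= threshold:
            matched.append(code)
            matched.extend(match_labels(text, node["children"], threshold))
    return matched


patent = "a data processing method for wireless communication transmission"
print(match_labels(patent, IPC_TREE))
```

In this sketch the patent text is compared with the description text of each label rather than with the label code itself, which is the distinction the abstract draws against earlier methods. A practical system would replace `embed` with a trained text encoder and extend the tree down to subclass, main group, and subgroup.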

     
