开放学习研究 (Open Learning Research)

Distance Education Innovation


Research on Networked Knowledge Entity Extraction in Internet Communities

Issue 2, 2022

WANG Huaibo1 and ZHENG Qinhua2

(1. School of Systems Science, Beijing Normal University, Beijing 100875, China; 2. Research Center of Distance Education, Beijing Normal University, Beijing 100875, China)

Abstract: The emergence and aggregation of knowledge through "collective contribution and crowdsourced production" in internet communities has overturned the view of knowledge as inherently stable and authoritative, and has changed how knowledge is represented and extracted. Previous studies have proposed the concept of "networked knowledge" and its representation model at the theoretical level, but practical applications are lacking. This study therefore develops a method framework for extracting networked knowledge entities in internet communities, to meet the needs of networked knowledge extraction in the new era. The framework comprises three stages: data collection and processing, word segmentation and entity extraction, and entity filtering and unification. It takes into account the situated and dynamic characteristics of networked knowledge: a set of classification rules based on topic documents places the text content into relatively independent contexts; candidate entities are obtained in three ways, namely keyword extraction, word combination, and named entity recognition; and entities are then filtered and unified using methods such as entity semantic similarity calculation. Finally, the framework is applied and validated by extracting networked knowledge entities from a cMOOC connectivist learning community.

Keywords: internet community; networked knowledge; knowledge entity; knowledge extraction
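
As a rough illustration of the last two stages named in the abstract (candidate entity extraction, then entity filtering and unification), the following minimal sketch may help. It is not the authors' implementation: the function names `candidate_entities` and `unify`, the `top_k` and `threshold` parameters, and the toy input are all hypothetical. It assumes the text has already been segmented into tokens, omits named entity recognition entirely, and substitutes character-level string similarity from Python's `difflib` for the semantic similarity calculation the abstract mentions.

```python
from collections import Counter
from difflib import SequenceMatcher


def candidate_entities(tokens, top_k=5):
    """Collect candidates from frequent single words and repeated adjacent word pairs."""
    # Keyword candidates: the top_k most frequent tokens (a simple frequency stand-in).
    keywords = [word for word, _ in Counter(tokens).most_common(top_k)]
    # Word-combination candidates: adjacent pairs that occur more than once.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    pairs = [" ".join(pair) for pair, count in pair_counts.items() if count > 1]
    return keywords + pairs


def unify(candidates, threshold=0.8):
    """Map each candidate to the first kept entity it resembles, else keep it as new."""
    kept = []
    mapping = {}
    for cand in candidates:
        # String similarity is an assumption standing in for semantic similarity.
        match = next(
            (k for k in kept if SequenceMatcher(None, cand, k).ratio() >= threshold),
            None,
        )
        if match is None:
            kept.append(cand)
            mapping[cand] = cand
        else:
            mapping[cand] = match
    return kept, mapping


if __name__ == "__main__":
    tokens = "connectivism learning network connectivism learning theory".split()
    print(candidate_entities(tokens))
    print(unify(["connectivism", "connectivist", "MOOC"]))
```

In a fuller pipeline, the similarity function would presumably be replaced by one computed over word or entity embeddings, and the candidates would be gathered per topic document so that each entity stays tied to its context.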

Download: 互联网社区中网络化知识实体抽取研究.pdf

