Identifying Topics and Trends of China’s Research on Third Front Construction Based on BERTopic Modeling
Abstract
Third Front Construction, which occupies an important place in the history of New China, was a massive strategic campaign from the 1960s to the 1980s involving infrastructure construction in national defense, industry, technology, and transportation. This study uses BERTopic modeling to identify the topics in China's academic publications on Third Front Construction and how those topics have developed. BERTopic is an advanced topic modeling algorithm built on the Bidirectional Encoder Representations from Transformers (BERT) model. It converts texts into vector representations, reduces their dimensionality with Uniform Manifold Approximation and Projection (UMAP), and clusters them with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). This groups the texts into distinct topics and uncovers the latent semantic structures and relationships within them. After data preprocessing, 357 journal articles indexed in the CNKI core journal database from inception to 2023 were retained for analysis. Through visual analysis of the topic distribution, feature words, hierarchical clustering, and similarity matrices produced by BERTopic, the study reaches three findings: (1) BERTopic modeling can provide extensive and explanatory topic information for unlabeled texts. (2) BERTopic modeling identifies five main topics in China's research on Third Front Construction: industrial and urban construction; national strategy and economic development; literature research; industrial immigrants and social culture; and leaders and national decision-making. (3) Based on the semantic structures and relationships among topics, BERTopic modeling points to four future research trends: thematic research, historical materials research, interdisciplinary research, and theoretical research.
In addition, the study seeks to validate advanced natural language processing (NLP) analysis by comparing its results with those obtained through traditional methods. It offers insights and recommendations for future research and deepens understanding of the history of New China.
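In BERTopic, once UMAP and HDBSCAN have assigned documents to clusters, the library's documentation describes extracting each topic's feature words with a class-based TF-IDF (c-TF-IDF). All documents in a cluster are treated as one "class document," and a term t in class c is weighted roughly as W(t,c) = tf(t,c) · log(1 + A / f(t)). Here tf(t,c) is the L1-normalized frequency of t in c, f(t) is the frequency of t across all classes, and A is the average number of words per class. The pure-Python code below is a minimal sketch of that weighting, plus a cosine-similarity matrix between topic vectors like the one behind a topic-similarity heatmap. It is an illustration under stated assumptions, not the study's code. The helper names `c_tf_idf`, `top_words`, and `similarity_matrix`, the pre-tokenized English stand-in words, and the toy topic assignments are all hypothetical. BERTopic's own heatmap may also compute similarity from topic embeddings rather than raw c-TF-IDF vectors.

```python
import math
from collections import Counter


def c_tf_idf(topic_docs):
    """Hypothetical helper: class-based TF-IDF over pre-tokenized documents.

    topic_docs maps a topic id to a list of token lists (one list per document).
    Returns {topic: {term: weight}} using tf(t,c) * log(1 + A / f(t)).
    """
    # Merge each topic's documents into a single bag of words (one "class document")
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in topic_docs.items()
    }
    # f(t): frequency of each term summed over all classes
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per class
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())  # used to L1-normalize term frequencies
        weights[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return weights


def top_words(weights, topic, n=3):
    """Return the n highest-weighted terms of a topic (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def similarity_matrix(weights):
    """Cosine similarity between every pair of topic weight vectors."""
    topics = sorted(weights)

    def cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return topics, [[cos(weights[i], weights[j]) for j in topics] for i in topics]


if __name__ == "__main__":
    # Toy, made-up topic assignments (NOT the study's data)
    docs = {
        0: [["factory", "city", "industry"], ["industry", "railway", "city"]],
        1: [["migrant", "culture", "memory"], ["migrant", "family", "culture"]],
        2: [["industry", "strategy", "economy"], ["strategy", "defense", "economy"]],
    }
    w = c_tf_idf(docs)
    for t in sorted(w):
        print(t, top_words(w, t))
    topics, sim = similarity_matrix(w)
    for t, row in zip(topics, sim):
        print(t, [round(x, 2) for x in row])
```

In this toy run, topics 0 and 2 share only the term "industry," so their similarity is small but non-zero, while topic 1 shares no terms with the others and scores 0. Terms that occur in many topics receive a lower idf-like factor, which is why "industry" ranks lowest in topic 2.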