Swedish Topic Modeling with BERTopic
January 01, 2024
Swedish-topic-modeling is a small parser and topic analyser for Markivet data, built on BERTopic. It parses Swedish media archive exports from Markivet and automatically extracts the underlying topics from the Swedish-language texts.
Features
- Parser for the Markivet Swedish media archive format
- BERTopic-based topic extraction
- Stop word support via stopwords-iso
- Tested against Stack Overflow Q&A datasets for benchmarking
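As a rough illustration of how the stop word and BERTopic features above might fit together, here is a minimal sketch. It is not the repository's actual code: the helper names `remove_stop_words` and `fit_topics` are hypothetical, and the short hardcoded stop word list is a stand-in for the full Swedish list from stopwords-iso. It assumes the `bertopic` and `scikit-learn` packages are installed and that BERTopic's multilingual embedding model is an acceptable choice for Swedish.

```python
import re

# Tiny illustrative subset of Swedish stop words (assumption); the project
# itself takes its list from stopwords-iso.
SWEDISH_STOP_WORDS = ["och", "att", "det", "som", "en", "är", "på", "för", "av", "med"]


def remove_stop_words(text, stop_words=SWEDISH_STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    blocked = set(stop_words)
    return [token for token in tokens if token not in blocked]


def fit_topics(docs, stop_words=SWEDISH_STOP_WORDS):
    """Fit BERTopic on raw documents (hypothetical helper).

    Stop words are removed only in the vectorizer step, so the embedding
    model still sees the full sentences.
    """
    # Imported lazily so remove_stop_words works without the heavy dependencies.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(stop_words=list(stop_words))
    model = BERTopic(language="multilingual", vectorizer_model=vectorizer)
    topics, _ = model.fit_transform(docs)
    return model, topics
```

With a reasonably large list of article texts in `docs`, calling `model, topics = fit_topics(docs)` followed by `model.get_topic_info()` would list the discovered topics. Passing the stop words to the vectorizer rather than stripping them from the input keeps the sentence embeddings intact while still keeping filler words out of the topic keywords.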
References
- Stop words: stopwords-iso
- Test files from Markivet
Links
Swedish-topic-modeling GitHub repository