Swedish Topic Modeling with BERTopic

January 01, 2024

Swedish-topic-modeling is a simple parser and topic analyser for Markivet data, built on BERTopic. It parses Swedish media archive data from Markivet and extracts the underlying topics from the Swedish-language texts.

Features

  • Parser for the Markivet Swedish media archive format
  • BERTopic-based topic extraction
  • Stop word support via stopwords-iso
  • Benchmarked against Stack Overflow Q&A datasets
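
As a rough illustration of how the BERTopic and stop word features might fit together, here is a minimal sketch. It is not the repository's actual code: the function names prepare_documents, build_topic_model and extract_topics are hypothetical, it assumes the stopwordsiso Python package as the source of the stopwords-iso lists, and it assumes BERTopic's multilingual embedding setting is an acceptable choice for Swedish. The Markivet parsing step is left out because the archive format is not described here.

```python
from typing import Iterable, List


def prepare_documents(texts: Iterable[str], min_words: int = 5) -> List[str]:
    """Collapse whitespace and drop texts too short to carry a topic."""
    docs = []
    for text in texts:
        cleaned = " ".join(text.split())
        if len(cleaned.split()) >= min_words:
            docs.append(cleaned)
    return docs


def build_topic_model(stop_words: Iterable[str]):
    """Create a BERTopic model whose topic keywords skip the given stop words."""
    # Imported lazily so prepare_documents works without the heavy dependencies.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer

    # Stop words are removed only when topic keywords are counted,
    # not before the documents are embedded.
    vectorizer = CountVectorizer(stop_words=sorted(stop_words))
    # Assumption: the multilingual embedding model is used for Swedish text.
    return BERTopic(language="multilingual", vectorizer_model=vectorizer)


def extract_topics(texts: Iterable[str]):
    """Fit a topic model on already-parsed article texts and return a summary table."""
    # Assumption: the stopwordsiso package provides the Swedish ("sv") list.
    from stopwordsiso import stopwords

    docs = prepare_documents(texts)
    model = build_topic_model(stopwords("sv"))
    model.fit_transform(docs)
    return model.get_topic_info()
```

Calling extract_topics on a list of article texts would return one row per discovered topic. BERTopic generally needs a reasonably large collection of documents, likely hundreds or more, before the clustering step produces meaningful topics.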

References

Swedish-topic-modeling GitHub repository