Automated Topic Detection Tool
https://www.archivesportaleurope.net/topicdetection/
Our automated topic detection tool is the beta version of a new way to do research in Archives Portal Europe; it expands any keyword search to include other languages and semantic affinities, to detect finding aids related to a certain topic.
This innovative tool supports researchers, archivists, and curious users alike by surfacing relevant content—even when it's not explicitly tagged—through advanced semantic and entity search methods.
It has two main benefits:
■ Our users can overcome the limitations of a keyword-based search in the portal
■ Our content providers can test it on their collections, to detect and tag them with our topics.
You can read more about topic-based search in our Help section, as well as in our Explore section.
What is the Automated Topic Detection Tool?
The tool uses machine learning to find connections between your search terms and the descriptive metadata of archival documents—even if the metadata doesn't contain formal subject headings. It works in multiple languages and uses powerful models trained on a wide range of sources to identify relevant material based on meaning, not just exact keywords.
Across Europe, archival practices vary. Not all institutions assign controlled subject headings to their collections, and when they do, the headings may differ from national or international standards. This creates gaps in search results and makes it hard to explore topics comprehensively.
The Automated Topic Detection Tool addresses this by:
■ Identifying documents related to a topic even if they lack subject headings
■ Improving discovery across languages and metadata styles
■ Enabling both users and content providers to expand and refine how collections are explored and described
How does it work?
▶ The tool offers two types of search:
Concept Search:
■ Uses word embeddings to understand the meaning of your keywords
■ Matches documents semantically—even if the exact words don’t appear
■ Works across languages using a shared “semantic space”
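To illustrate the idea behind Concept Search, here is a minimal Python sketch. It is not the tool's own code: the three-dimensional vectors are invented for the example, and the function names are hypothetical. It assumes that relevance is measured with cosine similarity between embedding vectors, which is a common choice but is not confirmed on this page.

```python
import math

# Toy 3-dimensional "embeddings". The numbers are invented for illustration;
# a real multilingual model would produce much longer vectors.
EMBEDDINGS = {
    "democracy": [0.9, 0.1, 0.0],
    "démocratie": [0.88, 0.12, 0.02],  # French term placed close to "democracy"
    "elections": [0.7, 0.3, 0.1],
    "railway": [0.0, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Rank every other term by its similarity to the query term, best first."""
    query_vector = embeddings[query]
    scores = [
        (term, cosine_similarity(query_vector, vector))
        for term, vector in embeddings.items()
        if term != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("democracy", EMBEDDINGS)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
# "démocratie" ranks first and "railway" last
```

Because terms from different languages sit in the same vector space, the French term scores highest even though it shares no exact spelling with the query.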
Entity Search:
■ Recognises names (people, places, organisations) using Wikidata and VIAF
■ Retrieves name variants in multiple languages
■ Searches for these variants throughout the dataset
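The three Entity Search steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the name variants are hard-coded in place of a real Wikidata or VIAF lookup, the sample documents are invented, and matching is assumed to be a simple case-insensitive substring test.

```python
# Hard-coded name variants standing in for a Wikidata/VIAF lookup (assumption).
NAME_VARIANTS = {
    "Napoleon I": [
        "Napoleon Bonaparte",
        "Napoléon Bonaparte",
        "Napoleone Bonaparte",
    ],
}

# Invented sample descriptions, keyed by a hypothetical document identifier.
DOCUMENTS = {
    "doc-1": "Lettres de Napoléon Bonaparte à ses généraux",
    "doc-2": "Atti notarili del periodo di Napoleone Bonaparte",
    "doc-3": "Railway timetables, 1890-1900",
}


def entity_search(entity, variants, documents):
    """Return ids of documents that mention any known variant of the entity."""
    # Fall back to the entity label itself when no variants are known.
    names = [name.casefold() for name in variants.get(entity, [entity])]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(name in text.casefold() for name in names)
    ]


hits = entity_search("Napoleon I", NAME_VARIANTS, DOCUMENTS)
print(hits)  # ['doc-1', 'doc-2']
```

The French and Italian documents are both found from a single English entity label, because the search runs over every variant rather than over the label the user typed.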
▶ The tool also supports the wildcard * (eg, Democra* - it will look for Democracy, Democratic, etc)
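As a rough illustration of what the wildcard does, the sketch below translates a pattern such as Democra* into a regular expression. How the tool handles wildcards internally is not described here, so the function name, the case-insensitive matching, and the restriction to whole words are all assumptions.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a search pattern in which * stands for any run of word characters."""
    # Escape everything first, then turn the escaped "*" into "\w*".
    escaped = re.escape(pattern).replace(r"\*", r"\w*")
    # \b anchors keep the match to whole words (assumption).
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


text = "Democracy and democratic reform in undemocratic times"
found = wildcard_to_regex("Democra*").findall(text)
print(found)  # ['Democracy', 'democratic']
```

In this sketch "undemocratic" is not matched, because the pattern must begin at the start of a word.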
▶ The tool also supports the following Boolean operators (remember to tick the box Boolean Search when you use them):
AND (eg, Napoleon AND Waterloo)
OR (eg, Napoleon OR Lafayette)
" " quotation marks (eg, "Napoleon Bonaparte" - if you don't use the quotation marks, the tool will look up the words as separate search terms)
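The following sketch shows one way such queries could be evaluated against a single document description. It is only an illustration, not the tool's parser. It assumes that AND binds more tightly than OR, that adjacent terms without an operator are combined with AND, and that matching is a case-insensitive substring test. The function name is hypothetical.

```python
import shlex


def matches(query, text):
    """Return True if the text satisfies a query using AND, OR and quoted phrases."""
    # shlex.split keeps a quoted phrase together as a single token.
    tokens = shlex.split(query)
    haystack = text.casefold()

    # Split the query into OR-groups; terms inside a group are ANDed together.
    groups, current = [], []
    for token in tokens:
        if token == "OR":
            groups.append(current)
            current = []
        elif token != "AND":
            current.append(token.casefold())
    groups.append(current)

    # Empty groups (for example from a trailing OR) never match.
    return any(
        bool(group) and all(term in haystack for term in group)
        for group in groups
    )


print(matches("Napoleon AND Waterloo", "Napoleon at Waterloo, 1815"))  # True
print(matches("Napoleon OR Lafayette", "Letters of Lafayette"))        # True
print(matches('"Napoleon Bonaparte"', "Bonaparte, Napoleon"))          # False
```

The last call shows the effect of the quotation marks: the phrase must appear as written, so a description that lists the surname first does not match.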
▶ The tool also supports the Broad Entity Mention Search - this will search for the name of the entity in all languages available in Wikidata, not only the sub-set of languages present in the corpus.
Search results show:
■ Title and topic of each document
■ Content summary
■ Country and language
■ Date(s) of creation
■ A similarity score indicating relevance
Highlighted words show how your query connects to each document.
Please note: the tool is still in beta!
At the moment, the tool does not scrape all of the portal, but only those archival collections that are already tagged with a topic. This reduces the test set to around 675,000 documents, but it allows us to double-check that the tool does not make mistakes; and it is still useful to detect documents that can be assigned to more than one topic.
The tool currently allows research in the following topics:
Catholicism | Health |
Democracy | Maps |
Economics | Napoleon I |
First World War | Notaries |
Genealogy | Slavery |
German Democratic Republic | Transport |
And in the following languages:
English | Latvian |
Finnish | Polish |
French | Russian |
German | Slovenian |
Hebrew | Spanish |
Italian | Swedish |
This is just the beginning. The tool is designed to grow—covering more topics, languages, and collections over time.
Try it yourself!
The tool is available at this link: https://www.archivesportaleurope.net/topicdetection/
☞ Are you a Content Provider wishing to tag your collections by topic? Please get in touch and we will explain how to make the most of the tool, and the current procedure for topic tagging ☜
And please help us with the testing!
When you use the tool, please take a few minutes to answer our questionnaire. The more the tool is tested, the better we understand it:
https://docs.google.com/forms/d/e/1FAIpQLScUB5V7-16iJusy4UEDD6hD9nJHYbi9Mmu1e7hiR0GLaSEJ-Q/viewform
All the technical details and methodology behind the tool have been published in a peer-reviewed paper. You can read it here: https://dl.acm.org/doi/10.1145/3494572
Have questions or feedback?
Contact us at info@archivesportaleurope.net