Threats Watcher - Core Algorithm
- Watcher.threats_watcher.core.cleanup()
Remove words that were created more than 30 days ago.
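The 30-day retention rule can be illustrated with a minimal sketch. It assumes an in-memory dictionary mapping each word to its creation date as a stand-in for database rows; the names `cleanup_words` and `RETENTION_DAYS` are hypothetical and not part of the Watcher API.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # retention window taken from the description above


def cleanup_words(words, now):
    """Keep only the words created within the retention window.

    `words` maps a word to its creation datetime (hypothetical stand-in
    for the database rows the real function operates on).
    """
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return {word: created for word, created in words.items() if created >= cutoff}


now = datetime(2024, 3, 31)
words = {
    "ransomware": datetime(2024, 3, 20),  # 11 days old: kept
    "oldword": datetime(2024, 1, 1),      # 90 days old: removed
}
kept = cleanup_words(words, now)
```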
- Watcher.threats_watcher.core.extract_entities_and_threats(title: str) → dict
Extract and clean entities and threats from a title using the NER model.
- Watcher.threats_watcher.core.fetch_last_posts(nb_max_post)
Fetch the last nb_max_post posts for each feed.
- Parameters:
nb_max_post – The depth of the search on each feed.
- Watcher.threats_watcher.core.focus_five_letters()
Focus on words that are 5 letters long.
- Watcher.threats_watcher.core.focus_on_top(words_occurrence)
Focus on top words. Populates the database with only the words that occur at least “words_occurrence” times in feeds. Also triggers breaking news when the threshold is exceeded, and generates an AI summary for newly created or updated words.
- Parameters:
words_occurrence – Minimum number of occurrences of a word in feeds.
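The thresholding step can be sketched as follows. This is only an illustration on a plain dictionary of counts: the helper name `filter_top_words` is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption based on the wording "minimum occurrence". Database writes, breaking news, and AI summaries are left out.

```python
def filter_top_words(counts, words_occurrence):
    """Return the words whose count reaches the minimum occurrence.

    `counts` maps a word to the number of times it appears in feeds.
    The inclusive comparison is an assumption, not confirmed by the source.
    """
    return {word: n for word, n in counts.items() if n >= words_occurrence}


counts = {"phishing": 12, "botnet": 5, "patch": 2}
top = filter_top_words(counts, words_occurrence=5)
```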
- Watcher.threats_watcher.core.get_confidence_score(confidence)
Converts a confidence level (1, 2 or 3) to a percentage.
- Watcher.threats_watcher.core.get_normalized_domain(url)
Extracts and normalizes the domain from a URL without a network query (for Source objects).
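A minimal sketch of offline domain normalization, using only the standard library. The exact normalization rules Watcher applies are not stated here, so lowercasing, dropping the port, and stripping a leading "www." are assumptions; `normalized_domain` is a hypothetical name.

```python
from urllib.parse import urlparse


def normalized_domain(url):
    """Return a lowercase host name for `url` without any network access."""
    if "://" not in url:
        # Without a scheme, urlparse would treat the host as a path,
        # so prefix "//" to mark it as a network location.
        url = "//" + url
    host = urlparse(url).hostname or ""  # hostname is lowercased, port removed
    if host.startswith("www."):
        host = host[len("www."):]  # assumed normalization rule
    return host


a = normalized_domain("https://www.Example.com/news?id=1")
b = normalized_domain("blog.example.org:8080/feed")
```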
- Watcher.threats_watcher.core.get_pre_redirect_domain(url)
Retrieves the domain of the URL before the redirect.
- Watcher.threats_watcher.core.load_feeds()
Load feeds.
- Watcher.threats_watcher.core.main_watch()
- Main function:
close_old_connections()
load_feeds()
fetch_last_posts(settings.POSTS_DEPTH)
tokenize_count_urls()
remove_banned_words()
focus_five_letters()
focus_on_top(settings.WORDS_OCCURRENCE)
send_threats_watcher_notifications()
- Watcher.threats_watcher.core.reliability_score()
Calculates the reliability score for each TrendyWord by scanning its associated PostUrls.
- Watcher.threats_watcher.core.remove_banned_words()
Remove specific patterns from the posts: BannedWord entries first, then common English and French words.
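The filtering idea can be sketched on a dictionary of word counts. The function name `remove_banned` and the sample word lists are hypothetical; the real function reads BannedWord entries from the database and may match patterns rather than exact words. Case-insensitive matching is an assumption.

```python
def remove_banned(counts, banned_words, common_words):
    """Drop banned words and common-language words from `counts`."""
    excluded = {w.lower() for w in banned_words} | {w.lower() for w in common_words}
    return {word: n for word, n in counts.items() if word.lower() not in excluded}


counts = {"Malware": 7, "about": 9, "avec": 4, "newsletter": 3}
banned_words = ["newsletter"]        # illustrative BannedWord entries
common_words = ["about", "avec"]     # illustrative English and French stop words
cleaned = remove_banned(counts, banned_words, common_words)
```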
- Watcher.threats_watcher.core.send_threats_watcher_notifications(content)
Send notifications for Threats Watcher events to all enabled subscribers. Detects notification type from content and prepares the context for downstream handlers.
- Watcher.threats_watcher.core.start_scheduler()
- Launch scheduled tasks in the background:
Fire main_watch every 30 minutes
Fire cleanup every day at 8 am
Fire weekly summary based on settings configuration
- Watcher.threats_watcher.core.tokenize_count_urls()
- For each title (≤ 30 days):
Runs NER and threat extraction,
Casts scores to float,
Counts occurrences and aggregates associated URLs.
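The counting and URL aggregation step can be sketched as below. The input is assumed to be a list of (url, words) pairs already produced by the extraction step; NER, score casting, and the 30-day filter are omitted, and `count_and_aggregate` is a hypothetical name.

```python
from collections import defaultdict


def count_and_aggregate(extracted):
    """Count word occurrences and collect the URLs each word appears in.

    `extracted` is an iterable of (url, words) pairs, one per post title.
    """
    counts = defaultdict(int)
    urls = defaultdict(list)
    for url, words in extracted:
        for word in words:
            counts[word] += 1
            if url not in urls[word]:  # keep each URL once per word
                urls[word].append(url)
    return dict(counts), dict(urls)


extracted = [
    ("https://a.example/1", ["lockbit", "ransomware"]),
    ("https://b.example/2", ["ransomware"]),
]
counts, urls = count_and_aggregate(extracted)
```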