Threats Watcher - Core Algorithm
- Watcher.threats_watcher.core.cleanup()
Remove words that were created more than 30 days ago.
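As a rough illustration of the 30-day rule, the sketch below filters an in-memory list rather than the real database. The names purge_old_words, created_at, and MAX_AGE are hypothetical and are not taken from Watcher.

```python
from datetime import datetime, timedelta

# Retention window matching the 30 days stated above (name is hypothetical).
MAX_AGE = timedelta(days=30)


def purge_old_words(words, now):
    """Return only the words created within MAX_AGE of `now`.

    `words` is a list of (name, created_at) tuples; it stands in for the
    database table that the real cleanup() operates on.
    """
    cutoff = now - MAX_AGE
    return [(name, created_at) for name, created_at in words if created_at >= cutoff]


now = datetime(2024, 3, 31)
words = [
    ("ransomware", datetime(2024, 3, 20)),  # 11 days old: kept
    ("botnet", datetime(2024, 2, 1)),       # 59 days old: removed
]
print(purge_old_words(words, now))
```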
- Watcher.threats_watcher.core.fetch_last_posts(nb_max_post)
Fetch the last nb_max_post posts from each feed.
- Parameters:
nb_max_post – The depth of the search on each feed.
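The effect of nb_max_post can be pictured with a minimal sketch. It assumes each feed is already available as a list of posts ordered newest first; the function name last_posts and the dictionary layout are hypothetical, and real feed parsing is left out.

```python
def last_posts(feeds, nb_max_post):
    """Keep at most `nb_max_post` posts per feed.

    `feeds` maps a feed URL to its posts, assumed to be ordered newest first.
    """
    return {url: posts[:nb_max_post] for url, posts in feeds.items()}


feeds = {
    "https://example.com/rss": ["post 3", "post 2", "post 1"],
    "https://example.org/atom": ["post A"],
}
print(last_posts(feeds, 2))
```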
- Watcher.threats_watcher.core.focus_five_letters()
Focus on 5-letter-long words.
- Watcher.threats_watcher.core.focus_on_top(words_occurrence)
Focus on top words. Populate the database with only the words that occur at least “words_occurrence” times in feeds.
- Parameters:
words_occurrence – Minimum word occurrence in feeds.
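The threshold can be sketched as a simple filter over a word-count mapping. This is only an illustration: the function name top_words and the dictionary input are assumptions, and the real function writes to the database instead of returning a value.

```python
def top_words(counts, words_occurrence):
    """Keep only the words seen at least `words_occurrence` times."""
    return {word: n for word, n in counts.items() if n >= words_occurrence}


counts = {"phishing": 12, "exploit": 7, "patch": 2}
print(top_words(counts, 5))
```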
- Watcher.threats_watcher.core.load_feeds()
Load feeds.
- Watcher.threats_watcher.core.main_watch()
- Main function:
close_old_connections()
load_feeds()
fetch_last_posts(settings.POSTS_DEPTH)
tokenize_count_urls()
remove_banned_words()
focus_five_letters()
focus_on_top(settings.WORDS_OCCURRENCE)
send_email()
- Watcher.threats_watcher.core.remove_banned_words()
Clean the posts of specific patterns: BannedWord entries first, then common English and French words.
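A minimal sketch of this filtering step, assuming the word counts are held in a dictionary. The function name remove_words and both word sets below are placeholders for illustration; they are not Watcher's actual BannedWord entries or stop-word lists.

```python
def remove_words(counts, banned, common):
    """Drop banned words and common words, ignoring letter case."""
    excluded = {w.lower() for w in banned} | {w.lower() for w in common}
    return {w: n for w, n in counts.items() if w.lower() not in excluded}


banned = {"update"}              # placeholder for BannedWord entries
common = {"the", "and", "les"}   # placeholder English and French common words
counts = {"malware": 4, "Update": 9, "the": 30, "les": 11}
print(remove_words(counts, banned, common))
```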
- Watcher.threats_watcher.core.send_email()
Send e-mail alert.
- Watcher.threats_watcher.core.start_scheduler()
- Launch multiple scheduled tasks in the background:
Fire main_watch every 30 minutes from Monday to Friday (daylight only)
Fire main_watch at 18:00 on Saturday
Fire cleanup every day at 08:00
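The three rules above can be expressed as a small predicate that says which tasks are due at a given minute. This is a standard-library sketch, not the scheduler Watcher actually uses. The function name due_tasks is hypothetical, and because the source only says "daylight only", the hour bounds below are assumed placeholders.

```python
from datetime import datetime

# Assumed daylight window: start hour inclusive, end hour exclusive.
# These bounds are placeholders; the source does not give exact hours.
DAY_START_HOUR = 7
DAY_END_HOUR = 19


def due_tasks(now):
    """Return the names of the tasks that should fire at minute `now`."""
    tasks = []
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    on_half_hour = now.minute in (0, 30)

    # main_watch: every 30 minutes, Monday to Friday, inside the assumed window.
    if weekday <= 4 and on_half_hour and DAY_START_HOUR <= now.hour < DAY_END_HOUR:
        tasks.append("main_watch")

    # main_watch: once at 18:00 on Saturday.
    if weekday == 5 and now.hour == 18 and now.minute == 0:
        tasks.append("main_watch")

    # cleanup: every day at 08:00.
    if now.hour == 8 and now.minute == 0:
        tasks.append("cleanup")

    return tasks


print(due_tasks(datetime(2024, 3, 4, 8, 0)))   # a Monday morning
print(due_tasks(datetime(2024, 3, 9, 18, 0)))  # a Saturday evening
```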
- Watcher.threats_watcher.core.tokenize_count_urls()
Tokenize phrases into words, count word occurrences, and keep the source URLs of the posts in which each word appears.
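The idea can be sketched with the standard library alone. The regular-expression tokenizer below is a simplification chosen for the example and is not necessarily the tokenizer Watcher uses; the input layout (post URL mapped to post title) and the return shape are also assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize_count_urls(posts):
    """Count words across post titles and record where each word appeared.

    `posts` maps a post URL to its title. Returns (counts, urls): `counts`
    is a Counter of words, `urls` maps each word to a sorted list of the
    post URLs in which it appears.
    """
    counts = Counter()
    urls = defaultdict(set)
    for url, title in posts.items():
        # Simplified tokenizer: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            counts[word] += 1
            urls[word].add(url)
    return counts, {word: sorted(found) for word, found in urls.items()}


posts = {
    "https://example.com/a": "New ransomware campaign",
    "https://example.com/b": "Ransomware hits hospitals",
}
counts, urls = tokenize_count_urls(posts)
print(counts["ransomware"], urls["ransomware"])
```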