Threats Watcher - Core Algorithm
Remove words whose creation date is more than 30 days old.
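This is a minimal Python sketch of the cleanup, under stated assumptions. The Word record and its created_at field are hypothetical stand-ins for however the words are actually stored. "More than 30 days" is read as strictly older than 30 days, so a word exactly 30 days old is kept.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Word:
    # Hypothetical record: the word itself and when it was first stored.
    name: str
    created_at: datetime


def cleanup(words, now, max_age_days=30):
    """Return only the words created within the last max_age_days days."""
    cutoff = now - timedelta(days=max_age_days)
    # Words created before the cutoff are dropped; everything else is kept.
    return [w for w in words if w.created_at >= cutoff]


now = datetime(2024, 6, 30)
words = [
    Word("ransomware", now - timedelta(days=2)),
    Word("phishing", now - timedelta(days=45)),
]
print([w.name for w in cleanup(words, now)])  # ['ransomware']
```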
Fetch the nb_max_post most recent posts from each feed.
nb_max_post – The depth of the search on each feed, i.e. the number of most recent posts read per feed.
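This sketch of the fetch step assumes the feeds have already been downloaded into a hypothetical in-memory dictionary. The dictionary maps each feed URL to a list of posts, and each post carries a published datetime. A real implementation would also have to download and parse the feeds, which is not shown here.

```python
from datetime import datetime


def fetch_last_posts(feeds, nb_max_post):
    """Keep the nb_max_post most recent posts of each feed.

    feeds maps a feed URL to a list of posts; each post is a dict with
    at least a "published" datetime (hypothetical in-memory structure).
    """
    latest = {}
    for url, posts in feeds.items():
        # Newest first, then cut the list at the search depth.
        ordered = sorted(posts, key=lambda p: p["published"], reverse=True)
        latest[url] = ordered[:nb_max_post]
    return latest


feeds = {
    "https://example.org/feed": [
        {"title": "old", "published": datetime(2024, 1, 1)},
        {"title": "new", "published": datetime(2024, 3, 1)},
        {"title": "mid", "published": datetime(2024, 2, 1)},
    ]
}
print([p["title"] for p in fetch_last_posts(feeds, 2)["https://example.org/feed"]])  # ['new', 'mid']
```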
Focus on words that are 5 letters long.
Focus on top words: populate the database only with words that occur at least “words_occurrence” times in the feeds.
words_occurrence – Minimum number of occurrences of a word in the feeds.
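Read literally, this threshold is a simple filter. The sketch below assumes the word counts are held in a plain dictionary (a hypothetical structure). It treats "minimum occurrence" as an inclusive bound (>=).

```python
def focus_on_top(word_counts, words_occurrence):
    """Keep only the words seen at least words_occurrence times."""
    return {w: c for w, c in word_counts.items() if c >= words_occurrence}


counts = {"ransomware": 7, "exploit": 3, "patch": 5}
print(focus_on_top(counts, 5))  # {'ransomware': 7, 'patch': 5}
```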
Main function: main_watch.
Clean the posts of specific patterns: first the entries listed in BannedWord, then common English and French words.
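This sketch shows the two-pass cleaning. It assumes whole-word matching after lowercasing, since the text does not say how patterns are matched. The BANNED list and STOP_WORDS set are hypothetical placeholders for the BannedWord entries and for the English and French common-word lists.

```python
import re

# Hypothetical inputs: banned patterns would come from BannedWord, and the
# stop words from English and French common-word lists.
BANNED = ["cookie", "newsletter"]
STOP_WORDS = {"the", "and", "with", "les", "la", "des", "pour"}


def clean_posts(titles, banned=BANNED, stop_words=STOP_WORDS):
    """Return, for each title, its lowercase words minus banned and common words."""
    banned_set = set(banned)
    cleaned = []
    for title in titles:
        words = re.findall(r"\w+", title.lower())
        # First pass: drop banned words.
        words = [w for w in words if w not in banned_set]
        # Second pass: drop common English and French words.
        words = [w for w in words if w not in stop_words]
        cleaned.append(words)
    return cleaned


print(clean_posts(["The new ransomware campaign with cookie banner",
                   "Les attaques pour la semaine"]))
# [['new', 'ransomware', 'campaign', 'banner'], ['attaques', 'semaine']]
```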
Send an e-mail alert.
Launch multiple scheduled tasks in the background:
- Fire main_watch every 30 minutes, Monday to Friday (daytime only).
- Fire main_watch at 18:00 on Saturday.
- Fire cleanup every day at 8:00.
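The three schedule rules can be written as a pure function that, given a minute, returns the tasks that should fire. The text does not define "daytime", so the 08:00 to 18:59 window below is an assumption. In practice such rules are usually handed to a cron-style scheduler. This standard-library version only makes the rules explicit.

```python
from datetime import datetime

# Hypothetical bounds for "daytime": 08:00 to 18:59 (not specified in the text).
DAY_START_HOUR = 8
DAY_END_HOUR = 18


def jobs_due(now):
    """Return the names of the tasks that should fire at the given minute."""
    due = []
    weekday = now.weekday()  # Monday == 0 ... Sunday == 6
    # main_watch: every 30 minutes, Monday to Friday, daytime only.
    if weekday <= 4 and DAY_START_HOUR <= now.hour <= DAY_END_HOUR and now.minute % 30 == 0:
        due.append("main_watch")
    # main_watch: once on Saturday at 18:00.
    if weekday == 5 and now.hour == 18 and now.minute == 0:
        due.append("main_watch")
    # cleanup: every day at 08:00.
    if now.hour == 8 and now.minute == 0:
        due.append("cleanup")
    return due


print(jobs_due(datetime(2024, 6, 3, 8, 0)))  # ['main_watch', 'cleanup'] (a Monday)
```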
Tokenize phrases into words, count word occurrences, and keep the source URLs of the posts in which each word appears.
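This sketch uses only the standard library. Titles are split with a simple \w+ regular expression, which is an assumption because the actual tokenizer is not specified. Every occurrence is counted, and each word keeps the set of URLs of the posts it appeared in. The (title, url) input shape is hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize_count_urls(posts):
    """posts: iterable of (title, url) pairs (hypothetical input shape).

    Returns a Counter of word occurrences and, for each word, the set of
    source URLs of the posts that contain it.
    """
    counts = Counter()
    sources = defaultdict(set)
    for title, url in posts:
        for word in re.findall(r"\w+", title.lower()):
            counts[word] += 1
            sources[word].add(url)
    return counts, sources


counts, sources = tokenize_count_urls([
    ("Lockbit ransomware hits hospital", "https://a.example/1"),
    ("New ransomware strain", "https://b.example/2"),
])
print(counts.most_common(1))  # [('ransomware', 2)]
```

Keeping a set of URLs per word, rather than a list, avoids storing the same source twice when a word is repeated within one post.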