Threats Watcher - Core Algorithm

Watcher.threats_watcher.core.cleanup()

Remove words that were created more than 30 days ago.
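
The retention rule can be sketched in plain Python. This is a minimal illustration, not Watcher's actual implementation: the function name select_expired and the word-to-datetime mapping are hypothetical, and only the 30-day threshold comes from the description above.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # threshold taken from the description above


def select_expired(words, now):
    """Return the words created more than RETENTION_DAYS before `now`.

    `words` maps a word to its creation datetime (hypothetical layout).
    """
    limit = now - timedelta(days=RETENTION_DAYS)
    return [name for name, created_at in words.items() if created_at < limit]


now = datetime(2024, 3, 31)
words = {
    "ransomware": datetime(2024, 2, 1),  # 59 days old: expired
    "phishing": datetime(2024, 3, 20),   # 11 days old: kept
}
expired = select_expired(words, now)
```

A word created exactly 30 days ago is kept in this sketch, since the comparison is strict.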

Watcher.threats_watcher.core.fetch_last_posts(nb_max_post)

Fetch the last nb_max_post posts for each feed.

Parameters

nb_max_post – The depth of the search on each feed (maximum number of posts fetched per feed).
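
The effect of nb_max_post can be illustrated without any network access. The sketch below is an assumption-laden stand-in, not the real fetching code: the helper last_posts and the feed dictionary are hypothetical, and entries are assumed to be ordered newest first.

```python
def last_posts(feeds, nb_max_post):
    """Keep at most `nb_max_post` entries per feed.

    `feeds` maps a feed name to its entries, newest first (assumed ordering).
    """
    return {name: entries[:nb_max_post] for name, entries in feeds.items()}


feeds = {
    "feed_a": ["post 5", "post 4", "post 3", "post 2", "post 1"],
    "feed_b": ["post 2", "post 1"],
}
recent = last_posts(feeds, nb_max_post=3)
```

A feed with fewer entries than nb_max_post is returned unchanged.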

Watcher.threats_watcher.core.focus_five_letters()

Focus on words that are 5 letters long.

Watcher.threats_watcher.core.focus_on_top(words_occurrence)

Focus on top words. Populate the database with only the words that occur at least “words_occurrence” times in feeds.

Parameters

words_occurrence – Minimum word occurrence in feeds.
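
The threshold can be sketched as a simple filter over word counts. This is an illustration only: the helper keep_top_words is hypothetical, and the real function writes to the database rather than returning a dictionary.

```python
from collections import Counter


def keep_top_words(counts, words_occurrence):
    """Keep only the words counted at least `words_occurrence` times."""
    return {word: n for word, n in counts.items() if n >= words_occurrence}


# Sample counts: "emotet" appears 4 times, "patch" twice.
counts = Counter(["emotet", "emotet", "emotet", "patch", "emotet", "patch"])
top = keep_top_words(counts, words_occurrence=3)
```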

Watcher.threats_watcher.core.load_feeds()

Load feeds.

Watcher.threats_watcher.core.main_watch()
Main function:
  • close_old_connections()

  • load_feeds()

  • fetch_last_posts(settings.POSTS_DEPTH)

  • tokenize_count_urls()

  • remove_banned_words()

  • focus_five_letters()

  • focus_on_top(settings.WORDS_OCCURRENCE)

  • send_email()

Watcher.threats_watcher.core.remove_banned_words()

Clean the posts of specific patterns: first BannedWord entries, then common English and French words.
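
A minimal sketch of the two-pass filtering, assuming word counts are held in a dictionary. The helper remove_banned, the sample word lists, and the case-insensitive matching are all assumptions made for illustration; the source does not say how BannedWord entries or the common-word lists are stored or compared.

```python
def remove_banned(counts, banned_words, common_words):
    """Drop banned words first, then common English and French words.

    Matching is case-insensitive here, which is an assumption.
    """
    banned = {w.lower() for w in banned_words}
    common = {w.lower() for w in common_words}
    # First pass: banned words.
    without_banned = {w: n for w, n in counts.items() if w.lower() not in banned}
    # Second pass: common words.
    return {w: n for w, n in without_banned.items() if w.lower() not in common}


counts = {"Microsoft": 5, "the": 40, "avec": 12, "lockbit": 7}
cleaned = remove_banned(
    counts,
    banned_words=["microsoft"],     # hypothetical banned entries
    common_words=["the", "avec"],   # tiny stand-in for the real lists
)
```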

Watcher.threats_watcher.core.send_email()

Send e-mail alert.

Watcher.threats_watcher.core.start_scheduler()
Launch multiple scheduled tasks in the background:
  • Fire main_watch every 30 minutes from Monday to Friday (daylight only)

  • Fire main_watch at 18:00 on Saturday

  • Fire cleanup every day at 8 am
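
The schedule above can be expressed as a pure function that reports which tasks are due at a given minute. This is a sketch of the rules only, not the scheduler Watcher actually runs: the helper due_tasks is hypothetical, and the day_start and day_end bounds for "daylight" are placeholder assumptions, since the hours are not stated here.

```python
from datetime import datetime


def due_tasks(moment, day_start=8, day_end=18):
    """Return the names of the tasks that fire at `moment` (minute precision).

    `day_start` and `day_end` are hypothetical bounds for "daylight".
    """
    tasks = []
    weekday = moment.weekday()  # Monday is 0, Sunday is 6
    at = (moment.hour, moment.minute)
    # Every 30 minutes, Monday to Friday, within the assumed daylight window.
    if weekday <= 4 and day_start <= moment.hour < day_end and moment.minute in (0, 30):
        tasks.append("main_watch")
    # Saturday at 18:00.
    if weekday == 5 and at == (18, 0):
        tasks.append("main_watch")
    # Every day at 8 am.
    if at == (8, 0):
        tasks.append("cleanup")
    return tasks


monday_morning = due_tasks(datetime(2024, 3, 4, 10, 30))   # a Monday
saturday_evening = due_tasks(datetime(2024, 3, 9, 18, 0))  # a Saturday
```

Keeping the rules in a pure function makes them easy to check against specific dates before wiring them into a real scheduler.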

Watcher.threats_watcher.core.tokenize_count_urls()

Tokenize phrases into words, count word occurrences, and keep the source URLs of the posts in which each word appears.
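
The three steps can be sketched with the standard library. This is an illustration under stated assumptions: the helper count_words_with_urls, the (title, url) input layout, the lowercasing, and the regular-expression tokenizer are all stand-ins, and the real function may tokenize differently.

```python
import re
from collections import Counter, defaultdict


def count_words_with_urls(posts):
    """Count words across post titles and record where each word was seen.

    `posts` is a list of (title, url) pairs (hypothetical layout).
    """
    counts = Counter()
    urls = defaultdict(set)
    for title, url in posts:
        # Simple regex tokenizer, used here in place of a real NLP tokenizer.
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            counts[word] += 1
            urls[word].add(url)
    return counts, urls


posts = [
    ("New Emotet campaign spotted", "https://example.com/a"),
    ("Emotet returns", "https://example.com/b"),
]
counts, urls = count_words_with_urls(posts)
```

Storing the URLs in a set avoids listing the same post twice when a word is repeated within one title.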