  1. 'nb_words_title' Number of words in the article's title
  2. 'nb_words_content' Number of words in the article's content
  3. 'pp_uniq_words' Proportion of unique words in the article
  4. 'pp_stop_words' Proportion of stop words in the article (i.e. words considered too common to be useful for interpretation or queries, such as 'the', 'a' and 'and')
  5. 'pp_uniq_non-stop_words' Proportion of non-stop words among unique words
  6. 'nb_links' Number of hyperlinks in the article
  7. 'nb_outside_links' Number of hyperlinks pointing to another website
  8. 'nb_images' Number of images in the article
  9. 'nb_videos' Number of videos in the article
  10. 'ave_word_length' Average word length
  11. 'nb_keywords' Number of keywords in the metadata
  12. 'category' Category of the article: 0-Lifestyle, 1-Entertainment, 2-Business, 3-Web, 4-Tech, 5-World
  13. 'nb_mina_mink' Minimum share count among all articles with at least one keyword in common with the article
  14. 'nb_mina_maxk' Minimum, across the article's keywords, of each keyword's maximum share count
  15. 'nb_mina_avek' Minimum, across the article's keywords, of each keyword's average share count
  16. 'nb_maxa_mink' Maximum, across the article's keywords, of each keyword's minimum share count
  17. 'nb_maxa_maxk' Maximum share count among all articles with at least one keyword in common with the article
  18. 'nb_maxa_avek' Maximum, across the article's keywords, of each keyword's average share count
  19. 'nb_avea_mink' Average, across the article's keywords, of each keyword's minimum share count
  20. 'nb_avea_maxk' Average, across the article's keywords, of each keyword's maximum share count
  21. 'nb_avea_avek' Average, across the article's keywords, of each keyword's average share count
  22. 'nb_min_linked' Minimum number of shares among the articles from the same website that are linked within the article
  23. 'nb_max_linked' Maximum number of shares among the articles from the same website that are linked within the article
  24. 'nb_ave_linked' Average number of shares among the articles from the same website that are linked within the article
  25. 'weekday' Day of the week: 0-Monday, 1-Tuesday, 2-Wednesday, 3-Thursday, 4-Friday, 5-Saturday, 6-Sunday
  26. 'dist_topic_0' Distance to topic 0
  27. 'dist_topic_1' Distance to topic 1
  28. 'dist_topic_2' Distance to topic 2
  29. 'dist_topic_3' Distance to topic 3
  30. 'dist_topic_4' Distance to topic 4
  31. 'subj' Subjectivity of the article
  32. 'polar' Sentiment polarity of the article
  33. 'pp_pos_words' Proportion of positive words in the article
  34. 'pp_neg_words' Proportion of negative words in the article
  35. 'pp_pos_words_in_nonneutral' Proportion of positive words among the non-neutral words of the article
  36. 'ave_polar_pos' Average sentiment polarity of the positive words
  37. 'min_polar_pos' Minimum sentiment polarity of the positive words
  38. 'max_polar_pos' Maximum sentiment polarity of the positive words
  39. 'ave_polar_neg' Average sentiment polarity of the negative words
  40. 'min_polar_neg' Minimum sentiment polarity of the negative words
  41. 'max_polar_neg' Maximum sentiment polarity of the negative words
  42. 'subj_title' Subjectivity of the title
  43. 'polar_title' Sentiment polarity of the title
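As a rough illustration of how features 13 to 21 fit together: each name reads as nb_<aggregate across keywords>a_<statistic per keyword>k, so 'nb_mina_maxk' takes each keyword's maximum share count and then the minimum of those values. The Python sketch below assumes a hypothetical input, a dict mapping each of the article's keywords to the share counts of articles tagged with that keyword; the function name keyword_share_features is illustrative, and this is not necessarily how the dataset itself was computed.

```python
from statistics import mean

# Statistic names as they appear in the feature names ("min", "max", "ave").
AGGREGATES = {"min": min, "max": max, "ave": mean}


def keyword_share_features(shares_by_keyword):
    """Compute the nine nb_<agg>a_<stat>k keyword features.

    shares_by_keyword (hypothetical input): maps each keyword of the article
    to a non-empty list of share counts of articles tagged with that keyword.
    """
    features = {}
    for stat_name, stat in AGGREGATES.items():
        # One value per keyword, e.g. each keyword's maximum share count.
        per_keyword = [stat(shares) for shares in shares_by_keyword.values()]
        for agg_name, agg in AGGREGATES.items():
            # Aggregate those per-keyword values across the article's keywords.
            features[f"nb_{agg_name}a_{stat_name}k"] = agg(per_keyword)
    return features
```

For example, with {"apple": [100, 300, 800], "iphone": [50, 2000]} the sketch gives nb_mina_mink = 50 and nb_maxa_maxk = 2000 (the overall minimum and maximum, matching the descriptions of features 13 and 17), nb_mina_maxk = 800 and nb_avea_avek = 712.5.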
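Features 1 to 5 and 10 can be approximated from raw text roughly as follows. This is a sketch under stated assumptions: tokenization is a simple lowercase regular expression, STOP_WORDS is a small hypothetical list, and 'pp_stop_words' is taken as a proportion of all words in the content; the preprocessing actually used to build the dataset may differ.

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}
# Naive tokenizer (assumption): runs of lowercase letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def text_features(title, content):
    """Sketch of the word-count and proportion features for one article."""
    title_words = TOKEN.findall(title.lower())
    words = TOKEN.findall(content.lower())
    unique = set(words)
    n = len(words)
    return {
        "nb_words_title": len(title_words),
        "nb_words_content": n,
        # Unique words relative to all words in the content.
        "pp_uniq_words": len(unique) / n if n else 0.0,
        # Stop-word occurrences relative to all words (assumed denominator).
        "pp_stop_words": sum(w in STOP_WORDS for w in words) / n if n else 0.0,
        # Non-stop words among the unique words.
        "pp_uniq_non-stop_words": len(unique - STOP_WORDS) / len(unique) if unique else 0.0,
        "ave_word_length": sum(map(len, words)) / n if n else 0.0,
    }
```

For instance, text_features("The cat", "The cat and the dog") yields 2 title words, 5 content words, pp_uniq_words = 0.8, pp_stop_words = 0.6, pp_uniq_non-stop_words = 0.5 and ave_word_length = 3.0.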
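The integer codes of 'category' (12) and 'weekday' (25) can be mapped back to labels with plain lookup tables. The 0-Monday to 6-Sunday encoding is the same convention as Python's datetime.date.weekday(), so a publication date converts directly. The helper names weekday_code and decode_codes below are hypothetical.

```python
from datetime import date

# Labels copied from features 12 ('category') and 25 ('weekday'), indexed by code.
CATEGORY_LABELS = ("Lifestyle", "Entertainment", "Business", "Web", "Tech", "World")
WEEKDAY_LABELS = ("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday")


def weekday_code(published: date) -> int:
    """Return the 'weekday' code for a publication date."""
    # date.weekday() already uses 0 = Monday ... 6 = Sunday.
    return published.weekday()


def decode_codes(category: int, weekday: int):
    """Map the integer codes back to their text labels."""
    return CATEGORY_LABELS[category], WEEKDAY_LABELS[weekday]
```

For example, weekday_code(date(2024, 1, 1)) returns 0 (a Monday), and decode_codes(4, 6) returns ("Tech", "Sunday").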