import termextract.english_postagger
import termextract.core
from pprint import pprint  # used to pretty-print the results in this sample
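# Read the sample text; the with-block closes the file automatically
# (the original `f.close` lacked parentheses and never closed it).
with open("eng_sample.txt", "r", encoding="utf-8") as f:
    text = f.read()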
print(text)
import nltk
# Tokenize and POS-tag the text (the NLTK tokenizer and tagger data must be downloaded beforehand)
tagged_text = nltk.pos_tag(nltk.word_tokenize(text))
print(tagged_text)
# Extract compound nouns from the tagged text together with their frequencies
frequency = termextract.english_postagger.cmp_noun_dict(tagged_text)
pprint(frequency)
#term_list = termextract.english_postagger.cmp_noun_list(tagged_text)
#pprint(term_list)
LR = termextract.core.score_lr(
    frequency,
    ignore_words=termextract.english_postagger.IGNORE_WORDS,
    lr_mode=1,
    average_rate=1,
)
pprint(LR)
# Combine the frequency and LR scores into a per-term importance score
term_imp = termextract.core.term_importance(frequency, LR)
pprint(term_imp)
import collections
data_collection = collections.Counter(term_imp)
# Print the terms in descending order of importance, tab-separated
for cmp_noun, value in data_collection.most_common():
    print(cmp_noun, value, sep="\t")