Term extraction using an English POS tagger (nltk.word_tokenize + nltk.pos_tag)

Import the modules

In [1]:
import termextract.english_postagger
import termextract.core

from pprint import pprint  # pretty-prints the results in this sample

Read the English plain text

The text is an excerpt from the Wikipedia article "A.I. Artificial Intelligence" ( https://en.wikipedia.org/wiki/A.I._Artificial_Intelligence )

In [2]:
with open("eng_sample_s.txt", "r", encoding="utf-8") as f:
    text = f.read()
print(text)
Artificial intelligence (AI) is the intelligence exhibited by machines. In computer science, an ideal "intelligent" machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at an arbitrary goal.[1] Colloquially, the term "artificial intelligence" is likely to be applied when a machine uses cutting-edge techniques to competently perform or mimic "cognitive" functions that we intuitively associate with human minds, such as "learning" and "problem solving".[2] The colloquial connotation, especially among the public, associates artificial intelligence with machines that are "cutting-edge" (or even "mysterious"). This subjective borderline around what constitutes "artificial intelligence" tends to shrink over time; for example, optical character recognition is no longer perceived as an exemplar of "artificial intelligence" as it is nowadays a mundane routine technology.[3] Modern examples of AI include computers that can beat professional players at Chess and Go, and self-driving cars that navigate crowded city streets.

AI research is highly technical and specialized, and is deeply divided into subfields that often fail to communicate with each other.[4] Some of the division is due to social and cultural factors: subfields have grown up around particular institutions and the work of individual researchers. AI research is also divided by several technical issues. Some subfields focus on the solution of specific problems. Others focus on one of several possible approaches or on the use of a particular tool or towards the accomplishment of particular applications.

The central problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing (communication), perception and the ability to move and manipulate objects.[5] General intelligence is still among the field's long-term goals.[6] Currently popular approaches include statistical methods, computational intelligence and traditional symbolic AI. There are a large number of tools used in AI, including versions of search and mathematical optimization, logic, methods based on probability and economics, and many others. The AI field is interdisciplinary, in which a number of sciences and professions converge, including computer science, mathematics, psychology, linguistics, philosophy and neuroscience, as well as other specialized fields such as artificial psychology.

The field was founded on the claim that a central property of humans, human intelligence—the sapience of Homo sapiens sapiens—"can be so precisely described that a machine can be made to simulate it."[7] This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity.[8] Artificial intelligence has been the subject of tremendous optimism[9] but has also suffered stunning setbacks.[10] Today AI techniques have become an essential part of the technology industry, providing the heavy lifting for many of the most challenging problems in computer science.[11]


Process the English text with NLTK (tokenize with nltk.word_tokenize, tag with nltk.pos_tag)

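nltk (a Python natural language processing package) must be installed beforehand, and its tokenizer and tagger data must be downloaded with nltk.download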

In [5]:
import nltk
tagged_text = nltk.pos_tag(nltk.word_tokenize(text))
print(tagged_text)
[('Artificial', 'JJ'), ('intelligence', 'NN'), ('(', '('), ('AI', 'NNP'), (')', ')'), ('is', 'VBZ'), ('the', 'DT'), ('intelligence', 'NN'), ('exhibited', 'VBN'), ('by', 'IN'), ('machines', 'NNS'), ('.', '.'), ('In', 'IN'), ('computer', 'NN'), ('science', 'NN'), (',', ','), ('an', 'DT'), ('ideal', 'NN'), ('``', '``'), ('intelligent', 'JJ'), ("''", "''"), ('machine', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('flexible', 'JJ'), ('rational', 'JJ'), ('agent', 'NN'), ('that', 'WDT'), ('perceives', 'VBZ'), ('its', 'PRP$'), ('environment', 'NN'), ('and', 'CC'), ('takes', 'VBZ'), ('actions', 'NNS'), ('that', 'IN'), ('maximize', 'VB'), ('its', 'PRP$'), ('chance', 'NN'), ('of', 'IN'), ('success', 'NN'), ('at', 'IN'), ('an', 'DT'), ('arbitrary', 'JJ'), ('goal', 'NN'), ('.', '.'), ('[', 'CC'), ('1', 'CD'), (']', 'NNP'), ('Colloquially', 'NNP'), (',', ','), ('the', 'DT'), ('term', 'NN'), ('``', '``'), ('artificial', 'JJ'), ('intelligence', 'NN'), ("''", "''"), ('is', 'VBZ'), ('likely', 'JJ'), ('to', 'TO'), ('be', 'VB'), ('applied', 'VBN'), ('when', 'WRB'), ('a', 'DT'), ('machine', 'NN'), ('uses', 'VBZ'), ('cutting-edge', 'NN'), ('techniques', 'NNS'), ('to', 'TO'), ('competently', 'VB'), ('perform', 'NN'), ('or', 'CC'), ('mimic', 'VB'), ('``', '``'), ('cognitive', 'JJ'), ("''", "''"), ('functions', 'NNS'), ('that', 'IN'), ('we', 'PRP'), ('intuitively', 'RB'), ('associate', 'VBP'), ('with', 'IN'), ('human', 'JJ'), ('minds', 'NNS'), (',', ','), ('such', 'JJ'), ('as', 'IN'), ('``', '``'), ('learning', 'VBG'), ("''", "''"), ('and', 'CC'), ('``', '``'), ('problem', 'NN'), ('solving', 'NN'), ("''", "''"), ('.', '.'), ('[', 'VB'), ('2', 'CD'), (']', 'IN'), ('The', 'DT'), ('colloquial', 'JJ'), ('connotation', 'NN'), (',', ','), ('especially', 'RB'), ('among', 'IN'), ('the', 'DT'), ('public', 'NN'), (',', ','), ('associates', 'VBZ'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('with', 'IN'), ('machines', 'NNS'), ('that', 'WDT'), ('are', 'VBP'), ('``', '``'), ('cutting-edge', 'JJ'), ("''", 
"''"), ('(', '('), ('or', 'CC'), ('even', 'RB'), ('``', '``'), ('mysterious', 'JJ'), ("''", "''"), (')', ')'), ('.', '.'), ('This', 'DT'), ('subjective', 'JJ'), ('borderline', 'NN'), ('around', 'IN'), ('what', 'WP'), ('constitutes', 'VBZ'), ('``', '``'), ('artificial', 'JJ'), ('intelligence', 'NN'), ("''", "''"), ('tends', 'VBZ'), ('to', 'TO'), ('shrink', 'VB'), ('over', 'IN'), ('time', 'NN'), (';', ':'), ('for', 'IN'), ('example', 'NN'), (',', ','), ('optical', 'JJ'), ('character', 'NN'), ('recognition', 'NN'), ('is', 'VBZ'), ('no', 'RB'), ('longer', 'RB'), ('perceived', 'VBN'), ('as', 'IN'), ('an', 'DT'), ('exemplar', 'NN'), ('of', 'IN'), ('``', '``'), ('artificial', 'JJ'), ('intelligence', 'NN'), ("''", "''"), ('as', 'IN'), ('it', 'PRP'), ('is', 'VBZ'), ('nowadays', 'RB'), ('a', 'DT'), ('mundane', 'JJ'), ('routine', 'NN'), ('technology', 'NN'), ('.', '.'), ('[', 'CC'), ('3', 'CD'), (']', 'JJ'), ('Modern', 'NNP'), ('examples', 'NNS'), ('of', 'IN'), ('AI', 'NNP'), ('include', 'VBP'), ('computers', 'NNS'), ('that', 'WDT'), ('can', 'MD'), ('beat', 'VB'), ('professional', 'JJ'), ('players', 'NNS'), ('at', 'IN'), ('Chess', 'NNP'), ('and', 'CC'), ('Go', 'NNP'), (',', ','), ('and', 'CC'), ('self-driving', 'JJ'), ('cars', 'NNS'), ('that', 'WDT'), ('navigate', 'VBP'), ('crowded', 'VBN'), ('city', 'NN'), ('streets', 'NNS'), ('.', '.'), ('AI', 'NNP'), ('research', 'NN'), ('is', 'VBZ'), ('highly', 'RB'), ('technical', 'JJ'), ('and', 'CC'), ('specialized', 'JJ'), (',', ','), ('and', 'CC'), ('is', 'VBZ'), ('deeply', 'RB'), ('divided', 'VBN'), ('into', 'IN'), ('subfields', 'NNS'), ('that', 'WDT'), ('often', 'RB'), ('fail', 'VBP'), ('to', 'TO'), ('communicate', 'VB'), ('with', 'IN'), ('each', 'DT'), ('other', 'JJ'), ('.', '.'), ('[', '$'), ('4', 'CD'), (']', 'NNP'), ('Some', 'DT'), ('of', 'IN'), ('the', 'DT'), ('division', 'NN'), ('is', 'VBZ'), ('due', 'JJ'), ('to', 'TO'), ('social', 'JJ'), ('and', 'CC'), ('cultural', 'JJ'), ('factors', 'NNS'), (':', ':'), ('subfields', 'NNS'), 
('have', 'VBP'), ('grown', 'VBN'), ('up', 'RP'), ('around', 'IN'), ('particular', 'JJ'), ('institutions', 'NNS'), ('and', 'CC'), ('the', 'DT'), ('work', 'NN'), ('of', 'IN'), ('individual', 'JJ'), ('researchers', 'NNS'), ('.', '.'), ('AI', 'NNP'), ('research', 'NN'), ('is', 'VBZ'), ('also', 'RB'), ('divided', 'VBN'), ('by', 'IN'), ('several', 'JJ'), ('technical', 'JJ'), ('issues', 'NNS'), ('.', '.'), ('Some', 'DT'), ('subfields', 'NNS'), ('focus', 'VBP'), ('on', 'IN'), ('the', 'DT'), ('solution', 'NN'), ('of', 'IN'), ('specific', 'JJ'), ('problems', 'NNS'), ('.', '.'), ('Others', 'NNS'), ('focus', 'VBP'), ('on', 'IN'), ('one', 'CD'), ('of', 'IN'), ('several', 'JJ'), ('possible', 'JJ'), ('approaches', 'NNS'), ('or', 'CC'), ('on', 'IN'), ('the', 'DT'), ('use', 'NN'), ('of', 'IN'), ('a', 'DT'), ('particular', 'JJ'), ('tool', 'NN'), ('or', 'CC'), ('towards', 'VB'), ('the', 'DT'), ('accomplishment', 'NN'), ('of', 'IN'), ('particular', 'JJ'), ('applications', 'NNS'), ('.', '.'), ('The', 'DT'), ('central', 'JJ'), ('problems', 'NNS'), ('(', '('), ('or', 'CC'), ('goals', 'NNS'), (')', ')'), ('of', 'IN'), ('AI', 'NNP'), ('research', 'NN'), ('include', 'VBP'), ('reasoning', 'VBG'), (',', ','), ('knowledge', 'NN'), (',', ','), ('planning', 'NN'), (',', ','), ('learning', 'NN'), (',', ','), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('(', '('), ('communication', 'NN'), (')', ')'), (',', ','), ('perception', 'NN'), ('and', 'CC'), ('the', 'DT'), ('ability', 'NN'), ('to', 'TO'), ('move', 'VB'), ('and', 'CC'), ('manipulate', 'VB'), ('objects', 'NNS'), ('.', '.'), ('[', '$'), ('5', 'CD'), (']', 'NNP'), ('General', 'NNP'), ('intelligence', 'NN'), ('is', 'VBZ'), ('still', 'RB'), ('among', 'IN'), ('the', 'DT'), ('field', 'NN'), ("'s", 'POS'), ('long-term', 'JJ'), ('goals', 'NNS'), ('.', '.'), ('[', '$'), ('6', 'CD'), (']', 'NNP'), ('Currently', 'NNP'), ('popular', 'JJ'), ('approaches', 'NNS'), ('include', 'VBP'), ('statistical', 'JJ'), ('methods', 'NNS'), (',', ','), 
('computational', 'JJ'), ('intelligence', 'NN'), ('and', 'CC'), ('traditional', 'JJ'), ('symbolic', 'JJ'), ('AI', 'NNP'), ('.', '.'), ('There', 'EX'), ('are', 'VBP'), ('a', 'DT'), ('large', 'JJ'), ('number', 'NN'), ('of', 'IN'), ('tools', 'NNS'), ('used', 'VBN'), ('in', 'IN'), ('AI', 'NNP'), (',', ','), ('including', 'VBG'), ('versions', 'NNS'), ('of', 'IN'), ('search', 'NN'), ('and', 'CC'), ('mathematical', 'JJ'), ('optimization', 'NN'), (',', ','), ('logic', 'NN'), (',', ','), ('methods', 'NNS'), ('based', 'VBN'), ('on', 'IN'), ('probability', 'NN'), ('and', 'CC'), ('economics', 'NNS'), (',', ','), ('and', 'CC'), ('many', 'JJ'), ('others', 'NNS'), ('.', '.'), ('The', 'DT'), ('AI', 'NNP'), ('field', 'NN'), ('is', 'VBZ'), ('interdisciplinary', 'JJ'), (',', ','), ('in', 'IN'), ('which', 'WDT'), ('a', 'DT'), ('number', 'NN'), ('of', 'IN'), ('sciences', 'NNS'), ('and', 'CC'), ('professions', 'NNS'), ('converge', 'VBP'), (',', ','), ('including', 'VBG'), ('computer', 'NN'), ('science', 'NN'), (',', ','), ('mathematics', 'NNS'), (',', ','), ('psychology', 'NN'), (',', ','), ('linguistics', 'NNS'), (',', ','), ('philosophy', 'NN'), ('and', 'CC'), ('neuroscience', 'NN'), (',', ','), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('other', 'JJ'), ('specialized', 'JJ'), ('fields', 'NNS'), ('such', 'JJ'), ('as', 'IN'), ('artificial', 'JJ'), ('psychology', 'NN'), ('.', '.'), ('The', 'DT'), ('field', 'NN'), ('was', 'VBD'), ('founded', 'VBN'), ('on', 'IN'), ('the', 'DT'), ('claim', 'NN'), ('that', 'IN'), ('a', 'DT'), ('central', 'JJ'), ('property', 'NN'), ('of', 'IN'), ('humans', 'NNS'), (',', ','), ('human', 'JJ'), ('intelligence—the', 'NN'), ('sapience', 'NN'), ('of', 'IN'), ('Homo', 'NNP'), ('sapiens', 'JJ'), ('sapiens—', 'NN'), ("''", "''"), ('can', 'MD'), ('be', 'VB'), ('so', 'RB'), ('precisely', 'RB'), ('described', 'VBN'), ('that', 'IN'), ('a', 'DT'), ('machine', 'NN'), ('can', 'MD'), ('be', 'VB'), ('made', 'VBN'), ('to', 'TO'), ('simulate', 'VB'), ('it', 'PRP'), ('.', 
'.'), ('``', '``'), ('[', 'JJ'), ('7', 'CD'), (']', 'NN'), ('This', 'DT'), ('raises', 'VBZ'), ('philosophical', 'JJ'), ('arguments', 'NNS'), ('about', 'IN'), ('the', 'DT'), ('nature', 'NN'), ('of', 'IN'), ('the', 'DT'), ('mind', 'NN'), ('and', 'CC'), ('the', 'DT'), ('ethics', 'NNS'), ('of', 'IN'), ('creating', 'VBG'), ('artificial', 'JJ'), ('beings', 'NNS'), ('endowed', 'VBN'), ('with', 'IN'), ('human-like', 'JJ'), ('intelligence', 'NN'), (',', ','), ('issues', 'NNS'), ('which', 'WDT'), ('have', 'VBP'), ('been', 'VBN'), ('explored', 'VBN'), ('by', 'IN'), ('myth', 'NN'), (',', ','), ('fiction', 'NN'), ('and', 'CC'), ('philosophy', 'NN'), ('since', 'IN'), ('antiquity', 'NN'), ('.', '.'), ('[', 'CC'), ('8', 'CD'), (']', 'JJ'), ('Artificial', 'NNP'), ('intelligence', 'NN'), ('has', 'VBZ'), ('been', 'VBN'), ('the', 'DT'), ('subject', 'NN'), ('of', 'IN'), ('tremendous', 'JJ'), ('optimism', 'NN'), ('[', 'VBD'), ('9', 'CD'), (']', 'NN'), ('but', 'CC'), ('has', 'VBZ'), ('also', 'RB'), ('suffered', 'VBN'), ('stunning', 'JJ'), ('setbacks', 'NNS'), ('.', '.'), ('[', '$'), ('10', 'CD'), (']', 'NN'), ('Today', 'NNP'), ('AI', 'NNP'), ('techniques', 'NNS'), ('have', 'VBP'), ('become', 'VBN'), ('an', 'DT'), ('essential', 'JJ'), ('part', 'NN'), ('of', 'IN'), ('the', 'DT'), ('technology', 'NN'), ('industry', 'NN'), (',', ','), ('providing', 'VBG'), ('the', 'DT'), ('heavy', 'JJ'), ('lifting', 'NN'), ('for', 'IN'), ('many', 'JJ'), ('of', 'IN'), ('the', 'DT'), ('most', 'RBS'), ('challenging', 'JJ'), ('problems', 'NNS'), ('in', 'IN'), ('computer', 'NN'), ('science', 'NN'), ('.', '.'), ('[', 'CC'), ('11', 'CD'), (']', 'NN')]
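In the tagged output above, Wikipedia citation markers such as [1] are split into separate tokens and receive arbitrary tags (for example '[' as CC and ']' as NNP). One possible preprocessing step, which is not part of the original sample, is to strip these markers before tokenizing. The following is a minimal sketch using only the standard library; strip_citations is a hypothetical helper name.

```python
import re

# Matches bracketed numeric citation markers such as "[1]" or "[11]".
CITATION_RE = re.compile(r"\[\d+\]")


def strip_citations(text):
    """Remove Wikipedia-style numeric citation markers from text."""
    return CITATION_RE.sub("", text)


sample = 'machines.[1] Colloquially, the term "artificial intelligence"'
cleaned = strip_citations(sample)
print(cleaned)
```

The cleaned string could then be passed to nltk.word_tokenize in place of text.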

Extract compound nouns (available as either a dictionary or a list)

In [6]:
frequency = termextract.english_postagger.cmp_noun_dict(tagged_text)
pprint(frequency)

# Alternative: get the compound nouns as a list instead of a dictionary
#term_list = termextract.english_postagger.cmp_noun_list(tagged_text)
#pprint(term_list)
{'AI': 2,
 'AI field': 1,
 'AI research': 3,
 'Artificial intelligence': 2,
 'Chess': 1,
 'Colloquially': 1,
 'Currently': 1,
 'General intelligence': 1,
 'Go': 1,
 'Modern example of AI': 1,
 'Other': 1,
 'Today AI technique': 1,
 'ability': 1,
 'accomplishment': 1,
 'action': 1,
 'antiquity': 1,
 'applied': 1,
 'arbitrary goal': 1,
 'artificial being': 1,
 'artificial intelligence': 4,
 'artificial psychology': 1,
 'based': 1,
 'become': 1,
 'been': 2,
 'central problem': 1,
 'central property of human': 1,
 'challenging problem': 1,
 'chance of success': 1,
 'claim': 1,
 'cognitive': 1,
 'colloquial connotation': 1,
 'communication': 1,
 'computational intelligence': 1,
 'computer': 1,
 'computer science': 3,
 'crowded city street': 1,
 'cultural factor': 1,
 'cutting-edge': 1,
 'cutting-edge technique': 1,
 'described': 1,
 'divided': 2,
 'division': 1,
 'due': 1,
 'economic': 1,
 'endowed': 1,
 'environment': 1,
 'essential part': 1,
 'ethic': 1,
 'example': 1,
 'exemplar': 1,
 'exhibited': 1,
 'explored': 1,
 'fiction': 1,
 'field': 1,
 "field 's long-term goal": 1,
 'flexible rational agent': 1,
 'founded': 1,
 'function': 1,
 'goal': 1,
 'grown': 1,
 'heavy lifting': 1,
 'human intelligence—the sapience of Homo': 1,
 'human mind': 1,
 'human-like intelligence': 1,
 'ideal': 1,
 'individual researcher': 1,
 'intelligence': 1,
 'intelligent': 1,
 'interdisciplinary': 1,
 'issue': 1,
 'knowledge': 1,
 'large number of tool': 1,
 'learning': 1,
 'likely': 1,
 'linguistic': 1,
 'logic': 1,
 'machine': 5,
 'made': 1,
 'many': 1,
 'many other': 1,
 'mathematic': 1,
 'mathematical optimization': 1,
 'method': 1,
 'mind': 1,
 'mundane routine technology': 1,
 'mysterious': 1,
 'myth': 1,
 'natural language processing': 1,
 'nature': 1,
 'neuroscience': 1,
 'number of science': 1,
 'object': 1,
 'one': 1,
 'optical character recognition': 1,
 'other': 1,
 'other specialized field': 1,
 'particular application': 1,
 'particular institution': 1,
 'particular tool': 1,
 'perceived': 1,
 'perception': 1,
 'perform': 1,
 'philosophical argument': 1,
 'philosophy': 2,
 'planning': 1,
 'popular approache': 1,
 'probability': 1,
 'problem solving': 1,
 'professional player': 1,
 'professions': 1,
 'psychology': 1,
 'public': 1,
 'sapiens sapiens—': 1,
 'self-driving car': 1,
 'several possible approache': 1,
 'several technical issue': 1,
 'social': 1,
 'solution': 1,
 'specialized': 1,
 'specific problem': 1,
 'statistical method': 1,
 'stunning setback': 1,
 'subfield': 3,
 'subject': 1,
 'subjective borderline': 1,
 'such': 2,
 'suffered': 1,
 'technical': 1,
 'technology industry': 1,
 'term': 1,
 'time': 1,
 'traditional symbolic AI': 1,
 'tremendous optimism': 1,
 'use': 1,
 'used': 1,
 'version of search': 1,
 'work': 1}
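
To give a feel for what the extraction step does, here is a deliberately simplified sketch that groups runs of adjacent adjective and noun tokens into candidate terms and counts them. It is not the algorithm used by termextract.english_postagger.cmp_noun_dict, which, judging from the output above, also singularizes nouns, keeps some participles, and joins terms across "of". The function name simple_noun_phrases and the tag rule are assumptions made for illustration.

```python
from collections import Counter


def simple_noun_phrases(tagged_tokens):
    """Count runs of adjacent JJ*/NN* tokens as candidate terms."""
    counts = Counter()
    current = []
    for word, tag in tagged_tokens:
        if tag.startswith(("JJ", "NN")):
            current.append(word)
        else:
            if current:
                counts[" ".join(current)] += 1
            current = []
    if current:  # flush a run that reaches the end of the input
        counts[" ".join(current)] += 1
    return dict(counts)


# First sentence of the tagged output shown earlier.
sample = [
    ("Artificial", "JJ"), ("intelligence", "NN"), ("(", "("),
    ("AI", "NNP"), (")", ")"), ("is", "VBZ"), ("the", "DT"),
    ("intelligence", "NN"), ("exhibited", "VBN"), ("by", "IN"),
    ("machines", "NNS"), (".", "."),
]
candidates = simple_noun_phrases(sample)
print(candidates)
```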

Compute LR (left/right connection) scores from the frequencies

In [9]:
lr = termextract.core.score_lr(
    frequency,
    ignore_words=termextract.english_postagger.IGNORE_WORDS,
    lr_mode=1, average_rate=1)
pprint(lr)
{'AI': 4.898979485566356,
 'AI field': 3.4641016151377544,
 'AI research': 3.1301691601465746,
 'Artificial intelligence': 2.340347319320716,
 'Chess': 1.0,
 'Colloquially': 1.0,
 'Currently': 1.0,
 'General intelligence': 2.114742526881128,
 'Go': 1.0,
 'Modern example of AI': 1.9293572599206188,
 'Other': 1.0,
 'Today AI technique': 2.2894284851066637,
 'ability': 1.0,
 'accomplishment': 1.0,
 'action': 1.0,
 'antiquity': 1.0,
 'applied': 1.0,
 'arbitrary goal': 1.5650845800732873,
 'artificial being': 1.9343364202676694,
 'artificial intelligence': 2.892507608519078,
 'artificial psychology': 1.9343364202676694,
 'based': 1.0,
 'become': 1.0,
 'been': 1.0,
 'central problem': 2.213363839400643,
 'central property of human': 1.7067368368450775,
 'challenging problem': 2.0,
 'chance of success': 1.2599210498948732,
 'claim': 1.0,
 'cognitive': 1.0,
 'colloquial connotation': 1.4142135623730951,
 'communication': 1.0,
 'computational intelligence': 2.114742526881128,
 'computer': 2.0,
 'computer science': 2.114742526881128,
 'crowded city street': 1.5874010519681994,
 'cultural factor': 1.4142135623730951,
 'cutting-edge': 1.4142135623730951,
 'cutting-edge technique': 1.5650845800732873,
 'described': 1.0,
 'divided': 1.0,
 'division': 1.0,
 'due': 1.0,
 'economic': 1.0,
 'endowed': 1.0,
 'environment': 1.0,
 'essential part': 1.4142135623730951,
 'ethic': 1.0,
 'example': 2.0,
 'exemplar': 1.0,
 'exhibited': 1.0,
 'explored': 1.0,
 'fiction': 1.0,
 'field': 2.449489742783178,
 "field 's long-term goal": 2.0296635898134046,
 'flexible rational agent': 1.5874010519681994,
 'founded': 1.0,
 'function': 1.0,
 'goal': 1.7320508075688772,
 'grown': 1.0,
 'heavy lifting': 1.4142135623730951,
 'human intelligence—the sapience of Homo': 1.6917263851493571,
 'human mind': 1.8612097182041991,
 'human-like intelligence': 2.114742526881128,
 'ideal': 1.0,
 'individual researcher': 1.4142135623730951,
 'intelligence': 3.1622776601683795,
 'intelligent': 1.0,
 'interdisciplinary': 1.0,
 'issue': 1.4142135623730951,
 'knowledge': 1.0,
 'large number of tool': 1.5650845800732873,
 'learning': 1.0,
 'likely': 1.0,
 'linguistic': 1.0,
 'logic': 1.0,
 'machine': 1.0,
 'made': 1.0,
 'many': 1.4142135623730951,
 'many other': 1.681792830507429,
 'mathematic': 1.0,
 'mathematical optimization': 1.4142135623730951,
 'method': 1.4142135623730951,
 'mind': 1.4142135623730951,
 'mundane routine technology': 1.7817974362806785,
 'mysterious': 1.0,
 'myth': 1.0,
 'natural language processing': 1.5874010519681994,
 'nature': 1.0,
 'neuroscience': 1.0,
 'number of science': 1.762734383267615,
 'object': 1.0,
 'one': 1.0,
 'optical character recognition': 1.5874010519681994,
 'other': 2.0,
 'other specialized field': 2.1398263878673256,
 'particular application': 1.681792830507429,
 'particular institution': 1.681792830507429,
 'particular tool': 1.8612097182041991,
 'perceived': 1.0,
 'perception': 1.0,
 'perform': 1.0,
 'philosophical argument': 1.4142135623730951,
 'philosophy': 1.0,
 'planning': 1.0,
 'popular approache': 1.5650845800732873,
 'probability': 1.0,
 'problem solving': 2.0,
 'professional player': 1.4142135623730951,
 'professions': 1.0,
 'psychology': 1.4142135623730951,
 'public': 1.0,
 'sapiens sapiens—': 1.4142135623730951,
 'self-driving car': 1.4142135623730951,
 'several possible approache': 1.8171205928321397,
 'several technical issue': 1.6983813295649528,
 'social': 1.0,
 'solution': 1.0,
 'specialized': 2.0,
 'specific problem': 2.0,
 'statistical method': 1.4142135623730951,
 'stunning setback': 1.4142135623730951,
 'subfield': 1.0,
 'subject': 1.0,
 'subjective borderline': 1.4142135623730951,
 'such': 1.0,
 'suffered': 1.0,
 'technical': 2.0,
 'technology industry': 1.681792830507429,
 'term': 1.0,
 'time': 1.0,
 'traditional symbolic AI': 2.4018739103520055,
 'tremendous optimism': 1.4142135623730951,
 'use': 1.0,
 'used': 1.0,
 'version of search': 1.2599210498948732,
 'work': 1.0}
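
The scores above are consistent with the following reading of the LR score. For each word, count how often it is followed by another word inside a compound term and how often it is preceded by one, weighting each occurrence by the term's frequency. The score of a term with n counted words is then the 2n-th root of the product of (followed + 1)(preceded + 1) over its words. The sketch below is a simplified reimplementation under that assumption, corresponding to the lr_mode=1, average_rate=1 setting used here. It is not the source of termextract.core.score_lr, and sketch_score_lr is a hypothetical name.

```python
def sketch_score_lr(frequency, ignore_words=frozenset()):
    """Simplified LR scoring: frequency-weighted left/right connection counts."""
    def split(term):
        return [w for w in term.split(" ") if w not in ignore_words]

    # stat[word] = [times followed by a word, times preceded by a word]
    stat = {}
    for term, freq in frequency.items():
        words = split(term)
        for left, right in zip(words, words[1:]):
            stat.setdefault(left, [0, 0])[0] += freq
            stat.setdefault(right, [0, 0])[1] += freq

    scores = {}
    for term in frequency:
        words = split(term)
        product = 1
        for word in words:
            followed, preceded = stat.get(word, (0, 0))
            product *= (followed + 1) * (preceded + 1)
        # Geometric mean over the 2 * n factors; max() guards against n == 0.
        scores[term] = product ** (1 / (2 * max(len(words), 1)))
    return scores


toy = {"AI": 2, "AI field": 1, "AI research": 3}
toy_lr = sketch_score_lr(toy, ignore_words={"of"})
print(toy_lr)
```

In this toy input, "AI" is followed by another word 4 times (once in "AI field", three times in "AI research") and never preceded, so its score is the square root of (4 + 1)(0 + 1) = 5. A term whose words never appear in any compound, such as 'machine' in the output above, scores 1.0.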

Compute term importance from the frequencies and LR scores

In [10]:
term_imp = termextract.core.term_importance(frequency, lr)
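
Comparing the frequency and LR values above with the importance values printed at the end of this notebook suggests that, with the default arguments used here, each importance value is the term's frequency multiplied by its LR score. For example, 'artificial intelligence' has frequency 4 and LR score 2.8925..., and its importance is 11.5700.... The sketch below only illustrates that observed relationship; sketch_term_importance is a hypothetical name and does not reproduce any other options the library function may offer.

```python
def sketch_term_importance(frequency, lr):
    """Multiply each term's frequency by its LR score (assumed combination)."""
    return {term: frequency[term] * lr[term] for term in frequency if term in lr}


# Values copied from the outputs above.
freq_sample = {"artificial intelligence": 4, "machine": 5}
lr_sample = {"artificial intelligence": 2.892507608519078, "machine": 1.0}
imp_sample = sketch_term_importance(freq_sample, lr_sample)
print(imp_sample)
```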

Display the terms in descending order of importance using collections

In [11]:
import collections
data_collection = collections.Counter(term_imp)
for cmp_noun, value in data_collection.most_common():
    print(cmp_noun, value, sep="\t")
artificial intelligence	11.570030434076312
AI	9.797958971132712
AI research	9.390507480439723
computer science	6.344227580643384
machine	5.0
Artificial intelligence	4.680694638641432
AI field	3.4641016151377544
intelligence	3.1622776601683795
subfield	3.0
field	2.449489742783178
traditional symbolic AI	2.4018739103520055
Today AI technique	2.2894284851066637
central problem	2.213363839400643
other specialized field	2.1398263878673256
General intelligence	2.114742526881128
computational intelligence	2.114742526881128
human-like intelligence	2.114742526881128
field 's long-term goal	2.0296635898134046
problem solving	2.0
divided	2.0
been	2.0
philosophy	2.0
technical	2.0
computer	2.0
other	2.0
example	2.0
such	2.0
challenging problem	2.0
specific problem	2.0
specialized	2.0
artificial psychology	1.9343364202676694
artificial being	1.9343364202676694
Modern example of AI	1.9293572599206188
particular tool	1.8612097182041991
human mind	1.8612097182041991
several possible approache	1.8171205928321397
mundane routine technology	1.7817974362806785
number of science	1.762734383267615
goal	1.7320508075688772
central property of human	1.7067368368450775
several technical issue	1.6983813295649528
human intelligence—the sapience of Homo	1.6917263851493571
technology industry	1.681792830507429
particular institution	1.681792830507429
particular application	1.681792830507429
many other	1.681792830507429
flexible rational agent	1.5874010519681994
natural language processing	1.5874010519681994
crowded city street	1.5874010519681994
optical character recognition	1.5874010519681994
popular approache	1.5650845800732873
large number of tool	1.5650845800732873
cutting-edge technique	1.5650845800732873
arbitrary goal	1.5650845800732873
issue	1.4142135623730951
cultural factor	1.4142135623730951
method	1.4142135623730951
professional player	1.4142135623730951
cutting-edge	1.4142135623730951
philosophical argument	1.4142135623730951
subjective borderline	1.4142135623730951
colloquial connotation	1.4142135623730951
psychology	1.4142135623730951
self-driving car	1.4142135623730951
tremendous optimism	1.4142135623730951
heavy lifting	1.4142135623730951
essential part	1.4142135623730951
mathematical optimization	1.4142135623730951
stunning setback	1.4142135623730951
mind	1.4142135623730951
sapiens sapiens—	1.4142135623730951
individual researcher	1.4142135623730951
many	1.4142135623730951
statistical method	1.4142135623730951
version of search	1.2599210498948732
chance of success	1.2599210498948732
time	1.0
intelligent	1.0
suffered	1.0
communication	1.0
neuroscience	1.0
endowed	1.0
mathematic	1.0
one	1.0
myth	1.0
knowledge	1.0
term	1.0
ability	1.0
due	1.0
Go	1.0
become	1.0
subject	1.0
environment	1.0
perceived	1.0
explored	1.0
use	1.0
interdisciplinary	1.0
economic	1.0
antiquity	1.0
function	1.0
logic	1.0
planning	1.0
cognitive	1.0
founded	1.0
social	1.0
perception	1.0
claim	1.0
Colloquially	1.0
applied	1.0
public	1.0
accomplishment	1.0
mysterious	1.0
division	1.0
solution	1.0
made	1.0
work	1.0
likely	1.0
exhibited	1.0
ethic	1.0
ideal	1.0
learning	1.0
used	1.0
Other	1.0
object	1.0
described	1.0
grown	1.0
Currently	1.0
based	1.0
probability	1.0
action	1.0
fiction	1.0
linguistic	1.0
nature	1.0
exemplar	1.0
perform	1.0
professions	1.0
Chess	1.0
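
Because the full ranking is long, it may be more practical to show only the highest-ranked terms. Counter.most_common accepts an optional count, and sorted with a key function gives an equivalent ranking without Counter. The following small sketch uses a few values copied from the output above; the variable names are illustrative.

```python
import collections

# A few importance values copied from the ranking above.
term_imp_sample = {
    "machine": 5.0,
    "artificial intelligence": 11.570030434076312,
    "AI": 9.797958971132712,
    "time": 1.0,
}

# Top 3 terms via Counter.most_common(n).
top3 = collections.Counter(term_imp_sample).most_common(3)

# The same ranking with sorted(), highest importance first.
top3_sorted = sorted(term_imp_sample.items(), key=lambda kv: kv[1], reverse=True)[:3]

for term, value in top3:
    print(term, value, sep="\t")
```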