
Automatic Domain Terminology Extraction System
Welcome to "Gensen Web"


     You can extract valuable domain-specific terms from Web pages or from text you input. The extracted terms are sorted and displayed in descending order of importance; in other words, they are well-selected terms, hence the system's name "Gensen," which means "well selected."
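How "importance" is computed is not described on this page. As a hedged illustration only, the sketch below assumes a frequency × adjacency score of the kind associated with TermExtract: a candidate compound term ranks higher when it is frequent and when its component words combine with many other words inside compound terms. The function name `rank_terms` and the input format (one tuple of words per occurrence of a candidate compound noun) are hypothetical, and this is a simplified sketch, not Gensen Web's actual code.

```python
from collections import Counter
from math import prod


def rank_terms(candidates):
    """Rank candidate compound terms by a simplified frequency x adjacency score.

    candidates: list of term occurrences, each a tuple of words
    (assumed to be compound nouns already produced by a POS tagger).
    """
    freq = Counter(candidates)  # how often each whole term occurs
    left = Counter()   # occurrences of each word with another word to its left
    right = Counter()  # occurrences of each word with another word to its right
    for term in candidates:
        for i, word in enumerate(term):
            if i > 0:
                left[word] += 1
            if i < len(term) - 1:
                right[word] += 1

    scores = {}
    for term, f in freq.items():
        # Geometric mean of (left + 1) * (right + 1) over the term's words;
        # the +1 keeps isolated words from zeroing out the product.
        lr = prod((left[w] + 1) * (right[w] + 1) for w in term) ** (1 / (2 * len(term)))
        scores[term] = f * lr
    # Descending order of importance, as the system displays its results.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    occurrences = [("term", "extraction"), ("term", "extraction"),
                   ("extraction", "system"), ("system",)]
    for term, score in rank_terms(occurrences):
        print(" ".join(term), round(score, 3))
```

With the sample input, "term extraction" ranks first because it is both the most frequent term and made of words that recur in other compounds.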

     "Gensen Web" is a Web version of the original term extraction system "TermExtract," written in Perl. Its functionality is somewhat limited compared with the original stand-alone version.


  1. Enter the URL of a Web page (HTML or PDF) from which you want to extract terms, input a document directly (for example, by copying and pasting it), or select a file on your local PC (text file or PDF only).
  2. Choose the POS tagger version: high quality but slow, or high speed with slightly lower quality.
  3. Click the "start" button.
  4. Wait a moment; the extracted terms are then displayed, sorted by importance.
Input URL

Input (or copy and paste) document

Select local file (text file or PDF, UTF-8 encoding only)

POS tagger version:
  Japanese | Chinese (simplified) | English (high quality but slow)
  High-speed version: English | French | German | Italian | Spanish | Finnish | Swedish
  Auto (powered by the Perl module Lingua::LanguageGuesser)

Perplexity mode
The "Perplexity mode" scores the importance of terms in context based on the "diversity of information."
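The exact formula behind "Perplexity mode" is not given here. As a hedged illustration of scoring by "diversity of information," the sketch below assumes a term's importance grows with the perplexity (2 raised to the Shannon entropy) of the words observed immediately to its left and right in the text. The function names and the `<BOS>`/`<EOS>` boundary markers are hypothetical; this is not TermExtract's actual implementation.

```python
from collections import Counter
from math import log2


def entropy(counter):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counter.values())
    return -sum((c / total) * log2(c / total) for c in counter.values())


def perplexity_score(tokens, term):
    """Score `term` (a tuple of words) by the diversity of its neighbors in `tokens`.

    Returns 2 ** (H(left neighbors) + H(right neighbors)),
    or 0.0 if the term never occurs.
    """
    n = len(term)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == term:
            # Hypothetical markers stand in for missing neighbors at text boundaries.
            left[tokens[i - 1] if i > 0 else "<BOS>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "<EOS>"] += 1
    if not left:
        return 0.0
    return 2 ** (entropy(left) + entropy(right))


if __name__ == "__main__":
    text = "the term extraction tool uses term extraction".split()
    print(perplexity_score(text, ("term", "extraction")))
```

A term that always appears in the same context scores 1.0, while one whose neighbors vary widely scores higher; adding the left and right entropies before exponentiating treats the two sides as independent, which is a simplifying assumption.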

Introduction to the stand-alone system "termex" (in Japanese)

Introduction to the text mining tool "termmi" (in Japanese)

Documentation of the Perl module "TermExtract" (in Japanese)

Documentation of the Python 3 module "termextract" (in Japanese)


Comments welcome to