The Commons

Patent Title: Method and system for extracting pairs of multilingual terminology from an aligned multilingual text

Assignee: IBM
Patent Number: US6236958
Issue Date: 05-22-2001
Application Number:
File Date: 05-15-1998


Abstract: A terminology extraction system which allows for automatic creation of bilingual terminology has a source text which comprises at least one sequence of source terms, aligned with a target text which also comprises at least one sequence of target terms. A term extractor builds a network from each source and target sequence, wherein each node of the network comprises at least one term, such that each combination of source terms is included within one source node and each combination of target terms is included within one target node. The term extractor links each source node with each target node and, through a flow optimization method, selects relevant links in the resulting network. Once the term extractor has been run on the entire set of aligned sequences, a term statistics circuit computes an association score for each pair of linked source/target terms. Finally, the scored pairs of linked source/target terms that are considered relevant bilingual terms are stored in a bilingual terminology database. The whole process can be iterated in order to improve the strength of the bilingual links.
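
The node-building, linking, and scoring steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes candidate terms are contiguous n-grams, it omits the flow optimization step that selects relevant links, and it uses the Dice coefficient as a stand-in association score, since the abstract does not name one. The function names `candidate_terms` and `dice_scores`, the `max_len` parameter, and the sample sentences are all hypothetical.

```python
from collections import Counter
from itertools import product


def candidate_terms(tokens, max_len=2):
    """Return every contiguous n-gram (up to max_len tokens) as a candidate term.

    Assumption: a "node" is approximated here by one contiguous n-gram.
    """
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def dice_scores(aligned_pairs, max_len=2):
    """Score each source/target term pair with the Dice coefficient.

    Assumption: Dice stands in for the unspecified association score, and
    every link is kept (no flow optimization is performed).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_terms = set(candidate_terms(src_tokens, max_len))
        tgt_terms = set(candidate_terms(tgt_tokens, max_len))
        src_count.update(src_terms)
        tgt_count.update(tgt_terms)
        # Link every source node with every target node of this aligned pair.
        pair_count.update(product(src_terms, tgt_terms))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


# Hypothetical aligned English/French sequences, already tokenized.
aligned = [
    (["hard", "disk"], ["disque", "dur"]),
    (["hard", "copy"], ["copie", "papier"]),
    (["floppy", "disk"], ["disque", "souple"]),
]
scores = dice_scores(aligned)
```

In this toy corpus, "disk" and "disque" co-occur in both sequences where either appears, so the pair scores higher than "hard" and "disque", which share only one sequence. A threshold on such scores would be one plausible way to decide which pairs count as relevant bilingual terms.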

Notes:

IBM Pledge dated 1/11/2005