NEC Develops Technology that Extracts Opinions from Text Data

Tokyo, Apr 14, 2009 - (JCN Newswire) - NEC Corporation announced today the development of a "sentence characteristic distribution calculation method" that extracts two kinds of sentences from written text: "opinion sentences," which express the feelings or subjective views of a writer, and "topic-related sentences," which concern a specific issue. The technology is designed for analyzing sentences that contain individuals' evaluations of corporate brands or products (reputation information) in a wide range of written content, including blogs and questionnaires.

In order to judge whether a sentence should be considered an opinion sentence or a topic-related sentence, this newly developed method focuses on the continuity of topics, and calculates the subjectivity or topicality of several sentences that appear before or after the target sentence.

In this way, the recall ratio (an index of comprehensiveness; see *1) for opinion sentence extraction has been increased by 21 percentage points, from 52% to 73%, compared to conventional methods that judge sentences individually, making it possible to extract more opinion sentences or topic-related sentences from the subject text.

For example, this method could determine an individual's ideas, calculate evaluation scores, or determine the ratio of approval vs. disapproval regarding a certain event, product, or service, from information on the Internet, such as blogs, electronic bulletin boards, questionnaire data, or records of inquiries at call centers. This information could then be used for corporate marketing activities.
The main features of the newly developed sentence characteristic distribution calculation method are outlined below.

1. Calculates scores for the subjectivity or topicality of multiple sentences
This method focuses on the general tendency of sentences to be written with continuity on a given topic. Machine learning technologies (*2) are used to determine how many opinion sentences there are in a given group of continuous sentences in a text (a "block" of sentences), in order to extract rules for evaluating the subjectivity or topicality of the block. These rules are applied to the blocks being evaluated to calculate the score.

2. Judgments on subjectivity or topicality of sentences based on sentence characteristic distribution
Using the method described in 1. above, the system calculates scores for all blocks within the text, and then calculates the distribution of subjectivity or topicality (sentence characteristic distribution) for the text as a whole. In this sentence characteristic distribution, sentences are judged as being opinion sentences or topic-related sentences if the scores exceed a specified threshold value.
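
The two steps above can be illustrated with a minimal Python sketch. It is not NEC's implementation: the release does not disclose the learned rules, so the sketch assumes that each sentence already has a score between 0 and 1 from some hypothetical classifier, that a block is a sliding window of consecutive sentences, and that a block's score is simply the mean of its sentence scores. The function names, the block size, and the threshold are all illustrative assumptions.

```python
from typing import List


def block_scores(sentence_scores: List[float], block_size: int) -> List[float]:
    """Score every block of consecutive sentences (step 1).

    Assumption: a block score is the mean of its sentence scores. The
    press release says the real scoring rules are learned from data.
    """
    n = len(sentence_scores)
    if n == 0:
        return []
    size = min(block_size, n)
    return [sum(sentence_scores[i:i + size]) / size for i in range(n - size + 1)]


def characteristic_distribution(sentence_scores: List[float], block_size: int) -> List[float]:
    """Give each sentence the average score of all blocks that contain it (step 2)."""
    n = len(sentence_scores)
    if n == 0:
        return []
    size = min(block_size, n)
    scores = block_scores(sentence_scores, block_size)
    distribution = []
    for j in range(n):
        first = max(0, j - size + 1)   # earliest block start that still covers sentence j
        last = min(j, n - size)        # latest block start that covers sentence j
        covering = scores[first:last + 1]
        distribution.append(sum(covering) / len(covering))
    return distribution


def extract(sentence_scores: List[float], block_size: int = 3, threshold: float = 0.5) -> List[int]:
    """Return indices of sentences whose distribution value exceeds the threshold."""
    distribution = characteristic_distribution(sentence_scores, block_size)
    return [j for j, value in enumerate(distribution) if value > threshold]


if __name__ == "__main__":
    # Hypothetical per-sentence subjectivity scores for a six-sentence text.
    scores = [0.9, 0.4, 0.8, 0.1, 0.0, 0.1]
    print(extract(scores, block_size=3, threshold=0.45))
```

With these made-up scores, judging each sentence on its own against the same threshold would skip the second sentence (score 0.4), while the block-based distribution keeps it because its neighbors score highly. That is the kind of recall gain the method is aiming for.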

In recent years, with the proliferation of the Internet, users have become able to transmit a wide range of information. This information includes many opinions and comments on news, products, and services, and it has attracted the attention of companies seeking information that can be used effectively in market surveys and in improvements to products and services. In the past, technologies extracted reputation information from blogs and other forms of user-generated content (UGC; *3) by specifying evaluation expressions (e.g., "good," "bad," "expensive," "cheap") along with the subject of the evaluation. In some cases, however, these technologies were unable to obtain evaluation information (opinion sentences), either because the sentences were very short (e.g., the subject of the evaluation was not included) or because they were complex, with the subject of the evaluation separated from the evaluation expressions. There has thus been a demand for technologies that offer more coverage of these types of sentences.
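
The limitation of the conventional approach can be seen in a small Python sketch. This is only a toy reconstruction under stated assumptions: the lexicon holds just the four example expressions quoted above, matching is done on whitespace-separated words, and the function name `mentions_evaluation` is hypothetical.

```python
# The four evaluation expressions given as examples in the release.
EVALUATION_EXPRESSIONS = {"good", "bad", "expensive", "cheap"}


def mentions_evaluation(sentence: str, subject: str) -> bool:
    """Return True only if the sentence names the subject AND uses an evaluation expression."""
    words = {word.strip('.,!?"').lower() for word in sentence.split()}
    return subject.lower() in words and bool(words & EVALUATION_EXPRESSIONS)


if __name__ == "__main__":
    print(mentions_evaluation("The camera is good.", "camera"))  # subject and expression both present
    print(mentions_evaluation("Too expensive!", "camera"))       # opinion missed: subject is omitted
```

A short sentence such as "Too expensive!" is clearly an opinion, but a matcher that requires both the subject and an evaluation expression in the same sentence does not pick it up.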

NEC has been developing reputation extraction technologies that can identify the positive and negative aspects of evaluation subjects. By combining these technologies with the recently developed technology, it will be possible to recognize even more customer needs and opinions from information available on networks, including blogs, electronic bulletin boards, and other UGC information, as well as questionnaire data and records of inquiries received at call centers.

NEC plans to apply these technologies to new search services, analysis services for marketing activities, and customer relationship management solutions, and to strengthen research and development activities aimed at further expanding these application fields in the future.

*1: Recall ratio
The ratio of sentences actually extracted to the sentences that should be extracted. Using the sentence characteristic distribution calculation method, the recall ratio can be increased from 52% to 73% (*F value: from 57.21% to 58.83%) in the case of opinion sentence extraction. In the case of topic-related sentence extraction, the recall ratio can be increased from 18% to 63% (*F value: from 26.94% to 54.81%). (The above figures are based on a survey by NEC using a Japanese data set and a test collection provided for training to participants in the multilingual opinion analysis task at the NTCIR-7 workshop held by the National Institute of Informatics (NII).)
*F value: the harmonic mean of the recall ratio and the precision ratio (the ratio of correct answers included in the extracted data); in this index, a higher value indicates greater utility.
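
The F value definition above can be written out as a short Python sketch. The release reports recall and F value but not precision, so the second function rearranges the harmonic-mean formula to estimate the precision those two figures imply. The function names are illustrative, and because the published recall figures are rounded, any precision obtained this way is approximate.

```python
def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both expressed as fractions, 0 to 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f: float, recall: float) -> float:
    """Solve F = 2PR / (P + R) for P, given F and R."""
    denominator = 2 * recall - f
    if denominator <= 0:
        raise ValueError("no valid precision exists for this F value and recall")
    return f * recall / denominator


if __name__ == "__main__":
    # Opinion sentence extraction figures quoted above, before and after.
    for recall, f in [(0.52, 0.5721), (0.73, 0.5883)]:
        precision = implied_precision(f, recall)
        print(f"recall={recall:.2f} F={f:.4f} implied precision={precision:.4f}")
```

Feeding the implied precision back into `f_value` together with the same recall reproduces the reported F value, which is a quick consistency check on the arithmetic.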

*2: Machine learning technologies
Technologies that extract categorization rules and judgment standards by analyzing example data sets

*3: UGC (User Generated Content)
Information that is transmitted by consumers directly via the Internet or other networks. Examples include blogs and social networking content.


About NEC Corporation

NEC Corporation (TSE: 6701) is one of the world's leading providers of Internet, broadband network and enterprise business solutions dedicated to meeting the specialized needs of a diversified global base of customers. NEC delivers tailored solutions in the key fields of computer, networking and electron devices, by integrating its technical strengths in IT and Networks, and by providing advanced semiconductor solutions through NEC Electronics Corporation. The NEC Group employs more than 150,000 people worldwide. For additional information, please visit the NEC website at: http://www.nec.com.



Contact:

Kazuhito Ooto
NEC Corporation
+81-3-3798-6511
E-Mail: k-ooto@bc.jp.nec.com

Joseph Jasper
NEC Corporation
+81-3-3798-6511
E-Mail: j-jasper@ax.jp.nec.com
 

Apr 14, 2009
Source: NEC Corporation

NEC Corporation (TSE: 6701) (FTSE: NEC.IL) (U.S: NIPNY)

From the Japan Corporate News Network
http://www.japancorp.net
Topic: Press release summary