Watson Natural Language Processing Library for Embed is a containerized library designed to empower IBM partners with greater flexibility to infuse powerful natural language AI into their solutions. This posts summarizes some of the possible use cases for which documentation and samples are available.
Text Classification
Scenario: Customer reviews of hotels are classified in ‘non-complaint’ and ‘complaint’
Data:
1
2
3
Label, Text
Non-Complaint, Staff very friendly and helpful hotel immaculate
Complaint, My room was dirty and I was afraid to walk barefoot on the floor which looked as if it was not cleaned in weeks White furniture which looked nice in pictures was dirty too ...
Sample:
- Notebook: Custom Trained
- The Ensemble ‘model’ combines three classification models: CNN, SVM with TF-IDF and SVM with USE (Universal Sentence Encoder)
- Alternatively BERT can be used
- Second sample predicts multiple classifications
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- embedding_use_en_stock
- text_stopwords_classification_ensemble_en_stock
- embedding_glove_en_stock
- watson_nlp.workflows.classification.Ensemble
- watson_nlp.blocks.classification.SVM
- watson_nlp.blocks.classification.bert.BERT
Sentiment Classification
Scenario:
- Sentiment (positive, neutral, negative) is analyzed for movie reviews
- Analysis is done for document, sentence and aspect/target level: The sentence sentiment model will split the whole document into sentences and return each sentence’s sentiment. The document sentiment is calculated by aggregating all the sentence predictions in the document. Each target’s sentiment is calculated by aggregating the predictions from all the sentences that contain this target’s span.
Document level:
1
2
3
Label, Text
SENTIMENT_NEGATIVE, I grew up (b. 1965) watching and loving the Thunderbirds. All my mates at school watched. We played "Thunderbirds" before school, during lunch and after school. We all wanted to be Virgil or Scott. No one wanted to be Alan. Counting down from 5 became an art form. I took my children to see the movie hoping they would get a glimpse of what I loved as a child. How bitterly disappointing.
SENTIMENT_POSITIVE, Im a die hard Dads Army fan and nothing will ever change that. I got all the tapes, DVD's and audiobooks and every time i watch/listen to them its brand new. <br /><br />The film. The film is a re run of certain episodes, Man and the hour, Enemy within the gates, Battle School and numerous others with a different edge. Introduction of a new General instead of Captain Square was a brilliant mov...
Sentence level:
1
2
3
Sentence, Positive, Neutral, Negative
watching and loving the thunderbirds, 0.747463, 0.236405, 0.016132
How bitterly disappointing, 0.050000, 0.050000, 0.900000
Aspect/target level:
Example: “The served food was delicious, yet the service was slow.”
The block identifies that there is a positive sentiment expressed towards the target “food”, and a negative sentiment expressed towards “service”.
Samples:
- Notebook: Pre-Trained Model
- Notebook: Custom Trained
- The custom trained model provides an accuracy of 96%, while the out of the box model only has 87% accuracy
- The CNN models utilize GloVE embeddings
- The BERT model uses the Google Multilingual BERT Foundation Model
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- sentiment_aggregated-cnn-workflow_lang_en_stock and sentiment-aggregated_cnn-workflow_en_stock
- sentiment_aggregated-bert-workflow_lang_multi_stock and sentiment-document_bert_multi_stock
- pretrained-model_bert_multi_bert_multi_uncased
- watson_nlp.workflows.document_sentiment.BERT
Emotion Classification
Scenario: Classify emotions in tweets into ‘sadness’, ‘joy’, ‘anger’, ‘fear’ and ‘disgust’
Data:
1
2
3
text, anger, disgust, fear, joy, sadness, highest_emotion
@Markgatiss I'm surrounded by those Trump voters. You're right, it is terrifying. #redstate #despair, 0.105250, 0.184527, 0.325083, 0.138259, 0.139309, fear
jstor's lingua obscura articles are always such a delight, 0.019665, 0.010188, 0.021364, 0.981510, 0.051864, joy
Samples:
- Notebook: Pre-Trained Model
- Notebook: Custom Trained
- The aggregated emotion workflow model is an ensemble of the three models TF-IDF SVM, USE (Universal Sentence Encoder) SVM, and CNN
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- emotion_aggregated-workflow_lang_en_stock and emotion_aggregated-workflow_en_stock
- watson_nlp.blocks.classification.cnn.CNN
- watson_nlp.workflows.classification.base_classifier.tfidf_svm.TFidfSvm
- watson_nlp.workflows.classification.base_classifier.use_svm.UseSvm
Keywords and Phrases Extraction
Scenario: Extract noun phrases and keywords from documents
The Watson NLP noun phrases model extracts non-overlapping noun phrases from the input text.
1
2
3
4
Example input:
'Anna went to school at University of California Santa Cruz.'
Example phrases:
'Anna', 'school', 'University of California Santa Cruz'
The Watson NLP keywords model extracts from an input document keywords based on how relevant they are within the document.
1
2
3
4
Example input:
'Anna went to school at University of California Santa Cruz. Anna joined the university in 2015.'
Example keywords:
'Anna' and 'University of California Santa Cruz' are more relevant than 'school' and 'university'.
Sample:
- Notebook: Pre-Trained Model
- Sample includes text pre-processing (stop-words filtering, remove some patterns, lemmatization)
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- noun-phrases_rbr_lang_en_stock and noun-phrases_rbr_en_stock
- keywords_text-rank-workflow_lang_en_stock and keywords_text-rank_en_stock
Topic Modeling
Scenario:
- Extract topics from complaints from banking customers
- As result, customer service teams can mine support tickets to understand what the main problems are and route them to the right experts
- For example, a conversation that includes the words ‘loan’, ‘mortgage’, ‘loan servicing’, or ‘loan payment issues’, could be sent to the loan department for resolution
Output Data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
{
"clusterId": 0,
"numDocuments": 227.0,
"percentage": 2.0183159953765446,
"cohesiveness": 2.172240128162185,
"modelWords": [
"report (report),0.5984445214271545",
"inquiry (inquiry),0.054762981832027435",
...
],
"modelNgram": [
"inquiry report hard,2.3678008318348054",
"inquiry report,2.1766209884405305",
...
],
"modelId": 0,
"topicName": "remove_inquiry_report",
"sentences": [
"noticed Report ( ) account reported closed consumer , checked report .",
"Inaccurate date account closed , summary report states remarks report states .",
...
Samples:
- Notebook: Custom Trained
- Sample shows how to build in a first step a summary model that suggests ngrams sizes (number of words per topic) and a topic model
- A second sample shows the Watson NLP advantages compared to Latent Dirichlet Allocation, e.g. better accuracy and topics don’t have to be predefined
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- watson_nlp.toolkit.NGramSummary
- watson_nlp.blocks.topics.HierarchicalClustering
Entities Extraction
Scenario:
- Extract entities from hotel reviews
- Entities can include names of people, organization names, dates, prices, and facilities
Output Data:
1
2
3
4
5
6
7
8
Input text:
Dorset Square Hotel conveniently located, moderately priced luxury hotel. blocks Wax museum London. street pleasant, peaceful park named (what else) Dorset Square, adds quiet ambiance. proximity busy streets tourist attractions.We nights April, 2000 enjoyed stay. restaurant served excellent breakfast, offered English European fare. Buses taxi's easy catch, great night spots short distance away...
Entities:
Location, street pleasant
Facility, Dorset Square
Date, April, 2000
Organization, English European
Sample:
- Notebook: Pre-trained Models
- The same notebook also demonstrates keywords, phrases and sentiments detecion
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- entity-mentions_bilstm-workflow_lang_en_stock and entity-mentions_bilstm_en_stock
- entity-mentions_rbr_lang_en_stock
- entity-mentions_bert-workflow_lang_multi_stock and entity-mentions_bert_multi_stock
Text Categorization
Scenario:
- Create a hierarchical categorization system for news articles that is more specific than broad categories like politics, sports, and entertainment
- For example, articles on politics can be further categorized into subcategories like crime, war, education etc.
Data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Input:
President issues vow as tensions with China rise. ...
Output:
{
"categories": [
{
"labels": [
"news and politics",
"politics",
"political issues"
],
"score": 0.999511,
"explanation": [
{
"text": "issues"
}
]
...
Sample:
- Notebook: Pre-trained Model
- This notebook demonstrates how to use the Explicit Semantic Analysis (ESA) block for performing text categorization. This model has been pre-trained on scrapped web data & news data set
Used Watson NLP models and libraries:
- syntax_izumo_lang_en_stock and syntax_izumo_en_stock
- categories_esa-workflow_lang_en_stock and categories_esa_en_stock
- watson_nlp.blocks.categories.ESAHierarchical
What’s next?
In addition to the use cases above, Watson NLP also supports the following functionality.
- Relation Extraction
- Coreference
- Concept Extraction
- Topical Clustering
- PII entities
- Categories Detection
- Language Detection
- Detag HTML Tags
To find out more, check out the documentation.