heidloff.net - Building is my Passion

Structuring semi-structured Documents

Watson Discovery is an IBM offering for searching and analyzing information in various types of documents. This post describes how the graphical interface in Discovery can be used to structure the information in documents so that query results contain less noise.

Let’s take a look at an example. The sample scenario uses data from the US Securities and Exchange Commission. The goal is to improve the quality of the result for the question “What is the purpose of Rule 15c3-5?”.

By default Watson Discovery returns the right answer from page 81.


But the result includes the ‘noise’ from the previous sentence.


To improve this, the document can be structured by identifying sections graphically. The following screenshot shows how footnotes, subtitles, headers, etc. can be defined by selecting text in the right column.


As a result, Discovery understands the different sections in the document and returns only the sentence that actually answers the question.
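The effect can be illustrated with a toy sketch. This is not how Discovery is implemented and it does not use the Discovery API; the `Segment` class, the `answer_passage` helper, the label names and the placeholder page content are all hypothetical and only show why labeled sections help to keep footnotes and headers out of an answer.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # hypothetical labels: "text", "footnote", "subtitle", "header"
    content: str


def answer_passage(segments, keyword, answer_labels=("text",)):
    """Return the first segment that contains the keyword and has an allowed label."""
    for seg in segments:
        if seg.label in answer_labels and keyword.lower() in seg.content.lower():
            return seg.content
    return None


# Placeholder page content, not taken from the SEC document.
page = [
    Segment("header", "Placeholder running header"),
    Segment("footnote", "Placeholder footnote that also mentions Rule 15c3-5."),
    Segment("text", "Placeholder sentence stating the purpose of Rule 15c3-5."),
]

# Without structure, everything on the page is one blob of text,
# so a passage around the keyword can include the footnote.
unstructured = " ".join(seg.content for seg in page)

# With labeled sections, only body text is considered for the answer.
structured = answer_passage(page, "Rule 15c3-5")
print(structured)
```

In the unstructured case the footnote sits right next to the answer sentence, which mirrors the noise seen in the default result. Once the sections are labeled, the lookup can skip everything that is not body text.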


To find out more about Watson Discovery, check out the documentation.

The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.