Perplexity of Company Name Search Revealed
Data retrieval, the most fundamental requirement of information systems, is seemingly uncomplicated yet
extraordinarily difficult to implement. Boolean expressions accurately record digital information, yet they obscure
the symbolism and patterning that human cognition readily perceives.
The inability to reconstitute ideas, images and memories from generalized concepts, and to use the resulting
abstractions for hypothesizing, categorization and problem solving, is the single greatest impediment facing
the evolution of intelligent systems.
Technologists routinely and unknowingly face the "abstraction-categorization" barrier when asked to
construct applications that query text-based identity information.
Search systems fail to satisfy end users' expectations because people innately
extrapolate meaning from fragmented data, whereas computerized systems require exacting precision in order
to retrieve information stored as detailed binary data.

Corporate name searching concretely illustrates the practical difficulty of developing solutions that find correct
information without missing likely candidates. People readily understand the similarity between "Triple A towing"
and "AAA towing", yet a computerized system would need to employ a knowledge-based algorithm to recognize the
relationship between Triple A and AAA.
The deployment of intelligence through knowledge-based systems greatly benefits search and matching algorithms by
identifying nicknames, shortened forms, noise words and other circumstances that require experience to return a
more comprehensive result set. However, knowledge-based systems are limited by the breadth and depth of their
lexicon. Unlike well-known names such as IBM and AT&T, the vast majority of acronyms lie outside the scope of
knowledge base processing. For example, our clients often use the IST acronym interchangeably with
Intelligent Search Technology, yet it would be unreasonable to expect IST to be included in a knowledge-based system.
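The lexicon limitation described above can be illustrated with a minimal sketch. It is not the NameSearch® implementation: the lexicon entries, the noise word list and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny lexicon; a real knowledge base would be far larger.
PHRASE_EQUIVALENTS = {
    "triple a": "aaa",
    "international business machines": "ibm",
}
# Assumed noise words for this example only.
NOISE_WORDS = {"the", "inc", "llc", "llp", "co", "corp"}


def normalize_company_name(name: str) -> str:
    """Lower-case, strip punctuation, apply lexicon phrases, drop noise words."""
    text = re.sub(r"[^a-z0-9& ]+", " ", name.lower())
    text = " ".join(text.split())
    for phrase, canonical in PHRASE_EQUIVALENTS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    return " ".join(t for t in text.split() if t not in NOISE_WORDS)


def same_company(a: str, b: str) -> bool:
    """Two names match only if their normalized forms are identical."""
    return normalize_company_name(a) == normalize_company_name(b)


print(same_company("Triple A towing", "AAA Towing"))
print(same_company("IST", "Intelligent Search Technology"))
```

The first comparison succeeds because "Triple A" happens to be in the lexicon. The second fails because IST is not, which is exactly the gap that lexicon-driven matching leaves open.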
The company name search problem is exacerbated by elements whose meaning varies with
context, which requires contradictory rules. The use of "and" in the name "Judy Ann and
Richard Scott Wagner LLP", as compared to "Kohler and Barnes LLP", demonstrates a typical contradictory scenario.
In the first instance, the "and" denotes two distinct name units (Richard Scott Wagner and Judy Ann Wagner),
whereas in the latter the "and" should be ignored.
Similarly, even the most robust knowledge-based systems fail to account for the countless deviations created by
combinations of misspellings, transpositions and phonetic variations.
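One common way to tolerate misspellings and transpositions, shown here only as a hedged sketch, is an edit distance that counts adjacent transpositions as a single change (the optimal string alignment variant of Damerau-Levenshtein distance). The function name is an assumption, and phonetic variation is not handled by this technique.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance allowing insert, delete, substitute and adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("kohler", "kholer"))
print(osa_distance("barnes", "barns"))
```

Both example pairs differ by a single edit. Because every distinct combination of errors still has to be compared individually, this kind of measure helps with scoring but does not by itself decide which records to compare.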
Sequence variation and extra or missing information further complicate the successful retrieval of relevant data.
Many systems fail to return "Bush, Cheney, Powell and Rumsfeld" when only the "Rumsfeld and Cheney" name elements
are used to initiate a search. Regrettably, the inability of the airlines' electronic detection systems to
overcome this class of variations resulted in the failure to identify the 9/11 hijackers.
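The "Rumsfeld and Cheney" example above can be sketched as order-independent matching on name elements. This is a simplified illustration, not a description of any particular product: the function names are hypothetical, and treating "and" as ignorable is an assumption that, as noted earlier, does not hold for every name.

```python
import re

CONNECTIVES = {"and", "&"}  # assumption: ignorable in this example only


def name_elements(name: str) -> set:
    """Split a name into lower-cased elements, dropping connectives."""
    tokens = re.findall(r"[a-z0-9&]+", name.lower())
    return {t for t in tokens if t not in CONNECTIVES}


def query_matches(query: str, candidate: str) -> bool:
    """True when every element of the query appears in the candidate, in any order."""
    q = name_elements(query)
    return bool(q) and q <= name_elements(candidate)


print(query_matches("Rumsfeld and Cheney", "Bush, Cheney, Powell and Rumsfeld"))
print(query_matches("Rumsfeld and Gore", "Bush, Cheney, Powell and Rumsfeld"))
```

The first query matches despite the different order and the two missing names. The second does not, because one of its elements is absent from the candidate.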
Generalizing transformation functions, combined with the removal and masking of variations, decrease
distinctiveness; they limit accuracy and increase the potential for invalid matches. The consequence of increasing
variation tolerance is the expansion of result sets. Larger result sets not only incur performance penalties
but also include more irrelevant information.
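The trade-off between tolerance and result set size can be seen in a small sketch that uses the similarity ratio from Python's standard `difflib` module as a stand-in for a real matching function. The sample names and thresholds are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar_names(query, names, threshold):
    """Return the names whose similarity ratio to the query meets the threshold."""
    q = query.lower()
    return [n for n in names
            if SequenceMatcher(None, q, n.lower()).ratio() >= threshold]


# Hypothetical data store.
NAMES = [
    "Kohler and Barnes",
    "Kohler & Barnes LLP",
    "Koehler Barns",
    "Barnes Plumbing",
    "AAA Towing",
]

strict = similar_names("Kohler and Barnes", NAMES, threshold=0.9)
loose = similar_names("Kohler and Barnes", NAMES, threshold=0.4)
print(len(strict), "results at 0.9;", len(loose), "results at 0.4")
```

Lowering the threshold can only add records to the result, never remove them, so every gain in tolerance is paid for with a result set that is at least as large and typically less relevant.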
Intelligent string-based comparison solutions that rely on full table scans may achieve high levels of accuracy but are
computationally intensive and limited to small data sources. Until highly parallelized computers with
thousands of processing nodes become commercially available, non-indexed solutions will be limited
to fairly modest-sized data stores. Any enhancement made to this class of solutions increases
computational load by the execution time of the additional instructions multiplied by the number
of comparisons needed to scan the data source. Intelligent string-based comparison solutions become
impractical as the comparison functions become more robust and data stores increase in size.
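The multiplication described above can be made concrete with a short calculation. All figures below are hypothetical and were chosen only to make the arithmetic easy to follow; they are not measurements of any system.

```python
def scan_seconds(record_count: int, seconds_per_comparison: float) -> float:
    """Time for one query that compares the search name against every record."""
    return record_count * seconds_per_comparison


def added_seconds(record_count: int, extra_seconds_per_comparison: float) -> float:
    """Extra time per query when each comparison gains additional instructions."""
    return record_count * extra_seconds_per_comparison


# Hypothetical figures (assumptions, not benchmarks).
RECORDS = 10_000_000
BASE_COST = 20e-6   # assumed 20 microseconds per comparison
EXTRA_COST = 5e-6   # assumed 5 more microseconds for a richer comparison

baseline = scan_seconds(RECORDS, BASE_COST)
enhanced = baseline + added_seconds(RECORDS, EXTRA_COST)
print(f"baseline: {baseline:.0f} s per query, enhanced: {enhanced:.0f} s per query")
```

Under these assumptions a single query takes about 200 seconds, and a comparison function that is only 5 microseconds slower adds roughly another 50 seconds to every query. The cost grows linearly with both the record count and the complexity of the comparison.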
The NameSearch® software, with its corporate search algorithms and acronym recognition subroutines, significantly
advances an information system's ability to seek and match corporate name data.
NameSearch® enables systems to make and apply abstractions without ignoring discrepancies. The searching
function casts an abstraction net that captures likely candidates. Utilizing rule-based expertise,
knowledge-enhanced phonetic recognition, sanitization, acronym recognition and multiple pathing technologies,
the NameSearch® routine overcomes discrepancies while minimizing the inclusion of irrelevant records.
The NameSearch® match engine is then employed to intelligently rank candidates or eliminate unlikely ones.
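The general shape of this two-stage approach, a broad candidate search followed by ranking, can be sketched as follows. This is a generic illustration and not the NameSearch® algorithm: the token index is a crude stand-in for the "abstraction net", `difflib` similarity stands in for the match engine, and every name in the code is an assumption.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(name):
    """Lower-case a name and split it on anything that is not a letter or digit."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return cleaned.split()


def build_index(names):
    """Map each token to the set of names containing it."""
    index = defaultdict(set)
    for name in names:
        for tok in tokens(name):
            index[tok].add(name)
    return index


def search(query, index, min_score=0.5):
    """Stage 1: collect names sharing a token. Stage 2: rank and prune by similarity."""
    candidates = set()
    for tok in tokens(query):
        candidates |= index.get(tok, set())
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= min_score]


index = build_index(["Kohler and Barnes LLP", "AAA Towing", "Barnes Plumbing"])
print(search("Kohler Barnes", index))
```

Only records that share at least one token with the query are ever scored, so the expensive comparison runs on a small candidate set rather than on the whole data store.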
The corporate name scoring routine derives its results by evaluating a neural network.
The neural network is built from an assessment of the degree of similarity between two names. The advanced
heuristic pattern recognition subprogram receives information from the raw input combined with observations
made by the knowledge base and phonetic routines. The pattern recognition facility digests the information
and constructs weighted nodes that are placed on the neural net. Once the neural net is completed, it is
optimistically evaluated and intelligent scores are rendered.
NameSearch® achieves unsurpassed performance by returning only relevant data, reducing both computationally
intensive matching activities and the use of expensive input/output operations.
With NameSearch®, information systems will attain unsurpassed levels of matching performance, accuracy and reliability.