Data Cleansing

Match, Merge, Cleanse Automatically

Our Data Cleansing Solution uses linguistic analysis to consolidate multiple structured data repositories into a cleansed master index. By combining ETL tools, FAST SEARCH and our Data Cleansing Solution, structured data from multiple repositories can be merged into a clean master index cost-effectively, in a matter of weeks.

Silos of dirty data

Large organisations are plagued with disparate silos of data, each source containing its own unique version of the truth, complicated further by the lack of a universal unique identifier with which to integrate and merge these records. The result is dirty, incomplete data that directly undermines the usefulness of data stores and of any business intelligence initiative built on such poor data quality.

Our Data Cleansing solution provides a unique offering that applies ETL technology for data extraction and integration, and FAST’s data processing for linguistic analysis and fuzzy matching. By supporting approximate matching on multiple strings, disparate sources can be joined and missing relationships discovered.
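As a rough illustration of how approximate matching joins disparate sources, the sketch below pairs records from two hypothetical silos by fuzzy name comparison. The record fields, threshold, and use of Python's `difflib.SequenceMatcher` are illustrative assumptions standing in for the solution's linguistic matching, not its actual implementation.

```python
from difflib import SequenceMatcher

# Two hypothetical silos with no shared unique identifier.
crm_records = [
    {"id": "C1", "name": "Jonathan Smith"},
    {"id": "C2", "name": "Maria Garcia"},
]
billing_records = [
    {"id": "B7", "name": "Jon Smith"},
    {"id": "B9", "name": "Maria Garcia-Lopez"},
]

def similarity(a: str, b: str) -> float:
    """Return a 0..1 score of how closely two strings match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_join(left, right, threshold=0.7):
    """Pair each left record with its best right-side match above the threshold."""
    pairs = []
    for l in left:
        best = max(right, key=lambda r: similarity(l["name"], r["name"]))
        score = similarity(l["name"], best["name"])
        if score >= threshold:
            pairs.append((l["id"], best["id"], round(score, 2)))
    return pairs

print(fuzzy_join(crm_records, billing_records))
```

Here the two "Smith" and "Garcia" variants link up despite having no common key, which is the essence of discovering missing relationships across silos.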

Superior cleansed data for better business decisions

Empower users to develop comprehensive business responses based on heightened data confidence. Features that produce superior data results include:

  • Merging of structured and unstructured information into one master repository
  • Linking of unique identifiers, such as phone numbers and tax IDs, to locate additional information about customers
  • Concatenation of multiple fields, such as address and name listings, to rank degree of similarity
  • Use of thesauri and lists of 25 million proper names to automate approximate matching
  • Synonym expansion to allow related-term matching or association
  • Spell correction
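The concatenation and synonym-expansion features above can be sketched as follows. The toy synonym table and scoring function are illustrative assumptions; the product itself draws on thesauri and large proper-name lists.

```python
from difflib import SequenceMatcher

# Toy synonym/abbreviation table, a stand-in for real thesauri and name lists.
SYNONYMS = {"bob": "robert", "st": "street", "ave": "avenue"}

def normalise(text: str) -> str:
    """Lowercase, strip punctuation, and expand known synonyms token by token."""
    tokens = text.lower().replace(".", "").replace(",", "").split()
    return " ".join(SYNONYMS.get(t, t) for t in tokens)

def record_key(record: dict) -> str:
    """Concatenate name and address fields into a single comparable string."""
    return normalise(record["name"] + " " + record["address"])

def rank_similarity(a: dict, b: dict) -> float:
    """Rank degree of similarity between two records on their concatenated key."""
    return SequenceMatcher(None, record_key(a), record_key(b)).ratio()

r1 = {"name": "Bob Jones", "address": "12 Main St."}
r2 = {"name": "Robert Jones", "address": "12 Main Street"}
print(round(rank_similarity(r1, r2), 2))  # → 1.0
```

After synonym expansion, "Bob Jones, 12 Main St." and "Robert Jones, 12 Main Street" normalise to identical keys, so the two records rank as an exact match.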

Optimised index for increased ROI

Cleansing names and addresses often requires fuzzy logic, as addresses may include abbreviations and misspellings, appear in various formats, and are often separated into a number of discrete fields. A traditional database creates a data model that reflects the field layout, indexing the columns expected to be searched often. An optimised database indexes not only individual columns but also the concatenation of two columns. This method can increase the size of the original data fields by 4-5 times, requiring significant and costly system resources to store. In comparison, our installations, hosted on commodity hardware, can often compress data, leading to compact and efficient information retrieval.
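The storage trade-off described above can be sketched in a few lines. Byte counts of index keys are used as a rough proxy for index size, and the column names are hypothetical; real database index overheads will differ.

```python
# Sketch of the trade-off: indexing individual columns plus their
# concatenation multiplies the key material an index must store.
rows = [
    {"first": "maria", "last": "garcia", "city": "lisbon"},
    {"first": "jon", "last": "smith", "city": "leeds"},
]

def index_size(rows, columns, composite=False):
    """Total bytes of index keys over the given columns; optionally add a
    composite key over all of them, as an optimised layout would."""
    size = sum(len(row[c]) for row in rows for c in columns)
    if composite:
        size += sum(len(" ".join(row[c] for c in columns)) for row in rows)
    return size

plain = index_size(rows, ["first", "last", "city"])
optimised = index_size(rows, ["first", "last", "city"], composite=True)
print(plain, optimised)  # composite indexing stores strictly more key data
```

Even on this tiny sample the composite layout roughly doubles the key material; with more columns and per-entry index overhead the multiplier grows, which is the cost a compressed, search-based index avoids.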

Change the game: Information discovery using search

Enterprise Search has proven across numerous markets to be the most scalable and cost-effective technology for knowledge discovery and business intelligence. Incorporating linguistic data cleansing addresses the broader problem of integrating structured data with unstructured documents. Metadata management using a combination of ETL tools and linguistic-based document processing provides a unique solution for transforming data into actionable insight.