Two sided match

Multiple systems can have incomplete data.

While the simple case for missing records involves one dataset (spreadsheet, database, system) lacking values to fully match another, over time or with multiple sources-of-truth, rows can be missing on either side.

The root causes of such scenarios are various:

  • Data was imported and not kept up to date
  • Fields being used to match have been renamed
  • Data deleted or archived from one system, without realisation of downstream impact
  • New alledged transactions from 3rd party, not yet defined in core platforms

order-customers.csv

reference customer
567-A alpha
345-B bronco
123-C charlie
789-D delta
445-E echo

crm-customers.csv

id customer
4372 alpha
1982 charlie
2199 delta
8123 gamma
8818 Echo ltd

If we match this, using customer as the common field, or key:

reference order-customer crm-customer id
567-A alpha alpha 4372
345-B bronco    
123-C charlie charlie 1982
789-D delta delta 2199
    gamma 8123
445-E echo Echo ltd 8818

You can see “bronco” and “gamma” lack matching records, and also that “echo” will need a modified rule in order to be considered a match. Such “fuzzy” matching can be either pre-defined or learnt by the platform based on user actions (coming soon).