Multiple systems can have incomplete data.
While the simple case for missing records involves one dataset (spreadsheet, database, system) lacking values to fully match another, over time or with multiple sources-of-truth, rows can be missing on either side.
The root causes of such scenarios are various:
- Data was imported and not kept up to date
- Fields being used to match have been renamed
- Data deleted or archived from one system, without realisation of downstream impact
- New alledged transactions from 3rd party, not yet defined in core platforms
order-customers.csv
reference | customer |
---|---|
567-A | alpha |
345-B | bronco |
123-C | charlie |
789-D | delta |
445-E | echo |
crm-customers.csv
id | customer |
---|---|
4372 | alpha |
1982 | charlie |
2199 | delta |
8123 | gamma |
8818 | Echo ltd |
If we match this, using customer as the common field, or key:
reference | order-customer | crm-customer | id |
---|---|---|---|
567-A | alpha | alpha | 4372 |
345-B | bronco | ||
123-C | charlie | charlie | 1982 |
789-D | delta | delta | 2199 |
gamma | 8123 | ||
445-E | echo | Echo ltd | 8818 |
You can see “bronco” and “gamma” lack matching records, and also that “echo” will need a modified rule in order to be considered a match. Such “fuzzy” matching can be either pre-defined or learnt by the platform based on user actions (coming soon).