Versioning

When performing ad-hoc data matching, the question of versioning typically comes up. CloudSinc has robust multi-version tracking, allowing retention and simple comparison of dataset versions. A consequence is that you can also track changes between versions of a single dataset, which can help disambiguate data problems and identify unexpected changes over time.
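To make the idea of comparing versions concrete, here is a minimal sketch in plain Python. It is not CloudSinc's API: the function name `diff_versions`, the `ref` key field and the sample rows are all hypothetical, and each version is assumed to be a list of dictionaries with a unique key per row.

```python
def diff_versions(old_rows, new_rows, key):
    """Compare two versions of a dataset (lists of dicts) by a key field.

    Hypothetical helper for illustration only; not part of CloudSinc.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    # Keys present only in the new version
    added = sorted(new.keys() - old.keys())
    # Keys present only in the old version
    removed = sorted(old.keys() - new.keys())
    # Keys present in both versions whose rows differ
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample data: two versions of the same dataset
v1 = [{"ref": "T1", "amount": 100}, {"ref": "T2", "amount": 250}]
v2 = [{"ref": "T2", "amount": 275}, {"ref": "T3", "amount": 40}]

result = diff_versions(v1, v2, key="ref")
print(result)
```

With this sample data, T3 is reported as added, T1 as removed and T2 as changed, which is the kind of summary that helps spot unexpected changes between versions.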

While comparing two sets of data is useful as a one-off, the same process commonly has to be repeated daily, monthly or even annually (as part of year-end data reporting).

Two main versioning approaches exist, though as the structured page discusses, the difference between them is often a choice of representation rather than a strict boundary, and changes in upstream systems create further complications.

Snapshot replacement

This is the simpler case. You have a set of data, whether 50 rows or 50 million, and match it as normal. When a new snapshot is available, the previous data is replaced and only the new version is considered.

One consideration here is that effort may already have gone into acknowledging previous breaks or alerting upstream and downstream teams and systems. Sometimes this effort is minimal or automated, but where it is not, it is a driver for incremental versioning.

Incremental

With incremental versioning, rather than replacing the whole record set, updated and new records are added based on a business key. The key can simply be a field, such as a transaction reference, but it can also be implied by the import, such as the day or month. Combinations of fields and versions can also be used.
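As an illustration of adding updated and new records by business key, the following Python sketch performs a simple upsert. It is an assumption about how such a merge could look, not a description of CloudSinc internals; `apply_increment`, `key_fields` and the sample rows are hypothetical names and data.

```python
def apply_increment(current, increment, key_fields):
    """Return a new record list with the increment upserted by business key.

    Hypothetical helper for illustration only. key_fields may name a single
    field or a combination of fields.
    """
    def business_key(row):
        return tuple(row[field] for field in key_fields)

    merged = {business_key(row): row for row in current}
    for row in increment:
        # An existing key is overwritten (update); an unseen key is appended (new)
        merged[business_key(row)] = row
    return list(merged.values())


# Made-up sample data
current = [{"ref": "T1", "amount": 100}, {"ref": "T2", "amount": 250}]
increment = [{"ref": "T2", "amount": 275}, {"ref": "T3", "amount": 40}]

merged = apply_increment(current, increment, key_fields=["ref"])
print(merged)
```

Passing more than one name in `key_fields`, for example a reference plus a date, gives a composite key without changing the rest of the logic.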

The incremental approach views multiple imports as individual fragments of a larger logical data set.
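The fragment view can be sketched in Python for the case where the key is implied by the import, here an assumed month label. Everything in the sketch is hypothetical (the `fragments` mapping, the `logical_dataset` function, the period labels and the rows); it only illustrates the idea that each import is one piece of a larger logical dataset.

```python
def logical_dataset(fragments):
    """Flatten per-period fragments into one logical record list, in period order.

    Hypothetical helper for illustration only; fragments maps an import
    period label to the rows loaded for that period.
    """
    return [row for period in sorted(fragments) for row in fragments[period]]


# Each import is stored as a fragment keyed by its period (made-up data)
fragments = {}
fragments["2024-01"] = [{"ref": "T1", "amount": 100}]
fragments["2024-02"] = [{"ref": "T2", "amount": 250}]

# Re-importing January replaces only that fragment; February is untouched
fragments["2024-01"] = [{"ref": "T1", "amount": 110}, {"ref": "T3", "amount": 40}]

rows = logical_dataset(fragments)
print(rows)
```

In this sketch a corrected January import replaces only the January fragment, while the logical dataset seen by matching is still the combination of all periods.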