Data cleansing


Clean data as close to the source as possible. Implementations should aim to avoid custom data cleansing workflow steps.

Should a filter or a hook script be used?

Data cleansing efforts should be applied as close to the source as possible. The order of priority for cleaning should be:

  • Source: Can you arrange for the data to be as close as possible to the expected format? Can you gather only what is needed (include / exclude patterns)?
  • Custom filter (Groovy)
  • Hook scripts (Groovy)
  • Server-side template (FreeMarker)
  • Client-side scripting (JavaScript)

The rationale is that the farther you get from the data, the harder it is to understand the cleansing code. For example, if JavaScript code corrects something in the data for display, an implementer has to inspect the JavaScript, then the FreeMarker template, then the hook scripts, then the filters and finally the data itself to understand what the JavaScript is doing.
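
As an illustration only, the sketch below shows the idea of cleaning a value once, early in the pipeline, so that every later layer receives the corrected value. It is written in Python for brevity and is not the product's filter API (a real custom filter or hook script would be written in Groovy). The `clean_title` and `ingest` functions, the `title` field and the "| Example Site" suffix are all hypothetical names chosen for the example.

```python
import re

# Hypothetical boilerplate suffix that the source system appends to every title.
SITE_SUFFIX = re.compile(r"\s*\|\s*Example Site\s*$")


def clean_title(raw_title: str) -> str:
    """Strip the site suffix and collapse runs of whitespace."""
    without_suffix = SITE_SUFFIX.sub("", raw_title)
    return " ".join(without_suffix.split())


def ingest(record: dict) -> dict:
    """Stand-in for a filter step: return a cleaned copy of the record."""
    cleaned = dict(record)
    cleaned["title"] = clean_title(record["title"])
    return cleaned


# Downstream layers (templates, client-side code) only ever see the clean value.
record = ingest({"title": "  Annual   report 2020 | Example Site "})
print(record["title"])
```

Because the correction happens in one place near the data, someone debugging an unexpected title only needs to check the source and this single step, rather than every layer between the data and the browser.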
