Clean data as close to the source as possible. Implementations should aim to avoid custom data cleansing workflow steps.
Should a filter or a hook script be used?
Apply data cleansing as close to the source as possible. In order of priority, clean the data at the:
- Source: Can you arrange for the data to be as close as possible to the expected format? Can you gather only what is needed (include / exclude patterns, noindex tags)?
- Custom filter (Groovy)
- Hook scripts (Groovy)
- Server-side template (Freemarker)
Additionally, content cleaned close to the source benefits other systems that consume it. Cleaning applied later reaches fewer outputs: for example, cleaning code in the Freemarker template does not affect the JSON and XML output, and cleaning done in a hook script does not affect the cached copy of the document.
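The effect of each cleaning stage can be illustrated with a small model. The sketch below is written in Python purely for illustration and does not use any real filter, hook script or template API. The function names, the `clean_at` parameter and the title-cleaning rule are all hypothetical. It also assumes that a hook script changes the data used for both the JSON output and the template, which the text above implies but does not state outright.

```python
import re


def clean_title(title):
    """Hypothetical cleaning rule: drop a trailing ' | Site name' suffix."""
    return re.sub(r"\s*\|.*$", "", title)


def render_outputs(raw_title, clean_at):
    """Model where a cleaning step runs and which outputs end up seeing it.

    clean_at is one of "filter", "hook" or "template" (illustrative labels,
    not real configuration values).
    """
    # Gather/index time: a filter changes what is stored for the document.
    stored = clean_title(raw_title) if clean_at == "filter" else raw_title

    # Query time: a hook script changes the response data, not the stored copy.
    model = clean_title(stored) if clean_at == "hook" else stored

    # Render time: the template changes only the HTML it produces.
    html = clean_title(model) if clean_at == "template" else model

    return {"cached_copy": stored, "json": model, "html": html}


raw = "Annual report | Example University"
for stage in ("filter", "hook", "template"):
    print(stage, render_outputs(raw, stage))
```

In this model, cleaning in the filter yields a clean value in all three outputs, cleaning in the hook script leaves the cached copy untouched, and cleaning in the template changes only the HTML.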