Padre-fl: clear or set arbitrary document flags

Background

This article describes how to clear or set arbitrary document flags using padre-fl.

Note

The following commands operate directly on the index and should be used with caution.

Clear individual document flags on a set of URLs

The following commands can be used to clear the respective flag bits in the index.

For each command:

  • INDEX_STEM: is the index stem for the collection's index
  • URL_LIST: is a file containing the URL patterns to match when the command runs. The file contains strings that are matched against the start of URLs (the same format as kill_partial.cfg). Alternatively, use - as the URL_LIST to supply a URL pattern on STDIN.
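
As an illustration of the URL_LIST argument, the sketch below writes a small pattern file. The example.com prefixes and the file location are hypothetical, and one prefix per line is assumed from the kill_partial.cfg format; whether the scheme belongs in the prefix depends on how URLs are stored in your index. The padre-fl invocations are left as comments so that nothing touches the index until the list has been reviewed.

```shell
# Hypothetical URL prefixes; replace them with prefixes from your own index.
URL_LIST="${TMPDIR:-/tmp}/padre-fl-url-list.txt"
printf '%s\n' \
  'https://www.example.com/archive/' \
  'https://www.example.com/news/2019/' > "$URL_LIST"

# Show the patterns that would be matched against the start of each URL.
cat "$URL_LIST"

# Run only after reviewing the list (placeholder kept from the article):
#   padre-fl <INDEX_STEM> "$URL_LIST" -bits 7fb AND
# Or pass a single pattern on STDIN by using - as the URL_LIST:
#   echo 'https://www.example.com/archive/' | padre-fl <INDEX_STEM> - -bits 7fb AND
```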

Clear expired_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7fe AND

Clear killed_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7fd AND

Clear duplicate_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7fb AND

Clear noindex_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7f7 AND

Clear filtered_binary_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7ef AND

Clear documents_without_an_early_binding_security_lock flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7df AND

Clear documents_with_paid_ads flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7bf AND

Clear unfiltered_binary_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 77f AND

Clear documents_matching_admin_specified_regex flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 6ff AND

Clear noarchive_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 5ff AND

Clear nosnippet_documents flag

Use the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 3ff AND

Technical steps

When used with the -bits option, padre-fl combines the supplied hex string with the selected rows of the document flags table using the specified logical operation.

Each row of the flags table consists of 11 bits, each of which is either set (1) or not set (0). The bits (from right to left) are:

  1. expired_documents
  2. killed_documents
  3. duplicate_documents
  4. noindex_documents
  5. filtered_binary_documents
  6. documents_without_an_early_binding_security_lock
  7. documents_with_paid_ads
  8. unfiltered_binary_documents
  9. documents_matching_admin_specified_regex
  10. noarchive_documents
  11. nosnippet_documents
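
To make the bit layout concrete, the following sketch decodes a hexadecimal flag value into the flag names listed above. The decode_flags helper is hypothetical and is not part of padre-fl; it assumes you already have a flag value to inspect, which this article does not describe how to obtain.

```shell
# Flag names in bit order, bit 1 (rightmost) first, as listed in the article.
FLAGS="expired_documents killed_documents duplicate_documents noindex_documents filtered_binary_documents documents_without_an_early_binding_security_lock documents_with_paid_ads unfiltered_binary_documents documents_matching_admin_specified_regex noarchive_documents nosnippet_documents"

# decode_flags HEX: print the name of every flag whose bit is set in HEX.
decode_flags() {
  value=$(( 0x$1 ))
  pos=0
  for name in $FLAGS; do
    # Shift the wanted bit into the lowest position and test it.
    if [ $(( (value >> pos) & 1 )) -eq 1 ]; then
      printf '%s\n' "$name"
    fi
    pos=$(( pos + 1 ))
  done
}

decode_flags 006   # bits 2 and 3: killed_documents, duplicate_documents
```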

A document can have several flags set. These can be set or cleared using padre-fl with the -bits option, a bitmask to apply and the logical operation to use. This makes it possible to set or clear multiple bits in one operation.

For example, clearing just the duplicate bit (as above) works as follows.

Work out which position holds the duplicate bit and represent it as a binary number. The duplicate_documents flag is the third bit from the right and corresponds to:

00000000100

To clear this bit, it needs to be combined (using a logical AND operation) with the inverse of that value as a bitmask:

11111111011

or 7fb in hexadecimal.
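
The same arithmetic can be checked with shell arithmetic. This is a minimal sketch; clear_mask is a hypothetical helper name, and 0x7ff is simply the 11-bit value with every bit set.

```shell
# clear_mask POSITION: hex AND-mask that clears the flag at the given
# 1-based position (counted from the right) and leaves the other bits alone.
clear_mask() {
  bit=$(( 1 << ($1 - 1) ))             # position 3 -> 4 (binary 00000000100)
  printf '%x\n' $(( 0x7ff & ~$bit ))   # invert within 11 bits -> 11111111011
}

clear_mask 3   # prints 7fb
```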

This corresponds to the following padre-fl command:

padre-fl <INDEX_STEM> <URL_LIST> -bits 7fb AND

The same process can be used to clear several flags in one operation: set the bit for each flag to 0 in the bitmask and leave all other bits as 1.
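
A sketch of building a mask for several flags at once is shown below. The helper names are hypothetical. The set_mask variant assumes that padre-fl accepts OR as the logical operation for setting bits, which this article implies but does not demonstrate, so verify it against your version before use.

```shell
# bits_value POSITION...: OR together the bits at the given 1-based positions.
bits_value() {
  total=0
  for p in "$@"; do
    total=$(( total | (1 << (p - 1)) ))
  done
  printf '%s\n' "$total"
}

# clear_mask POSITION...: hex mask to use with AND; clears the listed flags.
clear_mask() {
  printf '%x\n' $(( 0x7ff & ~$(bits_value "$@") ))
}

# set_mask POSITION...: hex mask with only the listed flags set.
# Assumption: padre-fl accepts OR as the logical operation for setting bits.
set_mask() {
  printf '%03x\n' "$(bits_value "$@")"
}

clear_mask 2 3   # killed_documents + duplicate_documents -> 7f9
set_mask 4 10    # noindex_documents + noarchive_documents -> 208
```

With these values, clearing both the killed and duplicate flags would be padre-fl <INDEX_STEM> <URL_LIST> -bits 7f9 AND.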
