Padre-fl: clear or set arbitrary document flags
Background
This article describes how to clear arbitrary document flags using padre-fl.
Note
The following commands operate directly on the index and should be used with caution.
Clear individual document flags on a set of URLs
The following commands can be used to clear the respective flag bits in the index.
For each command:
- INDEX_STEM: is the index stem for the collection's index
- URL_LIST: is a file containing the URL patterns of items to match when the command runs. The file contains strings that are matched against the start of URLs (same format as for a kill_partial.cfg). Alternatively, use - as the URL_LIST to specify a URL pattern on STDIN.
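Because these commands modify the index directly, it can help to preview which URLs a pattern file would select. The following Python sketch only illustrates the "matched against the start of URLs" rule described above. It is not part of padre-fl, the function names and example.com URLs are hypothetical, and it assumes one pattern per line; details such as how padre-fl treats blank lines or protocol prefixes are not covered here.

```python
from typing import Iterable, List


def load_patterns(text: str) -> List[str]:
    """Split URL_LIST-style text into patterns (assumed: one per non-empty line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def matches(url: str, patterns: Iterable[str]) -> bool:
    """Return True if the URL starts with any of the patterns."""
    return any(url.startswith(pattern) for pattern in patterns)


# Hypothetical patterns and URLs, for illustration only.
patterns = load_patterns(
    "https://example.com/news/\n"
    "https://example.com/events/2020\n"
)
urls = [
    "https://example.com/news/item1.html",
    "https://example.com/events/2020-gala.html",
    "https://example.com/about.html",
]

# Keep only the URLs that a left-anchored match would select.
selected = [url for url in urls if matches(url, patterns)]
print(selected)
```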
Clear expired_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7fe AND
Clear killed_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7fd AND
Clear duplicate_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7fb AND
Clear noindex_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7f7 AND
Clear filtered_binary_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7ef AND
Clear documents_without_an_early_binding_security_lock flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7df AND
Clear documents_with_paid_ads flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7bf AND
Clear unfiltered_binary_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 77f AND
Clear documents_matching_admin_specified_regex flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 6ff AND
Clear noarchive_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 5ff AND
Clear nosnippet_documents flag
Use the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 3ff AND
Technical steps
When used with the -bits option, padre-fl combines the supplied hex string with the selected rows of the document flag table using the specified logical operation.
The flags table consists of 11 bits which are either set (1) or not set (0). The bits (from right to left) are:
- expired_documents
- killed_documents
- duplicate_documents
- noindex_documents
- filtered_binary_documents
- documents_without_an_early_binding_security_lock
- documents_with_paid_ads
- unfiltered_binary_documents
- documents_matching_admin_specified_regex
- noarchive_documents
- nosnippet_documents
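The bit order above is enough to derive every clear mask used earlier in this article. The following Python sketch shows the arithmetic; it is an illustration only, and the FLAGS list and clear_mask helper are hypothetical names introduced here, not part of padre-fl.

```python
# Flag names in bit order, rightmost (least significant) bit first,
# copied from the list above.
FLAGS = [
    "expired_documents",
    "killed_documents",
    "duplicate_documents",
    "noindex_documents",
    "filtered_binary_documents",
    "documents_without_an_early_binding_security_lock",
    "documents_with_paid_ads",
    "unfiltered_binary_documents",
    "documents_matching_admin_specified_regex",
    "noarchive_documents",
    "nosnippet_documents",
]

# All 11 bits set: 0x7ff.
ALL_BITS = (1 << len(FLAGS)) - 1


def clear_mask(flag: str) -> str:
    """Return the hex AND mask that clears only the given flag."""
    bit = 1 << FLAGS.index(flag)
    return format(ALL_BITS & ~bit, "x")


# Print one line per flag: the mask, then the flag it clears.
for name in FLAGS:
    print(clear_mask(name), name)
```

Each printed mask can be compared against the -bits value in the corresponding command above before running it.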
A document can have several flags set. These can be set or cleared using padre-fl with the -bits option, a bitmask to apply and a logical operation to use. Using this knowledge, it is possible to set or clear multiple bits in one operation.
For example, clearing just the duplicate bit (as above) works as follows.
Figure out which column contains the duplicate bit and represent this as a binary number. The duplicate_documents flag is the third bit from the right and corresponds to:
00000000100
To clear this bit, it needs to be combined (using a logical AND operation) with the following bitmask:
11111111011
or 7fb in hexadecimal.
This corresponds to the following padre-fl command:
padre-fl <INDEX_STEM> <URL_LIST> -bits 7fb AND
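The worked example above can be checked with a few lines of Python. This is a sketch of the bit arithmetic only: it does not call padre-fl, and the "before" value is a hypothetical flags row with the expired and duplicate bits set.

```python
WIDTH = 11                     # number of flag bits in the table
ALL_BITS = (1 << WIDTH) - 1    # 11111111111

duplicate_bit = 1 << 2         # third bit from the right
mask = ALL_BITS & ~duplicate_bit

# Render the bit, the mask, and the hex form used with -bits.
bit_binary = format(duplicate_bit, f"0{WIDTH}b")
mask_binary = format(mask, f"0{WIDTH}b")
mask_hex = format(mask, "x")

# Hypothetical flags row: expired (bit 1) and duplicate (bit 3) set.
before = 0b00000000101
after = before & mask          # the AND clears only the duplicate bit
after_binary = format(after, f"0{WIDTH}b")

print(bit_binary, mask_binary, mask_hex, after_binary)
```

The AND leaves every other bit as it was, which is why the mask has a 0 only in the position being cleared.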
The same process can be applied to several flags at once by clearing each corresponding bit in the bitmask.
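As a sketch of the multi-flag case, the following Python computes one AND mask that clears several bits together. The helper name is hypothetical, and bit positions are counted from the right starting at 0, so expired_documents is position 0 and killed_documents is position 1.

```python
from typing import Iterable

WIDTH = 11                     # number of flag bits in the table
ALL_BITS = (1 << WIDTH) - 1    # 0x7ff


def combined_clear_mask(positions: Iterable[int]) -> str:
    """Return the hex AND mask that clears every listed bit position."""
    bits = 0
    for position in positions:
        if not 0 <= position < WIDTH:
            raise ValueError(f"bit position out of range: {position}")
        bits |= 1 << position
    return format(ALL_BITS & ~bits, "x")


# Clear expired_documents (position 0) and killed_documents (position 1).
mask = combined_clear_mask([0, 1])

# Same command shape as used throughout this article.
command = f"padre-fl <INDEX_STEM> <URL_LIST> -bits {mask} AND"
print(command)
```

Here the mask comes out as 7fc, which is 11111111100 in binary: both of the two rightmost bits are cleared by a single command.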