Padre binaries: padre-di (Padre document information)


padre-di

padre-di displays information about the documents in an index.

You can use it to:

  • Spot strange documents in the document table
  • Print metadata information associated with a document

Usage

Check mode

Usage: $ padre-di /opt/funnelback/.../live/idx/index -check

No. docs: 868
0: [884, 127, 0, 62] ...lear.1.2.html
1: [871, 127, 0, 62] ...lear.2.2.html
2: [837, 127, 0, 62] ...lear.2.1.html
3: [717, 127, 0, 62] ...lear.4.3.html
4: [620, 127, 0, 62] ...lear.3.3.html
5: [555, 127, 0, 62] ...lear.5.2.html
6: [819, 127, 0, 62] ...lear.3.7.html
7: [890, 127, 0, 62] ...lear.3.4.html
8: [817, 127, 0, 62] ...lear.3.6.html
9: [981, 100, 0, 62] ...lear.5.3.html

Checking complete.
Content lengths: 446 - 1333 (not meaningful)
Onsite indegree: 63 - 127
Offsite indegree: 0 - 0
URL length (chars): 44 - 87

Notes:

  • The four bracketed values are [content length, onsite indegree, offsite indegree, URL length (chars)]
  • The last three values are capped at 127, due to an optimisation
  • Documents 0-9 are always printed
  • Documents with no content, no incoming links, or a URL length of zero are also printed
  • Unlike padre-cw, this check cannot 'fail'
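
The notes above describe each -check line closely enough to parse it in a script. The following Python sketch is not part of padre-di. It assumes the line shape shown in the sample output, and it reads "no incoming links" as both indegree values being zero, which is an interpretation rather than documented behaviour. The names parse_check_line and is_suspicious are hypothetical.

```python
import re

# Assumed shape, copied from the sample output:
#   <docnum>: [<content length>, <onsite indegree>, <offsite indegree>, <URL length>] <URL tail>
CHECK_LINE = re.compile(r"^(\d+): \[(\d+), (\d+), (\d+), (\d+)\] (.*)$")


def parse_check_line(line):
    """Return a dict for a document line, or None for any other line."""
    match = CHECK_LINE.match(line.strip())
    if match is None:
        return None
    docnum, content, onsite, offsite, url_len = (int(g) for g in match.groups()[:5])
    return {
        "docnum": docnum,
        "content_length": content,
        "onsite_indegree": onsite,
        "offsite_indegree": offsite,
        "url_length": url_len,
        "url_tail": match.group(6),
    }


def is_suspicious(rec):
    """Assumed reading of the notes: no content, no incoming links, or zero URL length."""
    no_links = rec["onsite_indegree"] == 0 and rec["offsite_indegree"] == 0
    return rec["content_length"] == 0 or no_links or rec["url_length"] == 0


# Lines taken from the sample output; real output would be read from the tool.
sample = [
    "No. docs: 868",
    "0: [884, 127, 0, 62] ...lear.1.2.html",
    "9: [981, 100, 0, 62] ...lear.5.3.html",
]
for line in sample:
    rec = parse_check_line(line)
    if rec is not None:
        print(rec["docnum"], rec["url_tail"], "suspicious" if is_suspicious(rec) else "ok")
```

Because the last three values are capped at 127, a script like this can only tell that an indegree or URL length is at least 127, not its true value.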

Metadata inspection mode (URLs)

Displays the indexed metadata for each document in the index, listed by URL.

Usage: $ padre-di /opt/funnelback/.../live/idx/index -meta

No. docs: 868
[86] test-data.funnelback.com/Shakespeare/
t: The Complete Works of William Shakespeare |Comedy|History|Tragedy|Poetry
[585] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.1.html
d: 1597-01-01|1590-01-01
t: SCENE I. London. The palace. |SCENE I. London. The palace.
[574] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.2.html
d: 1597-01-01|1590-01-01
t: SCENE II. London. An apartment of the Prince's. |SCENE II. London. An apartment of the Prince's.
[579] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.3.html
d: 1597-01-01|1590-01-01
....

Notes:

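  • Output is sorted lexicographically by document URL
  • The number in brackets before each URL is the document number (docid)
  • Numeric metadata is printed as strings (this only matters if the numeric and string forms differ)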
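
The -meta listing has a regular shape: a "[docid] URL" line followed by zero or more "class: value" lines. As a rough sketch of how that output could be loaded into a script, the Python below groups field lines under the preceding URL line. It is not part of padre-di. It assumes the layout shown in the sample and assumes that "|" separates multiple values of one metadata class, which the sample suggests but the text does not state. The name parse_meta is hypothetical.

```python
import re

DOC_LINE = re.compile(r"^\[(\d+)\] (.+)$")        # e.g. "[86] test-data.funnelback.com/Shakespeare/"
FIELD_LINE = re.compile(r"^([^:\s]+): (.*)$")     # e.g. "d: 1597-01-01|1590-01-01"


def parse_meta(text):
    """Group metadata lines under the document line that precedes them."""
    docs = []
    for line in text.splitlines():
        doc = DOC_LINE.match(line)
        if doc:
            docs.append({"docnum": int(doc.group(1)), "url": doc.group(2), "fields": {}})
            continue
        field = FIELD_LINE.match(line)
        if field and docs:
            # Assumption: "|" separates multiple values of the same class.
            values = [value.strip() for value in field.group(2).split("|")]
            docs[-1]["fields"][field.group(1)] = values
        # Anything else ("No. docs: ...", "....", blank lines) is skipped.
    return docs


SAMPLE = """No. docs: 868
[86] test-data.funnelback.com/Shakespeare/
t: The Complete Works of William Shakespeare |Comedy|History|Tragedy|Poetry
[585] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.1.html
d: 1597-01-01|1590-01-01
t: SCENE I. London. The palace. |SCENE I. London. The palace.
"""

for doc in parse_meta(SAMPLE):
    print(doc["docnum"], doc["url"], sorted(doc["fields"]))
```

Values are kept as strings here, matching the way the tool prints them.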

Metadata inspection mode (document numbers)

Displays the indexed metadata for each document in the index, prefixing every line with the document number.

Usage: $ padre-di /opt/funnelback/.../live/idx/index -metad

No. docs: 868
[86] test-data.funnelback.com/Shakespeare/
[86] t: The Complete Works of William Shakespeare |Comedy|History|Tragedy|Poetry
[585] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.1.html
[585] d: 1597-01-01|1590-01-01
[585] t: SCENE I. London. The palace. |SCENE I. London. The palace.
[574] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.2.html
[574] d: 1597-01-01|1590-01-01
[574] t: SCENE II. London. An apartment of the Prince's. |SCENE II. London. An apartment of the Prince's.
[579] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.3.html
[579] d: 1597-01-01|1590-01-01
....

Usage with a pattern: $ padre-di /opt/funnelback/.../live/idx/index -metad iv.1.1

No. docs: 868
[585] test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.1.html
[585] d: 1597-01-01|1590-01-01
[585] t: SCENE I. London. The palace. |SCENE I. London. The palace.
[69] test-data.funnelback.com/Shakespeare/2henryiv/2henryiv.1.1.html
[69] d: 1598-01-01
[69] t: SCENE I. The same. |SCENE I. The same. 

Field #docs #chars
d: 2 31
t: 2 96

Notes:

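  • -metad produces the same output as -meta, but prefixes every line with the docid
  • You can supply a pattern after -meta or -metad to restrict the output to matching documents
  • The pattern is matched as an exact substring
  • Both -meta and -metad print a per-field summary (Field, #docs, #chars) at the end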
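
To illustrate the pattern match and the closing summary, here is a small Python sketch. It is not padre-di's implementation. It assumes the pattern is compared against the document URL, and that #chars is the total length of each field's printed value. Neither is stated in the text, although the second assumption reproduces the figures in the sample output. The names matches and summarise are hypothetical.

```python
def matches(url, pattern):
    """Exact substring match, as described in the notes (assumed to apply to the URL)."""
    return pattern in url


def summarise(docs):
    """Count, per field, how many documents carry it and the total characters printed."""
    summary = {}
    for _url, fields in docs:
        for name, raw in fields.items():
            n_docs, n_chars = summary.get(name, (0, 0))
            summary[name] = (n_docs + 1, n_chars + len(raw))
    return summary


# Three documents taken from the sample output above.
docs = [
    ("test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.1.html",
     {"d": "1597-01-01|1590-01-01",
      "t": "SCENE I. London. The palace. |SCENE I. London. The palace."}),
    ("test-data.funnelback.com/Shakespeare/1henryiv/1henryiv.1.2.html",
     {"d": "1597-01-01|1590-01-01",
      "t": "SCENE II. London. An apartment of the Prince's. "
           "|SCENE II. London. An apartment of the Prince's."}),
    ("test-data.funnelback.com/Shakespeare/2henryiv/2henryiv.1.1.html",
     {"d": "1598-01-01",
      "t": "SCENE I. The same. |SCENE I. The same."}),
]

selected = [doc for doc in docs if matches(doc[0], "iv.1.1")]
print("Field #docs #chars")
for name, (n_docs, n_chars) in sorted(summarise(selected).items()):
    print(f"{name}: {n_docs} {n_chars}")
```

With the pattern iv.1.1, the sketch selects the 1henryiv.1.1.html and 2henryiv.1.1.html documents and prints "d: 2 31" and "t: 2 96", the same totals as the summary in the sample.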