Character encoding: Index file


It's worth inspecting the index file, because PADRE will have processed the content, and possibly transformed it, before storing it on disk. For example, HTML entities are likely to have been decoded, so check that PADRE decoded them correctly.

You can use the padre-sr utility to inspect the indexed content:

Using padre-sr to inspect the indexed content.
# You can get the "docnum" 3 by looking at the search results XML or JSON
./bin/padre-sr data/squiz-forum-funnelback/live/idx/index 3 1
 
# TODO padre-sr broken on search internal :(
# Workaround using padre-di:
 
./bin/padre-di data/squiz-forum-funnelback/live/idx/index -meta showtopic=12515
Stem = 'data/squiz-forum-funnelback/live/idx/index'
No. docs: 12
[3] forums.squizsuite.net/index.php?showtopic=12515
a: Cromers
c: Funnel back search doesn't allow empty search query - posted in Funnelback: Hi   We have 3 funnelback searches on our site: Business Finder; Course Finder; Job Finder   The first 2 will display all the available results when the page is initially navigated to, however the Job finder only returns result when a query is entered. Is there a setting to fire a blank search that I need to turn on?|Funnel back search doesn't allow empty search query - posted in Funnelback: Hi   We have 3 funnelback searches on our site: Business Finder; Course Finder; Job Finder   The first 2 will display all the available results when the page is initially navigated to, however the Job finder only returns result when a query is entered. Is there a setting to fire a blank search that I need to turn on?
d: 2014-09-22
e: article
s: empty query|search
t: Funnel back search doesn't allow empty search query - Funnelback - Squiz Suite Support Forum
A: Cromers|Aleks Bochniak|Cromers|gordongrace|Cromers|Benjamin Pearson|gordongrace
D: 2014-09-22T09:42:21+00:00|2014-09-22T12:22:33+00:00|2014-09-22T12:43:44+00:00|2014-09-22T13:05:51+00:00|2014-09-22T13:28:39+00:00|2014-09-22T22:57:07+00:00|2014-09-23T08:04:39+00:00
I: http://www.gravatar.com/avatar/e76d223e86038cb76ff3b829ded20999?s=100&d=http%3A%2F%2Fforums.squizsuite.net%2Fpublic%2Fstyle_images%2Fmaster%2Fprofile%2Fdefault_large.png|http://forums.squizsuite.net/uploads/av-1008.jpg?_r=1180422795|http://www.gravatar.com/avatar/e76d223e86038cb76ff3b829ded20999?s=100&d=http%3A%2F%2Fforums.squizsuite.net%2Fpublic%2Fstyle_images%2Fmaster%2Fprofile%2Fdefault_large.png|http://forums.squizsuite.net/uploads/profile/photo-thumb-11031.jpg?_r=1387174259|http://www.gravatar.com/avatar/e76d223e86038cb76ff3b829ded20999?s=100&d=http%3A%2F%2Fforums.squizsuite.net%2Fpublic%2Fstyle_images%2Fmaster%2Fprofile%2Fdefault_large.png|http://forums.squizsuite.net/uploads/av-2430.png?_r=0|http://forums.squizsuite.net/uploads/profile/photo-thumb-11031.jpg?_r=1387174259
S: Hi   We have 3 funnelback searches on our site: Business Finder; Course Finder; Job Finder   The first 2 will display all the available results when the page is initially navigated to, however the Job finder only returns result when a query is entered. Is there a setting to fire a blank search that I need to turn on?|try setting your query parameter to be a null query.   eg. query=!padrenullquery|Hi Aleks   Where do I put this?|Hi Cromers -   Firing a 'blank' or 'null' query is intended to 'show all results'.    Simply showing all results has its own issues, though: How will they be sorted?  Date? Title? Size? How will the query summary message be displayed?  "You searched for XXX."? What should be logged by the search analytics system? Once you've attempted to answer those questions for yourself, enabling support for null queries is reasonably straightforward.   If you're using the deprecated Classic UI, you can add the following to collection.cfg: ui.null_query_enabled=true| If you're using the Modern UI, two hook scripts will be required:   hook_pre_process.groovy: |// Fix to enable ui.null_query_enabled functionality|if (transaction.question.query == null) {| // query must be set to something or padre isn't called _ is stripped out by padre when processing the query| transaction.question.query = "_"| transaction.question.originalQuery = "_"| // set the system query value to run a null query| transaction.question.additionalParameters["s"] = ["!padrenull"]| }| hook_post_process.groovy: |// Allow the modern UI to handle an undefined queryCleaned value (will occur for the above code as s params aren't included in queryClean)|if ( transaction.response != null && transaction.response.resultPacket.queryCleaned == null)|{| transaction.response.resultPacket.queryCleaned ="";|}| See also: http://docs.funnelba...ok_scripts.html||Hi Gordon   I'm not entirely sure if we are using the Modern UI.  We've got Funnelback 13.0 and Matrix 4.14.0.   
We already have the search page in place that handles the results with regard to results per page, sort order and filtering. We just want all the results to be returned to the user when they navigate to the search page, in the same way that out other 2 funnelback searched work.|It looks like Funnelback 13.0 does support it... http://docs.funnelba...ok_scripts.html|Hi Cromers -   If you're using Funnelback 13.0, you'll probably be using the Modern UI by default (although support for the Classic UI is still available in that version).   See also: http://docs.funnelba..._interface.html
U: http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54396|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54397|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54398|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54399|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54400|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54403|http://forums.squizsuite.net/index.php?s=6258edbbc08a5347636117c80372a804&showtopic=12515#entry54404
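Output like the listing above can be long, so it may help to filter it for HTML entities that were left undecoded. The following is a minimal sketch, not a Funnelback tool: `find_undecoded_entities` is a hypothetical helper name, the regular expression only covers the common named, decimal and hexadecimal entity forms, and the sample lines are made up for illustration.

```shell
# Hypothetical helper: print (with line numbers) every input line that still
# contains something that looks like an HTML entity, e.g. &amp; &#39; &#x27;
find_undecoded_entities() {
  grep -n -E '&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);'
}

# Made-up sample in the same "class: value" layout as the listing above.
# The first line contains an undecoded apostrophe entity; the second only
# contains a plain "&" inside a URL query string, which should not match.
sample_output=$(printf '%s\n' \
  "t: Funnel back search doesn&#39;t allow empty search query" \
  "I: http://example.com/avatar?s=100&d=default")

leftovers=$(printf '%s\n' "$sample_output" | find_undecoded_entities)
echo "$leftovers"
```

Assuming the padre-di invocation shown earlier works in your environment, its output could be piped straight into the helper, for example `./bin/padre-di data/squiz-forum-funnelback/live/idx/index -meta showtopic=12515 | find_undecoded_entities`. No output would suggest that no entities of these forms remain in the stored metadata.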

We can see that:

  • The non-breaking space has indeed been stored as two ISO-8859-1 characters: Â and the invisible non-breaking space.
  • The apostrophe in the summary is correctly stored (doesn't). That gives us a clue that the problem with the apostrophe is probably at rendering time, rather than at crawling or indexing time.
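Because the non-breaking space is invisible, the first observation is easier to confirm by looking at bytes than at rendered text. The sketch below uses only `printf`, `od` and `tr`; `hex_bytes` is a hypothetical helper name. It shows the two byte patterns to look for: a correctly encoded UTF-8 non-breaking space, and the same character after its two bytes were misread as ISO-8859-1 and written out again as UTF-8. It illustrates the encodings in general and is not a statement about how PADRE lays out text on disk.

```shell
# Hypothetical helper: print the bytes read from stdin as one compact
# lowercase hex string, e.g. "c2a0".
hex_bytes() {
  od -An -v -t x1 | tr -d ' \t\n'
}

# A non-breaking space (U+00A0) correctly encoded as UTF-8: bytes C2 A0.
nbsp_utf8=$(printf '\302\240' | hex_bytes)

# The same two bytes misread as ISO-8859-1 ("Â" followed by a non-breaking
# space) and then re-encoded as UTF-8: bytes C3 82 C2 A0.
nbsp_double=$(printf '\303\202\302\240' | hex_bytes)

echo "UTF-8 NBSP:          $nbsp_utf8"
echo "double-encoded NBSP: $nbsp_double"
```

Piping a short excerpt of the stored text into `hex_bytes` and searching the result for `c382c2a0` is one way to tell a double-encoded non-breaking space from a correctly encoded one (`c2a0` not preceded by `c382`).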

If you find a problem at this stage (e.g. the HTML entity for the apostrophe was incorrectly decoded), it's likely to be a bug in PADRE. Report it with a sample document that reproduces the problem. This is unlikely to happen, though, because the PADRE code that interprets and indexes content is very well exercised. Usually, when the content is OK, PADRE will index it correctly.
