Character encoding: query processor


Background

This article describes how to check that the query processor returns content in the correct form.

Steps

Run the query processor on the command line and enter a query (v:12515 in this example):

Terminal
./bin/padre-sw data/squiz-forum-funnelback/live/idx/index -res=xml
v:12515
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<PADRE_result_packet>
  ...
 
<results>
  ...
  <result>
  <rank>1</rank>
  <score>1000</score>
  <title>Funnel back search doesn&#39;t allow empty search query - Funnelback - Squiz Suite Support Forum</title>
  <collection>squiz-forum-funnelback</collection>
  <component>0</component>
  <live_url>http://forums.squizsuite.net/index.php?showtopic=12515</live_url>
<summary><![CDATA[Funnel back search doesn't allow empty search query - posted in Funnelback: Hi   We have 3 funnelback searches on our site: Business Finder; Course Finder; Job Finder   The first 2 will display all the available results when the page is]]></summary>
   ...

We can see that:

  • The apostrophe in the title is represented as &#39;. Although escaping is not strictly necessary (an apostrophe is a valid XML character), a numeric character reference is still a valid way to represent an apostrophe in XML, so nothing is wrong here.
  • The apostrophe in the summary appears as is because the summary is enclosed in a CDATA block. Content in a CDATA block is interpreted literally, so &#39; would have stayed &#39; rather than being interpreted as an apostrophe.
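
The two observations above can be checked with any standards-compliant XML parser. The following is a minimal sketch using Python's standard library, not part of Funnelback itself. The SAMPLE packet is a hypothetical, trimmed-down document modelled on the output shown earlier, and parse_result is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the packet above (not real padre-sw output).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<PADRE_result_packet>
<results>
<result>
<title>Funnel back search doesn&#39;t allow empty search query</title>
<summary><![CDATA[doesn't and &#39; are both kept as is]]></summary>
</result>
</results>
</PADRE_result_packet>"""


def parse_result(xml_text):
    """Return the (title, summary) text of the first result in the packet."""
    # Parse from bytes so the encoding named in the XML declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    result = root.find("./results/result")
    return result.findtext("title"), result.findtext("summary")


title, summary = parse_result(SAMPLE)

# Outside CDATA, the parser decodes &#39; into a plain apostrophe.
print(title)
# Inside CDATA, both the apostrophe and the literal text &#39; are left untouched.
print(summary)
```

If the title printed by a parser still contains the literal text &#39;, the content was probably escaped twice somewhere upstream of the parser.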