Enabling PDF Content Crawling

Overview

Customers may want the search on their DNN sites to return PDF files based on their content. PDF content crawling is not enabled by default; to enable it for the .pdf file extension, you need to install a compatible PDF IFilter.

 

Prerequisites

  • Administrator rights on the operating system where the DNN instance is installed.
  • Superuser access to the DNN site.

 

Diagnosis

Searching PDF content requires a compatible PDF IFilter. To check whether one is installed, go to Settings > Site Settings > Search > File Extensions and find the .pdf file extension. If the IFilter is installed and enabled, the Content Crawling column shows a checkmark for that extension.

 

2020-06-02_1609.png
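
As a complement to the check in Site Settings, the following is a minimal sketch of how the IFilter registration could be inspected directly on the server. It assumes the standard Windows filter-handler registry layout (a PersistentHandler entry under the file extension, which in turn points to an IFilter class); the function names are hypothetical and not part of DNN. The result only indicates that some IFilter is registered for .pdf, not that DNN is able to load it.

```python
import sys

# IID of the IFilter interface used in Windows filter-handler registration.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def find_ifilter_clsid(read_default, extension=".pdf"):
    """Return the CLSID of the IFilter registered for `extension`, or None.

    `read_default(subkey)` must return the default value of the given key
    under HKEY_CLASSES_ROOT, or None when the key does not exist.
    """
    # Step 1: the extension points to a persistent handler GUID.
    handler = read_default(extension + "\\PersistentHandler")
    if not handler:
        return None
    # Step 2: the persistent handler points to the IFilter implementation.
    clsid = read_default(
        "CLSID\\" + handler + "\\PersistentAddinsRegistered\\" + IID_IFILTER
    )
    return clsid or None


def read_hkcr_default(subkey):
    """Read a key's default value from HKEY_CLASSES_ROOT (Windows only)."""
    import winreg  # imported lazily so the module also loads on other systems

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, subkey) as key:
            return winreg.QueryValue(key, None)
    except OSError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    clsid = find_ifilter_clsid(read_hkcr_default)
    print("PDF IFilter CLSID:", clsid or "none registered")
```

Note that a 32-bit Python interpreter on 64-bit Windows reads the 32-bit registry view, so run the sketch with an interpreter whose bitness matches the IFilter you installed.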

 



 

Solution

<supportagent>

Please refer to Handling Third-Party Component Integrations if you have any process-related questions about handling these tickets.

</supportagent>

  1. Download and install a PDF IFilter. Examples include the IFilters from DNN Sharp, Foxit, and PDFlib.

    Note: We have done limited testing with the DNN Sharp, Foxit, and PDFlib IFilter solutions and found no issues so far. However, because they are third-party products, they are not officially supported.
  2. Once the IFilter is installed, navigate to the Persona Bar > Settings > Site Settings > Search > File Extensions. The .pdf file extension should now be enabled for content crawling, as shown in the image below.

    File.Extensions.png

  3. Navigate to the Persona Bar > Settings > Scheduler > Scheduler tab.

  4. Force a run of the three (3) scheduled tasks that index the site: the File, Site, and URL Crawlers. For each task, click the pencil icon to edit it, then click Run Now.

 

Additional Information

  • Content crawling will not work if the bitness (32-bit or 64-bit) of your application pool does not match that of the PDF IFilter you are installing.

    In IIS, open the Advanced Settings for your application pool and locate the Enable 32-Bit Applications setting. For a 64-bit (x64) IFilter, this setting must be set to False.

    2020-03-11_1031.png
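
The bitness check above can also be sketched in code. The example below assumes the Enable 32-Bit Applications setting is stored as the enable32BitAppOnWin64 attribute in the IIS applicationHost.config file (typically under %windir%\System32\inetsrv\config) and that an unset attribute falls back to the pool defaults. The function names and the pool name DNNPool are hypothetical.

```python
import xml.etree.ElementTree as ET


def pool_runs_32bit(config_xml, pool_name):
    """Return True if Enable 32-Bit Applications is on for the named pool."""
    root = ET.fromstring(config_xml)
    pools = next(root.iter("applicationPools"), None)
    if pools is None:
        raise ValueError("no applicationPools section found")

    # A pool without its own value inherits from applicationPoolDefaults;
    # when neither is set, the setting is treated as False.
    defaults = pools.find("applicationPoolDefaults")
    inherited = "false"
    if defaults is not None:
        inherited = defaults.get("enable32BitAppOnWin64", "false")

    for pool in pools.findall("add"):
        if pool.get("name") == pool_name:
            value = pool.get("enable32BitAppOnWin64", inherited)
            return value.lower() == "true"
    raise KeyError(pool_name)


def ifilter_matches_pool(ifilter_bits, pool_is_32bit):
    """A 32-bit IFilter needs a 32-bit pool; a 64-bit IFilter needs a 64-bit pool."""
    return (ifilter_bits == 32) == pool_is_32bit


if __name__ == "__main__":
    # Hypothetical excerpt of applicationHost.config for illustration only.
    sample = """
    <configuration>
      <system.applicationHost>
        <applicationPools>
          <add name="DNNPool" enable32BitAppOnWin64="true" />
          <applicationPoolDefaults managedRuntimeVersion="v4.0" />
        </applicationPools>
      </system.applicationHost>
    </configuration>
    """
    is_32bit = pool_runs_32bit(sample, "DNNPool")
    print("64-bit IFilter compatible:", ifilter_matches_pool(64, is_32bit))
```

In this sample the pool runs in 32-bit mode, so a 64-bit IFilter would not be loaded until Enable 32-Bit Applications is set to False.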

 

Testing

Once all three (3) scheduled tasks have run successfully:

  1. Go to the Search Results page.
  2. Search for a keyword that is inside the PDF.
  3. The relevant PDF file with the keyword should appear in the results.

 

Note: If PDF content still does not appear in search results after all three (3) scheduled tasks have run, perform a site re-index. Avoid re-indexing during peak hours, as it can affect site performance.

 
