How to search a document on Linux while ignoring diacritics (harakat/accents)

The Problem

Most applications do not ignore accents when searching through the text of a document. Here is a screenshot of LibreOffice 5.2 failing to find the Arabic word “bsm” because I didn’t type in every single diacritic:

This is an especially serious problem when searching through Arabic text, where diacritics are not strictly necessary and are therefore used inconsistently. Document creators add more or fewer diacritics depending on how reader-friendly they want the text to be.
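To make the problem concrete, here is a minimal Python sketch of why a plain search fails and what "ignoring diacritics" amounts to. It is an illustration only, not how any of the applications mentioned here work internally. The function names are my own, and it assumes that dropping every Unicode nonspacing mark (category `Mn`) after NFD normalization is an acceptable definition of "diacritic". Note that this also removes marks such as the hamza that NFD splits off from a base letter.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with all nonspacing combining marks removed."""
    # NFD splits precomposed characters (e.g. e-acute) into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Arabic harakat and Latin accents are all in category "Mn".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def find_ignoring_diacritics(needle, haystack):
    """True if needle occurs in haystack once diacritics are ignored."""
    return strip_diacritics(needle) in strip_diacritics(haystack)


# "bsm" without harakat: ba, seen, meem
bare = "\u0628\u0633\u0645"
# The same word with harakat: ba+kasra, seen+sukun, meem+kasra
vocalized = "\u0628\u0650\u0633\u0652\u0645\u0650"

print(bare in vocalized)                          # plain search: False
print(find_ignoring_diacritics(bare, vocalized))  # True
```

The plain substring test fails because the harakat sit between the letters as separate code points, so the three bare letters never appear next to each other in the vocalized text.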

Firefox fares no better at searching Arabic text:

The Solution

The solution is to open the document in a WebKit-based web browser, which has sensible handling of diacritics. Below is a screenshot of the open source Midori browser succeeding at finding and highlighting the Arabic word I was searching for even though I didn’t type in the diacritics:

Other browsers in the WebKit family include Chromium and Chrome, both by Google (since 2013 they have used Blink, a fork of WebKit). I would personally rather use a non-Google browser, so Midori is my preferred option.

If your document is not in the HTML format (the format that browsers use), you can use LibreOffice, Microsoft Word, or a similar word processor to save it as HTML.
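If you have many documents, the conversion can also be scripted rather than done through the Save As dialog. The sketch below is one possible way, assuming LibreOffice is installed and its `soffice` launcher is on your PATH (some distributions name it `libreoffice` instead). The helper names and the file name `notes.odt` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(document, outdir="."):
    """Build the LibreOffice command line that converts a document to HTML."""
    # Assumption: the launcher is called "soffice"; adjust for your system.
    return [
        "soffice",
        "--headless",            # run without opening a window
        "--convert-to", "html",  # target format
        "--outdir", str(outdir),
        str(document),
    ]


def convert_to_html(document, outdir="."):
    """Run the conversion and return the expected path of the HTML file."""
    subprocess.run(build_convert_command(document, outdir), check=True)
    return Path(outdir) / (Path(document).stem + ".html")


# Example (hypothetical file name):
# html_path = convert_to_html("notes.odt")
# Then open html_path in Midori and search as usual.
```

Building the command separately from running it makes it easy to print and check the command before anything is executed.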
