    Hmm. I just did an experiment: I created a PDF using InDesign and pasted the data from the PDF into Excel (Mac), and I don't get the above problem, though for some reason my tab stops turn into daggers rather than being recognized as column breaks, and my numerals are converted to Latin.

    [N.B. I don't speak any Arabic, so I needed a random list of Arabic words. If anyone cares, the above is a list of the bitmap fonts and point sizes included with the very first Arabic version of the Mac system software (System 4.1-AB): Cairo, Baghdad, Geezah, and Nadeem.]
    There are a number of factors that can affect the directionality of text copied and pasted from a PDF into another application. One factor is how the PDF itself was created, and whether the original text strings are stored in the PDF or whether Acrobat is attempting to reconstruct them from glyph strings. Another factor is how the target application handles text directionality (not just bidi character directionality, but overall text directionality at the story, paragraph, or line level).

    Displaying the Arabic characters with unconnected forms makes no difference to this sort of issue.
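
    To make the "bidi character directionality" part concrete, here is a minimal sketch in Python (my choice of language, not something used in this thread) that lists the Unicode bidi class of each character in a pasted string, in storage order. The helper names `bidi_classes` and `has_strong_rtl` are made up for illustration. It only inspects per-character properties; it does not model how a target application picks the paragraph or story direction.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs in logical (storage) order."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def has_strong_rtl(text):
    """True if the text contains at least one strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

if __name__ == "__main__":
    # Hypothetical pasted sample: Arabic letter, tab, European digit,
    # Arabic-Indic digit, dagger.
    sample = "\u0628\t1\u0661\u2020"
    for ch, cls in bidi_classes(sample):
        print(f"U+{ord(ch):04X} {cls}")
    print("contains strong RTL:", has_strong_rtl(sample))
```

    Arabic letters report class AL, a tab reports S (segment separator), European digits EN, Arabic-Indic digits AN, and the dagger ON, so a check like this can at least show whether tabs or numerals were substituted during the paste.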
  • I don't have any ideas about PDF creation, but this is a common issue with Arabic text during conversion or scanning.
    Here are free & good tools for converting any Arabic file into PDF:
    And here is a free & good tool for converting any Arabic PDF into text:
  • André G. Isaak: Copy and paste from PDF is not reliable. A PDF can embed fonts, use any encoding, and position characters geometrically anywhere on the page. E.g., combining marks (accents, diacritics) can sit at one place in the storage order but be positioned somewhere else on the page. The most reliable way to convert PDF to text is to convert it to an image and OCR this image.

    If you copy and paste from a PDF, paste it into an empty file in a plain text editor like TextEdit on the Mac. Then you can see what happened. Nearly. I just tried it: copy and paste of BIDI text from an editor is not reliable, because it's hard to mark/select a BIDI string. Pure RTL text is easier.
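
    If a text editor still hides what happened, one option is to dump the pasted string one code point at a time. This is only a sketch, assuming Python is available; `describe` is a hypothetical helper name. It prints each character in storage order with its Unicode name and canonical combining class, so misplaced combining marks or substituted characters become visible regardless of how the text is displayed.

```python
import unicodedata

def describe(text):
    """Describe each character in storage order: code point, name, combining class."""
    lines = []
    for ch in text:
        # Control characters such as tab have no Unicode name, so use a fallback.
        name = unicodedata.name(ch, "<unnamed>")
        ccc = unicodedata.combining(ch)  # 0 means the character is not a combining mark
        lines.append(f"U+{ord(ch):04X} {name} (ccc={ccc})")
    return lines

if __name__ == "__main__":
    # Replace this hypothetical sample with the text pasted from the PDF.
    pasted = "\u0628\u064E\u2020"
    print("\n".join(describe(pasted)))
```

    A nonzero combining class marks a diacritic, so you can check whether it directly follows its base letter in the stored string.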

