Wednesday, October 28, 2009

Those annoying hard returns when you paste text from a pdf or email

There's one major reason why I cling to Microsoft Word as my primary word processor: its ability to deal fairly easily with the mess you get when you copy and paste text from either a pdf document or an email.
I am sure you will have already discovered that the text frequently comes across with either a hard return (i.e., a paragraph return) or a soft return at the end of every line. It's as if someone had pressed Enter, or else Shift+Enter, after each line. And, of course, you want the text to flow on as it should. Here's the method I have developed to sort this out:
1) Place your cursor at the top of the text, and press Ctrl-H, which brings up the Find and Replace dialogue box.
2) In the top field (Find what), enter ^p^p. (The symbol ^ is called a caret, by the way.) This will find all instances where there is a double paragraph return, which usually marks where the real paragraph breaks occur.
3) In the bottom field (Replace with), enter some characters that don't occur anywhere in the text, like ##.
4) Click "Replace All". You have now created markers for the real paragraphs.
5) In the top field of Find and Replace, enter a single caret and p, i.e. ^p.
6) In the bottom field, delete the ## and enter a single space (press the spacebar once).
7) Click "Replace All". You have now stripped out all the paragraph returns.
8) To restore the real paragraph breaks, enter ## in the top field and ^p^p in the bottom field. Click "Replace All".
9) Finally, to tidy up, enter two spaces in the top field and a single space in the bottom field, then click "Replace All". This strips out the unwanted double spaces; if the text had runs of three or more spaces, repeat until Word makes no more replacements.
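For anyone who would rather script this, the steps above can be sketched in Python on a plain string. This is a minimal sketch and has nothing to do with Word itself: it assumes the pasted text is held as a string where each line break is "\n" (Windows "\r\n" endings are normalised first) and real paragraphs are separated by a blank line. The function name unwrap_paragraphs is just my own illustrative choice.

```python
def unwrap_paragraphs(text: str, marker: str = "##") -> str:
    """Mimic the Find and Replace steps on a plain string.

    Assumption: line breaks are "\n" and real paragraphs are
    separated by a double line break ("\n\n").
    """
    # Treat Windows line endings the same as plain "\n".
    text = text.replace("\r\n", "\n")
    if marker in text:
        raise ValueError("choose a marker that does not occur in the text")
    # Steps 2-4: mark the real paragraph breaks (the double returns).
    text = text.replace("\n\n", marker)
    # Steps 5-7: turn every remaining single return into a space.
    text = text.replace("\n", " ")
    # Step 8: restore the real paragraph breaks.
    text = text.replace(marker, "\n\n")
    # Step 9: collapse runs of spaces; loop because a single pass
    # turns three spaces into two.
    while "  " in text:
        text = text.replace("  ", " ")
    return text
```

Passing it "This is the first\nparagraph of text.\n\nAnd this is\nthe second one." gives back the two paragraphs with each one joined up onto a single line.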
Voilà! The text is now how it should be.
If the document has come across with soft returns rather than hard (paragraph) returns, the same technique can be used. Just use ^l (a lowercase L) in place of ^p in the search fields. This may seem rather convoluted the first time you try it, but it becomes second nature pretty quickly.
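The soft-return version is the same procedure with a different break character, so here is a hedged, generalised sketch in Python. It assumes the soft returns arrive as the vertical-tab character "\x0b" (character 11, which is how Word represents a manual line break, i.e. Shift+Enter), and that you want the real breaks restored as ordinary "\n\n" paragraph breaks. The name unwrap and its parameters are hypothetical, chosen only for illustration.

```python
def unwrap(text: str, line_break: str, paragraph_break: str = "\n\n",
           marker: str = "##") -> str:
    """Join wrapped lines, keeping doubled breaks as real paragraph breaks."""
    if marker in text:
        raise ValueError("choose a marker that does not occur in the text")
    # Mark the real paragraphs: a doubled break (like ^l^l in Word).
    text = text.replace(line_break * 2, marker)
    # Replace each remaining single break (like ^l) with a space.
    text = text.replace(line_break, " ")
    # Restore the real paragraph breaks.
    text = text.replace(marker, paragraph_break)
    # Collapse any runs of spaces left behind.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


# Assumption: Word's manual line break (Shift+Enter) is character 11.
SOFT_RETURN = "\x0b"
```

Calling unwrap("one\x0btwo\x0b\x0bthree\x0bfour", SOFT_RETURN) joins "one two" and "three four" into two paragraphs separated by a blank line.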
Sadly, only Word seems to have this ability to search using special characters for paragraph and soft returns; believe me, I have tried all the others, to no avail. Incidentally, the search code for a tab is ^t.
