While working on a large-ish WordPress post import (about 3,500 posts), I encountered a bug in the WordPress import utility. The gist of the problem is that double newlines (aka paragraph breaks) get converted to single newlines (aka line breaks). It doesn’t happen every time, but it happens often. As a result, text that should have been split by an automatic <p>…</p> gets a <br /> instead, which completely screws up the formatting of a post. See “double line breaks changed to single line breaks after importing xml file” at WordPress.org for the gory details. If I wanted to avoid copying and pasting an entire site by hand, I had to figure out how to fix the WordPress import.
In a post in that thread, I offered a utility that uses the WordPress wpautop function to pre-wrap the content in the appropriate <p> tags before feeding the XML file to the import utility. The details of running WordPress in a non-interactive “batch” CLI process are omitted from the code below, but I’ve stashed the entire fix-wordpress-export-wpautop.php file away for you to download. Here are the interesting bits:
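<?php
// Assumes WordPress has already been loaded so that wpautop() is available;
// that bootstrap code is omitted here, as noted above.

$accum  = false; // true while inside a multi-line <content:encoded> block
$buffer = '';    // accumulated content of the current post

// Compare against false so a line such as "0" does not end the loop early.
while ( ( $line = fgets( STDIN ) ) !== false ) {
	// Normalize Windows (\r\n) and classic Mac (\r) line endings to \n.
	$line = preg_replace( '/\r\n?/', "\n", $line );

	$start = false;
	$end   = false;

	// Strip the opening tag if this line starts a post's content.
	if ( preg_match( '/^\s*<content:encoded><!\[CDATA\[/', $line ) ) {
		$line  = preg_replace( '/^\s*<content:encoded><!\[CDATA\[/', '', $line );
		$start = true;
	}
	// Strip the closing tag (and the trailing newline) if this line ends it.
	if ( preg_match( '/\]\]><\/content:encoded>\s*$/i', $line ) ) {
		$line = preg_replace( '/\]\]><\/content:encoded>\s*$/i', '', $line );
		$end  = true;
	}

	if ( $start && $end ) {
		// Content on a single line: wrap it and restore the tags.
		echo '<content:encoded><![CDATA[' . wpautop( $line ) . "]]></content:encoded>\n";
	} elseif ( $start ) {
		$accum  = true;
		$buffer = $line;
	} elseif ( $end ) {
		$accum   = false;
		$buffer .= $line;
		echo '<content:encoded><![CDATA[' . wpautop( $buffer ) . "]]></content:encoded>\n";
	} elseif ( $accum ) {
		$buffer .= $line;
	} else {
		// Everything outside post content passes through unchanged.
		echo $line;
	}
}

exit( 0 );
What this code does: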
- Read through the input on STDIN.
- Convert Windows (\r\n) and classic Mac OS (\r) line endings to the one true *nix-style newline.
- Accumulate each post’s content, found between <content:encoded>…</content:encoded> XML tags and often spanning multiple lines, into one buffer.
- Call wpautop on that buffer to change the content so that paragraph tags are correctly inserted around the text blocks (and only the text blocks) that need them.
- Write the whole file out to STDOUT.
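The same transformation can also be sketched without streaming, by reading the whole export into memory and rewriting each content block with a single regular expression. This is a minimal sketch under stated assumptions, not the downloadable script: `fix_export()` and `simple_autop()` are hypothetical names, and `simple_autop()` is a deliberately crude stand-in for wpautop so the example runs without WordPress. It only shows the distinction the import bug destroys (blank line means new paragraph, single newline means `<br />`); the real wpautop also handles block-level HTML, `<pre>` blocks, and much more. The arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical stand-in for WordPress's wpautop(): blank-line-separated
// blocks become <p> paragraphs and remaining single newlines become <br />.
// The real wpautop() is far more careful about existing HTML.
function simple_autop(string $text): string
{
    // Normalize Windows (\r\n) and classic Mac (\r) line endings first.
    $text = preg_replace('/\r\n?/', "\n", $text);
    $out = [];
    // Two or more newlines (optionally with whitespace between) mark a paragraph break.
    foreach (preg_split('/\n\s*\n/', trim($text)) as $para) {
        if ($para === '') {
            continue; // empty input produces no paragraphs
        }
        $out[] = '<p>' . str_replace("\n", "<br />\n", trim($para)) . '</p>';
    }
    return implode("\n\n", $out);
}

// Hypothetical helper: rewrite every <content:encoded><![CDATA[...]]></content:encoded>
// block in an export, whether it sits on one line or spans many (the /s flag
// lets "." match newlines; the lazy ".*?" stops at the first closing tag).
function fix_export(string $xml, callable $autop): string
{
    return preg_replace_callback(
        '/<content:encoded><!\[CDATA\[(.*?)\]\]><\/content:encoded>/s',
        fn(array $m): string =>
            '<content:encoded><![CDATA[' . $autop($m[1]) . ']]></content:encoded>',
        $xml
    );
}
```

Inside a script that has already loaded WordPress, `echo fix_export( stream_get_contents( STDIN ), 'wpautop' );` could stand in for the line-by-line loop. The trade-off is memory: the streaming version above only ever buffers one post’s content, while this version holds the entire export, which may matter for a site with thousands of posts.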
I hope this utility helps you learn how to fix WP import.
Sorry if I’m being dense. This sounds like exactly what I’m looking for but I don’t understand how to use it. Where do I put the file and how do I feed it the XML file?
Unzip the file into the same directory as your XML file, then run this on the command line:
php fix-wordpress-export-wpautop.php < original-export-file.xml > new-export-file.xml
Then run your import.
Thanks for this. It works perfectly.
You are quite welcome.
Hi, Scott…
Thank you very much for this. We needed it, and it has worked brilliantly for us.
I note that on your site, just now, when I click on your hamburger menu, then “Contact”, I’m delivered to a page where the bulk of the content is
Contact
Contact
and no real contact information. Then, when I press the Back button in the browser (Chrome), the hamburger menu vanishes.
Please let me know if you’d like help troubleshooting or reproducing this.
Oh, sorry about that, I am in the middle of leaving Elementor for GeneratePress, and things are broken all over. I don’t use this site much and it doesn’t get much traffic, so I just wasn’t worried about the breakage yet.