How to fix WordPress import with wpautop

While working on a large-ish WordPress post import (about 3,500 posts), I encountered a bug in the WordPress import utility. The gist of the problem is that double-newlines (aka paragraph breaks) get converted to single newlines (aka line breaks). It doesn’t happen all the time, but it is a common occurrence. This causes what would have become an automatic <p>…</p> to be turned into a <br />, which completely screws up the formatting of a post. See “double line breaks changed to single line breaks after importing xml file” at WordPress.org for the gory details. I had to learn how to fix WordPress import if I wanted to avoid the copy-and-paste routine for an entire site.

In a post in that thread, I volunteered a utility that uses the WordPress wpautop function to pre-wrap the content with the appropriate <p> tags before feeding the XML file to the import utility. The details on how to use WordPress in a non-interactive “batch” CLI process are omitted from the code below, but I’ve stashed the entire fix-wordpress-export-wpautop.php file away for you to download later. Here are the interesting bits:

<?php

// WordPress must already be loaded at this point so that wpautop() is
// defined; that bootstrap is omitted here.

$accum  = false; // true while inside a multi-line <content:encoded> block
$buffer = '';

$open  = '<content:encoded><![CDATA[';
$close = ']]></content:encoded>';

// Compare against false so a line containing only "0" does not end the loop.
while ( ( $line = fgets( STDIN ) ) !== false ) {
    // Normalize Windows (\r\n) and classic Mac (\r) line endings.
    $line = preg_replace( '/\r\n?/', "\n", $line );

    $start = false;
    $end   = false;
    if ( preg_match( '/^\s*<content:encoded><!\[CDATA\[/', $line ) ) {
        $line  = preg_replace( '/^\s*<content:encoded><!\[CDATA\[/', '', $line );
        $start = true;
    }
    if ( preg_match( '/\]\]><\/content:encoded>\s*$/i', $line ) ) {
        $line = preg_replace( '/\]\]><\/content:encoded>\s*$/i', '', $line );
        $end  = true;
    }

    if ( $start && $end ) {
        // Content fits on one line: put the stripped tags back around it.
        echo $open . wpautop( $line ) . $close . "\n";
    } elseif ( $start ) {
        $accum  = true;
        $buffer = $line;
    } elseif ( $end ) {
        $accum   = false;
        $buffer .= $line;
        echo $open . wpautop( $buffer ) . $close . "\n";
    } elseif ( $accum ) {
        $buffer .= $line;
    } else {
        echo $line;
    }
}

exit( 0 );
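
The batch bootstrap I left out could look roughly like the sketch below. It is not the code from the downloadable file, just one plausible way to do it: it assumes your WordPress install has the usual wp-load.php in its root directory, and it reads that directory from WP_ROOT, an environment variable name I made up for this example. The load_wordpress helper is likewise hypothetical.

```php
<?php
// Sketch only: load WordPress outside a web request so wpautop() exists.
// WP_ROOT is a hypothetical environment variable naming the directory
// that contains wp-load.php; adjust it to match your install.

/**
 * Try to bootstrap WordPress from $wpRoot.
 * Returns true when wpautop() is callable afterwards.
 */
function load_wordpress( string $wpRoot ): bool {
    $loader = rtrim( $wpRoot, '/' ) . '/wp-load.php';
    if ( ! is_file( $loader ) ) {
        return false;
    }
    require_once $loader;
    return function_exists( 'wpautop' );
}

// Fall back to the script's own directory when WP_ROOT is not set.
$wpRoot = getenv( 'WP_ROOT' ) ?: __DIR__;
if ( ! load_wordpress( $wpRoot ) ) {
    fwrite( STDERR, "Could not load WordPress from {$wpRoot}\n" );
}
```

In a real run you would want to stop with a non-zero exit code when the load fails, since the main loop cannot do anything useful without wpautop.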

What this code does:

  • Read through the input on STDIN.
  • Convert Windows (\r\n) and classic Mac OS (\r) line endings to the one true *nix-style newline.
  • For each post’s content, which is found between <content:encoded>…</content:encoded> XML tags in the input, accumulate that content spanning multiple lines into one buffer.
  • Call wpautop on that buffer to change the content so that paragraph tags are correctly inserted around the text blocks (and only the text blocks) that need them.
  • Write the whole file out to STDOUT.
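
To see what the line-ending and paragraph steps do to a piece of content, here is a small stand-alone sketch. The simple_autop function is a simplified stand-in I wrote for illustration, not the real wpautop: the real function also deals with block-level HTML, <pre> blocks, and other cases that this one ignores.

```php
<?php
// Simplified stand-in for wpautop(), for illustration only.
function simple_autop( string $text ): string {
    // Normalize Windows (\r\n) and classic Mac (\r) line endings.
    $text = preg_replace( '/\r\n?/', "\n", $text );

    // Blank lines (two or more newlines) separate paragraphs.
    $paragraphs = preg_split( '/\n{2,}/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );

    $html = '';
    foreach ( $paragraphs as $paragraph ) {
        // A single newline left inside a paragraph becomes a line break.
        $paragraph = str_replace( "\n", "<br />\n", $paragraph );
        $html     .= '<p>' . $paragraph . "</p>\n";
    }
    return $html;
}

$raw = "First paragraph.\r\n\r\nSecond paragraph,\r\nsecond line.";
echo simple_autop( $raw );
// Prints:
// <p>First paragraph.</p>
// <p>Second paragraph,<br />
// second line.</p>
```

The point is that the paragraph breaks are turned into explicit <p> tags before the importer ever sees the file, so it no longer matters if the importer collapses double newlines.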

I hope this utility helps you learn how to fix WP import.


		

6 thoughts on “How to fix WordPress import with wpautop”

  1. Sorry if I’m being dense. This sounds like exactly what I’m looking for but I don’t understand how to use it. Where do I put the file and how do I feed it the XML file?

    • Unzip the file into the same directory you put your XML file into, then run this on the command line:

      php fix-wordpress-export-wpautop.php < original-export-file.xml > new-export-file.xml

      Then run your import.

  2. Hi, Scott…

    Thank you very much for this. We needed it, and it has worked brilliantly for us.

    I note that on your site, just now, when I click on your hamburger menu, then “Contact”, I’m delivered to a page where the bulk of the content is

    Contact

    Contact

    and no real contact information. Then, when I press the Back button in the browser (Chrome), the hamburger menu vanishes.

    Please let me know if you’d like help troubleshooting or reproducing this.

    • Oh, sorry about that, I am in the middle of leaving Elementor for GeneratePress, and things are broken all over. I don’t use this site much and it doesn’t get much traffic, so I just wasn’t worried about the breakage yet.
