Sillybean

Import HTML Pages

This plugin will import a directory of files as either pages or posts, according to configurable settings. You may specify the HTML tag containing the content you want to import (e.g. <body>, <div id="content"> or <td width="732">) or the name of a Dreamweaver template region (e.g. “Main Content”).
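To make the tag option concrete, here is a minimal sketch of pulling the inner HTML out of one element with PHP's DOMDocument and DOMXPath. The function name `extract_region()` and its XPath argument are hypothetical, not the plugin's own code or settings, and the sketch assumes PHP 7.1 or later rather than the PHP 5 the plugin requires. It returns null when the element is missing, so a file without the requested tag can be skipped instead of triggering an error.

```php
<?php
// Hypothetical helper (not the plugin's code): return the inner HTML of the
// first element matched by an XPath expression, or null if nothing matches.
function extract_region(string $html, string $xpath): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        // The requested tag is not in this file: report "no content"
        // instead of calling methods on a node that does not exist.
        return null;
    }

    // Concatenate the serialized children so the wrapper tag itself is dropped.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$page = '<html><body><div id="content"><p>Hello</p></div></body></html>';
echo extract_region($page, '//div[@id="content"]'), "\n"; // <p>Hello</p>
```

For a table cell such as `<td width="732">`, the equivalent XPath expression would be `//td[@width="732"]`.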

If importing pages, the directory hierarchy will be preserved. Directories containing the specified file types will be imported as empty parent pages. Directories that do not contain the specified file types will be ignored.
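The directory rules above can be approximated with PHP's SPL iterators. This is a sketch under assumptions: `find_import_files()` is a hypothetical name, extensions are compared case-insensitively, and an excluded name is matched against each directory's own name rather than its full path. It only collects the file list; the relative paths are what carry the hierarchy needed for parent pages. It also checks up front that the starting point is a local directory, since PHP cannot list an http:// URL as a directory.

```php
<?php
// Hypothetical sketch (not the plugin's code): collect files under $root whose
// extension is in $extensions, skipping any directory named in $excludeDirs.
function find_import_files(string $root, array $extensions, array $excludeDirs): array
{
    if (!is_dir($root)) {
        // PHP's directory functions need a path on this machine, not a URL.
        throw new InvalidArgumentException("Not a local directory: $root");
    }

    $extensions = array_map('strtolower', $extensions);
    // SKIP_DOTS keeps "." and ".." out of the listing.
    $dirs = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    // Prune excluded directories so the walk never descends into them.
    $filtered = new RecursiveCallbackFilterIterator(
        $dirs,
        function (SplFileInfo $file) use ($excludeDirs): bool {
            return !($file->isDir() && in_array($file->getFilename(), $excludeDirs, true));
        }
    );

    $found = [];
    // LEAVES_ONLY (the default mode) yields files, not the directories themselves.
    foreach (new RecursiveIteratorIterator($filtered) as $file) {
        if (in_array(strtolower($file->getExtension()), $extensions, true)) {
            $found[] = $file->getPathname();
        }
    }
    sort($found); // predictable import order
    return $found;
}
```

A call such as `find_import_files('/var/www/oldsite', ['html', 'htm'], ['images', 'css'])` (the path is only an example) would return every matching file outside the images and css folders.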

As files are imported, the resulting IDs, permalinks, and titles will be displayed. On completion, the importer will provide a list of Apache redirects that can be used in your .htaccess file to seamlessly transfer visitors from the old file locations to the new WordPress posts or pages.
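A minimal sketch of how such redirect lines could be generated, assuming Apache's mod_alias `Redirect 301` directive and a map from old URL paths to new permalinks. The exact format the plugin prints may differ, and `build_redirects()` is a hypothetical name. Paths containing spaces are quoted so Apache reads each one as a single argument.

```php
<?php
// Hypothetical sketch: build Apache mod_alias "Redirect 301" lines from a map of
// old URL paths to new WordPress permalinks, for pasting into .htaccess.
function build_redirects(array $map): string
{
    // Quote arguments containing whitespace so Apache treats each as one token.
    $quote = function (string $s): string {
        return preg_match('/\s/', $s) ? '"' . $s . '"' : $s;
    };

    $lines = [];
    foreach ($map as $oldPath => $newUrl) {
        $lines[] = 'Redirect 301 ' . $quote($oldPath) . ' ' . $quote($newUrl);
    }
    return $lines ? implode("\n", $lines) . "\n" : '';
}

echo build_redirects([
    '/about/team.html' => 'http://example.com/about/team/',
    '/index.html'      => 'http://example.com/',
]);
```

Because `Redirect` matches by URL-path prefix, a line for a directory path also catches everything beneath it; when many old files follow one naming pattern, a single `RewriteRule` is usually lighter than hundreds of individual lines.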

Options:

  • import pages or posts
  • specify content and title as HTML tags or Dreamweaver template regions
  • remove a common phrase (such as the site name) from imported titles
  • specify file extensions to import (e.g. html, htm, php)
  • specify directories to exclude (e.g. images, css)
  • if importing pages, specify whether your top-level files should become top-level pages or children of an existing page
  • choose tags, categories, and custom taxonomies
  • choose status, author, and timestamp
  • use meta descriptions as excerpts
  • choose whether to clean up bad (Word, FrontPage) HTML
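What cleaning up Word HTML involves can be sketched with a few regular expressions. This is an illustration only, not the plugin's cleanup routine: `strip_word_cruft()` is a hypothetical name, and the patterns target common Word output (conditional comments, `<o:p>` tags, `Mso*` classes, inline styles). Removing every `style` attribute is deliberately blunt and would also drop styles you meant to keep.

```php
<?php
// Hypothetical sketch (not the plugin's cleanup routine): strip common
// Microsoft Word markup from an HTML fragment.
function strip_word_cruft(string $html): string
{
    $patterns = [
        '/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', // Word's conditional comments
        '/<\/?o:p[^>]*>/i',                       // Office <o:p> tags
        '/\s+class="Mso[^"]*"/i',                 // MsoNormal and similar classes
        '/\s+style="[^"]*"/i',                    // all inline styles (blunt!)
    ];
    return preg_replace($patterns, '', $html);
}

echo strip_word_cruft('<p class="MsoNormal" style="margin:0">Hi<o:p></o:p></p>'), "\n";
// <p>Hi</p>
```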

Requires PHP 5.

Download at wordpress.org »

Files to be imported

Imported pages

Options screen

new: clean up Word HTML option
Results: imported pages and rewrite rules

116 comments

Comments

  1. I have downloaded and installed the plugin but wondered if this could be used to take a site NOT at my domain and bring it in. I created a site for another individual and want to bring it into a wordpress themed site. Can I do that with this plugin? I have the entire site on my local computer. Can I pull it from my computer if I need to?

    Thanks so much!

    Posted by Mike on August 9th, 2009 at 2:32 am

  2. Yes, that’s precisely what this plugin does. Just feed it the full path to the files on your computer and let it rip!

    Posted by Stephanie on August 9th, 2009 at 9:53 am

  3. Also, Mike, your email is bouncing.

    Posted by Stephanie on August 9th, 2009 at 9:54 am

  4. Just to clarify – will this also work on a remotely hosted website? Or will I need to set up the site I want to import content from locally?

    Posted by Lynne on August 11th, 2009 at 2:20 am

  5. It might work on a remotely hosted site, but a lot depends on your server’s settings and whether you’re allowed to open remote files. I can say for sure that it works with local content, but remote is iffy.

    Posted by Stephanie on August 11th, 2009 at 8:24 am

  6. Thanks! Will give it a try but if it fails remotely it’ll be worth transferring it to my local server for the amount of data I want to import.

    Posted by Lynne on August 11th, 2009 at 1:08 pm

  7. I do think handling the import locally is a better practice in general, since the process is somewhat resource-intensive.

    Posted by Stephanie on August 11th, 2009 at 1:13 pm

  8. I am excited to use your application, but I keep getting the following error.

    Fatal error: Call to a member function asXML() on a non-object in /dir/path/public_html/xyz.com/wp-content/plugins/import-html-pages/html-import.php on line 444

    I changed the directory path and domain name for the sake of privacy.

    Help, what can I do?

    Posted by Confused on August 18th, 2009 at 9:04 pm

  9. Hi,

    I think there is a typo in the code. I tried replacing line 444
    //$my_post['post_content'] = $my_post['post_content'][0]->asXML(); // asXML() preserves HTML in content

    with this line, and it worked:
    $my_post['post_content'] = $content[0]->asXML(); // asXML() preserves HTML in content

    Posted by mjos on August 22nd, 2009 at 6:32 am

  10. mjos, that’s interesting. I tried that in testing but got errors. Confused, can you see if that change works with the files you’re trying to import and let me know?

    Posted by Stephanie on August 22nd, 2009 at 8:57 am

  11. Would not import from PC, so placed ONE file in
    /home/domain/public_html/wp/html
    setting import from tag html.
    Instead of simple import, made 80+ placeholders until I aborted connection.
    How about a SIMPLE “import this file”?

    Posted by mugger on August 22nd, 2009 at 11:58 am

  12. Stephanie, what errors exactly did you get? I was able to upload 120 html pages with no problems (they had a very simple structure though – only html, title, body and p tags).

    Posted by mjos on August 22nd, 2009 at 12:29 pm

  13. I got a tree of placeholder “pages” titled
    .
    ._
    .__
    .___

    ._____________
    and could only halt it by breaking connection.

    I would welcome a simple example (text form ok) to copy.
    Please be specific on the commands used.

    You might consider a “stop” button.

    Posted by mugger on August 22nd, 2009 at 2:07 pm

  14. Hi,
    I was trying your plugin, but it’s not working and I don’t know what the problem is. The first error message was the asXML() problem; then I loaded all the related extensions automatically.

    But now I’m facing a second problem, which tells me that DomDocument:: needs at least 1 parameter, while 0 were given here?

    Is it because of the directory path? I used it on my XP and then got the same error when I tried it on my Linux system.

    Any help and suggestions really appreciated.
    Thanks

    Posted by Aman on August 24th, 2009 at 6:28 am

  15. Hi Stephanie,

    I’m uploading from local directory and get this error: failed to open dir: No such file or directory. Seems pretty simple, I entered the path wrong. But I can’t figure out how… I have it entered exactly like this: /Users/admin/Sites/aj2

    Thanks for your help.

    Posted by chad on August 24th, 2009 at 10:06 am

  16. +1 on mjos’ edit. I got stuck on the same error on line 444. Replacing it with mjos’ suggested code allowed pages to import.

    Thanks for developing this plugin. I’ve been looking for something like this for a while!

    Posted by Adam on August 24th, 2009 at 2:26 pm

  17. +1 on mjos’ edit also. that finally got it working.

    Posted by bpmore on August 25th, 2009 at 10:07 pm

  18. Hi,
    It works perfectly. I’ve edited it as mjos suggested. I’ve tested it on WinXP and Linux, and it worked on both platforms.

    Thank you very much.

    Posted by Aman on August 27th, 2009 at 6:02 am

  19. Hi guys. I’m working crazy hours this week, but I’ll take a look at the errors you’re getting over the weekend and see what I can do.

    Posted by Stephanie on August 27th, 2009 at 8:21 am

  20. This will be such a great tool for me. Thanks for looking at my problem. I’m really excited about using this because I have a few friends who want to migrate sites with 100s of static html pages. Oy.

    Posted by chad on August 28th, 2009 at 5:36 pm

  21. Hi Stephanie – feedback: I had to make the 444 error changes that mjos suggested (thank you mjos) for your plugin to work. After that it works fine.
    A suggestion: it would be nice to have the option to use the html name as title – eg. from “01. September 2009.html” the Article title would be: 01 September 2009.

    Thanks for this plugin!

    Greetings :) Bette

    Posted by Bette on September 5th, 2009 at 11:50 am

  22. Hi Stephanie – feedback 2: After a closer look I realized: all umlauts / mutated vowels like ä,ü,ö,Ä,Ü,Ö are NOT imported!! A “weströmischer Kaiser” becomes “westrmischer Kaiser”.
    What do I have to change to make umlauts (and possibly special characters too) import correctly?

    Greetings :) Bette

    Posted by Bette on September 5th, 2009 at 2:31 pm

  23. NO, it’s not working properly, unfortunately. My WordPress website is self-hosted, and although I entered all the paths, directories, and files to be excluded, the plugin created around 1000 empty pages with . or dashes as titles. Frustrating, and it sucks. As somebody wrote earlier, please consider adding a button to choose the exact directory and the exact file.

    Posted by sonia on September 6th, 2009 at 3:43 pm

  24. I’m having issues on import with quotes. Quotes are replaced by either blank boxes or boxes that have little zeros in them and numbers below the little zeros (e.g., 92, 93, etc.). I assume these are some kind of representations of the quotes. Is there a way to get the quotes to import correctly? Also, a great feature would be the ability to assign a tag or a custom taxonomy when importing. I’d donate $100 if it could do these things!

    Posted by Sven on September 7th, 2009 at 9:32 am

  25. mjos, you were right about that typo. The error I was seeing was somewhere else entirely.

    I’ve just uploaded version 1.12 to the plugin repository. It fixes that typo and adds the tag/category option Sven requested.

    Bette, Aman, and mugger, I’m still looking into the problems you encountered.

    Posted by Stephanie on September 13th, 2009 at 2:40 pm

  26. Chad, what version of PHP are you using?

    Sven and Bette, can you send me one of the files with the quotes and/or special characters that caused problems for you? Email attachments to stephanieleary at gmail dot com.

    Posted by Stephanie on September 13th, 2009 at 8:56 pm

  27. Hi Stephanie,
    it took me some time, but I found why it’s not importing vowels: I’m creating HTML files in Windows with a Windows program, and they are NOT UTF-8 encoded! It’s a kind of ANSI encoding I’m using. Your script is written for UTF-8-encoded HTML.
    I changed this line in your html-import.php:
    $encoded = mb_convert_encoding($contents, 'HTML-ENTITIES', "UTF-8");
    to:
    $encoded = mb_convert_encoding($contents, 'HTML-ENTITIES', "Windows-1252");
    You have to find out which encoding the program that creates your HTML files uses (there are several, depending on the system, e.g. Mac, Linux, etc.) and change it in the script to the matching type. The advantage is that you can use any program to create the HTML files.

    Greetings :) Bette

    Posted by Bette on September 15th, 2009 at 5:57 am

  28. Good to know, Bette. The script is supposed to convert whatever encoding your file uses into UTF-8, which is what WordPress wants, but I gather it isn’t working as expected. I’ll work on that.

    Posted by Stephanie on September 15th, 2009 at 11:04 am

  29. Thank you for taking on this project. I am personally willing to donate a good amount of money, but I would rather specify offline how you could assist.

    This is a fairly successful system you have got going here. Obviously sometimes it is dependent on themes. I have been wanting for some time to convert a large website to PHP, and although I know other systems exist, I have become quite happy with WordPress because of its ease of use, flexibility, and many themes. Your work is greatly appreciated and I seriously want to donate.

    Jimmy

    Posted by Jimmy Deguara on September 16th, 2009 at 6:05 am

  30. Hello, Is it possible to add a feature to auto update html pages/content imported? The pages I’m importing aren’t static and they’re updated daily using another script. What I’m simply looking to do is import the output of the other script into a wordpress page, and update the imported data at predetermined intervals.

    Thanks

    Posted by Chris on September 21st, 2009 at 8:33 am

  31. Also, upon attempting to import my html page, I get the following error (same as above):
    --------------------
    Fatal error: Call to a member function asXML() on a non-object in /path/to/www/public_html/wp-content/plugins/import-html-pages/html-import.php on line 476
    --------------------

    Line 476 reads (same as the typo fixed above):
    --------------------
    $my_post['post_content'] = $content[0]->asXML(); // asXML() preserves HTML in content
    --------------------

    Posted by Chris on September 21st, 2009 at 8:39 am

  32. I got the same issue as the poster above after entering in my import path:

    Fatal error: Call to a member function asXML() on a non-object in /home/username/public_html/test/wp-content/plugins/import-html-pages/html-import.php on line 476

    Posted by Kym on September 21st, 2009 at 11:21 pm

  33. Hi,
    For whatever reason, I am not able to import a single page on two different hosts, which show different behaviour.
    (I would like to upload a few HTML files from my HD, using WP 2.8.4.)
    1st host: no warning, nothing... 0 pages imported
    2nd host: several warnings:
    Warning: scandir(C:/Photos) [function.scandir]: failed to open dir: No such file or directory in /home/kuku/public_html/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 2): No such file or directory in /home/kuku/public_html/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in /home/kuku/public_html/wp-content/plugins/import-html-pages/html-import.php on line 396

    Please help!

    Posted by Michael on September 22nd, 2009 at 11:57 pm

  34. Prior post showed failure ver 1.12 importing from PC.
    Just tried v 1.13 from http://domain/wp3/html with copy of http://domain/index.htm and got
    Warning: scandir(http://domain/wp3/html) [function.scandir]: failed to open dir: not implemented in /home/domain/public_html/wp3/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 0): Success in /home/domain/public_html/wp3/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in /home/domain/public_html/wp3/wp-content/plugins/import-html-pages/html-import.php on line 396

    Posted by mugger on September 26th, 2009 at 11:46 am

  35. Oops! I definitely did NOT mean to show my actual domain in those warning msgs.

    Posted by mugger on September 26th, 2009 at 11:48 am

  36. I think the weird character issue was a browser issue. Is it possible to add a feature to apply a custom taxonomy tag to the imported docs (rather than a regular “tag”)? I’m sending in the donation!

    Posted by Sven on October 12th, 2009 at 7:28 pm

  37. Michael, it looks like you’re entering a local path from your desktop machine while the plugin is installed on a remote server. The files need to be on the same machine as WordPress. You could upload them all to the server, or install WordPress locally (XAMPP makes it easy), run the import, and then export the new entries in the WordPress format for uploading to your server.

    Sorry that’s so cumbersome. There just isn’t a good way to upload multiple files in a browser without using Flash, which I’m reluctant to get into.

    mugger, are you entering a URL or a file path? If URL, try a path instead — your host might not have enabled the fopen wrappers necessary to accept URLs as an input for the scandir() function. If you are using a path, make sure it’s pointing to a directory and not a single file. (The importer doesn’t handle single files well. I’m working on that.)

    Sven, HUGE thanks for the donation. Let me know if this is what you’re referring to.

    Everyone — sorry I went quiet for a while. I had to do a huge amount of preparation for the HighEdWeb conference that was held last week.

    Posted by Stephanie on October 12th, 2009 at 8:36 pm

  38. Stephanie

    thanks for the plugin – I am fairly new at using WP, and want to convert my Joomla 1.0 to WP 2.8.4.

    I copied the files from my Joomla blog to my WP site’s /import/Blog/YEAR/MONTH directory and tried to import the files. I get “Imported 0 files in 11.66689”.

    I see that the error on line 444 has been corrected, but I do not know what else may be happening on my end.

    Any thoughts?

    Posted by mahill510 on October 13th, 2009 at 5:15 pm

  39. Since the importer crunched for a good long time, it looks to me like it went through all the subdirectories and didn’t find any files with the right extensions. If your Joomla site used any kind of clean URL structure (/content/view/nn/nn), the importer saw those as directories and skipped them.

    The Joomla2WordPress importer will probably work better for you than my plugin, and I’m not saying that in an attempt to blow off your question. I used it to migrate an old Joomla site just a couple of months ago and it worked great.

    If you’d rather stick with my importer, you’d have to find a way to add a file extension to all your Joomla pages.

    Does that make sense?

    Posted by Stephanie on October 13th, 2009 at 5:24 pm

  40. Ah – it all makes sense, and I also found that the blog/yy/mm was actually a blogger archive before I moved to Joomla in 07, but it should still work.

    I was able to open the blog post; that is how I found out I was using the Blogger version instead of the Azrul MyBlog post…

    I actually downloaded the Joomla2WordPress importer but had reservations, since I am not that technical and was concerned about editing a config.php file with the appropriate table names.

    Should we take this conversation off line?

    Posted by mahill510 on October 13th, 2009 at 6:00 pm

  41. Stephanie – yes – the custom taxonomy feature I’m looking for is just using WP’s custom taxonomy feature. So just like you can say “all imported posts should be under a certain category or a certain tag” I’d like to be able to just designate a certain item under an existing custom taxonomy that I have set up in WP. Basically, I use Joost DeValk’s plug-in to create custom taxonomies in WP:

    http://yoast.com/wordpress/simple-taxonomies/

    And the posts I’m importing I want to tag with (at least) one of the custom taxonomy items that I designated. So I have two custom taxonomies, “Name” and “Type”, where “Name” has things like “Bob”, “Sven”, etc. and “Type” has items like “Funny”, “Serious”, “Gag”, etc. So what I’d like the plug-in to do is let me tag all imported posts with one of these items, so if I import all of Bob’s posts, I can designate them as “Bob”. This way, once imported, I don’t have to re-tag for that item. All I have to do is go in later and further tag them with “Funny”, etc. if I want to.

    Here is another (separate) feature request. This may be harder. Some of the things I’m importing are .txt files. Is there any way to add an option to the importer to convert any .txt files it finds to HTML and then import them? I know there are PHP scripts that can do this. It would be great for the importer to be able to do this as .txt and .html are the two general types of files most people would want to import.

    More donations to come!

    Posted by Sven on October 14th, 2009 at 7:06 am

  42. Hello, I have some problems…
    Error: Fatal error: Call to a member function asXML() on a non-object in /home/www/famosadesnudas/public_html/wp-content/plugins/import-html-pages/html-import.php on line 476.
    Could you help me?

    Posted by richard on October 14th, 2009 at 7:11 am

  43. And now: Call to a member function asXML() on a non-object in /home/www/famosadesnudas/public_html/wp-content/plugins/import-html-pages/html-import.php on line 951
    ???? hELP PLEASE!!!

    Posted by richard on October 14th, 2009 at 7:16 am

  44. Hello, I have changed the code on line 476, but that doesn’t work; I have the same error!!
    Can anybody help me?

    Posted by richard on October 14th, 2009 at 7:47 am

  45. Hi,

    This plugin is exactly what I am looking for to import my old static html site into a new WP install. However, I am working on it locally with XAMPP on my WIN XP Pro computer and keep getting this error:

    Warning: scandir(/archives/sitefoldername/sitesubfoldername/articles/) [function.scandir]: failed to open dir: Result too large in E:\xampp\htdocs\test\wp-content\plugins\import-html-pages\html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 34): Result too large in E:\xampp\htdocs\test\wp-content\plugins\import-html-pages\html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in E:\xampp\htdocs\test\wp-content\plugins\import-html-pages\html-import.php on line 396

    I’m not sure what is going on with it. I have checked the code from above and it is corrected already as I downloaded the newest version. Any help with this would be great as there are over 10K articles going back to 2000 that I need to import.

    Thanks in advance!

    Posted by Elliott on October 18th, 2009 at 2:02 pm

  46. Hey there. :-)

    Still trying to do the imports but I keep running into the same issue.

    Fatal error: Call to a member function asXML() on a non-object in /home/username/public_html/test/wp-content/plugins/import-html-pages/html-import.php on line 476

    Is anyone else experiencing the same issue and do you know of any fixes?

    Posted by Kym on October 20th, 2009 at 11:43 pm

  47. Yes, it hasn’t worked for me, I am experiencing the error below:
    Warning: scandir(/test/) [function.scandir]: failed to open dir: No such file or directory in /home/tra47941/public_html/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 2): No such file or directory in /home/tra47941/public_html/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in /home/tra47941/public_html/wp-content/plugins/import-html-pages/html-import.php on line 396

    Posted by joan on October 22nd, 2009 at 11:10 am

  48. Hi Stephanie,

    I am working on a project, and if I can get this to work, I will send you a donation.

    I was testing on my XAMPP local server and it wouldn’t work (see note above).

    I have moved things over to my live server for testing and am still getting this:

    Warning: scandir(/public_html/dev/wp-content/jeffersonreview/articles/2000/011000/) [function.scandir]: failed to open dir: No such file or directory in /home/MYHOST/public_html/dev/wp-content/plugins/import-html-pages/html-import.php on line 396

    Warning: scandir() [function.scandir]: (errno 2): No such file or directory in /home/MYHOST/public_html/dev/wp-content/plugins/import-html-pages/html-import.php on line 396

    Warning: Invalid argument supplied for foreach() in /home/MYHOST/public_html/dev/wp-content/plugins/import-html-pages/html-import.php on line 397

    I’m not sure what the problem is with it as all of my import files are in folders and I am specifying the “root” folder to begin the import.

    Please, please help me with this as I don’t understand what is causing this issue.

    Thanks in advance for your time.

    Posted by Elliott on October 28th, 2009 at 12:37 pm

  49. I am getting the same problem as most people here:

    Fatal error: Call to a member function asXML() on a non-object in /home/namei/public_html/blog/wp-content/plugins/html-import.php on line 476

    Has anyone found a solution to this yet? Please!!

    Posted by Web Design Selby on October 29th, 2009 at 2:39 pm

  50. Hi, guys. I’m traveling again and I don’t have my laptop, just the phone, so I’m not in a position to troubleshoot anything right now. I’m sorry! I’ll be back next week and I’ll try to take a look then. If you can, find the file that’s causing the XML error and paste the source code into pastebin. Then email me (stephanieleary @ google’s mail service) or reply here with the settings you’ve entered for the content and title regions.

    The scandir errors have stumped me for the time being, but I’m looking into it.

    Posted by Stephanie on October 29th, 2009 at 3:55 pm

  51. Hello,

    I have tried many times but still haven’t succeeded in importing any file. I see the following message, and I’m not actually sure what redirects are.

    Can you help for this problem ?

    Dick Leung

    *** message below
    .htaccess Redirects

    if you need to redirect visitors from the old file locations to your new WordPress pages, copy these redirects into your .htaccess file above the WordPress rules. Note: You might need to search & replace first if your import root directory was not the same as your web root. Also, if you imported many files, the complete list of redirects might slow your web server’s performance. Consider copying only essential ones, or if there’s a pattern to your file or directory names, create a RewriteRule instead.

    Posted by dickleung on November 3rd, 2009 at 4:33 pm

  52. I hope you return soon :-)

    Posted by Michiel Ebberink on November 4th, 2009 at 10:24 am

  53. I’m back, but I can’t replicate the problems with any of my test files. If you’re getting XML errors, please please send me one of your files, either by email or pastebin, and let me know what your import settings look like for the content and title regions.

    Posted by Stephanie on November 6th, 2009 at 2:09 pm

  54. Sorry, I think I’m missing something: is “Beginning directory”
    (a) a path on my local machine,
    (b) a path relative to the wordpress install directory, or
    (c) a fully qualified URL?

    It’s not clear at all what I need to put there. If I import from a directory on my local machine, must the WordPress install be running on that same machine, or can I import local files to a remote server?

    I’ve tried many iterations, but keep getting scandir() errors similar to those of others. I will definitely make a big donation if you can get this going for me.

    Posted by Scott Shumaker on November 6th, 2009 at 5:06 pm

  55. Scott, it should be an absolute path (from root) on the same machine where WordPress is installed. I might eventually get it working with URLs, but I don’t believe it’s working just yet.

    Posted by Stephanie on November 12th, 2009 at 8:28 pm

  56. Hi, finally got it to work, however it imported all of the pages with the as the content and periods and dashes at the titles…nothing else, before it locked up and ran out of memory.

    Any ideas would be great…

    Posted by Elliott on November 13th, 2009 at 12:57 pm

  57. Hi

    Tried this on Xampp on WinXP and now on a Mac.

    Finding out what the path to the directory is is a huge problem.

    Once I did, I have the same problem as Elliott – the plugin churns out hundreds of empty pages and locks the system, unable to recognize what a proper directory tree looks like, and seemingly creating imaginary files with periods as the filenames.

    On the mac, once I found the correct path, even though I have a file there, the plugin fails to recognize that it is a html file. I’ve tried renaming etc etc.

    It’s a great idea for a plugin, I hope you find what the bugs are!
    Failed to find the right combination on the PC.

    On the mac, finally I

    Posted by Dermod on November 15th, 2009 at 3:18 am

  58. finally got it to work on my pc, winxp, using the path ../../onefamily – even though onefamily and wordpress are on the same level of the directory tree!

    But i now get the error: Call to a member function asXML() on a non-object in D:\xampp\htdocs\wordpress\wp-content\plugins\import-html-pages\html-import.php on line 476

    any help on this would be much appreciated. on my mac all I get is a repeat of Elliot’s problem.

    Posted by dermod on November 15th, 2009 at 9:28 am

  59. You should use the absolute path from the root. On a Mac, you can easily find it by dragging your chosen folder into a Terminal window. On the PC, you can find the complete path in the location bar at the top of your Explorer window.

    The XML error most often occurs when the importer comes across a file that does not contain the tags you’ve told it to look for.

    I will be happy to try to duplicate your errors if you can send me a zip containing some of your files (address given in an earlier comment). I have not been able to reproduce any of the reported errors with the files I’ve imported.

    Posted by Stephanie on November 15th, 2009 at 11:46 am

  60. Thanks for the reply. I’m looking at the function asXML() and I see that it’s for XML markup only – the old but fairly well-formed HTML files that I want to import are:

    I’m trying this with just one HTML file in a folder, and even though it had some small errors like a dropped anchor, which I’ve fixed, I still get the same error. Images are old HTML, ending in >, not />, which is fine in HTML.

    Before sending you files, which is a generous offer, can you confirm that old html works with this plugin? And that asXML() copes with old HTML?

    Cheers,
    Dermod

    Posted by dermod on November 15th, 2009 at 3:36 pm

  61. dear stephanie,
    your plugin rocks! I have 2500 or so files to import. I’ve successfully imported 10 just as a test. My question: should i run the import on all 2500 at once, or will it freeze? is it better to break these up?

    Posted by yonation on November 17th, 2009 at 5:20 pm

  62. Glad you like it! I have imported that many files before, but it all depends on whether your server will allow the script to reset the execution time (which it will try to do). My suggestion is: go for it, but grab the Mass Page Remover plugin (linked on the import page) just in case you need to start over.

    Posted by Stephanie on November 17th, 2009 at 6:03 pm

  63. Hi !

    First of all, many thanks for this great plugin!

    It could save me HOURS of work trying to copy-paste more than 100 pages… !

    But…

    I encountered the same issue as mugger in his comment: http://sillybean.net/code/wordpress/html-import/#comment-15292

    How to fix it ?

    Many thanks in advance.

    A blogger from Belgium ;-)

    Posted by Désiré Dupas on November 23rd, 2009 at 6:43 pm

  64. Well !

    Nice !!!!

    In reply to my post: it works fine, I just hadn’t added the HTML tag:

    Select content by:

    HTML tag

    fill in with HTML or BODY or P or whatever ;-)

    Many many many thanks for this plugin.

    I’ll donate for this great tool !!!!! :-)

    Posted by Désiré Dupas on November 23rd, 2009 at 7:10 pm

  65. You’re welcome! I’m glad you got everything worked out.

    Posted by Stephanie on November 23rd, 2009 at 7:24 pm

  66. dear stephanie,
    I have one more question: is there any way to mass import into separate tag fields? Let’s say I have many HTML files with text like this in the body:

    blue
    5kg</span

    is there any way to get the value of the classes into tags, so then we can do advanced searches (ie search by color)?

    Posted by yonation on November 24th, 2009 at 12:04 pm

  67. Sorry stephanie, i meant get those values into custom fields so we can search them!

    Posted by yonation on November 24th, 2009 at 12:16 pm

  68. Yonation: not yet, but please email me with more details about what you would like the importer to do. I’m working on adding the custom taxonomy features, and I can take a look at custom fields as well.

    Posted by Stephanie on November 24th, 2009 at 1:23 pm

  69. Hi, Stephanie. A tip for other soon-to-be-happy users.

    I had a batch of files exported from PBWorks which I wanted to import into WP … however they had no surrounding tags (, ), each just started out with the content markup.

    After a couple of attempts at leaving the HTML tag empty, I tried an asterisk *.

    Bingo. Great!

    -Jay

    Posted by Jay Collier on November 28th, 2009 at 8:13 pm

  70. Good tip, Jay. Thanks!

    Posted by Stephanie on November 28th, 2009 at 9:06 pm

  71. Hi,
    I must have done something terribly wrong. I got a 500 Internal Server Error, and cannot access my site anymore. During import, I saw paths unfolding to extreme length, and then it stopped. I can still FTP to my server, but don’t know what to do to get my site back.
    Can you please help?

    Posted by RHCdG on November 29th, 2009 at 1:16 pm

  72. RHCdG: I’m so sorry. I don’t know what could have caused that. Is there anything in your PHP or MySQL error logs from the time you ran the import?

    Posted by Stephanie on November 29th, 2009 at 3:06 pm

  73. Thanks for your kind reply. Can you tell me where I would find the PHP error log? Is it a file with a name?

    Posted by RHCdG on November 29th, 2009 at 3:13 pm

  74. The location is defined in your php.ini file, if the error log exists at all. If it’s not working or you don’t have access to php.ini, you can set up error logging in wp-config.php.

    Posted by Stephanie on November 29th, 2009 at 3:34 pm
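One way to do what Stephanie describes is to add a couple of lines near the top of wp-config.php that turn on PHP's own error log. This is a minimal sketch, assuming the host allows `ini_set()` and that the directory holding the log is writable by the web server; the file name `php-errors.log` is a placeholder.

```php
<?php
// Sketch of a wp-config.php addition: write PHP errors to a log file.
// The log path is a placeholder; ideally point it outside the public web root.
@ini_set('log_errors', '1');
@ini_set('error_log', __DIR__ . '/php-errors.log');
```

After re-running the import, any warnings or fatal errors from that run should appear in the log file.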

  75. I’m using MAMP with WordPress 2.8.6. I entered an absolute URL to the site, but nothing imports. No matter what directory or choices I select, the results come back “imported 0 files”. Is there anything I can try?

    Posted by csleh on November 30th, 2009 at 10:06 pm

  76. I misread the very first comment, so I will try with local files. The errors I’m getting are the ones on lines 395 and 396, which seem to be about looking for the files.

    The problem is obviously my own user error, but a note in the FAQ that this works with HTML files on the same server as WordPress might be helpful.

    Posted by csleh on November 30th, 2009 at 11:26 pm

  77. I got something using local files, but it was 375 empty pages with dashes and dots for titles. My settings were: process html files, content from the div tag with id main, no cleanup, title from the HTML h1 tag. Mac MAMP, WordPress 2.8.6, Import HTML Pages 1.13.
    Help appreciated!

    Posted by csleh on December 1st, 2009 at 2:01 pm

  78. For anyone sure that they’re putting in the correct path and still getting “imported 0 files” or the “importing…” screen, make sure that you have the non-default mbstring PHP extension enabled.

    i.e., “extension=php_mbstring.dll” in php.ini
    and php_mbstring.dll in your extensions folder (on Windows)

    http://www.php.net/manual/en/mbstring.installation.php

    There is a call to mb_convert_encoding() that fails otherwise.

    Posted by Piotr on December 3rd, 2009 at 10:04 am

  79. Thanks for the tip, Piotr. The next version of the plugin will include a check for that function before it gets called.

    Posted by Stephanie on December 3rd, 2009 at 12:59 pm
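The check Stephanie mentions can be as simple as `function_exists()`. This is a minimal sketch, not the plugin's code: the helper name `to_utf8()`, the assumed ISO-8859-1 source encoding, and the choice to continue without conversion (rather than stop) are all illustrative assumptions.

```php
<?php
// Hypothetical helper: convert text to UTF-8 only when mbstring is available.
// Without the extension it returns the text unchanged, so the import can go on
// (accented characters may come through garbled) instead of failing silently.
function to_utf8($text, $from = 'ISO-8859-1') {
    if (!function_exists('mb_convert_encoding')) {
        trigger_error('mbstring is not enabled; skipping encoding conversion.', E_USER_NOTICE);
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $from);
}

echo to_utf8("caf\xE9"), "\n"; // "café" when mbstring is present
```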

  80. Dear Stephanie,

    I am converting my old static HTML site to a WP site. Your Import HTML plugin comes in really handy! However, I have two questions:

    1. I read somewhere that you are contemplating developing the plugin so that it will also import images. Is something in the pipeline?

    2. When I import pages, I realize that it would be good to be able to define, for example, more than one tag or Dreamweaver region. On my present site I have one editable region called “Text” and another called “Byline” in the same document. As I understand it, I can only import one of these regions. Am I right?

    Posted by Anders Olofsson on December 4th, 2009 at 8:16 am

  81. Hi Stephanie,

    I got my site back in order, but I am having the same issue as Mugger, where all I get is this:

    .
    ._
    .__
    .___
    …
    ._____________

    and only through breaking the connection can I stop the process. Afterwards, I need to delete dozens (hundreds) of empty pages.

    This is the path I am entering for the Beginning Directory: /usr/local/WWW/A/.5c2/r/xxxxxxxx/htdocs/ (where ‘xxxxxxx’ is my site ID)

    I am telling it to skip certain directories, and to only import html pages. Furthermore, I am telling it to only import from the tag on.

    I am using the latest WordPress version and the latest plugin version as well.

    Can you or anyone else please help? I need to import 986 pages, and this could be such a great help.

    Thanks,

    Rutger

    Posted by RHCdG on December 11th, 2009 at 6:43 pm

  82. I meant from the “body” tag on (put it in brackets by accident)

    Posted by RHCdG on December 11th, 2009 at 6:45 pm

  83. Wow. Another plugin… You are so happy to boast about what it does. Wow… But that is really worthless.

    How about how to use it? WTF do I do to use it?
    Instructions? I am very new to WP and am about to never try it again. I have installed the plugin, and now all I can do is see it under “Installed Plugins”.

    What, does it read my mind and import what I am thinking?

    Please call me an idiot. I fear for people who have not been web developers for 10 years as I have.

    Posted by selvsl on December 18th, 2009 at 8:37 pm

  84. Is there nobody here to reply to my post above?

    Posted by RHCdG on December 19th, 2009 at 9:54 am

  85. Scratch my request. I have found the Readme file.

    Posted by selvsl on December 19th, 2009 at 5:42 pm

  86. RHCdG, I’m sorry, I’m preoccupied at the moment. Can you email me an example of one of your files? I’ll take a look when I get a chance.

    Posted by Stephanie on December 20th, 2009 at 3:19 pm

  87. Hello Stéphanie,

    It seems my Site5 hosting needs the PHP script to be named .php5 before scandir will work.

    I tried hacking around a bit and found that WP is rather secure and won’t allow a renamed PHP script to run. I figured I would drop you an email and wish you a happy new year before going further.

    Do you know what I should do? Or could you publish a new version of your plugin using .php5 as the extension?

    More serious merging would happen following my unlocking (~300 press releases for another web site), and then, an eventual donation, of course :)

    thanks!

    Posted by JF on January 2nd, 2010 at 10:47 am

  88. I apologize in advance for being a novice. When I attempt to import my existing live site I get the following errors:

    Importing…
    Warning: Division by zero in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir(com) [function.scandir]: failed to open dir: No such file or directory in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 2): No such file or directory in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 396

    I’m trying to find out what this means and what I can do to correct the problems. Thanks

    Posted by Marc Tolbert on January 5th, 2010 at 2:54 pm

  89. Stephanie, we have modified your plugin in many ways and I would like to send it back to you as a contribution if you are interested.

    What we did was this:
    We took the plugin and made it so the actual names of the HTML files are used for the permalink rather than the title. So if my HTML file is advertising.html, when I do the import my permalink is advertising (followed by /, .html, or .php depending on my WP settings), while the actual <title> of the document is still used as the post/page title.

    Next, we put in a checkbox for “All In One SEO”, so if you check it, the meta description, meta keywords, and title are added to AIOS accordingly.

    Next, I redid your admin panel.

    I did all this because one of my data conversion guys struggled for weeks with a large project where we had to import 350 very nasty looking static HTML pages into WordPress after cleaning up the code, page by page. Your plugin is beautiful, we just enhanced it for this project and I'd like to send it back as a contribution to your project.

    Posted by Jared Ritchey on January 7th, 2010 at 8:01 am
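Jared's first change, using the file name rather than the title for the permalink, might look roughly like this. In WordPress the slug is the `post_name` field of the array passed to `wp_insert_post()`. This is a sketch under assumptions: `slug_from_filename()` is a hypothetical helper, and its regex cleanup is a simplified stand-in for WordPress's `sanitize_title()`, which a real plugin should call instead.

```php
<?php
// Hypothetical helper: derive a post slug from the source file name,
// e.g. /old-site/about/advertising.html -> "advertising".
// Inside WordPress, sanitize_title() would replace the regex cleanup below.
function slug_from_filename($path) {
    $name = pathinfo($path, PATHINFO_FILENAME);       // "advertising"
    $slug = strtolower($name);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);  // spaces, underscores, etc. -> "-"
    return trim($slug, '-');
}

$my_post = array();
$my_post['post_title'] = 'Advertising With Us';          // still taken from <title>
$my_post['post_name']  = slug_from_filename('/old-site/about/advertising.html');
echo $my_post['post_name'], "\n";
```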

  90. Hey Jared, I’m getting these errors while attempting to import a site:

    Importing…
    Warning: Division by zero in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir(com) [function.scandir]: failed to open dir: No such file or directory in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: scandir() [function.scandir]: (errno 2): No such file or directory in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 395

    Warning: Invalid argument supplied for foreach() in /home/mtolbert/public_html/barginshopperonline.com/blog/wp-content/plugins/import-html-pages/html-import.php on line 396

    I’m trying to find out what this means and what I can do to correct the problems. Thanks

    Posted by Marc Tolbert on January 7th, 2010 at 10:34 am

  91. Hey, guys. I’ve been sick recently. I apologize for letting your comments sit so long without a reply.

    JF, I’m afraid I’ve never encountered that problem before. I’ll ask around and see if anything can be done.

    Marc, what’s the path you entered? Make sure there is no slash at the end. Also, there’s no need to repeat your question or pester the other commenters.

    Jared, thanks. Please do send your changes. I’d be happy to take a look.

    Posted by Stephanie on January 12th, 2010 at 7:16 pm

  92. Stephanie,
    Sorry for the repeat question to Jared. The path I was using was http://www.barginshopperonline.com.

    Posted by Marc Tolbert on January 12th, 2010 at 9:43 pm

  93. The local path is C:\Documents and Settings\Marc\My Documents\My Web Sites\Bargin

    Posted by Marc Tolbert on January 13th, 2010 at 3:55 pm
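The beginning directory has to be a path on the web server's own filesystem, not a URL and not a folder on your desktop computer, and Stephanie recommends leaving off the trailing slash. A minimal sketch of a check along those lines; `check_import_dir()` is a hypothetical name, not part of the plugin.

```php
<?php
// Hypothetical helper: normalise and validate the "beginning directory" setting.
// Returns the cleaned path, or false if it cannot be used.
function check_import_dir($path) {
    $path = trim($path);
    // A URL such as http://www.example.com cannot be read with scandir().
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $path)) {
        return false;
    }
    // Drop any trailing slash or backslash.
    $path = rtrim($path, '/\\');
    // The directory must exist and be readable on this server.
    if ($path === '' || !is_dir($path) || !is_readable($path)) {
        return false;
    }
    return $path;
}

var_dump(check_import_dir('http://www.example.com')); // bool(false)
```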

  94. Hi,

    I want to report a possible error in this plugin regarding line 476 of the code:

    $my_post['post_content'] = $content[0]->asXML();

    Actually it should be like this:

    $my_post['post_content'] = $xml->asXML();

    At least, that worked for me. Looking back at the code, the object

    $content = $xml->xpath($xquery);

    The xpath function, as a method of the $xml object, returns an array, but only an array of strings, so you can’t take an expression like $content[0] and then call $content[0]->asXML(), because $content[0] is just a string, not an object that would have a method like asXML(). Actually, asXML() is a method of the $xml object, which is of the SimpleXMLElement class type.

    Posted by Munteanu Ramiro on January 14th, 2010 at 4:41 pm
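According to the PHP manual, `SimpleXMLElement::xpath()` returns an array of `SimpleXMLElement` objects (empty when nothing matches, `false` or `null` on error), so `$content[0]->asXML()` fails when the query matches nothing. Switching to `$xml->asXML()` avoids the error but serializes the whole document rather than the selected element. A minimal sketch of a guarded version; `first_match_xml()` is a hypothetical helper, and the query strings stand in for the plugin's `$xquery`.

```php
<?php
// Hypothetical helper: return the XML of the first node matching $xquery,
// or null if the query matched nothing or failed.
function first_match_xml(SimpleXMLElement $xml, $xquery) {
    $content = $xml->xpath($xquery);
    if (!is_array($content) || count($content) === 0) {
        return null; // nothing to import from this file
    }
    return $content[0]->asXML();
}

$xml = new SimpleXMLElement('<html><body><div id="main"><p>Hello</p></div></body></html>');
var_dump(first_match_xml($xml, '//div[@id="main"]'));    // the <div> and its contents
var_dump(first_match_xml($xml, '//div[@id="missing"]')); // NULL
```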

  95. Hi,
    thanks for the terrific app.
    I’m looking for a small tweak and unfortunately I’m not proficient at php.
    I’d like to designate the timestamp as a field in the html file I’m importing.
    I can hard code it – it doesn’t have to show as an option.
    Can you point me to where I should make this change? I’ll do the work and debug but it would help to be nudged in the right direction.
    Thanks again!

    Posted by Scott on January 20th, 2010 at 10:11 pm

  96. Sure, Scott. Take a look at PHP’s XPath query functions. What you want to do is build a query that matches the HTML surrounding your date. If it’s in a <div class="time"> tag, you’d write something like:

    $date = $xml->xpath('//div[@class="time"]');

    Then you’d use PHP’s date functions (strtotime(), for example) to turn the date into a Unix timestamp, and replace lines 456-458 with the code you just wrote.

    Does that make sense?

    Posted by Stephanie on January 20th, 2010 at 10:23 pm

  97. Stephanie,
    Thanks for your help. I did get it working, once I figured out that I needed the // in front of the tag name even if I didn’t have any attributes, and that the response is an array for which I needed to take only the zero element. (and I needed to invoke strip_tags). But even with all the “discovery”, it worked great and let me maintain the dates on some 1500 posts that had been moved from another server (hence losing their create dates.) Pretty cool app once you start poking around and find all the things it can do.

    Posted by Scott on January 22nd, 2010 at 3:18 pm
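Scott's three discoveries (the leading `//`, taking element `[0]` of the result array, and stripping tags) fit together roughly like this. This is a minimal sketch, assuming the date sits in `<div class="time">` as in Stephanie's example and is in a format `strtotime()` understands. `extract_post_date()` is a hypothetical helper; 'Y-m-d H:i:s' is the format WordPress uses for a post's `post_date` field.

```php
<?php
date_default_timezone_set('UTC'); // use your site's timezone; UTC keeps the example predictable

// Hypothetical helper: pull a date out of <div class="time"> and format it for post_date.
function extract_post_date(SimpleXMLElement $xml) {
    $date = $xml->xpath('//div[@class="time"]');   // array of matching elements
    if (!is_array($date) || count($date) === 0) {
        return null;
    }
    $text = trim(strip_tags($date[0]->asXML()));    // first match, markup removed
    $timestamp = strtotime($text);                  // Unix timestamp, or false
    if ($timestamp === false) {
        return null;
    }
    return date('Y-m-d H:i:s', $timestamp);
}

$xml = new SimpleXMLElement('<html><body><div class="time">January 20, 2010</div></body></html>');
echo extract_post_date($xml), "\n"; // 2010-01-20 00:00:00
```

The result could then be assigned to `$my_post['post_date']` before the post is inserted.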

  98. Glad it worked, Scott!

    Posted by Stephanie on January 22nd, 2010 at 3:26 pm

  99. I’ve just uploaded version 1.2 to the repository. It includes custom taxonomy fields, better error handling for the asXML() and mb encoding functions, and translation support.

    Posted by Stephanie on January 24th, 2010 at 10:40 am

  100. I am trying to use this plugin to import 2000+ pages into wordpress, and I’ve got it working, but it just imports a few hundred pages (there are only 10 files in the directory I’m testing) with titles like —– or .

    I see above some others had this problem – does anyone know how to fix it? I’m dealing with some pages that have some really, really malformed html so I’m wondering if that’s part of the problem…

    Posted by Tabytha on January 25th, 2010 at 12:32 am

  101. Tabytha — very likely. As it says on the tin, this plugin really only works with well-formed HTML. It might work on bad HTML, and it might not.

    Posted by Stephanie on January 25th, 2010 at 11:24 am

  102. Is there anything else that could possibly cause that? I’ve been trying to clean some of the code up, and I’ve tried running it on a directory with only one file that does have correct HTML, and I get the same results every time. :(

    Posted by Tabytha on January 26th, 2010 at 11:29 pm

  103. Hi-

    When I run the plugin, the WP pages are created successfully, but the content is not imported and I get the following errors:

    Warning: SimpleXMLElement::xpath() [simplexmlelement.xpath]: Invalid expression in /home/site/public_html/wp-content/plugins/import-html-pages/html-import.php on line 635

    Warning: SimpleXMLElement::xpath() [simplexmlelement.xpath]: xmlXPathEval: evaluation failed in /home/site/public_html/wp-content/plugins/import-html-pages/html-import.php on line 635

    I have a screenshot of my settings in PDF format that I’d like to send you. Can you offer some assistance as to how I can correct this error message?

    Thanks.
    Lisa

    Posted by Lisa on January 29th, 2010 at 3:21 pm

  104. Sure, Lisa. Send me your file and I’ll try to take a look this weekend.

    Posted by Stephanie on January 29th, 2010 at 4:50 pm

  105. Complete noob question: I can’t find where the HTML import screen is hiding to even try this out. It’s not showing up in my “Import” screen on the Dashboard…

    Posted by Jonathan on February 2nd, 2010 at 5:42 pm

  106. Sorry, Jonathan. This behaves more like a normal plugin, and its screen is under Settings -> HTML Import.

    Posted by Stephanie on February 2nd, 2010 at 8:28 pm

  107. Hello,

    I’m having the same issue as Rutger and Mugger above, where the importer, regardless of how many pages there are to import (2 to 1400), creates hundreds of pages (even if I opt for posts), all titled “.” and each one a child page of the one before it. In the page admin it looks like this:

    .
    _.
    __.
    ___.

    and so forth. I’ve tried using different settings for parsing the content, and I know I have the import directory set correctly because it gives me an error otherwise. Please help!

    Thank you,
    _ian

    Posted by ian on February 3rd, 2010 at 4:25 pm

  108. I’m experiencing the same thing as some above, with repeated pages like:

    .
    _.
    __.
    ___.

    that don’t stop until you halt the connection. Very frustrating. Had hoped I’d find a way to import 450+ html pages but it looks like back to the drawing board for me.

    Posted by Kathie on February 5th, 2010 at 5:27 am

  109. Stephanie, this is a great plugin. I tested it successfully with a subdirectory, but for some reason when I try to copy about 200 of my client’s files from her root/domain/ directory, it imports about half of them, fails to complete, and doesn’t provide me with the .htaccess rewrite rules. It prints out this error:

    The PHP functions fopen and file_get_contents have both failed. We can't import any files without these functions. Please ask your server administrator if they are enabled.

    Any idea what this means and how to get it to work correctly? I also tried copying all the HTML files and running the process on my local server with MAMP, but I get the same error.

    Posted by Cody on February 5th, 2010 at 6:52 am

  110. Kathie and Ian: as noted several times in this thread, I really need to see the settings you’re using to import. Rutger and Mugger haven’t provided any additional info, and I can’t troubleshoot the problem. Lisa sent me a screenshot of her import settings, and I was able to fix her problem in about two minutes. Knowing the end result is helpful, but in order to solve the problem, I need to know how you got there.

    Posted by Stephanie on February 5th, 2010 at 8:40 pm

  111. Cody, that’s one I haven’t seen before. There are a couple of things you might check. Are the permissions, owner, or group different on the last file that the importer is trying to read? Is it a different file type? Is it an empty file?

    Posted by Stephanie on February 5th, 2010 at 8:43 pm

  112. Stephanie, you’re awesome, thanks for the quick reply. I went through and found there was an empty “default.html” file, which I think is what was halting the process. Cheers! Hopefully I’m home free from here on out. Great, great plugin, thank you!

    Posted by Cody on February 6th, 2010 at 5:53 am
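Cody's culprit, an empty default.html, suggests checking each file before reading it. One plausible explanation (not confirmed against the plugin's code) is that `file_get_contents()` returns an empty string for an empty file, which a loose `if (!$contents)` check treats the same as a failed read. A minimal sketch, assuming it is acceptable to skip such files and report them instead of stopping the import; `read_for_import()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: read a file for import, recording empty or unreadable
// files in $skipped instead of treating them as a fatal error.
function read_for_import($file, array &$skipped) {
    if (!is_file($file) || !is_readable($file)) {
        $skipped[] = $file . ' (unreadable)';
        return null;
    }
    $contents = file_get_contents($file);
    if ($contents === false) {
        $skipped[] = $file . ' (read failed)';
        return null;
    }
    if (trim($contents) === '') {
        $skipped[] = $file . ' (empty)';
        return null;
    }
    return $contents;
}
```

At the end of a run, the `$skipped` list can be printed alongside the rewrite rules so nothing disappears silently.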

  113. Hi Stephanie,

    I appreciate the time you’re putting in to help everybody out, including me. I did, however, send you a sample HTML file through the email form on your website; perhaps you did not receive it? Or maybe it did not give you all the information you need? Are there any other settings you need to know about to fix this problem? Perhaps it would be useful if you could list the exact details or settings you need to solve this
    _.
    __.
    ___.

    problem?

    Thanks,
    Rutger

    Posted by RHCdG on February 6th, 2010 at 7:18 am

  114. Stephanie, thank you for this fabulous tool. I’m using it to import hundreds of pages from an old Radio UserLand site into a WordPress blog. This presented a few new problems: multiple posts on the same date are stored on a single HTML page, and the post date has to be retrieved either from the URL or from a bolded item in the page.

    I was able to modify your code to handle this, pulling the code that handles each new page out into its own function, and from there extracting the “identify the post and insert it” code into a second new function, which I then put into a loop. I’m happy to send it for your review/inclusion if you like. It’s more or less clean.

    Posted by Chris Berendes on February 7th, 2010 at 9:21 pm

  115. Thanks, Chris, that would be great! I was just thinking earlier today that I need to create some more flexible options for dates and bylines, but it’ll be about two months before I can devote the time.

    Posted by Stephanie on February 7th, 2010 at 9:24 pm

  116. Hey, Rutger. I’m sorry, I don’t seem to have received your email. However, I think I know what the problem is. Did you by any chance erase the preset options for directories to skip? If so, put this back in:

    .,..

    (Period, comma, two periods.) I think I need to hardcode that into the plugin; otherwise, if you take it out and you’re on a UNIX-based OS, the importer will trip over all the hidden system files.

    Try that out and let me know how it goes!

    Posted by Stephanie on February 7th, 2010 at 9:27 pm
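`scandir()` includes the entries `.` (the directory itself) and `..` (its parent) in its results. If a recursive import descends into `.`, it re-reads the same directory again and again, which would be consistent with the endlessly nested pages titled `.` reported above. A minimal sketch of hardcoding that skip so the user's exclusion list can no longer remove it; `collect_files()` and its parameters are hypothetical, and skipping all other dot-files (such as .htaccess or .svn) is an extra assumption.

```php
<?php
// Hypothetical helper: recursively collect files with the given extensions,
// always skipping "." and ".." no matter what the "directories to skip" setting contains.
function collect_files($dir, array $skip_dirs, array $extensions) {
    $found = array();
    $entries = scandir($dir);
    if ($entries === false) {
        return $found;
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // hardcoded: never recurse into self or parent
        }
        if ($entry[0] === '.') {
            continue; // assumption: hidden files and folders are never content
        }
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path)) {
            if (!in_array($entry, $skip_dirs, true)) {
                $found = array_merge($found, collect_files($path, $skip_dirs, $extensions));
            }
        } elseif (in_array(strtolower(pathinfo($entry, PATHINFO_EXTENSION)), $extensions, true)) {
            $found[] = $path;
        }
    }
    return $found;
}
```

With this structure, an empty "directories to skip" setting still terminates, because the self and parent entries are filtered before the user's list is consulted.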
