Sillybean
Import HTML Pages
This plugin will import a directory of files as either pages or posts, according to configurable settings. You may specify the HTML tag containing the content you want to import (e.g. <div id="content"> or <body>) or the name of a Dreamweaver template region (e.g. “Main Content”).
If importing pages, the directory hierarchy will be preserved. Directories containing the specified file types will be imported as empty parent pages. Directories that do not contain the specified file types will be ignored.
As files are imported, the resulting IDs, permalinks, and titles will be displayed. On completion, the importer will provide a list of Apache redirects that can be used in your .htaccess file to seamlessly transfer visitors from the old file locations to the new WordPress posts or pages.
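As a rough illustration of the redirect list described above, here is a minimal Python sketch that turns a mapping of old file paths to new permalinks into Apache `Redirect` directives (mod_alias). The function name `redirect_lines` and the example URLs are hypothetical, and the plugin's real output format may differ.

```python
def redirect_lines(mapping, status=301):
    """Build Apache mod_alias Redirect directives from {old_path: new_url}."""
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        # Redirect expects the old URL-path to begin with a slash.
        if not old_path.startswith("/"):
            old_path = "/" + old_path
        # Arguments containing spaces must be quoted in Apache config files.
        if " " in old_path:
            old_path = f'"{old_path}"'
        lines.append(f"Redirect {status} {old_path} {new_url}")
    return lines


# Hypothetical example: one old static file mapped to its new permalink.
for line in redirect_lines({"about/team.html": "http://example.com/about/team/"}):
    print(line)
```

Each resulting line could be pasted into .htaccess; a 301 status marks the move as permanent.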
Options:
- import pages or posts
- specify content and title as HTML tags or Dreamweaver template regions
- remove a common phrase (such as the site name) from imported titles
- specify file extensions to import (e.g. html, htm, php)
- specify directories to exclude (e.g. images, css)
- if importing pages, specify whether your top-level files should become top-level pages or children of an existing page
- choose tags, categories, and custom taxonomies
- choose status, author, and timestamp
- use meta descriptions as excerpts
- choose whether to clean up bad (Word, Frontpage) HTML
Requires PHP 5.
Support note: If you have a problem importing your files, please provide a screenshot of your import settings and an example of one of your files. Without these things, it’s difficult to diagnose your problem.
Hi Stephanie,
I appreciate the time you’re putting in to help everybody out, including me. I did, however, send you a sample HTML file through the email form on your website; perhaps you did not receive it? Or maybe it did not give you all the information you need? Are there any other settings you need to know to fix this problem? Perhaps it would be useful if you could list the exact details or settings you need to solve this problem?
Thanks,
Rutger
Hey, Rutger. I’m sorry, I don’t seem to have received your email. However, I think I know what the problem is. Did you by any chance erase the preset options for directories to skip? If so, put this back in:
.,..
(Period, comma, two periods.) I think I need to hardcode that into the plugin; otherwise, if you take it out and you’re on a UNIX-based OS, the importer will trip over all the hidden system files.
Try that out and let me know how it goes!
I don’t know about Rutger, but I had that bug. I was trying everything and was feeling kind of desperate… and adding .,.. in the «Directories to skip» field solved it!
It’s a very disturbing bug, as it adds hundreds of empty pages each time you try an import. (It’s nice that you point to the Mass deleter plugin right there in the plugin setup page!)
So, yes, you probably should add that preset somewhere else, or maybe just a line of text next to the field.
Thanks for that great plugin, keep up the good work.
Stephanie – Thank you for this fabulous tool. I’m using it to import 100s of pages from an old Radio Userland site into a WordPress blog. This presented a few new problems: multiple posts on the same date are stored on a single html page, and the post date has to be retrieved either from the url or a bolded item in the page.
I was able to modify your code to handle this, pulling out the code that handles each new page onto its own, and from there extracting the “identify the post and insert it” code into a second new function, which I then installed into a loop. Happy to send it for your review/inclusion if you like. It’s more or less clean.
Thanks, Chris, that would be great! I was just thinking earlier today that I need to create some more flexible options for dates and bylines, but it’ll be about two months before I can devote the time.
Hi Stephanie,
Do you mean I should put .,.. before each directory name in the box under “Skip directories with these names”?
Like so:
.,..audio, .,..pdf, .,..pics
?
I tried this, and the import was different than before but I got scared when it started displaying “cgi-bin” hundreds of times during the import… I hit the Stop-button on my browser, and I now have 820 “planned”, i.e. empty pages.
I know from others in this thread that your tool works miracles, but how I wish I would get it to work for my site!
Thanks,
Rutger
PS I did upgrade to the new version of the plugin!
No, you just need it once. In your case, it would look like:
.,..,audio,pdf,pics
Can you email me a screenshot of your import settings?
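To make the format of the skip field concrete, here is a minimal Python sketch of how a comma-separated “directories to skip” value could be interpreted. It is only an assumption about the behavior, not the plugin's PHP code, and `parse_skip_list` and `entries_to_import` are hypothetical names. The `.` and `..` entries matter because PHP's scandir() includes both in its results.

```python
def parse_skip_list(field):
    """Split the comma-separated field into a set of names to skip."""
    return {name.strip() for name in field.split(",") if name.strip()}


def entries_to_import(entries, skip_field):
    """Keep only the directory entries that are not in the skip list."""
    skip = parse_skip_list(skip_field)
    return [entry for entry in entries if entry not in skip]


# The recommended value: "." and ".." once, followed by the folder names.
good = parse_skip_list(".,..,audio,pdf,pics")

# Prefixing every folder name with ".,.." never yields a bare ".." entry,
# so the parent-directory entry would not be skipped.
bad = parse_skip_list(".,..audio, .,..pdf, .,..pics")

print(".." in good, ".." in bad)
```

Under this reading, the prefixed form leaves `..` unskipped, which would be consistent with an import that wanders into unrelated folders such as cgi-bin.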
Hi Stephanie,
I just emailed you a screenshot; thanks again for your help!
Rutger
Hi,
Nice plugin, but I have this problem: I have one HTML page that contains several articles. The plugin works fine for the first article, but the others are ignored.
Example:
article1
content
article2
content
article3
content
Can someone help me?
Thanks in advance.
Hi Stephanie,
Super great tool. Thanks so much! Quick question – even though I include <br> in the Allowed HTML field, I still can’t get the tool to keep my BRs. I’m using version 1.21. Thanks for any pointers!
David
I’m getting these error messages. Please help. I have a lot of pages to import into my wordpress site! Thanks!
Warning: scandir() [function.scandir]: URL file-access is disabled in the server configuration in D:\Hosting. . . plugins\import-html-pages\html-import.php on line 554
Warning: scandir(http://www.. . . .com/azrefiles/) [function.scandir]: failed to open dir: no suitable wrapper could be found in D:\Hosting. . .plugins\import-html-pages\html-import.php on line 554
Warning: scandir() [function.scandir]: (errno 0): No error in D:\Hosting. . .import-html-pages\html-import.php on line 554
Warning: Invalid argument supplied for foreach() in D:\Hosting. . .import-html-pages\html-import.php on line 555
Hi Stephanie!
Such a great tool.
Unfortunately, I have the same problem as Kent (http://sillybean.net/code/wordpress/html-import/comment-page-3/#comment-19774)
When I try to parse something, the plugin tells me:
Warning: scandir(http://pagesdor.truvo.be/search/Soins_infirmiers_%C3%A0_domicile.html) [function.scandir]: failed to open dir: not implemented in D:\xampplite\htdocs\DocTel\wp-content\plugins\import-html-pages\html-import.php on line 395
Warning: scandir() [function.scandir]: (errno 0): No error in D:\xampplite\htdocs\DocTel\wp-content\plugins\import-html-pages\html-import.php on line 395
Warning: Invalid argument supplied for foreach() in D:\xampplite\htdocs\DocTel\wp-content\plugins\import-html-pages\html-import.php on line 396
Any idea where this comes from?
I used your plugin a few months ago and it worked fine (http://sillybean.net/code/wordpress/html-import/comment-page-2/#comment-16531)
I don’t know why it fails now.
Please help.
Actually, I just figured it out. Instead of using the URL to the source files, you need to use the absolute hosting path. It should be something like D:\Hosting\[and then some numbers]. Good luck.
Stephanie, thanks for this great tool. I also liked the output of the list of necessary redirects. I then used the 301 redirect plugin to input those.
Regards,
Kent
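The URL-versus-path mix-up in the comments above can be caught before an import is attempted. The following is a minimal Python sketch of such a check; `check_beginning_directory` is a hypothetical helper written for illustration, not something the plugin provides.

```python
import os
from urllib.parse import urlparse


def check_beginning_directory(value):
    """Return a list of problems found with the configured beginning directory."""
    problems = []
    scheme = urlparse(value).scheme.lower()
    if scheme in ("http", "https", "ftp"):
        # A web address cannot be scanned like a local directory.
        problems.append("looks like a URL; use a filesystem path instead")
        return problems
    if not os.path.isabs(value):
        problems.append("not an absolute path")
    elif not os.path.isdir(value):
        problems.append("directory does not exist or is not readable")
    return problems


print(check_beginning_directory("http://www.example.com/files/"))
```

An empty list means the value at least points at an existing local directory; it says nothing about whether the files inside are importable.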
Hi there. I ran the plugin but ran into this error:
Call to a member function xpath() on a non-object in /home/lulu/public_html/wordpress/wp-content/plugins/import-html-pages/html-import.php on line 593
The plugin seems to be running but the contents won’t import:
“Could not import 10000-neopoints-winners.html. You should copy its contents manually.
Could not import 13goingon30.html. You should copy its contents manually.
Could not import 13goingon30_sol.html. You should copy its contents manually.
Could not import 200mpeanutdash.html. You should copy its contents manually.
Could not import 200mpeanutdash_sub1.html. You should copy its contents manually.”
Quick update – I am able to import pages now by editing the “select content by” area.
I have “select content by HTML” checked and changed the HTML recognition tag to p (to represent <p>).
The only problem is that I have many <p> tags in my HTML pages. The importer only imports the first paragraph and skips the rest.
Example page http://www.pinkpt.com/pages/200mpeanutdash.html
Is there a way to just have it import the entire HTML page?
Thanks in advance for such a time-saving plugin. You’re a life saver.
Update: Found the fix! http://sillybean.net/code/wordpress/html-import/comment-page-2/#comment-16661
This may help someone else:
If you’re not certain what your full path is on your server, you can find that when you log into your Control Panel of your host.
Also, I was having trouble bringing in the HTML pages until I actually put the folder with the files inside the folder where the Import HTML Pages plugin exists. The tip says to make sure the files are on the same server as WordPress, but I had to move the folder into the plugin’s folder before it would work correctly. My host is GoDaddy, just in case anyone thinks it might be a hosting issue that causes me to have to do this.
Thanks, Stephanie, for this plugin. I will definitely be making a donation.
Hi,
Can’t get this to work. Either I get hundreds upon hundreds of pages with content “.” or nothing happens and no file is imported. I don’t see any way of attaching screenshots here. Nothing happens with .,.., in the exclude field and the server path copied faithfully from the FTP utility. As for what causes the hundreds and hundreds of pages, I’m not sure; it’s just a major pain deleting them.
When are you coming out with a single-file version?
I’m sorry you’re having trouble. The Mass Page Remover plugin (linked from the importer screen’s tips section) should make it easy to remove those extra pages.
I’ll work on the next major version this summer. I plan to include the single file version and the image processing.
This really needs a fundamental reworking for usability. This reliance on .., etc. and one server or the other, is absurd from a usability standpoint. One should not have to be a developer to figure out how to use this, or how to get it to work. When it creates hundreds of unwanted blank html pages, which cannot be deleted except manually (because the remove html page plugin does not work), I’d have to say this plugin was 50-50, and therefore unusable.
This is a great plugin. I have one question — when importing by tag, is it possible to remove the wrapping tag that gets imported?
For example, if I am importing , I end up getting in my WordPress page (in the source).
I thought that simply adding /text() to the xpath around line 635 in your code would solve this. It does not! Any help? Thanks.
My comment got messed up because I included HTML in it.
I am searching for:
[div id="body"]
and that ends up in my WordPress page.
I don’t believe it is possible. However, you could use the Search and Replace plugin to get rid of those tags after the import is done.
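For readers who post-process the markup themselves, here is a minimal Python sketch of keeping an element's inner markup while dropping the wrapping tag. It assumes well-formed XHTML input (real-world HTML usually needs a more tolerant parser), uses Python's standard xml.etree.ElementTree rather than the plugin's PHP code, and `inner_markup` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def inner_markup(document, element_id):
    """Return the markup inside the element with the given id, without its wrapper."""
    root = ET.fromstring(document)
    for element in root.iter():
        if element.get("id") == element_id:
            # Text that appears before the first child element.
            parts = [element.text or ""]
            # Each serialized child includes the text that follows it (its tail).
            parts.extend(ET.tostring(child, encoding="unicode") for child in element)
            return "".join(parts).strip()
    return None


page = '<html><body><div id="body"><p>Hello</p> and <b>more</b></div></body></html>'
print(inner_markup(page, "body"))
```

Selecting only text nodes (as with /text() in XPath) would drop the child elements, which is why the children are serialized one by one here.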
Being able to import .txt as well as HTML files would be great!
Sven, I’ll email you. That might require a completely separate plugin, which is fine, but I’ll need some sample files to work with.
What is the path for a localhost install on a Mac?
It depends. If you’re using MAMP, it’s /Applications/MAMP/htdocs. If you’re using the built-in Apache server, it’s /Library/WebServer/Documents unless you’ve changed it in /etc/apache2/httpd.conf.
Hello Stephanie – the html page import plugin deletes all special characters from my imported html text, such as the German ö, ä, and ü.
Is there a way to allow those special characters and import them correctly?
Thanks,
Frank M.
Frank, I’m sorry it’s not working correctly. Try commenting out (or removing) line 568 (“if (function_exists(‘mb_convert_encoding’))…”) and see if that works better.
Hello Stephanie, the plugin works really well except it is deleting some characters, e.g. â
The characters didn’t come through in the comment properly.
The characters that don’t work are, e.g., the left single quotation mark and right single quotation mark, but the apostrophe works OK (although it turned into a left quote in the previous comment).
Thanks
Leon
I’ve noticed that WordPress chokes on curly quotes in general. You could try commenting out (or removing) line 568 (“if (function_exists(‘mb_convert_encoding’))…”) and see if that works better.
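One possible cause of a stray “â” where a curly quote should be, offered as an assumption rather than a diagnosis of the plugin, is UTF-8 text being read as Windows-1252. The Python sketch below reproduces that effect and shows a cautious repair; `repair_mojibake` is a hypothetical helper.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # the curly apostrophe

# The character is three bytes in UTF-8 ...
utf8_bytes = RIGHT_SINGLE_QUOTE.encode("utf-8")
# ... and reading those bytes as Windows-1252 yields three characters,
# the first of which is "â".
misread = utf8_bytes.decode("cp1252")


def repair_mojibake(text):
    """Undo a UTF-8-read-as-cp1252 mistake; return the text unchanged if that does not apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(misread.startswith("â"), repair_mojibake(misread) == RIGHT_SINGLE_QUOTE)
```

Correctly encoded text such as “ö” passes through `repair_mojibake` untouched, because its Windows-1252 byte is not valid UTF-8 on its own.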
Hi Stephanie,
I am trying to use your plugin and getting stuck on the beginning directory. I have tried everything to get this working without any joy.
I have uploaded the HTML file onto the WordPress server. I have put in every combination I can think of, but it just won’t upload it. Any help would be appreciated.
Phil
I just noticed I need PHP 5 for this to work. How do I know if I have it, and where can I get it if I don’t?
Phil
Hi,
I’ve tried the plugin, but I have problems with tags.
The plugin always removes the /iframe closing tag.
Thank you.
Hi,
I have also tried to save my HTML page as a PHP page, without any success. The only thing that is removed in the created page is the tag.
I have tried to add iframe to the supported tags when choosing the cleaning option for Word documents.
My pages are not Word documents. They are clean pages.
Thank you.
I’m sorry for my previous post. I’m talking about the tag.
Hello there and well met
I am at my wit’s end… I am a person who knows how to write out my HTML.
My web hosting has incorporated WordPress to be used. This is their take on the matter: ‘We do not provide training, so users should be somewhat familiar with CMS and cPanel.’
I am admittedly frustrated with using WordPress. I want to try to use it, BUT it is aggravating to me.
I have a series of HTML pages I want to try to bring into the website.
I have attempted to configure Import HTML Pages using SETTINGS – HTML Page Import Options – Beginning directory:
/public_html/wp-content/plugins/import-html-pages
This should be a full path from the server root, on the same server where WordPress is running now.
but it errors out with the following:
Warning: scandir(/public_html/wp-content/plugins/import-html-pages) [function.scandir]: failed to open dir: No such file or directory in /home/graceofc/public_html/wp-content/plugins/import-html-pages/html-import.php on line 554
Warning: scandir() [function.scandir]: (errno 13): Permission denied in /home/graceofc/public_html/wp-content/plugins/import-html-pages/html-import.php on line 554
Warning: Invalid argument supplied for foreach() in /home/graceofc/public_html/wp-content/plugins/import-html-pages/html-import.php on line 555
/wp-content/plugins/import-html-pages
I have asked the web hosting person if they could help, and they tried and could not aid me.
PLEASE, if anyone would be kind enough to help me with this plugin and also WordPress, I would gladly talk to them on the phone (US only) or through email. I am sorry, but I would not be able to pay someone.
Doug, I think you’re going to have a hard time using this if you’re not familiar with WordPress yet. It’s not really a beginner’s plugin.
That said, it looks to me as though you’ve entered the path to the plugin directory as your beginning directory. It should instead be the path to your HTML files. Judging by the paths in the error messages, it should be something like:
/home/graceofc/public_html/folder-where-my-html-files-are
Hi Stephanie,
I’m contemplating the possibility of using your plugin to import my Tumblr blog into WordPress. I’m aware there are already XML exporters, but they do not export media, only links. So all my stuff is still on Tumblr’s servers. I’d rather have all my media (pictures, etc.) stored on my self-hosted WordPress blog.
Now, Tumblr has a great backup utility which allows anyone to download a full HTML backup of their Tumblr blog locally (it creates a folder on your machine). It’s fast and it works like a charm. More importantly, it backs up everything: pictures and music are downloaded, along with drafts, tags, queued posts, etc.
How do I get started with this? Do I have to upload my HTML backup folder to my WordPress site?
I’m sure someday someone will come up with an easy export/import solution, but for now I’m looking for workarounds.
Thanks a lot,
P.
I don’t think this would work very well for Tumblr. For one thing, my importer doesn’t (yet) handle media uploads, so you’d get the HTML with paths to the old file locations.
OK, Stephanie. That’s useful information regarding what I intended to do. Thanks anyway for taking the time to answer me. Have a good summer.
P.
What a great find Stephanie:
I’ve got hundreds of pages to import into WP 3.0. Trouble is, nothing happens when I try to import. No messages, no errors, nothing shows as imported into the HTML Page Import window, and of course no pages are created.
I’ve double checked the “beginning directory”. As best as I can determine I’ve got it right. Perhaps not?
I’ve got the “.,..,” in the “skip directories…” input field.
My host is using PHP 5.2.11.
What else can I check to get your terrific plugin working for me?
Best,
Loren
UPDATE: Definitely the setup at the hosting company I use.
I moved the HTML files to a staged WP installation I have at GoDaddy, which I use for testing. I installed the Import HTML Pages plugin and it performed perfectly once I got the path correct. My mistake there was obvious based on the error message I received. I had to play a bit with the settings to get titles right, etc. I expected that, and this took a fraction of the time it takes to cut and paste content from an HTML file into WP.
From there, it was an easy task to export the content into an XML file, and import into the new site.
I don’t have time to “bird dog” what server setup causes this problem which would be interesting to know. I’m looking for solutions. Found this one, and it works great! Far from a disaster, nor does it create more problems than it solves.
Thanks for providing this plugin Stephanie!
I’m so glad you got everything working. Thanks for letting me know!
One glance at these comments, this plugin is a disaster, creating more problems than it solves. The few positive remarks are fawning on its creator in the hope she’ll help them make it work.
It’s a dog, a usability nightmare, and a failure.
Either redo it right, or drop it from the WP list.
Steven, I took your comments on the UI to heart, and I’ll work on that in the next version. Thank you for that. However, please do not hijack every comment thread to air your complaints. Many people have used the plugin successfully. A few, like Loren, have server configurations that prevent it from working properly.
Constructive criticism, like your first comment, is welcome. Any further pointless screeds will be deleted.
Steph,
Thanks for this script. Apparently Steven is a tool. This worked like a charm for us and saved a ton of time. I have put in a request that a donation be made to you. It takes a lot to put yourself out there with a plugin like this. I want to make it work with the Custom Permalinks script by Michael Tyson. That way you can import it and forget it, and Google is happy. Any tips you can offer before I begin would be appreciated. If this works I will submit it back to you.
Hey Steph,
Sounds like a great plugin. I’ve been pulling my hair out trying to get Dreamweaver content into WP.
Do you have more directions? I’m still lost with what’s in the plugin and what you have above.
I’m just trying to upload single pages.
thanks
Norm, it really just does not work with single files right now. I’m going to try to fix that as soon as I get a breather, but my travel schedule is crazy at the moment. Use it on a directory of files, and it should be fine.
Hi,
I have hundreds of static html pages which I’d like to import to WP.
Each file has data for the content, but there are some parts of the data that I’d like to go to several custom fields.
Is that possible with this plugin to import part of the data to custom fields?
If not, I’d be happy to know how I can use the core of this plugin to achieve such a capability.
p.s. – I’ve posted this question also in the WP forums –
http://wordpress.org/support/topic/425368
Many thanks,
Maor
I installed this plug in. I tried everything possible and nothing happens. I don’t even get any error messages to see what might be the mistake.
All it says is 0 pages transferred… Can someone please help me out here?
Thank You
Check my post above Adi, on June 23-24. I was having similar problems, and I detailed what worked for me.
Hi,
Any news on a version that will upload the images as well? I’m learning WP as I transfer an old site to WP, and this seems like something that would be interesting. I’d be happy to test betas, etc., if need be.
Thanks,
Tomek
For anyone who is having problems, check to make sure there are no brackets in your content and title tag fields.
Even though the instructions clearly say to remove them, I left them in the title field and saw both of the problems reported earlier: proliferating posts and xml errors.
Thanks, Jay! I’ll add some code in the next version to remove the brackets if they’re present.
Working great! Thanks for this important plugin, Stephanie.
I just thought I might be able to help a bit. I was also getting the warning messages but did a couple of things. I moved all the files into a folder and placed it in the same folder as the import plugin as someone mentioned above.
I still got the warning errors but realised where it says no such directory and then the path, the first part of this was the path that I wasn’t using, for example /home4/example/public_html/
I was using a path from /public_html/…… where I should have been using the path from /home4/example/public_html/
I hope this makes sense.
To maybe make it plainer, check the error message for the path you should be using.
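The tip above, reading the full server path out of the error message, can be sketched in a few lines of Python. This is only an illustration: `install_root_from_warning` is a hypothetical helper, the regular expression assumes the warning ends with “in <file> on line <number>”, and it assumes a file path without spaces and with forward slashes.

```python
import re

# Matches the trailing "in <file> on line <number>" part of a PHP warning.
WARNING_PATH = re.compile(r" in (?P<file>\S+) on line \d+\s*$")


def install_root_from_warning(message):
    """Return the directory that contains wp-content, as named in a PHP warning, or None."""
    match = WARNING_PATH.search(message)
    if not match:
        return None
    file_path = match.group("file")
    index = file_path.find("/wp-content/")
    if index == -1:
        return None
    return file_path[:index]


warning = (
    "Warning: scandir(/public_html/wp-content/plugins/import-html-pages) "
    "[function.scandir]: failed to open dir: No such file or directory in "
    "/home/graceofc/public_html/wp-content/plugins/import-html-pages/html-import.php "
    "on line 554"
)
print(install_root_from_warning(warning))
```

The returned directory is where WordPress itself lives; the beginning directory for the import is then the full path to wherever the HTML files sit, starting from the same server root.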
Hi Steph:
I am using “Beginning WordPress 3” (which is great!) as my guide to converting an old static HTML website to WordPress. I would like to use the “Import HTML pages plugin”, but it fails to find the directories with the HTML files. Both the HTML files and WordPress are on the same server at netfirms.com.
I have the beginning directory as /www/natasar.org (this is where the old home page is, with a number of subfolders). WordPress is installed in /www/natasar.org/wp/.
The first error message is:
Warning: scandir(/www/natasar.org) [function.scandir]: failed to open dir: No such file or directory in /mnt/w0337/d42/s41/b026f150/www/natasar.org/wp/wp-content/plugins/import-html-pages/html-import.php on line 554
At this point I’m stuck. Do you have any suggestions? Thanks in advance for any help.
The beginning directory needs to include the full path from the server root, so in this case it should probably be /mnt/w0337/d42/s41/b026f150/www/natasar.org.
(Glad you like the book!)
Hi Steph, do you have a cvs server set yet? I am going to try this plugin for my own purposes but make it a bit more user friendly when time permits and I would like to submit it back to you using a svn or cvs server setup.
Obviously, for beginners, an explanation of relative paths from local documents is needed as well. One of the issues I foresee with Word docs is that some file owners lock them down from exporting the images.
Hi Stephanie.
I am testing/sandboxing the plugin with a small sub-set of my large 90+ page / 1,000 image site. Am getting the pages to import ok (will need a fair amount of tweaking, but that’s ok) having cleaned them up drastically beforehand.
However, the big bugbear is not bringing in the images -
[QN1] I gather this is a feature not yet included, and not my own error somewhere along the line . . ?
[QN2] Assuming this is the case, would it be possible to set all imported pages to look for their images in one place (=folder_A), use WP to import all images into the Gallery (=folder_B), and then do a Search and Replace on the code to substitute folder_B for folder_A . . ? Would this work, would WP actually recognise and properly incorporate the images within the pages . . ?
Would much appreciate your thoughts/comments/suggestions/workarounds. Nick
Update. [I should add that I'm working offline]
I actually changed all the src/href’s throughout the site to static ones, of the form src="x:/xxx/xxx/wordpress/wp-content/images/xxx.jpg" before importing the pages. I also imported all the images, in WP Gallery, into that /images/ folder. When I imported the pages all the images are present and correct. So that’s good.
However . . None of my internal site links work. A link to pagexxx.html becomes broken because that imported page is now referenced/called ‘page=xxx’ or ‘pagename’.
Assistance would be appreciated. I must be doing something wrong, somewhere.
ok, I realise this is the function of the .htaccess file. Would 100 filename redirects be too many there? It would be good if the plugin could (say) create a page slug from the original page file name, then the Permalink could be set to use this and pre-existing links would still work . . (??)
I’m going to try changing all my page titles to page file names, then importing.
ok, changing html page titles to page file names, then importing, works.
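The slug idea suggested above could look roughly like the Python sketch below. It is a simplified assumption of how a file name might be turned into a slug, and it does not reproduce WordPress's own title sanitizing rules; `slug_from_filename` is a hypothetical name.

```python
import os
import re


def slug_from_filename(path):
    """Derive a lowercase, hyphen-separated slug from a file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")


print(slug_from_filename("pages/200mpeanutdash_sub1.html"))
```

With a permalink structure based on the page name, a slug derived this way stays close to the old file name, which keeps any remaining redirects simple.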
Just a note: if you have a file of zero size in among the files to be imported, the plugin will fail with an error indicating that neither fopen nor file_get_contents is available. Removing the zero-size file (or renaming it to something the plugin will ignore) cures the problem.
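Empty files like the one described above can be spotted before running an import. Here is a minimal Python sketch that lists only non-empty files with the wanted extensions; `importable_files` and its default extension list are assumptions for illustration, not plugin behavior.

```python
import os


def importable_files(directory, extensions=("html", "htm")):
    """Return the non-empty files in a directory that have one of the given extensions."""
    files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # subdirectories are not handled in this sketch
        extension = os.path.splitext(name)[1].lstrip(".").lower()
        if extension not in extensions:
            continue
        if os.path.getsize(path) == 0:
            continue  # zero-size file: nothing to import
        files.append(path)
    return files
```

Comparing this list with the full directory listing shows which files would need to be removed or renamed first.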
Enjoying your book. I’m trying to determine if WP is right for my current sites that are in Joomla and Dreamweaver and your book is giving me confidence that it is.
Quick question on the plugin…is it compatible with WordPress 3? Just asking as the installer says it hasn’t been tested with 3.0.1.
It is! I just haven’t gotten around to updating the readme.txt file on the plugin repository.
Hi Stephanie,
Thank you for writing this plugin. It has enabled me to migrate a large legacy site to WordPress in just a couple of days. I can certainly confirm that it works with WP V3.0.
All the best,
Dominic
There is a blog post at http://www.dbrenton.com/2010/08/migrating-a-static-legacy-site-to-wordpress/ that goes with this.
Hi Stephanie,
Your plugin is an absolute lifesaver. I was prepared to have to fork out large sums of money to convert my legacy site (1000+ articles) to WordPress. I will probably still have to do lots of work to get rid of unnecessary code/formatting, but 70% of the job is already done by your plugin.
It definitely works with WP 3.0.1 and with no issues thus far.
Derek
There is a plugin called Search and Replace that can help with the garbage code. We used it quite a bit. Find it at
http://wordpress.org/extend/plugins/search-and-replace/
Yep, that’s the one I use, too.
I can’t see the download button on http://wordpress.org/extend/plugins/import-html-pages/
Can you give me a link?
Woodwolf, the WordPress site administrators have been working on the plugin repository for the last few weeks. The missing download buttons were a temporary glitch. You should be able to download the plugin now.