Hi HN,
I've been racking my brain over this and haven't found a reliable way to programmatically convert documents (doc, docx, pdf, etc.) to HTML. The only option seems to be OpenOffice running as a server, but it keeps crashing (at least once a day). I'd like something that can process thousands of docs per day without crashing. Has anyone here faced this problem, or does anyone know of a solution?
[PS: In case you're wondering why, we run a web app for recruiting (recruiterbox.com), which requires converting resumes to HTML.]