
April 13, 2006

Udell, data, PDF

Udell, data, PDF: I'm not sure I understand the article, but I think Jon Udell may have been handed some arbitrary PDF files and wanted to extract some, but not all, of the text in them. It can be hard to impose order on an unstructured document. It's straightforward to incorporate external XML data into PDF files or to save an entire PDF as XML, and if you're creating a structured PDF you can export any user-entered form data as concise XML. From what I read in Jon's article, though, I'm not sure whether any of these options apply to the document he was given. Here's an entry point to the Acrobat XML information.
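To illustrate the "form data out into concise XML" case, here is a minimal sketch that reads form values from an XFDF file (Adobe's XML Forms Data Format) into a Python dictionary. It assumes the form data was exported as XFDF, which uses `<field name="…"><value>…</value></field>` elements in the `http://ns.adobe.com/xfdf/` namespace; the function name `xfdf_fields` and the dotted naming of nested fields are hypothetical choices for this example, not part of any Acrobat API.

```python
import xml.etree.ElementTree as ET

# Assumption: the exported form data is XFDF, whose elements live in this namespace.
XFDF_NS = "{http://ns.adobe.com/xfdf/}"


def xfdf_fields(xfdf_text: str) -> dict[str, str]:
    """Hypothetical helper: map each XFDF field name to its value.

    Nested fields are joined with dots (e.g. "address.city"), mirroring
    how hierarchical form field names are usually written.
    """
    root = ET.fromstring(xfdf_text)
    result: dict[str, str] = {}

    def walk(elem: ET.Element, prefix: str) -> None:
        for field in elem.findall(f"{XFDF_NS}field"):
            name = field.get("name", "")
            full_name = f"{prefix}.{name}" if prefix else name
            value = field.find(f"{XFDF_NS}value")
            if value is not None:
                result[full_name] = value.text or ""
            walk(field, full_name)  # descend into any child fields

    fields = root.find(f"{XFDF_NS}fields")
    if fields is not None:
        walk(fields, "")
    return result
```

Once the values are in a dictionary, picking out just the fields you care about is a plain lookup, which is exactly what an unstructured PDF cannot offer.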

Posted by John Dowdell at April 13, 2006 05:08 PM


Comments

Every Acrobat Reader has a convert-to-text function, or a supplemental program can do it. Once the data is machine-readable, 12 lines of Rexx code will reformat the text for Excel. I use a freeware Rexx program for DOS.

http://ftp.gwdg.de/pub/languages/rexx/brexx/html/rx.html

Alternatively, the text can be brought into Word and massaged into a table that Excel can read.
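The text-to-Excel step described here can be sketched in Python rather than Rexx. This is a minimal sketch, not the commenter's script: it assumes the converted text lays columns out separated by two or more spaces (a common result of PDF text export, but not guaranteed), and the name `text_to_csv` is hypothetical. It writes CSV, which Excel opens directly.

```python
import csv
import io
import re

# Assumption: columns in the extracted text are separated by runs of 2+ whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}")


def text_to_csv(text: str) -> str:
    """Split each non-blank line into columns and return the rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from page breaks
        writer.writerow(COLUMN_GAP.split(line.strip()))
    return out.getvalue()
```

Usage: write the result to a file opened with `newline=""` and give it a `.csv` extension; the `csv` module quotes any cell that contains a comma, so values like `Gadget, large` stay in one column.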

As you can see from the article, he is 35% ego and 0% practical, simple work.

Posted by: Bill at May 8, 2006 02:30 PM