There are many applications for macOS that allow scanning of images or text. Most of them, however, are complex, slow or not really suited for scanning documents or letters. PDFScanner runs on all macOS versions starting with Catalina up to Monterey, including native support for M1 Macs. Advanced users can use the included Automator action to create custom OCR workflows or folder actions. The OCR engine has been updated for improved performance and accuracy. It is only available on the Mac App Store.

Python program for OCR with option to select custom Region of Interest without specifying coordinates - OCR-opencv-tesseract/OCR automator.

Answer (1 of 2): I guess you’re asking about post-processing of digital documents with graphic design software like PS, AI, etc. Actually, I’d recommend a smart book scanner to scan and process the document to remove the background.

I’ve been working on a solution for this over the past week or so.

At first, I developed a perl script outside of DTP that would search freshly scanned and OCR’d documents for date text - there are a few pre-built, intelligent PDF text extraction and date parser modules for perl on CPAN. After more trial-and-error than I had hoped, this script set the file’s timestamp (modification date) to the first date found within the document (if any).

After importing the document into DTP, I would then use one of the bundled scripts, “rename to creation date” (?), which changed the document name to reflect the “creation date” of the object in DTP, which itself originated from the file’s timestamp. This finally produced date-based record names within DTP.

There were a few issues with the built-in perl PDF text extraction modules, which meant that some downloaded (rather than scan-generated) PDF docs couldn’t be parsed, so I changed to use the “mdimport -d2” system command, which was messier but more robust.

After all of that, I then started work on managing the workflow via a DTP script. I wanted something that was repeatable/re-runnable so that improvements in the date searching algorithm could be applied to documents already imported into DTP. This was not possible with my first solution, as changing a file’s timestamp within the DTP DB structure doesn’t automatically get reflected in the object’s metadata - although there is another bundled script for that!

Anyway, I now have a DTP applescript that:
- takes the list of currently selected docs within DTP and, for each doc, passes the PDF text and location (path) to my new perl script.
- The perl script transforms and searches the text for appropriate date strings.
- The perl script sets the file’s timestamp to the date (if found).
- The perl script returns the date “name” back to the applescript. This is “YYYY-MM-DD”, or “no date” if no date was found.
- The applescript reads the file’s timestamp and sets the internal DTP record “creation date”.
- The document name is changed to the “name” returned by the perl script.
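The date-search and naming steps in this post are described only in outline, so here is a minimal Python sketch of the idea. It is not the author's perl script: the function names, the two date patterns it recognises, and the choice of noon as the time of day are all assumptions made for illustration, and a real document set would likely need more patterns.

```python
import os
import re
from datetime import date, datetime
from typing import Iterator, Optional, Tuple

# Assumed patterns (hypothetical, not from the original script):
# ISO style, e.g. 2021-03-05
ISO_DATE = re.compile(r"\b(\d{4})-(\d{1,2})-(\d{1,2})\b")
# Day, month name, year, e.g. 5 March 2021 or 5 Mar. 2021
TEXT_DATE = re.compile(r"\b(\d{1,2})\s+([A-Za-z]{3,9})\.?,?\s+(\d{4})\b")

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def _month_number(word: str) -> Optional[int]:
    """Map a full or abbreviated month name (3+ letters) to 1..12."""
    w = word.lower()
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(w):
            return number
    return None


def _candidates(text: str) -> Iterator[Tuple[int, date]]:
    """Yield (position, date) for every valid date string in the text."""
    for m in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        try:
            yield m.start(), date(year, month, day)
        except ValueError:
            continue  # looks like a date but is not one, e.g. 2021-13-40
    for m in TEXT_DATE.finditer(text):
        month = _month_number(m.group(2))
        if month is None:
            continue
        try:
            yield m.start(), date(int(m.group(3)), month, int(m.group(1)))
        except ValueError:
            continue


def find_first_date(text: str) -> Optional[date]:
    """Return the valid date that appears earliest in the text, if any."""
    found = sorted(_candidates(text), key=lambda c: c[0])
    return found[0][1] if found else None


def record_name(text: str) -> str:
    """Return the name as "YYYY-MM-DD", or "no date" if none was found."""
    d = find_first_date(text)
    return d.isoformat() if d else "no date"


def stamp_file(path: str, text: str) -> str:
    """Set the file's timestamps to the first date found; return the name."""
    d = find_first_date(text)
    if d is not None:
        # Noon local time is an arbitrary choice for the time of day.
        ts = datetime(d.year, d.month, d.day, 12, 0).timestamp()
        os.utime(path, (ts, ts))
    return record_name(text)
```

In this sketch, `stamp_file(path, text)` would be called once per document with the extracted PDF text, and the returned string used as the new record name. Keeping the search in `find_first_date` separate from the file handling means the matching rules can be improved and the whole thing re-run over documents that were already processed.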