I'm a big fan of Safari's webarchive format, and many times I find it a better choice than converting to PDF. I use webarchive as a way to read documents, but also to archive them.

I know that QuickLook, Safari and (in a limited way) TextEdit can read this binary plist file. I also know that Spotlight, mdfind and maybe other tools can search inside this format. I do not like having to import the webarchive into Safari just to extract or copy text, so I was thinking about using Apple's textutil command to convert or cat to txt format and then do string matching.

I also found out that doc, docx and wordml give very good output in TextEdit. That is very interesting if I need to edit and later… for printing. These formats are closer to the rtf format. If I choose to do it with textutil, everything is done in the background, and that is great.

Here is a quick AppleScript:

    set thePath to POSIX path of (path to desktop as alias) & "myArchive.webarchive"
    set out to do shell script "textutil -cat txt " & quoted form of thePath & " -stdout | pbcopy"

The result in Script Editor is not the same as when I run it directly on the command line… I do understand I have to clean up the code somehow… hmmm

What would be the best approach to search in a webarchive, find matches and extract text from it?

Mark, textutil has a cat function, and if there are multiple files it will concatenate them. That could be a good idea before doing any search. I realize textutil is a very powerful tool…

The thing that cost me hours was the text output from textutil. It looked fine in Terminal, but it was not correct when I used pbcopy to pipe the text back to Script Editor. The thing was… the Terminal shell uses UTF-8 by default, but the pbcopy command does not. If I include the line "export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100" before the textutil command, the text is the same as in Terminal. So what does that line do… it makes pbcopy use UTF-8 encoding. This may help anyone who has the same problem when using pbcopy… or who uses pbcopy to pipe the output of a bash script to AppleScript.

Update of my script; now everything looks okay, in Script Editor at least:

    set thePath to POSIX path of (path to desktop as alias) & "myArchive.webarchive"
    -- set the encoding first, then convert and pipe the text to the clipboard
    set o to do shell script "export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100; " & "textutil -cat txt " & quoted form of thePath & " -stdout | pbcopy"

    on findAndReplaceInText(theText, theSearchString, theReplacementString)
        set AppleScript's text item delimiters to theSearchString
        set theTextItems to every text item of theText
        set AppleScript's text item delimiters to theReplacementString
        set theText to theTextItems as string
        set AppleScript's text item delimiters to ""
        return theText
    end findAndReplaceInText

Some textutil examples:

    textutil -cat txt "file1.webarchive" "file2.webarchive" -stdout

The 2 files will be concatenated into 1, and a text version of the webarchives will be printed to standard output (stdout).

    textutil -cat txt "file1.webarchive" "file2.webarchive"

The result no longer goes to standard output (stdout); this time it is written to a file named 'out.txt'.

    textutil -cat docx "file1.webarchive" "file2.webarchive"

The same as the previous command, but the output file uses the docx extension instead.
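Coming back to the question of how to search inside a webarchive, one possible approach is to let textutil produce plain text and pipe it straight into grep. The following is only a minimal sketch under that assumption, not a finished tool: search_webarchive is a hypothetical helper name (it is not part of textutil), and it uses only the -cat, txt and -stdout options already shown in this post.

```shell
#!/bin/sh
# Sketch: search the plain-text content of one or more .webarchive files.
# search_webarchive is a hypothetical helper name, not a textutil feature.
# Usage: search_webarchive PATTERN FILE [FILE...]
search_webarchive() {
    pattern=$1
    shift
    # textutil -cat txt concatenates all input files and converts them to
    # plain text; -stdout prints the result instead of writing a file.
    # grep options: -i ignore case, -n show line numbers,
    # -F treat the pattern as a fixed string (no regular expression).
    textutil -cat txt "$@" -stdout | grep -i -n -F -- "$pattern"
}
```

It could be called as search_webarchive "AppleScript" ~/Desktop/myArchive.webarchive, and each matching line is printed with its line number in the converted text. Because grep runs on the converted text, the line numbers refer to textutil's output, not to anything inside the original archive.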