Difference between
modified post 9e3742c by Anonymous on Sat 2nd Jun 2018 19:12
original post 772c9 by Anonymous on Sat 2nd Jun 2018 19:01
1 | 1 | pdfseparate ../source.pdf page%04d.pdf | |
2 | 2 | ls *.pdf | awk '1==1 {printf("convert -quality 100 -density 200 %s %s.tif\n",$0,$0)}' | sh | |
3 | 3 | ls *.tif | gawk '1==1 {printf("tesseract -l eng+ita %s %s.txt pdf \n",$0,$0);}' | sh | |
4 | 4 | pdfunite *.txt.pdf out.pdf | |
5 | 5 | pdftotext out.pdf out.txt | |
6 | + | cat out.txt | tr "'" ' ' | tr '.' ' ' | tr ',' ' ' | tr ' ' '\n' | tr A-Z a-z | sort | uniq -c | sort -rn > words_frequency.txt |
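
The word-count step added in line 6 could also be wrapped in a small function. This is only a sketch, not the poster's method: the name word_freq is made up here, and it treats every ASCII punctuation or whitespace character as a word separator (the post only handles the apostrophe, full stop and comma). Like the original, it folds only A-Z to lower case, so accented capitals in the Italian text are left alone.

```shell
#!/bin/sh
# word_freq (hypothetical helper, not in the original post):
# read text on stdin, print "count word" lines, most frequent first.
word_freq() {
    tr '[:upper:]' '[:lower:]' |              # fold case, as tr A-Z a-z does
    tr -s '[:punct:][:space:]' '[\n*]' |      # each run of separators becomes one newline
    grep -v '^$' |                            # drop the empty line a leading separator leaves
    sort | uniq -c | sort -rn                 # count identical words, highest count first
}
```

Assuming out.txt from step 5 exists, it would be used as: word_freq < out.txt > words_frequency.txt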