pdf.kpaste.net

Difference between
modified post 98b by Anonymous on Wed 11th Jul 2018 03:18
original post e410ed8 by Anonymous on Mon 4th Jun 2018 01:33

    
pdfseparate ../source.pdf page%04d.pdf
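The %04d in the pdfseparate pattern zero-pads the page number. This matters because shell globs such as *.txt.pdf expand in sorted order, so pdfunite receives the pages in reading order only when the numbers are padded. The illustration below uses only printf and sort. LC_ALL=C fixes the collation, so the output does not depend on the locale.

```shell
#!/bin/sh
# Unpadded names: lexical order puts page 10 before page 2.
printf 'page%d.pdf\n' 1 2 10 | LC_ALL=C sort
# page1.pdf
# page10.pdf
# page2.pdf

# Padded names (%04d, as in the pdfseparate pattern): lexical order = page order.
printf 'page%04d.pdf\n' 1 2 10 | LC_ALL=C sort
# page0001.pdf
# page0002.pdf
# page0010.pdf
```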
ls *.pdf | awk '{printf("convert -quality 100 -density 200 %s %s.tif\n", $0, $0)}' | sh
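Here is a sketch of a loop-based variant of the rasterizing step, built on a few assumptions. The names run and rasterize_pages are illustrative. DRY_RUN is a hypothetical switch that mirrors leaving "| sh" off the awk line, and it defaults to printing the commands. The glob page[0-9][0-9][0-9][0-9].pdf matches only the files pdfseparate wrote, so OCR outputs from an earlier run, such as page0001.pdf.tif.txt.pdf, are not picked up again. Quoting "$f" also keeps file names with spaces intact, which the ls | awk | sh form does not.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Rasterize every page written by pdfseparate, with the same convert
# settings as the awk one-liner (density is set before the PDF is read).
rasterize_pages() {
  for f in page[0-9][0-9][0-9][0-9].pdf; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    run convert -quality 100 -density 200 "$f" "$f.tif"
  done
}

rasterize_pages
```

Once the printed commands look right, run them with DRY_RUN=0 rasterize_pages.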
ls *.tif | awk '{printf("tesseract -l eng+ita %s %s.txt pdf\n", $0, $0)}' | sh
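The generated file names stack extensions. convert writes <page>.tif. tesseract then takes an output base (here <page>.tif.txt) and, with the pdf config, appends .pdf to it. That is why the merge step selects *.txt.pdf. The tiny sketch below traces the names for one page. trace_names is an illustrative helper and is not part of any of these tools.

```shell
#!/bin/sh
# Print the names one page passes through in the steps above.
trace_names() {
  page=$1                                       # written by pdfseparate
  tif="$page.tif"                               # written by convert
  printf '%s\n' "$page" "$tif" "$tif.txt.pdf"   # tesseract: output base + .pdf
}

trace_names page0001.pdf
# page0001.pdf
# page0001.pdf.tif
# page0001.pdf.tif.txt.pdf
```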
pdfunite *.txt.pdf out.pdf
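pdfunite merges whatever *.txt.pdf matches, so a page whose OCR step failed silently drops out of out.pdf. The sketch below is a pre-flight check for that case. It assumes the page####.pdf naming produced by the steps above, and missing_ocr is a hypothetical helper.

```shell
#!/bin/sh
# List page PDFs that have no matching OCR output (<page>.tif.txt.pdf).
missing_ocr() {
  for page in page[0-9][0-9][0-9][0-9].pdf; do
    [ -e "$page" ] || continue                  # no pages at all
    [ -e "$page.tif.txt.pdf" ] || printf '%s\n' "$page"
  done
}

missing=$(missing_ocr)
if [ -n "$missing" ]; then
  printf 'no OCR output for:\n%s\n' "$missing" >&2
fi
```

One way to use it is to gate the merge on the check: [ -z "$(missing_ocr)" ] && pdfunite *.txt.pdf out.pdf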
pdftotext out.pdf out.txt
tr "'.," '   ' < out.txt | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '^$' | sort | uniq -c | sort -rn > words_frequency.txt
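The tr chain can also be written as a single awk pass, which is easier to extend. This is a swapped-in alternative, not the command above. word_freq is a hypothetical helper name, and every run of non-letters (digits included) counts as a separator. It assumes an awk with POSIX character classes, such as gawk. Whether accented Italian letters count as letters depends on the awk and the locale.

```shell
#!/bin/sh
# Lower-case the text, turn every run of non-letters into a space,
# count each remaining word, then sort by descending count.
word_freq() {
  awk '{
    $0 = tolower($0)
    gsub(/[^[:alpha:]]+/, " ")
    for (i = 1; i <= NF; i++) count[$i]++
  }
  END { for (w in count) print count[w], w }' "$@" | sort -rn
}
```

Usage: word_freq out.txt > words_frequency.txt. Each output line is the count, a space, and then the word.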
convert -density 50 'file.pdf[0]' page.jpg  # render page 0 (the first page) as JPEG; 'file.pdf[0,10]' would select pages 0 and 10
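Quoting 'file.pdf[0]' matters because [0] is also a shell glob bracket expression. If a file such as file.pdf0 exists, the unquoted word expands to it and ImageMagick never sees the page selector. When nothing matches, zsh by default aborts with "no matches found". The self-contained demonstration below runs in a scratch directory, and glob_demo is just an illustrative name.

```shell
#!/bin/sh
glob_demo() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  touch file.pdf0               # a file the bracket expression happens to match
  set -- file.pdf[0]
  printf 'unquoted: %s\n' "$1"  # unquoted: file.pdf0
  set -- 'file.pdf[0]'
  printf 'quoted:   %s\n' "$1"  # quoted:   file.pdf[0]
  cd / && rm -rf "$dir"         # clean up the scratch directory
)

glob_demo
```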
pdfunite *.pdf out.pdf
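On a second run, *.pdf also matches the previous out.pdf, so that file would be merged back in as an input. The minimal sketch below skips the output file. merge_pdfs is a hypothetical helper. PDFUNITE is an assumed override that is handy for swapping in echo to preview the command.

```shell
#!/bin/sh
# Merge every *.pdf in glob order into $1, leaving $1 itself out of the inputs.
merge_pdfs() {
  out=$1
  set --                               # reuse "$@" as the input list
  for f in *.pdf; do
    [ -e "$f" ] || continue            # no PDFs at all
    [ "$f" = "$out" ] && continue      # skip the previous output
    set -- "$@" "$f"
  done
  [ "$#" -gt 0 ] || return 1
  ${PDFUNITE:-pdfunite} "$@" "$out"
}
```

Usage: run merge_pdfs out.pdf, or run PDFUNITE=echo merge_pdfs out.pdf to see the argument list first.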
ls *.jpg | awk '{printf("convert %s -resample 300 -quality 50 %s.pdf\n", $0, $0)}' | cat  # "| cat" only prints the commands; replace it with "| sh" to run them
