
That PDF actually has some weird corner cases.

First, it's all the same font size everywhere, so size can't be used to tell headings from body text. It's also got bolded "headings" where the spaces between the words are not bolded. I had to fix my own handling to get it to process well.
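To illustrate the second corner case, here is a minimal sketch of one way to handle it. It is not the search engine's actual code: the `Run` type and the `merge_bold_runs` and `is_heading` helpers are hypothetical, and it assumes the PDF extractor yields each line as a list of text runs with a bold flag. The idea is to treat a whitespace-only run sandwiched between two bold runs as bold, then call the line a heading if it collapses to a single bold run.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical text run from a PDF line: some text plus its weight."""
    text: str
    bold: bool


def merge_bold_runs(runs):
    """Coalesce adjacent runs of the same weight, treating a whitespace-only
    non-bold run that sits directly between two bold runs as bold."""
    merged = []
    for i, run in enumerate(runs):
        bold = run.bold
        if not bold and run.text.strip() == "":
            prev_bold = i > 0 and runs[i - 1].bold
            next_bold = i + 1 < len(runs) and runs[i + 1].bold
            bold = prev_bold and next_bold
        if merged and merged[-1].bold == bold:
            # Same weight as the previous run: extend it.
            merged[-1] = Run(merged[-1].text + run.text, bold)
        else:
            merged.append(Run(run.text, bold))
    return merged


def is_heading(runs):
    """Assumed heuristic: a line is a heading if it is one bold run after merging."""
    merged = merge_bold_runs(runs)
    return len(merged) == 1 and merged[0].bold


heading_line = [Run("Section", True), Run(" ", False), Run("1", True)]
body_line = [Run("Some ", False), Run("bold", True), Run(" text", False)]
```

With this, `heading_line` is recognized as a heading even though its space is not bold, while `body_line` is not, since it still contains non-bold words after merging.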

This is the search engine's view of the document as of those fixes: https://www.marginalia.nu/junk/congress.html

Still far from perfect...



> That PDF actually has some weird corner cases.

Heh, in my experience with PDFs, that's a tautology.



