
TeX's default language (US English) uses hyphenation patterns derived from an American dictionary (Liang's thesis mentions Webster’s Third New International Dictionary; probably that's the one). The American hyphenation tradition tends, more often than the British one, to hyphenate at phonological boundaries, though different American dictionaries don't always agree with each other. (This has some interesting corollaries: the word “record” is hyphenated “re-cord” as a verb and “rec-ord” as a noun.)

British English tradition tends (more often than American) to hyphenate at morphological/etymological boundaries. For instance, if you run xetex with `\uselanguage{ukenglish}` followed by `\showhyphens{typographer}` then you get:

    ty-po-grapher
meaning that in the word “typographer”, a hyphen is allowed in those two places, instead of

    ty-pog-ra-pher
in the US English version. (Both the US English and UK English hyphenation patterns set \lefthyphenmin=2 and \righthyphenmin=3, i.e. enforce at least two letters before a hyphenation point and at least three after.)
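To make the effect of those two minimums concrete, here is a minimal Python sketch (not TeX's actual implementation). The function name `apply_hyphen_mins` is made up for illustration, and the inputs "a-bout" and "quick-ly" are hypothetical candidate break points, not output from the real patterns. It takes a word with every candidate break marked and drops the ones that would leave too few letters on either side:

```python
def apply_hyphen_mins(hyphenated, left_min=2, right_min=3):
    """Drop candidate break points that violate the minimum-letter limits.

    `hyphenated` is a word with '-' at every candidate break point.
    `left_min` / `right_min` play the role of \\lefthyphenmin / \\righthyphenmin.
    """
    parts = hyphenated.split("-")
    word_len = sum(len(p) for p in parts)
    out = parts[0]
    pos = len(parts[0])  # number of letters before the current candidate break
    for part in parts[1:]:
        # A break here leaves `pos` letters before it and `word_len - pos` after it.
        if pos >= left_min and word_len - pos >= right_min:
            out += "-"
        out += part
        pos += len(part)
    return out


print(apply_hyphen_mins("ty-pog-ra-pher"))  # every break satisfies both limits
print(apply_hyphen_mins("a-bout"))          # only one letter before the break
print(apply_hyphen_mins("quick-ly"))        # only two letters after the break
```

With the default limits of 2 and 3, the first word keeps all of its breaks, while the hypothetical breaks in the other two are suppressed. That is why neither difference between “ty-po-grapher” and “ty-pog-ra-pher” comes from these settings: both come from the patterns themselves.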

So the hyphenation patterns have done the “right” thing in the US case, in the sense that they produce the same hyphenation as US dictionaries give: https://www.merriam-webster.com/dictionary/typographer, https://www.ahdictionary.com/word/search.html?q=typographer (same entry as https://www.thefreedictionary.com/typographer), https://www.wordsmyth.net/?ent=typographer and https://www.infoplease.com/dictionary/typographer. (I don't have access to any UK English dictionary that shows hyphenation, as far as I can tell, but I imagine the patterns do the right thing for the UK case too.)

You can see more at https://www.tug.org/tex-hyphen/ and in particular Liang's thesis https://www.tug.org/docs/liang/ is very readable (both the English and CS/data-structure parts).


