Markup is only indirectly related to hierarchical structure. “Markup” means that there is text that is being “marked up” with additional attributes (styling, structure information, metadata, …). This is how HTML and XML work, and also languages like TeX, Troff, and Markdown. For example, in the text “this is some text”, you can mark up the word “some” as being emphasized, as in “this is <em>some</em> text”.
The general principle is that the base content is plain text, which is augmented with markup information, which may or may not have hierarchical aspects. You can simply strip away the markup again and recover just the text. That’s not at all how PDF works, however.
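To make the "strip away the markup" point concrete, here is a minimal sketch using Python's standard `html.parser`. The `TextOnly` class and `strip_markup` helper are names made up for this illustration. It only covers HTML-style tags; it is not a general tool for TeX, Troff, or Markdown.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are dropped.
        self.parts.append(data)


def strip_markup(marked_up: str) -> str:
    """Return the base text of an HTML-style marked-up string."""
    parser = TextOnly()
    parser.feed(marked_up)
    parser.close()
    return "".join(parser.parts)


print(strip_markup("this is <em>some</em> text"))  # prints: this is some text
```

Nothing comparable works on a PDF file, since there the text is one kind of object among many rather than a base layer that the rest annotates.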
You cite a comparison to JSON and YAML. Those are not markup languages (despite what the name YAML originally stood for, see [0]). (HTML also isn’t DOM.)
I was quoting the article there about JSON/YAML, not making that claim myself.
Did you take a look at the article I linked? It shows visual examples of hand-coded PDFs that demonstrate the structural similarities I am talking about.
Thanks for the clarification on terminology. I could have been clearer and more precise. I referred to "DOM-like structures" as an analogy for the hierarchical nature of PDF objects, not to claim HTML is DOM.
My core point wasn't about the technical definition of markup languages, but about the structural similarity between PDF's object model and hierarchical formats.
When coding a PDF document by hand, you work with nested structures using delimiters like "<<" and ">>" that create hierarchical relationships between objects, which has practical parallels to working with nested elements in other formats.
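As a rough illustration of that nesting, here is a toy sketch that reads a hand-written PDF-style dictionary into nested Python dicts and lists. The sample page dictionary and the `parse` helper are made up for this example, and the tokenizer is deliberately simplified: it ignores strings, streams, hex strings, and comments, so it is nowhere near a real PDF parser.

```python
import re

# Simplified tokens: "<<", ">>", "[", "]", names like /Type, and bare words/numbers.
TOKEN = re.compile(r"<<|>>|\[|\]|/[^\s<>\[\]/()]+|[^\s<>\[\]/()]+")


def parse(tokens, pos=0):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "<<":  # dictionary: alternating /Key value pairs until ">>"
        result, pos = {}, pos + 1
        while tokens[pos] != ">>":
            key = tokens[pos]
            value, pos = parse(tokens, pos + 1)
            result[key] = value
        return result, pos + 1
    if tok == "[":  # array: values until "]"
        items, pos = [], pos + 1
        while tokens[pos] != "]":
            value, pos = parse(tokens, pos)
            items.append(value)
        return items, pos + 1
    # Indirect reference such as "4 0 R": kept as a single string.
    if (tok.isdigit() and pos + 2 < len(tokens)
            and tokens[pos + 1].isdigit() and tokens[pos + 2] == "R"):
        return f"{tok} {tokens[pos + 1]} R", pos + 3
    return tok, pos + 1  # name or number, kept as a string


# Hypothetical hand-written page dictionary, for illustration only.
sample = """
<< /Type /Page
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 4 0 R >> >>
>>
"""

page, _ = parse(TOKEN.findall(sample))
print(page["/Resources"]["/Font"]["/F1"])  # prints: 4 0 R
```

The result is a tree of dictionaries, arrays, and references between objects, which is the structural parallel being described here.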
The forest vs. trees metaphor was meant to acknowledge that PDF isn't primarily a markup format (the trees), while pointing out that it does share structural characteristics with hierarchical formats (the forest). That observation comes from my hands-on experience with manual PDF creation.
[0] https://stackoverflow.com/a/18928199