Hacker News
Show HN: Related Website Finder Experiment Thingy (marginalia.nu)
5 points by marginalia_nu on Oct 9, 2022 | 5 comments
Been messing with cosine similarity and decided to try calculating nearest neighbors over the entire link graph for the marginalia search engine.

Turns out that you can just bruteforce that in a day or two. And the results are pretty good.

One drawback is that if you're looking at an older website, a lot of the links are dead. The deduplication isn't great either.
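
For readers wondering what brute-force nearest neighbors by cosine similarity might look like, here is a minimal Python sketch. It assumes each domain is represented by the set of domains linking to it, which is a guess and not necessarily how Marginalia builds its vectors. The toy graph and the names `INCOMING`, `cosine` and `related` are made up for illustration.

```python
from math import sqrt

# Hypothetical toy link graph: domain -> set of domains that link to it.
# Using incoming links as the feature vector is an assumption.
INCOMING = {
    "a.example": {"x.example", "y.example", "z.example"},
    "b.example": {"x.example", "y.example"},
    "c.example": {"z.example", "w.example"},
    "d.example": {"q.example"},
}


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def related(domain, incoming, top_n=3):
    """Brute force: score every other domain, keep the best non-zero ones."""
    target = incoming[domain]
    scored = [
        (cosine(target, links), other)
        for other, links in incoming.items()
        if other != domain
    ]
    scored = [(score, other) for score, other in scored if score > 0]
    # Highest similarity first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 3)) for score, other in scored[:top_n]]


print(related("a.example", INCOMING))
```

Doing this for every domain is quadratic in the number of domains, which fits the "a day or two" of brute force mentioned above.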



This is a great idea. Marginalia Search is already very useful for finding interesting niche/older articles and posts on many subjects (I have used it regularly to find interesting things to write about on my website). Being able to find related sites to interesting sites that pop up on Marginalia will be a great feature. Thank you for all the hard work you do on the project.


Yeah, I've got a few ideas on how to integrate this further. Just listing them is cool, but it would be exceptionally neat to be able to use this to create ad-hoc filters: say you submit the query

  plato near:classics.mit.edu
and then get results from this list:

https://explore2.marginalia.nu/search?domain=classics.mit.ed...

... possibly ranked with consideration to their relatedness.
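
A rough sketch of how such a filter could work, in Python. The `near:` operator is only the idea floated above, not an existing feature as far as this thread says, and the related-domain scores and search results here are invented placeholders.

```python
def parse_query(query):
    """Split a query into plain search terms and an optional near: domain."""
    terms, near = [], None
    for token in query.split():
        if token.startswith("near:") and len(token) > len("near:"):
            near = token[len("near:"):]
        else:
            terms.append(token)
    return terms, near


def filter_near(results, relatedness):
    """Keep results from related domains, most related first."""
    kept = [r for r in results if r["domain"] in relatedness]
    kept.sort(key=lambda r: relatedness[r["domain"]], reverse=True)
    return kept


# Hypothetical relatedness scores for the domain given in near:.
RELATED = {"perseus.example": 0.9, "stoics.example": 0.6}

# Hypothetical unfiltered results for the term "plato".
RESULTS = [
    {"domain": "stoics.example", "title": "Plato and the Stoics"},
    {"domain": "shop.example", "title": "Plato action figure"},
    {"domain": "perseus.example", "title": "Plato, Republic"},
]

terms, near = parse_query("plato near:classics.mit.edu")
print(terms, near)
print([r["domain"] for r in filter_near(RESULTS, RELATED)])
```

A real ranking would presumably blend relatedness with the normal relevance score instead of sorting on relatedness alone.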


That's super fun, have you tried comparing it to methods like normalized cuts on the adjacency graph?


Haven't tried all too many methods. A lot of the standard approaches go out the window because the sheer size of the graph means it can't be loaded into most tools. Got 12 million nodes and 40 million edges. Tried to load it into numpy, but that was no good at all. Gonna have to code up some dimensionality reduction algorithm if I want standard tools like that.

Although you can get away with some pretty surprising stuff due to the extreme sparseness of the matrix.
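
To put numbers on the sparseness, here is a back-of-the-envelope calculation in Python using the node and edge counts above. It assumes a dense adjacency matrix at one byte per cell versus a plain edge list with 4-byte node ids. Those storage choices are illustrative assumptions, not a description of what was actually tried.

```python
NODES = 12_000_000  # from the comment above
EDGES = 40_000_000  # from the comment above


def dense_bytes(nodes, bytes_per_cell=1):
    """Memory for a full nodes x nodes adjacency matrix."""
    return nodes * nodes * bytes_per_cell


def edge_list_bytes(edges, bytes_per_id=4):
    """Memory for storing each edge as a (source, target) pair of ids."""
    return edges * 2 * bytes_per_id


density = EDGES / (NODES * NODES)

print(f"dense matrix: {dense_bytes(NODES) / 1e12:.0f} TB")
print(f"edge list:    {edge_list_bytes(EDGES) / 1e6:.0f} MB")
print(f"density:      {density:.2e}")
print(f"avg edges per node: {EDGES / NODES:.2f}")
```

With only a handful of edges per node on average, anything that touches just the non-zero entries stays cheap, while anything that materializes the full matrix is hopeless.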


Nice looking project. Is there a GitHub repo that I can take a look at?



