YaCy takes on Google with open source search engine (theregister.co.uk)
71 points by a_w on Nov 29, 2011 | 17 comments


I'm liking this P2P trend, which seems to be spreading everywhere. I don't think we're ready to have everything P2P yet, but it's good to see the trend growing. At least now we know that if or when Google is forced to censor more results than we'd like, there will be a P2P alternative waiting for us.


Let's hope your views are like those saying the same before open source came into its own. We live in exciting times...


IMO what's missing in the search space is a web search engine with an API, especially one with access to the raw crawled content. Amazon used to do this with the Alexa web crawl data (see http://www.readwriteweb.com/archives/alexa_turned_in.php) but later withdrew that part of the service.


I have a gut feeling that we won't see something like that soon due to legal implications.


I (among quite a lot of people I guess) have thought about using P2P for web search.

In fact, P2P protocols like KAD have been used for search for quite some time. What I would like to see is a search system composed of a client:

1. Implemented in JavaScript (so that the user does not need to download a program to use it).
2. Defining a file format which describes one URL, with any extra useful metadata (document type, last crawl date, text content, etc.).
3. Sharing those files using a P2P protocol like KAD.
4. Able to search the content of the URL files for words, phrases, etc.
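To make the per-URL file idea concrete, here is a rough sketch in Python (used only for brevity; the same shape would carry over to JavaScript). Everything in it is an assumption for illustration: the `UrlRecord` name, its fields, the JSON encoding, and hashing the URL to get a lookup key are hypothetical and not taken from KAD or YaCy.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class UrlRecord:
    """One crawled URL plus metadata (all field names are hypothetical)."""
    url: str
    doc_type: str
    last_crawled: str  # ISO 8601 date of the last crawl
    text: str

    def to_json(self) -> str:
        # sort_keys keeps the serialization stable across peers
        return json.dumps(asdict(self), sort_keys=True)

    def key(self) -> str:
        # Assumption: a DHT would locate the file by a fixed-size key;
        # here that key is simply the SHA-1 hex digest of the URL.
        return hashlib.sha1(self.url.encode("utf-8")).hexdigest()

    def matches(self, phrase: str) -> bool:
        # Case-insensitive substring match covers single words and phrases
        return phrase.lower() in self.text.lower()


def search(records, phrase):
    """Return the URLs of all records whose text contains the phrase."""
    return [r.url for r in records if r.matches(phrase)]


records = [
    UrlRecord("http://example.com/a", "text/html", "2011-11-29", "Peer-to-peer web search"),
    UrlRecord("http://example.com/b", "text/html", "2011-11-28", "Centralized index"),
]
print(search(records, "web search"))
```

A linear scan like this only works for the handful of files a single client holds; a real network would still need some way to route a query to the peers that hold relevant files, which this sketch does not attempt.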

As gubatron said, having an online "frontend" would be optimal. In addition to that, people could embed the "crawling" client in their webpage (which might double as an ad server) to help the crawling effort.


YaCy (while a cool project) is not new and has been around for many years now. I think it has quite some potential, but don't expect it to suddenly lift off. It has had enough time to do so, but hasn't.


I'm sure a lot of people are interested in the implementation details of YaCy's privacy mechanisms. Does anyone know the default privacy settings? Are search words that are sent in any way protected? I found this page: http://yacy-websuche.de/wiki/index.php/En:Privacy

But it's not that helpful. I'm currently looking at the source code: https://gitorious.org/yacy .


"Build a search engine" == "takes on Google" ? Well ... I guess so.


I haven't looked into the internals of it, but couldn't a black hat SEO run nodes that manipulate results in favour of their own sites?


This could be a good idea, IF there was a way to stop all kinds of malicious people from tampering with the search results in so many ways. Google already has to deal with manipulation of the signals about page relevance; just think if you also had to deal with tampering with the ranking system itself...
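One generic mitigation (not something YaCy is known to do; this is purely an illustrative assumption) is to ask several peers the same query and keep only the URLs that enough of them agree on. A minimal Python sketch, with `quorum_filter` and the peer names being hypothetical:

```python
from collections import Counter


def quorum_filter(peer_results, min_peers):
    """Keep URLs reported by at least min_peers distinct peers.

    peer_results maps a peer id to the list of URLs that peer returned.
    """
    counts = Counter()
    for urls in peer_results.values():
        # set() so a peer repeating a URL still counts as one vote
        counts.update(set(urls))
    kept = [url for url, n in counts.items() if n >= min_peers]
    # Most-confirmed first; ties broken alphabetically for a stable order
    return sorted(kept, key=lambda u: (-counts[u], u))


peer_results = {
    "peer-a": ["http://example.org/1", "http://example.org/2"],
    "peer-b": ["http://example.org/1", "http://spam.example/x", "http://spam.example/x"],
    "peer-c": ["http://example.org/1", "http://example.org/2"],
}
print(quorum_filter(peer_results, min_peers=2))
```

This only helps against a few isolated bad nodes. Someone who can cheaply run many nodes can outvote the honest ones, so the tampering concern stands unless joining the network is made costly in some way.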


The intranet search engine concept is interesting and will help this grow. Anyone know of anything else which is a search engine in a box, basically an open source competitor to the various Google Search Appliances?


The distributed search model that YaCy uses would never work in a large scale enterprise. Security, safe harbor, etc are all difficult enough using a traditional, centralized approach. Trying to imagine this done in a distributed way across the enterprise is giving me a headache.

And the closest thing to open source, turnkey search is gluing together Apache Solr and a web crawler. Lucid Imagination offers this (plus other features) as a commercial product, but it is not open source to the best of my knowledge.


I was playing with YaCy a little, and there is an "Intranet" mode. As far as I can make out, this can operate in a distributed way, but behind the firewall. I haven't looked into how to set it up in any detail yet though; I was playing with web search.


Was that subtitle necessary ("good idea, stupid name")? People probably thought Google and Yahoo were stupid names at first too.


So I have to install software on my computer to use it? No, thanks. They claim an advantage: "no content can be censored and no search results can be recorded and analyzed on central servers". This is extremely important for some applications, but for searching source code, I couldn't care less. It just raises the bar for adoption to the point where I'm not interested in it.

General purpose client side software is dead. Client side software makes sense only for niche applications.


their idea is good, but the way that it's executed stunts its growth.

instead of having people install this on their computer, they should make it so that sysadmins run nodes and put ads on their nodes' search results.

the end user would just go to a .com site, and search. everyone running nodes makes money, more nodes are installed. The network would be larger than google in a short amount of time.

wonder why the hell they haven't thought of this.

people aren't going to be typing http://localhost:port to make a search and keep an engine running. also, uptime and firewall configurations leave a lot of the desktop nodes out of the equation if they can't do NAT traversal to participate in the network.

me #facepalms to still see them doing this, going to yacy.net is the most frustrating thing ever to the curious non-techie user.


Well... you've thought of it... and they're open source...



