News.YC in Polar Coordinates (diffle.com)
29 points by nostrademons on Sept 19, 2007 | 10 comments


This was something that John Yu (yubrew) and I hacked up during DevHouseBoston this weekend. It scrapes the front page, figures out which stories have commenters in common, and plots it in polar coordinates. The radial coordinate is the total number of comments, on a logarithmic scale (with the inner 1/3 or so reserved to reduce clutter). The angular coordinate attempts to cluster together stories with similar commenters. It updates every 6 hours, and there's a delay while scraping to keep from overloading PG's server.
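A minimal sketch of how a layout like the one described above could be computed, not the actual diffle.com code. The function names, the Jaccard similarity measure, and the greedy ordering are all assumptions for illustration; the post only says the radius is logarithmic in comment count with the inner third reserved, and that the angle tries to cluster stories with common commenters.

```python
import math

def radial_coordinate(num_comments, max_comments, inner_fraction=1/3):
    """Map a comment count to a radius in [inner_fraction, 1] on a log scale."""
    if max_comments <= 0:
        return inner_fraction
    # log1p keeps stories with zero comments well-defined (log(1 + 0) == 0)
    scaled = math.log1p(num_comments) / math.log1p(max_comments)
    return inner_fraction + (1 - inner_fraction) * scaled

def jaccard(a, b):
    """Overlap between two sets of commenter names (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def angular_order(commenters_by_story):
    """Greedy ordering: repeatedly append the remaining story whose
    commenters overlap most with the story placed last."""
    remaining = dict(commenters_by_story)
    if not remaining:
        return []
    current = next(iter(remaining))
    current_set = remaining.pop(current)
    order = [current]
    while remaining:
        nxt = max(remaining, key=lambda s: jaccard(current_set, remaining[s]))
        order.append(nxt)
        current_set = remaining.pop(nxt)
    return order

def polar_layout(commenters_by_story, comment_counts):
    """Return {story: (radius, theta)} with stories spread evenly around the circle."""
    order = angular_order(commenters_by_story)
    max_comments = max(comment_counts.values(), default=0)
    layout = {}
    for i, story in enumerate(order):
        theta = 2 * math.pi * i / len(order)
        layout[story] = (radial_coordinate(comment_counts[story], max_comments), theta)
    return layout
```

A greedy ordering like this only compares each story with the one placed just before it, so it is a rough approximation of clustering; a real implementation might use something smarter.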

A lot of the features (animation, selection, title/domain swapping) are basically gratuitous - I wanted to get more practice with JQuery since I need that functionality in my startup. Hell, the whole thing is basically gratuitous. But it was pretty fun to write.

John was working on a scraper for Reddit...that may be a bit more interesting, since it updates more frequently and the larger audience means more posts in common.


Nice hack nostro. What process did you use to scrape? I've been working on a newsyc scraper:

- extract & parse with python + beautifulSoup, display with YUI

- parsing the RSS feed for user, story points, story title and date posted

- then grabbing user details via homepage for karma, inception date

- exporting to xml

- skinning with YUI
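The parse-then-export steps in the list above might look roughly like this. It is a sketch only: it swaps in the standard library's xml.etree.ElementTree rather than BeautifulSoup, the sample feed is made up, and it reads only the standard RSS item fields (title, link, pubDate), since the thread does not show what the real feed contains. The names parse_items and export_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; not the real news.YC RSS output.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First story</title><link>http://example.com/a</link><pubDate>Wed, 19 Sep 2007 12:00:00 GMT</pubDate></item>
<item><title>Second story</title><link>http://example.com/b</link></item>
</channel></rss>"""

def parse_items(rss_text):
    """Pull title, link and date out of every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),  # empty if missing
        })
    return items

def export_xml(items):
    """Serialize the parsed items as a simple <stories> XML document."""
    root = ET.Element("stories")
    for entry in items:
        story = ET.SubElement(root, "story")
        for key, value in entry.items():
            ET.SubElement(story, key).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling export_xml(parse_items(SAMPLE_RSS)) returns a string that a display layer could then load.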

Will display when ready, here's a screenshot ~ http://flickr.com/photos/bootload/1400863977/


Regexps. John tried a beautifulSoup parser, but it turned out to be more trouble than it was worth. Insert joke about "now you have two problems", but it works in a pinch, and it really only had to last till we showed off our projects at 8:00 PM ;-).

I don't mind sharing the code if PG doesn't mind the potential flood of screen scrapers. It just handles the front page and comment page for individual articles, and it's pretty compact - only 60 lines including doc comments.
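For anyone curious what the regexp approach looks like in general, here is a rough sketch. It is not the 60-line script mentioned above: the HTML snippet and the pattern are invented for illustration and do not reflect the site's real markup, and the function names and the fetch parameter are hypothetical. The pause between requests mirrors the delay mentioned in the original post.

```python
import re
import time

# Invented markup for illustration only; the real page structure may differ.
SAMPLE_HTML = '''
<td class="title"><a href="http://example.com/a">First story</a></td>
<td class="title"><a href="http://example.com/b">Second story</a></td>
'''

# Captures the URL and the link text of each title cell in the sample markup.
STORY_LINK = re.compile(r'<td class="title"><a href="([^"]+)">([^<]+)</a>')

def extract_stories(html):
    """Return a list of {'url', 'title'} dicts found in the given HTML."""
    return [{"url": url, "title": title} for url, title in STORY_LINK.findall(html)]

def scrape_pages(urls, fetch, delay_seconds=2.0):
    """Fetch each URL with the supplied fetch(url) -> html callable,
    sleeping between requests so the server is not hammered."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)
        results[url] = extract_stories(fetch(url))
    return results
```

Passing fetch in as a parameter keeps the sketch testable without touching the network; a pattern like this breaks as soon as the markup changes, which is the usual trade-off with regexps.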


"... Regexps. John tried a beautifulSoup parser, but it turned out to be more trouble than it was worth ..."

Took me a bit of mucking around to get it working. My trick was just utilising the FOX HTML Validator to find the page structure, then CUT+PASTE the text into IDLE, call up BS in IDLE, and write the BS expression to parse the string till I had the right data.

"... I don't mind sharing the code if PG doesn't mind the potential flood of screen scrapers. ..."

A better suggestion might be to supply the raw data you collected as a service and let others do what they want with it. Most just want the data and are not particularly interested in the HOW. If they are interested in the HOW then it's better to point them to some scrapers to play with.


Similar commenters mostly appears to be Paul. Why did you choose that criterion?


Depends where you're looking. pg seems to be common to the south. However, up at the north there are a bunch that are all gwenhyfaer, while to the west is joshwa and Readmore. Which makes sense, since the plot tries to cluster by common commenters along the angular direction.

I chose this criterion mostly so you could see who's participating in other threads similar to the articles you like.


Doesn't seem to be working correctly on my browser then (ff 2, XP). Commenters do not always load correctly and sometimes require clicking back into one of the links. In the end, I'm not sure I'm seeing what you are seeing.


Umm scratch that. Just now figured out how it works. Sorry.


This is awesome! You may be able to improve it further by considering other things besides common comment posters to determine the angle measure for a story. I haven't seen this good of a web information graphic since the treemap used at "newsmap" found here: http://www.marumushi.com/apps/newsmap/newsmap.cfm


This is a hot new view.



