Toward a URL for every function (sourcegraph.com)
133 points by joeyespo on June 7, 2016 | 65 comments


Sourcegraph founder here. We built this to make it much easier to grok code. It saves us hours every day. Would love to hear your feedback!

The README has some good links to try Sourcegraph at https://github.com/sourcegraph/sourcegraph/blob/master/READM...:

https://sourcegraph.com/github.com/square/okhttp/-/def/JavaA... (semantic code browsing for Java)

https://sourcegraph.com/github.com/golang/go/-/info/GoPackag... (http.NewRequest used in 8801 repositories)

Sourcegraph supports Go and Java right now. If you want to get access to the upcoming beta of JavaScript, Python, or other languages, send us a note at support@sourcegraph.com or https://twitter.com/srcgraph.


> These URLs always refer to the latest definition and won’t break if the file is edited, unlike links to a specific line number.

You can always get the URL on GitHub for a particular line number at a particular revision which won't break when the file changes. A persistent link to a function can break in a different way in that maybe the function, while named the same, changes behaviour in a way that it no longer does exactly what it originally did. It's not obvious to me which of these breakages is more relevant.

When you talk about "hackable" URLs it would be great to be able to get a URL for a named function at a particular revision. This solves both problems. I have an immutable reference to a particular piece of code, but then by hacking the URL I should be able to still see the most recent version.
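A minimal sketch of the two kinds of link discussed here, using GitHub's `blob/<ref>/<path>#L<line>` URL layout. The helper names and the example values are hypothetical and only serve as illustration; the second helper shows the "hack the URL" step of swapping a pinned commit for a branch name.

```python
def github_permalink(owner, repo, commit_sha, path, line):
    """Build a link to one line of a file pinned at a specific commit."""
    return f"https://github.com/{owner}/{repo}/blob/{commit_sha}/{path}#L{line}"


def to_latest(permalink, commit_sha, branch="master"):
    """Swap the pinned commit for a branch name to see the current file.

    The line number is kept as-is, so it may no longer point at the same
    code once the file has been edited.
    """
    return permalink.replace(f"/blob/{commit_sha}/", f"/blob/{branch}/", 1)


# Hypothetical values, for illustration only.
pinned = github_permalink("example-owner", "example-repo", "abc123", "src/util.go", 42)
latest = to_latest(pinned, "abc123")
```

The pinned link never changes meaning, while the branch link tracks the file but not the function, which is the gap a per-function URL at a given revision would close.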



Exactly :) Cool! Although it seems weird to me to have the revision in the middle of the URL.


I think it actually makes sense there, as a logical hierarchy: you want a given repository, at a given revision (e.g. a commit hash), and a given file within that revision.


Where would you want the revision? At the very end? Right now, it's associated with the repo, which makes sense and is easier on our URL routing. But I'm curious to hear what you'd prefer.


Actually on further thought maybe this does make more sense. I don't really have a strong preference either way anyway.


I think I agree with parent - if the premise is that the units we're dealing with are the functions, then I want to usually access `/function`, and `/function/revision` is being more specific.

I get what you're saying, but it seems like the URL mixes where you're coming from with where you want to take us.


This also works for tags and branches (or any 'ref' for that matter instead of the sha1)


I notice that the URLs contain file paths in them. What happens if you move a function to a different file? A different sub-directory in the project? Would the url continue to work?

(Hi Quinn! We met last year after the discussion around https://news.ycombinator.com/item?id=8308881)


Hey Kartik! Good to hear from you. They contain package paths, such as Go import paths or Java "com.example.mypkg" paths, but not file paths.


At least in the case of Java, package paths map 1-1 with file paths, right? Is Go similar? In that case, moving a function to a different package/directory would change the URL? Would the old URL break, or continue to show the old/stale version?


Yep, but those languages also define equality in their type systems using the package paths. It's tricky, of course. :)

We want to be able to track function moves and renames across packages or even repositories. It'll take some more work to get there.


Yes, certainly a hard problem. I'm just trying to understand what happens today. Does the link break or (better) just stop getting updated?


It will break if there is no revision, or a non-absolute commit ID (such as a branch or tag). An absolute commit ID link will continue to work.


When I read the title, I was imagining something more theoretical, e.g. a URL encoding of lambda calculus terms.


Same here. As others have indicated, that would be more consistent with the title (and more interesting).

The more general idea is a content-addressable function repository, where, as you point out, code would have to be in some kind of normal form. Joe Armstrong toys with this idea in his talk "The mess we're in," one of my favorites. [0]

[0] https://www.youtube.com/watch?v=lKXe3HUG2l4&t=33m10s
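As a rough illustration of what "content-addressable with a normal form" could mean, here is a small Python sketch. It is not taken from the talk: it assumes the parsed syntax tree is an acceptable stand-in for a normal form, and `function_address` is a hypothetical name. Formatting and comments do not affect the address, but renaming a variable still does, so a real system would need a stronger normalization.

```python
import ast
import hashlib


def function_address(source: str) -> str:
    """Return a content address for a piece of Python source.

    The source is parsed and dumped as an AST, which drops whitespace,
    comments, and line/column information before hashing.
    """
    tree = ast.parse(source)
    canonical = ast.dump(tree)  # position attributes are excluded by default
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same function, different formatting and an extra comment.
a = "def add(x, y):\n    return x + y\n"
b = "def add(x,y):\n    # sum two values\n    return x+y\n"
# Same behaviour, but a renamed parameter: this gets a different address.
c = "def add(p, y):\n    return p + y\n"

same_address = function_address(a) == function_address(b)
renamed_differs = function_address(a) != function_address(c)
```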


My semantic-web-loving cold dead heart disagrees with you on the "more interesting".

The one really good idea in the whole Semantic Web train-wreck (and at this distance, I think it's fair to call it that) was that everything should have a negotiable, dereferenceable URL. REST includes that core principle.

A lot of single-page web apps make me sad because they've been designed without reference to that. If what you're building is genuinely an application, then I get it; but most of the time what you're building is a catalogue, and everything in that can have an address, so it should.


I agree 100% about catalogues and addressability. I'm also a diehard in that respect, and I've come to distinguish "apps" in a similar manner, as sites where either there's no "business case" for an ontology, or the content is too transient for it to matter.

Naming is hard... I think that's what makes the "semantic web" a kind of chimera even for those who endorse it. Can a URL really capture the worldwide identity of a thing? Over all time? And who's going to maintain all those names?

Suppose that naming everything is intractable as a human effort, but we still want addressability. Alan Kay takes this to the extreme, saying, why not give every object on the internet an IP address? [0] Not every resource, every object, in every program. It sounds facetious, but it's consistent with his general objects-as-computers-all-the-way-down view. His system designs (including those from VPRI) express the belief that hard barriers between the layers of a system (usage and application, application and framework, framework and OS) account for much of today's uncontrollable code bloat and the limits on how much scale systems can tolerate. The "everything gets an IP address" idea is just a recognition that network boundaries will eventually be seen the same way. From this perspective, it might be fruitful to think about how we'd identify things on the internet if they were homogenous with the objects in our applications.

[0] It's in one of his talks but I don't remember which.


Strongly agree! One of the reasons I'm excited about IPFS is for content addressable code linking. For example, running single js functions through google's closure compiler to normalize the symbols and using a package manager that would recursively replace IPFS links with the code.



Same here! And then I went, "Uh oh, … CORBA strikes back!"


Am I the only one in the world that liked CORBA? I used ACE and TAO back in the day, and built multicast discovery mechanism to decentralize the ORB.

Then the SOAP shitshow came to town, and now REST which is the rockstar, but people want better defined endpoints (RAML), um IDL was there in CORBA.

REST is definitely easier to use than CORBA was, but it's much more limited, and it always feels like we're just reinventing the wheel. And yes, REST is better for web APIs, but it's always felt lacking after using CORBA.


It's been a while since I looked at it, but I think Computational REST (CREST) tried to do something like that: http://www.erenkrantz.com/CREST/


Sort of reminds me of the Redfoot project for Python. It was sort of a URI based code repository...


Check out Binary Lambda Calculus.


Sourcegraph folk, are you aware of Rich Hickey's codeq [0][1] for clojure:

codeq allows you to track change at the program unit level (e.g. function and method definitions) and query your programs and libraries declaratively, with the same cognitive units and names you use while programming

[0] http://blog.datomic.com/2012/10/codeq.html

[1] https://github.com/Datomic/codeq#codeq


It lacks the ability to track changes, but BBQ http://browsebyquery.sourceforge.net/ can query JVM and CLR programs - and holy crap is that useful when you need it.

Speaking of tracking changes at the method level, does anyone remember VisualAge for Java?


Sourcegraph founder here. Yep, that's an extremely compelling idea. We were definitely aware of this when we started Sourcegraph. Are you using codeq?



Sourcegrapher here. And if you want to see everywhere in the world that function is used, check out the usages list on the right side, which takes you here: https://sourcegraph.com/github.com/golang/go/-/info/GoPackag....


For Go in particular, a possible alternative is godoc.org:

https://godoc.org/flag#Arg


And Java has Javadoc, and so on. This seems interesting in that it's not just about the semantic docs, but actually provides an IDE-like source view in the browser. The closest I've seen before would probably be SXR/Scala X-Ray[1], but this seems much more polished.

[1]: http://www.scala-sbt.org/0.13/sxr/CrossVersionUtil.scala.htm...


For embedding into documentation, there's also a question of longevity. Which URL is less likely to break?


Some of you may find Unison interesting: http://unisonweb.org/2015-05-07/about.html#post-start


A URL for every function on GitHub, at least. Cool idea.


It appears to be namespaced, so presumably sourcegraph could add more repos at their own leisure.


Sourcegrapher here. Indeed. Sourcegraph has repositories hosted elsewhere, such as https://sourcegraph.com/bitbucket.org/gotamer/bbpost/-/def/G.... These are picked up by automated backend processes; if you have a Git repository that you'd like to specifically add, just email us (support@sourcegraph.com) or Tweet at us (https://twitter.com/srcgraph) for now.


Any plans to support Hg or TFS?


Yep, we will, but we don't have a timeline for those right now. Any VCS that can implement this Repository interface (https://sourcegraph.com/sourcegraph/sourcegraph/-/def/GoPack...) is fine. We have some code written to support Hg, but nothing ready to release yet.


This should be a great long-tail SEO boost. It's just like one of (Rap) Genius's best early SEO advantages, which was that they had a URL for every line.

https://moz.com/blog/how-i-would-do-seo-for-rap-genius


Most JS programmers seem to use modules (require/import) as masqueraded globals, importing complex functions instead of just standalone modules. In that case it's better to just declare all dependencies in the root (the HTML file). You would probably want to use a package manager though, to keep track of name conflicts and manage the script tags (dependencies of dependencies).

As for central hosting of packages I think it will work. But we will probably need to be able to have many src attributes in script-tags for redundancy and optimal caching.


Neat idea, although I'm not sold on the style of the URLs themselves. It'd be cool to introduce a new URL scheme:

    code://github.com/edicl/hunchentoot/master/log.lisp?macro=with-log-stream

That would handle multiple definition namespaces. One could use

    code://github.com/edicl/hunchentoot/master/log.lisp?macro=with-log-stream&commit=0951a0df8fe93d99e6f2aa3f9612a2d6e581e84f

to refer to a particular commit. No idea what the equivalent would look like for other VCSes though.
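To make the proposal concrete, here is a hedged Python sketch of how such a `code://` URL might be taken apart. The scheme is only the one proposed in this comment, not an existing standard, and `parse_code_url` and the field names are hypothetical. It assumes the path is `owner/repo/ref/file...` and that the single non-`commit` query key names the definition namespace.

```python
from urllib.parse import urlsplit, parse_qs


def parse_code_url(url):
    """Split a proposed code:// URL into its parts (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme != "code":
        raise ValueError("not a code:// URL")

    # Assumed layout: /owner/repo/ref/path/to/file
    # (a ref containing slashes would be ambiguous under this layout).
    owner, repo, ref, *file_parts = parts.path.lstrip("/").split("/")

    query = parse_qs(parts.query)
    commit = query.pop("commit", [None])[0]
    if len(query) != 1:
        raise ValueError("expected exactly one definition, e.g. macro=name")
    kind, names = next(iter(query.items()))

    return {
        "host": parts.netloc,
        "owner": owner,
        "repo": repo,
        "ref": ref,
        "file": "/".join(file_parts),
        "kind": kind,       # definition namespace, e.g. "macro"
        "name": names[0],   # definition name within that namespace
        "commit": commit,   # None when the URL is not pinned
    }


parsed = parse_code_url(
    "code://github.com/edicl/hunchentoot/master/log.lisp"
    "?macro=with-log-stream&commit=0951a0df8fe93d99e6f2aa3f9612a2d6e581e84f"
)
```

Keeping the namespace in the query key is what lets one file hold a macro and a function of the same name without the URLs colliding.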


I've been toying with roughly the same idea for some months. I've come up with new URL schemes for JSON objects[1] and (IoT) things[2]. They differ in what they do, but the purpose is to explore how specific URL schemes could open the door for improvements.

[1] json://the-domain.com/example

Would return:

    {"json":"data"}

[2] thing://ip-address/example

In this case, it returns JSON for the sake of readability, so:

    {"name":"car","speed":88,"location":"1985"}


What would be the advantage of those schemes over HTTP? In the case of code, I can see that there's a potential benefit to being able to refer to conceptual objects (e.g. variables, functions, macros) within version control systems, and maybe that's worth breaking with HTTP URLs, although I'm not completely sold there.

What would the advantage of json: be over Content-Type: TYPE+json?

It's a little easier to see that iot:UUID might indicate something like 'over any number of protocols, over any number of networks, please contact this device in my locality' or somesuch.


I'm exploring if there is a benefit to using new protocols for networked things. Mostly research. :)


Please don't do this if you want future-proof URLs - that's the whole point of linking to a specific commit. Functions and files will get moved, renamed, refactored and deleted.


Curious why they decided not to work on adding Ruby support especially when underlying srclib which they use has support for it.


Sourcegrapher here. We'll definitely release Ruby support in the future. Because Ruby has a lot of dynamic language features, it's important we do it well, and it takes some time and thought. We track the coverage % of our analyzers for all languages we support, and our Ruby support isn't at our quality threshold yet.

We'll release Ruby when we can do it almost as well as Go (e.g., https://sourcegraph.com/github.com/golang/go/-/def/GoPackage...).


Nice, looking forward to it!


I'm not sure if I understand this correctly, but my first thought is, what if the function that I am using needs to change? For instance, using css, if I later discover that the design was incorrect, I would rather just change the design code instead of updating each linking instance.


Sourcegrapher here. This is for linking to the source code of functions (and other definitions), for when you are discussing or explaining code. It's not for importing code at compile time or runtime.

P.S. Sourcegraph supports semantically linking to CSS as well: https://sourcegraph.com/sourcegraph/sourcegraph/-/def/basic-....


Got it, thanks for the clarification.


It'd be cool if you could create a permalink from a github URL (with a line no. param).

Then the interface could look like tinyurl (anonymously paste a github link, get a sourcegraph link in return).

Bonus points if it simply redirects you to the new line number on GitHub's master.


We have something like that—and better in some ways. Check out the Sourcegraph Chrome extension: https://chrome.google.com/webstore/detail/sourcegraph-for-gi.... You get jump-to-def by clicking on code on GitHub. And it uses these semantic URLs in a backward-compatible way, so that they encode the definition/function name but also the line number (so anyone not using the extension can still use the URLs).

It's a great idea to make a little app to get a permalink, too!


Don't we already have this in the form of NPM?


What happens if a function is renamed?


We don't handle that case right now, but it's a TODO. Will be an interesting problem to address.


...for an extremely narrow definition of "world". (Github)


Sourcegrapher here. Git repositories hosted anywhere with Go or Java code will work. Admittedly, that's still narrow, but we are just beginning. :) Also, see https://news.ycombinator.com/item?id=11856255.


Or, just embed the entire function in the URL. :)


Hi, this looks very interesting, but it feels more like a feature I wish GitLab, Bitbucket, etc. would have than something that gets a dedicated site.


Sourcegraph founder here. If you install the Chrome extension (https://chrome.google.com/webstore/detail/sourcegraph-for-gi...), you can get it on GitHub. Support for GitLab and Bitbucket is coming soon.

And what's more, the Chrome extension actually adds # fragment URLs to github.com that are semantically meaningful, just like on Sourcegraph.


Would make a lot of node modules as functions redundant


You mean like RPC? If so take a look at Thrift, Protocol buffers, Avro, or whatever.

If the idea is to just expose reusable functions, you can take a look at https://algorithmia.com/



