Janus itself doesn't make any assumptions about a specific use-case. All the functionality outside of the core RTC stuff is implemented with plugins. The default "Video Room" implementation is just a Lua script [0]. Mozilla has written their own SFU plugin (in Rust) [1] for game networking that powers Mozilla Hubs [2].
No: the vast majority of the VideoRoom functionality is written in C, and it is where all of the actually-hard-to-do video SFU stuff--like quality control feedback and SVC support, particularly in an end to end encryption context with all the codec-specific workarounds--is commingled together with the notion of "rooms" (which is a really awkward and specific high-level abstraction with a schema that you have to abuse for basic use cases).
Yes: if you don't want any of the complex video functionality, you can easily write your own Janus plugin, and that maybe sounds reasonable for some trivial game "SFU" where you are just going to move around some data channel packets... but at that point you can (and I argue should) just use libwebrtc (I do this, and I helped one of my friends do this for his product in a weekend: people act like it is hard to compile but it really isn't).
(Even more so: the Lua script you linked to looks more like a demo/example of a way to use the Lua plugin to get some functionality vaguely similar to the VideoRoom plugin, and it is notably ridiculously long and contains a lot of codec-specific knowledge, while not having anywhere near the actual functionality of the actual real C VideoRoom plugin. It is as if Janus is just a super low-level WebRTC library in the form of a framework, with an explicitly monolithic plugin doing everything.)
At that point why wouldn't I just use libwebrtc? The reason to get an off-the-shelf SFU is because all the hard work is in handling all the codec-specific workarounds, being able to handle keyframe request sharing, responding to RTCP bandwidth feedback to do SVC layer switching, and now doing all of this while most of the state is encrypted due to insertable streams... this is all hard stuff that people keep learning more about and for which the state of the art is a moving target due to browser changes.
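To make "responding to RTCP bandwidth feedback to do SVC layer switching" concrete, here is a minimal sketch of just the selection step. It assumes the SFU already has a per-receiver bandwidth estimate and a table of cumulative layer bitrates; the `Layer` type, the `pick_layer` name, the bitrates, and the 0.85 headroom factor are all made up for illustration and are not taken from Janus or any other SFU. A real implementation also needs hysteresis, keyframe requests on spatial switches, and the codec-specific handling discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    spatial: int      # spatial layer index (0 = lowest resolution)
    temporal: int     # temporal layer index (0 = lowest frame rate)
    bitrate_bps: int  # hypothetical cumulative bitrate to forward up to this layer


def pick_layer(layers, estimate_bps, headroom=0.85):
    """Return the most expensive layer that fits the estimate, else the cheapest one."""
    budget = estimate_bps * headroom  # leave some room for audio, retransmissions, etc.
    fitting = [layer for layer in layers if layer.bitrate_bps <= budget]
    if fitting:
        return max(fitting, key=lambda layer: layer.bitrate_bps)
    # Nothing fits: fall back to the base layer rather than sending nothing.
    return min(layers, key=lambda layer: layer.bitrate_bps)


# Illustrative numbers only.
LAYERS = [
    Layer(0, 0, 150_000),
    Layer(0, 1, 250_000),
    Layer(1, 1, 700_000),
    Layer(2, 1, 1_800_000),
]

if __name__ == "__main__":
    for estimate in (100_000, 1_000_000, 3_000_000):
        print(estimate, pick_layer(LAYERS, estimate))
```

The selection itself is the easy part; the point of the paragraph above is that everything around it (per-codec layer signalling, encrypted payloads, browser quirks) is what makes an off-the-shelf SFU worth having.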
I would expect 100% of applications doing anything at all with video to want all of that functionality, but only some small number to have a "room" concept that maps to the idea of the specific schema imposed by the VideoRoom plugin. It is thereby strange that all of that general video functionality is commingled together in an 8k-line C file with all of the high-level room abstraction... the answer with Janus is always "write your own plugin", but either you are doing something so trivial that Janus doesn't seem to be doing anything but the lowest level WebRTC layer, or, as far as I can tell, you have to fork the VideoRoom plugin and then hope you can merge changes from upstream back into your plugin.
Am I wrong here? Like, I would love to find out I am wrong here ;P. (Which is why I was asking the OP whether their "custom solution" was a fork of VideoRoom: to see if they told me something I don't know.) But when I skim through that C file (or even the Lua file! though that demo very notably seems "incomplete" vs. the "real" C copy) I see tons of code referencing all of the codec-specific negotiation and stuff that I would explicitly be using Janus to get, so I can't avoid using or forking the VideoRoom plugin without losing the purpose of the platform, as I would be reimplementing all of the hard parts myself (again, unless you are doing something so trivial--broadly speaking, something that doesn't involve video--that you frankly should be using libwebrtc or one of its various alternatives, such as Pion).
Not sure what you were expecting: at the very foundation of WebRTC is SDP, which implies negotiation, and with endpoints supporting potentially different codecs, negotiation is very much important whether you like it or not. That's why the VideoRoom plugin does need to take that into account. I won't get into the discussion of how complex a fork is to maintain: I always hope people contribute back what they add (assuming it's generic enough to fit the project and not customer-specific), rather than keeping it to themselves.
That said, the vast majority of people don't really need to write their own plugin, or even customize existing ones. What we foster a lot is leveraging existing plugins as much as possible, maybe combining them at an application level, rather than reinventing the wheel, and it seems to work for most (it certainly does for us, for our own applications).
On the Lua demo, it is indeed a bit more limited than the C counterpart (we clearly didn't invest as much time in it), but I'd disagree on the "incomplete" part. All the relevant parts are there, and most importantly, it's supposed to be much easier to extend and modify than the C version. There's at least one big company we're aware of that's using it in production and is very happy with it.
> Not sure what you were expecting: at the very foundation of WebRTC is SDP, which implies negotiation, and with endpoints supporting potentially different codecs, negotiation is very much important whether you like it or not.
I would expect the logic for negotiating streams to be an unrelated layer of abstraction to the concept of room management? That the code and work 100% of video apps want--SVC, end to end encryption, negotiation complexity--is mixed up in the same giant C file as a monolithic plugin with the code for JSON configuration files of a "rooms" abstraction that is a hardcoded notion of a single narrow vision of a multiparty video chat server is really awkward, and means that at best every single application ends up either as a messy fork or with a thick middleware adapter that attempts to translate between these concepts.
It is like wanting a pub-sub solution to build your own chat system but being handed a full IRC server as your building block, where you either need to fork the system to rework the notion of "channel" and the various user mode flags to match how you want to do chat--and then hope you can easily still rebase your work to the latest codebase, as the implementation of basic things like "send a message and have other people receive it" is mixed together with the notion of "a half-operator is someone who can kick users but not change the list of operators"--or build some thick middleware adapter layer that is simulating a simpler pub-sub system on top of degenerate channels.
If the code for "rooms" was a different layer of abstraction from the code for "WebRTC VP9 SVC signaling", it would allow me to just build the parts I want on top--so I can get the semantics of a public government hearing, which is different from a business meeting or a webinar or a "house party" without figuring out how I am going to translate my concept onto the existing meeting semantics of the VideoRoom plugin--or at least if the code for this was cleanly placed into a separate C file then I would be much happier with this idea that I am supposed to "extend and modify" the codebase to implement my own semantics, as I wouldn't be so worried that one day I am going to get a merge conflict on this 8k line file full of C code I am hacking on :(.
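A rough sketch of what I mean by keeping the semantics in their own layer: if the media layer only needed to be told "forward source X to receiver Y", then each kind of event could be a small pure function that turns participant roles into a set of (source, receiver) edges. Everything here (the role names, `hearing_edges`, `meeting_edges`) is hypothetical and only illustrates the split; it is not how the VideoRoom plugin is structured.

```python
def hearing_edges(roles):
    """Public hearing: everyone receives the panelists; the audience is never forwarded."""
    panelists = {name for name, role in roles.items() if role == "panelist"}
    return {(src, dst) for src in panelists for dst in roles if dst != src}


def meeting_edges(roles):
    """Business meeting: every participant receives every other participant."""
    return {(src, dst) for src in roles for dst in roles if dst != src}


# Hypothetical participants; the role strings are made up for this sketch.
ROLES = {"chair": "panelist", "witness": "panelist", "viewer1": "audience"}

if __name__ == "__main__":
    print(sorted(hearing_edges(ROLES)))
    print(sorted(meeting_edges(ROLES)))
```

Swapping a hearing for a webinar or a "house party" would then mean writing another ten-line function, with no changes to (or merge conflicts in) the code that deals with codecs and SVC.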
Then I think what you can use are the Lua or Duktape plugins, which were indeed written to allow people to write their own logic without having to worry about C or forks: even if the C code of the plugins is updated, your code is in a script that is loaded dynamically and is external to them.
If you forget about the videoroom.lua code and do something from scratch, you're free to handle the logic however you want: handling media is as simple as saying "send incoming media from A to B and C", and media-wise that's all you need to do in the script itself to have the C portion do the heavy lifting for you. You still need to take care of SDP and signalling, but you can do that on your own terms. I still have a plan to implement yet another plugin that delegates the logic to a remote node using something RPC-based, but unfortunately I haven't had time for that yet.
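As an illustration of the "send incoming media from A to B and C" idea, here is a minimal, language-neutral sketch of a forwarding table, written in Python rather than Lua. The `Forwarder` class, its method names, and the `send` callback are assumptions made for this example and are not the Lua plugin's actual API; the callback simply stands in for whatever the C side does when it relays a packet.

```python
class Forwarder:
    """Keeps a source -> recipients table and relays each incoming packet."""

    def __init__(self, send):
        self._send = send      # hypothetical callback: send(recipient_id, packet)
        self._recipients = {}  # source id -> set of recipient ids

    def add_recipient(self, source, recipient):
        self._recipients.setdefault(source, set()).add(recipient)

    def remove_recipient(self, source, recipient):
        self._recipients.get(source, set()).discard(recipient)

    def on_incoming(self, source, packet):
        # Relay to every current recipient; sorted only to make the order predictable.
        for recipient in sorted(self._recipients.get(source, ())):
            self._send(recipient, packet)


if __name__ == "__main__":
    sent = []
    fwd = Forwarder(lambda recipient, packet: sent.append((recipient, packet)))
    fwd.add_recipient("A", "B")
    fwd.add_recipient("A", "C")
    fwd.on_incoming("A", b"rtp-1")
    print(sent)
```

The script only decides who receives what; SDP and signalling still have to be handled separately, as noted above.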
If you want low-level video routing functionality without the added layer of room-like logic, i.e. something that handles WebRTC and then tells you "now, here is your incoming video flow, do whatever you want with it", you might want to check out Kurento.
However, our WebRTC stack has only minimal congestion control features (plain REMB, no simulcast), and it doesn't implement SVC or newer toys like insertable streams, nor does it completely abstract you from the grunt work that WebRTC leaves up to the user (like signaling, setting up a TURN server, or having a minimum of understanding about ICE in order to troubleshoot when problems arise).
Yeah: the whole reason I want to use someone's off the shelf solution is because I want to have all the new hard stuff like SVC and insertable streams both done for me and maintained by someone other than me ;P.
Like I said, Janus doesn't do anything by default except for some core RTC stuff and push packets around. You have to implement your own use-case via a plugin. It sounds like the Video Room plugin that's implemented in C is not exactly what you want. You can write your own plugin (maybe based on it) yourself.
If you want to implement E2E via insertable streams then you can start right here:
[0] https://github.com/meetecho/janus-gateway/blob/master/plugin...
[1] https://github.com/mozilla/janus-plugin-sfu
[2] https://hubs.mozilla.com/#/