Hacker News | loevborg's comments

This looks like a useful set of guidelines. I see the most value in reducing the bikeshedding which invariably happens when designing an API. I wonder if anyone is using AEP and can comment on downsides or problems they've encountered.

One thing I've noticed is that the section on batch endpoints is missing batch create/update. Also batch get seems a little strange - in the JSON variant it returns an object with a link for missing entities.


I'm a big Rich Hickey fan. He's a big user of a (to me) peculiar variant of the phrase, "it ends up": a total of 144 times in https://github.com/matthiasn/talk-transcripts

It also struck me as a bit of a sleight of hand - but maybe it's just rhetorical flourish. Or more charitably you could say it's inevitable - in a conference talk of finite length, you can't possibly back up every assertion with detailed evidence. "It turns out" or "it ends up" are then a shorthand way of referring to your own experience.


PS all 17 hits for "it turns out" in the repository are from other speakers.

Literally every interview I've done recently has included the question: "What's your stance on AI coding tools?" And there's clearly a right and wrong answer.

In my case, the question was "how are you using AI tools?", trying to see whether you're still in the metaphorical stone age of copy-pasting code into chatgpt.com or making use of (at the time modern) agentic workflows. I'm not sure how good an idea this is, but at least it was a question that popped up after passing the technical interviews. I want to believe the purpose of this question was to gauge whether applicants were keeping up with dev tooling or stagnating.

To be fair, this topic seems to be quite divisive, and it seems like something that definitely should be discussed during an interview. Who is right and wrong is one thing, but you likely don't want to be working for a company whose take on this topic is incompatible with yours.

Nice write-up, thanks for sharing. How does your hand-vibed python program compare to frameworks like pipecat or livekit agents? Both are also written in python.

I'm sure LiveKit or similar would be best to use in production. I'm sure these libraries handle a lot of edge cases, or at least let you configure things quite well out of the box. Though maybe that argument will become less and less potent over time. The results I got were genuinely impressive, and of course most of the credit goes to the LLM. I think it's worth building this stuff from scratch, just so that you can be sure you understand what you'll actually be running. I now know how every piece works and can configure/tune things more confidently.

FWIW, TypeScript is using Strategy 2: https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

I'm a bit confused by the fact that the array starts out typed as `any[]` (e.g. if you hover over the declaration) but then, later on, the type gets refined to `(string | number)[]`. IMO it would be nicer if the declaration already showed the inferred type on hover.


I agree, it's always been unsettling to see any[] on hover, even though it gets typed in the end.

I think one reason might be to allow the type to be refined differently in different code paths. For example:

    function x () {
        let arr = []
        if (Math.random() < 0.5) {
            arr.push(0)
            return arr
        } else {
            arr.push('0')
            return arr
        }
    }
In each branch, arr is typed as number[] and string[], respectively, and x's return type is number[] | string[]. If it decided to retroactively infer the type of arr at declaration, then I'd imagine x's return type would be the less specific (number | string)[].

It depends on your tsconfig. An empty array could be typed as never[], forcing you to annotate it.

I don't believe this is correct. There are no settings that correspond to that AFAIK, and it'd actually be quite bad, because you could access the empty array and then get a `never` object, which you're not supposed to be able to do.

https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

`unknown[]` might be more appropriate as a default, but TypeScript does you one better: with OP's settings, although it's typed as `any[]`, it'll error out if you don't do anything to give it more information because of `noImplicitAny`.


Which setting specifically? Can you repro in the typescript playground?

Yeah that's a painful process, as I know from experience. What do you think is the reason for the gradual shift?


I think when you are new with good ideas, you are judged against average. If you are above average, you are listened to.

As years pass, you are judged against the standard you set, and if you do not keep raising this standard, you start being seen as average, even if you are performing the same as when you joined.

I've seen this play out many, many times.

When an incompetent person is hired, even if issues are acknowledged, if they somehow stay, the expectations for them will be set to their level. The feedback will stop: if you complain about the same issues or the same person's work every time, people will start seeing it as a you problem. Everyone quietly avoids this, so the person stays.

When a competent person is hired, it plays out the same. After 3/5/10 years, you are getting the same recognition and rewards as the incompetent person as long as you both maintain your competency.

However, I've seen (very few) people who consistently raised their own standards and improved their impact and they've climbed quickly.

I've seen people lowering their own standards and they were quickly flagged as under-performers, even if their reduced impact was still above average.


I agree with this summary to a degree. An additional problem arises when you simply cannot raise the standard because you lack the political influence to do so. As the article says, sometimes companies are comfortable with the status quo, regardless of the problems, whether they are technical or not. Another issue arises when product, rather than seeing tech as a partner in pursuit of a common goal, starts to see it as an underling.


While I can't say that I've observed that kind of radical shift myself, one place where I can see something similar is AI development.

Basically manager asks me something and asks AI something.

I'm not always following so-called "common wisdom". I might decide to use a library or framework that AI won't suggest. I might use a technology that AI considers too old.

For example, I suggested writing a small Windows helper program in C, because it needs access to WinAPI; I know C very well; and we need to support old Windows versions, back to Vista at least, preferably back to Windows XP. However, the AI suggests using Rust, because Rust is, well, today's hotness. It doesn't really care that I know very little Rust, and it doesn't really care that I would need to jump through certain hoops to build Rust on old Windows (if it's even possible).

So in the end I suggest using something that I can build and have confidence in. The AI suggests whatever most internet texts written by passionate developers talk about.

But the manager probably has doubts about me, because I'm not a world-level, trillion-dollar-worth celebrity, I'm just some grumpy old developer, so he might question my expertise by asking the AI.

Maybe he's even right, who knows.


It seems like a quite clear-cut case?

You mention the trade-offs of Rust, including the high level of uncertainty and the increased lead time as you'd need to learn the language.

The manager, now having that information, can insist on using Rust, and you get a great opportunity to learn it. You're also totally off the hook, even if the project fails, since you mentioned the risks.


“Truly I tell you,” he continued, “no prophet is accepted in his hometown."

- Luke 4:24

It's why people often trust consultants over the people inside the organization. It's why people often want to elect new leaders even if the current leaders are doing a decent job.

The baby almost always gets thrown out with the bath water.

https://en.wikipedia.org/wiki/Don't_throw_the_baby_out_with_...


I find this hilarious, given that I've experienced it from both viewpoints. 1. A consultant implemented their half-baked solution that continued to bite us for my whole tenure and IMO was completely unmaintainable; how were they able to convince leadership of their ideas? Sometimes it's just snake oil. 2. In my new place I am preaching certain things to people who do listen and seem to want to do them. It makes me a bit uncomfortable, and it's a bit scary how easily you can find acolytes. They do validate my suggestions, ask questions and, most importantly, think, so I am hopeful that I won't turn out to be a false prophet.


I've also played both roles myself at times. I've been the wise consultant. And I've been the Cassandra that nobody would listen to. My wisdom was never as good as presumed when I was the consultant. And my wisdom was far better than was assumed when I was the Cassandra.


The prevalent pattern I see is things becoming mundane. The capabilities you enable are no longer something only you could do; was your expertise ever there at all? Things running smoothly is taken for granted. Doing your job well becomes unexceptional.


This is fascinating. It sounds like you're building "cloud datastructures" based on S3+CAS. What are the benefits, in your view, of using S3 instead of, say, Dynamo or Postgres? Or of reaching for NATS/RabbitMQ/SQS/Kafka? I'd love to hear a bit more about what you're building.


It's all trade-offs. If you have a lot of data, S3 is basically the only option for storing it. You don't want to pay for petabytes of storage in Dynamo or Postgres. I also don't want to manage Postgres, even RDS; dealing with write loads that S3 handles easily is very annoying, as is dealing with availability. S3 "just works", but you need to build some of the protocol yourself.

If you want consistently really low latency/ can't tolerate a 50ms spike, don't retain tons of data, have <10K/s writes, and need complex indexing that might change over time, Postgres is probably what you want (or some other thing). If you know how your data should be indexed ahead of time, you need to store a massive amount, you care more about throughput than a latency spike here or there, or really a bunch of other use cases probably, S3 is just an insanely powerful primitive.

Insane storage also unlocks new capabilities. Immutable logs unlock "time travel" where you can ask questions like "what did the system look like at this point?" since no information is lost (unless you want to lose it, up to you).
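The time-travel idea above can be sketched with a toy Python event log. Everything here (the event shapes, the `state_at` helper) is made up for illustration, not the commenter's actual system:

```python
# A toy append-only event log: each entry is (timestamp, (op, key, value)).
# Because the log is immutable, "what did the system look like at time t?"
# is answered by replaying only the prefix of events up to t.
log = [
    (1, ("set", "color", "red")),
    (2, ("set", "size", "L")),
    (3, ("set", "color", "blue")),
    (4, ("delete", "size", None)),
]

def state_at(log, t):
    """Rebuild the state as of timestamp t by replaying the log prefix."""
    state = {}
    for ts, (op, key, value) in log:
        if ts > t:
            break  # events after t haven't "happened" yet from this view
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state
```

For example, `state_at(log, 2)` sees the original color while `state_at(log, 4)` sees the update and the deletion; no information is lost by later writes.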

Everything about a system like this comes down to reducing the cost of a GET. Bloom filters are your best friend, metadata is your best friend, prefetching is a reluctant friend, etc.
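The Bloom-filter pre-check can be sketched in a few lines of Python. This is a toy version (fixed size, SHA-256-derived positions), not the tuned structures a real system would use; the point is only that a cheap in-memory "definitely not here" answer lets you skip a paid GET:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: a 'no' answer is definitive, a 'yes' may be a
    false positive, so you only issue the expensive GET on a 'yes'."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A plausible usage: keep one filter per S3 segment or manifest, and only GET segments whose filter answers "maybe".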

I'm not sure what I'm building. I had this idea years ago before S3 CAS was a thing and I was building a graph database on S3 with the fundamental primitive being an immutable event log (at the time using CRDTs for merge semantics, but I've abandoned that for now) and then maintaining an external index in Scylla with S3 Select for projections. Years later, I have fun poking at it sometimes and redesigning it. S3 CAS unlocked a lot of ways to completely move the system to S3.


> You can imagine us continuing to iterate here to Step 5, Step 6, ... Step N over time. The tradeoff of each step is complexity, and complexity has to be deserved. This is working exceptionally well currently.

Love this approach


> Failover happens by missing a compare-and-set so there's probably a second of latency to become leader?

Conceptually that makes sense. How complicated is it to implement this failover logic in a safe way? If there are two processes, competing for CAS wins, is there not a risk that both will think they're non-leaders and terminate themselves?


The broker lifecycle is presumably

1. Start

2. Load the queue.json from the object store

3. Receive request(s)

4. Edit the in-memory JSON with batch data

5. Save the data with CAS

6. On failure not due to CAS, recover (or fail)

7. On success, succeed the requests and go to 3

8. On failure due to CAS, fail active requests and terminate

The client should have a retry mechanism against the broker (which may include looking up the address again).

From the broker's PoV, it will never fail a CAS until another broker wins a CAS, at which point that other broker is the leader. If it does fail a CAS, the client will retry with another broker, which will probably be the leader. The key insight is that the broker reads the file once; it doesn't compete to become leader by re-reading the data, and this is OK because of the nature of the data. You could also say that brokers are set up to consider themselves "maybe the leader" until they find out they are not, and losing leadership doesn't lose data.
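That "maybe the leader until a CAS fails" behavior can be simulated with an in-memory stand-in for the object store. This is a sketch under my own assumptions (real S3 CAS would be a conditional PUT on an ETag, and `CASStore`/`Broker` are invented names), but it shows why two processes can't both keep succeeding:

```python
class CASStore:
    """In-memory stand-in for an object store with compare-and-set."""

    def __init__(self, value, version=0):
        self.value, self.version = value, version

    def read(self):
        return self.value, self.version

    def cas(self, expected_version, new_value):
        if self.version != expected_version:
            return False  # someone else wrote first
        self.value, self.version = new_value, self.version + 1
        return True

class Broker:
    """Considers itself 'maybe the leader' until a CAS fails."""

    def __init__(self, store):
        self.store = store
        _, self.version = store.read()  # read the state exactly once
        self.alive = True

    def flush(self, batch):
        if self.store.cas(self.version, batch):
            self.version += 1
            return True
        # Lost the CAS: another broker is leader now. Fail the active
        # requests (clients retry elsewhere) and terminate.
        self.alive = False
        return False
```

If brokers `a` and `b` both start from the same snapshot, whichever flushes first keeps winning every subsequent CAS; the other fails exactly once and self-terminates, so at most one process ever believes it is leader for long.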

The mechanism to start brokers is only vaguely discussed, but if a host-unreachable error also triggers starting a new broker, there is a neat scale-from-zero property.


This is the hardest part because you can easily end up in a situation like you're describing, or having large portions of clients talking to a server just to have their writes rejected.

Further, this system (as described) scales best when writes are colocated (since it maximizes throughput via buffering). So even just by having a second writer you cut your throughput in ~half if one of them is basically dead.

If you split things up you can just do "merge manifests on conflict" since different writers would be writing to different files and the manifest is just an index, or you can do multiple manifests + compaction. DeltaLake does the latter, so you end up with a bunch of `0000.json`, `0001.json` and to reconstruct the full index you read all of them. You still have conflicts on allocating the json file but that's it, no wasted flushing. And then you can merge as you please. This all gets very complex at this stage I think, compaction becomes the "one writer only" bit, but you can serve reads and writes without compaction.
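The multiple-manifests idea can be sketched in Python. This is a toy (made-up file contents and helper names; a real Delta Lake log is far more involved), but it shows how readers rebuild the full index by folding the numbered JSON files in order:

```python
import json
import pathlib
import tempfile

def write_manifest(dirpath, seq, entries):
    """Each writer appends its own numbered manifest, e.g. 0000.json, 0001.json.
    The only write conflict left is allocating the next sequence number."""
    (dirpath / f"{seq:04d}.json").write_text(json.dumps(entries))

def load_index(dirpath):
    """Reconstruct the full index by reading every manifest in sequence order.
    Later manifests win on key conflicts, which is the merge rule here."""
    index = {}
    for path in sorted(dirpath.glob("*.json")):
        index.update(json.loads(path.read_text()))
    return index

# Two independent writers, two manifests, one reconstructed index.
d = pathlib.Path(tempfile.mkdtemp())
write_manifest(d, 0, {"part-a": "s3://bucket/a.parquet"})
write_manifest(d, 1, {"part-b": "s3://bucket/b.parquet"})
full_index = load_index(d)
```

Compaction would then be the "one writer only" step: periodically rewrite the folded index as a single file and delete the small manifests, without blocking reads or writes in the meantime.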

https://doi.org/10.14778/3415478.3415560

Note that since this paper was published we have gotten S3 CAS.

Alternatively, I guess just do what Kafka does or something like that?


Love this writeup. There's so much interesting stuff you can build on top of Object Storage + compare-and-swap. You learn a lot about distributed systems this way.

I'd love to see a full sample implementation based on s3 + ecs - just to study how it works.

