londons_explore's comments

I think the only reason for mapping is to be able to block off 'no go' areas (no escaping out the front door!) and to be able to go home to the charger.

For the actual cleaning, random works great.


My previous robot vacuum did no mapping, but it always managed to find its way back to the charger. It'd just follow the walls until it saw the charger's IR beacon.

Clever design if you ask me. Doing a lot with a little.
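The wall-follow-then-home behavior described above can be sketched as a tiny state machine. The sensor names and action strings here are hypothetical placeholders, not any real vacuum's firmware interface:

```python
# Minimal sketch of wall-following with IR-beacon homing.
# Sensor inputs and action names are illustrative placeholders.

def next_action(sensors: dict) -> str:
    """Pick the next move from simple bump/IR inputs.

    sensors = {
        "ir_beacon_visible": bool,  # charger's IR beacon in view
        "bump_front": bool,         # front bumper pressed
        "wall_right": bool,         # right-side sensor sees a wall
    }
    """
    if sensors["ir_beacon_visible"]:
        return "drive_toward_beacon"   # home in on the charger
    if sensors["bump_front"]:
        return "turn_left"             # obstacle ahead: rotate away from it
    if sensors["wall_right"]:
        return "drive_forward"         # keep the wall on the right
    return "turn_right"                # lost the wall: curve back toward it
```

Called in a loop, this covers both behaviors the comment describes: hugging walls until the beacon appears, then homing.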


Surely mapping also helps reduce the time it takes to achieve the task?

A robot vacuum isn't time constrained. It literally has all day.

They make noise, and people work from home, so that might not be the case.

In addition, more working time means more wear and tear on parts.


You are right. The original Roomba was discussed on HN 3 months ago:

https://news.ycombinator.com/item?id=46472930


If mass produced, no part of a robot vacuum is expensive. Blower fans are ~$1. Camera is $1. Cheap wifi MCU with a little ML accelerator + 8 MB of RAM is $1. Gyro is $1. Drive motors + gearboxes together are $1. AC charger $2. Plastic case $2. Batteries are the most expensive bit (~$3), but you can afford a battery life of just 10 minutes if you can return to base frequently.

The hard part is the engineering hours to make it all work well. But those are repaid as long as you can sell 100 million units across every nation in the world.


Yeah, agreed 100%. You might also need to factor in the cost of the charging dock, but the overall thesis is still sound.

Do you know of any cheap wifi MCU with a little ML accelerator that we can buy off the shelf? The only one we could think of was the Jetson Orin Nano, and that's not cheap.


I am not an expert, but this seems like a case where model distillation could get the behavior you need running on a cheap end-user processor (Raspberry Pi 4/5 class). I chatted with Claude Opus about your project and got the following advice:

For the compute problem, you don't need a Jetson. The approach you want is knowledge distillation: train a large, expensive teacher model offline on a beefy GPU (cloud instance, your laptop's GPU, whatever), then distill it down into a tiny student network like a MobileNetV3-Small or EfficientNet-Lite. Quantize that student to int8 and export it to TFLite. The resulting model is 2-3 MB and runs at 10-20 FPS on a Raspberry Pi 4/5 with just the CPU, no ML accelerator needed. For even cheaper, an ESP32-S3 with a camera module can run sub-500KB models for simpler tasks. The preprocessing is trivial: resize the camera frame to 224x224, normalize pixel values, feed the tensor to the TFLite interpreter. The CNN learns its own feature extraction internally, so you don't need any classical CV preprocessing.

Looking at your observations, I think the deeper issue is what you identified: there's not enough signal in single frames. Your validation loss not converging even after augmentation and ImageNet pretraining confirms this. The fix is exactly what you listed in your future work: feed stacked temporal frames instead of single images. A simple approach is to concatenate 3-4 consecutive grayscale frames into a multi-channel input (e.g., 224x224x4). This gives the network implicit motion, velocity, and approach-rate information without needing to compute optical flow explicitly. It's the same trick DeepMind used in the original Atari DQN paper; a single frame of Pong doesn't tell you which direction the ball is moving either.

On the action space: your intuition about STOP being problematic is right. It creates a degenerate attractor: once the model predicts STOP, there's no recovery mechanism. The paper you referenced that only uses STOP at goal-reached is the better design. Also consider that TURN_CW and TURN_CCW have no obvious visual signal in a single frame (which way to turn is a function of where you've been and where you're going, not just what you see right now), which is another reason temporal stacking or adding a small recurrent/memory component would help. Even a simple LSTM or state tuple fed alongside the image could encode "I've been turning left for 3 steps, maybe try something else."

For the longer term, consider a hybrid architecture: use the distilled neural net for obstacle detection and free-space classification, but pair it with classical SLAM or even simple odometry-based mapping for path planning and coverage. Pure end-to-end behavior cloning for the full navigation stack is a hard problem; even the commercial robots use learned perception with algorithmic planning. And your data collection would get easier too, because you'd only need to label "what's in front of me" rather than "what should I do," which decouples perception from decision-making and makes each piece easier to train and debug independently.
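The temporal frame stacking described above is a few lines of NumPy. The 224x224 size matches the preprocessing mentioned in the text; the padding behavior at startup is my own assumption:

```python
import numpy as np

# Concatenate the last few grayscale camera frames into one
# multi-channel network input (e.g. 224x224x4), as described above.

def stack_frames(frames, depth=4):
    """frames: list of HxW uint8 grayscale images, oldest first.
    Returns an HxWxdepth float32 tensor normalized to [0, 1]."""
    recent = frames[-depth:]
    # Assumption: pad by repeating the oldest frame until we have
    # `depth` frames (matters only for the first few timesteps).
    while len(recent) < depth:
        recent.insert(0, recent[0])
    return np.stack(recent, axis=-1).astype(np.float32) / 255.0

frames = [np.random.randint(0, 256, (224, 224), dtype=np.uint8) for _ in range(6)]
x = stack_frames(frames)   # shape (224, 224, 4), ready for a TFLite model
```

The stacked channels give the network the implicit motion cues a single frame lacks.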


Wow this is awesome! Thanks a ton for taking the time to think about the project and post this! Yeah I think the way to go is:

1. Improve the model input by stacking frames
2. Then try model distillation to a smaller model
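Step 2, classic Hinton-style knowledge distillation, boils down to training the student against the teacher's temperature-softened probabilities. A minimal NumPy sketch of that loss (the logits here are made-up examples):

```python
import numpy as np

# Knowledge-distillation loss: cross-entropy between the teacher's and
# student's temperature-softened class distributions (Hinton et al.).

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Scaled by T^2 so gradient magnitudes stay comparable across T."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(T ** 2) * np.mean(np.sum(p_teacher * log_p_student, axis=-1))

teacher = np.array([[5.0, 1.0, -2.0]])   # confident teacher prediction
student = np.array([[4.0, 0.5, -1.0]])   # student roughly agrees
loss_close = distillation_loss(student, teacher)
loss_far = distillation_loss(-student, teacher)   # disagreeing student
```

In practice you'd usually mix this with the hard-label cross-entropy, but the soft-target term is what transfers the teacher's "dark knowledge".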


The mel spectrum is the first part of a speech recognition pipeline...

But perhaps you'd get better results if more of a ML speech/audio recognition pipeline were included?

E.g. the pipeline could separate drum beats from piano notes and present them differently in the visualization?

An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.
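A full perceptually-trained autoencoder is beyond a snippet, but the optimal *linear* autoencoder recovers the PCA subspace, so as a crude stand-in you can project spectrogram frames onto their top principal components and map that bottleneck to LED hues. The random spectrogram data and 2-unit bottleneck here are placeholders:

```python
import numpy as np

# PCA as a stand-in for a linear autoencoder bottleneck: project each
# spectrogram frame to 2 dims, then map the 2-dim code to a hue angle.

rng = np.random.default_rng(0)
spec = rng.random((500, 64))               # placeholder: 500 frames x 64 bins

mean = spec.mean(axis=0)
_, _, vt = np.linalg.svd(spec - mean, full_matrices=False)
encoder = vt[:2].T                         # 64 -> 2 "bottleneck" projection

def frame_to_hue(frame):
    """Encode one spectrogram frame; map the bottleneck angle to a hue."""
    code = (frame - mean) @ encoder        # 2-dim bottleneck activation
    return float((np.arctan2(code[1], code[0]) / (2 * np.pi)) % 1.0)

hue = frame_to_hue(spec[0])                # in [0, 1); feed HSV -> LED RGB
```

A nonlinear autoencoder trained on real audio would give a far more "interesting" bottleneck, but the plumbing to the LED strip looks the same.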


I've done this in my own solution in this space (https://thundergroove.com). I use a real-time beat-detection neural network combined with similar frequency-spectrum analyses to provide a set of signals that effects can use.

Effects themselves are written in embedded JavaScript and can be layered a bit like Photoshop. Currently it only supports driving Nanoleaf and WLED fixtures, though WLED gives you a huge range of options. The effect language is fully exposed, so you can easily write your own effects against the real-time audio signals.

It isn't open source, though, and still needs better onboarding and tutorials. Currently it's completely free; I haven't really decided whether I want to bother trying to monetize any of it. If I did, it would probably just be for DMX and maybe MIDI support, or maybe for an ecosystem of portable hardware.


I was playing around with this recently, but the problem I encountered is that most AI analysis techniques like stem separation aren't built to work in real-time.

Btrfs allows migration from ext4 with a rather good rollback strategy...

Post-migration, a complete disk image of the original ext4 disk will exist within the new filesystem, using no additional disk space due to the magic of copy-on-write.

Why isn't the repair process the same? Fix the filesystem to get everything online ASAP, and leave a complete disk image of the old damaged filesystem so other recovery processes can be tried if necessary.
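For reference, the migration-and-rollback flow mentioned above looks roughly like this (check the btrfs-convert man page before running anything; /dev/sdb1 and /mnt are placeholders):

```shell
# Convert an unmounted ext4 filesystem to btrfs in place.
btrfs-convert /dev/sdb1

# After mounting, the original filesystem is preserved as a read-only
# subvolume (ext2_saved/image) sharing extents with the new filesystem
# via copy-on-write, so it costs no extra space.
mount /dev/sdb1 /mnt

# If anything went wrong, roll back to the original ext4 filesystem:
umount /mnt
btrfs-convert -r /dev/sdb1
```

Deleting the ext2_saved subvolume later reclaims the rollback option and frees the diverged extents.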


Starship's velocity seems to have really slowed: over a decade in and no commercial revenue yet.

I wonder if it's a lack of talent? Lack of investment?


Doing something 10x as big is 100x as difficult. And the last 10% takes 50% of the work. With that in mind, Starship is right on schedule. Something will be operational by 2030.

Motivation has declined with the realization that it's not about Mars, but normal military-industrial-complex drudgery.

https://ioc.exchange/@muskfiles


The "million people on mars in my lifetime" dream is dead.

Might happen, but certainly not in his lifetime, unless we discover an asteroid headed directly towards Earth...


The talent that was originally driving SpaceX is gone. And I don’t mean Elon’s brain. I mean the real engineers designing the rockets.

The talent has mostly gone because the US is fiercely politically divided, and Musk changing teams from Democrat to Republican pretty much meant his whole staff felt forced to jump ship because he no longer aligned with their values.

Do you have any evidence of this? I don’t think there have been a lot of high profile departures in the past few years.

Every single xAI exec left...

Same with most top Tesla execs since he changed political sides.

That's pretty high turnover for a company with RSUs tying people to their seats.


We're talking about SpaceX, not Tesla or xAI (which only very recently joined SpaceX).

It's a harder problem.

At much larger scale

It's the cyber truck of space.

It's what happens when Elon jumps into the k-hole and convinces himself that because he owns a company that successfully did a thing, his genius will make those companies do an even better thing. He's wrong. And he can stay wrong for years and decades even.

Starship is too big for orbital payloads, and too heavy to go beyond orbit. Even if it actually achieves its target payload capacity, it takes 15 refueling missions to do anything other than an orbital mission. If it doesn't achieve target payload capacity, it's cooked.


There are probably more running copies of FreeRTOS than Windows in the world...

RDP in the Windows XP days supported all kinds of tricks to work with low bandwidth, like doing rendering on the client rather than the server.

I think most of those tricks have been disabled in modern Windows for better security (you don't want some guest user feeding your not-so-robust, awfully complex rendering code malicious inputs...).


This needs to be augmented with a new bit of contract law which enables a new type of 'subscription' where the terms are set by law.

Those terms would include things like "payments are monthly, service automatically ends when payments end, etc."

As things stand today, plenty of consumers end subscriptions by blocking payment, which practically works, but it opens the door to a scumbag company bulk-chasing all those unpaid subscriptions through the courts and getting liens on millions of homes for $150 each via templated court cases.


Integrating tool use into the training process should fix this.

Rather than learn about President Lincoln, the model can learn to look that info up with a search tool and use it to get better answers.

Just like a human does. I don't learn what 76x35 is... I learn that a calculator can give me that answer so I don't need to memorize it.
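The calculator analogy above can be made concrete: the "model" recognizes that a question needs arithmetic and delegates to a tool rather than recalling a memorized answer. The routing regex and tool interface here are purely illustrative, not any real agent framework:

```python
import re

# Toy tool-use router: detect arithmetic in a question and hand it to a
# calculator tool instead of answering from "memory".

def calculator_tool(a: int, op: str, b: int) -> int:
    ops = {"x": lambda p, q: p * q,
           "+": lambda p, q: p + q,
           "-": lambda p, q: p - q}
    return ops[op](a, b)

def answer(question: str) -> str:
    m = re.search(r"(\d+)\s*([x+-])\s*(\d+)", question)
    if m:  # arithmetic detected: call the tool
        result = calculator_tool(int(m.group(1)), m.group(2), int(m.group(3)))
        return str(result)
    return "I'd need a search tool for that."

print(answer("What is 76x35?"))   # the tool computes it: 2660
```

Training the model to emit such tool calls (and consume the results) is what moves factual and arithmetic load out of the weights.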


My guess is the training process is their secret sauce...

Yes, but their training speed is not secret. If their process were fast, they would have said so.
