Cooling related failure (in Google London DC) (cloud.google.com)
221 points by tardismechanic on July 20, 2022 | 142 comments


Oracle also had issues around the same time: https://ocistatus.oraclecloud.com/#/incidents/ocid1.oraclecl...

Obviously the weather is easy to blame, but I wonder if the underlying cause is the same datacenter. It's kind of annoying that clouds treat availability zones as such an opaque thing: it's not possible to map a zone from one cloud to another, so your failure domains may overlap without you knowing.

Shameless self promotion: I have a map of all the cloud regions (but not going into as much detail as availability zones): https://cloud-regions.bodge.cloud -- the clouds just don't publish the data.


For the AZs that's slightly complicated by the fact that, at least for AWS, they are randomised (your eu-west-1a isn't necessarily the same as mine), so it might be pretty much impossible to check for overlap even if you know one DC is used by both Google and Oracle. Spreading across regions seems more appropriate for increasing redundancy.


Amazon does provide a way to map to the real availability zone: https://docs.aws.amazon.com/ram/latest/userguide/working-wit...
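
For reference, a minimal sketch (boto3, assuming AWS credentials are configured) of dumping your account's mapping; the ZoneId is the physical identifier that stays consistent across accounts, unlike the per-account ZoneName:

    import boto3

    # ZoneName (e.g. eu-west-2a) is randomised per account; ZoneId is the physical AZ.
    ec2 = boto3.client("ec2", region_name="eu-west-2")
    for az in ec2.describe_availability_zones()["AvailabilityZones"]:
        print(az["ZoneName"], "->", az["ZoneId"])   # e.g. eu-west-2a -> euw2-az2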

It's also not appropriate in all cases to spread out, for example the nearest zones to London are around ~7ms away in mainland Europe.


The provided info is only relative to AWS. If you want to map geographical locations across clouds you're out of luck unless you have reports with precise locations available. Generally, AWS already provides certain assurances that AZs are spread out.


I didn't know this, but it makes perfect sense and randomizing the a/b/c is a tried and true solution - e.g. in the electrical grid, L1/2/3 are rotated for each customer, because people tend to connect more stuff to L1 on average.


Most people only have single-phase power here in the US, with two 120 V legs 180 degrees out of phase. The power company steps the voltage down from a single phase on the distribution wires. Circuit breaker panels are set up with alternating legs so that the neutral current is approximately balanced whether you fill up the panel on one side first or from the top down.


> people tend to connect more stuff to L1 on average.

I just plug into the wall socket and it works.

How could I choose to connect things to L2?


If your property has 3 phase power, chances are your single phase breakers are just on some random phase (picked by your electrician) ie phase 1.

Because phase 1 is picked so often, they actually rotate the phases at the street, i.e. swap 1 2 3 with 3 1 2, to avoid ending up with, say, 300% more load on phase 1 and having the grid just die.


> If your property has 3 phase power, chances are your single phase breakers are just on some random phase (picked by your electrician) ie phase 1.

That's bollocks, or speaks to a completely incompetent electrician. Usually you have three phase rails coming out of your GFCI that distribute the phases sequentially to each breaker [1], precisely to avoid that scenario, as well as to avoid a tripped main breaker because the customer overloads one phase.

The only case where great care is taken to avoid random assignment of phases is in event stage technology: you do not want your lighting dimmer packs on the same phase as your amplifiers, because dimmer packs inject extreme amounts of EM noise, and you also have to take care not to overload any single phase anywhere. However, it's questionable how long this will stay relevant, given that most lighting load is moving to LEDs.

[1] https://www.amazon.de/-/en/Busch-Jaeger-Hage-Phase-KDN380A-3...


Guessing you're from the UK, where you seem to have fancy pre-configured sub-boards. In Australia the standard, and 99% of house installs, is just hand-wired single wires from breaker to breaker, twisted together and rammed into the bottom of the breakers :)


Germany, and I've worked on construction sites and in stage tech for a time in my career.

What you describe is... beyond awful and an absurd fire risk.


Lmao, I moved from NZ to the UK and the house wiring here has very often not been touched since anywhere from the '60s to the '80s.

They are very slowly modernising here, but they seem to have a cultural resistance to change/modernisation; they're in love with the old days.

Very different to Aus/NZ where we're eager to get the latest stuff/NZ is often used as a test market.


Ours are called red, green and blue, which helps avoid putting everything on P1.


UK phases used to be coloured as well. The wiring regs changed to make it much less confusing (and cause fewer issues for colourblind people), so they're now L1, L2, L3.

The GP is also a bit misleading. Most residential properties in the UK are only fed one phase - either the street only has one phase or the grid connects houses up L1, L2, L3 in sequence. Of course I'm speaking about detached and semi-detached houses; in larger groups of houses (e.g. blocks of flats) you usually get 3 phases, and one of the jobs of the electrician is phase balancing.


I'm from Yeovil, Somerset! I have a fair idea about 'leccy here.


Then you'd know they're Red, Yellow and Blue, and that colouring went out in 2006:

https://www.newfound-energy.co.uk/electrical-three-phase-wir...


I wonder if there can still be a statistical discrepancy due to the electrician's favorite color.

It seems blue is strongly favored by psychology students, for example: https://www.livescience.com/34105-favorite-colors.html


Hopefully not 123 -> 312 but e.g. 231.

AFAIK phase order matters for e.g. three-phase motors: they will spin in the wrong direction if wired with the phases in the wrong order.


312 is also a clockwise direction; the anticlockwise ones are 132 213 321.
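
A quick way to convince yourself (the helper is ad hoc, nothing standard): a sequence drives a three-phase motor in the same direction iff it is a cyclic rotation of the original.

    def same_rotation(a, b):
        # True iff b is a cyclic rotation of a
        return len(a) == len(b) and b in a + a

    print(same_rotation("123", "312"))   # True  - same direction as 1-2-3
    print(same_rotation("123", "132"))   # False - two phases swapped, reversed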


I'm not following; aren't 123, 231 and 312 all in the same order?


Your own mappings are available in the console and via the API, so it's definitely still possible to tell.


You have some of the empty lat, lon data (e.g. AWS GovCloud (US)) interpreted as at (lat,lon) = (0,0), putting them incorrectly on Null Island.
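
One way to guard against that when building the map (the field names here are hypothetical, just to illustrate): treat missing coordinates as missing rather than as numeric zero.

    def parse_latlon(region):
        lat, lon = region.get("lat"), region.get("lon")
        if lat in (None, "") or lon in (None, ""):
            return None  # unknown location: skip it, don't plot at (0, 0)
        return float(lat), float(lon)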


Thanks, hidden.


UK infra - physical and electronic - is typically rated to around 30C. So higher temps are going to cause failures in multiple locations.

If yesterday's temps lasted for a week instead of a day or two a lot of essential stuff would stop working.


Carrier-grade (i.e. critical) network equipment should be rated for 40C ambient air and then up to 55C for some short length of time (3 days??) to allow for air conditioning failure. This is what we design and test for. Cheap stuff, including Google's software solutions running on 'commodity hardware', won't handle that. You get what you pay for.


I have not seen a lot of carrier equipment in the DC/CO rated to 55C. Optical transport nodes maybe, but not the switches, routers, servers, etc. Temp ratings like that are normal for field systems in passively cooled cabinet shelters or tower sites; industrial Ethernet gear and CCTV camera housings are even built to handle 65C. The Uptime Institute Tier 1/2/3/4 designs build overcapacity and redundancy into the cooling plant itself to avoid failures like this, and I don't recall seeing anything NEBS-related that goes beyond that either. This looks like a case of the intake air/water to the cooling system going outside design parameters, so even with extra capacity they couldn't meet the temperature/humidity set points.


It very much depends on the equipment. We have multiple tiers of what we support at each type of facility, with traditional data centres having less stringent requirements (maybe w.r.t. noise/filters rather than temp) compared to the facilities housing big routers, where -5/55C is the spec to meet.

You are right, and it's arguably harder for those smaller boxes sitting in cabinets / on telegraph poles, despite being much lower power systems to start with. It might be 50C+ in there before they even turn on! Those things might have a two-stage boot process where they just run their fans for a bit to cool things down before actually booting the main system. It must be a real nightmare for entirely passive stuff, I have no experience with that.


Google doesn't run its networking on commodity hardware; that's been the case for over a decade (source: Open Networking Summit talks). The issue here isn't network equipment temperature ratings, but rather the whole datacenter losing cooling.


I was under the impression that they still did some of their own software-defined routing, maybe at lower bandwidths? That could be an outdated view. I do know they also buy traditional equipment from the big vendors for high-bandwidth stuff. P4 is very cool.

I was trying to point out that high-end hardware should survive a couple of days despite a data centre losing its cooling. It is designed for exactly that situation.


A typical Google data center has a networking room which will often have large numbers of standard commercial networking devices. That room has extra redundancy, is locked off from the cattle, etc.


Does the UK even design with the assumption of air conditioning being necessary and present, let alone designing for it failing?


Yes. You must be able to control the temperature of a building densely packed full of hot radiators. You might be able to avoid active AC within the Arctic Circle, but you'd still want to filter that incoming (very) cold air, at which point you might as well basically have an AC system. And yes, you must ensure you can survive a failure, because these systems do fail (normally when you need them most) and otherwise everything inside cooks.


The Arctic is warming much faster than other places. See e.g. this recent article about record 38C temps in Siberia: https://www.sciencenews.org/article/siberia-verkhoyansk-reco...


Datacenters always have A/C I thought?


Thanks :)


Google and other companies do make risk assessments including temperature scenarios.


> Cheap stuff, including Google's software solutions running on 'commodity hardware', won't handle that. You get what you pay for.

Sounds like the downside of the "pets versus cattle" methodology is that natural phenomena will wipe out your entire herd, while the pets survive.


Why does your map have AWS's eu-north-1 in Estonia? Officially it's said to be in Stockholm, and I can find no references to it being in Estonia, nor know of it existing there.


The eu-north-1 AZs are in Västerås, Eskilstuna and Katrineholm, all ~100 km from Stockholm and ~40-80 km from each other.


Strange, I'm not sure where I got that from, will fix.


You have some locations simply pinned to the central squares of cities when that is definitely not where the facilities are. Maybe include a +/- error range on the provided data?


Wow, this map is really cool. I'm (very) idly curious how accurate the reporting for the Sydney region is, because I just looked up the lat/lon, and found myself in the middle of an urban mall environment that I've walked past many times when I've been in the city.

Being able to look up at the buildings there and know there are indeed 5 different clouds somewhere up above my head in that specific location would be really cool. Being able to point at a specific building would be even cooler.

I do of course (sadly) appreciate the flip side of this coin which is one of (many of) the reasons precise data is not published. So I guess I'm just wondering out loud, probably rhetorically :), how I might find out one day. "<-- That building" is enough resolution for me :)

EDIT: A quick Google found baxtel.com (among other websites) has address-level locations for most providers. The buildings are all so boring, haha! (Understandably so though.)


It's pretty far off, at least for Azure. It shows a number of Azure regions as being located in downtown urban areas, which is typically not the case.


It would be possible to infer the locations if you had a lot of time.

AWS, at least in the past, used to co-locate with other third-party servers in the data centre. So if you had such a server, you could ping AWS endpoints to triangulate where those servers might physically be.
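
Something like this rough sketch, assuming you can run it from a box colocated in the suspect facility; TCP connect times to the regional EC2 endpoints are only indicative, since they may sit behind shared front ends:

    import socket, time

    def rtt_ms(host, port=443, samples=5):
        # Best-of-N TCP connect time, as a rough stand-in for ping
        best = float("inf")
        for _ in range(samples):
            t0 = time.perf_counter()
            with socket.create_connection((host, port), timeout=2):
                pass
            best = min(best, (time.perf_counter() - t0) * 1000)
        return best

    for region in ("eu-west-1", "eu-west-2", "eu-central-1"):
        print(region, round(rtt_ms(f"ec2.{region}.amazonaws.com"), 2), "ms")

A sub-millisecond result to one region and multi-millisecond results to the others would be a strong hint you share a building (or at least a metro) with it.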


I'm pretty sure a majority of AWS AZs globally are still not in AWS-owned-and-operated facilities.


At this point I would be quite surprised if they were not majority AWS owned and operated.


Of the ones that I know details about in Europe and Asia, not a single one of them are AWS owned and operated. I'm sure the situation is different in the US, and wouldn't be surprised if they own and operate all of their US DCs.


I also think it would be nice if Google prominently stated which cloud regions are hosted in their real datacenters and which are in third-party facilities. It's not just that you run the risk of some off-brand piece of junk overheating, but also that they so loudly trumpet their PUE and carbon neutrality while reselling cloud capacity in other people's facilities, where you have no idea what the carbon impact is.


They do tell you the carbon impact right on the region selector: https://cloud.google.com/blog/topics/sustainability/pick-the... and also https://cloud.withgoogle.com/region-picker/ is quite nice.


Google doesn't really have a concept of datacenters internally. In simplified terms everything runs on a Borg cell, which usually but not always exists in a single building. Within a region one of your GCP services could be running on one cell, with another service running on another cell in a different building. If a failure happens or maintenance needs to be done, a cell can be drained to another one in the same region.


Google has "data centers" internally. People argue about the various definitions but a DC is usually one or a few buildings containing several clusters, each of which is fairly "close" to all its components.


Really, Google doesn't have an entire org called "dcops"?


Not in a way that is relevant to a cloud customer asking what building their jobs are running in.


That's not how carbon impact works. You don't buy "clean" products. You buy products from vendors who have an overall average mix of production that meets a "clean" threshold, and/or who buy credits from someone else who exceeds the threshold.


In Google's case (and Scaleway's too, for instance), the point is that some of their DCs use renewable energy only and thus are "clean". If however you use the "wrong" Google region, hosted in Equinix's DCs, which happen to be powered by coal, it's not even remotely close. However, Google makes the distinction and you can easily check which region is "clean" and which isn't.


Electricity generation is also affected by weather in two ways. Firstly, by the sag in the wires, which are built to weather tolerances much as railways are; if you get outside those tolerances, the risk of problems increases.

Secondly, within a limit of my understanding, the efficiency of a turbine system hot-to-cold is affected by the climate it operates in. The ambient temperature, humidity and pressure affects the final stage.

Both things might mean that in times of high heat and humidity, the electricity supply system is least able to cope with increased demands for cooling systems, which will themselves draw more power fighting the weather.

Separately the HVAC systems for the DCs will have been designed for a specific climate, with margins. I guess the sustained change in night and daytime temps and humidity has hurt their efficiency too, in this window of time. They'll be fine when the weather system passes through, as will the supply network.

The Met Office says both overnight and daytime peak temps for the inland south have been records. That's where a lot of ICT infrastructure is.


> Separately the HVAC systems for the DCs will have been designed for a specific climate, with margins.

Exactly right. For a Tier IV data centre that margin is ASHRAE N=20. For London City until recently that was only 34.5C (dry bulb).


Interesting rabbit hole! Does that mean the margin is defined to contain once-in-twenty-years extremes?


That’s right. However these estimates are now being regularly exceeded. Going forward, it would probably be wiser to do what the mining industry does here in Australia - still design to a reasonable extreme temp, but also specify that the equipment must run at a much higher temperature at a derated capacity. Here that’s 50C ambient.


Even disregarding climate change, a 5% chance each year that your equipment will fail to handle the weather seems lacking for a high-availability facility. (You'll be combining many other figures of this type, so your total risk would be higher still.)
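
To put rough numbers on it (assuming independent years), the chance of at least one exceedance compounds quickly over a facility's lifetime:

    p_year = 0.05                      # 1-in-20-year design condition
    for years in (5, 10, 15, 20):
        p_any = 1 - (1 - p_year) ** years
        print(years, "years:", f"{p_any:.0%}")
    # -> 5 years: 23%, 10 years: 40%, 15 years: 54%, 20 years: 64%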


> Secondly, within a limit of my understanding, the efficiency of a turbine system hot-to-cold is affected by the climate it operates in. The ambient temperature, humidity and pressure affects the final stage.

Usually this is only partially true. Systems have enough margin to account for such conditions (cooling pumps have to pump more water to compensate for higher temps). Note also that the output of a turbine can be kept constant; only the efficiency will come down slightly.


I wasn't sure about the "slightly". What I read elsewhere indicated that for safety reasons, gas- and coal-fired steam turbines back off load when the ambient heat rises. I had assumed it was the output cooling/evaporation efficiency, but it might be something else.


Sounds like yet another argument for investing in that low-hanging fruit of energy storage: thermal reservoirs. Datacenter cooling does not need a huge delta from ambient; the advantage of pumping against nighttime ambient rather than daytime ambient would be huge. Datacenter cooling should be dispatchable demand.


I feel like we could do this for homes & businesses too...

How big of a thermal battery (assume we are freezing water) would you need to store the cooling capacity of a 5 ton HVAC system running for 1 summer day?

Perhaps we could design a new generation of heat pumps with this approach in mind.
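
A rough back-of-the-envelope for the question above, using only ice's latent heat of fusion (~334 kJ/kg) and assuming the 5-ton unit runs flat out for 24 hours with no losses:

    btu_per_hour = 5 * 12_000             # "5 ton" unit = 60,000 BTU/h
    joules_per_day = btu_per_hour * 24 * 1055
    ice_kg = joules_per_day / 334_000     # latent heat of fusion ~334 kJ/kg
    print(f"{ice_kg:.0f} kg of ice")      # ~4,500 kg, roughly 4.5 m^3 of water

That ~4.5 tonnes is no accident: a refrigeration "ton" is defined as the cooling you get from melting one short ton of ice over 24 hours, so a 5-ton unit running for a day needs roughly 5 short tons of ice.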


If you (as a person) can tolerate cooler temperatures well, you can already do this to some extent using the house itself as a thermal battery.

Run your A/C overnight when power is cheaper and the air conditioner is more efficient (and rolling blackouts are less likely...). Cool your house to say 4-8F cooler than you'd normally keep it (close off some vents in your bedroom if you need to, though personally I prefer sleeping in the cold). If your house is well insulated you may be able to make it through a significant portion of the next day without the air conditioner needing to cut in again.

In the grand scheme of things wood doesn't have a lot of thermal mass, but if there's a lot of it it still adds up. Even as the air begins to warm, the floor and walls still feel noticeably cooler.


Key difference: homes and small businesses buy electricity at a fixed rate, while large datacenters are big enough for dynamic prices. In a market that doesn't completely abandon fixed-rate prices this isn't really possible for smaller customers, because offering dynamic rates requires some trust that the consumer doesn't abuse the fixed rate by dynamically selecting whichever is cheaper at the time. That trust can only be established for consumers large enough to justify people checking the numbers, and perhaps even the occasional audit.


This is the case now, but it wasn't always.

In Texas, we used to have residential plans where you paid wholesale rates for electricity (e.g. Griddy). I used to be on one of these plans and would constantly fantasize about being able to accumulate HVAC capacity at night, when you would sometimes be paid to consume electricity.


Did the "least cost router" scenario ever come up? I'd imagine a certain kind of person would feel very much entitled to, e.g., share consumption with a neighbor, switching both houses to a shared meter on either the variable or the fixed rate based on time of day, with power retailers frantically balancing between chasing them down for suspicious patterns and pretending it never happened to avoid inspiring others.

(I do believe that switching to variable rates would be a good idea: in the grand scheme of things, even the poorest of the poor would be better off if they occasionally had to disconnect, which is less bad than the alternative of the entire grid occasionally browning out, taking down services that might matter more to them than their home consumption. And peak prices wouldn't even be as high as they are now if they weren't propped up by an army of fixed-rate consumers who, almost by definition, show no demand flexibility.)


> I do believe that switching to variable rates would be a good idea

Completely agree. Any time we try to control/subsidize costs we introduce instabilities and bad incentives into the marketplace.

Texas grid was about as close as you could get to reality for a while. I would much prefer a situation where everyone is impacted by the cost in the same way. At grid scale, nothing can be stored, so financial arbitrage is effectively a scam.


What does that mean? Could you expand on that?


At a guess, it'd mean building large slabs of concrete, or tanks of water, or other cheap item with large thermal inertia. Ideally you'd cool these down overnight, or at a period with cheap power, then under regular external temperatures you'd use these to reduce the need for cooling at periods of high power cost. Or, when it's very warm outside, these would work in tandem with the cooling systems to cope with a hot day, with the hope being that overnight you'd catch up, without needing to interrupt workloads.


Thirdly, there's also drought related issues - here in Switzerland, a nuclear power station had to reduce its capacity because it could not take in enough water from the local river without endangering the fish stock therein.


We’re also just under a month from the summer solstice (in N. Hemisphere: June 21). London is at 51 degrees North, so the day is pretty long at 16 hours right now.

I.e., less of that relieving overnight low.


Fortunately it rained last night and the temperature dropped pretty quickly.


So many governments, agencies, orgs, etc. are getting caught with their pants down as the changing climate brings more extreme weather, because capitalism favors maximizing profit and treats overly wide tolerances as an expense.

It's cathartic seeing the capitalists' lack of foresight burn them time and again, but also sad, knowing the response to this will just be buying slightly better HVAC and other systems, which will no doubt be globally sourced, burn carbon to produce, and probably damage some ecosystems in the resource-gathering process. Then when temperatures get even hotter it's time to buy new systems, etc.

Once again capitalism favors this outcome, because it allows multiple sales opportunities over time with the potential to increase monetization at every one, versus buying one system that can handle wider temperature swings and being done with it potentially forever if it's made to be repairable/upgradeable. Even if you invented such a thing and sold it out of your garage, GE or whoever your competitors would be would ensure you have difficulty sourcing necessary parts, coming to market, or getting word out to your potential customer base. The investors you'd need in order to afford to grow under these conditions would expect you to play the game, start cutting costs and engage in the rat race, or be replaced.

It's going to be hard to work our way out of climate change with the degenerate, consumptive nature of capitalism mining the planet while sucking resources from the wider economy where this salvation is to be invented, to the top that will always be able to afford to hole up and hide away from whatever disasters befall the working people.


You can blame capitalism all you want, but the issue is deeper. This is a well studied market failure called tragedy of the commons. People and markets have not been paying the true cost of their consumption, which has resulted in overconsumption of fossil fuels.

Your whole premise ignores the billions of people who happily use electricity and fuel every day and happily ignore the consequences. These same people then (literally) riot when prices go up by a relatively small amount.

People are greedy. They want more for less. We can disagree about the best form of economic association, but it’s disingenuous to ignore the reality that climate change is driven by the masses.


> but it’s disingenuous to ignore the reality that climate change is driven by the masses.

And the answer to a "tragedy of the commons" scenario is regulation: either intervene directly by banning undesired behaviors (e.g. banning flights on routes that are served by rail, as France does) or tax them to make undesired behaviors unprofitable.

The problem with the latter is that you will always have people rich enough to simply pay whatever tax is asked and social resentment will grow as a result ("why should the lower classes bear the load of climate change and the rich enjoy three-minute flights to save a 40 minute road trip [1]?").

[1] https://www.buzzfeednews.com/article/stephaniesoteriou/kylie...


Market failures are a feature of free-market capitalism! Most of the solutions to market failures are some kind of regulation, restriction or tax, which tend to be opposed by proponents of "pure capitalism".


> because capitalism favors maximizing profit and considering tolerances too wide as an expense.

Lack of foresight isn't exclusive to capitalism. Would socialist data centres be built to handle temperatures several degrees above the highest ever recorded temperature?

Even in a socialist economy, building systems with excessively high tolerances would be seen as a poor allocation of resources.

It could be equally argued that capitalism favours private companies like Google ensuring their DCs are as fault tolerant as possible, to ensure they have a competitive advantage. There's also plenty of cases where companies sell unnecessary and excessive goods and services to maximise their own profit.

> and being done with it potentially forever if its made to be repairable/upgradeable

DC cooling systems are repairable and upgradable. They're a far cry from a residential split system AC unit.


See https://en.m.wikipedia.org/wiki/OGAS. The book "How Not to Network a Nation: The Uneasy History of the Soviet Internet", linked from that article, discusses many of these topics.


> Even in a socialist economy, building systems with an excessively high tolerances would be seen as a poor allocation of resources.

Huh? GDR- and Soviet-made machinery, vehicles, even glassware for pubs [1] were made with sometimes ridiculous margins and tolerances to ensure longevity and easy repairability, and were famous for it. Even pre-reunification Western-made products from the likes of Bosch, AEG, Hilti or Siemens were famous for being built so well that they sometimes outlasted their owners (such as my 80s Hilti drill, which has served three generations of my family and will likely still work when I have children of my own).

[1] https://de.wikipedia.org/wiki/Superfest


> Soviet made machinery, vehicles, even glassware for pubs [1] was made with sometimes ridiculous margins and tolerances to ensure longevity

Bollocks. Soviet machinery is simple and crude, typically many decades behind its western counterparts.


Yes, but that was neither my point nor the point of the person I replied to.

"Keep it simple and stupid" is a tried and true engineering principle. The higher tolerances a design uses, the easier it is to manufacture and to repair, and the less likely it is to fail from wear and tear in the first place.

A socialist economy, where waiting for a new car could take anywhere from five to twenty years (!), definitely has to prioritize simpler, more (fault-)tolerant designs, even if that takes a bit more resources to account for said tolerance. For example, a modern car heavily using fibreglass and plastic in the chassis may weigh a good deal less than your average all-metal Lada, but the Lada could easily be repaired by your average farmer using tools they had in their shed.

Random side fact: this is a major reason why farmers are paying record prices for tractors nearly half a century old [1]. Or why the Russians are currently using so much ages-old stuff in the Ukraine war - modern tanks require a lot of logistics for repair and spare parts, but those old Russian clunkers? You can piss into the tank and it will probably drive on it. (Yes, I know, the Russians haven't been maintaining their tanks properly, which is a major factor in why they were not able to take Kyiv.)

[1] https://www.startribune.com/for-tech-weary-midwest-farmers-4...


Simple and crude is not necessarily in opposition to robustness.


Modern capitalism does have a built-in short-term view though. Companies can try planning ahead for decades of use, but if someone comes in and designs for a shorter term, they'll be a lower bidder and thus more likely to get the job. By the time the shortcomings are discovered, the lower bidder is long gone. The stock market also provides an incentive to just aim for short-term gains at the expense of the long term.

Socialism can prioritise quick fixes or long term solutions, but it depends on the wisdom of the people involved as to whether they'll build in enough capacity to allow for future climate changes.


Like most people making that commentary about "short-termism", you'd rather buy cheap clothes made in China than pay double for something local that lasts 5 times longer.


You have nothing on which to base that observation, and as such you're not even wrong; your comment is meaningless.


How are non-capitalist countries dealing with extreme temperatures?


I'd argue there are no non-capitalist countries


No Google data center issues or melting railways in non-capitalist countries, because Google data centers and railways are the products of capitalist economies.


My main confusion with this downtime is that neither their Cloud SQL nor their Redis offering managed to complete a failover, despite my org having high availability enabled on both of those plans. Is there something I'm missing here? I would've expected failover to kick in for high-availability instances and cause minimal downtime; however, it's been almost 24 hours and our Cloud SQL instance is still stuck attempting to fail over, not to mention that HA comes at a premium. Wondering if anyone can help me understand what I'm missing, or whether the failover behaviour is simply not working. We've made our own workarounds in the meantime.

Relevant docs I've checked for behaviour:

https://cloud.google.com/memorystore/docs/redis/high-availab...

https://cloud.google.com/sql/docs/mysql/high-availability

EDIT: Have found out from our ops team that the SQL instance recovered around 3am so it was down for approximately 9 hours -- which is still totally useless for something deemed HA.


Seems like they need to start issuing some refunds/credits.


IIUC, the HA setting only fails over across *zones in the same region*. If the whole region is down, HA won't help. In this case, the London data center is the region.


The region wasn't down though. Only one zone was down?

From earlier in the incident history:

> Cloud SQL:

> Impact/Diagnosis: Non-HA instances backed by europe-west2-a are hard-down in europe-west2-a. HA instances that were in europe-west2-a when the incident started, are down with stuck failovers.


That's expected; Cloud SQL is not multi-region. Clouds define HA as being multi-zonal, which you were.

Try Spanner if one region is not enough.


The whole region wasn’t down though, only zone europe-west2-a so AFAICT high availability should’ve covered this particular instance of outage.


That’s pretty terrible!


This incident page really needs a diff mode or just less boilerplate. It's really difficult to tell at a glance what's changed with each update.


Add to that all timestamps being US/Pacific for an outage in Europe...


Came here to say this. Each update almost subtracts value. Updates should only contain information that has changed.

Very frustrating when you're anxiously awaiting new information, and you have to do a word-by-word mental diff.


I really like this. I think a change to the input questions could solve this — clearer, more specific questions like "What's changed?", "Is it worse/better?".


It's funny how HN always complains about every status page.

I think Google Cloud has one of the only status pages that is always up to date and very forthcoming in giving as much detail as possible. Personally I couldn't ask for more.


It's casual feedback, sure. But it is (by and large) specific and actionable.

I think "you couldn't ask for more" is disingenuous at best, and an actively harmful outlook at worst.


Totally agree. I literally piped updates to diff last night to see whether GCP were actually making progress fixing anything or just spamming the same update message. Would make a useful twitter bot…
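
A minimal sketch of that approach (the URL and polling interval are placeholders, and in practice you'd want to strip the page down to the update text before diffing):

    import time, urllib.request, difflib

    URL = "https://status.cloud.google.com/incidents/<incident-id>"  # placeholder
    previous = []
    while True:
        page = urllib.request.urlopen(URL).read().decode("utf-8", "replace")
        current = page.splitlines()
        print("\n".join(difflib.unified_diff(previous, current, lineterm="")))
        previous = current
        time.sleep(300)  # poll every 5 minutes (arbitrary)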


If you are really that interested in the content of a status page of one cloud service provider you should redesign your infra.

The value proposition of "cloud" is not "I can haz cloud of big corp as my own" but "I can haz many cloudz to make resilient infra!".

If you rely on one cloud provider you are doing it wrong.


There was a concurrent incident affecting a large number of GCP services: https://status.cloud.google.com/incidents/fmEL9i2fArADKawkZA....

It would have been rather useful had GCP linked these (presumably related) incidents. The first mention of cooling in the concurrent incident was at 14:39 PDT (over 5 hours after the first status update, and 4 hours after the cooling incident was created)... This is what was said:

> Description: A cooling related failure in one of our buildings that hosts zone europe-west2-a for region europe-west2 is affecting multiple Cloud services.


I originally read this title as "Cooking related failure". It reminded me of when I worked for a US research university in the late 90s and one of the data center operators accidentally heated up a hot dog wrapped in aluminum foil in a microwave. A microwave that was stupidly plugged into a circuit shared by some of the network equipment and servers. The microwave allegedly blew up or something, causing a multi-hour outage of the campus network.


I had a portable 14,000 BTU unit running flat out, and it couldn't keep up. 40C is hard to deal with here.


I’m a little confused. 40C is fairly common in India where I live. My air conditioning works fine here even in relatively high humidity. Is there something special about how a/c units work in the UK? Are they rated for lower ambient temperature or something?


It's less about the AC units, and more about the buildings. In places like Spain and Portugal, white painted outside walls, shading, and shutters on the outside of windows all help keep the heat out. My house has none of that, it just absorbs most of the heat from the sun.


Ah! That explains it. I'm sure if my house were in the UK, I would freeze to death in winter.


UK houses are totally fucking useless for cold weather too.


Maybe the new builds which are just brick skins, but there's plenty of older houses that are built like tanks (I live in one).


There are also plenty of older houses that are built very cheaply and not much good in heat or cold either (eg single-brick-skin Victorian terraces)...


OP probably has a small AC unit with a pipe that you send out of the window. We don't have proper AC over here in most houses, because it's normally only hot a few weeks out of the whole year.


Portable units have terrible efficiency. Unfortunately that is often the only option in apartments.


I'm a huge fan of wondering openly whether what people commonly do is best, or if it's just best for some people and everyone else does it because they haven't really thought about it.

For instance, the idea that an expensive, thousand plus dollar rackmount server is only able to run in a special place where the temperature is just right and might fail if a fan fails or the temperature is a wee bit higher than usual is utter bollocks.

I build my own rackmount servers that can run at 100º temperatures, even with fan failures. I know this because that's how I test them. I have the OS aggressively throttle on the most egregious failures, but fan failure is much less common in general when you're using 80mm Noctua fans instead of 40mm fans that have to run at many thousands of RPM to keep their zones cool.

So maybe people need to rethink the idea that datacenters have to be kept at 70º or below, and instead should insist on better thought out hardware.


Google datacenters run hot, as do many other cloud providers. They are happy to use high temperatures in their DCs to improve efficiency. The problem is that they have also removed the equipment to handle super-hot days for "efficiency."


Your servers can run at 100℃ ambient temperature? Oh, I guess you mean F?

Yeah, servers can run just fine at 40℃. Well… unless they fail because of it. :-) That is: The ones that don't fail work just fine.

If you have a DC with 10,000 of your servers and maybe 30,000 hard drives, what percentage of them will fail on any given day at 25℃ vs 40℃?

But it's not just your servers. Can your AC equipment/evaporators work at 40℃? And if your ACs start failing you could be looking at a cascading failure where it's actually more like 60-70℃, or just plain "a fire", in your DC.

Can your generators work?

And the answer also isn't "every component in my DC must be milspec extended temperature range". It's actually fine to build a DC in Iceland that's not specced for outside temperature of 50℃. In fact it would be a ridiculous waste to do so.

Google of course measures this. E.g. https://www.techrepublic.com/article/google-research-tempera...

But do keep in mind that 40℃ outside may mean 50℃ inside. Or indeed just 60℃ hotspots on the DC floor.

Hell, your network cables may not even be rated for 60℃. They usually aren't.

Your server may be fine with a 40℃ inlet, but the air going out may disconnect it by melting the network cable.


I would like to present AWS and Google Cloud Mumbai: heat wave and COVID wave simultaneously! 40 degrees centigrade is merely warm here.


Everywhere in the world you design for what the local conditions are.

I bet Mumbai doesn't mandate winter tyres in winter, right? Sweden does. If suddenly one winter you see -10℃ in Mumbai, would you appreciate Swedes mocking you for not being able to drive in a car not designed for it, with tyres not designed for it, on roads not designed for it, etc…?

Nothing in the UK is designed for 40℃. Buildings, the type of steel made for train tracks, ventilation in tube tunnels, the asphalt, the walls in the building, the windows (no double glazing in Mumbai, I assume?). I would expect everything in Mumbai is designed to handle high temperatures. But not cold.


And I would actually expect that it is equipped to handle 40-degree outside temperatures year-round (same for zones in the southwestern US - I mean 105-degree heat, because freedom units!). London didn't historically experience this kind of heat though, only briefly peaking around 37 degrees.


Who the heck measures electronics/ambient temps in freedom units? All the datasheets are exclusively in C. (40C is 104F, for the record.)


I'm curious whether all of those companies migrating to "the cloud" are thinking about these issues. It will only become more impactful, especially with power outages on the horizon.

The company I work at now completely does not care, at least judging by the response when I raised those concerns to management.


Unfortunate I guess, but I still remember having to pull an overnighter at work because there was a leak in the server room :)


Cooling seems to be one of the first things to creak in hot weather. Our fridges at home were struggling, and some of the local supermarkets had fridges out.

Maybe that's an obvious observation, but I would have expected them to have a little more operating margin right at the point you need them.


Ex HVAC tech here. Also was on a commissioning team for a Google DC in SE Asia.

Heat waves indeed kill equipment. The system is attempting to deal with a higher heat load on the home/conditioned space, while also having to operate with high(er) ambient temperatures.

High ambient temps -> High heat load -> warm(er) return refrigerant temps/high(er) refrigerant pressures -> less cooling for compressor / higher mechanical load on compressor -> Elevated power consumption as compressor is working harder -> higher power/heat levels stress electrical insulation & components.

The system struggles to keep up as the duty cycle is elevated as well. Putting a sprinkler next to the condenser (outside) unit is a hack.


> Putting a sprinkler next to the condensor (outside) unit is a hack.

However, water companies have been telling people to reduce water usage during the heat wave since demand is higher than usual, so this may not be a great idea.


Worth noting that most home fridges have no kind of indication that they can't keep up.

Your fridge, rather than being the 5C it should be to keep your meat safe to eat, might have been up at 12C. You wouldn't be aware (it still feels cold), but you'd end up eating possibly dangerous food.

I really wish fridges had an alarm in that case (i.e. the fridge has an indicator saying 'too hot, food is now unsafe to eat').


That alarm is available on the more decent models.

There are also stickers for inside your fridge that can indicate the temperature. There is also a variant for specific temperatures, like 0, 5 or 7 degrees, that changes colour if the temperature has risen above that point, giving a (non-reversible) indication your fridge has been too warm.

Edit:

Did a quick search for you: https://www.tiptemp.com/Products/Rising-Time-Temperature-Ind...


You can self solve this with a fridge thermometer, e.g. https://www.amazon.co.uk/Thermometer-Refrigerator-INRIGOROUS... or if you’re into that smart home biz, I also have an Aqara zigbee temperature and humidity sensor for automated alarms.

EDIT: sibling comments about irreversible temperature monitors are also a great idea I hadn’t thought of! Time to buy some of those too


Ours was making a strange noise, and we noticed things were wet when we took them out, which indicated it was struggling to keep temperature. Agree some kind of alarm or indicator is probably a good idea if it's not keeping up!


Wet things inside is actually an indication that the door isn't properly closed and sealing.


From a business point of view, if you're looking to move your infrastructure to the cloud, should you now be factoring in global warming?

If so, this will only play into the hands of those setting up data centres in the far north of the northern hemisphere (e.g. Iceland).


I don't think this really has anything to do with the cloud.

On-premise hardware still needs cooling, and is arguably harder to cool as there are fewer economies of scale on cooling infrastructure. Dedicated "bare-metal" machines are just in regular data centres so no difference to the cloud there.

I think data centre locations will still be chosen on two factors: distance to customers, and cost of energy. It's just that operators will be looking for cheap energy. Iceland is good because they have a lot of geothermal energy, not because it's cold.


This reads like:

"Oh, tanks by that foreign army are rolling into our cities - should we start thinking about a defense system?"

Global warming will affect every aspect not only of your business, but also of your life, maybe a little bit later when you are rich and can afford to live in a self-created bubble, but it will.

Yes, you should think about it. Hard. Now.


Very much - YES.


GCP sure does seem to have a lot of outages that span across multiple availability zones (and occasionally across multiple regions). It sure does seem like there is a disconnect between the expectations of isolation that they set and what they are able to deliver.

It's also interesting that the status page's attempt to spin the scope of impact actually makes it look worse, given that it was effectively a full-region outage (they said, "There is a cooling related failure in one of our buildings that hosts a portion of capacity for zone europe-west2-a for region europe-west2 ...").


The number of outages happening on GCP is mind-boggling. I don't know how people trust their business to Google. It's an ad company, not an infrastructure company. If you trust an ad company with your business, I guess that's on you.


Great point, let's trust a retail company instead.


Rest assured, if the agenda ever u-turns into "global cooling", google will follow with "heating related failure" reports.


What?



