Hackers Read It

pseudolus

19 hr. ago

Trillions spent and big software projects are still failing

https://spectrum.ieee.org/it-management-software-failures

406

353

rossdavidh

12 hr. ago

It's a great article, until the end where they say what the solution would be. I'm afraid that the solution is: build something small, and use it in production before you add more features. If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level. There is no software development process which reliably produces software that works at scale without doing it small, and medium sized, first, and fixing what goes wrong before you go big.
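A minimal sketch of the staged rollout described above, assuming a simple gate: stay on the current stage while any known bugs remain open, and move up one size only when none do. The stage names come from the comment; `STAGES`, `next_stage`, and the `open_bugs` count are hypothetical names used only for illustration.

```python
from typing import Optional

# Rollout stages in the order given in the comment, smallest first.
STAGES = [
    "small town",
    "larger town",
    "small city",
    "large city",
    "province",
    "national",
]


def next_stage(current: str, open_bugs: int) -> Optional[str]:
    """Return the stage to run next.

    Stay on the current stage while bugs are still open; otherwise advance
    one step. Returns None once the final stage is running cleanly.
    """
    if open_bugs < 0:
        raise ValueError("open_bugs must be non-negative")
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if open_bugs > 0:
        return current
    if index + 1 < len(STAGES):
        return STAGES[index + 1]
    return None


print(next_stage("small town", open_bugs=3))  # still "small town"
print(next_stage("small town", open_bugs=0))  # moves to "larger town"
```

The point of the gate is that no stage can be skipped: the only way to reach "national" is to have run cleanly at every smaller size first.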

shagie

10 hr. ago

> If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level.

At a large box retail chain (15 states, ~300 stores) I worked on a project to replace the POS system.

The original plan had us getting everything working (Ha!) and then deploying it out to stores and then ending up with the two oddball "stores". The company cafeteria and surplus store were technically stores in that they had all the same setup and processes but were odd.

When the team that I was on was brought into this project, we flipped that around and first deployed to those two several months ahead of the schedule to deploy to the regular stores.

In particular, the surplus store had a few dozen transactions a day. If anything broke, you could do reconciliation by hand. The cafeteria had single register transaction volume that surpassed a surplus store on most any other day. Furthermore, all of its transactions were payroll deductions (swipe your badge rather than credit card or cash). This meant that if anything went wrong there we weren't in trouble with PCI and could debit and credit accounts.

Ultimately, we made our deadline to get things out to stores. We did have one nasty bug that showed up in late October (or was it early November?) with repackaging counts, which interacted with a BOGO sale. (Repackaging: if a box of 6 was $24 and a single item was $4.50, then buying 6 single items was "repackaged" to cost $24 rather than $27.) That bug resulted in absurd receipts with sales and discounts: the receipt showed you spent $10,000 but were discounted $9,976, and then the GMs got alerts that the store was not able to make payroll because of a $9,976 discount. One of the devs pulled an all-nighter to fix that one and it got pushed to the stores.
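A minimal sketch of the repackaging rule from the example above, using the prices given there ($24 per box of 6, $4.50 per single). This is an illustration only, not the actual POS code: the function and constant names are hypothetical, and the BOGO interaction that caused the bug is not modeled. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Hypothetical illustration of the repackaging rule; not the real POS code.
BOX_SIZE = 6               # single items per box
BOX_PRICE_CENTS = 2400     # $24.00 for a box of 6
SINGLE_PRICE_CENTS = 450   # $4.50 for one single item


def naive_total_cents(singles: int) -> int:
    """Price every item at the single-item price, with no repackaging."""
    if singles < 0:
        raise ValueError("singles must be non-negative")
    return singles * SINGLE_PRICE_CENTS


def repackaged_total_cents(singles: int) -> int:
    """Charge the box price for each full set of 6 singles, single price for the rest."""
    if singles < 0:
        raise ValueError("singles must be non-negative")
    boxes, leftover = divmod(singles, BOX_SIZE)
    return boxes * BOX_PRICE_CENTS + leftover * SINGLE_PRICE_CENTS


print(naive_total_cents(6))       # 2700, i.e. $27.00
print(repackaged_total_cents(6))  # 2400, i.e. $24.00
print(repackaged_total_cents(8))  # 3300: one box plus two singles
```

The difference between the two totals is the repackaging discount ($3.00 for six singles), which is the kind of line item that a second promotion such as BOGO then has to combine with correctly.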

I shudder to think about what would have happened if we had tried to push the POS system out to customer-facing stores before the performance issues in the cafeteria were worked out, or if we had to reconcile transactions to hunt down incorrect tax calculations.

solatic

11 hr. ago

That's what works for products, not software systems. Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts. Eventually you want to rewrite to deal with all the technical debt, but nobody has enough confidence to say what is in the codebase that's important to Product and what isn't, so everybody is afraid and frozen.

Scale is separately a Product and Engineering question. You are correct that you cannot scale a Product to delight many users without it first delighting a small group of users. But there are plenty of scaled Engineering systems that were designed from the beginning to reach massive scale. WhatsApp is probably the canonical example of something that was a rather simple Product with very highly scaled Engineering and it's how they were able to grow so much with such a small team.

Jean-Papoulos

26 min. ago

>Global IT spending has more than tripled in constant 2025 dollars since 2005, from US $1.7 trillion to $5.6 trillion, and continues to rise. Despite additional spending, software success rates have not markedly improved in the past two decades.

Okay, but how much more software is used? If IT spending has tripled since 2005 but we use 10x more software, I'd say the trend is good.

jillesvangurp

2 hr. ago

Most of the examples here are big government IT projects. But it's unfair to single out software projects here. There are a lot of big government projects that fail or face long and expensive delays. A lot of public sector spending is like that. In fact, you'd be hard pressed to find examples where everything worked on time and on budget.

Mostly the issues are non technical and grounded in a lack of accountability and being too big to fail. A lot of these failures are failing top down. Unrealistic expectations, hand wavy leadership, and then that gets translated into action. Once these big projects get going and are burning big budgets and it's obvious that they aren't working, people get very creative at finding ways to tap into these budgets.

Here in Germany, the airport in Berlin was opened only a few years ago after being stuck in limbo a decade after it was supposed to open and the opening was cancelled only 2 weeks before it was supposed to happen. It was hilarious, they had signs all over town announcing how they were going to shut down the highway so the interior of the old airport could be transported to the new one. I kid you not. They were going to move all the check-in counters and other stuff over and then bang on it for a day or two and then open the airport. Politicians, project leadership, etc. kept insisting it was all fine right up until the moment they could not possibly ignore the fact that there was lots wrong with the airport and that it wasn't going to open. It then took a decade to fix all that. There's a railway station in Stuttgart that is at this point very late in opening. Nuclear plant projects tend to be very late and over budget too.

Government IT projects aren't that different than these. It's a very similar dynamic. Big budgets, decision making is highly political, a lack of accountability, lots of top down pretending it's going to be fine, big budgets and companies looking to tap into those, and a lot of wishful thinking. These are all common ingredients in big project failures.

The software methodology is the least of the challenges these projects face.

mcny

1 hr. ago

It is not just government. Private companies also have the same problem.

One reason why AWS got so big is that it took months to get infrastructure to provision a virtual machine.

BirAdam

15 hr. ago

I study and write quite a bit of tech history. IMHO from what I've learned over the last few years of this hobby, the primary issue is quite simple. While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning. Typically, software folks build new and every generation of software developers must relearn the same problems.

malfist

14 hr. ago

I work at $FANG, and every one of our org's big projects goes off the rails at the end of the project; there's always a mad rush at the end to push developers to solve all the failures of project management in their off hours before the arbitrary deadline arrives.

After every single project, the org comes together to do a retrospective and ask "What can devs do differently next time to keep this from happening again". People leading the project take no action items, management doesn't hold themselves accountable at all, nor product for late changing requirements. And so, the cycle repeats next time.

I led an effort one time, after a big bug made it to production following one of those crunches, that painted a picture of the root cause: a huge, complicated project was handed off to offshore junior devs with no supervision, and then the junior devs managing it were completely switched twice in the 8-month project with no handover and no introspection by leadership. My manager's manager killed the document and wouldn't allow publication until I removed any action items that would constrain management.

And thus, the cycle continues to repeat, balanced on the backs of developers.

bane

14 hr. ago

I've also considered a side-effect of this. Each generation of software engineers learns to operate on top of the stack of tech that came before them. This becomes their new operating floor. The generations before, when faced with a problem, would have generally achieved a solution "lower" down in the stack (or at their present baseline). But the generations today and in the future will seek to solve the problems they face on top of that base floor because they simply don't understand it.

This leads to higher and higher towers of abstraction that eat up resources while providing little more functionality than if it was solved lower down. This has been further enabled by a long history of rapidly increasing compute capability and vastly increasing memory and storage sizes. Because they are only interacting with these older parts of their systems at the interface level they often don't know that problems were solved years prior, or are capable of being solved efficiently.

I'm starting to see ideas that will probably form into entire pieces of software "written" on top of AI models as the new floor, where the model basically handles all of the mainline computation, control flow, and business logic. What would have required a dozen MHz and 4MB of RAM to run now requires TFLOPS and gigabytes -- and being built from a fresh start again will fail to learn from any of the lessons learned when it was done 30 years ago and 30 layers down.

neilv

13 hr. ago

On some of the infamous large public IT project failures, you just have to look at who gets the contract, how they work, and what their incentives are. (For example, don't hire management consulting partner smooth talkers, and their fleet of low-skilled seat-warmers, to do performative hours billing.)

It's also hard when the team actually cares, but there are skills you can learn. Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).

But once you're a bit comfortable with the art and science of those, big new challenges are more about political and environment reality. It comes down to alignment and competence of: workers, internal team leadership, partners/vendors, customers, and investors/execs.

Discussing this is a little awkward, but maybe start with alignment, since most of the competence challenges are rooted in mis-alignments: never developing nor selecting for the skills that alignment would require.

cheesecompiler

9 hr. ago

Right, it's largely politically and ego driven; a people problem, not a software problem.

JBlue42

11 hr. ago

> Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).

Was there any literature or other findings that you came across that ended up clicking and working for you that you can recommend to us?

jgeada

4 hr. ago

Fundamentally this is not a statement about programming or software. It is a statement that management at almost all companies is abysmally inept and is hardly ever held to account.

Most sizeable software projects require understanding, in detail, what is needed by the business, what is essential and what is not, and whether any of that is changing over the lifetime of the project. I don't think I've ever been on a project where any of that was known; it was all guesswork.

scuff3d

3 hr. ago

Management is always a huge problem, but software engineers left to their own devices can be just as bad.

I very rarely hear actual technical reasons for why a decision was made. They're almost always invented after the fact to retroactively justify some tool or design pattern the developer wanted to use. Capabilities and features get tacked on just because it's something someone wanted to do, not because they solve an actual problem or can be traced back to requirements in any meaningful way.

Frankly as an industry we could learn a lot from other engineering fields, aerospace and electrical engineering in particular. They aren't perfect, but in general they're much better at keeping technical decisions tied to requirements. Their processes tend to be too slow for our industry of course, but that doesn't mean there aren't lessons to be learned.

ChrisMarshallNY

13 hr. ago

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not.

I guess that’s the real problem I have with SV’s endemic ageism.

I was personally offended, when I encountered it, myself, but that’s long past.

I just find it offensive, that experience is ignored, or even shunned.

I started in hardware, and we all had a reverence for our legacy. It did not prevent us from pursuing new/shiny, but we never ignored the lessons of the past.

pork98

13 hr. ago

Why do you find it offensive? It’s not personal. Someone who thought Webvan was a great lesson in hubris could not have built an Instacart, right? Even evolution shuns experience, all but throwing most of it out each generation, with a scant few species as exceptions.

0xbadcafebee

14 hr. ago

Software projects fail because humans fail. Humans are the drivers of everything in our world. All government, business, culture, etc... it's all just humans. You can have a perfect "process" or "tool" to do a thing, but if the human using it sucks, the result will suck. This means that the people involved are what determines if the thing will succeed or fail. So you have to have the best people, with the best motivations, to have a chance for success.

The only thing that seems to change this is consequences. Take a random person and just ask them to do something, and whether they do it or not is just based on what they personally want. But when there's a law that tells them to do it, and enforcement of consequences if they don't, suddenly that random person is doing what they're supposed to. A motivation to do the right thing. It's still not a guarantee, but more often than not they'll work to avoid the consequences.

Therefore if you want software projects to stop failing, create laws that enforce doing the things in the project to ensure it succeeds. Create consequences big enough that people will actually do what's necessary. Like a law, that says how to build a thing to ensure it works, and how to test it, and then an independent inspection to ensure it was done right. Do that throughout the process, and impose some kind of consequence if those things aren't done. (the more responsibility, the bigger the consequence, so there's motivation commensurate with impact)

That's how we manage other large-scale physical projects. Of course those aren't guaranteed to work; large-scale public works projects often go over-budget and over-time. But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process to encourage humans to do the right thing.

SchemaLoad

6 hr. ago

> But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process

If there was sufficient consequence for this stuff, no one would ever take on any risk. No large works would ever even be started because it would be either impossible or incredibly difficult to be completely sure everything will go to plan.

So instead we take a medium amount of caution and take on projects knowing it's possible for them to not work out or to go over budget.

farrelle25

11 hr. ago

> Software projects fail because humans fail. Humans are the drivers of everything in our world.

Ah finally - I've had to scroll halfway down to find a key reason big software projects fail.

<rant>

I started programming in 1990 with PL/1 on IBM mainframes and for 35 years have dipped in and out of the software world. Every project I've seen fail was mainly down to people - egos, clashes, laziness, disinterest, inability to interact with end users, rudeness, lack of motivation, toxic team culture etc etc. It was rarely (never?) a major technical hurdle that scuppered a project. It was people and personalities, clashes and confusion.

</rant>

Of course the converse is also true - big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect. Of course, most of these projects were bland corporate business data ones... so not technically very challenging. But still big enough software projects.

Geez... don't know why I'm getting so emotional (!) But the hard-core software engineering world is all about people at the end of the day.

dockd

8 hr. ago

If it makes anyone feel better, it's not just software:

https://en.wikipedia.org/wiki/Auburn_Dam

https://en.wikipedia.org/wiki/Columbia_River_Crossing

If you're 97% over budget, are you successful? https://en.wikipedia.org/wiki/Big_Dig

mpyne

7 hr. ago

> If you're 97% over budget, are you successful?

I don't like this as a metric of success, because who came up with the budget in the first place?

If they did a good job and you're still 97% over then sure, not successful.

But if the initial budget was a dream with no basis in reality then 97% over budget may simply have been "the cost of doing business".

It's easier to say what the budget could be when you're doing something that has already been done a dozen times (as skyscraper construction used to be for New York City). It's harder when the effort is novel, as is often the case for software projects since even "do an ERP project for this organization" can be wildly different in terms of requirements and constraints.

That's why the other comment about big projects ideally being evolutions of small projects is so important. It's nearly impossible to accurately forecast a budget for something where even the basic user needs aren't yet understood, so the best way to bound the amount of budget/cost mismatch is to bound the size of the initial effort.