In September 2023, I knew I was in for a wild ride when I won this contract. I had already coached other developers in a commercial setting, and I understood what was asked of me. My goal was to get two existing developers comfortable with Haskell while at the same time increasing productivity.
In my first week I started by understanding the code base[^1] and what they're building. They help precast factories find the walls they produce, called units.[^2] This project puts Bluetooth tags on precast walls so people don't lose them at the factory. The tags emit an active signal, which makes the walls easy to find. The Haskell web app receives all that data and makes a nice interface out of it. The interface provides a text box to search for a unit[^3], which can then be rendered on a map.
Ironically enough, my first contributions to this organization weren't about Haskell or teaching at all. There was an "escalation" on the Monday I started, and they managed to react three days after the event(!). Apparently they barely had any monitoring in place. In one of the first meetings I brought up how slow that reaction was. This sent shock waves through the entire organization, and they re-prioritized monitoring for all other projects as well.[^4] Not surprisingly, many of my initial Haskell contributions were just about improving the engineering side: for example, using annotated exceptions, which gave us stack traces, or making Katip[^5] and logz work together, which gave us production logs and monitoring.
When I joined there was only a single factory actually using this system, and the project was filled with bugs. My first impression was that the engineers were intensely focused on architecture; they feared losing productivity due to having "too much code". However, this came at the cost of getting stuff working. Since this is a mostly TypeScript organization that had just adopted Haskell for this project as an experiment, it makes sense to be afraid of code bloat. Another issue was that the people who knew Haskell preferred using the most advanced features available rather than focusing on the basics. For example, they would teach how to use lens instead of simple record updates. This made it feel as if there was an endless mountain of stuff to learn. I quickly proposed focusing on simple Haskell. This would flatten the learning curve and allow people to focus on building features, so they'd feel productive and could explore the remainder of the language at their own pace. Furthermore, the relational aspects of Postgres were ignored in large parts of the system. For example, a load of units, which in reality is a wagon of precast walls, was represented as a JSON blob.[^6] There was no load table or load_unit_connection table, just a blob. Ironically enough, this required additional code to recover the relational aspects. Culturally the focus was "architecture" rather than making something work. Over-engineering things that don't matter was the name of the game. This isn't unique to this project; at other companies I've seen similar discussions, essentially trying to achieve perfect code. What was special is that they were still obsessed with this despite how buggy and slow development was.
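To make the "simple Haskell" point about record updates concrete, here is a minimal sketch. The `Unit` record and its field names are hypothetical, invented for illustration; they are not taken from the project's code.

```haskell
-- Hypothetical record, for illustration only; the field names are assumptions.
data Unit = Unit
  { unitName :: String
  , unitSite :: String
  , unitOnMap :: Bool
  } deriving (Show, Eq)

-- Record update syntax copies the record and replaces the named field.
moveToSite :: String -> Unit -> Unit
moveToSite site unit = unit { unitSite = site }

-- Several fields can be replaced in a single update.
hideAt :: String -> Unit -> Unit
hideAt site unit = unit { unitSite = site, unitOnMap = False }
```

With lens the same update would go through a generated lens and an operator such as `.~`. That pays off for deeply nested data, but it is one more library to learn before a newcomer can change a field.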
I learned from the previous contractor that at some point some engineers had actually revolted against the use of Haskell! There were more people doing this, but they all transferred themselves out of this project. How ironic is that?[^7] Anyway, this gave me a dreadful premonition. It felt like everything was still going awfully and things were getting worse!
In the second week I discovered that the team lead was going to quit. He was one of the two people I was supposed to teach Haskell, and I never got the opportunity to do so: he'd be gone in a month. He was seething against Haskell. He had only been using HLS[^8] for compiler feedback, and it's unreliable.[^9] Furthermore, he claimed Haskell had caused a huge amount of "mental overhead".[^10] But after talking to him I learned his underlying frustration was with upper management ignoring his recommendations. Ironically enough, despite our differences in "ideology"[^11], I liked this guy. He wasn't bad at his job[^12], but dear lord, he was pissed!
After two weeks we can safely say this project was literally falling apart, supposedly all because of "Haskell". Which is frankly absurd. They made it hard with an over-engineered architecture and by using unreliable tools such as HLS. Furthermore, they insisted on using every feature under the sun, creating an endless learning curve.[^13] Perfect was the enemy of the good. I understood the tech lead's perception: HLS is not good for commercial development. If you have to figure out compile errors within your unreliable editor and also deal with an over-engineered architecture, it's going to be taxing. I also think he never learned how to trust the compiler to guide changes, or how to read type signatures. All of the basics were skipped in favor of writing pristine code from the get-go. I realized that at this point I had a choice: keep going with this project that apparently was falling apart, or jump ship. I was still in conversations with other companies at this time, so jumping ship would have been easy. I decided to keep going. Things were going badly, but I felt I could manage this. I knew this project would be challenging from the start, but I also liked the colleagues on the project. I felt I could rely on them, and we could do it if we just got shit done.
Anyway, I was asked to take over as tech lead, and I agreed. I don't mind, I guess. My leadership would involve getting everyone to be independent anyway.[^14] I think it was during my third and fourth week that I started teaching Haskell. Unfortunately the colleague I had to teach had essentially no experience. Fortunately she was smart, exceptionally smart. So we paired on some tasks, and her issues were mostly syntax related. We managed to get this small task over the line after two weeks of one-hour pairing sessions per day. In the final couple of days she mostly worked on her own. However, at that point productivity had tanked so much that my colleague had to go back to doing frontend stuff and I had to take over the backend work. This was done to ensure that we met our customer's deadlines.
Then the actual lead quit, and it was strange: all of a sudden I had to do a bunch of management work as well, which I postponed as much as possible in favor of getting the features out.[^15] We managed to do so, and everyone was sort of amazed at how much we got done in a short amount of time despite being a man short. The previous lead had essentially been avoiding work on the backend and making excuses not to do work in that final month. He said he was documenting.[^16]
There were still many issues with this project,[^17] especially in hindsight. But we were tightly under our only customer's "yoke": they decided what we would be building, or they would drop us. The product manager's (PM) role at this point was to do customer management.[^18] The project was slow to load, the search barely worked, the units weren't always found, and the lift system didn't work either. The customer wanted us to fix these frontend issues first: the slow loading, the random reloads, and the search, for example. It was just annoying for them to use. To be fair, the system would have been difficult to sell with a frontend that buggy anyway.
Solving those involved me digging through the TypeScript code and undoing most of the madness. They were using TypeScript as a relational database, so I deleted all that and let the actual database be a relational database. This gave the developer I was teaching Haskell time to learn on the backend while making features. The goal was to give her confidence that she was creating commercial code in this language. I wanted to move in a direction where she could create entire features, front and backend, because this is how I've learned to implement things. Doing so removes communication barriers and would help her think "bigger". The goal was to get her to be independent.
It took a month or two to repair all these issues. No longer would the frontend do joins, nor would there be any JSON blobs in the database. Furthermore, I introduced a migration system so database operations became easy. The website became a lot more performant, almost magically,[^19] and could deal with larger amounts of data. After that we moved on to actually building some features, the lifts dashboard for example. I managed expectations around rolling out features fast: we had to do integration testing, by which I mean putting it on staging and only accepting something as done if it all worked together. Unlike a normal web app, we also have sensors and a pipeline of continuously flowing data. Because this is a complicated system, you find many small issues, and a single small bug can break the entire system.
I think the end of the beginning was the customer visit in January 2024, where I got to witness the problem at hand with my own eyes. I got to see the "yard" of gray concrete and how large these factories are. They're large. We also demoed some features to them, and they seemed to be less annoyed with us. I wouldn't call it impressed, considering how slow development had been before I joined. But they were seeing movement, and this led to a renewal. We were free of their "yoke".
By the skin of our teeth
Around January or February we started hearing signals from upper management that we really had to start selling this product to other customers, OR ELSE. The commercial department had sort of given up completely on trying to sell the project at this point. Honestly, the way they did it was kind of insane for 50k+ deals. They just sent the hardware to a factory and told them to figure it out. You can maybe do this with a mature, known product. But this was a barely functioning prototype from an unknown company, and it involved a large amount of money. Of course prospects were going to expect "white-glove" treatment, and of course they rejected this. Prospects would take the "sensor" and a Bluetooth tag into the yard and see if it appeared on the map. It didn't, because the main algorithm took like 30 minutes to figure out the locations of these units.[^20] Now, our main customer was happy with that, because they were already bought in. However, if you want to sell this locating system, this is the first bullshit check a prospect does. I think one of the OKRs[^21] at this time was for customers to set up the project without any assistance from the internal team. In essence, to make it a do-it-yourself experience for the customer. As a goal this was batshit insane. Not only were the deals too large for this; the tech was far from polished enough to be used like this. Furthermore, these would be recurring deals, so cheaping out on some manpower for a one-time setup cost is kinda crazy. I brought this to the attention of the product manager, and he agreed.[^22]
Because we were sort of "free" to decide what to work on, I strongly argued in favor of making sales easier. Another candidate topic was, for example, adding admin pages; we were currently just modifying stuff directly in the database. We decided to go all in on modifying the technology so that we could pass technical trials and sell to customers. Whatever it took. There were two major issues:
- Customers had a poor or no mental model of how the (hardware) technology worked, in that they didn't understand Bluetooth[^23] signals.
- Prospects would take a sensor and tag into the yard and not get a ping on the map. The system had to become "live".
We "tackled" these problems with something called "trial" mode. The designer made a ton of screens explaining all this stuff, with an automated way of guiding a new user through all of it.[^24][^25] One nice thing is that I could let the other engineer implement most of the frontend and backend features supporting that, while I focused on actually making the signals available at all. Keep in mind we did not just have a "there is a database" backend called the server; we also had the "figure out what a signal is" backend called reality capture. I worked on that reality capture part. This gave the other engineer the opportunity to grow more and come up with her own designs, which we could work with.[^26] We finished most of this around May.
I think around May 2024 the money had run out for this project. So the other engineer got pulled, which was annoying because she had just gotten productive enough to do small full-stack features on her own. Another quarter or two and she'd have been amazing. The designer's contract just wasn't renewed at all.[^27] Despite the cash burn being lowered, this project was far from saved; we were still scraping by. The threats from upper management continued, and sales kept on fumbling.[^28] We knew the Particle hardware we were using wasn't performing well. But obviously we'd get no hardware resources. It was just me and the algorithms engineer left doing technical work, and the product manager to deal with customers and sales.
One prospect we had been talking to for years at this point was located in Denmark. We knew the hardware wasn't performing well from trying to get the system to work for them. They told us, "show it works and we'll use it". It didn't work. For example, they had sensors in their little trucks driving by these walls with Bluetooth tags on them, and the sensors would pick up no signals at all. I think the product manager had visited this place five times already by that point. Every time, the algorithm engineer and the PM would have a new tactic to "crack" this yard. He went again around May, and failed. As we know now, there were serious fundamental issues with the firmware, so I don't think trilateration[^29] could've worked in any case without firmware upgrades.
The sensors worked well enough to give you a live proximity-based location. This would attach the sensor's GPS location to a Bluetooth tag at extremely close proximity, say less than a meter. We could simply measure the signal strength required for that, and it was reliable. The backend was essentially a loop, constantly querying the timestream[^30] database for updates in small, incrementing time windows. It'd attach the signals to gateway locations if the signal strength was high enough and the timestamps were close enough. This one feature we tested the shit out of; it was our ace in the hole.
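The matching rule described above (strong enough signal, close enough timestamps) can be sketched as a pure function. This is a minimal illustration, not the project's implementation: the record shapes, the `minRssi` and `maxGapSeconds` thresholds, and all names are assumptions, and the querying loop over time windows is left out.

```haskell
import Data.List (minimumBy)
import Data.Maybe (mapMaybe)
import Data.Ord (comparing)

-- Hypothetical shapes; the real schema is not described in the text.
data TagSignal = TagSignal
  { tagId :: String
  , sensorId :: String
  , rssi :: Int     -- signal strength in dBm; closer to 0 is stronger
  , heardAt :: Int  -- seconds since some epoch
  } deriving (Show, Eq)

data SensorFix = SensorFix
  { fixSensorId :: String
  , fixAt :: Int
  , latLon :: (Double, Double)
  } deriving (Show, Eq)

-- Assumed thresholds; in practice these would be measured, not guessed.
minRssi :: Int
minRssi = -50

maxGapSeconds :: Int
maxGapSeconds = 5

-- Give a tag the GPS fix of the sensor that heard it, but only when the
-- signal is strong and a fix from that sensor exists close enough in time.
locate :: [SensorFix] -> TagSignal -> Maybe (String, (Double, Double))
locate fixes sig
  | rssi sig < minRssi = Nothing
  | otherwise =
      case [f | f <- fixes, fixSensorId f == sensorId sig, gap f <= maxGapSeconds] of
        [] -> Nothing
        candidates -> Just (tagId sig, latLon (minimumBy (comparing gap) candidates))
  where
    gap f = abs (fixAt f - heardAt sig)

-- Apply the rule to one window of signals, dropping the ones that do not match.
locateAll :: [SensorFix] -> [TagSignal] -> [(String, (Double, Double))]
locateAll fixes = mapMaybe (locate fixes)
```

Keeping the rule pure makes it cheap to test heavily: the loop only has to fetch a window of signals and fixes and hand them to `locateAll`.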
Something else happened around May: the head of software decided that this project could benefit from an app, and he decided to make a prototype. The QR code scanning in the app was much better than whatever we could provide on the web. And the app had Bluetooth capabilities, which browsers to this day don't reliably offer. So I saw some good growth potential in this and was so relieved he took the initiative. Now, Flutter is far from perfect, and Dart isn't a great language to work in.[^31] But there was no way I could also maintain an app on my own on top of dealing with the glaring problems this system had. So I didn't look a gift horse in the mouth and thanked the head of software for his contributions.
We had scheduled the check in Denmark for the beginning of July. The person managing our meeting, let's call him Bob, was quite a character. The product manager told him to arrange whatever for lunch, but I asked for some local Danish specialties, and he seemed quite happy to do that. Bob was completely uninterested in this project; he just wanted to get it done. His organization's upper management had apparently told him to do these checks, and he was going to, with minimal effort. But Bob was delighted to share Danish culture and cuisine, which I remember fondly. The check involved production people attaching a bunch of beacons to various units. We'd then use the system to find them. Of course the system failed. But not just because of the bad firmware. It was also because the way this factory was managed was extremely strange and inefficient. They'd move their trailers over to one part of the yard. Perform QA. Send them to another part of the yard to reshuffle.[^32] Then send them to whatever storage, then go back for reshuffling, etc. There was no rhyme or reason to any of this. Mind you, these are walls they were carrying. These are heavy loads, moved with heavy machinery, but nobody had any clue how they were using these tools. We did not know this either, but we had scheduled several days to prepare for this check, and at some point I started asking the PM about what was going on, which made him think about it.[^33] Several times the system located these units in the middle of the road. They probably had been in the middle of the road for a while, but we had no sensors in those reshuffle or QA areas, so our system wouldn't update any further even if they got moved there. We had a real-life phantom read. Honestly, we were set up to fail from the start. But no one knew how unfair this tech check was, not even at the factory.
Here I had a little epiphany: all that talk about CO₂ reduction for the other projects may in fact also be true for this project. Upper management, the same people threatening to cancel us all the time, deeply cared about carbon reduction. Nobody realized this because nobody had spent hours upon hours in the yard trying to get a tracking solution to work. Nobody had ever put any extra thought into the yard. So a project I initially joined for the challenge and for using Haskell, and stuck with because I liked the people, may in fact also be helping the environment?[^34]
Now, the fact the system failed was a huge issue for us. We had some time, however; we had planned the entire week for this check. If we failed this check it would be a huge blow against us. In fact, on the road to this check the PM was already casually asking me if I thought he'd be a good PM for other software companies. I listened to what was not being said. I think he's awesome, but we were fucking cooked. However, we had our ace up our sleeve: the proximity algorithm.
On judgment day, before Bob did the check, the PM told me to grab a sensor and walk by all the units; he'd do the same from the other side of the yard. The yard is about 300 by 250 m, which is around eight football fields. It started raining heavily. Find a tag, press the sensor to it, wait a second, then move on.[^35] Rinse and repeat while getting soaked. Before the check I went back into the office to make sure everything had updated. Unfortunately some units had updated to the wrong location, because the trilateration algorithm sent updates that overrode proximity. Here the algorithm engineer, the PM and I had a healthy debate. The algorithm engineer wanted a fair check; the PM and I knew there was nothing fair here. The PM and I wanted to force-update the wrongly located units in the database. We ended up doing that by vote.[^36]
Bob had already arrived, but the PM asked for a moment. The PM told me to also put one of these sensors in my bag; we'd just walk with them on our person while doing the check. I was shivering from being soaked. I guess Bob didn't put two and two together, as he didn't care.[^37] With the first unit Bob tried to fail us on a technicality, but the PM pushed back hard against that. The sun had come out, and these checks became a pleasant walk. Bob started cracking jokes and smoking cigarettes. The next ones all passed (of course). In fact, when Bob checked, he would at times see the system update in front of his eyes and just accept it as normal, not knowing we had sensors on our person.[^38] Bob believed it worked. Some rough spots were blamed on the newly developed app, but overall he was happy. We passed the check by the skin of our teeth.
This was huge. We had been scraping by and losing members for the past half year. But we managed to sell. No longer could upper management blame the technical workings of our system for sales failures. Annoyingly this particular deal fell through anyway, but not because of the technical capabilities of the system. I think this did buy us a chunk of time, however, to actually address the underlying issues. During this check we had spotted many other issues with the system. Fun things such as DDoSing ourselves by having a bag of tags next to the sensor (which caused a bit of panic the day before judgment day). I addressed all these issues.
Winning
We also started another trial for a potential new large customer, similar to the one in Denmark. This time it would not take seven trips from the PM. We had the proximity algorithm to convince them that we could find the locations of units. One of the sensors actually failed during this check; fortunately they had a backup, and aside from that we passed this tech trial on the first go. We confirmed here that if you swiftly convince people it works, they'll be a lot less thorough in their technical checks. This trial, in September 2024, was I believe still on the old crummy firmware. We passed so "easily" because we had failed so hard and so often in Denmark, and learned. Furthermore, the algorithm engineer did her magic and convinced them the system could be used for lean manufacturing. Unfortunately the customer had no experts on that at the check, so they didn't realize what they were sitting on. But they were convinced the system worked and was something that they wanted. This came back to us in January with some good news, once they finally found someone to judge that, and he was enthusiastic.
Then the next major issue started popping up for our existing customer. Around July the Bluetooth tag batteries started dying. We thought we had solved this by just sending new beacons. The new ones had better settings, which were supposed to make them last for four years. Our existing customer also found the algorithm wouldn't work on another part of the yard; we worked around this by doing those manual forced updates with proximity. They were actually happy to take sensors into the yard. Around September I started working on the firmware, which is described elsewhere, so I won't go into it. This made proximity significantly better and temporarily killed the other algorithms, because the excess of data caused them to run out of memory.
Around October we won an award for smart data collection in construction; we were nominated by our one and only customer. Furthermore, around Christmas we got good feedback from them. They had managed to lose around 30 units in SAP and managed to find them all with our system! I knew that at the time only my proximity algorithm was operational, because the new firmware had killed the trilateration algorithm. So I was a little confused by this. Furthermore, our own audits showed that accuracy still needed serious work. I think what happened was that one of their employees consistently took a sensor into the yard. I'm not really sure, but if they're happy, I'm happy.
Even after winning the award and having this major customer in the pipeline, we still got more threats from upper management. We also got more curveballs thrown at us. For example, the temperature product required my services for two weeks. I was the only software engineer trying to make this logistics stuff work; they had seven people working on their temperature tech, and they couldn't do this one algorithm integration in time. I was supposed to help them for two weeks. I finished the task in two days. After which they started inventing other tasks, and I said "no thanks". Other people from upper management, who had been ignoring this project for years at this point, all of a sudden wanted to get involved. I think people were noticing this project was starting to win. In their helpfulness they, for example, blocked architecture changes until detailed documentation was made. I've no idea how to deal with these requests. Spending two days on documenting while one of the main algorithms is down seems irresponsible. I just stopped doing architecture changes; they were mostly cost saving anyway, and optional for now.
The PM was about to become a father, and he'd go on a three-month break from the job in December. We knew this was coming, so we were preparing for it. The algorithm engineer and I would share his responsibilities. I was anxious about this period. One of our main concerns was to keep the commercial team focused. Doing a weekly meeting turned out to be highly successful; the main sales guy actually liked them and asked to continue them after the PM got back. We made this work. Come on, compared to what we'd been through in Denmark, this was easy, right? It was.
For most of December I worked on signal processing. This was a new feature where we sent signals directly to the backend so we could support a "detected here" status. This was one of the main features the new customer would use, to determine which site a unit is on. It's a more general way of locating units: rather than coordinates, you just give it a site name. This is easy because you know which site a sensor is on, so all you have to do is pick up a signal and associate it with a sensor. Before, we fetched the signals from timestream, but that was kind of expensive. This approach used a simple queue.
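The "detected here" idea above comes down to a lookup: a signal names the sensor that heard a tag, and each sensor belongs to a known site. Below is a minimal sketch of that association step, assuming a batch of signals has already been taken off the queue. The `Signal` shape and all names are hypothetical, and the queue itself is out of scope.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

type SensorId = String
type TagId = String
type Site = String

-- Hypothetical message shape for a signal taken off the queue.
data Signal = Signal
  { sigSensor :: SensorId
  , sigTag :: TagId
  } deriving (Show, Eq)

-- Fold a batch of signals into a "detected here" site per tag.
-- Signals from sensors with no known site are ignored; for a tag heard
-- more than once, the signal later in the batch wins.
detectedHere :: Map.Map SensorId Site -> [Signal] -> Map.Map TagId Site
detectedHere sensorSites = foldl' step Map.empty
  where
    step acc sig =
      case Map.lookup (sigSensor sig) sensorSites of
        Nothing -> acc
        Just site -> Map.insert (sigTag sig) site acc
```

No coordinates or distance estimates are involved, which is why this is so much simpler than locating a unit within a yard.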
There is plenty of other high-impact stuff to do. For example, the beacon battery issue exploded: we did some analyses on an existing batch and found that around 5% of the new batteries were unacceptably low. I had already written an app to check beacon settings and battery levels; it was just a bit slow. I modified it so that it would highlight bad settings in red. This made sifting through all these beacons (thousands of them) a lot faster. It's available as open source. With the things learned from that development I also built a liveness check in the registration app.[^39] This should prevent the system from accepting any dead or low-battery Bluetooth tags.
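A check like the one described above can be sketched as a small classification function. Everything here is an assumption for illustration: the `Beacon` fields, the 20% battery floor, and the 60-second silence limit are invented, and the actual app may look at different settings.

```haskell
-- Hypothetical beacon reading; the field names are assumptions.
data Beacon = Beacon
  { beaconId :: String
  , batteryPercent :: Int
  , secondsSinceLastSeen :: Maybe Int  -- Nothing: never heard from
  } deriving (Show, Eq)

data Verdict = Healthy | Rejected [String]
  deriving (Show, Eq)

-- Assumed thresholds, not values from the project.
minBatteryPercent :: Int
minBatteryPercent = 20

maxSilenceSeconds :: Int
maxSilenceSeconds = 60

-- Collect every problem found, so a UI can show all of them at once.
check :: Beacon -> Verdict
check beacon =
  case batteryProblems ++ livenessProblems of
    [] -> Healthy
    problems -> Rejected problems
  where
    batteryProblems =
      ["low battery" | batteryPercent beacon < minBatteryPercent]
    livenessProblems =
      case secondsSinceLastSeen beacon of
        Nothing -> ["never seen"]
        Just s
          | s > maxSilenceSeconds -> ["silent too long"]
          | otherwise -> []

-- Registration only accepts beacons with no problems at all.
acceptForRegistration :: Beacon -> Bool
acceptForRegistration beacon = check beacon == Healthy
```

Returning the list of problems rather than a bare `Bool` is what lets a tool mark the offending beacons in red and say why.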
We're in March now. That big customer signed the deal, and we likely got another construction site. Currently we're servicing two locations; soon this will grow to fourteen. We're winning, and we likely won't be able to keep up with the work. Which is a much better problem to have than being threatened with cancellation. While that's going on, leadership still surprises us with cryptic and somewhat intimidating remarks, for example asking "is this really aligned with our vision?". Their vision being to tackle carbon impact in construction. Which is ironic, considering the experience in Denmark.[^danish-madness] I suppose they'd just not believe us. Despite the irony and the fact we just got this huge deal done, these kinds of questions are still being raised. I think I requested a warm body to deal with those battery replacements. This request got denied. It's an odd situation to be in, where you sort of know already you're going to be short on manpower, but upper management still believes you're a bad project. We triumphed in Denmark; you think a salty manager is going to stop us?
As long as we focus on the right stuff to build, we'll be fine. We've been focusing on stuff that has impact, and I think about 90% of the things we worked on had impact. We bought time by dealing with our customer's requests and getting a renewal. That time we used to make the product sellable via proximity, which bought us more time to fix the fundamental firmware problems, which made the product actually good. There are other large problems in this project. For example, the Particle data operations are hugely expensive, but this company can build its own hardware. Upper management just likes to kill our requests. Ironically enough, it's costing them money, not me. It causes me more work, because I work around these problems with more firmware modifications. For example, I introduced a sleep mode where we disable the sensors during off hours. I've never claimed to be good at firmware, but even a slowly produced implementation of these changes has a huge impact. Despite how cooked internal politics is at this company, with two major customers now, many sites, and the award, we can say we're winning this unfair game.