BT Broadband
Every so often I end up writing a tech post that, rather than starting from a new piece of tech or a random project I've embarked on, is instead born out of frustration and genuine dismay at a state of operation. This post is unfortunately one of those, written in the hope that it helps people in the future (and lowers my blood pressure).
A few months back I found myself visiting someone's house, where I needed to use their internet connection for work. Nothing out of the ordinary there (remote work has been a thing for some time), however things didn't exactly go to plan. Their internet, a BT VDSL2 connection that in theory should be 40Mb down/7Mb up, turned out to be nearly unusable. Not a case of a busy connection with a slow download/upload, but rather a case of very high packet loss / jitter combined with fluctuating speeds. As you might have guessed, using Zoom was a non-starter (even for audio-only) and email was a shot in the dark.
As I love the occasional technical challenge (that and I like to get paid every month) I decided to dig into what was happening and what was fundamentally broken. I was informed that their connection had been like this since they moved in (about 3 years ago from memory) and that despite multiple people complaining about slow speeds they had assumed it was just a slow connection that was working as expected. Given they don't work in IT and conversations with BT had previously informed them that everything looked fine, it's easy to see why they would come to this conclusion. To complicate matters the UI of the HomeHub stated that the line was running at 40Mb down, and that is what it had always shown.
Given that the hub is one of the newer BT HomeHub models (one generation pre-disc) it isn't an old device and it does support newer WiFi standards (at least new enough to hit 40Mb on a bad day). A subsequent trip home and some testing gear allowed me to rule out the obvious (no WiFi interference, no noise on the power line etc). Connecting via Ethernet with no other devices plugged in / connected via WiFi revealed the same issue, however it also revealed that the speedtest would appear as if it maxed out for just under 2 seconds and then would drop drastically to around 7Mb. The latter pointed towards interference on the line and it having to back-off under load.
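For anyone wanting to spot the same pattern in their own results, here is a minimal sketch of how that "burst then collapse" behaviour could be picked out of per-second throughput samples. It is not the tooling I used at the time; the function name `find_sustained_drop` and the `ratio` and `hold` parameters are my own assumptions for illustration, and the sample values in the usage note are made up to mirror the shape described above.

```python
def find_sustained_drop(samples_mbps, ratio=0.5, hold=3):
    """Return the index where throughput first falls below ratio * peak
    and stays there for `hold` consecutive samples, or None if it never does.

    samples_mbps: per-second throughput readings in Mb/s (assumed input).
    """
    peak = 0.0
    for i, value in enumerate(samples_mbps):
        # Track the highest throughput seen so far in the test.
        peak = max(peak, value)
        window = samples_mbps[i:i + hold]
        # Only report a drop if a full window sits below the threshold.
        if len(window) == hold and all(v < ratio * peak for v in window):
            return i
    return None
```

Fed something like `[38, 39, 8, 7, 7, 7]` this reports the drop at index 2, while a steady run such as `[38, 39, 38, 39]` returns `None`. A single slow second is ignored because of the `hold` window, which keeps one noisy sample from being reported as a fault.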
Checking the cabling from the master phone socket revealed that things weren't as optimal as they could be, with an extension cable running alongside the exterior of the property to an unused socket. Given it was unshielded cable and not in use it was subsequently removed. The master phone socket was also upgraded to a newer NTE5b socket (the latest used), which includes the built-in VDSL2 line filter. After checking everything looked good it was time to test again, and unfortunately the results were still the same. The router was powered off for 10 minutes and restarted to let it train on the line again, and sadly even after a few days the results were still the same (including the strange drop of speed when testing).
With everything tested within the house it was time to get BT involved again, this time armed with what had taken place at home / had been tested. The first engineer visit was scheduled and after a few days they arrived and performed some basic testing. They checked for the usual problems that might be the cause (within the house) and found nothing that stood out. They stated that the modem said everything was fine and that a test performed remotely showed that things looked fine, so no issue to be found. They then tested with a local device using the BT speed tester and couldn't understand how the results were contradictory. Frustratingly they replaced the new phone socket I had fitted with the exact same model, despite being informed that it had just been changed. With that not improving things they also replaced the HomeHub with the latest model (a nice thing to do, but sadly it didn't improve things). The case notes were updated to state that it might be an upstream issue, and that was the end of the first visit.
With nothing progressing BT were chased again and another engineer visit was scheduled. On cue another engineer arrived and performed the same tests, providing the same canned response. Not-so amusingly it transpired that the case notes either weren't available or hadn't been updated, so the engineer had to perform the same tests as they had no idea as to what was previously tested. The end result was still the same though, with nothing resolved and no further communication.
Another set of phone calls to BT (yes, multiple) and an escalation manager had been assigned to the case. That is, the new case that had been created because the previous case had been marked as resolved despite nothing being changed. The only logic I could put to this was a team trying to preserve their SLAs in the hope that the customer would simply go away. Regardless of the rationale, closing an unresolved case with no communication to the customer is disgusting at best, and in truth shows how BT clearly aren't being held accountable enough.
A few days later and it was time for another BT engineer to visit, who upon arrival confirmed that they hadn't been provided any case notes either and so were coming into this blind again. The conversation with this engineer began pleasantly enough, however it soon took the tone of "there is no problem to fix, other than your expectations". Thankfully I managed to be there for this visit, as getting the engineer to do their job properly took a remarkable amount of effort. After being told repeatedly that there was nothing wrong with the line, it took the threat of grabbing my equipment to prove that the line was clearly faulty to get them to actively engage, and even then it wasn't straightforward...
The initial view from the engineer was that (again) there was nothing wrong with the line, even to the point of not running any form of external speedtest as it would be a waste of time. After arguing that not running a local speedtest was in fact not doing their job, they agreed to entertain one whereby it showed the drop to 7Mb. The first response, it must be a bad speedtest server (despite it being BT's own). My response, pick anyone you want or three of the dedicated ones I use that are backed by a 10Gb connection. A few more tests later and thankfully they were enough to get them to engage and perform the first part of what would subsequently lead to the resolution.
An Openreach engineer has specialist equipment that doesn't just run a simple speedtest over the line, it actually acts as the modem for your connection and allows testing of the different frequency bands (channels) that are allocated by the outside cabinet to your connection. This test isn't performed by default and took arguing to get the engineer to perform it (as it takes 20 minutes). Amusingly, upon running the test the display on the testing equipment immediately showed that there was a fault. Of the channels allocated to the connection there were multiple downstream channels that had zero signal, not even a hint of one. This wasn't something the engineer had seen before (he was honest about that), and another subsequent test was performed showing the same issue.
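To make the idea of that per-channel test a little more concrete, here is a rough sketch of the check in code. I have no visibility of how the Openreach tester actually works or what units it reports, so everything here is an assumption: the input is a plain mapping of channel id to a signal reading, and the names `dead_channels` and `live_fraction` are made up for illustration.

```python
def dead_channels(signal_by_channel, floor=0.0):
    """Return the sorted ids of channels whose reading is at or below `floor`.

    signal_by_channel: hypothetical mapping of channel id -> signal reading.
    """
    return sorted(ch for ch, level in signal_by_channel.items() if level <= floor)


def live_fraction(signal_by_channel, floor=0.0):
    """Return the share of allocated channels that are carrying any signal."""
    total = len(signal_by_channel)
    if total == 0:
        return 0.0
    return (total - len(dead_channels(signal_by_channel, floor))) / total
```

With readings such as `{1: 12.0, 2: 0.0, 3: 9.5, 4: 0.0}` the first function returns `[2, 4]` and the second returns `0.5`. The point is simply that a line can be allocated a full set of channels while only some of them deliver anything, which is what the display was showing.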
You would think that with two clear test results showing that something is very wrong with the downstream things would make immediate progress, however the engineer went into what I like to call 'full bullshit' mode. While it's been a while since I studied for my CCNA examination and did my degree in computing networks, being told that the channels on the display were in fact WiFi channels and nothing to do with the connection was less than amusing. Worse, after pointing out they clearly weren't and that the same speed results were found with a device connected via Ethernet, the BS continued about how 2.4GHz is faster than 5GHz. As you might guess, my tolerance for this is somewhat low, especially when it's at the expense of someone just wanting a stable connection that they pay for each month.
The next excuse on the list of things that would be causing the connection to behave like this was that of powerline noise. Leaving aside the RF chokes and smoothing capacitors in the power supply that the hub uses, and the fact that at this point two new hubs had been used, it was another fun conversation. After a few minutes of back and forth on this, the suggestion was to turn off devices one at a time and see if that improved the speed, however the engineer wouldn't be here for this as it would take too long. Losing my patience with yet another pointless conversation I made the point that I would head home, grab a UPS, come back and plug only the hub into it while powering off the entire house to prove that nothing in the house was causing any interference. A rather frustrated engineer subsequently admitted he had a generator in his van and could actually perform the test without my need to travel.
After that part had concluded (without any testing) it was now time to blame it on REIN (Repetitive Electrical Impulse Noise), specifically that it could be external interference being picked up by the cable from the drop-pole, which might not be fixable. Amusingly, one of the first things I did was ask next door who their ISP was and what speeds they were getting. After pointing this out and showing the speedtest the engineer finally started to engage again, asking if they could take the HomeHub to the street cabinet and plug it in directly. Thankfully this was the first real step in the right direction.
The engineer was gone for the better part of 2 hours before finally returning with some solid information that would contribute to getting things fixed. The engineer had indeed tested the HomeHub directly on the cabinet and was getting the same speeds (ruling out anything within the house and the cabling from the street cabinet to the property). The line had also been patched into a different card within the cabinet and was still experiencing the same issue, indicating the problem wasn't with the card either. Finally, they had spoken with a colleague who had seen this with two other properties previously and knew of the issue. To summarise the problem, the line profile pushed by BT (not Openreach) isn't correct for the line and so while it should be using all the channels allocated to the connection it actually can't, but that isn't being communicated to the Hub.
The end result, whenever the cabinet was trying to send data to the HomeHub it wasn't coming down some of the channels and so everything had to slow down to try and make things reliable, despite the line not showing as running at a lower speed (as technically the line wasn't under any significant interference / crosstalk). If your head hurts here, it's to be expected. Think of it like a line of people all saying a single word in a sentence and multiple people being asleep, resulting in jumping back to the start of the line to try the last few words again. At this point Openreach were confident that this is a BT issue as everything on the line looks 100% and there is nothing further they can do. The case notes were (in theory) updated again, and so progress should be made, right?
Sadly not... After a few days of no communication from the case manager it was time for more calls to find out why nothing had progressed. As it transpired, BT had seen the case notes from Openreach but had rejected them as they didn't believe that to be the case. At this point I was already testing out different mobile providers to see if a 4G connection would be more reliable for them, as the level of frustration at this point was beyond ridiculous.
Another engineer visit was scheduled however it was rejected as at this point there was nothing left to test and no rationale could be provided as to what they were expecting to find that differed from the previous 3 engineers. Despite this an engineer did turn up (on a different day than originally proposed and without any warning). As you might imagine it isn't great to get a phone call stating that an engineer is outside and why aren't you opening the door. Another call with BT and a different approach was taken by them, this time offering to allow the contract to be ended early so they could find a different provider. This was the point whereby my patience with this had hit zero, as this is simply throwing the problem over the fence to someone else.
As the nature of my job for many years has involved machine data and programming, it was time to get concrete evidence of everything for what would clearly be an impending legal battle. A few evenings later and a Raspberry Pi was gathering the line stats every few minutes, and when I say line stats I mean all of the information that the HomeHub has on the connection (including SNR levels and a few bits of information they don't appear to display in the UI). Additionally, a speedtest was configured to run every hour with the results logged and charted. One summary dashboard later and the stats of the connection were easy to follow (even for BT).
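For those wanting to do something similar, below is a minimal sketch of the logging side. It is not the code that ran on the Pi. How you pull the stats from the hub and how you run the speedtest are deliberately left out, as I don't want to imply a particular interface; the field names in `FIELDS` and the helpers `append_sample`, `read_log` and `sync_vs_measured` are all assumptions for illustration.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column names; use whatever your own collector provides.
FIELDS = ["timestamp", "sync_down_mbps", "sync_up_mbps",
          "snr_margin_db", "speedtest_down_mbps"]


def append_sample(path, sample, now=None):
    """Append one timestamped sample to a CSV log, adding a header to a new file."""
    now = now or datetime.now(timezone.utc)
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    row = {"timestamp": now.isoformat()}
    # Missing values are stored as empty strings so partial samples still log.
    row.update({key: sample.get(key, "") for key in FIELDS[1:]})
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_log(path):
    """Read the whole log back as a list of dicts (all values are strings)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def sync_vs_measured(rows):
    """Return (reported sync, measured speed) pairs for rows that have both."""
    return [(float(r["sync_down_mbps"]), float(r["speedtest_down_mbps"]))
            for r in rows
            if r["sync_down_mbps"] and r["speedtest_down_mbps"]]
```

Calling `append_sample` from a scheduled job (cron on the Pi would do) with whatever your collector returns builds up the history, and `sync_vs_measured` gives the pairs worth charting: what the line claims to be running at against what a speedtest actually achieved. A flat CSV was chosen here purely because it is easy to hand to someone else as evidence.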
Once again things had gone quiet, with the case manager seemingly not responding to emails any more. A few more calls later and a new escalation manager was thankfully assigned. On a conversation with them regarding the issues being faced and what Openreach had stated the problem was, I made the suggestion of changing the line profile to a faster one and then subsequently dropping it back to the speed being paid for. The theory here was that in choosing a faster profile (or potentially any profile) the channels available at the cabinet level would be reassessed and with any luck all of them would be available. It was also pointed out that the connection was being continuously logged, including the current profile and thresholds for the line and the hourly speedtest.
Annoyingly, the change that was scheduled hadn't gone live. Credit to the case manager who did call the next day to see if there was any change, and did seem somewhat confused when I pointed out that the profile hadn't changed as the line speed was still set at 40Mb. Following the recommendation of the case manager the HomeHub was reset later that evening to hopefully catch another attempt at the profile being changed (to 55Mb for anyone wondering).
It was the next day when there was finally some good news, with the dashboard showing that the line speed had increased and the hourly speedtests showing much faster speeds (though still not at the expected speed). Another call with the case manager took place, with the conversation being positive and the conclusion being that the line now needed to train over the coming 10 days to see what speed it could reliably run at. With the connection still being logged and the hourly speedtests still taking place it was time to sit back and wait.
Over the following 10 days the line continuously adjusted its thresholds overnight, resulting in a subtle speed boost. The latency/jitter values had improved significantly and for the first time the line was genuinely usable. Pages loaded in a way they hadn't seen before (at least not at home), and I won't forget the comment that for the first time their video calls actually looked clear. It took a long time to get there, but the end result was a connection that actually worked properly and at the speeds being paid for.
So why write this (and in so much detail)? In truth, through the engineer visits and calls to BT the constant response was always that it was a fault within the house, and realistically it only progressed to the final state through technical arguments and connection logging that 99% of the population don't have the knowledge to do. The internet has become a core service that billions of people depend on every day, and being in a situation where things don't work (including video calls to loved ones) isn't something we should be facing given the progression of technology. As a few of my posts have resulted in emails that have helped people (that I have heard from), I hope that what I have written here helps those who are experiencing similar issues and are fighting to get things sorted.