Let’s not bury the lede: it was DNS. However, unlike the meme ("It’s not DNS, it’s never DNS. It was DNS"), I didn’t even have an inkling that DNS might be the problem.
I’m writing a new blog about streaming Apache Kafka data to Apache Iceberg and wanted to provision a local Kafka cluster to pull data from remotely. I got this working nicely just last year using ngrok to expose the broker to the interwebz, so figured I’d use this again. Simple, right?
Nope.
❯ kcat -L -b 4.tcp.eu.ngrok.io:16689
%3|1714734085.241|FAIL|rdkafka#producer-1| [thrd:4.tcp.eu.ngrok.io:16689/bootstrap]: 4.tcp.eu.ngrok.io:16689/bootstrap: Connect to ipv4#0.0.0.0:16689 failed: Connection refused (after 4ms in state CONNECT)
%3|1714734086.246|FAIL|rdkafka#producer-1| [thrd:4.tcp.eu.ngrok.io:16689/bootstrap]: 4.tcp.eu.ngrok.io:16689/bootstrap: Connect to ipv4#0.0.0.0:16689 failed: Connection refused (after 2ms in state CONNECT, 1 identical error(s) suppressed)
% ERROR: Failed to acquire metadata: Local: Broker transport failure (Are the brokers reachable? Also try increasing the metadata timeout with -m <timeout>?)
Spinning up this Docker Compose should have worked just fine (it did last year!). But for some reason my Kafka broker couldn’t be reached (Connection refused
).
The Basic Setup 🔗
After a lot of poking and prodding and changing stuff and still no luck with Kafka, I decided to strip it back. No Docker Compose, just Kafka. Just netcat listening on a port:
nc -lk 4242
From another terminal window I can test connectivity to the host/port:
❯ nc -z localhost 4242
Connection to localhost port 4242 [tcp/*] succeeded!
Now let us layer in ngrok, and add a route for the local netcat listener:
❯ ngrok tcp 4242
Session Status online
Account Robin Moffatt (Plan: Free)
Version 3.9.0
Region Europe (eu)
Web Interface http://127.0.0.1:4040
Forwarding tcp://0.tcp.eu.ngrok.io:13810 -> localhost:4242
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
With this, we should be able to connect with netcat on the tcp host/port given:
❯ nc -vz 0.tcp.eu.ngrok.io 13810
nc: connectx to 0.tcp.eu.ngrok.io port 13810 (tcp) failed: Connection refused
Let’s dig into the problem 🔗
One thing that I did think of early on was that I run nextDNS (similar to Pi-Hole, if you’re familiar with that) which can occassionally have unintended sideeffects when it comes to networking and resolving names. So I disabled it, and tried again:
❯ nc -vz 0.tcp.eu.ngrok.io 13810
nc: connectx to 0.tcp.eu.ngrok.io port 13810 (tcp) failed: Connection refused
If this was a cop chase movie, at this point the camera would be panning to the side street where the villian is hiding and the police car just drove by…
Dusting off my troubleshooting command line toolbox, I tried a mtr
(like traceroute
but fancier):
My traceroute [v0.95]
asgard08 (192.168.10.50) -> 0.tcp.eu.ngrok.io (81.99.162.48) 2024-05-03T13:43:42+0100
Keys: Help Display mode Restart statistics Order of fields quit
Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. usgmoffattme 0.0% 4 4.2 6.3 4.2 9.0 2.2
2. 10.53.34.17 0.0% 4 20.8 18.1 16.4 20.8 2.0
3. brad-core-2a-ae41-650.network.virginmedia.net 0.0% 4 41.3 23.7 11.4 41.3 14.0
4. (waiting for reply)
5. (waiting for reply)
6. lang-dclcore-1a-port-channel1.network.virginmedia.net 0.0% 3 23.6 22.4 20.8 23.6 1.4
7. lang-sspiprxy.network.virginmedia.net 0.0% 3 23.4 24.3 23.4 25.3 0.9
What caught my eye here was the final hop—lang-sspiprxy.network.virginmedia.net
. My ISP is Virgin Media, but the only entry I’d expected to see relating to it was my own IP.
Why is traffic for 0.tcp.eu.ngrok.io
going to 81.99.162.48
(lang-sspiprxy.network.virginmedia.net
)?
Let’s check the DNS resolution:
❯ dig 0.tcp.eu.ngrok.io
; <<>> DiG 9.10.6 <<>> 0.tcp.eu.ngrok.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64676
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;0.tcp.eu.ngrok.io. IN A
;; ANSWER SECTION:
0.tcp.eu.ngrok.io. 0 IN A 81.99.162.48
;; Query time: 85 msec
;; SERVER: 192.168.10.1#53(192.168.10.1)
;; WHEN: Fri May 03 13:47:39 BST 2024
;; MSG SIZE rcvd: 62
The DNS server is my local router (192.168.10.1
) which in turn is defaulting to the ISP’s DNS servers. What about if we use a trusted public DNS like Cloudflare's?
❯ dig @1.1.1.1 0.tcp.eu.ngrok.io
; <<>> DiG 9.10.6 <<>> @1.1.1.1 0.tcp.eu.ngrok.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4513
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;0.tcp.eu.ngrok.io. IN A
;; ANSWER SECTION:
0.tcp.eu.ngrok.io. 60 IN A 18.197.239.5
;; Query time: 67 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri May 03 13:49:41 BST 2024
;; MSG SIZE rcvd: 62
Huh. 18.197.239.5
is different from 81.99.162.48
, definitely. Now that we’re in the realms of DNS, let’s switch NextDNS back on and see what happens:
❯ dig 0.tcp.eu.ngrok.io
; <<>> DiG 9.10.6 <<>> 0.tcp.eu.ngrok.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34593
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; OPT=15: 00 11 42 6c 6f 63 6b 65 64 20 62 79 20 4e 65 78 74 44 4e 53 ("..Blocked by NextDNS")
;; QUESTION SECTION:
;0.tcp.eu.ngrok.io. IN A
;; ANSWER SECTION:
0.tcp.eu.ngrok.io. 300 IN A 0.0.0.0
;; Query time: 41 msec
;; SERVER: 192.0.2.42#53(192.0.2.42)
;; WHEN: Fri May 03 13:52:06 BST 2024
;; MSG SIZE rcvd: 103
Well, just look at that:
..Blocked by NextDNS
Now the penny is starting to drop. If you’re particularly observant you’ll notice the error that I showed at the very top of this blog says Connect to ipv4#0.0.0.0:16689 failed
—and 0.0.0.0
is what the dig
above shows NextDNS resolves the ngrok hostname to.
In the logs that NextDNS provides I can see:
As well as blocking crap like ads and tracking domains, NextDNS also blocks DNS resolutions for sites that are deemed nefarious. It looks like ngrok ended up on one of these lists - probably because ngrok is sometimes abused to serve phishing websites etc. After adding *.ngrok.io
to my NextDNS Allowlist I got this:
❯ nc -vz 0.tcp.eu.ngrok.io 13810
Connection to 0.tcp.eu.ngrok.io port 13810 [tcp/*] succeeded!
Success!
So NextDNS was, in a sense, a problem of my own making. But my ISP blocking this traffic is not something I’d expected. It turns out that Virgin Media offer "Web Safe" which includes "Virus Safe" which is enabled by default. After opting out of it, the ngrok address resolved correctly for me too:
❯ dig 2.tcp.eu.ngrok.io
; <<>> DiG 9.10.6 <<>> 2.tcp.eu.ngrok.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6068
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;2.tcp.eu.ngrok.io. IN A
;; ANSWER SECTION:
2.tcp.eu.ngrok.io. 13 IN A 18.197.239.5
;; Query time: 76 msec
;; SERVER: 192.168.10.1#53(192.168.10.1)
;; WHEN: Fri May 03 14:07:49 BST 2024
;; MSG SIZE rcvd: 62
It’s all working 😅 🔗
❯ kcat -L -b 6.tcp.eu.ngrok.io:12916
Metadata for all topics (from broker 1: 6.tcp.eu.ngrok.io:12916/1):
1 brokers:
broker 1 at 6.tcp.eu.ngrok.io:12916 (controller)
0 topics:
With ngrok on the Allowlist for NextDNS, everything works great. I’ll probably leave the "Virus Safe" at my ISP switched on unless it continues to cause these kind of problems. I’m also going to switch over my router’s DNS to use Cloudflare (1.1.1.1) in the future.
Footnote: A final puzzle - why does +trace
on dig
bypass the filtering? 🔗
This is the case on both NextDNS and Virgin Media. If I put the blocks back to how they were when I started looking at this and run dig
normally, I get the blocked IP result as expected:
# NextDNS
❯ dig +short 0.tcp.eu.ngrok.io
0.0.0.0
# Virgin Media
❯ dig +short 0.tcp.eu.ngrok.io
81.99.162.48
But if I use +trace
then in amongst the detailed trace info, I get the correct ngrok IP resolution:
# NextDNS
❯ dig +short +trace 0.tcp.eu.ngrok.io
NS a.root-servers.net. from server 192.168.10.1 in 65 ms.
NS b.root-servers.net. from server 192.168.10.1 in 65 ms.
NS c.root-servers.net. from server 192.168.10.1 in 65 ms.
NS d.root-servers.net. from server 192.168.10.1 in 65 ms.
NS e.root-servers.net. from server 192.168.10.1 in 65 ms.
NS f.root-servers.net. from server 192.168.10.1 in 65 ms.
NS g.root-servers.net. from server 192.168.10.1 in 65 ms.
NS h.root-servers.net. from server 192.168.10.1 in 65 ms.
NS i.root-servers.net. from server 192.168.10.1 in 65 ms.
NS j.root-servers.net. from server 192.168.10.1 in 65 ms.
NS k.root-servers.net. from server 192.168.10.1 in 65 ms.
NS l.root-servers.net. from server 192.168.10.1 in 65 ms.
NS m.root-servers.net. from server 192.168.10.1 in 65 ms.
RRSIG NS 8 0 518400 20240516050000 20240503040000 5613 . Gz7tfgerwhD0FAUDn+c/U3b/SrOgMyWaFh+575O7DxjF+yv0hND7AsLL 1gYcf8+n0V77G0XnAOkPJVPpe5cj/75xL6L/+PsaBteVJ0p9ZrsRDV7V
c+wxa2mR5mgKy4DsAk3PjgI3KfKlzm1YIg82UWs6AFS98V9m59uHM9gK DOTLXm6q38RwaU1cSuxU+QAhxK8xjbt8cbVUjmOyE6GYilZ6Peai02r9 EljH8UM1ulBiSSl4nUo1dgoxabTSVsmV/+CmdaUN8k97alg/vAzRhFc
L YKIg/Y0nryoSZq/wUkwweFvcrr0UrMeH0f6iR5rfaxrrjPcL7E8UrNRU 9aHjHg== from server 192.168.10.1 in 65 ms.
A 18.158.249.75 from server 205.251.192.146 in 20 ms.
# Virgin Media
❯ dig +short +trace 0.tcp.eu.ngrok.io
NS a.root-servers.net. from server 192.168.10.1 in 106 ms.
NS b.root-servers.net. from server 192.168.10.1 in 106 ms.
NS c.root-servers.net. from server 192.168.10.1 in 106 ms.
NS d.root-servers.net. from server 192.168.10.1 in 106 ms.
NS e.root-servers.net. from server 192.168.10.1 in 106 ms.
NS f.root-servers.net. from server 192.168.10.1 in 106 ms.
NS g.root-servers.net. from server 192.168.10.1 in 106 ms.
NS h.root-servers.net. from server 192.168.10.1 in 106 ms.
NS i.root-servers.net. from server 192.168.10.1 in 106 ms.
NS j.root-servers.net. from server 192.168.10.1 in 106 ms.
NS k.root-servers.net. from server 192.168.10.1 in 106 ms.
NS l.root-servers.net. from server 192.168.10.1 in 106 ms.
NS m.root-servers.net. from server 192.168.10.1 in 106 ms.
RRSIG NS 8 0 518400 20240516050000 20240503040000 5613 . Gz7tfgerwhD0FAUDn+c/U3b/SrOgMyWaFh+575O7DxjF+yv0hND7AsLL 1gYcf8+n0V77G0XnAOkPJVPpe5cj/75xL6L/+PsaBteVJ0p9ZrsRDV7V
c+wxa2mR5mgKy4DsAk3PjgI3KfKlzm1YIg82UWs6AFS98V9m59uHM9gK DOTLXm6q38RwaU1cSuxU+QAhxK8xjbt8cbVUjmOyE6GYilZ6Peai02r9 EljH8UM1ulBiSSl4nUo1dgoxabTSVsmV/+CmdaUN8k97alg/vAzRhFc
L YKIg/Y0nryoSZq/wUkwweFvcrr0UrMeH0f6iR5rfaxrrjPcL7E8UrNRU 9aHjHg== from server 192.168.10.1 in 106 ms.
A 18.158.249.75 from server 205.251.192.146 in 19 ms.
I’d love to hear from you if you can explain what’s happening with this :)