Fix 404 Errors After Pangolin Update: A Traefik Troubleshooting Guide
Hey guys,
I'm encountering a frustrating issue after updating Pangolin from version 1.4.0 to 1.8.0: all my resources now return 404 errors. I've been digging through the logs, and the only clue I've found is this message from Traefik (version 3.4.5):
ERR Provider error, retrying in 664.618717ms error="cannot decode configuration data: field not found, node: sticky" providerName=http
To give you a clearer picture, I've included my configuration files below. Any help or insights would be greatly appreciated!
Docker Compose Configuration
Here's my docker-compose.yml file:
name: pangolin
services:
  pangolin:
    image: fosrl/pangolin:1.8.0
    container_name: pangolin
    restart: unless-stopped
    volumes:
      - ./config:/app/config
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/v1/"]
      interval: "3s"
      timeout: "3s"
      retries: 15
  gerbil:
    image: fosrl/gerbil:1.1.0
    container_name: gerbil
    restart: unless-stopped
    depends_on:
      pangolin:
        condition: service_healthy
    command:
      - --reachableAt=http://gerbil:3003
      - --generateAndSaveKeyTo=/var/config/key
      - --remoteConfig=http://pangolin:3001/api/v1/gerbil/get-config
      - --reportBandwidthTo=http://pangolin:3001/api/v1/gerbil/receive-bandwidth
    volumes:
      - ./config/:/var/config
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    ports:
      - 51820:51820/udp
      - 21820:21820/udp
      - 443:443 # Port for traefik because of the network_mode
      - 80:80 # Port for traefik because of the network_mode
      - 25565:25565
      - 25566:25566
  traefik:
    image: traefik:v3.4.5
    container_name: traefik
    restart: unless-stopped
    network_mode: service:gerbil # Ports appear on the gerbil service
    depends_on:
      pangolin:
        condition: service_healthy
    command:
      - --configFile=/etc/traefik/traefik_config.yml
    volumes:
      - ./config/traefik:/etc/traefik:ro # Volume to store the Traefik configuration
      - ./config/letsencrypt:/letsencrypt # Volume to store the Let's Encrypt certificates
networks:
  default:
    driver: bridge
    name: pangolin
    enable_ipv6: true
Key Takeaways from the Docker Compose File
- We are using Pangolin 1.8.0, Gerbil 1.1.0, and Traefik 3.4.5.
- Pangolin's configuration is mounted from the ./config directory.
- Gerbil depends on Pangolin being healthy and has several command-line arguments for configuration, including remote config and bandwidth reporting.
- Traefik is running in network_mode: service:gerbil, which means it shares the network namespace with the Gerbil container. This is a crucial detail for understanding how Traefik routes traffic.
- Traefik's configuration is loaded from /etc/traefik/traefik_config.yml, and Let's Encrypt certificates are stored in /letsencrypt.
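Because Traefik shares Gerbil's network namespace, an entry point is reachable from outside only if the gerbil service publishes its port. The following is a minimal Python sketch of that cross-check, using the port mappings from the compose file and the entry point addresses from traefik_config.yml in this post. The helper names are hypothetical and exist only for illustration.

```python
def published_ports(port_mappings):
    """Return {(container_port, protocol)} from compose-style 'host:container[/proto]' strings."""
    result = set()
    for mapping in port_mappings:
        target, _, proto = mapping.partition("/")
        container_port = int(target.rsplit(":", 1)[-1])
        result.add((container_port, proto or "tcp"))
    return result


def entrypoint_port(address):
    """Parse a Traefik entry point address such as ':443' or ':25565/tcp'."""
    target, _, proto = address.partition("/")
    return int(target.rsplit(":", 1)[-1]), proto or "tcp"


def unpublished_entrypoints(entrypoints, port_mappings):
    """Names of entry points whose port is not published by the given mappings."""
    published = published_ports(port_mappings)
    return sorted(name for name, address in entrypoints.items()
                  if entrypoint_port(address) not in published)


# Values copied from the gerbil service and from traefik_config.yml.
gerbil_ports = ["51820:51820/udp", "21820:21820/udp", "443:443", "80:80",
                "25565:25565", "25566:25566"]
entrypoints = {"web": ":80", "websecure": ":443",
               "tcp-25565": ":25565/tcp", "tcp-25566": ":25566/tcp"}

missing = unpublished_entrypoints(entrypoints, gerbil_ports)
print(missing or "every entry point port is published on gerbil")
```

For this setup the list comes back empty, so a missing port mapping is unlikely to be behind the 404s.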
Pangolin Configuration (config.yml)
Here's the content of my pangolin config.yml file:
app:
  dashboard_url: https://dash.url.com
  log_level: info
  save_logs: false
domains:
  domain1:
    base_domain: url.com
    cert_resolver: letsencrypt
  domain2:
    base_domain: url2.com
    cert_resolver: letsencrypt
  domain3:
    base_domain: url3.com
    cert_resolver: letsencrypt
server:
  external_port: 3000
  internal_port: 3001
  next_port: 3002
  internal_hostname: pangolin
  session_cookie_name: p_session_token
  resource_access_token_param: p_token
  resource_access_token_headers:
    id: P-Access-Token-Id
    token: P-Access-Token
  resource_session_request_param: p_session_request
  secret: top secret ;)
  cors:
    origins:
      - https://dash.url.com
    methods:
      - GET
      - POST
      - PUT
      - DELETE
      - PATCH
    headers:
      - X-CSRF-Token
      - Content-Type
    credentials: false
traefik:
  cert_resolver: letsencrypt
  http_entrypoint: web
  https_entrypoint: websecure
gerbil:
  start_port: 51820
  base_endpoint: dash.url.com
  use_subdomain: false
  block_size: 24
  site_block_size: 30
  subnet_group: 100.89.137.0/20
rate_limits:
  global:
    window_minutes: 1
    max_requests: 500
users:
  server_admin:
    email: [email protected]
    password: secretpassword
flags:
  require_email_verification: false
  disable_signup_without_invite: true
  disable_user_create_org: true
  allow_raw_resources: true
  allow_base_domain_resources: true
email:
  smtp_host: redacted
  smtp_port: 587
  smtp_user: redacted
  smtp_pass: redacted
Delving Deeper into the Pangolin Configuration
- This config.yml file is the heart of your Pangolin setup. It defines everything from domain configurations to server settings and user access.
- The domains section specifies the base domains and certificate resolvers for your applications. Make sure these are correctly configured and match your actual domain names.
- The server section defines the ports used by Pangolin internally and externally, as well as settings for session cookies, resource access tokens, and CORS (Cross-Origin Resource Sharing).
- The traefik section is particularly important. It tells Pangolin which certificate resolver to use and which entry points (web and websecure) Traefik should use for HTTP and HTTPS traffic.
- The gerbil section configures the Gerbil service, including the starting port, base endpoint, and subnet group. These settings are crucial for the VPN functionality.
- Rate limits are set globally to prevent abuse, limiting requests to 500 per minute.
- User settings and flags control user registration, email verification, and other features.
- The email section contains sensitive information for SMTP configuration, which is redacted in the provided file.
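The numbers in the gerbil section are easier to judge with a little arithmetic. The sketch below uses Python's ipaddress module and assumes that block_size and site_block_size are prefix lengths carved out of subnet_group. That reading is an assumption made for illustration, so check Pangolin's documentation for how it actually allocates addresses.

```python
import ipaddress

subnet_group = "100.89.137.0/20"   # value from config.yml
block_size = 24                     # assumed to be a prefix length
site_block_size = 30                # assumed to be a prefix length

# strict=False masks off the host bits instead of raising ValueError
network = ipaddress.ip_network(subnet_group, strict=False)

# How many blocks fit in the group, and how many site blocks fit in one block
blocks = 2 ** (block_size - network.prefixlen)
sites_per_block = 2 ** (site_block_size - block_size)

print(network)           # 100.89.128.0/20
print(blocks)            # 16
print(sites_per_block)   # 64
```

Note that 100.89.137.0 is not the network address of a /20; the enclosing /20 starts at 100.89.128.0. Whether Pangolin cares about that is not something this post can confirm, and nothing here ties it to the 404 errors.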
Traefik Dynamic Configuration
Here's my traefik dynamic config (dynamic_config.yml):
http:
  middlewares:
    redirect-to-https:
      redirectScheme:
        scheme: https
  routers:
    # HTTP to HTTPS redirect router
    main-app-router-redirect:
      rule: "Host(`dash.url.com`)"
      service: next-service
      entryPoints:
        - web
      middlewares:
        - redirect-to-https
    # Next.js router (handles everything except API and WebSocket paths)
    next-router:
      rule: "Host(`dash.url.com`) && !PathPrefix(`/api/v1`)"
      service: next-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt
    # API router (handles /api/v1 paths)
    api-router:
      rule: "Host(`dash.url.com`) && PathPrefix(`/api/v1`)"
      service: api-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt
    # WebSocket router
    ws-router:
      rule: "Host(`dash.url.com`)"
      service: api-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt
  services:
    next-service:
      loadBalancer:
        servers:
          - url: "http://pangolin:3002" # Next.js server
    api-service:
      loadBalancer:
        servers:
          - url: "http://pangolin:3000" # API/WebSocket server
Dissecting the Traefik Dynamic Configuration
- This file is where you define your Traefik routes, middlewares, and services. It's what tells Traefik how to handle incoming traffic.
- The middlewares section defines a redirect-to-https middleware, which redirects HTTP traffic to HTTPS.
- The routers section defines several routers for different purposes:
  - main-app-router-redirect: Redirects HTTP traffic for dash.url.com to HTTPS.
  - next-router: Handles traffic for dash.url.com (excluding /api/v1) and routes it to next-service. This is the Next.js frontend.
  - api-router: Handles traffic for dash.url.com under /api/v1, routing it to api-service. This is for the API endpoints.
  - ws-router: Intended for WebSocket traffic for dash.url.com. Its rule matches on the host only and routes to api-service.
- The services section defines the backend services:
  - next-service: Points to http://pangolin:3002, which matches next_port in config.yml (the Next.js server).
  - api-service: Points to http://pangolin:3000, which matches external_port in config.yml (the API and WebSocket server).
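Three of these routers share the websecure entry point and the same host, so it helps to know which one wins when several match. When no explicit priority is set, Traefik orders routers by rule length, longest first. A minimal Python sketch of that ordering, using the three rules from dynamic_config.yml (the function name is hypothetical):

```python
# Rules copied from the websecure routers in dynamic_config.yml
routers = {
    "next-router": "Host(`dash.url.com`) && !PathPrefix(`/api/v1`)",
    "api-router": "Host(`dash.url.com`) && PathPrefix(`/api/v1`)",
    "ws-router": "Host(`dash.url.com`)",
}


def by_default_priority(rules):
    """Order router names by rule length, longest first (Traefik's default priority)."""
    return sorted(rules, key=lambda name: len(rules[name]), reverse=True)


for name in by_default_priority(routers):
    print(len(routers[name]), name)
```

ws-router has the shortest rule and therefore the lowest default priority. Since next-router and api-router between them cover every path on dash.url.com, ws-router would only be chosen if it were given an explicit priority. That is worth knowing, but it does not explain the 404s on your other resources.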
Traefik Configuration (traefik_config.yml)
Finally, here's the content of my main traefik config (traefik_config.yml):
api:
  insecure: true
  dashboard: true
providers:
  http:
    endpoint: "http://pangolin:3001/api/v1/traefik-config"
    pollInterval: "5s"
  file:
    filename: "/etc/traefik/dynamic_config.yml"
experimental:
  plugins:
    badger:
      moduleName: "github.com/fosrl/badger"
      version: "v1.2.0"
log:
  level: "INFO"
  format: "common"
certificatesResolvers:
  letsencrypt:
    acme:
      httpChallenge:
        entryPoint: web
      email: "[email protected]"
      storage: "/letsencrypt/acme.json"
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
  tcp-25565:
    address: ":25565/tcp"
  tcp-25566:
    address: ":25566/tcp"
    transport:
      respondingTimeouts:
        readTimeout: "30m"
    http:
      tls:
        certResolver: "letsencrypt"
serversTransport:
  insecureSkipVerify: true
Unraveling the Core Traefik Configuration
- This is Traefik's main configuration file, defining the global settings and providers.
- The api section enables the Traefik dashboard in insecure mode (insecure: true). Understand the security implications before running this in production: exposing the dashboard without authentication is a significant risk.
- The providers section is key. It tells Traefik where to get its dynamic configuration:
  - http: Fetches configuration from Pangolin's API endpoint (http://pangolin:3001/api/v1/traefik-config), polling every 5 seconds.
  - file: Loads configuration from the dynamic_config.yml file.
- The experimental section enables the badger plugin (github.com/fosrl/badger), which comes from the same fosrl organization as Pangolin.
- The log section configures Traefik's logging level and format.
- The certificatesResolvers section configures Let's Encrypt for automatic certificate generation and renewal. It uses the HTTP challenge and stores certificates in /letsencrypt/acme.json.
- The entryPoints section defines the entry points for incoming traffic: web (port 80), websecure (port 443), and two TCP entry points (25565 and 25566). As the file is ordered, the 30-minute read timeout and the Let's Encrypt TLS setting come after tcp-25566, so as written they belong to that entry point rather than to websecure.
- The serversTransport section sets insecureSkipVerify: true, which disables verification of backend TLS certificates. This is generally not recommended for production environments.
Initial Troubleshooting Steps & Potential Causes
Okay, so we have a lot of config to unpack here, but that's good! The more info we have, the better we can troubleshoot. Based on the error message and your configurations, here are my initial thoughts:
- Traefik's "cannot decode configuration data: field not found, node: sticky" error: This is the most important clue. Traefik received a sticky node that it cannot map to any field its configuration schema knows at that position. The providerName=http part shows that the rejected payload came from the HTTP provider, that is, from the configuration Pangolin generates, not from dynamic_config.yml. When a provider's payload fails to decode, Traefik applies none of it, which would explain why every Pangolin-managed resource returns 404.
- Traefik Version Incompatibility: You're running Traefik 3.4.5. Pangolin 1.8.0 may be emitting a field that this Traefik release does not understand, in which case the two versions are simply out of step after the update.
- Dynamic Configuration Issues: Pangolin dynamically generates Traefik configuration. There might be a bug in how Pangolin generates the configuration for your setup, especially after the update.
- File Provider Conflicts: You're using both the HTTP provider (to fetch config from Pangolin) and the File provider (to load dynamic_config.yml). There could be conflicts or precedence issues between these providers, although the error names the http provider, which makes this less likely.
- Caching/Stale Configuration: Traefik might still be holding an old configuration in memory, especially if the dynamic configuration isn't updating correctly.
Next Steps: Let's Get This Fixed!
Here are some concrete steps we can take to try and resolve this:
- Check Traefik Compatibility: Pangolin 1.8.0 might require a newer version of Traefik. Double-check the Pangolin documentation for compatibility information. If necessary, try upgrading Traefik to the latest stable version (but be sure to test in a non-production environment first!).
- Inspect the Dynamic Configuration: The most crucial step is to see what configuration Pangolin is actually generating for Traefik. You can access the output of the HTTP provider by making a request to http://pangolin:3001/api/v1/traefik-config from within your Docker network. Save this output and carefully examine it for errors, missing sections, or unexpected values. Look for anything related to sticky or other load-balancing configurations.
- Simplify Traefik Configuration: To rule out conflicts between providers, try disabling the File provider temporarily. Comment out only the file section in your traefik_config.yml and leave the providers key and the http provider in place:

    providers:
      http:
        endpoint: "http://pangolin:3001/api/v1/traefik-config"
        pollInterval: "5s"
      # file:
      #   filename: "/etc/traefik/dynamic_config.yml"

  Then restart Traefik and see if the 404 errors persist. Keep in mind that the dashboard routes for dash.url.com live in dynamic_config.yml, so the dashboard itself will be unreachable during this test. If your other resources start working, it points to a conflict between the dynamic configuration and your dynamic_config.yml file.
- Clear Traefik Cache (if applicable): There isn't a direct way to "clear the cache," but Traefik keeps its dynamic configuration in memory, so restarting the container discards it. If you're using any persistent storage for Traefik's configuration (besides Let's Encrypt), check whether anything stale is stored there.
- Review Pangolin Logs: Check the Pangolin logs for any errors or warnings related to Traefik configuration generation. There might be clues there about why the configuration is invalid.
- Check Network Connectivity: Although it's less likely, make sure Traefik can actually reach Pangolin on http://pangolin:3001. The official Traefik image is Alpine-based and may not include curl, so BusyBox wget is the safer choice. Try docker exec traefik wget -qO- http://pangolin:3001/api/v1/ to test connectivity from within the Traefik container.
- Temporary Hand-Written Configuration: Create a very basic file-provider configuration (in dynamic_config.yml, with the file provider re-enabled if you disabled it earlier) that simply routes all traffic to Pangolin's port 3000 or 3002. This will help determine whether the issue lies within the dynamic configuration generation or with basic connectivity and routing. For example, your dynamic_config.yml might look like this:
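http:
  routers:
    # Plain-HTTP catch-all on the web entry point
    catchall-http:
      rule: "HostRegexp(`.+`)" # Traefik v3 takes a plain regex; {host:.+} is v2 syntax
      service: pangolin-service
      entryPoints: [web]
    # HTTPS catch-all; a router only serves TLS traffic when it has a tls section
    catchall-https:
      rule: "HostRegexp(`.+`)"
      service: pangolin-service
      entryPoints: [websecure]
      tls: {}
  services:
    pangolin-service:
      loadBalancer:
        servers:
          - url: "http://pangolin:3000"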
Remember to restart Traefik after making any configuration changes.
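When you inspect the saved output of the traefik-config endpoint, one check that is easy to automate is whether every router points at a service that is actually defined. Here is a minimal Python sketch of that check, assuming the payload follows Traefik's usual layout of http, tcp, and udp sections with routers and services maps. The function name and the example data are hypothetical.

```python
def dangling_service_refs(config):
    """Return (section, router, service) for routers whose service is not defined alongside them."""
    problems = []
    for section in ("http", "tcp", "udp"):
        block = config.get(section) or {}
        services = block.get("services") or {}
        for name, router in (block.get("routers") or {}).items():
            service = router.get("service", "")
            # Names like 'api-service@file' point at another provider; skip them.
            if "@" in service:
                continue
            if service not in services:
                problems.append((section, name, service))
    return problems


# Made-up example, not real Pangolin output.
config = {
    "http": {
        "routers": {
            "r1": {"rule": "Host(`a.url.com`)", "service": "s1"},
            "r2": {"rule": "Host(`b.url.com`)", "service": "s2"},
        },
        "services": {
            "s1": {"loadBalancer": {"servers": [{"url": "http://10.0.0.2:80"}]}},
        },
    }
}

print(dangling_service_refs(config))
```

To run it against the real payload, load the saved file with json.load and pass the resulting dict to the function.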
Sharing is Caring: Let's Collaborate!
To help me (and others in the community) assist you better, please share the following after you've tried these steps:
- The output of http://pangolin:3001/api/v1/traefik-config (redact any sensitive information, of course!).
- Any relevant Pangolin logs.
- The results of each troubleshooting step you've tried.
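Before posting any configuration, it is worth masking secrets mechanically and not by hand. The sketch below replaces the values of a few sensitive keys in a parsed config. The key list is an assumption based on the config.yml in this post, so extend it to fit your own files.

```python
# Assumed list of sensitive key names, taken from the config.yml shown above.
SENSITIVE_KEYS = {"password", "secret", "smtp_pass", "smtp_user", "smtp_host"}


def redact(node, sensitive=SENSITIVE_KEYS, mask="REDACTED"):
    """Return a copy of `node` with the values of sensitive keys replaced by `mask`."""
    if isinstance(node, dict):
        return {key: mask if key.lower() in sensitive else redact(value, sensitive, mask)
                for key, value in node.items()}
    if isinstance(node, list):
        return [redact(value, sensitive, mask) for value in node]
    return node


data = {
    "server": {"secret": "top secret", "internal_port": 3001},
    "users": {"server_admin": {"password": "secretpassword"}},
}

print(redact(data))
```

This only masks by key name. Hostnames, IP addresses, and domain names inside rules still need a manual pass before you share the output.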
Let's work through this together! 404 errors can be a pain, but with a systematic approach, we can usually find the culprit. Good luck, and keep us updated!