After upgrade from v2.66.2 to 2.104.0 some POST calls fail when REPLICA_DATABASE_URLS is set #3681

Closed
1 of 4 tasks
gamer22026 opened this issue Mar 26, 2024 · 7 comments · Fixed by #3771
Labels: api (Issue related to the REST API), bug (Something isn't working)

Comments

@gamer22026

gamer22026 commented Mar 26, 2024

How are you running Flagsmith

  • Self Hosted with Docker
  • Self Hosted with Kubernetes
  • SaaS at flagsmith.com
  • Some other way (add details in description below)

Describe the bug

We use Aurora RDS/Postgres with one writer and one reader, and we've been using REPLICA_DATABASE_URLS to split reads and writes between these two instances. After upgrading from v2.66.2 to v2.104.0, some POST requests (both directly through the API and through the UI) fail with a 500 error, for example when creating a feature or a segment override. When we don't set REPLICA_DATABASE_URLS, these all work fine. It seems that some of the POST requests are not being routed to the writer instance.
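
For context, a read/write split in Django is usually implemented with a database router. The following is a minimal sketch of that general pattern, not Flagsmith's actual router; the class name `ReadReplicaRouter` and the aliases `default` and `replica_1` are assumptions made for illustration.

```python
import random


class ReadReplicaRouter:
    """Illustrative Django-style router: reads go to a replica, writes to the primary."""

    def __init__(self, primary="default", replicas=("replica_1",)):
        # Hypothetical alias names; a real project defines these in DATABASES.
        self.primary = primary
        self.replicas = list(replicas)

    def db_for_read(self, model, **hints):
        # Reads are spread across replicas; fall back to the primary if none exist.
        return random.choice(self.replicas) if self.replicas else self.primary

    def db_for_write(self, model, **hints):
        # Writes always target the primary (the Aurora writer in this report).
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations between them are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrations only run against the primary.
        return db == self.primary


router = ReadReplicaRouter()
print(router.db_for_write(model=None))
print(router.db_for_read(model=None))
```

Each alias uses its own database connection, so a read routed to a replica alias does not share a transaction with a write made through `default`.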

Steps To Reproduce

Set REPLICA_DATABASE_URLS
Create environment
Try to create a feature; the request fails with a 500 error
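
The failing step can also be exercised directly against the API. This is a rough sketch that only builds the request; the endpoint path is taken from the server log in this issue, while the base URL, the `Authorization: Token ...` header, the `{"name": ...}` payload, and the helper name `build_create_feature_request` are assumptions to adjust for your deployment.

```python
import json
import urllib.request


def build_create_feature_request(base_url, project_id, token, feature_name):
    """Build (but do not send) the POST request that creates a feature."""
    url = f"{base_url.rstrip('/')}/api/v1/projects/{project_id}/features/"
    body = json.dumps({"name": feature_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based auth header; adjust to your setup.
            "Authorization": f"Token {token}",
        },
    )


req = build_create_feature_request("http://localhost:8000", 3, "<api-token>", "my_feature")
print(req.get_method(), req.full_url)
# To send it, call urllib.request.urlopen(req); a 500 response raises
# urllib.error.HTTPError, whose .code attribute holds the status.
```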

Expected behavior

Set REPLICA_DATABASE_URLS
Create environment
Try to create a feature; the request succeeds

Screenshots

No response

gamer22026 added the bug label Mar 26, 2024
@gamer22026
Author

When the error happens the api server logs:

{
  "levelname": "ERROR",
  "message": "Internal Server Error: /api/v1/projects/3/features/",
  "timestamp": "2024-03-26 17:21:06",
  "logger_name": "django.request",
  "process_id": 9,
  "thread_name": "ThreadPoolExecutor-0_0"
}

There are no tracebacks in the logs corresponding to the times these internal server errors occur.
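
To pull these records out of a larger log stream, something like the following may help. It is a small sketch that assumes the API emits one JSON object per line with the field names shown above; the helper name `find_server_errors` is made up for this example.

```python
import json


def find_server_errors(log_lines):
    """Return parsed ERROR records emitted by the django.request logger."""
    errors = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON, e.g. plain access logs
        if not isinstance(record, dict):
            continue  # skip JSON values that are not log records
        if (
            record.get("levelname") == "ERROR"
            and record.get("logger_name") == "django.request"
        ):
            errors.append(record)
    return errors


sample = [
    '{"levelname": "ERROR", "message": "Internal Server Error: /api/v1/projects/3/features/", '
    '"timestamp": "2024-03-26 17:21:06", "logger_name": "django.request"}',
    "a plain-text line that is not JSON",
]
for record in find_server_errors(sample):
    print(record["timestamp"], record["message"])
```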

dabeeeenster added the api label Mar 26, 2024
@gamer22026
Author

gamer22026 commented Mar 26, 2024

I was able to recreate this on v2.104.0 using a brand new blank DB (Aurora RDS/Postgres 15) running one writer and one reader. Initially I did not add REPLICA_DATABASE_URLS and was able to create a feature. Then I added REPLICA_DATABASE_URLS and again received the Internal Server Error (with no tracebacks recorded). So it appears to be unrelated to any DB migrations from older Flagsmith versions.

@gamer22026
Author

The env vars I'm setting in the flagsmith-api deployment:

       env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              key: DATABASE_URL
              name: flagsmith
        - name: REPLICA_DATABASE_URLS
          valueFrom:
            secretKeyRef:
              key: REPLICA_DATABASE_URLS
              name: flagsmith
        - name: DJANGO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: DJANGO_SECRET_KEY
              name: flagsmith
        - name: INFLUXDB_URL
          value: http://flagsmith-influxdb2.flagsmith.svc.cluster.local:80
        - name: INFLUXDB_BUCKET
          value: default
        - name: INFLUXDB_ORG
          value: influxdata
        - name: INFLUXDB_TOKEN
          valueFrom:
            secretKeyRef:
              key: admin-token
              name: flagsmith-influxdb2-auth
        - name: DJANGO_ALLOWED_HOSTS
          value: '*'
        - name: ADMIN_EMAIL
          value: [email protected]
        - name: EMAIL_BACKEND
          value: django.core.mail.backends.smtp.EmailBackend
        - name: EMAIL_HOST
          value: some.email.host
        - name: EMAIL_PORT
          value: "25"
        - name: EMAIL_USE_TLS
          value: "false"
        - name: ACCESS_LOG_LOCATION
          value: /dev/stdout
        - name: GUNICORN_KEEP_ALIVE
          value: "30"
        - name: ENABLE_ADMIN_ACCESS_USER_PASS
          value: "true"
        - name: ENABLE_TELEMETRY
          value: "false"
        - name: LOG_LEVEL
          value: DEBUG
        - name: SENDER_EMAIL
          value: [email protected]
        - name: TELEMETRY_API_URI
          value: http:/localhost:8000
        - name: LOG_FORMAT
          value: json

matthewelwell assigned zachaysan and unassigned khvn26 Mar 26, 2024
@gamer22026
Author

As a test, I set DATABASE_URL and REPLICA_DATABASE_URLS to both point to the same Aurora RDS writer instance (taking the reader instance out of the picture) and still get a 500 error from the API server when trying to create a feature. So it seems that the mere presence of REPLICA_DATABASE_URLS breaks it. For reference, other things are created just fine (orgs, projects, environments). I can also create a segment if a feature already exists. But I can't create a new feature or, if a feature exists, a feature segment override. Both throw a 500 from the API server.

@zachaysan
Contributor

As a test I actually set DATABASE_URL and REPLICA_DATABASE_URLS to both point to the same Aurora RDS writer instance (taking the reader instance out of the picture) and still get a 500 error ...

Yeah, that was my test yesterday as well. I followed the stack trace and it looks like the issue is pretty deep down in the internals of Django. An instance that was supposed to be present raised a DoesNotExist error when something attempted to access it.

@joshuabalduff

joshuabalduff commented Apr 9, 2024

Yeah, I am running into the same error; glad I found this. I can confirm that once I remove the replica variable, I am able to create a feature.

@matthewelwell
Contributor

@zachaysan and I have narrowed this issue down to this commit. More investigation needed to understand why...
