[Bug] Every so often the server stops responding to all tRPC-related requests until the client times out #6775
Replies: 14 comments
-
Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
-
Could anyone help? 🥺
-
server logs
-
See whether the AI-generated response below is helpful:
Based on the content of GitHub issue #6663 and related search results, I will analyze the possible reasons for the tRPC request timeout error in LobeChat:
Problem Overview
The user runs LobeChat (version 1.66.6) in a self-hosted Docker environment and encounters the following problems:
Possible Cause Analysis
1. Memory leak
The most likely cause is a memory leak in the Node.js application. A memory leak related to timeout settings is particularly consistent with the problem description:
2. Connection pool exhaustion
The tRPC service may use a database connection pool or another resource pool that is exhausted after a period of time:
3. Docker resource limits
In a Docker environment, containers may hit resource bottlenecks if resource limits are not set properly:
4. Database performance issues
Using the
5. Nginx proxy configuration issues
The problem description shows that Nginx is used as the API gateway:
Suggested Solutions
This problem is likely caused by a memory leak, especially one related to tRPC request handling and timeout mechanisms. It is recommended to check memory usage first and to monitor resource consumption while the service is running.
-
nginx waits until the timeout and then returns 499.
-
How can I check the database connection pool? I am using Postgres.
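One way to approach this from the Postgres side is to count server connections by state and compare the total against `max_connections`. The sketch below is only an illustration: `pg_stat_activity` and `SHOW max_connections` are standard Postgres features, but the helper name `summarize_connections` and the idea of feeding it rows fetched with a client such as psql are assumptions, and it says nothing about how LobeChat manages its own pool internally.

```python
# Queries to run with any Postgres client (psql, a driver, ...).
# pg_stat_activity and max_connections are standard Postgres features.
CONNECTIONS_BY_STATE_SQL = """
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
"""
MAX_CONNECTIONS_SQL = "SHOW max_connections;"


def summarize_connections(rows, max_connections):
    """Summarize (state, count) rows returned by CONNECTIONS_BY_STATE_SQL.

    Hypothetical helper: `rows` is a list of (state, count) tuples and
    `max_connections` is the integer reported by MAX_CONNECTIONS_SQL.
    """
    # A NULL state is reported under the label "unknown".
    by_state = {(state or "unknown"): count for state, count in rows}
    total = sum(by_state.values())
    return {
        "total": total,
        "by_state": by_state,
        # Sessions stuck "idle in transaction" hold a connection without doing work.
        "idle_in_transaction": by_state.get("idle in transaction", 0),
        "utilization": total / max_connections,
    }
```

If the total stays close to `max_connections`, or the "idle in transaction" count keeps growing while requests hang, that would point toward connection exhaustion rather than a memory problem.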
-
docker stats
CONTAINER ID   NAME                                         CPU %   MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O         PIDS
789fede746be   lobe-chat.1.z8vtwzxkzb628qc8b3og1cx1z        0.00%   426.4MiB / 15.58GiB   2.67%   11.8MB / 23.1MB   0B / 5.58MB       22
ba57a5e0ce96   nginx.1.oarwyfwic4q9kc97mrloozs4b            0.00%   9.406MiB / 15.58GiB   0.06%   211MB / 218MB     2.13MB / 5.8MB    5
8733938d62d3   kratos-ui-node.1.3njbxa906rmx8z5rbzj39vhkc   0.00%   65.44MiB / 15.58GiB   0.41%   7.4MB / 1.3MB     5.77MB / 2.29MB   22
-
I upgraded to the latest version and the issue still exists.
docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED       STATUS       PORTS      NAMES
789fede746be   lobehub/lobe-chat-database:1.68.5   "/bin/node /app/star…"   7 hours ago   Up 7 hours   3210/tcp   lobe-chat.1.z8vtwzxkzb628qc8b3og1cx1z
-
postgres stats
-
docker container logs
-
@sideef5ect This error looks like auth failed to connect, right? I think you need to check for a connectivity problem.
-
📦 Platform
Self hosting Docker
📦 Deployment mode
server db(lobe-chat-database image)
📌 Version
1.66.4
💻 Operating System
Other Linux, Ubuntu
🌐 Browser
Chrome
🐛 Bug Description
After the server has been running for about 30 minutes, all tRPC requests fail with timeout errors (HTTP 524). The server works again after a restart, but only for about 30 minutes, after which tRPC requests stop working again.
When a GET request is sent from Chrome DevTools, the server does not respond; the request waits until the timeout and then receives a 524 response.
📷 Recurrence Steps
After the server has been running for approximately 30 minutes, tRPC requests to it consistently fail with a timeout error (HTTP 524). The server resumes normal operation temporarily after a restart but hits the same issue again after a similar duration.
With a GET request from Chrome DevTools, the server hangs until the timeout and then a 524 response is returned.
With a POST request, the server returns an error immediately:
curl -X POST "http://lobe-chat:3210/trpc/lambda/agent.getAgentConfig,aiProvider.getAiProviderRuntimeState,user.getUserState"
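To make the GET-hangs versus POST-fails-fast difference easier to capture while the server is in the bad state, a small probe with a client-side timeout can help. This is a minimal sketch using only the Python standard library; the `probe` function, its outcome labels, and the 10-second default timeout are assumptions for illustration, not part of LobeChat or tRPC.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, method="GET", timeout=10.0):
    """Send one request and report whether it answered, errored, or hung."""
    req = urllib.request.Request(url, method=method)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            outcome, status = "responded", resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        outcome, status = "http_error", exc.code
    except (TimeoutError, socket.timeout):
        # No response arrived within `timeout` seconds.
        outcome, status = "hung", None
    except urllib.error.URLError as exc:
        # Connect-phase timeouts are wrapped in URLError.
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            outcome, status = "hung", None
        else:
            outcome, status = "unreachable", None
    return {
        "method": method,
        "outcome": outcome,
        "status": status,
        "seconds": round(time.monotonic() - start, 2),
    }
```

Calling `probe(url, "GET")` and `probe(url, "POST")` against the URL from the curl command above, once right after a restart and once after roughly 30 minutes, would show whether only GET requests move from "responded" to "hung".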
🚦 Expected Behavior
No error, or an error response returned immediately.
📝 Additional Information
API gateway: nginx
Upgrading to the latest lobe-chat-database image does not help; the same issue still occurs.