router sometimes slows down badly #1138
snarfed added a commit that referenced this issue on Jun 25, 2024.
Haven't seen this since we went to four cores, which pretty much confirms it was CPU. Closing.
The router sometimes gets into a bad state where it takes forever to handle /queue/receive requests, e.g. 30-75s when they should average around 0.5-2s. I don't understand what's going on here yet, or why it only happens sometimes.

It seems loosely related to the number of WSGI workers and threads per worker: it seems worse with one worker with 100 threads and better with five workers with 10 threads each, but only somewhat, and I'm not 100% sure of the correlation.
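For reference, the two layouts being compared could be written as a server config like the one below. This assumes the router runs under gunicorn with its threaded (gthread) worker; the issue only says "WSGI", so the server, the file name, and the settings shown are illustrative assumptions rather than the router's actual config.

```python
# gunicorn.conf.py (hypothetical; assumes gunicorn is the WSGI server)

# Layout that seemed better: five worker processes with 10 threads each,
# i.e. up to 50 requests in flight at once.
workers = 5
threads = 10
# Threaded worker; gunicorn also switches to it automatically when threads > 1.
worker_class = "gthread"

# Layout that seemed worse: one process with 100 threads.
# workers = 1
# threads = 100
```

If the router runs on CPython, threads within one process share the GIL, so CPU-bound request handling in a single process can use only about one core no matter how many threads it has, while separate worker processes can spread across cores. That would be consistent with the closing comment's CPU finding, though the issue doesn't confirm this mechanism.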
Maybe it's context-switching overhead between threads? But the slowdown seems far too drastic to be caused by that alone. Another theory is that the thread pool gets stuck on tasks that make HTTP requests to external servers that are down or very slow. Either our per-request timeout is too long, or it's OK but we make many different outbound requests per task, so these tasks starve other tasks. That theory feels unsatisfying too, but I don't have any other theories yet. Hrmph.
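To make the starvation theory concrete, here's a small stdlib-only simulation. Everything in it is hypothetical: the pool size, timeout, and requests-per-task numbers are made up, and time.sleep stands in for an outbound request to a dead server that hangs until it times out. It shows that one task can hold a thread for up to (requests per task) × (timeout), and that once slow tasks occupy every thread, even a trivial task has to wait for one of them to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical numbers for illustration only; not measured from the router.
POOL_THREADS = 4          # size of the worker thread pool
OUTBOUND_TIMEOUT_S = 0.2  # stand-in for the per-request timeout
REQUESTS_PER_TASK = 3     # outbound requests each "slow" task makes


def worst_case_task_seconds(requests_per_task: int, timeout_s: float) -> float:
    """Upper bound on how long one task can hold a thread if every
    outbound request it makes hangs until its timeout."""
    return requests_per_task * timeout_s


def slow_task() -> None:
    # Each sleep simulates an outbound request to a server that is down,
    # so it runs all the way to the timeout.
    for _ in range(REQUESTS_PER_TASK):
        time.sleep(OUTBOUND_TIMEOUT_S)


def fast_task(submitted_at: float) -> float:
    # Returns how long this task waited before a thread picked it up.
    return time.monotonic() - submitted_at


def measure_fast_task_delay(num_slow_tasks: int) -> float:
    """Submit num_slow_tasks slow tasks, then one fast task, and return
    how long the fast task sat in the queue."""
    with ThreadPoolExecutor(max_workers=POOL_THREADS) as pool:
        for _ in range(num_slow_tasks):
            pool.submit(slow_task)
        future = pool.submit(fast_task, time.monotonic())
        return future.result()


if __name__ == "__main__":
    bound = worst_case_task_seconds(REQUESTS_PER_TASK, OUTBOUND_TIMEOUT_S)
    print(f"worst case per slow task: {bound:.1f}s")
    print(f"fast task delay, 2 slow tasks: {measure_fast_task_delay(2):.2f}s")
    print(f"fast task delay, 4 slow tasks: {measure_fast_task_delay(4):.2f}s")
```

With these numbers, two slow tasks leave threads free and the fast task starts almost immediately, while four slow tasks fill the pool and the fast task waits roughly 0.6s. Plugging real timeouts and per-task fan-out into the same arithmetic is one way to check whether this theory could plausibly account for 30-75s latencies.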