At work, I noticed a Python job that was causing a large overload condition. The job requested 16 CPU cores, and was running gensim.models.ldamulticore requesting 15 workers. However, the load on that server indicated an overload of many times over. This is despite a cgroups cpuset restricting the number of cores for that process to 16.
It turns out, gensim.models.ldamulticore uses Python multiprocessing. That module decides how many threads to run based on the number of CPU cores read directly from /proc/cpuinfo. This completely bypasses the limitations imposed by cgroups.
There is currently an open enhancement request to add a new function to multiprocessing for requesting the number of usable CPU cores rather than the total number of CPU cores.