New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Npcap 0.9995 leaks NonPagedPool, causes system hang #213
Comments
My current plan is to redefine the allocation tags to be more unique so we can better understand which part is leaking. I have already added additional assertions and will be making a debug build to hopefully catch an inconsistency earlier. Other changes have already been put in place such as a memory barrier to avoid possible concurrency issues due to instruction reordering (cc788c9). I will also be testing on a multi-core system to better exercise the concurrency controls, since prior to this our testing has been done on single-core virtual machines. |
Private report of pool tracking shows we are looking at case 4. |
Unsure if this is the underlying case, but it certainly looks suspicious. This allocation is always accompanied by an MDL allocation, but only the buffer seems to be leaking. If under low-resources conditions we fail to map the system address for this buffer, we are unable to free it, which could lead to a runaway out-of-resources condition. |
For packets less than 256B, NDIS's NetBufferPool is efficient enough. For larger ones, we need to allocate more buffer space and MDLs, and we were freeing those as soon as we were done; very inefficient. Instead, we'll put them on a stack and pull them back off when we need them. A separate thread will clear the stack when all capture handles are closed. Related to #213 because by avoiding frequent small (257 to 1258 bytes) allocations we reduce the chance of heap fragmentation. Also, this changes pretty much all the code implicated in that issue, so if it persists it's something new.
This issue was partially addressed in Npcap 0.9996 and completely resolved in Npcap 0.9997. |
The memory tag "4321" (represented in code as
'1234'
, a.k.a.NPF_ALLOC_TAG
) is leaking about 5% of allocations, which leads to an eventual out-of-memory system hang in high-traffic or long-running captures. We have had one private report, and it is also mentioned in this comment: #206 (comment)The memory tag in question is used in 4 places:
NPCAP_FILTER_MODULE::PacketPool
, which is used for packet injection and loopback capture.NPCAP_FILTER_MODULE::TapNBPool
, which is used to copy captured packet data.NPF_OidRequest()
LWF callback.TapNBPool
(NPF_NBCOPY_INITIAL_DATA_SIZE
, currently 256 bytes).Code for cases 1 and 3 have not changed in recent releases, so they can most likely be ruled out. Allocations for 2 and 4 are both freed in
NPF_FreeNBCopies()
, which is a callback invoked fromNPF_ObjectPoolReturn()
, and which is implicated in the crash reported in #212, which currently looks like a double-free. This function is called based on a reference-counting system that is intended to be safe from concurrency problems, but obviously something is wrong.The text was updated successfully, but these errors were encountered: