Feature discussion: Keep capture handles open when adapter is removed and reattach them when it is restored #506

Closed
dmiller-nmap opened this issue Jun 7, 2021 · 3 comments

Comments

@dmiller-nmap
Contributor

As discussed in #250, Npcap currently invalidates any open capture handles when NDIS detaches it from the network adapter. This can happen when the adapter is removed or when certain stack maintenance activity takes place.

We would like to consider the possibility of keeping a capture handle open in order to resume capture as soon as NDIS attaches us to the adapter again. This would probably require a user-configurable timeout.

@dmiller-nmap
Contributor Author

Alternatively, if user software (e.g. libpcap) can tolerate the error during read/write operations, we could leave it up to the user software to close the handle if the adapter has been gone too long. That's probably best and easiest.

@dmiller-nmap
Contributor Author

This is implemented in Npcap 1.60, from the driver side: the interruption (removal, rebinding, etc.) will cause the same errors, but if the interface comes back before the user program closes the handle, it will be reattached and can pick up capturing where it left off. Libpcap (wpcap.dll) does not have a separate error return value for this condition, however, so the best that user code can do for now is to sleep a moment and try again with pcap_next_ex(), pcap_loop(), or pcap_dispatch(). If you get errors twice in a row, it's probably not this recoverable error.

@guyharris - any ideas for making this easier to use?
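
A minimal sketch of the "sleep a moment and try again" advice above, written as a small decision helper so it does not depend on libpcap itself. The names `retry_state`, `classify_read`, `READ_ERROR`, and the `ACT_*` values are hypothetical and are not part of libpcap or Npcap; `READ_ERROR` is assumed to mirror the generic error return (`PCAP_ERROR`, -1) of `pcap_next_ex()`.

```c
/* Hypothetical helper: none of these names exist in libpcap or Npcap. */

#define READ_ERROR (-1)   /* assumed to mirror PCAP_ERROR from pcap_next_ex() */

enum read_action {
    ACT_DELIVER,           /* read succeeded or timed out: carry on */
    ACT_RETRY_AFTER_PAUSE, /* first error: sleep briefly, then read again */
    ACT_STOP               /* second error in a row, or a non-retryable code */
};

struct retry_state {
    int consecutive_errors; /* errors seen since the last good read */
};

/*
 * Decide what to do with the return value of one read call.
 * A single error is treated as possibly recoverable (adapter detached
 * and maybe about to be reattached); two in a row are not.
 */
enum read_action classify_read(struct retry_state *st, int rc)
{
    if (rc >= 0) {
        /* Success or timeout: forget any earlier error. */
        st->consecutive_errors = 0;
        return ACT_DELIVER;
    }
    if (rc != READ_ERROR) {
        /* Other negative codes (e.g. a break request) are not retried. */
        return ACT_STOP;
    }
    st->consecutive_errors++;
    return (st->consecutive_errors == 1) ? ACT_RETRY_AFTER_PAUSE : ACT_STOP;
}
```

A capture loop would pass each `pcap_next_ex()` result to `classify_read()`, sleep for a short interval of its own choosing on `ACT_RETRY_AFTER_PAUSE`, and close the handle on `ACT_STOP`. Because libpcap has no dedicated return value for this condition, the helper cannot tell a detach from any other error; it only encodes the "twice in a row" heuristic.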

@guyharris
Contributor

guyharris commented Dec 14, 2021

> Libpcap (wpcap.dll) does not have a separate error return value for this condition, however, so the best that user code can do for now is to sleep a moment and try again with pcap_next_ex(), pcap_loop(), or pcap_dispatch(). If you get errors twice in a row, it's probably not this recoverable error.

The way Linux handles removed devices is... suboptimal. To quote the comment in pcap-linux.c:

	/*
	 * Keep polling until we either get some packets to read, see
	 * that we got told to break out of the loop, get a fatal error,
	 * or discover that the device went away.
	 *
	 * In non-blocking mode, we must still do one poll() to catch
	 * any pending error indications, but the poll() has a timeout
	 * of 0, so that it doesn't block, and we quit after that one
	 * poll().
	 *
	 * If we've seen an ENETDOWN, it might be the first indication
	 * that the device went away, or it might just be that it was
	 * configured down.  Unfortunately, there's no guarantee that
	 * the device has actually been removed as an interface, because:
	 *
	 * 1) if, as appears to be the case at least some of the time,
	 * the PF_PACKET socket code first gets a NETDEV_DOWN indication
	 * for the device and then gets a NETDEV_UNREGISTER indication
	 * for it, the first indication will cause a wakeup with ENETDOWN
	 * but won't set the packet socket's field for the interface index
	 * to -1, and the second indication won't cause a wakeup (because
	 * the first indication also caused the protocol hook to be
	 * unregistered) but will set the packet socket's field for the
	 * interface index to -1;
	 *
	 * 2) even if just a NETDEV_UNREGISTER indication is registered,
	 * the packet socket's field for the interface index only gets
	 * set to -1 after the wakeup, so there's a small but non-zero
	 * risk that a thread blocked waiting for the wakeup will get
	 * to the "fetch the socket name" code before the interface index
	 * gets set to -1, so it'll get the old interface index.
	 *
	 * Therefore, if we got an ENETDOWN and haven't seen a packet
	 * since then, we assume that we might be waiting for the interface
	 * to disappear, and poll with a timeout to try again in a short
	 * period of time.  If we *do* see a packet, the interface has
	 * come back up again, and is *definitely* still there, so we
	 * don't need to poll.
	 */

This sounds a bit similar. What we could do is:

  • if we see the error, we set a "we saw the error" flag, temporarily set the read timeout to some appropriate short interval ("a moment"), and act as if we didn't see any packets;
  • if a read completes and the "we saw the error" flag is set:
    • if the read succeeded, we clear that flag, restore the previous read timeout, and provide the packets;
    • if the read failed with that error, we treat that as a hard error;
    • if the read failed with another error, we return that error.
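
The steps in the list above can be sketched as a small state machine. This is only an illustration of the proposal, not code from libpcap or Npcap: every name here (`reattach_state`, `on_read_complete`, `READ_DETACHED`, `SHORT_TIMEOUT_MS`, and so on) is hypothetical, and the 100 ms value for "a moment" is an assumption.

```c
/* Hypothetical sketch: none of these names exist in libpcap or Npcap. */

#define SHORT_TIMEOUT_MS 100 /* assumed value for "a moment" */

enum read_result { READ_OK, READ_DETACHED, READ_OTHER_ERROR };
enum outcome { OUT_PACKETS, OUT_NO_PACKETS, OUT_HARD_ERROR, OUT_OTHER_ERROR };

struct reattach_state {
    int saw_error;        /* the "we saw the error" flag */
    int timeout_ms;       /* read timeout currently in effect */
    int saved_timeout_ms; /* caller's timeout, restored on recovery */
};

/* Apply the proposed rules to the result of one completed read. */
enum outcome on_read_complete(struct reattach_state *st, enum read_result r)
{
    if (r == READ_OTHER_ERROR) {
        /* Unrelated errors are returned as-is; the list does not say
         * whether the flag should change, so it is left untouched. */
        return OUT_OTHER_ERROR;
    }
    if (!st->saw_error) {
        if (r == READ_DETACHED) {
            /* First sighting: remember it, shorten the timeout, and
             * act as if the read simply returned no packets. */
            st->saw_error = 1;
            st->saved_timeout_ms = st->timeout_ms;
            st->timeout_ms = SHORT_TIMEOUT_MS;
            return OUT_NO_PACKETS;
        }
        return OUT_PACKETS;
    }
    if (r == READ_OK) {
        /* The adapter came back: clear the flag, restore the timeout. */
        st->saw_error = 0;
        st->timeout_ms = st->saved_timeout_ms;
        return OUT_PACKETS;
    }
    /* Same error again while the flag is set: treat it as a hard error. */
    return OUT_HARD_ERROR;
}
```

The sketch assumes a blocking read with a timeout; it does not address the nonblocking case, and it does not model a read that times out with no packets while the flag is set.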

I'd have to think about whether that would work in nonblocking mode.
