Now that the current timestamp is saved in a variable lets add the
ETHTOOL_MSG_TS_SET ethtool netlink socket to make it selectable.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the default selected timestamp API change we have to change also the
timestamp return by ethtool. This patch return now the current selected
timestamp.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the API to select MAC default time stamping instead of the PHY.
Indeed the PHY is closer to the wire therefore theoretically it has less
delay than the MAC timestamping but the reality is different. Due to lower
time stamping clock frequency, latency in the MDIO bus and no PHC hardware
synchronization between different PHY, the PHY PTP is often less precise
than the MAC. The exception is for PHY designed specially for PTP case but
these devices are not very widespread. For not breaking the compatibility I
introduce a default_timestamp flag in phy_device that is set by the phy
driver to know we are using the old API behavior.
The phy_set_timestamp function is called at each call of phy_attach_direct.
In case of MAC driver using phylink this function is called when the
interface is turned up. Then if the interface goes down and up again the
last choice of timestamp will be overwritten by the default choice.
A solution could be to cache the timestamp status but it can bring other
issues. In case of SFP, if we change the module, it doesn't make sense to
blindly re-set the timestamp back to PHY, if the new module has a PHY with
mediocre timestamping capabilities.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace hwtstamp_source which is only used by the kernel_hwtstamp_config
structure by the more widely use timestamp_layer structure. This is done
to prepare the support of selectable timestamping source.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new commands allowing to list available time stamping layers on a
netdevice's link.
Example usage :
./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema \
--do ts-list-get \
--json '{"header":{"dev-name":"eth0"}}'
{'header': {'dev-index': 3, 'dev-name': 'eth0'},
'ts-list-layer': b'\x01\x00\x00\x00\x05\x00\x00\x00'}
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new netlink message that lists all available time stamping
layers on a given interface.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new commands allowing to get the current time stamping on a
netdevice's link.
Example usage :
./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-get \
--json '{"header":{"dev-name":"eth0"}}'
{'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-layer': 1}
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time stamping on network packets may happen either in the MAC or in
the PHY, but not both. In preparation for making the choice
selectable, expose both the current layers via ethtool.
In accordance with the kernel implementation as it stands, the current
layer will always read as "phy" when a PHY time stamping device is
present. Future patches will allow changing the current layer
administratively.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timestamping software or hardware flags are often used as a group,
therefore adding these masks will easier future use.
I did not use SOF_TIMESTAMPING_SYS_HARDWARE flag as it is deprecated and
not use at all.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of no phc we should not return SOFTWARE TIMESTAMPING flags as we do
not know whether the netdev supports of timestamping.
Remove it from the lan8841_ts_info and simply return 0.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the dev_set_hwtstamp_phylib function accessible in prevision to use
it from ethtool to reset the tstamp current configuration.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware timestamping through ndo_eth_ioctl() is going away.
Convert the macb driver to the new API before that can be removed.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__phy_hwtstamp_set function were calling the phy_mii_ioctl function
which will then use the ifreq pointer to call the hwtstamp callback.
Now that ifreq has been removed from the hwstamp callback parameters
it seems more logical to not go through the phy_mii_ioctl function and pass
directly kernel_hwtstamp_config parameter to the hwtstamp callback.
Lets do the same for __phy_hwtstamp_get function and return directly
EOPNOTSUPP as SIOCGHWTSTAMP is not supported for now for the PHYs.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHYs hwtstamp callback are still getting the timestamp config from
ifreq and using copy_from/to_user.
Get rid of these functions by using timestamp configuration in parameter.
This also allow to move on to kernel_hwtstamp_config and be similar to
net devices using the new ndo_hwstamp_get/set.
This adds the possibility to manipulate the timestamp configuration
from the kernel which was not possible with the copy_from/to_user.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is suboptimal to attempt skb linearization from ndo_start_xmit()
if a gso skb has pathological layout, or if host stack does not have
access to the payload (TCP direct). Linearization of large skbs
can also fail under memory pressure.
We should instead have an ndo_features_check() so that we can
fallback to GSO, which is supported even for TCP direct,
and generally much more efficient (no payload copy).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bailey Forrest <bcf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Praveen Kaligineedi <pkaligineedi@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BCM54612E ethernet PHY supports IDDQ-SR.
Therefore wire-up the suspend and resume callbacks
to point to bcm54xx_suspend() and bcm54xx_resume().
Signed-off-by: Marco von Rosenberg <marcovr@selfnet.de>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of TLV_SPACE() is to add the TLV descriptor size to the size of
the TLV value passed as argument and align the resulting size to
TLV_ALIGNTO.
tipc_tlv_alloc() calls TLV_SPACE() on its argument. In other words,
tipc_tlv_alloc() takes its argument as the size of the TLV value. So the
call to TLV_SPACE() in tipc_get_err_tlv() is redundant. Let's remove this
redundancy.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Gigabit Ethernet IP block on the RZ/Five SoC is identical to one
found on the RZ/G2UL SoC. "renesas,r9a07g043-gbeth" compatible string
will be used on the RZ/Five SoC so to make this clear and to keep this
file consistent, update the comment to include RZ/Five SoC.
No driver changes are required as generic compatible string
"renesas,rzg2l-gbeth" will be used as a fallback on RZ/Five SoC.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
net: Add linkmode_fill, use linkmode_*() in phylink/sfp code
This small series adds a linkmode_fill() op, and uses it in phylink.
The SFP code is also converted to use linkmode_*() ops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the linkmode_*() helpers rather than open coding the calls to the
bitmap operators.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a linkmode_fill() helper, which will allow us to convert phylink's
open coded bitmap_fill() operations.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: change reaction to ICMP messages
ICMP[v6] messages received for a socket in TCP_SYN_SENT currently abort
the connection attempt, in violation of standards.
This series changes our stack to adhere to RFC 6069 and RFC 1122
(4.2.3.9)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, non fatal ICMP messages received on behalf
of SYN_SENT sockets do call tcp_ld_RTO_revert()
to implement RFC 6069, but immediately call tcp_done(),
thus aborting the connect() attempt.
This violates RFC 1122 following requirement:
4.2.3.9 ICMP Messages
...
o Destination Unreachable -- codes 0, 1, 5
Since these Unreachable messages indicate soft error
conditions, TCP MUST NOT abort the connection, and it
SHOULD make the information available to the
application.
This patch makes sure non 'fatal' ICMP[v6] messages do not
abort the connection attempt.
It enables RFC 6069 for SYN_SENT sockets as a result.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Morley <morleyd@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
David used icsk->icsk_backoff field to track the number of linear timeouts.
Since then, tp->total_rto has been added.
This commit uses tp->total_rto instead of icsk->icsk_backoff
so that tcp_ld_RTO_revert() no longer can trigger an overflow
in inet_csk_rto_backoff(). Other than the potential UBSAN
report, there was no issue because receiving an ICMP message
currently aborts the connect().
In the following patch, we want to adhere to RFC 6069
and RFC 1122 4.2.3.9.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Morley <morleyd@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sockets used by udpgso_bench_tx aren't always ready when
udpgso_bench_tx transmits packets. This issue is more prevalent in -rt
kernels, but can occur in both. Replace the hacky sleep calls with a
function that checks whether the ports in the namespace are ready for
use.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela says:
====================
selftests: tc-testing: updates to tdc
- Patch 1 removes an obscure feature from tdc
- Patch 2 reworks the namespace and devices setup giving a nice speed
boost
- Patch 3 preloads all tc modules when running kselftests
- Patch 4 turns on parallel testing in kselftests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Leverage parallel tests in kselftests using all the available cpus.
We tested this in tuxsuite and locally extensively and it seems it's ready for prime time.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While running tdc tests in parallel it can race over the module loading
done by tc and fail the run with random errors.
So avoid this by preloading all modules before running tdc in kselftests.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As mentioned in the TC Workshop 0x17, our recent changes to tdc broke
downstream CI systems like tuxsuite. The issue is the classic problem
with rcu/workqueue objects where you can miss them if not enough wall time
has passed. The latter is subjective to the system and kernel config,
in my machine could be nanoseconds while in another could be microseconds
or more.
In order to make the suite deterministic, poll for the existence
of the objects in a reasonable manner. Talking netlink directly is the
the best solution in order to avoid paying the cost of multiple
'fork()' calls, so introduce a netlink based setup routine using
pyroute2. We leave the iproute2 one as a fallback when pyroute2 is not
available.
Also rework the iproute2 side to mimic the netlink routine where it
creates DEV0 as the peer of DEV1 and moves DEV1 into the net namespace.
This way when the namespace is deleted DEV0 is also deleted
automatically, leaving no margin for resource leaks.
Another bonus of this change is that our setup time sped up by a factor
of 2 when using netlink.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This argument would bypass the net namespace creation and run the test in
the root namespace, even if nsPlugin was specified.
Drop it as it's the same as commenting out the nsPlugin from a test and adds
additional complexity to the plugin code.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document bindings for Marvell Aquantia PHY.
The Marvell Aquantia PHY require a firmware to work correctly and there
at least 3 way to load this firmware.
Describe all the different way and document the binding "firmware-name"
to load the PHY firmware from userspace.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aquantia PHY-s require firmware to be loaded before they start operating.
It can be automatically loaded in case when there is a SPI-NOR connected
to Aquantia PHY-s or can be loaded from the host via MDIO.
This patch adds support for loading the firmware via MDIO as in most cases
there is no SPI-NOR being used to save on cost.
Firmware loading code itself is ported from mainline U-boot with cleanups.
The firmware has mixed values both in big and little endian.
PHY core itself is big-endian but it expects values to be in little-endian.
The firmware is little-endian but CRC-16 value for it is stored at the end
of firmware in big-endian.
It seems the PHY does the conversion internally from firmware that is
little-endian to the PHY that is big-endian on using the mailbox
but mailbox returns a big-endian CRC-16 to verify the written data
integrity.
Co-developed-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Robert Marko <robimarko@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move MMD_VEND define to header to clean things up and in preparation for
firmware loading support that require some define placed in
aquantia_main.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move aquantia PHY driver to separate driectory in preparation for
firmware loading support to keep things tidy.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shinas Rasheed says:
====================
Cleanup and optimizations to transmit code
Pad small packets to ETH_ZLEN before transmit, cleanup dma sync calls,
add xmit_more functionality and then further remove atomic
variable usage in the prior.
Changes:
V3:
- Stop returning NETDEV_TX_BUSY when ring is full in xmit_patch.
Change to inspect early if next packet can fit in ring instead of
current packet, and stop queue if not.
- Add smp_mb between stopping tx queue and checking if tx queue has
free entries again, in queue full check function to let reflect
IQ process completions that might have happened on other cpus.
- Update small packet padding patch changelog to give more info.
V2: https://lore.kernel.org/all/20231024145119.2366588-1-srasheed@marvell.com/
- Added patch for padding small packets to ETH_ZLEN, part of
optimization patches for transmit code missed out in V1
- Updated changelog to provide more details for dma_sync remove patch
- Updated changelog to use imperative tone in add xmit_more patch
V1: https://lore.kernel.org/all/20231023114449.2362147-1-srasheed@marvell.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace atomic variable "instr_pending" which represents number of
posted tx instructions pending completion, with host_write_idx and
flush_index variables in the xmit and completion processing respectively.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add xmit_more handling in tx datapath for octeon_ep pf.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup dma sync calls for scatter gather
mappings, since they are coherent allocations
and do not need explicit sync to be called.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pad small packets to ETH_ZLEN before transmit, as hardware
cannot pad and requires software padding to ensure
minimum ethernet frame length.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from Paolo Abeni:
"Including fixes from BPF and netfilter.
Current release - regressions:
- core: fix undefined behavior in netdev name allocation
- bpf: do not allocate percpu memory at init stage
- netfilter: nf_tables: split async and sync catchall in two
functions
- mptcp: fix possible NULL pointer dereference on close
Current release - new code bugs:
- eth: ice: dpll: fix initial lock status of dpll
Previous releases - regressions:
- bpf: fix precision backtracking instruction iteration
- af_unix: fix use-after-free in unix_stream_read_actor()
- tipc: fix kernel-infoleak due to uninitialized TLV value
- eth: bonding: stop the device in bond_setup_by_slave()
- eth: mlx5:
- fix double free of encap_header
- avoid referencing skb after free-ing in drop path
- eth: hns3: fix VF reset
- eth: mvneta: fix calls to page_pool_get_stats
Previous releases - always broken:
- core: set SOCK_RCU_FREE before inserting socket into hashtable
- bpf: fix control-flow graph checking in privileged mode
- eth: ppp: limit MRU to 64K
- eth: stmmac: avoid rx queue overrun
- eth: icssg-prueth: fix error cleanup on failing initialization
- eth: hns3: fix out-of-bounds access may occur when coalesce info is
read via debugfs
- eth: cortina: handle large frames
Misc:
- selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45"
* tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits)
macvlan: Don't propagate promisc change to lower dev in passthru
net: sched: do not offload flows with a helper in act_ct
net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
net/mlx5e: Check return value of snprintf writing to fw_version buffer
net/mlx5e: Reduce the size of icosq_str
net/mlx5: Increase size of irq name buffer
net/mlx5e: Update doorbell for port timestamping CQ before the software counter
net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
net/mlx5e: Fix pedit endianness
net/mlx5e: fix double free of encap_header in update funcs
net/mlx5e: fix double free of encap_header
net/mlx5: Decouple PHC .adjtime and .adjphase implementations
net/mlx5: DR, Allow old devices to use multi destination FTE
net/mlx5: Free used cpus mask when an IRQ is released
Revert "net/mlx5: DR, Supporting inline WQE when possible"
bpf: Do not allocate percpu memory at init stage
net: Fix undefined behavior in netdev name allocation
dt-bindings: net: ethernet-controller: Fix formatting error
...
Pull xen updates from Juergen Gross:
- A fix in the Xen events driver avoiding the use of RCU after
the call to rcu_report_dead() when taking a cpu down
- A fix for running as Xen dom0 to line up ACPI's idea of power
management capabilities with the one of Xen
- A cleanup eliminating several kernel-doc warnings in Xen related
code
- A cleanup series of the Xen events driver
* tag 'for-linus-6.7a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: remove some info_for_irq() calls in pirq handling
xen/events: modify internal [un]bind interfaces
xen/events: drop xen_allocate_irqs_dynamic()
xen/events: remove some simple helpers from events_base.c
xen/events: reduce externally visible helper functions
xen/events: remove unused functions
xen/events: fix delayed eoi list handling
xen/shbuf: eliminate 17 kernel-doc warnings
acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
xen/events: avoid using info_for_irq() in xen_send_IPI_one()
Pull virtio fixes from Michael Tsirkin:
"Bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-vdpa: fix use after free in vhost_vdpa_probe()
virtio_pci: Switch away from deprecated irq_set_affinity_hint
riscv, qemu_fw_cfg: Add support for RISC-V architecture
vdpa_sim_blk: allocate the buffer zeroed
virtio_pci: move structure to a header
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Remove unused variable causing compilation warning in nft_set_rbtree,
from Yang Li. This unused variable is a left over from previous
merge window.
2) Possible return of uninitialized in nf_conntrack_bridge, from
Linkui Xiao. This is there since nf_conntrack_bridge is available.
3) Fix incorrect pointer math in nft_byteorder, from Dan Carpenter.
Problem has been there since 2016.
4) Fix bogus error in destroy set element command. Problem is there
since this new destroy command was added.
5) Fix race condition in ipset between swap and destroy commands and
add/del/test control plane. This problem is there since ipset was
merged.
6) Split async and sync catchall GC in two function to fix unsafe
iteration over RCU. This is a fix-for-fix that was included in
the previous pull request.
* tag 'nf-23-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: split async and sync catchall in two functions
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
netfilter: nf_tables: bogus ENOENT when destroying element which does not exist
netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
netfilter: nf_conntrack_bridge: initialize err to 0
netfilter: nft_set_rbtree: Remove unused variable nft_net
====================
Link: https://lore.kernel.org/r/20231115184514.8965-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Macvlan device in passthru mode sets its lower device promiscuous mode
according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing it to
its own promiscuity setting. However, macvlan_change_rx_flags() function
doesn't check the mode before propagating such changes to the lower device
which can cause net_device->promiscuity counter overflow as illustrated by
reproduction example [0] and resulting dmesg log [1]. Fix the issue by
first verifying the mode in macvlan_change_rx_flags() function before
propagating promiscuous mode change to the lower device.
[0]:
ip link add macvlan1 link enp8s0f0 type macvlan mode passthru
ip link set macvlan1 promisc on
ip l set dev macvlan1 up
ip link set macvlan1 promisc off
ip l set dev macvlan1 down
ip l set dev macvlan1 up
[1]:
[ 5156.281724] macvlan1: entered promiscuous mode
[ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
[ 5156.287639] macvlan1: left promiscuous mode
[ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode
[ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
[ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.
Fixes: efdbd2b30c ("macvlan: Propagate promiscuity setting to lower devices.")
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>