Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOMIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
Improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improve the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcgs linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
// SPDX-License-Identifier: GPL-2.0
/*
 * PCI Peer 2 Peer DMA support.
 *
 * Copyright (c) 2016-2018, Logan Gunthorpe
 * Copyright (c) 2016-2017, Microsemi Corporation
 * Copyright (c) 2017, Christoph Hellwig
 * Copyright (c) 2018, Eideticom Inc.
 */

#define pr_fmt(fmt) "pci-p2pdma: " fmt
#include <linux/ctype.h>
#include <linux/dma-map-ops.h>
#include <linux/pci-p2pdma.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/genalloc.h>
#include <linux/memremap.h>
#include <linux/percpu-refcount.h>
#include <linux/random.h>
#include <linux/seq_buf.h>
#include <linux/xarray.h>

struct pci_p2pdma {
	struct gen_pool *pool;
	bool p2pmem_published;
	struct xarray map_types;
	struct p2pdma_provider mem[PCI_STD_NUM_BARS];
};

struct pci_p2pdma_pagemap {
	struct dev_pagemap pgmap;
	struct p2pdma_provider *mem;
};

static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
{
	return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
}

static ssize_t size_show(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_p2pdma *p2pdma;
	size_t size = 0;

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (p2pdma && p2pdma->pool)
		size = gen_pool_size(p2pdma->pool);
	rcu_read_unlock();

	return sysfs_emit(buf, "%zd\n", size);
}
static DEVICE_ATTR_RO(size);

static ssize_t available_show(struct device *dev, struct device_attribute *attr,
			      char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_p2pdma *p2pdma;
	size_t avail = 0;

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (p2pdma && p2pdma->pool)
		avail = gen_pool_avail(p2pdma->pool);
	rcu_read_unlock();

	return sysfs_emit(buf, "%zd\n", avail);
}
static DEVICE_ATTR_RO(available);

static ssize_t published_show(struct device *dev, struct device_attribute *attr,
			      char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_p2pdma *p2pdma;
	bool published = false;

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (p2pdma)
		published = p2pdma->p2pmem_published;
	rcu_read_unlock();

	return sysfs_emit(buf, "%d\n", published);
}
static DEVICE_ATTR_RO(published);

static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
		const struct bin_attribute *attr, struct vm_area_struct *vma)
{
	struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
	size_t len = vma->vm_end - vma->vm_start;
	struct pci_p2pdma *p2pdma;
	struct percpu_ref *ref;
	unsigned long vaddr;
	void *kaddr;
	int ret;

	/* prevent private mappings from being established */
	if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
		pci_info_ratelimited(pdev,
				     "%s: fail, attempted private mapping\n",
				     current->comm);
		return -EINVAL;
	}

	if (vma->vm_pgoff) {
		pci_info_ratelimited(pdev,
				     "%s: fail, attempted mapping with non-zero offset\n",
				     current->comm);
		return -EINVAL;
	}

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (!p2pdma) {
		ret = -ENODEV;
		goto out;
	}

	kaddr = (void *)gen_pool_alloc_owner(p2pdma->pool, len, (void **)&ref);
	if (!kaddr) {
		ret = -ENOMEM;
		goto out;
	}

	/*
	 * vm_insert_page() can sleep, so a reference is taken to mapping
	 * such that rcu_read_unlock() can be done before inserting the
	 * pages
	 */
	if (unlikely(!percpu_ref_tryget_live_rcu(ref))) {
		ret = -ENODEV;
		goto out_free_mem;
	}
	rcu_read_unlock();

	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
		struct page *page = virt_to_page(kaddr);

		/*
		 * Initialise the refcount for the freshly allocated page. As
		 * we have just allocated the page no one else should be
		 * using it.
		 */
		VM_WARN_ON_ONCE_PAGE(!page_ref_count(page), page);
		set_page_count(page, 1);
		ret = vm_insert_page(vma, vaddr, page);
		if (ret) {
			gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
			return ret;
		}
		percpu_ref_get(ref);
		put_page(page);
		kaddr += PAGE_SIZE;
		len -= PAGE_SIZE;
	}

	percpu_ref_put(ref);

	return 0;
out_free_mem:
	gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
out:
	rcu_read_unlock();
	return ret;
}

static const struct bin_attribute p2pmem_alloc_attr = {
	.attr = { .name = "allocate", .mode = 0660 },
	.mmap = p2pmem_alloc_mmap,
	/*
	 * Some places where we want to call mmap (ie. python) will check
	 * that the file size is greater than the mmap size before allowing
	 * the mmap to continue. To work around this, just set the size
	 * to be very large.
	 */
	.size = SZ_1T,
};

static struct attribute *p2pmem_attrs[] = {
	&dev_attr_size.attr,
	&dev_attr_available.attr,
	&dev_attr_published.attr,
	NULL,
};

static const struct bin_attribute *const p2pmem_bin_attrs[] = {
	&p2pmem_alloc_attr,
	NULL,
};

static const struct attribute_group p2pmem_group = {
	.attrs = p2pmem_attrs,
	.bin_attrs = p2pmem_bin_attrs,
	.name = "p2pmem",
};
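
/*
 * Usage sketch (illustrative, not part of the original file; the BDF in
 * the path is an example): with the "p2pmem" group registered, userspace
 * can carve memory out of the pool by mmap()ing the "allocate" attribute.
 * Per p2pmem_alloc_mmap() above, the mapping must be shared and must start
 * at offset 0:
 *
 *	int fd = open("/sys/bus/pci/devices/0000:01:00.0/p2pmem/allocate",
 *		      O_RDWR);
 *	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
 *			 fd, 0);
 *
 * The memory is returned to the pool when the pages are freed, via
 * p2pdma_folio_free() below.
 */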

static void p2pdma_folio_free(struct folio *folio)
{
	struct page *page = &folio->page;
	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_pgmap(page));
	/* safe to dereference while a reference is held to the percpu ref */
	struct pci_p2pdma *p2pdma = rcu_dereference_protected(
		to_pci_dev(pgmap->mem->owner)->p2pdma, 1);
	struct percpu_ref *ref;

	gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page),
			    PAGE_SIZE, (void **)&ref);
	percpu_ref_put(ref);
}

static const struct dev_pagemap_ops p2pdma_pgmap_ops = {
	.folio_free = p2pdma_folio_free,
};

static void pci_p2pdma_release(void *data)
{
	struct pci_dev *pdev = data;
	struct pci_p2pdma *p2pdma;

	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
	if (!p2pdma)
		return;

	/* Flush and disable pci_alloc_p2pmem() */
	pdev->p2pdma = NULL;
	if (p2pdma->pool)
		synchronize_rcu();
	xa_destroy(&p2pdma->map_types);

	if (!p2pdma->pool)
		return;

	gen_pool_destroy(p2pdma->pool);
	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
}

/**
 * pcim_p2pdma_init - Initialise peer-to-peer DMA providers
 * @pdev: The PCI device to enable P2PDMA for
 *
 * This function initializes the peer-to-peer DMA infrastructure
 * for a PCI device. It allocates and sets up the necessary data
 * structures to support P2PDMA operations, including mapping type
 * tracking.
 */
int pcim_p2pdma_init(struct pci_dev *pdev)
{
	struct pci_p2pdma *p2p;
	int i, ret;

	p2p = rcu_dereference_protected(pdev->p2pdma, 1);
	if (p2p)
		return 0;

	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
	if (!p2p)
		return -ENOMEM;

	xa_init(&p2p->map_types);
	/*
	 * Iterate over all standard PCI BARs and record only those that
	 * correspond to MMIO regions. Skip non-memory resources (e.g. I/O
	 * port BARs) since they cannot be used for peer-to-peer (P2P)
	 * transactions.
	 */
	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
		if (!(pci_resource_flags(pdev, i) & IORESOURCE_MEM))
			continue;

		p2p->mem[i].owner = &pdev->dev;
		p2p->mem[i].bus_offset =
			pci_bus_address(pdev, i) - pci_resource_start(pdev, i);
	}

	ret = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
	if (ret)
		goto out_p2p;

	rcu_assign_pointer(pdev->p2pdma, p2p);
	return 0;

out_p2p:
	devm_kfree(&pdev->dev, p2p);
	return ret;
}
EXPORT_SYMBOL_GPL(pcim_p2pdma_init);

/**
 * pcim_p2pdma_provider - Get a peer-to-peer DMA provider
 * @pdev: The PCI device to enable P2PDMA for
 * @bar: BAR index to get the provider for
 *
 * This function gets the peer-to-peer DMA provider for a PCI device. The
 * lifetime of the provider (and of course the MMIO) is bound to the lifetime
 * of the driver. A driver calling this function must ensure that all
 * references to the provider, and any DMA mappings created for any MMIO,
 * are all cleaned up before the driver remove() completes.
 *
 * Since P2P is almost always shared with a second driver, some mechanism to
 * notify, invalidate and revoke the MMIO's DMA mappings must be in place to
 * use this function. For example, a revoke can be built using DMABUF.
 */
struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar)
{
	struct pci_p2pdma *p2p;

	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
		return NULL;

	p2p = rcu_dereference_protected(pdev->p2pdma, 1);
	if (WARN_ON(!p2p))
		/* Someone forgot to call pcim_p2pdma_init() first */
		return NULL;

	return &p2p->mem[bar];
}
EXPORT_SYMBOL_GPL(pcim_p2pdma_provider);
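
/*
 * Usage sketch (illustrative; "4" is an example BAR index): a driver that
 * wants to hand out one of its MMIO BARs for P2P DMA first initializes the
 * infrastructure, then fetches the per-BAR provider:
 *
 *	err = pcim_p2pdma_init(pdev);
 *	if (err)
 *		return err;
 *	provider = pcim_p2pdma_provider(pdev, 4);
 *	if (!provider)
 *		return -EINVAL;
 */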

static int pci_p2pdma_setup_pool(struct pci_dev *pdev)
{
	struct pci_p2pdma *p2pdma;
	int ret;

	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
	if (p2pdma->pool)
		/* We already set up the pool, do nothing. */
		return 0;

	p2pdma->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
	if (!p2pdma->pool)
		return -ENOMEM;

	ret = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
	if (ret)
		goto out_pool_destroy;

	return 0;

out_pool_destroy:
	gen_pool_destroy(p2pdma->pool);
	p2pdma->pool = NULL;
	return ret;
}

static void pci_p2pdma_unmap_mappings(void *data)
{
	struct pci_p2pdma_pagemap *p2p_pgmap = data;

	/*
	 * Removing the alloc attribute from sysfs will call
	 * unmap_mapping_range() on the inode, teardown any existing userspace
	 * mappings and prevent new ones from being created.
	 */
	sysfs_remove_file_from_group(&p2p_pgmap->mem->owner->kobj,
				     &p2pmem_alloc_attr.attr,
				     p2pmem_group.name);
}

/**
 * pci_p2pdma_add_resource - add memory for use as p2p memory
 * @pdev: the device to add the memory to
 * @bar: PCI BAR to add
 * @size: size of the memory to add, may be zero to use the whole BAR
 * @offset: offset into the PCI BAR
 *
 * The memory will be given ZONE_DEVICE struct pages so that it may
 * be used with any DMA request.
 */
int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
			    u64 offset)
{
	struct pci_p2pdma_pagemap *p2p_pgmap;
	struct p2pdma_provider *mem;
	struct dev_pagemap *pgmap;
	struct pci_p2pdma *p2pdma;
	void *addr;
	int error;

	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
		return -EINVAL;

	if (offset >= pci_resource_len(pdev, bar))
		return -EINVAL;

	if (!size)
		size = pci_resource_len(pdev, bar) - offset;

	if (size + offset > pci_resource_len(pdev, bar))
		return -EINVAL;

	error = pcim_p2pdma_init(pdev);
	if (error)
		return error;

	error = pci_p2pdma_setup_pool(pdev);
	if (error)
		return error;

	mem = pcim_p2pdma_provider(pdev, bar);
	/*
	 * We checked the validity of the BAR prior to calling
	 * pcim_p2pdma_provider(), so it should never return NULL.
	 */
	if (WARN_ON(!mem))
		return -EINVAL;

	p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
	if (!p2p_pgmap)
		return -ENOMEM;

	pgmap = &p2p_pgmap->pgmap;
	pgmap->range.start = pci_resource_start(pdev, bar) + offset;
	pgmap->range.end = pgmap->range.start + size - 1;
	pgmap->nr_range = 1;
	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
	pgmap->ops = &p2pdma_pgmap_ops;
	p2p_pgmap->mem = mem;

	addr = devm_memremap_pages(&pdev->dev, pgmap);
	if (IS_ERR(addr)) {
		error = PTR_ERR(addr);
		goto pgmap_free;
	}

	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
					 p2p_pgmap);
	if (error)
		goto pages_free;

	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
			pci_bus_address(pdev, bar) + offset,
			range_len(&pgmap->range), dev_to_node(&pdev->dev),
			&pgmap->ref);
	if (error)
		goto pages_free;

	pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n",
		 pgmap->range.start, pgmap->range.end);

	return 0;

pages_free:
	devm_memunmap_pages(&pdev->dev, pgmap);
pgmap_free:
	devm_kfree(&pdev->dev, p2p_pgmap);
	return error;
}
EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
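
/*
 * Usage sketch (illustrative; the BAR index is an example): a driver
 * exposing all of BAR 4 as p2pmem and publishing it so that other drivers
 * can find it with pci_p2pmem_find_many():
 *
 *	err = pci_p2pdma_add_resource(pdev, 4, 0, 0);
 *	if (err)
 *		return err;
 *	pci_p2pmem_publish(pdev, true);
 */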

/*
 * Note this function returns the parent PCI device with a
 * reference taken. It is the caller's responsibility to drop
 * the reference.
 */
static struct pci_dev *find_parent_pci_dev(struct device *dev)
{
	struct device *parent;

	dev = get_device(dev);

	while (dev) {
		if (dev_is_pci(dev))
			return to_pci_dev(dev);

		parent = get_device(dev->parent);
		put_device(dev);
		dev = parent;
	}

	return NULL;
}

/*
 * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
 * TLPs upstream via ACS. Returns 1 if the packets will be redirected
 * upstream, 0 otherwise.
 */
static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
{
	int pos;
	u16 ctrl;

	pos = pdev->acs_cap;
	if (!pos)
		return 0;

	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);

	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
		return 1;

	return 0;
}

static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
{
	if (!buf)
		return;

	seq_buf_printf(buf, "%s;", pci_name(pdev));
}

static bool cpu_supports_p2pdma(void)
{
#ifdef CONFIG_X86
	struct cpuinfo_x86 *c = &cpu_data(0);

	/* Any AMD CPU whose family ID is Zen or newer supports p2pdma */
	if (c->x86_vendor == X86_VENDOR_AMD && c->x86 >= 0x17)
		return true;
#endif

	return false;
}

static const struct pci_p2pdma_whitelist_entry {
	unsigned short vendor;
	unsigned short device;
	enum {
		REQ_SAME_HOST_BRIDGE = 1 << 0,
	} flags;
} pci_p2pdma_whitelist[] = {
	/* Intel Xeon E5/Core i7 */
	{PCI_VENDOR_ID_INTEL, 0x3c00, REQ_SAME_HOST_BRIDGE},
	{PCI_VENDOR_ID_INTEL, 0x3c01, REQ_SAME_HOST_BRIDGE},
	/* Intel Xeon E7 v3/Xeon E5 v3/Core i7 */
	{PCI_VENDOR_ID_INTEL, 0x2f00, REQ_SAME_HOST_BRIDGE},
	{PCI_VENDOR_ID_INTEL, 0x2f01, REQ_SAME_HOST_BRIDGE},
	/* Intel Skylake-E */
	{PCI_VENDOR_ID_INTEL, 0x2030, 0},
	{PCI_VENDOR_ID_INTEL, 0x2031, 0},
	{PCI_VENDOR_ID_INTEL, 0x2032, 0},
	{PCI_VENDOR_ID_INTEL, 0x2033, 0},
	{PCI_VENDOR_ID_INTEL, 0x2020, 0},
	{PCI_VENDOR_ID_INTEL, 0x09a2, 0},
	{}
};

/*
 * If the first device on host's root bus is either devfn 00.0 or a PCIe
 * Root Port, return it. Otherwise return NULL.
 *
 * We often use a devfn 00.0 "host bridge" in the pci_p2pdma_whitelist[]
 * (though there is no PCI/PCIe requirement for such a device). On some
 * platforms, e.g., Intel Skylake, there is no such host bridge device, and
 * pci_p2pdma_whitelist[] may contain a Root Port at any devfn.
 *
 * This function is similar to pci_get_slot(host->bus, 0), but it does
 * not take the pci_bus_sem lock since __host_bridge_whitelist() must not
 * sleep.
 *
 * For this to be safe, the caller should hold a reference to a device on the
 * bridge, which should ensure the host_bridge device will not be freed
 * or removed from the head of the devices list.
 */
static struct pci_dev *pci_host_bridge_dev(struct pci_host_bridge *host)
{
	struct pci_dev *root;

	root = list_first_entry_or_null(&host->bus->devices,
					struct pci_dev, bus_list);

	if (!root)
		return NULL;

	if (root->devfn == PCI_DEVFN(0, 0))
		return root;

	if (pci_pcie_type(root) == PCI_EXP_TYPE_ROOT_PORT)
		return root;

	return NULL;
}

static bool __host_bridge_whitelist(struct pci_host_bridge *host,
				    bool same_host_bridge, bool warn)
{
	struct pci_dev *root = pci_host_bridge_dev(host);
	const struct pci_p2pdma_whitelist_entry *entry;
	unsigned short vendor, device;

	if (!root)
		return false;

	vendor = root->vendor;
	device = root->device;

	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
		if (vendor != entry->vendor || device != entry->device)
			continue;
		if (entry->flags & REQ_SAME_HOST_BRIDGE && !same_host_bridge)
			return false;

		return true;
	}

	if (warn)
		pci_warn(root, "Host bridge not in P2PDMA whitelist: %04x:%04x\n",
			 vendor, device);

	return false;
}

/*
 * If we can't find a common upstream bridge take a look at the root
 * complex and compare it to a whitelist of known good hardware.
 */
static bool host_bridge_whitelist(struct pci_dev *a, struct pci_dev *b,
				  bool warn)
{
	struct pci_host_bridge *host_a = pci_find_host_bridge(a->bus);
	struct pci_host_bridge *host_b = pci_find_host_bridge(b->bus);

	if (host_a == host_b)
		return __host_bridge_whitelist(host_a, true, warn);

	if (__host_bridge_whitelist(host_a, false, warn) &&
	    __host_bridge_whitelist(host_b, false, warn))
		return true;

	return false;
}

static unsigned long map_types_idx(struct pci_dev *client)
{
	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
}

/*
 * Calculate the P2PDMA mapping type and distance between two PCI devices.
 *
 * If the two devices are the same PCI function, return
 * PCI_P2PDMA_MAP_BUS_ADDR and a distance of 0.
 *
 * If they are two functions of the same device, return
 * PCI_P2PDMA_MAP_BUS_ADDR and a distance of 2 (one hop up to the bridge,
 * then one hop back down to another function of the same device).
 *
 * In the case where two devices are connected to the same PCIe switch,
 * return a distance of 4. This corresponds to the following PCI tree:
 *
 *     -+  Root Port
 *      \+ Switch Upstream Port
 *       +-+ Switch Downstream Port 0
 *       + \- Device A
 *       \-+ Switch Downstream Port 1
 *         \- Device B
 *
 * The distance is 4 because we traverse from Device A to Downstream Port 0
 * to the common Switch Upstream Port, back down to Downstream Port 1 and
 * then to Device B. The mapping type returned depends on the ACS
 * redirection setting of the ports along the path.
 *
 * If ACS redirect is set on any port in the path, traffic between the
 * devices will go through the host bridge, so return
 * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE; otherwise return
 * PCI_P2PDMA_MAP_BUS_ADDR.
 *
 * Any two devices that have a data path that goes through the host bridge
 * will consult a whitelist. If the host bridge is in the whitelist, return
 * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE with the distance set to the number of
 * ports per above. If the device is not in the whitelist, return
 * PCI_P2PDMA_MAP_NOT_SUPPORTED.
 */
static enum pci_p2pdma_map_type
calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
		       int *dist, bool verbose)
{
	enum pci_p2pdma_map_type map_type = PCI_P2PDMA_MAP_THRU_HOST_BRIDGE;
	struct pci_dev *a = provider, *b = client, *bb;
	bool acs_redirects = false;
	struct pci_p2pdma *p2pdma;
	struct seq_buf acs_list;
	int acs_cnt = 0;
	int dist_a = 0;
	int dist_b = 0;
	char buf[128];

	seq_buf_init(&acs_list, buf, sizeof(buf));

	/*
	 * Note, we don't need to take references to devices returned by
	 * pci_upstream_bridge() seeing we hold a reference to a child
	 * device which will already hold a reference to the upstream bridge.
	 */
	while (a) {
		dist_b = 0;

		if (pci_bridge_has_acs_redir(a)) {
			seq_buf_print_bus_devfn(&acs_list, a);
			acs_cnt++;
		}

		bb = b;

		while (bb) {
			if (a == bb)
				goto check_b_path_acs;

			bb = pci_upstream_bridge(bb);
			dist_b++;
		}

		a = pci_upstream_bridge(a);
		dist_a++;
	}

	*dist = dist_a + dist_b;
	goto map_through_host_bridge;

check_b_path_acs:
	bb = b;

	while (bb) {
		if (a == bb)
			break;

		if (pci_bridge_has_acs_redir(bb)) {
			seq_buf_print_bus_devfn(&acs_list, bb);
			acs_cnt++;
		}

		bb = pci_upstream_bridge(bb);
	}

	*dist = dist_a + dist_b;

	if (!acs_cnt) {
		map_type = PCI_P2PDMA_MAP_BUS_ADDR;
		goto done;
	}

	if (verbose) {
		acs_list.buffer[acs_list.len-1] = 0; /* drop final semicolon */
		pci_warn(client, "ACS redirect is set between the client and provider (%s)\n",
			 pci_name(provider));
		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
			 acs_list.buffer);
	}
	acs_redirects = true;

map_through_host_bridge:
	if (!cpu_supports_p2pdma() &&
	    !host_bridge_whitelist(provider, client, acs_redirects)) {
		if (verbose)
			pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider (%s) do not share an upstream bridge or whitelisted host bridge\n",
				 pci_name(provider));
		map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
	}
done:
	rcu_read_lock();
	p2pdma = rcu_dereference(provider->p2pdma);
	if (p2pdma)
		xa_store(&p2pdma->map_types, map_types_idx(client),
			 xa_mk_value(map_type), GFP_ATOMIC);
	rcu_read_unlock();
	return map_type;
}

/**
 * pci_p2pdma_distance_many - Determine the cumulative distance between
 *	a p2pdma provider and the clients in use.
 * @provider: p2pdma provider to check against the client list
 * @clients: array of devices to check (NULL-terminated)
 * @num_clients: number of clients in the array
 * @verbose: if true, print warnings for devices when we return -1
 *
 * Returns -1 if any of the clients are not compatible, otherwise returns a
 * positive number where a lower number is the preferable choice. (If there's
 * one client that's the same as the provider it will return 0, which is the
 * best choice).
 *
 * "compatible" means the provider and the clients are either all behind
 * the same PCI root port or the host bridges connected to each of the devices
 * are listed in the 'pci_p2pdma_whitelist'.
 */
int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
			     int num_clients, bool verbose)
{
	enum pci_p2pdma_map_type map;
	bool not_supported = false;
	struct pci_dev *pci_client;
	int total_dist = 0;
	int i, distance;

	if (num_clients == 0)
		return -1;

	for (i = 0; i < num_clients; i++) {
		pci_client = find_parent_pci_dev(clients[i]);
		if (!pci_client) {
			if (verbose)
				dev_warn(clients[i],
					 "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
			return -1;
		}

		map = calc_map_type_and_dist(provider, pci_client, &distance,
					     verbose);

		pci_dev_put(pci_client);

		if (map == PCI_P2PDMA_MAP_NOT_SUPPORTED)
			not_supported = true;

		if (not_supported && !verbose)
			break;

		total_dist += distance;
	}

	if (not_supported)
		return -1;

	return total_dist;
}
EXPORT_SYMBOL_GPL(pci_p2pdma_distance_many);

/**
 * pci_has_p2pmem - check if a given PCI device has published any p2pmem
 * @pdev: PCI device to check
 */
static bool pci_has_p2pmem(struct pci_dev *pdev)
{
	struct pci_p2pdma *p2pdma;
	bool res;

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	res = p2pdma && p2pdma->p2pmem_published;
	rcu_read_unlock();

	return res;
}

/**
 * pci_p2pmem_find_many - find a peer-to-peer DMA memory device compatible with
 *	the specified list of clients and shortest distance
 * @clients: array of devices to check (NULL-terminated)
 * @num_clients: number of client devices in the list
 *
 * If multiple devices are behind the same switch, the one "closest" to the
 * client devices in use will be chosen first. (So if one of the providers is
 * the same as one of the clients, that provider will be used ahead of any
 * other providers that are unrelated). If multiple providers are an equal
 * distance away, one will be chosen at random.
 *
 * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
 * to return the reference) or NULL if no compatible device is found. The
 * found provider will also be assigned to the client list.
 */
struct pci_dev *pci_p2pmem_find_many(struct device **clients, int num_clients)
{
	struct pci_dev *pdev = NULL;
	int distance;
	int closest_distance = INT_MAX;
	struct pci_dev **closest_pdevs;
	int dev_cnt = 0;
	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
	int i;

	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!closest_pdevs)
		return NULL;

	for_each_pci_dev(pdev) {
		if (!pci_has_p2pmem(pdev))
			continue;

		distance = pci_p2pdma_distance_many(pdev, clients,
						    num_clients, false);
		if (distance < 0 || distance > closest_distance)
			continue;

		if (distance == closest_distance && dev_cnt >= max_devs)
			continue;

		if (distance < closest_distance) {
			for (i = 0; i < dev_cnt; i++)
				pci_dev_put(closest_pdevs[i]);

			dev_cnt = 0;
			closest_distance = distance;
		}

		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
	}

	if (dev_cnt)
		pdev = pci_dev_get(closest_pdevs[get_random_u32_below(dev_cnt)]);

	for (i = 0; i < dev_cnt; i++)
		pci_dev_put(closest_pdevs[i]);

	kfree(closest_pdevs);
	return pdev;
}
EXPORT_SYMBOL_GPL(pci_p2pmem_find_many);
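
/*
 * Usage sketch (illustrative; the client devices are hypothetical): a
 * driver coordinating two DMA-capable devices can pick the closest
 * published p2pmem provider and carve a buffer out of it:
 *
 *	struct device *clients[] = { &nic_a->dev, &nic_b->dev, NULL };
 *	struct pci_dev *provider;
 *	void *buf;
 *
 *	provider = pci_p2pmem_find_many(clients, 2);
 *	if (!provider)
 *		return -ENODEV;
 *	buf = pci_alloc_p2pmem(provider, SZ_4K);
 *	...
 *	pci_free_p2pmem(provider, buf, SZ_4K);
 *	pci_dev_put(provider);
 */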

/**
 * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
 * @pdev: the device to allocate memory from
 * @size: number of bytes to allocate
 *
 * Returns the allocated memory or NULL on error.
 */
void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
{
	void *ret = NULL;
	struct percpu_ref *ref;
	struct pci_p2pdma *p2pdma;

	/*
	 * Pairs with synchronize_rcu() in pci_p2pdma_release() to
	 * ensure pdev->p2pdma is non-NULL for the duration of the
	 * read-lock.
	 */
	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (unlikely(!p2pdma))
		goto out;

	ret = (void *)gen_pool_alloc_owner(p2pdma->pool, size, (void **) &ref);
	if (!ret)
		goto out;

	if (unlikely(!percpu_ref_tryget_live_rcu(ref))) {
		gen_pool_free(p2pdma->pool, (unsigned long) ret, size);
		ret = NULL;
	}
out:
	rcu_read_unlock();
	return ret;
}
EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);

/**
 * pci_free_p2pmem - free peer-to-peer DMA memory
 * @pdev: the device the memory was allocated from
 * @addr: address of the memory that was allocated
 * @size: number of bytes that were allocated
 */
void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
{
	struct percpu_ref *ref;
	struct pci_p2pdma *p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);

	gen_pool_free_owner(p2pdma->pool, (uintptr_t)addr, size,
			    (void **) &ref);
	percpu_ref_put(ref);
}
EXPORT_SYMBOL_GPL(pci_free_p2pmem);

/**
 * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
 *	address obtained with pci_alloc_p2pmem()
 * @pdev: the device the memory was allocated from
 * @addr: address of the memory that was allocated
 */
pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
{
	struct pci_p2pdma *p2pdma;

	if (!addr)
		return 0;

	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
	if (!p2pdma)
		return 0;

	/*
	 * Note: when we added the memory to the pool we used the PCI
	 * bus address as the physical address. So gen_pool_virt_to_phys()
	 * actually returns the bus address despite the misleading name.
	 */
	return gen_pool_virt_to_phys(p2pdma->pool, (unsigned long)addr);
}
EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
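
/*
 * Usage sketch (illustrative; peer_mmio and DMA_ADDR_REG are hypothetical):
 * a peer device's DMA engine must be programmed with the PCI bus address of
 * the p2pmem, not its kernel virtual address:
 *
 *	void *buf = pci_alloc_p2pmem(provider, SZ_4K);
 *	pci_bus_addr_t bus_addr;
 *
 *	if (!buf)
 *		return -ENOMEM;
 *	bus_addr = pci_p2pmem_virt_to_bus(provider, buf);
 *	writeq(bus_addr, peer_mmio + DMA_ADDR_REG);
 */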

/**
 * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
 * @pdev: the device to allocate memory from
 * @nents: the number of SG entries in the list
 * @length: number of bytes to allocate
 *
 * Return: %NULL on error or &struct scatterlist pointer and @nents on success
 */
struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
					 unsigned int *nents, u32 length)
{
	struct scatterlist *sg;
	void *addr;

	sg = kmalloc(sizeof(*sg), GFP_KERNEL);
	if (!sg)
		return NULL;

	sg_init_table(sg, 1);

	addr = pci_alloc_p2pmem(pdev, length);
	if (!addr)
		goto out_free_sg;

	sg_set_buf(sg, addr, length);
	*nents = 1;
	return sg;

out_free_sg:
	kfree(sg);
	return NULL;
}
EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);

/**
 * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
 * @pdev: the device the memory was allocated from
 * @sgl: the allocated scatterlist
 */
void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
{
	struct scatterlist *sg;
	int count;

	for_each_sg(sgl, sg, INT_MAX, count) {
		if (!sg)
			break;

		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
	}
	kfree(sgl);
}
EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);

/**
 * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
 *	other devices with pci_p2pmem_find()
 * @pdev: the device with peer-to-peer DMA memory to publish
 * @publish: set to true to publish the memory, false to unpublish it
 *
 * Published memory can be used by other PCI device drivers for
 * peer-2-peer DMA operations. Non-published memory is reserved for
 * exclusive use of the device driver that registers the peer-to-peer
 * memory.
 */
void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
{
	struct pci_p2pdma *p2pdma;

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);
	if (p2pdma)
		p2pdma->p2pmem_published = publish;
	rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(pci_p2pmem_publish);

/**
 * pci_p2pdma_map_type - Determine the mapping type for P2PDMA transfers
 * @provider: P2PDMA provider structure
 * @dev: Target device for the transfer
 *
 * Determines how peer-to-peer DMA transfers should be mapped between
 * the provider and the target device. The mapping type indicates whether
 * the transfer can be done directly through PCI switches or must go
 * through the host bridge.
 */
enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
					     struct device *dev)
{
	enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
	struct pci_dev *pdev = to_pci_dev(provider->owner);
	struct pci_dev *client;
	struct pci_p2pdma *p2pdma;
	int dist;

	if (!pdev->p2pdma)
		return PCI_P2PDMA_MAP_NOT_SUPPORTED;

	if (!dev_is_pci(dev))
		return PCI_P2PDMA_MAP_NOT_SUPPORTED;

	client = to_pci_dev(dev);

	rcu_read_lock();
	p2pdma = rcu_dereference(pdev->p2pdma);

	if (p2pdma)
		type = xa_to_value(xa_load(&p2pdma->map_types,
					   map_types_idx(client)));
	rcu_read_unlock();

	if (type == PCI_P2PDMA_MAP_UNKNOWN)
		return calc_map_type_and_dist(pdev, client, &dist, true);

	return type;
}

void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
			       struct device *dev, struct page *page)
{
	struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page_pgmap(page));

	if (state->mem == p2p_pgmap->mem)
		return;

	state->mem = p2p_pgmap->mem;
	state->map = pci_p2pdma_map_type(p2p_pgmap->mem, dev);
}

/**
 * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
 *	to enable p2pdma
 * @page: contents of the value to be stored
 * @p2p_dev: returns the PCI device that was selected to be used
 *	(if one was specified in the stored value)
 * @use_p2pdma: returns whether to enable p2pdma or not
 *
 * Parses an attribute value to decide whether to enable p2pdma.
 * The value can select a PCI device (using its full BDF device
 * name) or a boolean (in any format kstrtobool() accepts). A false
 * value disables p2pdma; a true value expects the caller to
 * automatically find a compatible device; naming a specific PCI
 * device expects the caller to use that device as the provider.
 *
 * pci_p2pdma_enable_show() should be used as the show operation for
 * the attribute.
 *
 * Returns 0 on success
 */
int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
			    bool *use_p2pdma)
{
	struct device *dev;

	dev = bus_find_device_by_name(&pci_bus_type, NULL, page);
	if (dev) {
		*use_p2pdma = true;
		*p2p_dev = to_pci_dev(dev);

		if (!pci_has_p2pmem(*p2p_dev)) {
			pci_err(*p2p_dev,
				"PCI device has no peer-to-peer memory: %s\n",
				page);
			pci_dev_put(*p2p_dev);
			return -ENODEV;
		}

		return 0;
	} else if ((page[0] == '0' || page[0] == '1') && !iscntrl(page[1])) {
		/*
		 * If the user enters a PCI device that doesn't exist
		 * like "0000:01:00.1", we don't want kstrtobool to think
		 * it's a '0' when it's clearly not what the user wanted.
		 * So we require 0's and 1's to be exactly one character.
		 */
	} else if (!kstrtobool(page, use_p2pdma)) {
		return 0;
	}

	pr_err("No such PCI device: %.*s\n", (int)strcspn(page, "\n"), page);
	return -ENODEV;
}
EXPORT_SYMBOL_GPL(pci_p2pdma_enable_store);
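
/*
 * Usage sketch (illustrative; the attribute wiring is hypothetical): a
 * configfs store handler typically forwards its buffer straight to
 * pci_p2pdma_enable_store() and keeps the results for later use:
 *
 *	static ssize_t my_p2pmem_store(struct config_item *item,
 *				       const char *page, size_t count)
 *	{
 *		struct pci_dev *p2p_dev = NULL;
 *		bool use_p2pdma = false;
 *		int ret;
 *
 *		ret = pci_p2pdma_enable_store(page, &p2p_dev, &use_p2pdma);
 *		if (ret)
 *			return ret;
 *		// save p2p_dev/use_p2pdma for the matching show handler
 *		return count;
 *	}
 */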

/**
 * pci_p2pdma_enable_show - show a configfs/sysfs attribute indicating
 *	whether p2pdma is enabled
 * @page: contents of the stored value
 * @p2p_dev: the selected p2p device (NULL if no device is selected)
 * @use_p2pdma: whether p2pdma has been enabled
 *
 * Attributes that use pci_p2pdma_enable_store() should use this function
 * to show the value of the attribute.
 *
 * Returns 0 on success
 */
ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
			       bool use_p2pdma)
{
	if (!use_p2pdma)
		return sprintf(page, "0\n");

	if (!p2p_dev)
		return sprintf(page, "1\n");

	return sprintf(page, "%s\n", pci_name(p2p_dev));
}
EXPORT_SYMBOL_GPL(pci_p2pdma_enable_show);