NAME
    Data::DisjointSet::Shared - shared-memory union-find (disjoint-set) for
    Linux

SYNOPSIS
        use Data::DisjointSet::Shared;

        # a universe of 1000 elements (0 .. 999), anonymous shared mapping
        my $d = Data::DisjointSet::Shared->new(undef, 1000);

        $d->union(0, 1);          # merge the sets containing 0 and 1
        $d->union(1, 2);          # 0, 1, 2 are now one set
        $d->merge(3, 4);          # merge is an alias for union

        $d->connected(0, 2);      # true  -- same set
        $d->same(0, 3);           # false -- 'same' is an alias for connected
        $d->find(2);              # canonical representative (root) of 2's set
        $d->set_size(0);          # number of elements in 0's set (3)

        $d->num_sets;             # how many disjoint sets remain
        $d->capacity;             # the fixed element count (1000)

        # union many pairs in a single lock acquisition (flat [a0,b0,a1,b1,...])
        my $merged = $d->union_many([ 5,6, 6,7, 8,9 ]);   # returns how many merged

        $d->reset;                # back to all singletons (num_sets == capacity)

        # share across processes via a backing file
        my $shared = Data::DisjointSet::Shared->new("/tmp/dsu.bin", 1000);

DESCRIPTION
    A union-find (disjoint-set) structure in shared memory. It maintains a
    partition of a fixed universe of N integer elements (numbered "0 ..
    N-1") into disjoint sets. The three core operations are:

    *   union(a, b) -- merge the set containing "a" with the set containing
        "b" into a single set.

    *   find(x) -- return the *canonical representative* (the root) of the
        set containing "x". Two elements are in the same set if and only if
        they have the same root.

    *   connected(a, b) -- test whether "a" and "b" are currently in the
        same set.

    The structure uses path compression (path halving on "find") together
    with union by size (the larger-sized root always becomes the parent),
    which gives near-constant amortized time per operation (the
    inverse-Ackermann bound). Each element is one "parent" slot and one
    "size" slot, both 32-bit, so the payload is "8 * N" bytes regardless of
    how many unions are performed; the mapping also carries a fixed ~16 KB
    cross-process reader table, so the "mmap_size" stat reports the true
    total.

    Because the "parent"/"size" arrays live in a shared mapping, several
    processes share one structure: any process that opens the same backing
    file, inherits the anonymous mapping across "fork", or reopens a passed
    memfd, sees the others' unions and contributes its own. A
    write-preferring futex rwlock with dead-process recovery guards every
    mutation, so unions from many processes serialize cleanly and the final
    partition is well defined.

    IMPORTANT: "find", "connected" and "set_size" perform path compression
    -- they mutate the structure to keep future queries fast -- and
    therefore acquire the write lock. They are *not* read-only operations.
    Only "num_sets" and "capacity" are cheap reads ("capacity" is lock-free;
    it is immutable after construction).

    There is no operation that unions one whole structure into another. A
    disjoint-set is a partition of one fixed universe; combining it with a
    separate structure is not meaningful. To combine elements, use "union"
    or "union_many" within a single structure. Linux-only. Requires 64-bit
    Perl.

METHODS
  Constructors
        my $d = Data::DisjointSet::Shared->new($path, $n);
        my $d = Data::DisjointSet::Shared->new(undef, $n);          # anonymous
        my $d = Data::DisjointSet::Shared->new_memfd($name, $n);
        my $d = Data::DisjointSet::Shared->new_from_fd($fd);

    $path is the backing file ("undef" or omitted for an anonymous mapping).
    $n is the number of elements: the universe is "0 .. $n-1". It must be
    ">= 1" and "<= 2**31"; "new" and "new_memfd" croak if $n is out of
    range. A freshly created structure starts with every element in its own
    singleton set, so "num_sets == $n" and "set_size($i) == 1" for every $i.

    When reopening an existing file or memfd, the stored "n" wins and the
    existing partition is preserved; the $n you pass to "new" on a reopen is
    only used when the file is brand new. "new_memfd" creates a Linux memfd
    (transferable via its "memfd" descriptor); "new_from_fd" reopens one in
    another process.

  Union and query
        my $merged = $d->union($a, $b);      # merge; 1 if newly merged, 0 if already together
        my $merged = $d->merge($a, $b);      # alias for union
        my $root   = $d->find($x);           # canonical representative of $x's set
        my $bool   = $d->connected($a, $b);  # true if $a and $b are in the same set
        my $bool   = $d->same($a, $b);       # alias for connected
        my $size   = $d->set_size($x);       # number of elements in $x's set

    "union" merges the set containing $a with the set containing $b and
    returns 1 if they were in different sets (a merge happened, and
    "num_sets" decreased by one) or 0 if they were already in the same set
    (nothing changed). "merge" is a short alias.

    "find" returns the root of $x's set -- a stable representative that is
    equal for all members of the same set. "connected" (aliased "same")
    returns true when $a and $b share a root. "set_size" returns the number
    of elements in $x's set.

    All four of "find", "connected", "set_size" and "union" take the write
    lock: "union" obviously mutates, and "find"/"connected"/"set_size"
    mutate via path compression. Every index argument must be in "0 ..
    capacity-1"; an out-of-range index croaks before any lock is taken (so a
    caught croak never leaks a lock).

  Bulk union
        my $merged = $d->union_many(\@pairs);

    "union_many" takes an array reference of even length holding a flat list
    of pairs, "[a0, b0, a1, b1, ...]", and unions each consecutive "(a, b)"
    pair under a single write lock. It returns the number of pairs that
    actually caused a merge (i.e. the count of "union" calls that would have
    returned 1).

    The batch is atomic with respect to validation: every index is resolved
    and range-checked before the lock is taken, so an odd-length array or
    any out-of-range index croaks without performing "any" union and without
    holding the lock. An empty array reference is a valid no-op that
    performs zero unions.

  Partition size
        my $sets = $d->num_sets;             # current number of disjoint sets
        my $sets = $d->sets;                 # alias for num_sets
        my $n    = $d->capacity;             # the fixed element count (immutable)

    "num_sets" (aliased "sets") is the current count of disjoint sets: it
    starts equal to "capacity" and decreases by one on each successful
    "union". "capacity" is the number of elements the structure was created
    with and never changes. Both are read-only; "capacity" is lock-free.

  Lifecycle
        $d->reset;                           # back to all singletons
        $d->clear;                           # alias for reset
        $d->path; $d->memfd; $d->sync; $d->unlink;   # or Class->unlink($path)

    "reset" (aliased "clear") returns every element to its own singleton
    set, so "num_sets" becomes "capacity" again. "sync" flushes the mapping
    to its backing store (a no-op for anonymous and memfd structures, which
    have none); "unlink" removes the backing file (also callable as
    "Class->unlink($path)"); "path" returns the backing path ("undef" for
    anonymous, memfd, or fd-reopened structures) and "memfd" the backing
    descriptor -- the memfd of a "new_memfd" structure or the dup'd fd of a
    "new_from_fd" structure, and -1 for file-backed or anonymous structures.

STATS
    stats() returns a hashref describing the structure:

    *   "capacity" -- the fixed number of elements ("0 .. capacity-1").

    *   "sets" -- the current number of disjoint sets ("num_sets").

    *   "ops" -- running count of operations that took the write lock
        ("union", "union_many", "find", "connected", "set_size", "reset").

    *   "mmap_size" -- bytes of the shared mapping.

SHARING ACROSS PROCESSES
    The structure lives in a shared mapping, shared the same three ways as
    the rest of the family: a backing file (every process calls "new($path,
    $n)" on the same path), an anonymous mapping inherited across "fork", or
    a memfd whose descriptor is passed to an unrelated process (over a UNIX
    socket via "SCM_RIGHTS", or via "/proc/$pid/fd/$n") and reopened with
    new_from_fd($fd). Because the mapping is shared, every process unions
    into and queries the same partition. All mutation is serialized by the
    write lock, so the final partition is independent of how the processes
    interleave.

        # parent and children share one disjoint-set with no coordination
        my $d = Data::DisjointSet::Shared->new(undef, 100);   # before fork
        unless (fork) { $d->union($_, $_ + 1) for 0 .. 49; exit }
        wait;
        print $d->num_sets, "\n";   # reflects the child's unions

SECURITY
    The mmap region is writable by all processes that open it. Do not share
    backing files with untrusted processes.

CRASH SAFETY
    Mutation is guarded by a futex-based write-preferring rwlock with
    PID-encoded ownership; if a holder dies, the next contender detects the
    dead owner and recovers. Because "union" updates a couple of words while
    holding the lock, a crash leaves the partition consistent up to the last
    completed "union". Limitation: PID reuse is not detected (very unlikely
    in practice).

SEE ALSO
    Data::Histogram::Shared, Data::CountMinSketch::Shared,
    Data::HyperLogLog::Shared, Data::BloomFilter::Shared,
    Data::Intern::Shared, Data::SortedSet::Shared,
    Data::SpatialHash::Shared, and the rest of the "Data::*::Shared" family.

AUTHOR
    vividsnow

LICENSE
    This is free software; you can redistribute it and/or modify it under
    the same terms as Perl itself.