mississippi: multiple-source shortest paths and betweenness centrality in planar graphs (C)

David Eisenstat

March 2013, revised June 2013 and April 2014

Download

The latest release is April 2014.

Introduction

mississippi is an MIT-licensed C implementation of the Cabello–Chambers–Erickson algorithm for multiple-source shortest paths (MSSP) in planar graphs with nonnegative lengths. Also included is my algorithm for exact betweenness centrality in planar graphs, which uses Cabello–Chambers–Erickson as a subroutine and is described in Chapter 6 of my PhD dissertation Toward practical planar graph algorithms. On my quad-core workstation, this algorithm computes betweenness centrality for the entire planarized road network of the United States (28.1 million vertices and 35.7 million edges) in less than two and a half hours.

The April 2014 release of mississippi is intended primarily for researchers who wish to replicate my dissertation experiments. Nevertheless, I have worked hard to ensure that it is correct in all observable aspects and that the MSSP library conforms to the C99 standard.1 I have taken particular care to handle shortest paths whose length exceeds INT_MAX.

Version history

March 2013 was the first public release. April 2014 added the rand experiment and c.c.

Executable files

The scripts assume that the current working directory is the root directory of the archive. To rerun the experiments, execute the following commands.

scripts/get_shapefiles
scripts/make_inputs
scripts/run_compare
scripts/run_parbc
scripts/run_scale
scripts/run_sssp
scripts/run_rand
scripts/make_data

For the binaries stv and linv and bc and parbc, the enumeration constant DL controls the debug level. Level 1 enables linear-time checks. Level 2 enables more expensive checks. Level 3 enables verbose output.

bin/convert (map/main/convert.c)

Reads from stdin a catenation of TIGER/Line Shapefiles containing road data. Interprets the PolyLine shapes therein as a planar straight-line graph with Euclidean lengths, deletes all self-loops, elides vertices of degree two while leaving new self-loops intact, and writes to stdout a struct ms_gr with native layout. The optional first argument specifies a path to which the array of vertex coordinates is written with native layout.

bin/stv and bin/linv (submain/test.c)

Reads from stdin a struct ms_gr with native layout. Computes for every vertex a shortest-path tree rooted at that vertex and writes performance data to stdout.

bin/bc (submain/bc.c) and bin/parbc (submain/parbc.c)

Reads from stdin a struct ms_gr with native layout. Computes for every vertex the betweenness centrality of that vertex and writes performance data to stdout. The environment variable NUM_THREADS specifies the number of threads spawned by parbc.

bin/sssp (submain/sssp.c)

Reads from stdin a struct ms_gr with native layout. Computes a shortest-path forest and writes performance data to stdout.

bin/rand (submain/rand.c)

Reads from stdin a struct ms_gr with native layout. Writes to stderr one thousand samples of the number of changes between two random root vertices and writes performance data to stdout.

scripts/get_shapefiles

Downloads many gigabytes of road data from the United States Census Bureau, writing into the directory ~/shapefiles.

scripts/make_inputs

Converts the road data to graph and coordinate files, writing into the directory ~/inputs. Memory required: 18 GiB.

scripts/run_compare

Runs linv and stv on the state/territory graph files.

scripts/run_parbc

Runs parbc with NUM_THREADS=$(getconf _NPROCESSORS_ONLN) on the state/territory graph files.

scripts/run_scale

Runs bc and parbc with varying numbers of threads on the graph files ~/inputs/48.gr (Texas) and ~/inputs/us.gr. Memory required: 1 GiB plus 1.5 GiB per thread.

scripts/run_sssp

Runs sssp on the graph file ~/inputs/us.gr.

scripts/run_rand

Runs rand on the graph file ~/inputs/us.gr.

scripts/make_data

Combines the raw data in the directory results into three files, writing into the directory data.

Internals

Hic sunt dracones.

Planar graphs

Planar graphs are stored as rotation systems with asymmetric lengths. They have type struct ms_gr, whose last member is a C99 flexible array of struct ms_d.

    struct ms_d {
      int vdi;
      int vi;
      int fi;
      int l;
    };

    struct ms_gr {
      int nc;
      int nv;
      int ne;
      int nf;
      struct ms_d d[];
    };

Member nc is the number of connected components. Member nv is the number of vertices. Member ne is the number of edges, at most INT_MAX / 2 - 1. Member nf is the number of faces. The quantities nv - ne + nf and nc * 2 are equal. Member d has length (1 + ne) * 2.

Edges are integers between 1 and ne inclusive. Every edge ei consists of two half-edges or darts ei * 2 and ei * 2 + 1 that are reverse darts of each other. Consequently, darts are integers between 2 and (1 + ne) * 2 exclusive, and the reverse dart of each dart di is di ^ 1.

Each dart di has a head vertex d[di].vi between 0 and nv exclusive, a right face d[di].fi between 0 and nf exclusive, and a nonnegative length d[di].l. For every dart di, the tail vertex and left face of di are, respectively, the head vertex and right face of the reverse dart di ^ 1. Every vertex is the head vertex of some dart, and every face is the right face of some dart. For every edge, the sum of the lengths of its darts is at most INT_MAX. There is no way to specify a unidirectional edge. One workaround is to make the dart that should be missing very long.

The maps did[di].vdi and did[di].vdi ^ 1 are permutations. For every dart di, dart d[di].vdi is the next dart with the same head vertex as di in counterclockwise order. Dart d[di].vdi ^ 1 the next dart with the same right face as di in clockwise order. Two darts have the same head vertex if and only if they belong to the same orbit of did[di].vdi. They have the same right face if and only if they belong to the same orbit of did[di].vdi ^ 1.

Forests

Forests (more precisely, sets of arborescences whose unions comprise spanning forests) are stored as arrays int *pr. For every nonroot vertex vi, the value pr[vi] is the dart in the forest with head vertex vi. The other values are 0.

Operations (ms.h)

All operations assert their success. All except ms_wrgr and ms_chr allocate temporary storage.

gr.c

    struct ms_gr *ms_rdgr(FILE *fp);

Reads a graph with native layout from the stream fp into storage obtained by calling malloc. Running time: O(n).

    void ms_wrgr(struct ms_gr const *gr, FILE *fp);

Writes the graph gr with native layout to the stream fp. Running time: O(n).

    void ms_xgr(struct ms_gr *gr);

The graph gr must be valid except for nc and nf and fi. Sets those members to make gr valid. Running time: O(n).

    void ms_ckgr(struct ms_gr const *gr);

The array gr->d must have length (1 + gr->ne) * 2. Asserts that gr is a valid planar graph. Running time: O(n).

pr.c

    void ms_ckpr(struct ms_gr const *gr, int const *pr);

The array pr must have length gr->nv. Asserts that pr is a valid forest. Running time: O(n).

    int *ms_mkt(struct ms_gr const *gr, int const *pr);

In storage obtained by calling malloc, returns the vertices of the graph gr in the order that they are first visited by root-first noncrossing Euler tours of the forest pr. Running time: O(n).

sp.c

    int *ms_mksp(struct ms_gr const *gr, unsigned **kp);

In storage obtained by calling malloc, returns a shortest-path forest and, via *kp, its vertex distances modulo UINT_MAX + 1. Since every dart has length at most INT_MAX, the distances compared by Dijkstra’s algorithm differ by at most that much, and such comparisons require only the remainders. Running time: O(n log n).

    void ms_cksp(struct ms_gr const *gr, int const *pr, unsigned const *k);

The arrays pr and k must have length gr->nv. Asserts that pr is a valid shortest-path forest whose vertex distances modulo UINT_MAX + 1 are k. Running time: O(n).

    unsigned *ms_mkk(struct ms_gr const *gr, int const *pr);

In storage obtained by calling malloc, returns the vertex distances modulo UINT_MAX + 1 of the forest pr. Running time: O(n).

f.c

    struct ms_f *ms_mkf(struct ms_gr const *gr, int const *pr, unsigned const *k);

In storage obtained by calling malloc, constructs the data structures used by the multiple-source shortest-path algorithm. Running time: O(n log n) for stv, O(n) for linv.

    void ms_chr(struct ms_gr const *gr, int *pr, struct ms_f *f,
                void (*op)(void *o, int cdi, int ldi), void *o,
                int rvi);

Changes to rvi the root vertex of the shortest-path tree of pr to which rvi belongs. The changes to the shortest-path tree are reflected in pr and then reported by calling *op with first argument o. Argument cdi is the dart that was deleted, or 0 if no deletion occurred. Argument ldi is the dart that was inserted, or 0 if no insertion occurred. These darts, if they both exist, have the same head vertex. Running time: O(k log n) amortized for stv, O(k n) worst-case for linv; k is the number of changes.

The argument f is obtained by calling ms_mkf. Every call to ms_chr involving f must have the same arguments gr and pr as the call to ms_mkf. The graph gr must not change between the calls to ms_mkf and ms_chr. The shortest-path forest pr must change only by valid calls to ms_chr.

dt.c

stv/dt.c and linv/dt.c implement the same abstract data type for dynamic trees. stv provides a Sleator–Tarjan tree whose operations run in time O(log n) amortized. linv provides a naive tree whose link and cut operations run in constant time but whose path operations run in time proportional to the length of the path.

mst.c

    int *ms_mkmst(struct ms_gr const *gr);

In storage obtained by calling malloc, returns a minimum spanning forest. Every edge has length equal to the sum of the lengths of its darts, perturbed for uniqueness by the index of that edge times a fixed infinitesimal. Running time: O(n), via alternating primal and dual Borůvka passes.

    void ms_ckmst(struct ms_gr const *gr, int const *pr);

The array pr must have length gr->nv. Asserts that pr is a minimum spanning forest for the edge lengths specified above. Running time: O(n2).

c.c

    int *ms_mkvci(struct ms_gr const *gr);

In storage obtained by calling malloc, returns a labeling of vertices by connected component. The labels are integers between 1 and gr->nc inclusive. Running time: O(n).

    void ms_ckvci(struct ms_gr const *gr, int const *vci);

The array vci must have length gr->nv. Asserts that vci is a valid output of ms_mkvci. Running time: O(n).


© 2013–2014 David Eisenstat


  1. mississippi requires that the representation of a signed integer type have the same number of padding bits as that of the corresponding unsigned type. Recently I became aware that the C99 standard does not. If you’re wondering why I didn’t use exact-width types, consider what happens when two large uint32_ts are added in an execution environment with 33-bit ints. It’s long past time for the C standards committee to declare such pathological environments to be second-class citizens.↩︎