mississippi: multiple-source shortest paths and betweenness centrality in planar graphs (C)

- Download
- Introduction
- Version history
- Executable files
- bin/convert (map/main/convert.c)
- bin/stv and bin/linv (submain/test.c)
- bin/bc (submain/bc.c) and bin/parbc (submain/parbc.c)
- bin/sssp (submain/sssp.c)
- bin/rand (submain/rand.c)
- scripts/get_shapefiles
- scripts/make_inputs
- scripts/run_compare
- scripts/run_parbc
- scripts/run_scale
- scripts/run_sssp
- scripts/run_rand
- scripts/make_data

- Internals

The latest release is April 2014.

`mississippi`

is an MIT-licensed C implementation of the Cabello–Chambers–Erickson algorithm for multiple-source shortest paths (MSSP) in planar graphs with nonnegative lengths. Also included is my algorithm for exact betweenness centrality in planar graphs, which uses Cabello–Chambers–Erickson as a subroutine and is described in Chapter 6 of my PhD dissertation *Toward practical planar graph algorithms*. On my quad-core workstation, this algorithm computes betweenness centrality for the entire planarized road network of the United States (28.1 million vertices and 35.7 million edges) in less than two and a half hours.

The April 2014 release of `mississippi`

is intended primarily for researchers who wish to replicate my dissertation experiments. Nevertheless, I have worked hard to ensure that it is correct in all observable aspects and that the MSSP library conforms to the C99 standard.^{1} I have taken particular care to handle shortest paths whose length exceeds `INT_MAX`

.

March 2013 was the first public release. April 2014 added the `rand`

experiment and `c.c`

.

The scripts assume that the current working directory is the root directory of the archive. To rerun the experiments, execute the following commands.

```
scripts/get_shapefiles
scripts/make_inputs
scripts/run_compare
scripts/run_parbc
scripts/run_scale
scripts/run_sssp
scripts/run_rand
scripts/make_data
```

For the binaries `stv`

and `linv`

and `bc`

and `parbc`

, the enumeration constant `DL`

controls the debug level. Level `1`

enables linear-time checks. Level `2`

enables more expensive checks. Level `3`

enables verbose output.

Reads from `stdin`

a catenation of TIGER/Line Shapefiles containing road data. Interprets the PolyLine shapes therein as a planar straight-line graph with Euclidean lengths, deletes all self-loops, elides vertices of degree two while leaving new self-loops intact, and writes to `stdout`

a `struct ms_gr`

with native layout. The optional first argument specifies a path to which the array of vertex coordinates is written with native layout.

Reads from `stdin`

a `struct ms_gr`

with native layout. Computes for every vertex a shortest-path tree rooted at that vertex and writes performance data to `stdout`

.

Reads from `stdin`

a `struct ms_gr`

with native layout. Computes for every vertex the betweenness centrality of that vertex and writes performance data to `stdout`

. The environment variable `NUM_THREADS`

specifies the number of threads spawned by `parbc`

.

Reads from `stdin`

a `struct ms_gr`

with native layout. Computes a shortest-path forest and writes performance data to `stdout`

.

Reads from `stdin`

a `struct ms_gr`

with native layout. Writes to `stderr`

one thousand samples of the number of changes between two random root vertices and writes performance data to `stdout`

.

Downloads many gigabytes of road data from the United States Census Bureau, writing into the directory `~/shapefiles`

.

Converts the road data to graph and coordinate files, writing into the directory `~/inputs`

. Memory required: 18 GiB.

Runs `linv`

and `stv`

on the state/territory graph files.

Runs `parbc`

with `NUM_THREADS=$(getconf _NPROCESSORS_ONLN)`

on the state/territory graph files.

Runs `bc`

and `parbc`

with varying numbers of threads on the graph files `~/inputs/48.gr`

(Texas) and `~/inputs/us.gr`

. Memory required: 1 GiB plus 1.5 GiB per thread.

Runs `sssp`

on the graph file `~/inputs/us.gr`

.

Runs `rand`

on the graph file `~/inputs/us.gr`

.

Combines the raw data in the directory `results`

into three files, writing into the directory `data`

.

Hic sunt dracones.

Planar graphs are stored as rotation systems with asymmetric lengths. They have type `struct ms_gr`

, whose last member is a C99 flexible array of `struct ms_d`

.

```
struct ms_d {
int vdi;
int vi;
int fi;
int l;
};
struct ms_gr {
int nc;
int nv;
int ne;
int nf;
struct ms_d d[];
};
```

Member `nc`

is the number of connected components. Member `nv`

is the number of vertices. Member `ne`

is the number of edges, at most `INT_MAX / 2 - 1`

. Member `nf`

is the number of faces. The quantities `nv - ne + nf`

and `nc * 2`

are equal. Member `d`

has length `(1 + ne) * 2`

.

Edges are integers between `1`

and `ne`

inclusive. Every edge `ei`

consists of two half-edges or *darts* `ei * 2`

and `ei * 2 + 1`

that are reverse darts of each other. Consequently, darts are integers between `2`

and `(1 + ne) * 2`

exclusive, and the reverse dart of each dart `di`

is `di ^ 1`

.

Each dart `di`

has a head vertex `d[di].vi`

between `0`

and `nv`

exclusive, a right face `d[di].fi`

between `0`

and `nf`

exclusive, and a nonnegative length `d[di].l`

. For every dart `di`

, the tail vertex and left face of `di`

are, respectively, the head vertex and right face of the reverse dart `di ^ 1`

. Every vertex is the head vertex of some dart, and every face is the right face of some dart. For every edge, the sum of the lengths of its darts is at most `INT_MAX`

. **There is no way to specify a unidirectional edge.** One workaround is to make the dart that should be missing very long.

The maps `di`

↦ `d[di].vdi`

and `di`

↦ `d[di].vdi ^ 1`

are permutations. For every dart `di`

, dart `d[di].vdi`

is the next dart with the same head vertex as `di`

in counterclockwise order. Dart `d[di].vdi ^ 1`

the next dart with the same right face as `di`

in clockwise order. Two darts have the same head vertex if and only if they belong to the same orbit of `di`

↦ `d[di].vdi`

. They have the same right face if and only if they belong to the same orbit of `di`

↦ `d[di].vdi ^ 1`

.

Forests (more precisely, sets of arborescences whose unions comprise spanning forests) are stored as arrays `int *pr`

. For every nonroot vertex `vi`

, the value `pr[vi]`

is the dart in the forest with head vertex `vi`

. The other values are `0`

.

All operations assert their success. All except `ms_wrgr`

and `ms_chr`

allocate temporary storage.

` struct ms_gr *ms_rdgr(FILE *fp);`

Reads a graph with native layout from the stream `fp`

into storage obtained by calling `malloc`

. Running time: *O*(*n*).

` void ms_wrgr(struct ms_gr const *gr, FILE *fp);`

Writes the graph `gr`

with native layout to the stream `fp`

. Running time: *O*(*n*).

` void ms_xgr(struct ms_gr *gr);`

The graph `gr`

must be valid except for `nc`

and `nf`

and `fi`

. Sets those members to make `gr`

valid. Running time: *O*(*n*).

` void ms_ckgr(struct ms_gr const *gr);`

The array `gr->d`

must have length `(1 + gr->ne) * 2`

. Asserts that `gr`

is a valid planar graph. Running time: *O*(*n*).

` void ms_ckpr(struct ms_gr const *gr, int const *pr);`

The array `pr`

must have length `gr->nv`

. Asserts that `pr`

is a valid forest. Running time: *O*(*n*).

` int *ms_mkt(struct ms_gr const *gr, int const *pr);`

In storage obtained by calling `malloc`

, returns the vertices of the graph `gr`

in the order that they are first visited by root-first noncrossing Euler tours of the forest `pr`

. Running time: *O*(*n*).

` int *ms_mksp(struct ms_gr const *gr, unsigned **kp);`

In storage obtained by calling `malloc`

, returns a shortest-path forest and, via `*kp`

, its vertex distances modulo `UINT_MAX + 1`

. Since every dart has length at most `INT_MAX`

, the distances compared by Dijkstra’s algorithm differ by at most that much, and such comparisons require only the remainders. Running time: *O*(*n* log *n*).

` void ms_cksp(struct ms_gr const *gr, int const *pr, unsigned const *k);`

The arrays `pr`

and `k`

must have length `gr->nv`

. Asserts that `pr`

is a valid shortest-path forest whose vertex distances modulo `UINT_MAX + 1`

are `k`

. Running time: *O*(*n*).

` unsigned *ms_mkk(struct ms_gr const *gr, int const *pr);`

In storage obtained by calling `malloc`

, returns the vertex distances modulo `UINT_MAX + 1`

of the forest `pr`

. Running time: *O*(*n*).

` struct ms_f *ms_mkf(struct ms_gr const *gr, int const *pr, unsigned const *k);`

In storage obtained by calling `malloc`

, constructs the data structures used by the multiple-source shortest-path algorithm. Running time: *O*(*n* log *n*) for `stv`

, *O*(*n*) for `linv`

.

```
void ms_chr(struct ms_gr const *gr, int *pr, struct ms_f *f,
void (*op)(void *o, int cdi, int ldi), void *o,
int rvi);
```

Changes to `rvi`

the root vertex of the shortest-path tree of `pr`

to which `rvi`

belongs. The changes to the shortest-path tree are reflected in `pr`

and then reported by calling `*op`

with first argument `o`

. Argument `cdi`

is the dart that was deleted, or `0`

if no deletion occurred. Argument `ldi`

is the dart that was inserted, or `0`

if no insertion occurred. These darts, if they both exist, have the same head vertex. Running time: *O*(*k* log *n*) amortized for `stv`

, *O*(*k* *n*) worst-case for `linv`

; *k* is the number of changes.

The argument `f`

is obtained by calling `ms_mkf`

. Every call to `ms_chr`

involving `f`

must have the same arguments `gr`

and `pr`

as the call to `ms_mkf`

. The graph `gr`

must not change between the calls to `ms_mkf`

and `ms_chr`

. The shortest-path forest `pr`

must change only by valid calls to `ms_chr`

.

`stv/dt.c`

and `linv/dt.c`

implement the same abstract data type for dynamic trees. `stv`

provides a Sleator–Tarjan tree whose operations run in time *O*(log *n*) amortized. `linv`

provides a naive tree whose link and cut operations run in constant time but whose path operations run in time proportional to the length of the path.

` int *ms_mkmst(struct ms_gr const *gr);`

In storage obtained by calling `malloc`

, returns a minimum spanning forest. Every edge has length equal to the sum of the lengths of its darts, perturbed for uniqueness by the index of that edge times a fixed infinitesimal. Running time: *O*(*n*), via alternating primal and dual Borůvka passes.

` void ms_ckmst(struct ms_gr const *gr, int const *pr);`

The array `pr`

must have length `gr->nv`

. Asserts that `pr`

is a minimum spanning forest for the edge lengths specified above. Running time: *O*(*n*^{2}).

` int *ms_mkvci(struct ms_gr const *gr);`

In storage obtained by calling `malloc`

, returns a labeling of vertices by connected component. The labels are integers between `1`

and `gr->nc`

inclusive. Running time: *O*(*n*).

` void ms_ckvci(struct ms_gr const *gr, int const *vci);`

The array `vci`

must have length `gr->nv`

. Asserts that `vci`

is a valid output of `ms_mkvci`

. Running time: *O*(*n*).

© 2013–2014 David Eisenstat

`mississippi`

requires that the representation of a signed integer type have the same number of padding bits as that of the corresponding unsigned type. Recently I became aware that the C99 standard does not. If you’re wondering why I didn’t use exact-width types, consider what happens when two large`uint32_t`

s are added in an execution environment with 33-bit`int`

s. It’s long past time for the C standards committee to declare such pathological environments to be second-class citizens.↩