March 2013, revised June 2013 and April 2014
The latest release is April 2014.
mississippi
is an MIT-licensed C implementation of the Cabello–Chambers–Erickson algorithm for multiple-source shortest paths (MSSP) in planar graphs with nonnegative lengths. Also included is my algorithm for exact betweenness centrality in planar graphs, which uses Cabello–Chambers–Erickson as a subroutine and is described in Chapter 6 of my PhD dissertation Toward practical planar graph algorithms. On my quad-core workstation, this algorithm computes betweenness centrality for the entire planarized road network of the United States (28.1 million vertices and 35.7 million edges) in less than two and a half hours.
The April 2014 release of mississippi
is intended primarily for researchers who wish to replicate my dissertation experiments. Nevertheless, I have worked hard to ensure that it is correct in all observable aspects and that the MSSP library conforms to the C99 standard.1 I have taken particular care to handle shortest paths whose length exceeds INT_MAX
.
March 2013 was the first public release. April 2014 added the rand
experiment and c.c
.
The scripts assume that the current working directory is the root directory of the archive. To rerun the experiments, execute the following commands.
scripts/get_shapefiles
scripts/make_inputs
scripts/run_compare
scripts/run_parbc
scripts/run_scale
scripts/run_sssp
scripts/run_rand
scripts/make_data
For the binaries stv
and linv
and bc
and parbc
, the enumeration constant DL
controls the debug level. Level 1
enables linear-time checks. Level 2
enables more expensive checks. Level 3
enables verbose output.
Reads from stdin
a catenation of TIGER/Line Shapefiles containing road data. Interprets the PolyLine shapes therein as a planar straight-line graph with Euclidean lengths, deletes all self-loops, elides vertices of degree two while leaving new self-loops intact, and writes to stdout
a struct ms_gr
with native layout. The optional first argument specifies a path to which the array of vertex coordinates is written with native layout.
Reads from stdin
a struct ms_gr
with native layout. Computes for every vertex a shortest-path tree rooted at that vertex and writes performance data to stdout
.
Reads from stdin
a struct ms_gr
with native layout. Computes for every vertex the betweenness centrality of that vertex and writes performance data to stdout
. The environment variable NUM_THREADS
specifies the number of threads spawned by parbc
.
Reads from stdin
a struct ms_gr
with native layout. Computes a shortest-path forest and writes performance data to stdout
.
Reads from stdin
a struct ms_gr
with native layout. Writes to stderr
one thousand samples of the number of changes between two random root vertices and writes performance data to stdout
.
Downloads many gigabytes of road data from the United States Census Bureau, writing into the directory ~/shapefiles
.
Converts the road data to graph and coordinate files, writing into the directory ~/inputs
. Memory required: 18 GiB.
Runs linv
and stv
on the state/territory graph files.
Runs parbc
with NUM_THREADS=$(getconf _NPROCESSORS_ONLN)
on the state/territory graph files.
Runs bc
and parbc
with varying numbers of threads on the graph files ~/inputs/48.gr
(Texas) and ~/inputs/us.gr
. Memory required: 1 GiB plus 1.5 GiB per thread.
Runs sssp
on the graph file ~/inputs/us.gr
.
Runs rand
on the graph file ~/inputs/us.gr
.
Combines the raw data in the directory results
into three files, writing into the directory data
.
Hic sunt dracones.
Planar graphs are stored as rotation systems with asymmetric lengths. They have type struct ms_gr
, whose last member is a C99 flexible array of struct ms_d
.
struct ms_d {
int vdi;
int vi;
int fi;
int l;
};
struct ms_gr {
int nc;
int nv;
int ne;
int nf;
struct ms_d d[];
};
Member nc
is the number of connected components. Member nv
is the number of vertices. Member ne
is the number of edges, at most INT_MAX / 2 - 1
. Member nf
is the number of faces. The quantities nv - ne + nf
and nc * 2
are equal. Member d
has length (1 + ne) * 2
.
Edges are integers between 1
and ne
inclusive. Every edge ei
consists of two half-edges or darts ei * 2
and ei * 2 + 1
that are reverse darts of each other. Consequently, darts are integers between 2
and (1 + ne) * 2
exclusive, and the reverse dart of each dart di
is di ^ 1
.
Each dart di
has a head vertex d[di].vi
between 0
and nv
exclusive, a right face d[di].fi
between 0
and nf
exclusive, and a nonnegative length d[di].l
. For every dart di
, the tail vertex and left face of di
are, respectively, the head vertex and right face of the reverse dart di ^ 1
. Every vertex is the head vertex of some dart, and every face is the right face of some dart. For every edge, the sum of the lengths of its darts is at most INT_MAX
. There is no way to specify a unidirectional edge. One workaround is to make the dart that should be missing very long.
The maps di
↦ d[di].vdi
and di
↦ d[di].vdi ^ 1
are permutations. For every dart di
, dart d[di].vdi
is the next dart with the same head vertex as di
in counterclockwise order. Dart d[di].vdi ^ 1
the next dart with the same right face as di
in clockwise order. Two darts have the same head vertex if and only if they belong to the same orbit of di
↦ d[di].vdi
. They have the same right face if and only if they belong to the same orbit of di
↦ d[di].vdi ^ 1
.
Forests (more precisely, sets of arborescences whose unions comprise spanning forests) are stored as arrays int *pr
. For every nonroot vertex vi
, the value pr[vi]
is the dart in the forest with head vertex vi
. The other values are 0
.
All operations assert their success. All except ms_wrgr
and ms_chr
allocate temporary storage.
struct ms_gr *ms_rdgr(FILE *fp);
Reads a graph with native layout from the stream fp
into storage obtained by calling malloc
. Running time: O(n).
void ms_wrgr(struct ms_gr const *gr, FILE *fp);
Writes the graph gr
with native layout to the stream fp
. Running time: O(n).
void ms_xgr(struct ms_gr *gr);
The graph gr
must be valid except for nc
and nf
and fi
. Sets those members to make gr
valid. Running time: O(n).
void ms_ckgr(struct ms_gr const *gr);
The array gr->d
must have length (1 + gr->ne) * 2
. Asserts that gr
is a valid planar graph. Running time: O(n).
void ms_ckpr(struct ms_gr const *gr, int const *pr);
The array pr
must have length gr->nv
. Asserts that pr
is a valid forest. Running time: O(n).
int *ms_mkt(struct ms_gr const *gr, int const *pr);
In storage obtained by calling malloc
, returns the vertices of the graph gr
in the order that they are first visited by root-first noncrossing Euler tours of the forest pr
. Running time: O(n).
int *ms_mksp(struct ms_gr const *gr, unsigned **kp);
In storage obtained by calling malloc
, returns a shortest-path forest and, via *kp
, its vertex distances modulo UINT_MAX + 1
. Since every dart has length at most INT_MAX
, the distances compared by Dijkstra’s algorithm differ by at most that much, and such comparisons require only the remainders. Running time: O(n log n).
void ms_cksp(struct ms_gr const *gr, int const *pr, unsigned const *k);
The arrays pr
and k
must have length gr->nv
. Asserts that pr
is a valid shortest-path forest whose vertex distances modulo UINT_MAX + 1
are k
. Running time: O(n).
unsigned *ms_mkk(struct ms_gr const *gr, int const *pr);
In storage obtained by calling malloc
, returns the vertex distances modulo UINT_MAX + 1
of the forest pr
. Running time: O(n).
struct ms_f *ms_mkf(struct ms_gr const *gr, int const *pr, unsigned const *k);
In storage obtained by calling malloc
, constructs the data structures used by the multiple-source shortest-path algorithm. Running time: O(n log n) for stv
, O(n) for linv
.
void ms_chr(struct ms_gr const *gr, int *pr, struct ms_f *f,
void (*op)(void *o, int cdi, int ldi), void *o,
int rvi);
Changes to rvi
the root vertex of the shortest-path tree of pr
to which rvi
belongs. The changes to the shortest-path tree are reflected in pr
and then reported by calling *op
with first argument o
. Argument cdi
is the dart that was deleted, or 0
if no deletion occurred. Argument ldi
is the dart that was inserted, or 0
if no insertion occurred. These darts, if they both exist, have the same head vertex. Running time: O(k log n) amortized for stv
, O(k n) worst-case for linv
; k is the number of changes.
The argument f
is obtained by calling ms_mkf
. Every call to ms_chr
involving f
must have the same arguments gr
and pr
as the call to ms_mkf
. The graph gr
must not change between the calls to ms_mkf
and ms_chr
. The shortest-path forest pr
must change only by valid calls to ms_chr
.
stv/dt.c
and linv/dt.c
implement the same abstract data type for dynamic trees. stv
provides a Sleator–Tarjan tree whose operations run in time O(log n) amortized. linv
provides a naive tree whose link and cut operations run in constant time but whose path operations run in time proportional to the length of the path.
int *ms_mkmst(struct ms_gr const *gr);
In storage obtained by calling malloc
, returns a minimum spanning forest. Every edge has length equal to the sum of the lengths of its darts, perturbed for uniqueness by the index of that edge times a fixed infinitesimal. Running time: O(n), via alternating primal and dual Borůvka passes.
void ms_ckmst(struct ms_gr const *gr, int const *pr);
The array pr
must have length gr->nv
. Asserts that pr
is a minimum spanning forest for the edge lengths specified above. Running time: O(n2).
int *ms_mkvci(struct ms_gr const *gr);
In storage obtained by calling malloc
, returns a labeling of vertices by connected component. The labels are integers between 1
and gr->nc
inclusive. Running time: O(n).
void ms_ckvci(struct ms_gr const *gr, int const *vci);
The array vci
must have length gr->nv
. Asserts that vci
is a valid output of ms_mkvci
. Running time: O(n).
© 2013–2014 David Eisenstat
mississippi
requires that the representation of a signed integer type have the same number of padding bits as that of the corresponding unsigned type. Recently I became aware that the C99 standard does not. If you’re wondering why I didn’t use exact-width types, consider what happens when two large uint32_t
s are added in an execution environment with 33-bit int
s. It’s long past time for the C standards committee to declare such pathological environments to be second-class citizens.↩︎