Under the hood: Hash tables

[I have a few more posts describing what’s going on “under the hood.” I hope you’re finding this interesting… but what I’m really hoping for is to find commenters who are ready to discuss/critique what they’d like to see from the user-facing features. For instance, how should a counter be drawn in the circuit, and with what inputs/outputs? Or what file format should Toves use?]

Another change “under the hood” in Toves is a move away from hash tables. Logisim uses hash tables liberally: For the circuit under simulation, there is one hash table that maps locations to the values stored at those, and there is another hash table that maps circuit components to their individual state (for components that have state, like flip-flops). And there are several hash tables beyond that.

I believe a large fraction of Logisim’s simulation time is spent in hash table lookups, and so Toves is much more careful in avoiding hash tables. For example, stylistically it’s tempting to have a hash table mapping each “node” to its current “value,” since the current value is conceptually not a property of the node. But it’s more efficient for each node to have a mutable field saying what its value is. (Actually, Toves groups linked nodes together into “subnets,” and each node has a mutable field saying what subnet it belongs to; the subnet is actually where the value is stored.) To look up a node’s value, it has only to dereference a couple of pointers, which should be far faster than a hash table lookup.

In fact, right now the simulation engine in Toves includes just two hash tables. (More could easily be added as more features are added – two big missing features that potentially will involve substantial edits to the simulation engine are subcircuits and splitters.) One of these hash tables is used to track which subnets have changed in the current time step, so that subnets’ values aren’t re-computed when the result is guaranteed to be the same. Another tracks which components have seen their inputs changed, so that we don’t ask the components to recompute their output values.

As I write this, I believe each of these remaining two hash tables could be replaced with an array-backed list accompanied by a boolean flag in the subnet/instance indicating whether it is already in the list. That should be more efficient (though there are potential problems with this for a multithreaded implementation to take advantage of multicore computation, which is something that I hope in the vague future to tackle). I plan to look into that after adding subcircuits and splitters.