Strictly speaking, the basis set is arbitrary -- the local clusterization is just one of the reasonable methods that seem to work for NMR. It looks at what is connected to what and includes those product states that are judged to be important.

Indeed, clusters can overlap by one or more spins, in which case the corresponding state spaces cannot be disentangled. But the direct sum of two subspaces (with the intersection carefully removed to avoid duplication) is still a lot smaller than the direct product -- this is where the savings come from.
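
To put rough numbers on that, here is a minimal Python sketch (not Spinach code, which is written in Matlab). It assumes spin-1/2 particles, so each spin carries four single-spin basis operators, with index 0 standing for the identity. The helper names and the three-cluster layout are made up for illustration. Each cluster contributes every product state that is the identity outside the cluster, and the state lists are merged as a set, so states in the overlap are counted once.

```python
from itertools import product


def cluster_states(n_spins, cluster):
    """Product states that are the identity (index 0) on every spin outside `cluster`."""
    states = set()
    for labels in product(range(4), repeat=len(cluster)):
        state = [0] * n_spins
        for spin, label in zip(cluster, labels):
            state[spin] = label
        states.add(tuple(state))
    return states


def restricted_basis(n_spins, clusters):
    """Union of the cluster state lists; the set removes duplicates in the overlaps."""
    basis = set()
    for cluster in clusters:
        basis |= cluster_states(n_spins, cluster)
    return basis


# Hypothetical six-spin ring covered by three clusters that overlap by one spin each
n_spins = 6
clusters = [(0, 1, 2), (2, 3, 4), (4, 5, 0)]

basis = restricted_basis(n_spins, clusters)
full_dimension = 4 ** n_spins

print(f"restricted basis: {len(basis)} states, full space: {full_dimension} states")
```

For this layout the merged list has 181 states against 4096 in the full direct product, and the gap widens quickly as the number of spins grows.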

After the state list is compiled (using whatever method, you can simply tell Spinach to keep the states you want to keep), the rest is rigorous Lie algebra representation theory: operators are built in that basis and returned to the user. So the basis is ultimately up to the user; clusterization just happens to work rather well for NMR.

In general, the stages are:

1. Heuristic reduction based on our knowledge of the system. Could be clusterization, could be a coherence order filter, could be a blanket restriction to correlations lower than a certain level.

2. Construction of the Liouvillian matrix and state vectors in the resulting basis.

3. Analysis of the Liouvillian to determine whether it has any dynamically disconnected subspaces (symmetries, conservation laws, etc.). Those are simulated separately.

4. At run time, ZTE (zero track elimination) cuts the dimension to the absolute minimum required.
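
Stage 3 can be illustrated with a small Python sketch. This is an assumption about one reasonable way to do it, not a description of how Spinach does it internally: treat the non-zero off-diagonal elements of the Liouvillian as edges of a graph on the basis states and collect the connected components. States in different components never exchange amplitude, so each block can be propagated on its own. The function name, the tolerance, and the toy matrix are all hypothetical.

```python
def disconnected_subspaces(liouvillian, tol=1e-12):
    """Group basis state indices into blocks that the matrix never couples to each other."""
    n = len(liouvillian)

    # Undirected connectivity graph: i and j are linked if either L[i][j] or L[j][i] is non-zero
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and abs(liouvillian[i][j]) > tol:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Depth-first search to collect connected components
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        block = []
        while stack:
            k = stack.pop()
            block.append(k)
            for m in neighbours[k]:
                if not seen[m]:
                    seen[m] = True
                    stack.append(m)
        blocks.append(sorted(block))
    return blocks


# Toy 5x5 matrix: states 0 and 2 are coupled, states 1 and 3 are coupled, state 4 is isolated
toy = [
    [1.0, 0.0, 0.5j, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.3, 0.0],
    [0.5j, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
]

blocks = disconnected_subspaces(toy)
print(blocks)
```

For the toy matrix this gives three blocks, `[0, 2]`, `[1, 3]` and `[4]`, so one five-dimensional problem becomes two two-dimensional problems and a one-dimensional one. A block that holds none of the initial state can be skipped altogether.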