Author Topic: memory usage overshoot in parallel ensemble average calculations with SDv2.6.1

JyrkiRantaharju

Hi all,

I define my events as in the examples below:
Code: [Select]
ev = ({opI["x"], 10^-10} & /@ Range[2000]);
Table[evs[i] = ev, {i, 1, 2000}];
and then I calculate the ensemble average like this:
Code: [Select]
Trajectory[opI["z"], evs[i], EnsembleAverage -> {i, Range@2000}][
  opI["z"]];
or
Code: [Select]
ev = ({opI["x"], 10^-10} & /@ Range[2000]);
evs = ev & /@ Range[2000];
and then compute the ensemble average:
Code: [Select]
Trajectory[opI["z"], evs[[i]], EnsembleAverage -> {i, Range@2000}][
  opI["z"]];

In both examples, SpinDynamica uses much more RAM than necessary. On my 8-core computer the memory usage jumps from 3 GiB to over 11 GiB before the 8 cores start calculating the trajectories. The ByteCount[] of evs in the latter example is about 1.3 GiB. It seems that SpinDynamica distributes the definitions of all the event lists to all 8 kernels.
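As a rough check of where the memory goes (a minimal sketch, assuming 8 subkernels and the evs defined above; Parallelize with DistributedContexts -> All, as in the code quoted below, appears to perform a distribution of this kind):
Code: [Select]
LaunchKernels[8];
before = ParallelEvaluate[MemoryInUse[]];  (* per-kernel memory before the definitions are sent *)
DistributeDefinitions[evs];                (* copies the full set of event lists to every kernel *)
after = ParallelEvaluate[MemoryInUse[]];
after - before                             (* roughly ByteCount[evs] extra on each of the 8 kernels *)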

In the latter example, all the subkernels also print a harmless message:
Code: [Select]
Part::pspec: Part specification i is neither an integer nor a list of integers.
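The message itself can be reproduced outside SpinDynamica (lst is just a stand-in list, not a SpinDynamica symbol): Part evaluates before the ensemble variable i has been given a value, but the later substitution still works, which is why the message is harmless.
Code: [Select]
lst = {a, b, c};
lst[[i]]              (* prints the Part warning and stays unevaluated, because i is still symbolic *)
lst[[i]] /. i -> 2    (* the later substitution still gives b *)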
The above problems can be fixed by replacing
Code: [Select]
NotebookDelete[temp];
CombineData[
 Parallelize[
  Table[
   TransformationAmplitudeTable[
     ops,
     events /. varsubrules[i],
     more,
     Evaluate@SimplifyRules[
       EnsembleAverage -> False,
       OperatorBasis -> opbasis,
       Basis -> basis,
       ReleaseHold[Hold[opts] /. varsubrules[i]]
       ]
     ][\[Rho]ini /. varsubrules[i]],
   {i, 1, nens}
   ],
  Method -> "CoarsestGrained",
  DistributedContexts -> All
  ],
 weights,
 Parallel -> False
 ]

with

Code: [Select]
Table[evs[i] = events /. varsubrules[i], {i, 1, nens}];
NotebookDelete[temp];
CombineTrajectoryFunctions[
 Parallelize[
  Table[
   Trajectory[
     ops,
     evs[i],
     more,
     Evaluate@SimplifyRules[
       EnsembleAverage -> False,
       OperatorBasis -> opbasis,
       Basis -> basis,
       ReleaseHold[Hold[opts] /. varsubrules[i]]
       ]
     ][\[Rho]ini /. varsubrules[i]],
   {i, 1, nens}
   ],
  Method -> "CoarsestGrained",
  DistributedContexts -> All
  ],
 weights,
 Sequence @@ FilterRules[
   Join[{Parallel -> False, opts}, Options[Trajectory]],
   Options[CombineTrajectoryFunctions]
   ]
 ]
in the ensemble average modules.
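Schematically, the change moves the per-ensemble substitution out of the parallelized expression (work here is only a stand-in for the TransformationAmplitudeTable or Trajectory call):
Code: [Select]
(* before: the substitution sits inside Parallelize, so events and the varsubrules[i]
   apparently have to be available on every subkernel *)
Parallelize[Table[work[events /. varsubrules[i]], {i, 1, nens}]];

(* after: substitute once on the master kernel, then parallelize over the pre-built lists *)
Table[evs[i] = events /. varsubrules[i], {i, 1, nens}];
Parallelize[Table[work[evs[i]], {i, 1, nens}]];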

Jyrki.

JyrkiRantaharju

This problem also exists in SDv2.6.2b2.

MalcolmHLevitt

Hi Jyrki,
thanks for this. If you get time, please download v2.6.2b3:

https://www.dropbox.com/sh/x26heihcib5mbqu/1MN3HBANk5

Please insert your new code, using colours to indicate the changes, and share it as v2.6.2b4.

thanks!
malcolm

JyrkiRantaharju

Hi Malcolm,

thanks for SDv2.6.2b3. I created SDv2.6.2b4 and it's available here:

https://www.dropbox.com/sh/ptfd6fcwedr3lcv/EjpwCr1yJU

I implemented the above modifications in Trajectory, Signal1D, TransformationAmplitudeTable, and TransformationAmplitude.

Additionally, I changed the code
Code: [Select]
events /. varsubrules[i]; \[Rho]ini /. varsubrules[i];
to
Code: [Select]
ReleaseHold[Hold[events] /. varsubrules[i]]; ReleaseHold[Hold[\[Rho]ini] /. varsubrules[i]];
in order to prevent SpinDynamica from printing the
Code: [Select]
Part::pspec: Part specification i is neither an integer nor a list of integers.
messages.
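The effect of the Hold/ReleaseHold idiom can be illustrated with the same stand-in list as before: the substitution is applied inside the Hold, so Part never sees a symbolic index and no message is generated.
Code: [Select]
lst = {a, b, c};
ReleaseHold[Hold[lst[[i]]] /. i -> 2]   (* gives b, without any Part message *)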

I also noticed that the code
Code: [Select]
pwtfuncs = Transpose@Map[Function[tfunc, Function[t, Evaluate@Assuming[(First[#] < t < Last[#]), Refine[tfunc[[2]][t]]]] & /@ tintervals], tfuncs];
in the "combine single TrajectoryFunctions" sub-section of Trajectory causes the combining of TrajectoryFunctions to take quite a long time. I changed it to
Code: [Select]
pwtfuncs = Transpose@Map[Function[tfunc, Function[Evaluate[tfunc[[2]][[1]]], #] & /@ (First@(List @@ tfunc[[2]][[2]]) /. {trajf_, _LessEqual | _Inequality} :> trajf)], tfuncs];
which seems to reduce the combining time significantly.
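The apparent reason for the speed-up is that the branch expressions are read directly out of each piecewise trajectory function instead of being re-derived with Refine under interval assumptions. A minimal sketch of the two approaches on a plain Piecewise expression (pw and t are stand-ins, not SpinDynamica structures):
Code: [Select]
pw = Piecewise[{{Sin[t], 0 <= t < 1}, {Cos[t], 1 <= t <= 2}}];

(* old route: ask Refine to decide which branch applies on a given interval *)
Assuming[0 < t < 1, Refine[pw]]   (* -> Sin[t], but the assumption handling is relatively slow *)

(* new route: read the branch expressions off the Piecewise structure directly *)
First /@ First[pw]                (* -> {Sin[t], Cos[t]}, no assumption processing needed *)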

I marked all the modifications in blue and left open all the sections that I modified.

Jyrki.
« Last Edit: June 14, 2013, 05:18:02 PM by Jyrki Rantaharju »

JyrkiRantaharju

Hi Malcolm,

yesterday I tested SDv2.6.2b4 with the examples from my first message in this thread. The modifications reduced the memory used for distribution of the event lists from 8 GiB to about 1 GiB.

Today I tried SDv2.6.2b4 with an EPR simulation, and the memory used for distribution of the event lists was still large, about 6 GiB. The ByteCount[] of the event lists was about 2 GiB and I used 8 kernels.

I will try to get the memory usage under control, but it might require inconveniently large modifications to the parallelisation of SpinDynamica. After all, I'm probably the only one using SpinDynamica for simulations with event lists containing thousands of events.

I'm currently using SDv2.5.0, with my own parallelisation code, for my simulations and it works fine.

Jyrki.

JyrkiRantaharju

One way to fix the above memory issue would be to build the table of Trajectories so that the contents of each Trajectory are kept on Hold
Code: [Select]
table = Table[
   Trajectory[
     Hold[
      Evaluate@ops,
      Evaluate@ReleaseHold[Hold[events] /. varsubrules[i]],
      Evaluate@more,
      Evaluate@SimplifyRules[
        EnsembleAverage -> False,
        OperatorBasis -> opbasis,
        Basis -> basis,
        Evaluate@ReleaseHold[Hold[opts] /. varsubrules[i]]
        ]
      ]
     ][Evaluate@ReleaseHold[Hold[\[Rho]ini] /. varsubrules[i]]],
   {i, 1, nens}
   ];
and then evaluate the Trajectories in parallel:
Code: [Select]
CombineTrajectoryFunctions[
 Parallelize[
  ReleaseHold[#] & /@ table,
  Method -> "CoarsestGrained",
  DistributedContexts -> "Global`"
  ],
 weights,
 Sequence @@ FilterRules[
   Join[{Parallel -> False, opts}, Options[Trajectory]],
   Options[CombineTrajectoryFunctions]
   ]
 ]

This way varsubrules would not need to be distributed to the kernels.
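On a toy problem the same held-expression pattern looks like this (bigdata and the Total@Sin call are stand-ins for the real per-ensemble data and the expensive Trajectory evaluation): the data are inlined into each held expression on the master kernel, so the symbol holding the full data set never has to be distributed, and each subkernel only receives its own batch of held expressions.
Code: [Select]
bigdata = RandomReal[1, {8, 1000}];                         (* stand-in for the per-ensemble data *)
held = Table[Hold[Total[Sin[#]]] &[bigdata[[i]]], {i, 8}];  (* each slice is inlined into a held expression *)
Parallelize[ReleaseHold /@ held, Method -> "CoarsestGrained"]
(* bigdata itself never appears inside the parallelized expression, so it does not need to be distributed *)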

I will run test simulations with these changes and report how it goes.
« Last Edit: June 16, 2013, 06:20:41 PM by Jyrki Rantaharju »

JyrkiRantaharju

Hi all,

I made the above changes to the Trajectory routine of SDv2.6.2b4 and ran the same EPR simulation I mentioned above (an average over 200 trajectories, each with an event list containing 2000 events). The simulation took about 6 h and used 7.2 GiB of memory.
Simulation of a single event list took about 16 min and used about 0.3 GiB of memory.

I will now run the same simulation again without the changes and report how much memory it uses.

With SDv2.5.0, simulation of a single event list, the same as above, took only 1.5 min and used 0.3 GiB of memory.

Jyrki.
« Last Edit: June 18, 2013, 08:08:51 PM by Jyrki Rantaharju »

JyrkiRantaharju

I ran the EPR simulation without the changes to SDv2.6.2b4; it used about 12 GiB of memory and took about 6 h. One of the 8 kernels died during the simulation, so some of the Trajectory calculations failed.

I also simulated the trajectory of a single event list (one of the 200 event lists) again, and this time it took only 6 min; I probably made some error in the previous single-event-list calculation. With SDv2.5.0 the simulation of a single event list took 1.5 min, and with SDv2.6.1 it took 14 min.


What do you think about the above modification? Could something like that be implemented in SpinDynamica?

Jyrki.
« Last Edit: June 19, 2013, 01:22:42 PM by Jyrki Rantaharju »

JyrkiRantaharju

Hi all,

I traced the above timing differences to the diagonalization routine of NPropagate.

Example with SDv2.5.0:
Code: [Select]
events = {opI["z"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 =  TimeUsed[]; t2 - t1
3.63623

events = {opI["x"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 = TimeUsed[]; t2 - t1
5.62835

Example with SDv2.6.2b4:

Code: [Select]
events = {opI["z"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 =TimeUsed[]; t2 - t1
3.87224

events = {opI["x"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 =TimeUsed[]; t2 - t1
82.0211

The diagonalization routine of SDv2.5.0 seems to be faster.


With SDv2.6.2b4 I then tried setting the UseDiagonalizationWhenPossible option to False, but it did not help:
Code: [Select]
events = {opI["z"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True, UseDiagonalizationWhenPossible -> False][opI["z"]]; t2 =TimeUsed[]; t2 - t1

4.06025

events = {opI["x"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True,UseDiagonalizationWhenPossible -> False][opI["z"]]; t2 =TimeUsed[]; t2 - t1

82.1651

I then force-fed the UseDiagonalizationWhenPossible -> False option into the diagonalization routine

Code: [Select]
HoldPattern@NPropagate[
    Event[{Lsop_?SuperoperatorQ, \[Tau]_}, {tb_?NumericQ, ta_?NumericQ}, evopts___Rule],
    opbasis_, basis_, opts___Rule
    ][\[Rho]ini_Operator] /;
  (And @@ {Continuous, UseDiagonalizationWhenPossible} /.
       {UseDiagonalizationWhenPossible -> False} /. {opts} /. Options[NPropagate]) &&
   MatchQ[PropagatorCalculationProcedure /. {evopts} /. Options[NPropagate],
    IntegrateEvolutionOverTime] := Module[...];

It worked:
Code: [Select]
events = {opI["z"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 =TimeUsed[]; t2 - t1

3.87624

events = {opI["x"], 10^-10} & /@ Range[200];t1 = TimeUsed[]; NPropagate[events, Continuous -> True][opI["z"]]; t2 =TimeUsed[]; t2 - t1

6.02838
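As an aside, the effect of the inserted rule can be reproduced on a toy option chain (f, useDiag, resolve, and forced are made-up names, not SpinDynamica symbols): the extra replacement fires before the call-time options and the defaults, so the condition always sees False and that definition is presumably skipped in favour of the non-diagonalization path.
Code: [Select]
Options[f] = {useDiag -> True};          (* hypothetical default, standing in for Options[NPropagate] *)
resolve[opts___Rule] := useDiag /. {opts} /. Options[f]
forced[opts___Rule] := useDiag /. {useDiag -> False} /. {opts} /. Options[f]

{resolve[], resolve[useDiag -> False], forced[], forced[useDiag -> True]}
(* -> {True, False, False, False} *)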

Jyrki.
« Last Edit: June 21, 2013, 03:01:45 PM by Jyrki Rantaharju »

MalcolmHLevitt

Hi Jyrki,
unfortunately the version SDv2.6.2b4 you posted is no longer there. Can you repost it, please?
 all the best
malcolm

JyrkiRantaharju

Hi Malcolm,

I re-posted SDv2.6.2b4. It's available here: https://www.dropbox.com/sh/ptfd6fcwedr3lcv/EjpwCr1yJU

Best regards,
Jyrki.
« Last Edit: July 11, 2013, 04:58:10 PM by Jyrki Rantaharju »

JyrkiRantaharju


Yesterday I shared SDv2.6.2b4 which, due to an accidental modification, had a problem with loading the packages.

I created SDv2.6.2b5 from SDv2.6.2b3. SDv2.6.2b5 is identical to SDv2.6.2b4 except for the accidental modification.

SDv2.6.2b5 is available here: https://www.dropbox.com/sh/xii7p8u99d8i59r/nKbHDwiEPN

Jyrki

MalcolmHLevitt

I hope that the new release SDv2.7.1 is somewhat more efficient in this regard. I've incorporated some of your suggestions (with some changes)! However, I have to confess that optimizing memory usage has not been a priority for the work in our group, so I have not put maximum effort into it.

I could identify various places where more work is needed, but I do not have time right now.

Nevertheless, please try out SDv2.7.1 and let me know!

Malcolm