...

  • For multi-node training jobs, you will need to run the nvdashboard command on every node.

  • You will also need to open a separate ssh tunnel for each node and view each dashboard in its own browser window. Use a different localhost port for each tunnel:

    Code Block
    ssh -L 10101:gpu212-04:10101 username@glogin.ibex.kaust.edu.sa
    ssh -L 10102:gpu212-04:10101 username@glogin.ibex.kaust.edu.sa
    ssh -L 10103:gpu212-04:10101 username@glogin.ibex.kaust.edu.sa
    ssh -L 10104:gpu212-04:10101 username@glogin.ibex.kaust.edu.sa

    The above assumes connecting to 4 different nodes on Ibex, so replace gpu212-04 in each command with the hostname of the corresponding node. Your localhost (i.e. your laptop/workstation) will then be listening to these nodes on 4 different ports.

  • NVLink metrics are broken at the moment; the developer has an open Git issue to fix it.
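
For the multi-node tunnels described above, typing one ssh command per node gets error-prone as the node count grows. The following is a minimal sketch that prints one tunnel command per node, each with its own local port. The node names other than gpu212-04 are hypothetical placeholders, and the script assumes nvdashboard listens on port 10101 on every node, as in the commands above. It only prints the commands; it does not open any connection.

```shell
#!/bin/bash
# Hypothetical node list: replace with the nodes allocated to your job.
NODES="gpu212-04 gpu212-05 gpu212-06 gpu212-07"
REMOTE_PORT=10101   # assumed nvdashboard port on each node
LOCAL_PORT=10101    # first local port; incremented by one for each node

# Print one "ssh -L" command per node, each bound to a distinct local port.
print_tunnels() {
    local port=$LOCAL_PORT
    for node in $NODES; do
        echo "ssh -L ${port}:${node}:${REMOTE_PORT} username@glogin.ibex.kaust.edu.sa"
        port=$((port + 1))
    done
}

print_tunnels
```

Run each printed command in its own terminal, then open localhost with the matching local port (10101, 10102, and so on) in a separate browser window for each node.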