
Conversation

@DrStoop DrStoop commented Mar 2, 2020

As far as I understand, the current GpuInfo is a Metric attached to Events.ITERATION_COMPLETED, meaning it logs the current GPU utilization during its downtime, when the model is neither being updated nor inferring, because Engine().process_function() has already completed:

class Engine:
    ...
    def _run_once_on_dataset(self):
        ...
        self._fire_event(Events.ITERATION_STARTED)
        self.state.output = self._process_function(self, self.state.batch)
        self._fire_event(Events.ITERATION_COMPLETED)
        ...

Therefore the measurement is not representative of the actual GPU usage during training. Or did I miss anything?

An alternative quick-fix to the suggestion below for the current GpuInfo - at least for the memory logging - would be to replace self.nvsmi.DeviceQuery("memory.used") with torch.cuda.max_memory_allocated(), which returns the maximum GPU memory used during each iteration (don't forget to also reset the peak stats after logging).
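
A minimal sketch of that quick-fix, assuming an existing ignite trainer Engine; the handler and metric names are made up here, and on older PyTorch versions the reset call is torch.cuda.reset_max_memory_allocated():

    import torch
    from ignite.engine import Events

    @trainer.on(Events.ITERATION_COMPLETED)
    def log_peak_memory(engine):
        # peak memory held by the PyTorch caching allocator since the last reset (bytes)
        peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
        engine.state.metrics["gpu:0 max_mem(MB)"] = peak_mb
        # reset the peak counter so the next iteration starts from zero
        torch.cuda.reset_peak_memory_stats()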

Description:

The suggested GpuInfo and CpuInfo each run on an independent thread that logs the hardware at a user-defined time interval. This effectively samples GPU/CPU usage at random points across both busy and idle phases, which represents the actual utilization much better.
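
A minimal sketch of that sampling approach, assuming the raw pynvml bindings are available (class and attribute names here are illustrative, not necessarily those used in this PR):

    import time
    from threading import Thread

    import pynvml

    class GpuUtilSampler(Thread):
        """Samples GPU utilization on a daemon thread at a fixed interval."""

        def __init__(self, interval_seconds=1.0, device_index=0):
            super().__init__(daemon=True)
            self.interval_seconds = interval_seconds
            self.device_index = device_index
            self.samples = []          # list of (timestamp, utilization in percent)
            self._running = True

        def run(self):
            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(self.device_index)
            while self._running:
                util = pynvml.nvmlDeviceGetUtilizationRates(handle)
                self.samples.append((time.time(), util.gpu))
                time.sleep(self.interval_seconds)
            pynvml.nvmlShutdown()

        def stop(self):
            self._running = False

Started at Events.STARTED and stopped at Events.COMPLETED, such a sampler catches the GPU both while it is busy inside process_function and while it idles, independently of the engine loop.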

DrStoop and others added 2 commits March 2, 2020 12:14
- bugfix of current `GpuInfo` which currently only logs GPU down time utilizations at `Events.ITERATION_COMPLETED`
- adding CPU info logging

vfdev-5 commented Mar 2, 2020

@DrStoop AFAIK torch.cuda.max_memory_allocated() does not show the same thing as nvidia-smi, so using pynvml is IMO better if we need to match what we see with nvidia-smi.

Concerning GPU utilization, yes, it would probably be tricky to catch it as a metric. Maybe there could be an option to request a mean value or something...
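
For reference, a small comparison of the two memory numbers (a sketch only, assuming a CUDA device, a recent PyTorch and the pynvml package; the gap comes from the caching allocator, the CUDA context and any other processes on the device):

    import torch
    import pynvml

    # allocate ~1 GB through the PyTorch caching allocator
    x = torch.empty(256, 1024, 1024, device="cuda")

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)

    print("allocator, allocated  :", torch.cuda.memory_allocated() / 1024 ** 2, "MB")
    print("allocator, reserved   :", torch.cuda.memory_reserved() / 1024 ** 2, "MB")
    # whole-device figure as reported by nvidia-smi, including context and other processes
    print("nvidia-smi style used :", mem.used / 1024 ** 2, "MB")
    pynvml.nvmlShutdown()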


vfdev-5 commented Mar 2, 2020

The suggested GpuInfo and CpuInfo each run on an independent thread that logs the hardware at a user-defined time interval. This effectively samples GPU/CPU usage at random points across both busy and idle phases, which represents the actual utilization much better.

Thanks for the PR, I think we need to compare both implementations to see the diffs.

Do we care about CPU usage?


DrStoop commented Mar 2, 2020

Do we care about CPU usage?

It depends; most OSs have live CPU monitors integrated anyway that are good enough for debugging, so it's not the most important feature.

Generally the CPU becomes a bottleneck e.g. for data pre-processing or data handling. If you're working in a pipeline with "live" pre-processing, this may become the relevant bottleneck, or data may be shoveled around in an (unwanted/unrecognized) inefficient way... The combination of GPU & CPU info can be helpful to understand why the GPU is not used to capacity while the CPU is running single-cored at 101%.
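
For the CPU side, a minimal sketch of what such a CpuInfo sampler could log, assuming psutil is installed (the metric naming is illustrative):

    import psutil

    # per-core utilization in percent, averaged over a 1-second window
    per_core = psutil.cpu_percent(interval=1.0, percpu=True)
    # one core pinned near 100% while the rest idle hints at single-threaded pre-processing
    print({"cpu:{} util(%)".format(i): p for i, p in enumerate(per_core)})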

If you're running batchwise pre-processing of data, this would probably be a necessary feature... but for that one would first have to think about a DataPreprocessorEngine ;-) E.g. in the transfer-learning-conv-ai you mentioned on Slack, the pre-processing is brutally single-cored & I rewrote it with multiprocessing/threading; I used the hardware logging there. But to be honest, the OS CPU tracker would have been enough.

Nevertheless, the *Info classes run their thread independently of the rest of the Engine, so you could start one even when only pre-processing (without a "live" pipe); you would just need to close the loop between the GpuInfo timestamps & the engine iterations.

Conclusion: maybe nice to have, but no real necessity.

(Note: in the framework, it defaults to the n_samples_ref of the first engine started in the state and is initialized by changing a single state object, e.g. state.default_config.logging_all_hardware_utilization = True; x_axis_ref can also be user-defined.)


vfdev-5 commented Mar 2, 2020

Generally the CPU becomes a bottleneck e.g. for data pre-processing or data handling. If you're working in a pipeline with "live" pre-processing, this may become the relevant bottleneck, or data may be shoveled around in an (unwanted/unrecognized) inefficient way... The combination of GPU & CPU info can be helpful to understand why the GPU is not used to capacity while the CPU is running single-cored at 101%.

Yes, correct. I/O can also be a bottleneck, such that GPU and CPU activity are both ~0.

Nevertheless, the *Info classes run their thread independently of the rest of the Engine, so you could start one even when only pre-processing (without a "live" pipe); you would just need to close the loop between the GpuInfo timestamps & the engine iterations.

I see your point. Let me take a look at your code and I'll comment...


@DrStoop DrStoop left a comment


Sorry, the first bugs already came to mind... pytest wasn't running, for other reasons.

    tb_logger.attach(trainer,
                     log_handler=OutputHandler(tag="training", metric_names='all'),
                     event_name=Events.ITERATION_COMPLETED)
class GpuPynvmlLogger(Thread):

Already discovered the first bugs... the pytest wasn't running for other reasons:
GpuPynvmlLogger -> GpuInfo

    def attach(self, engine, name="gpu", event_name=Events.ITERATION_COMPLETED):
        engine.add_event_handler(event_name, self.completed, name)
    def __init__(self, logger_directory, logger_name='GPULogger', log_interval_seconds=1, unit='GB'):
        super(GpuPynvmlLogger, self).__init__(name=logger_name, daemon=True)

super(GpuPynvmlLogger, self) -> super()

        # Close tensorboard logger
        self._tb_logger.close()
        # Join thread
        self.join()

Forgot the attach method, e.g.:

    def attach(self, engine, name="gpu", event_name_started=Events.STARTED, event_name_completed=Events.COMPLETED):
        engine.add_event_handler(event_name_started, self.start, name)
        engine.add_event_handler(event_name_completed, self.close, name)
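
Usage could then look roughly like this (a sketch only; the constructor arguments are taken from the __init__ shown above, and the final API may differ):

    gpu_info = GpuInfo(logger_directory="logs/hardware", log_interval_seconds=1)
    # starts the sampler thread at Events.STARTED and closes it at Events.COMPLETED
    gpu_info.attach(trainer)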


@DrStoop DrStoop left a comment


Same issue as with gpu_info.py...

        # Close tensorboard logger
        self._tb_logger.close()
        # Join thread
        self.join()

Forgot the attach method, e.g.:

    def attach(self, engine, name="gpu", event_name_started=Events.STARTED, event_name_completed=Events.COMPLETED):
        engine.add_event_handler(event_name_started, self.start, name)
        engine.add_event_handler(event_name_completed, self.close, name)
