I believe the FLOP count for performing an operation scales roughly with the number of elements in the matrix (or the matrix representation of a gate). Your custom operation is a 512x512 matrix, and applying that matrix is much more expensive (from a FLOP perspective) than applying nine 2x2 matrices. I think this is expected behavior. My recommendation would be to make each custom operation operate on as few qubits as possible.
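As a rough back-of-the-envelope check (a sketch of mine, assuming a dense state-vector simulator where applying a k-qubit gate to an n-qubit state costs on the order of 2^n * 2^k multiply-adds; the actual simulator may do better), the fused 9-qubit gate comes out close to 30x more work per layer:

```python
# Back-of-the-envelope cost model (an assumption, not a measurement):
# applying a dense k-qubit gate to an n-qubit state vector touches all
# 2**n amplitudes and does ~2**k multiply-adds per amplitude.
def gate_cost(n_qubits: int, gate_qubits: int) -> int:
    return 2**n_qubits * 2**gate_qubits

n = 10
fused = gate_cost(n, 9)         # one 512x512 custom gate on 9 qubits
separate = 9 * gate_cost(n, 1)  # nine 2x2 Hadamards applied one at a time
print(fused, separate, fused / separate)  # 524288 18432 -> ~28x more work
```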
Problem
Executing a kernel of cudaq native gates takes approx. 1 second.
Executing an equivalent kernel of custom gates takes approx. 7 to 12 seconds (about 7 s on the first run, about 12 s on subsequent runs).
Description
I noticed that using custom operations defined via `cudaq.register_operation(...)` makes the kernel execution time explode. I am not talking about a few percent; factors of 10x-100x are easily reached. For example, see the following MWE, which uses only Hadamard gates (or custom operations manually composed from them).
The circuit consists of 10 qubits, with a circuit depth of 10 layers.
In each layer, a Hadamard gate is applied to every qubit but the first (initially, I thought it had to do with the control mechanism, but it turns out it doesn't). The first kernel uses the cudaq native `h(qubit)` gate. The second kernel uses a custom gate, defined via `cudaq.register_operation(...)`.
Questions
File:
mwe.py
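The original attachment is not reproduced here; below is a minimal sketch of what mwe.py could look like, reconstructed from the description above. The gate name `custom_h9`, the kernel names, and the timing harness are my assumptions, not the original file; the `cudaq.register_operation` pattern with a flattened unitary follows the CUDA-Q docs.

```python
import time

import cudaq
import numpy as np

N_QUBITS = 10  # total qubits
N_LAYERS = 10  # circuit depth

# 9-qubit custom gate: the tensor product of nine Hadamards,
# registered as a single 512x512 unitary (flattened, per the docs).
h_mat = 1.0 / np.sqrt(2.0) * np.array([[1, 1], [1, -1]])
h9 = h_mat
for _ in range(8):
    h9 = np.kron(h9, h_mat)
cudaq.register_operation("custom_h9", h9.flatten())

@cudaq.kernel
def native_kernel(n_qubits: int, n_layers: int):
    q = cudaq.qvector(n_qubits)
    for layer in range(n_layers):
        for i in range(1, n_qubits):  # every qubit but the first
            h(q[i])

@cudaq.kernel
def custom_kernel(n_qubits: int, n_layers: int):
    q = cudaq.qvector(n_qubits)
    for layer in range(n_layers):
        # one 9-qubit gate instead of nine 1-qubit gates
        custom_h9(q[1], q[2], q[3], q[4], q[5], q[6], q[7], q[8], q[9])

for name, kernel in [("native", native_kernel), ("custom", custom_kernel)]:
    start = time.perf_counter()
    cudaq.sample(kernel, N_QUBITS, N_LAYERS)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```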
Which gives the following output:
$ python mwe.py
Environment
Installed `tqdm`, `pyqsp`, and `pennylane` inside the container using `pip install`.