This is a C++ CUDA program for matrix multiplication of both square and non-square matrices.
Each input matrix should be larger than 32 x 32.
Each input file should give the matrix's row count and column count on its first line. Every following line holds one matrix row, with elements separated by tabs.
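As a rough illustration of the file layout described above, the following is a minimal C++ sketch of a reader. It is not taken from the program itself: the Matrix struct, the readMatrix function, the float element type, and the row-major storage are all assumptions made for the example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical container; the actual program may store matrices differently.
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<float> data;  // row-major, rows * cols elements (float is an assumption)
};

// Reads the row and column counts from the first line, then rows * cols values.
// operator>> skips any whitespace, so tabs and newlines are both accepted.
Matrix readMatrix(std::istream& in) {
    Matrix m;
    if (!(in >> m.rows >> m.cols)) {
        throw std::runtime_error("missing row/column header");
    }
    m.data.resize(m.rows * m.cols);
    for (std::size_t i = 0; i < m.data.size(); ++i) {
        if (!(in >> m.data[i])) {
            throw std::runtime_error("too few matrix elements");
        }
    }
    return m;
}
```

To read from disk, the same function could be passed a std::ifstream opened on the input file. Because the sketch relies on whitespace skipping, it does not verify that each row sits on its own line.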
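The program takes 4 execution arguments. The first two are the input files of the matrices to be multiplied, in that order (the column count of the first matrix must equal the row count of the second). The third is the output file, and the fourth is used to log the time taken.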
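The argument order above could be checked with a small helper like the one below. This is only a sketch: the Args struct, its field names, and the parseArgs function are hypothetical and may not match how the program actually handles its arguments.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical names for the four arguments described above.
struct Args {
    std::string matrixA;  // first input matrix file
    std::string matrixB;  // second input matrix file
    std::string output;   // where the product is written
    std::string timeLog;  // where the time taken is logged
};

// argv[0] is the program name, so four arguments means argc == 5.
Args parseArgs(int argc, char** argv) {
    if (argc != 5) {
        throw std::invalid_argument(
            "expected 4 arguments: <matrixA> <matrixB> <output> <timeLog>");
    }
    return Args{argv[1], argv[2], argv[3], argv[4]};
}
```

Calling parseArgs(argc, argv) at the top of main would reject a wrong argument count before any file is opened.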