Currently, `tfq.convert_to_tensor` runs on a single core and relies on Cirq's serialization protocols, which are pretty slow for large circuits. A quick benchmark shows that more than 95% of the time spent in `tfq.convert_to_tensor` goes to the Cirq serialization logic and the protobuf `SerializeToString` function. Since it's unlikely we can speed either of those up quickly, perhaps we should look into parallelizing `tfq.convert_to_tensor`?
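A rough sketch of what that could look like, assuming each circuit in a batch can be serialized independently. The idea is to fan the per-circuit serialization out over a worker pool and keep the results in input order. `serialize_one` and `parallel_serialize` are hypothetical names, not part of TFQ. `serialize_one` stands in for the Cirq-to-proto conversion plus `SerializeToString`, and here it just encodes `repr(item)` so the sketch runs without TFQ installed. Wrapping the resulting bytes into a `tf.string` tensor is left out.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def serialize_one(item) -> bytes:
    """Hypothetical stand-in for the per-circuit work in tfq.convert_to_tensor.

    In TFQ this would be the Cirq -> proto conversion followed by the protobuf
    SerializeToString() call; here it only encodes repr(item) so the sketch
    runs without TFQ installed.
    """
    return repr(item).encode("utf-8")


def parallel_serialize(
    items: Sequence[T],
    serialize_fn: Callable[[T], bytes] = serialize_one,
    executor: Optional[Executor] = None,
    chunksize: int = 16,
) -> List[bytes]:
    """Serialize items concurrently, returning results in input order."""
    if len(items) == 0:
        return []
    if executor is None:
        # Worker processes sidestep the GIL for pure-Python serialization code.
        # serialize_fn and the items must be picklable for this to work.
        with ProcessPoolExecutor() as pool:
            return list(pool.map(serialize_fn, items, chunksize=chunksize))
    # Caller-supplied executor (e.g. a ThreadPoolExecutor). Executor.map
    # preserves input order, so output i always corresponds to input i.
    return list(executor.map(serialize_fn, items, chunksize=chunksize))
```

A process pool is the default because the hot path is CPU-bound Python, which threads would not speed up under the GIL. The pool's startup cost and the cost of pickling circuits to the workers would need to be benchmarked against the serialization time itself, and `chunksize` would likely need tuning for batches of many small circuits.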