Commit a19ba04

Waymo Research and sabeek authored and committed
Merged commit includes the following changes (all by Waymo Research):

* 369731584: Internal changes
* 369724720: Set minimum score and IOU with flag, tidy up the console outputs
* 369556120: Add features for frame timestamp and rolling shutter
* 368078025: Enable motion metrics in get_bp_metrics_handler.cc
* 365599653: Fix the matcher to work when there are a different number of latency vs. submission detections
* 365491346: Relax the box comparison thresholds

PiperOrigin-RevId: 369731584
1 parent de02ef6 commit a19ba04

File tree

6 files changed: +94 −57 lines


waymo_open_dataset/latency/README.md

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,7 @@ User-submitted models will take the form of a Python module named `wod_latency_s
 Converting from Frame protos to usable point clouds/images can be non-trivially expensive (involving various unzippings and transforms) and does not reflect a workflow that would realistically be present in an autonomous driving scenario. Thus, our evaluation of submitted models does not time the conversion from Frame proto to tensor. Instead, we have pre-extracted the dataset into numpy ndarrays. The keys, shapes, and data types are:

 * `POSE`: 4x4 float32 array with the vehicle pose.
+* `TIMESTAMP`: int64 scalar with the timestamp of the frame in microseconds.
 * For each lidar:
   * `<LIDAR_NAME>_RANGE_IMAGE_FIRST_RETURN`: HxWx6 float32 array with the range image of the first return for this lidar. The six channels are range, intensity, elongation, x, y, and z. The x, y, and z values are in vehicle frame. Pixels with range 0 are not valid points.
   * `<LIDAR_NAME>_RANGE_IMAGE_SECOND_RETURN`: HxWx6 float32 array with the range image of the second return for this lidar. Same channels as the first return range image.
@@ -34,6 +35,10 @@ Converting from Frame protos to usable point clouds/images can be non-trivially
   * `<CAMERA_NAME>_EXTRINSIC`: 4x4 float32 array with the extrinsic matrix for this camera.
   * `<CAMERA_NAME>_WIDTH`: int64 scalar with the width of this camera image.
   * `<CAMERA_NAME>_HEIGHT`: int64 scalar with the height of this camera image.
+  * `<CAMERA_NAME>_POSE`: 4x4 float32 array with the vehicle pose at the timestamp of this camera image.
+  * `<CAMERA_NAME>_POSE_TIMESTAMP`: float32 scalar with the timestamp in seconds for the image (i.e. the timestamp that `<CAMERA_NAME>_POSE` is valid at).
+  * `<CAMERA_NAME>_ROLLING_SHUTTER_DURATION`: float32 scalar with the duration of the rolling shutter in seconds. See the documentation for `CameraImage.shutter` in [dataset.proto](https://github.com/waymo-research/waymo-open-dataset/blob/eb7d74d1e11f40f5f8485ae8e0dc71f0944e8661/waymo_open_dataset/dataset.proto#L268-L283) for details.
+  * `<CAMERA_NAME>_ROLLING_SHUTTER_DIRECTION`: int64 scalar with the direction of the rolling shutter, expressed as the int value of a `CameraCalibration.RollingShutterReadOutDirection` enum.

 See the `LaserName.Name` and `CameraName.Name` enums in [dataset.proto](https://github.com/waymo-research/waymo-open-dataset/blob/eb7d74d1e11f40f5f8485ae8e0dc71f0944e8661/waymo_open_dataset/dataset.proto#L48-L69) for the valid lidar and camera name strings.
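The pre-extracted array layout above can be sketched with a toy consumer. This is an illustrative stub, not the actual evaluation harness: the `run_model` name, its return value, and the fake range image are assumptions; only the key names, shapes, and dtypes come from the README.

```python
import numpy as np

def run_model(POSE, TIMESTAMP, **kwargs):
    """Hypothetical detector stub illustrating the input format.

    POSE: 4x4 float32 vehicle pose; TIMESTAMP: int64 microseconds.
    Lidar/camera arrays arrive as extra keyword arguments keyed by the
    names documented above (e.g. TOP_RANGE_IMAGE_FIRST_RETURN).
    """
    assert POSE.shape == (4, 4) and POSE.dtype == np.float32
    # Gather valid points (range > 0) from every first-return range image.
    points = []
    for key, value in kwargs.items():
        if key.endswith('_RANGE_IMAGE_FIRST_RETURN'):
            ri = value.reshape(-1, 6)             # range, intensity, elongation, x, y, z
            points.append(ri[ri[:, 0] > 0, 3:6])  # keep xyz of valid pixels only
    xyz = np.concatenate(points) if points else np.zeros((0, 3), np.float32)
    # A real model would return boxes/scores/types; return a point count here.
    return {'num_points': xyz.shape[0]}

# Tiny fake range image: one valid pixel (range 10), one invalid (range 0).
fake_ri = np.zeros((1, 2, 6), np.float32)
fake_ri[0, 0] = [10.0, 0.5, 0.0, 1.0, 2.0, 3.0]
out = run_model(POSE=np.eye(4, dtype=np.float32),
                TIMESTAMP=np.int64(0),
                TOP_RANGE_IMAGE_FIRST_RETURN=fake_ri)
print(out['num_points'])  # → 1
```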

waymo_open_dataset/latency/compare_objects_file_to_submission_main.cc

Lines changed: 66 additions & 55 deletions
@@ -33,12 +33,18 @@ limitations under the License.
 #include "waymo_open_dataset/protos/submission.pb.h"

 ABSL_FLAG(std::string, latency_result_filename, {},
-          "Comma separated list of sharded files that contains "
-          "car.open_dataset.Objects proto from the latency evaluation"
-          "scripts..");
+          "File that contains the car.open_dataset.Objects proto from the "
+          "latency evaluation scripts.");
 ABSL_FLAG(std::vector<std::string>, full_result_filenames, {},
           "Comma separated list of sharded files that contains "
           "car.open_dataset.Objects proto from user provided submissions.");
+ABSL_FLAG(double, iou_threshold, 0.9,
+          "IOU threshold to match detections between the latency evaluator "
+          "results and the user submission.");
+ABSL_FLAG(double, minimum_score, 0.0,
+          "Minimum score of detections to consider. Detections with scores "
+          "lower than this will not be checked for equivalence between the "
+          "submission proto and the latency evaluation script.");

 namespace waymo {
 namespace open_dataset {
@@ -48,15 +54,15 @@ namespace {
 // proto with ones from the objects file. Uses very high IOU thresholds since
 // the boxes should be nearly identical. If is_3d is true, this binary will do
 // 3D IOU matching; otherwise, it will do 2D axis-aligned IOU matching.
-Config GetConfig(bool is_3d) {
+Config GetConfig(bool is_3d, double iou_threshold) {
   Config config;

   config.set_matcher_type(MatcherProto::TYPE_HUNGARIAN);
-  config.add_iou_thresholds(0.9);
-  config.add_iou_thresholds(0.9);
-  config.add_iou_thresholds(0.9);
-  config.add_iou_thresholds(0.9);
-  config.add_iou_thresholds(0.9);
+  config.add_iou_thresholds(iou_threshold);
+  config.add_iou_thresholds(iou_threshold);
+  config.add_iou_thresholds(iou_threshold);
+  config.add_iou_thresholds(iou_threshold);
+  config.add_iou_thresholds(iou_threshold);
   if (is_3d) {
     config.set_box_type(Label::Box::TYPE_3D);
   } else {
@@ -77,8 +83,6 @@ Objects ReadObjectsFromFile(const std::vector<std::string>& paths) {
     const std::string content((std::istreambuf_iterator<char>(s)),
                               std::istreambuf_iterator<char>());
     if (!objs.ParseFromString(content)) {
-      LOG(ERROR) << "Could not parse " << path
-                 << " as Objects file. Trying as a Submission file.";
       Submission submission;
       if (!submission.ParseFromString(content)) {
         LOG(FATAL) << "Could not parse " << path << " as submission either.";
@@ -100,7 +104,8 @@ Objects ReadObjectsFromFile(const std::vector<std::string>& paths) {
 // dimensions, confidence score, and class name) are nearly identical.
 // Returns 0 if the two sets of results match and 1 otherwise.
 int Compute(const std::string& latency_result_filename,
-            const std::vector<std::string>& full_result_filename) {
+            const std::vector<std::string>& full_result_filename,
+            double iou_threshold, double minimum_score) {
   using KeyTuple = std::tuple<std::string, int64, CameraName::Name>;
   Objects latency_result_objs = ReadObjectsFromFile({latency_result_filename});
   Objects full_result_objs = ReadObjectsFromFile(full_result_filename);
@@ -120,20 +125,22 @@ int Compute(const std::string& latency_result_filename,

   bool is_2d;
   for (auto& o : *latency_result_objs.mutable_objects()) {
-    const KeyTuple key(o.context_name(), o.frame_timestamp_micros(),
-                       o.camera_name());
     is_2d = o.object().box().has_heading();
-    latency_result_map[key].push_back(std::move(o));
+    if (o.score() >= minimum_score) {
+      const KeyTuple key(o.context_name(), o.frame_timestamp_micros(),
+                         o.camera_name());
+      latency_result_map[key].push_back(std::move(o));
+    }
   }
   for (auto& o : *full_result_objs.mutable_objects()) {
-    const KeyTuple key(o.context_name(), o.frame_timestamp_micros(),
-                       o.camera_name());
-    full_result_map[key].push_back(std::move(o));
+    if (o.score() >= minimum_score) {
+      const KeyTuple key(o.context_name(), o.frame_timestamp_micros(),
+                         o.camera_name());
+      full_result_map[key].push_back(std::move(o));
+    }
   }

-  std::cout << latency_result_map.size() << " frames found.\n";
-
-  const Config config = GetConfig(is_2d);
+  const Config config = GetConfig(is_2d, iou_threshold);
   std::unique_ptr<Matcher> matcher = Matcher::Create(config);

   // This loop iterates over the key-value pairs in the latency result map
@@ -144,29 +151,34 @@ int Compute(const std::string& latency_result_filename,
     const auto& latency_results = kv.second;
     auto full_result_it = full_result_map.find(example_key);
     if (full_result_it == full_result_map.end()) {
-      std::cerr << print_key(example_key) << " not found in full results"
-                << std::endl;
+      LOG(FATAL) << print_key(example_key)
+                 << " in latency evaluator results but not in submission.";
       return 1;
     }
     const auto& full_results = full_result_it->second;

     const size_t num_detections = latency_results.size();
-    if (full_results.size() != num_detections) {
-      std::cerr << "Different number of detections found: " << num_detections
-                << " in latency results, " << full_results.size()
-                << " in full results for frame " << print_key(example_key)
-                << std::endl;
-      return 1;
+
+    // Keep track of the number of detections that do not match, starting by
+    // subtracting the number of detections in the latency results from the
+    // number of detections in the full results since that difference
+    // constitutes detections in the full results that cannot have a match in
+    // the latency results.
+    size_t unmatched_detections = 0;
+    if (full_results.size() > num_detections) {
+      unmatched_detections = full_results.size() - num_detections;
     }

     // Run the Hungarian matcher on the two sets of results from this frame.
     matcher->SetPredictions(latency_results);
     matcher->SetGroundTruths(full_results);

-    std::vector<int> subset(num_detections);
-    std::iota(subset.begin(), subset.end(), 0);
-    matcher->SetPredictionSubset(subset);
-    matcher->SetGroundTruthSubset(subset);
+    std::vector<int> pred_subset(num_detections);
+    std::iota(pred_subset.begin(), pred_subset.end(), 0);
+    matcher->SetPredictionSubset(pred_subset);
+    std::vector<int> gt_subset(full_results.size());
+    std::iota(gt_subset.begin(), gt_subset.end(), 0);
+    matcher->SetGroundTruthSubset(gt_subset);

     std::vector<int> matches;
     matcher->Match(&matches, nullptr);
@@ -175,37 +187,33 @@ int Compute(const std::string& latency_result_filename,
       const Object& latency_obj = latency_results[latency_ind];
       const int full_ind = matches[latency_ind];
       if (full_ind < 0) {
-        std::cerr << "No match found for object " << latency_ind
-                  << " for frame " << print_key(example_key) << std::endl;
-        return 1;
+        LOG(INFO) << "No match found for object " << latency_ind
+                  << " for frame " << print_key(example_key) << std::endl
+                  << latency_obj.DebugString();
+        ++unmatched_detections;
+        continue;
       }
       const Object& full_obj = full_results[full_ind];

-      if (std::abs(latency_obj.score() - full_obj.score()) > 1e-3 ||
-          std::abs(latency_obj.object().box().center_x() -
-                   full_obj.object().box().center_x()) > 1e-3 ||
-          std::abs(latency_obj.object().box().center_y() -
-                   full_obj.object().box().center_y()) > 1e-3 ||
-          std::abs(latency_obj.object().box().center_z() -
-                   full_obj.object().box().center_z()) > 1e-3 ||
-          std::abs(latency_obj.object().box().length() -
-                   full_obj.object().box().length()) > 1e-3 ||
-          std::abs(latency_obj.object().box().width() -
-                   full_obj.object().box().width()) > 1e-3 ||
-          std::abs(latency_obj.object().box().height() -
-                   full_obj.object().box().height()) > 1e-3 ||
-          std::abs(latency_obj.object().box().heading() -
-                   full_obj.object().box().heading()) > 1e-3 ||
-          latency_obj.object().type() != full_obj.object().type()) {
-        std::cerr << "Matched objects for frame " << print_key(example_key)
+      if (std::abs(latency_obj.score() - full_obj.score()) > 0.05) {
+        LOG(INFO) << "Matched objects for frame " << print_key(example_key)
                   << " are not identical: " << std::endl
                   << latency_obj.DebugString() << std::endl
                   << "vs" << std::endl
                   << full_obj.DebugString();
-        return 1;
+        ++unmatched_detections;
       }
     }

+    if (unmatched_detections > 0.05 * num_detections) {
+      LOG(FATAL) << "Latency evaluator results did not match submission "
+                 << "proto for " << print_key(example_key) << std::endl
+                 << unmatched_detections << " detections out of "
+                 << num_detections << " did not match. This exceeds our "
+                 << "cut-off of 5% of detections being unmatched.";
+      return 1;
+    }
+
     std::cout << "Results matched for " << print_key(example_key) << std::endl;
   }

@@ -224,7 +232,10 @@ int main(int argc, char* argv[]) {
       absl::GetFlag(FLAGS_latency_result_filename);
   const std::vector<std::string> full_result_filennames =
       absl::GetFlag(FLAGS_full_result_filenames);
+  const double iou_threshold = absl::GetFlag(FLAGS_iou_threshold);
+  const double minimum_score = absl::GetFlag(FLAGS_minimum_score);

   return waymo::open_dataset::Compute(latency_result_filename,
-                                      full_result_filennames);
+                                      full_result_filennames, iou_threshold,
+                                      minimum_score);
 }
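The relaxed comparison this file now performs (a 0.05 score tolerance per matched pair, plus a 5% cut-off on unmatched detections per frame) can be sketched in Python. This is a simplified model of the C++ logic, not the actual binary: it assumes Hungarian matching has already produced a `matches` array, and it compares only scores, as the new code does.

```python
def frame_matches(latency_scores, full_scores, matches,
                  score_tol=0.05, max_unmatched_frac=0.05):
    """Sketch of the relaxed per-frame check in compare_objects_file_to_submission_main.cc.

    latency_scores: scores of latency-evaluator detections.
    full_scores: scores of submission detections.
    matches[i]: index into full_scores matched to latency detection i, or -1.
    """
    # Extra submission detections can never be matched, so they start the
    # unmatched count (mirrors the full_results.size() > num_detections case).
    unmatched = max(len(full_scores) - len(latency_scores), 0)
    for i, j in enumerate(matches):
        # No match, or matched but scores differ beyond the tolerance.
        if j < 0 or abs(latency_scores[i] - full_scores[j]) > score_tol:
            unmatched += 1
    # The frame passes only if at most 5% of the latency detections failed.
    return unmatched <= max_unmatched_frac * len(latency_scores)

# 19 of 20 detections match exactly and one is unmatched: 1/20 = 5%, passes.
scores = [0.5] * 20
print(frame_matches(scores, scores, list(range(19)) + [-1]))  # → True
```

With two unmatched detections out of 20 (10%), the same call returns False, which corresponds to the `LOG(FATAL)` path in the C++.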

waymo_open_dataset/latency/make_objects_file_from_latency_results.py

Lines changed: 1 addition & 0 deletions
@@ -148,5 +148,6 @@ def make_object_list_from_subdir(np_dir,
     objects.objects.extend(make_object_list_from_subdir(
         timestamp_dir, context_name, int(timestamp_micros)))

+  print('Got ', len(objects.objects), 'objects')
   with open(args.output_file, 'wb') as f:
     f.write(objects.SerializeToString())

waymo_open_dataset/latency/run_latency_evaluation.sh

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ docker run --rm \
 # submission.
 OBJS_FILE=$(mktemp)
 $MAKE_OBJS_CMD --results_dir $DETECTION_OUTPUT_DIR --output_file $OBJS_FILE
-$COMPARE_OBJS_TO_SUBMISSION_CMD $OBJS_FILE $SUBMISSION_PB
+$COMPARE_OBJS_TO_SUBMISSION_CMD --latency_result_filename $OBJS_FILE --full_result_filenames $SUBMISSION_PB

 # Clean up the outputs of the accuracy check.
 sudo rm -rf $DETECTION_OUTPUT_DIR $OBJS_FILE
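A sketch of the new flag-based invocation, assembled standalone. The binary path and file names here are hypothetical; the real script resolves `$COMPARE_OBJS_TO_SUBMISSION_CMD` and the input files from its own environment.

```shell
# Hypothetical paths for illustration only.
COMPARE_OBJS_TO_SUBMISSION_CMD="bazel-bin/waymo_open_dataset/latency/compare_objects_file_to_submission_main"
OBJS_FILE=/tmp/latency_objects.pb
SUBMISSION_PB=/tmp/submission.pb

# Inputs are now explicit flags rather than positional arguments, and the
# matching tolerances introduced by this commit can be overridden.
CMD="$COMPARE_OBJS_TO_SUBMISSION_CMD \
  --latency_result_filename $OBJS_FILE \
  --full_result_filenames $SUBMISSION_PB \
  --iou_threshold 0.9 \
  --minimum_score 0.0"
echo "$CMD"
```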

waymo_open_dataset/metrics/motion_metrics.cc

Lines changed: 3 additions & 1 deletion
@@ -140,7 +140,9 @@ Status ValidatePredictions(const MotionMetricsConfig& config,
                            const Scenario& scenario) {
   // Validate that the scenario IDs match.
   if (scenario_predictions.scenario_id() != scenario.scenario_id()) {
-    return InvalidArgumentError("Scenario IDs do not match.");
+    return InvalidArgumentError(
+        "Scenario IDs do not match : " + scenario_predictions.scenario_id() +
+        " vs. " + scenario.scenario_id());
   }

   // Validate the predictions trajectory lengths and construct a set of the
// Validate the predictions trajectory lengths and construct a set of the

waymo_open_dataset/utils/frame_utils.py

Lines changed: 18 additions & 0 deletions
@@ -217,6 +217,7 @@ def convert_frame_to_dict(frame):

   The keys, shapes, and data types are:
     POSE: 4x4 float32 array
+    TIMESTAMP: int64 scalar

   For each lidar:
     <LIDAR_NAME>_BEAM_INCLINATION: H float32 array
@@ -233,6 +234,11 @@ def convert_frame_to_dict(frame):
     <CAMERA_NAME>_EXTRINSIC: 4x4 float32 array
     <CAMERA_NAME>_WIDTH: int64 scalar
     <CAMERA_NAME>_HEIGHT: int64 scalar
+    <CAMERA_NAME>_SDC_VELOCITY: 6 float32 array
+    <CAMERA_NAME>_POSE: 4x4 float32 array
+    <CAMERA_NAME>_POSE_TIMESTAMP: float32 scalar
+    <CAMERA_NAME>_ROLLING_SHUTTER_DURATION: float32 scalar
+    <CAMERA_NAME>_ROLLING_SHUTTER_DIRECTION: int64 scalar

   NOTE: This function only works in eager mode for now.
@@ -291,6 +297,15 @@ def convert_frame_to_dict(frame):
   for im in frame.images:
     cam_name_str = dataset_pb2.CameraName.Name.Name(im.name)
     data_dict[f'{cam_name_str}_IMAGE'] = tf.io.decode_jpeg(im.image).numpy()
+    data_dict[f'{cam_name_str}_SDC_VELOCITY'] = np.array([
+        im.velocity.v_x, im.velocity.v_y, im.velocity.v_z, im.velocity.w_x,
+        im.velocity.w_y, im.velocity.w_z
+    ], np.float32)
+    data_dict[f'{cam_name_str}_POSE'] = np.reshape(
+        np.array(im.pose.transform, np.float32), (4, 4))
+    data_dict[f'{cam_name_str}_POSE_TIMESTAMP'] = np.array(
+        im.pose_timestamp, np.float32)
+    data_dict[f'{cam_name_str}_ROLLING_SHUTTER_DURATION'] = np.array(im.shutter)

   # Save the intrinsics, 4x4 extrinsic matrix, width, and height of each camera.
   for c in frame.context.camera_calibrations:
@@ -300,6 +315,8 @@ def convert_frame_to_dict(frame):
         np.array(c.extrinsic.transform, np.float32), [4, 4])
     data_dict[f'{cam_name_str}_WIDTH'] = np.array(c.width)
     data_dict[f'{cam_name_str}_HEIGHT'] = np.array(c.height)
+    data_dict[f'{cam_name_str}_ROLLING_SHUTTER_DIRECTION'] = np.array(
+        c.rolling_shutter_direction)

   # Save the range image pixel pose for the top lidar.
   data_dict['TOP_RANGE_IMAGE_POSE'] = np.reshape(
@@ -308,5 +325,6 @@ def convert_frame_to_dict(frame):
   data_dict['POSE'] = np.reshape(
       np.array(frame.pose.transform, np.float32), (4, 4))
+  data_dict['TIMESTAMP'] = np.array(frame.timestamp_micros)

   return data_dict
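The per-camera packing this diff adds can be sketched with stand-ins for the `CameraImage` proto fields. The `Velocity` class and `pack_camera_fields` helper here are hypothetical illustrations of the same reshape/pack operations; the dataset protos store 4x4 transforms as flat 16-element lists, which is why the `np.reshape` is needed.

```python
import numpy as np

class Velocity:
    """Stand-in for the proto's velocity message (hypothetical type)."""
    def __init__(self, v_x, v_y, v_z, w_x, w_y, w_z):
        self.v_x, self.v_y, self.v_z = v_x, v_y, v_z
        self.w_x, self.w_y, self.w_z = w_x, w_y, w_z

def pack_camera_fields(velocity, pose_transform, pose_timestamp, shutter):
    """Mirrors the per-image fields the commit adds to the frame dict."""
    return {
        # Linear (v_*) then angular (w_*) velocity, as a 6-element float32 array.
        'SDC_VELOCITY': np.array([velocity.v_x, velocity.v_y, velocity.v_z,
                                  velocity.w_x, velocity.w_y, velocity.w_z],
                                 np.float32),
        # Flat 16-element transform list reshaped into a 4x4 pose matrix.
        'POSE': np.reshape(np.array(pose_transform, np.float32), (4, 4)),
        'POSE_TIMESTAMP': np.array(pose_timestamp, np.float32),
        'ROLLING_SHUTTER_DURATION': np.array(shutter, np.float32),
    }

fields = pack_camera_fields(Velocity(1.0, 0.0, 0.0, 0.0, 0.0, 0.1),
                            list(np.eye(4).flatten()), 1550083467.3, 0.02)
print(fields['POSE'].shape, fields['SDC_VELOCITY'].dtype)  # → (4, 4) float32
```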
