Library generates a huge number of requests if publishing rejected metrics #63
Description
The error handling code in `HttpInlinerSender`/`QueueableSender` can lead to a huge number of writes being pushed to Influx in certain situations. More specifically, here is what happens:
Assume our service is producing a metric that gets rejected (in our case we were trying to write an int metric to a metric with type float). Every time `HttpInlinerSender::doSend` is called, the entire batch is rejected. The issue arises once the measures queue is full: at that point, every time `QueueableSender.send` is called, it tries to send the entire batch (5000 items). Assuming the batch continually gets rejected, the library will try to send the entire batch once per metric, on every reporting interval.
Some back-of-the-envelope estimates: assuming a reporting interval of 10 seconds with 10 metrics, we would expect to see (5000 measures * 10 metrics) / 10 seconds = 5000 measurements/second written to Influx. That is a huge number of writes to push to Influx, and even if your Influx cluster can handle that load fine, it still introduces quite a bit of resource overhead for the service doing the reporting.
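To make the amplification concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the library's code: the class, the 5000-item capacity constant, and the always-rejecting flush are stand-ins for `QueueableSender`/`HttpInlinerSender.doSend` under the type-conflict rejection.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names approximate QueueableSender / HttpInlinerSender.
public class RejectedBatchAmplification {

    static final int MAX_MEASURES = 5000;              // queue capacity cited in this issue
    static final List<String> queue = new ArrayList<>();
    static long linesAttempted = 0;

    // Roughly mirrors QueueableSender.send(): once the queue is full, every call
    // attempts to flush the whole batch; the batch is only cleared on success.
    static void send(String measure) {
        if (queue.size() >= MAX_MEASURES) {
            flush();
        } else {
            queue.add(measure);
        }
    }

    // Stands in for HttpInlinerSender.doSend(); Influx rejects the batch (HTTP 400) every time.
    static void flush() {
        linesAttempted += queue.size();
        boolean accepted = false;                       // simulated field-type-conflict rejection
        if (accepted) {
            queue.clear();
        }
    }

    public static void main(String[] args) {
        // The queue fills up after enough rejected reporting intervals.
        for (int i = 0; i < MAX_MEASURES; i++) {
            send("jvm_heap,host=a value=" + i);
        }
        // One reporting interval with 10 metrics: each send() re-sends the full 5000-item batch.
        for (int metric = 0; metric < 10; metric++) {
            send("metric" + metric + ",host=a value=1i");
        }
        System.out.println("lines pushed to Influx in one interval: " + linesAttempted);
        // Prints 50000, i.e. 5000 measurements/second at a 10 second reporting interval.
    }
}
```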
So there are two distinct issues here:
- The metrics that failed to be written actually connected to Influx correctly, but a 400 was returned. However, on a non-200/204 return code, `HttpInlinerSender.java:81` tries to log the following:

  ```java
  LOGGER.info("failed to send {} Measures to {}://{}:{}, HTTP CODE received: {}\n", measures.size(), writeURL.getProtocol(), writeURL.getHost(), writeURL.getPort(), responseCode, Miscellaneous.readFrom(con.getInputStream()));
  ```

  In our case the `con.getInputStream()` call was failing (presumably the connection had been closed on the server's end), which sent execution to the `catch` block and caused the measures to never be cleared. This is likely most easily fixed by just removing the `Miscellaneous.readFrom(con.getInputStream())` argument, especially since it seems to not be logged anyway (the format string only has five placeholders for the six arguments); a defensive alternative is sketched after this list. I will submit a pull request for this once we test it out internally.
- I am less sure whether this is actually an issue: the `catch` block causes the batch of measures to never be cleared. If there is another scenario (outside of the one described in the previous bullet) that would cause the requests to be sent to Influx but also cause the library to catch an `IOException`, we would expect to see the same behavior. If that is possible, it is probably worth adding something to handle expiring failed measurements after a number of failed sends; one possible shape for that is sketched after this list.
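For the first bullet, here is a hedged sketch of what a safer version of that log call could look like. It is not the library's current code: `readBody` and `logFailure` are hypothetical helpers, and the only pieces carried over from the issue are the `HttpURLConnection con`, the `writeURL`, the `responseCode`, and the placeholders in the format string.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of safer error handling around the log statement at HttpInlinerSender.java:81.
final class SafeResponseLogging {

    private static final Logger LOGGER = LoggerFactory.getLogger(SafeResponseLogging.class);

    // On a non-2xx response HttpURLConnection.getInputStream() throws, so prefer getErrorStream()
    // and swallow any IOException instead of letting it escape into the sender's outer catch block.
    static String readBody(HttpURLConnection con) {
        try (InputStream in = con.getErrorStream() != null ? con.getErrorStream() : con.getInputStream()) {
            return in == null ? "" : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "<unreadable response body>";
        }
    }

    // One {} per argument, and no exception can be thrown while building the arguments,
    // so the caller always reaches the code that clears the queued measures.
    static void logFailure(int measureCount, URL writeURL, int responseCode, HttpURLConnection con) {
        LOGGER.info("failed to send {} Measures to {}://{}:{}, HTTP CODE received: {}, body: {}",
                measureCount, writeURL.getProtocol(), writeURL.getHost(), writeURL.getPort(),
                responseCode, readBody(con));
    }
}
```

Reading from `getErrorStream()` matters here because `HttpURLConnection.getInputStream()` throws an `IOException` when the server returns a 4xx/5xx status, which matches the failure we observed; simply removing the extra argument, as proposed above, avoids the problem as well.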
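And for the second bullet, a rough sketch of what expiring a repeatedly rejected batch could look like. None of this exists in the library today; the class, the threshold, and the `trySend` callback are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: drop a batch after it has failed to send too many times in a row.
final class ExpiringMeasureQueue {

    private static final int MAX_FAILED_SENDS = 3;   // illustrative threshold, not a library setting

    private final List<String> measures = new ArrayList<>();
    private int failedSends = 0;

    void add(String measure) {
        measures.add(measure);
    }

    // `trySend` stands in for the HTTP write; it returns true when Influx accepts the batch.
    void flush(Predicate<List<String>> trySend) {
        if (trySend.test(measures)) {
            measures.clear();
            failedSends = 0;
            return;
        }
        failedSends++;
        if (failedSends >= MAX_FAILED_SENDS) {
            // Give up on a batch that keeps being rejected (e.g. a field type conflict)
            // instead of re-sending all 5000 measures on every subsequent report.
            measures.clear();
            failedSends = 0;
        }
    }
}
```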