Varnish does not vary elasticsearch query by request body
Closed, DeclinedPublic

Description

Elasticsearch uses GET (with a request body) for complex _search API requests:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html

However, Varnish on https://logstash.wikimedia.org/elasticsearch/_search does not Vary the cache by the request body, so the same result is returned even when the request body is changed.

I tried a POST request instead, but that results in a 404.

Event Timeline

ema triaged this task as Medium priority. Sep 28 2017, 2:51 PM

@dbarratt can you please provide some examples, including request/response headers and body, the behavior you're seeing and the one you'd expect?

Thanks!

Here's an example request

curl --request GET --url https://logstash.wikimedia.org/elasticsearch/_search --header 'authorization: Basic REDACTED' --header 'content-type: application/json'  --data '{"query":{"term":"error"}}'

I cannot provide the body because it may contain private information, but here are the headers:

Date: Mon, 23 Oct 2017 22:26:38 GMT
Content-Type: application/json; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache
kbn-name: kibana
kbn-version: 5.3.3
cache-control: no-cache
content-encoding: gzip
Backend-Timing: D=494393 t=1508797597685458
Vary: Authorization, Accept-Encoding
X-Varnish: 122373115, 11763704
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Accept-Ranges: bytes
Age: 0
X-Cache: cp1058 pass, cp1051 pass
X-Cache-Status: pass
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
X-Analytics: WMF-Last-Access=23-Oct-2017;https=1
X-Client-IP: REDACTED

This is exactly what I would expect.

However, if I then make this request:

curl --request GET --url https://logstash.wikimedia.org/elasticsearch/_search --header 'authorization: Basic REDACTED' --header 'content-type: application/json'  --data '{"query":{"term":"alskdjflaksdfsdf"}}'

I get the exact same response body and these headers:

Date: Mon, 23 Oct 2017 22:28:11 GMT
Content-Type: application/json; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache
kbn-name: kibana
kbn-version: 5.3.3
cache-control: no-cache
content-encoding: gzip
Backend-Timing: D=555304 t=1508797690573222
Vary: Authorization, Accept-Encoding
X-Varnish: 44603017, 30001862
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Accept-Ranges: bytes
Age: 0
X-Cache: cp1061 pass, cp1051 pass
X-Cache-Status: pass
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
X-Analytics: WMF-Last-Access=23-Oct-2017;https=1
X-Client-IP: REDACTED

Any change to the request body has no effect whatsoever on the response. This makes sending requests with a request body useless, as only the response to the first request will be returned until the cache expires.
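To compare the two header dumps above side by side, a small helper can split the X-Cache value into per-host entries. This is only a sketch: the function name `parse_x_cache` is hypothetical, and the code assumes the header is a comma-separated list of "host status" pairs, as in the two examples shown. It does not interpret what each status means.

```python
def parse_x_cache(value):
    """Split an X-Cache header value into (host, status) tuples.

    Assumes the format seen in the dumps above: "host status, host status".
    """
    entries = []
    for part in value.split(","):
        # partition() splits on the first space only, so any extra
        # detail after the status word stays attached to it.
        host, _, status = part.strip().partition(" ")
        entries.append((host, status))
    return entries


first = parse_x_cache("cp1058 pass, cp1051 pass")
second = parse_x_cache("cp1061 pass, cp1051 pass")
print(first)
print(second)
```

Comparing the status column of both requests shows what each cache layer reported for them, which is useful context when asking whether a cached object was served.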

I suppose I can add that the reason it has to be GET, rather than POST, is that the Kibana application that receives these requests and proxies them to Elasticsearch only proxies GET requests. Proxying POST would require a good deal more complexity to ensure the requests don't perform writes.

I'm surprised that this doesn't work as is though, because the kibana application we use for dashboarding depends fairly heavily on getting appropriate responses.

I doubt Varnish in default config does anything about GET request bodies, they're a fairly non-standard thing. I think our current versions of Varnish are capable, but we'll need to do some configuration work to make it happen.

Actually on closer review, kibana is allowing some POST requests to a limited set of endpoints, but not your _search endpoint:

https://github.com/elastic/kibana/blob/v5.3.3/src/core_plugins/elasticsearch/index.js#L122-L127

createProxy(server, 'GET', '/{paths*}');
createProxy(server, 'POST', '/_mget');
createProxy(server, 'POST', '/{index}/_search');
createProxy(server, 'POST', '/{index}/_field_stats');
createProxy(server, 'POST', '/_msearch');
createProxy(server, 'POST', '/_search/scroll');

Looks like if you post to _msearch instead of _search things should work out just fine. _msearch is elasticsearch's multi-search, basically a method of providing multiple search requests at once. You can of course send a single request that way.
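The route list above can be modelled as a small allow-list check. This is a rough sketch, not Kibana's actual matching code: it assumes that a `{name}` placeholder matches exactly one non-empty path segment, that `{paths*}` matches any remainder, and that paths are given relative to the proxy prefix. The helper names `route_to_regex` and `is_proxied` are made up for illustration.

```python
import re

# Method/path pairs copied from the Kibana 5.3.3 snippet quoted above.
ROUTES = [
    ("GET", "/{paths*}"),
    ("POST", "/_mget"),
    ("POST", "/{index}/_search"),
    ("POST", "/{index}/_field_stats"),
    ("POST", "/_msearch"),
    ("POST", "/_search/scroll"),
]


def route_to_regex(template):
    """Translate a route template into an anchored regular expression."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("*}"):
            parts.append(".*")        # assumed: wildcard, any remainder
        elif segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")     # assumed: exactly one path segment
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


def is_proxied(method, path):
    """Return True if some route accepts this method and path."""
    return any(
        method.upper() == allowed and route_to_regex(template).match(path)
        for allowed, template in ROUTES
    )


print(is_proxied("GET", "/_search"))
print(is_proxied("POST", "/_search"))
print(is_proxied("POST", "/_msearch"))
```

Under these assumptions, a POST to a bare `/_search` matches no route because `/{index}/_search` needs an index segment in front, which is consistent with the 404 reported in the description.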

Would this be a solution @dbarratt?

I tried a POST request to the _msearch endpoint, but got this response:

{
	"statusCode": 400,
	"error": "Bad Request",
	"message": "Request must contain an kbn-xsrf header"
}

I added the kbn-version header, with a value of 5.3.3 but now I am getting:

{
	"error": {
		"root_cause": [
			{
				"type": "json_e_o_f_exception",
				"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@304f0404; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@304f0404; line: 1, column: 3]"
			}
		],
		"type": "json_e_o_f_exception",
		"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@304f0404; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@304f0404; line: 1, column: 3]"
	},
	"status": 500
}

Here is the request:

curl --request POST --url https://logstash.wikimedia.org/elasticsearch/_msearch --header 'authorization: Basic REDACTED' --header 'content-type: application/json' --header 'kbn-version: 5.3.3' --data '{"query":{"term": "asdfasdfasdf"}}'

Maybe the syntax is different? It looks the same from the docs.

Yes the syntax is slightly different:

  • you need to set Content-Type: application/x-ndjson
  • every request must be formed of 2 lines:
    • first line some metadata such as the index you want to query
    • second line the search request body

see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
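The two-lines-per-request format described above can be sketched in a few lines of Python. The helper name `build_msearch_body` is hypothetical, and the `match_all` query is only a placeholder; the point is that each search contributes one metadata line and one body line, and the payload ends with a newline.

```python
import json


def build_msearch_body(searches):
    """Serialize (metadata, search_body) pairs into an NDJSON payload.

    Each pair becomes two lines: the metadata object (it may be empty),
    then the search request body. Every object must stay on one line.
    """
    lines = []
    for metadata, search_body in searches:
        lines.append(json.dumps(metadata, separators=(",", ":")))
        lines.append(json.dumps(search_body, separators=(",", ":")))
    # The multi-search format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


payload = build_msearch_body([({}, {"query": {"match_all": {}}})])
print(payload, end="")
```

A single JSON object sent on its own, as in the failing request earlier in the thread, has no second line, which would explain the "Unexpected end-of-input" parse error.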

  • first line some metadata such as the index you want to query

What is our index? errr.. what is the default index (in our instance)?

@dbarratt sadly I don't know all the details of this cluster, but you could get it working by not specifying an index:

with a requests file as:

{}
{"query":{"term": {"_all": "test"}}}

And the following curl command:

curl -u 'User:Pass' -H 'Content-Type: application/x-ndjson' -H 'kbn-version: 5.3.3'  -XPOST https://logstash.wikimedia.org/elasticsearch/_msearch --data-binary "@requests"

you'll get some data out of elastic.
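For anyone scripting this rather than using curl, a rough Python equivalent of the command above might look like the following. It only builds the request and does not send it; `User` and `Pass` are the same placeholders as in the curl example, and the helper name `build_msearch_request` is hypothetical.

```python
import base64
import urllib.request


def build_msearch_request(base_url, user, password, ndjson_body, kbn_version):
    """Build (but do not send) a POST request to the _msearch endpoint."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_msearch",
        data=ndjson_body.encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/x-ndjson",
            # Kibana rejected the POST without this header earlier in the thread.
            "kbn-version": kbn_version,
        },
        method="POST",
    )


request = build_msearch_request(
    "https://logstash.wikimedia.org/elasticsearch",
    "User",
    "Pass",
    '{}\n{"query":{"term": {"_all": "test"}}}\n',
    "5.3.3",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```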

@dcausse that does work, so @ema this is a valid workaround, although, IMHO, it's not elegant.

I've updated the docs so others hopefully will not get tripped up by this.

However, I think it would be better if we fixed it. :)

fgiunchedi subscribed.

Tentatively resolving since things are working as intended