Hoppa till innehållet

Modul:Sandlådan/SM5POR/Cache

Från Wikipedia

Dokumentation [visa] [redigera] [historik] [rensa sidcachen]


Introduction

This module implements generic caching of repeated Lua function calls made using identical arguments. For this mechanism to be meaningful, the function called must be guaranteed not to produce different results for the same arguments within a short period of time, say by making the result partially dependent on a random number generator or a value read from an external communications device or service that does not come with a corresponding guarantee. It is the responsibility of the user of the Cache module to verify that any function or service called satisfies these stability criteria.

Basic concepts

Performance optimization

The cache system performs internal book-keeping in order to obtain performance data to optimize its own behavior. Specifically, the information indicated below as being collected per object category relate to categories lsted at the end. is recorded and updated as function calls are made via the cache system.

  • Temporary records, retained only during the function call, deleted after use
    • Timestamp at the receipt of each function call
    • Timestamp at the return of each function call
  • Number of calls made, by category
    • CPU time spent on function calls, by category
    • Wall clock time spent waiting for the call to return, by category
    • Maximum amount of memory used during function call, by category

Object categories

There are three category levels:

  1. Totals for the cache system, arithmetic sums of the next level categories
  2. Aggregate data for each function, arithmetic sum of the next level categories
  3. Aggregate data for each permutation of argument size

The size of each positional argument to the function is calculated in the following way:

  1. The size of each supplied argument is determined, in bits
  2. Binary classes are defined as powers of powers of two. The argument is assigned to the smallest binary class it will fit in;
    1. Class 0, pow(2, pow(2, 0)): 2 bits, 4 values (a single DNA nucleic acid, one or A, C, G or T)
    2. Class 1, pow(2, pow(2, 1)): 4 bits, 16 values (a hex digit display, 0-9 and A-F)
    3. Class 2, pow(2, pow(2, 2)): 16 bits, 65,536 values (a Z80 PC or 2-byte register)
    4. Class 3, pow(2, pow(2, 3)): 256 bits (a typical URL)
    5. Class 4, pow(2, pow(2, 4)): 65536 bits, 8 KB (a compact SVG image file)
    6. Class 5, pow(2, pow(2, 5)): 4294967296 bits, 500 MB (all the software you need)
    7. Class 6, pow(2, pow(2, 6)): 18446744073709551616 bits (the total DNA of the entire human population)
    8. Class 7, pow(2, pow(2, 7)): 340282366920938463463374607431768211456 bits

It is highly doubtful that we will ever need to consider a computer system with a total memory capacity reaching class 7, even theoretically, while class 6 (minimum size 1 exabyte) should be at least within the realm of possibility. For practical purposes though, the cache system is unlikely to deal with function arguments larger than class 5 (which starts at half a gigabyte, meaybe a video clip).

This essentially means each argument will belong to one of seven classes (class 0-6). Given the number of arguments in a function call, this will translate to pow(7, N) for N arguments:

  1. 0 arguments: pow(7, 0): 1
  2. 1 arguments: pow(7, 1): 7
  3. 2 arguments: pow(7, 2): 49
  4. 3 arguments: pow(7, 3): 343
  5. 4 arguments: pow(7, 4): 2401
  6. 5 arguments: pow(7, 5): 16807

In reality, we may be dealing with less than 100 different function argument categories. heavily concentrated towards the smaller arguments. We'll see from the statistics how it plays out.

Execution time

As the primary reason for the use of caching in the first place tends to be the desire to avoid long execution times, it is of crucial importance that the caching mechanism doesn't actually increase execution times due to convoluted calling routines and evaluation methods.

Memory usage

Since the caching mechanism uses memory storage to save execution time, and memory also is a limited resource, memory usage must be monitored so that the module can find the optimal caching strategy dynamically as the application workload changes.

Self maintenance

Security

Technical reference

Exported functions

call

cache.call( age, f, ... )

Calls the function f with the remaining arguments to the call, or obtain the result from a recent call involving the same function and arguments, depending on the age of that result as well as other circumstances. In the following, only calls to the same function with identical arguments are considered.

  • There is no earlier result stored, and no recent call made pending a result.
    • Register the call and pass it on to function f.
  • There is no earlier result stored, but another call was requested less than age seconds ago.
    • Register the new call, and put it on hold pending the result from the previous call.
  • No value, whether pending or already returned and stored, was requested less than age seconds ago.
    • Register the new call, and pass it on to function f.
  • The most recent value returned was requested less than age seconds ago.
    • Let the caller return with the stored value.
Example

local result = cache.call( 86400, primeFactors, n ) -- Use cached value if less than one day old.

control

cache.control( cmd, opt )

Performs control command cmd to the cache system, with respect to optional parameter opt (when provided, if command is applicable to it).

Commands
cmdClear
Clear the cache from all stored results pertaining to the function opt, if provided. If no function is given, clear the entire cache system (unless this operation has been disabled for security reasons).
cmdMemLimit
Set the maximum amount of memory that may be allocated for the cache system to opt bytes. When this limit is reached, begin purging old stored results that are unlikely to be used again before they expire.
Example

local result = cache.control( cache.cmdClear, primeFactors ) -- Clear cache from old factorizations.

statistics

cache.statistics( f )

Retrieve statistics regarding the performance of the cache with respect to the function f. If argument f is nil or not given, the statistics returned pertain to all cached functions in aggregate.

Example

local result = cache.statistics( f )

License

The software module described here, as well as this documentation, is available under CC0 (effectively public domain). To avoid confusion and duplicated work due to multiple forks or versions being distributed simultaneously, you are still both welcome and encouraged to contact the author to discuss potential coordination or cooperation.

local diag = require("Modul:Sandlådan/SM5POR/Diag")

local addmem = function(v, mem)
	if v.maxmem then
		v.maxmem = v.maxmem + mem
	else
		v.maxmem = mem
	end
end

local addtime = function(v, time)
	v.wclk = v.wclk + time
end

local idhook = {}

local hashtest = function()
	local hl = mw.hash.listAlgorithms()
	local hv = {}
	local c
	local n
	local m = 0
	local tn = {}
	local tn2 = {}
	local st = "Llanfairpwyllgwyngyllgogeryllllwyrndroblochllantysiliogogogosh"
	for k, ha in pairs(hl) do
		c = mw.hash.hashValue(ha, st)
		n = #c
		if tn[n] then
			tn[n][#tn[n]+1] = ha
		else
			tn[n] = {ha}
		end
		if n > m then
			m = n
		end
	end
	for k, v in pairs(tn) do
		tn2[#tn2+1] = k
	end
	hv = {#hl, #tn2, m, #tn[m], tn2}
	return diag.var("hashv", hv)
end

local varid = function(v, tl)
	local sf = idhook[type(v)]
	return mw.hash.hashValue("md5", sf(v, tl))
end

local idnil = function(v, tl)
	return "nil"
end

local idnumber = function(v, tl)
	return tostring(v)
end

local idstring = function(v, tl)
	return '"' .. v .. '"'
end

local idboolean = function(v, tl)
	return tostring(v)
end

local idtable = function(v, tl)
	local r
	local s = {}
	local id = tl[v]
	local i = 0
	if id then
		r = "tl" .. tostring(id)
	else
		tl[#tl+1] = v
		for k, w in pairs(v) do
			s[#s+1] = varid(k, tl)
			s[#s+1] = varid(w, tl)
		end
		r = table.concat(s)
	end
	return r
end

local idfunction = function(v, tl)
	return tostring(v)
end

idhook["nil"] = idnil
idhook["number"] = idnumber
idhook["string"] = idstring
idhook["boolean"] = idboolean
idhook["table"] = idtable
idhook["function"] = idfunction

local sizehook = {}

local varsize = function(v, tl)
	local sf = sizehook[type(v)]
	return sf(v, tl)
end

local sizenil = function(v, tl)
	return 1
end

local sizenumber = function(v, tl)
	local n = math.abs(v)
	local c = 0
	local p = 1
	while p < v do
		c = c + 1
		p = 2 * p
	end
	return c
end

local sizestring = function(v, tl)
	return 2 + #v * 8
end

local sizeboolean = function(v, tl)
	return 1
end

local sizetable = function(v, tl)
	local s = 2
	local k
	local w
	for k, w in pairs(v) do
		s = s + 2 + varsize(k, tl) + varsize(w, tl) 
	end
	return s
end

local sizefunction = function(v, tl)
	return 1
end

sizehook["nil"] = sizenil
sizehook["number"] = sizenumber
sizehook["string"] = sizestring
sizehook["boolean"] = sizeboolean
sizehook["table"] = sizetable
sizehook["function"] = sizefunction

local szclass = function(n)
	local c = 0
	local p = 2
	while p < n do
		c = c + 1
		p = p * p
	end
	return c
end

local argsize = function(args)
	local k
	local v
	local s = {}
	for k, v in pairs(args) do
		s[k] = varsize(v)
	end
	return s
end

local mapclass = function(asz)
	local k
	local v
	local s = {}
	for k, v in pairs(asz) do
		s[k] = szclass(v)
	end
	return s
end

local classid = function(s)
	local prime = {2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59}
	local id = 1
	for k = 1, #s do
		id = id * math.pow(prime[k], s[k]+1)
	end
	return id
end

local sumsize = function(sizes)
	local k
	local v
	local s = 0
	for k, v in pairs(sizes) do
		s = s + v
	end
	return s
end

local objcatactions = function(header)
	if header then
		return {"Calls", "Reads"}
	else
		return {"calls", "reads"}
	end
end

local objcatcolumns = function(key)
	local col = {header={"Count", "Changes", "Repeats", "Time", "CPU"},
				 label={"count", "changes", "repeats", "wclk", "cpu"},
				 scale={1, 1, 1, 1, 1000},
				 format={nil, nil, nil, nil, "% 5.1f"}}
	return col[key]
end

local objcatzero = function(s, a, c)
	for i, v in pairs(a) do
		for j, w in pairs(c) do
			s[v][w] = 0
		end
	end
end

local actionzero = function(s)
	local columns = objcatcolumns("label")
	local actions = objcatactions(false)
	s.calls = {}
	s.reads = {}
	objcatzero(s, actions, columns)
end

local objzero = function(s)
	s.a = {b=2, c=3}
	s.d = {b=2, c=3}
	s.e = {}
	s.e[s.a] = 17
	s.e[s.d] = 42
	s.time = os.time()
	s.functions = {}
	s.colhd = {}
	s.colhd.actions = objcatactions(true)
	s.colhd.columns = objcatcolumns("header")
	actionzero(s)
	s.maxmem = 0
end

local regfunction = function(stat, f)
	local fentry = stat.functions[f]
	if not fentry then
		fentry = {}
		fentry.args = {}
		fentry.acls = {}
		actionzero(fentry)
		fentry.maxmem = 0
		stat.functions[f] = fentry
	end
	return fentry
end

local regacls = function(fentry, acls, mem)
	local id = classid(acls)
	local centry = fentry.acls[id]
	if not centry then
		centry = {}
		centry.acls = acls
		centry.args = {}
		actionzero(centry)
		centry.maxmem = 0
		fentry.acls[id] = centry
	end
	addmem(centry, mem)
	addmem(fentry, mem)
	addmem(stat, mem)
	return centry
end

local regargs = function(fentry, args)
	local id = varid(args, {})
	local aentry = fentry.args[id]
	local asz
	local mem
	if not aentry then
		asz = argsize(args)
		mem = sumsize(asz)
		rmem = 0
		aentry = {}
		aentry.args = args
		aentry.maxmem = mem
		aentry.latest = 0
		aentry.centry = regacls(fentry, mapclass(asz), mem)
		actionzero(aentry)
		aentry.centry.args[id] = aentry
		fentry.args[id] = aentry
	end
	return aentry
end

local catadd = function(action, repeated, tdelta, cpu)
	action.count = action.count + 1
	if repeated then
		action.repeats = action.repeats + 1
	else
		action.changes = action.changes + 1
	end
	action.wclk = action.wclk + tdelta
	action.cpu = action.cpu + cpu
end

local condadd = function(cached, cat, repeated, tdelta, cpu)
	if cached then
		catadd(cat.reads, repeated, tdelta, cpu)
	else
		catadd(cat.calls, repeated, tdelta, cpu)
	end
end

local statadd = function(cached, repeated, fentry, aentry, t0, t1, cpu)
	aentry.last = t0
	local tdelta = t1-t0
	condadd(cached, stat, repeated, tdelta, cpu)
	condadd(cached, fentry, repeated, tdelta, cpu)
	condadd(cached, aentry.centry, repeated, tdelta, cpu)
	condadd(cached, aentry, repeated, tdelta, cpu)
end

if not stat then
	stat = {}
	objzero(stat)
end

local cache = {}

--- Exported functions

cache.call = function(age, f, ...)
	local t0 = os.time()
	local c0 = os.clock()
	local fentry = regfunction(stat, f)
	local aentry = regargs(fentry, {...})
	local cached = aentry.calls.count > 0 and t0 < aentry.latest + age
	local repeated = true
	local r
	local rid
	local rmem
	if cached then
		r = aentry.result
	else
		r = f(...)
		rid = varid(r, {})
		repeated = aentry.calls.count > 0 and rid == aentry.rid
		if not repeated then
			aentry.result = r
			aentry.rid = rid
			rmem = varsize(r)
			if aentry.rmem then
				if rmem > aentry.rmem then
					aentry.rmem = rmem
				end
			else
				aentry.rmem = rmem
			end
		end
	end
--	local mem = aentry.maxmem + aentry.rmem
	aentry.latest = t0
	local c1 = os.clock()
	local t1 = os.time()
	statadd(cached, repeated, fentry, aentry, t0, t1, c1-c0)
	return r
end

cache.cmdClear = 0

cache.control = function(cmd, opt)
	return 0
end

cache.statistics = function(ag)
	return stat
end

local stathd = function(hd, object, opt)
	local crhdr = {}
	local t = {}
	local c = {}
	local r = {}
	t[1] = {}
	t[2] = {}
	c[1] = {}
	r[1] = {}
	local i = 0
	local j = 0
	i = i + 1
	if hd then
		t[1][i] = hd
	else
		t[1][i] = ""
	end
	r[1][i] = 2
	for b, v in pairs(object.colhd.actions) do
		i = i + 1
		t[1][i] = v
		c[1][i] = #object.colhd.columns
		for d, w in pairs(object.colhd.columns) do
			j = j + 1
			t[2][j] = w
		end
	end
	i = i + 1
	t[1][i] = "Memory"
	r[1][i] = 2
	i = i + 1
	if opt then
		t[1][i] = opt
	else
		t[1][i] = ""
	end
	r[1][i] = 2
	crhdr.data = t
	crhdr.cols = c
	crhdr.rows = r
	return crhdr
end

local condformat = function(v, s, f)
	if f then
		r = string.format(f, s * v)
	else
		r = s * v
	end
	return r
end

local statrow = function(hd, rec, result)
	local crtbl = {}
	crtbl.data = {}
	crtbl.cols = {}
	crtbl.rows = {}
	local ocs = objcatcolumns("scale")
	local ocf = objcatcolumns("format")
	local i = 1
	if hd then
		if type(hd) == "table" then
			crtbl.data[i] = diag.lualist(hd)
		else
			crtbl.data[i] = hd
		end
	else
		crtbl.data[i] = ""
	end
	for b, v in pairs({rec.calls, rec.reads}) do
		for d, w in pairs({v.count, v.changes, v.repeats, v.wclk, v.cpu}) do
			i = i + 1
			crtbl.data[i] = condformat(w, ocs[d], ocf[d])
		end
	end
	i = i + 1
	crtbl.data[i] = rec.maxmem
	i = i + 1
	if result then
		crtbl.data[i] = diag.var(nil, result)
	else
		crtbl.data[i] = "H"
	end
	return crtbl
end

cache.tabulate = function(stat)
	local crhdr = stathd("Object", stat, "Result")
	local t = {}
	local crtbl = {}
	crtbl.data = {}
	crtbl.cols = {}
	crtbl.rows = {}
	local i = 1
	t[i] = statrow("Total", stat, nil)
	for f, fentry in pairs(stat.functions) do
		i = i + 1
		t[i] = statrow("", fentry, nil)
		for cid, centry in pairs(fentry.acls) do
			i = i + 1
			t[i] = statrow(centry.acls, centry, id)
			if centry.args then
				for aid, aentry in pairs(centry.args) do
					i = i + 1
					t[i] = statrow(aentry.args, aentry, aentry.result)
				end
			end
		end
	end
	for k, v in pairs(t) do
		crtbl.data[k] = v.data
		crtbl.cols[k] = v.cols
		crtbl.rows[k] = v.rows
	end
--	return diag.htmltable(crhdr, crtbl)
	for k, v in pairs(_G) do
		t = type(k)
		u = type(v)
		if n[t] then
			n[t] = n[t] + 1
		else
			n[t] = 1
		end
		if u[t] then
			u[t] = u[t] + 1
		else
			u[t] = 1
		end
	end
	return diag.varlim("nu", 5, {nn=n, uu=u})
end

return cache