
gobblin-wmf

This package contains Gobblin modules created and maintained by the Wikimedia Foundation.

Metrics

Gobblin reports two kinds of data: events and metrics.

Metrics are the usual kind: values, timestamps, and so on. They are the more complex of the two.

Events mostly signal that something has happened. They arrive as a stream.

The two are reported in separate flows, so we need two reporters.

The pattern for building reporters was not simple: we tried to reuse the existing reporters, and that was relatively complicated. The pattern is based on a factory. (Open question: why return null?)

reportEventQueue is what gets called to report events; the events are of type GobblinTrackingEvent. Your built reporter will be called multiple times.

eventsByTaskId holds reports until the task is complete. reportEventToPrometheus converts these events to Prometheus metrics. This needs to be called in close() (not implemented yet).
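The buffering described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this repository: TrackingEvent is a hypothetical stand-in for GobblinTrackingEvent, the "taskId" metadata key and the event names used with it are assumptions, and the exported list stands in for a real Prometheus client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for GobblinTrackingEvent: a name plus string metadata.
final class TrackingEvent {
    final String name;
    final Map<String, String> metadata;

    TrackingEvent(String name, Map<String, String> metadata) {
        this.name = name;
        this.metadata = metadata;
    }
}

// Names mirror the README; the bodies are illustrative assumptions only.
final class BufferingEventReporter implements AutoCloseable {
    // Assumed metadata key carrying the task ID.
    static final String TASK_ID_KEY = "taskId";

    // Events are held here, grouped by task ID, until close() is called.
    private final Map<String, List<TrackingEvent>> eventsByTaskId = new LinkedHashMap<>();

    // Stand-in for a Prometheus client: records what would have been exported.
    final List<String> exported = new ArrayList<>();

    // May be called many times; each call only buffers the events it receives.
    void reportEventQueue(Iterable<TrackingEvent> queue) {
        for (TrackingEvent event : queue) {
            String taskId = event.metadata.getOrDefault(TASK_ID_KEY, "unknown");
            eventsByTaskId.computeIfAbsent(taskId, k -> new ArrayList<>()).add(event);
        }
    }

    // Placeholder conversion: a real implementation would build Prometheus metrics here.
    private void reportEventToPrometheus(String taskId, TrackingEvent event) {
        exported.add(taskId + ":" + event.name);
    }

    // Flushes every buffered event, then empties the buffer.
    @Override
    public void close() {
        eventsByTaskId.forEach((taskId, events) ->
            events.forEach(event -> reportEventToPrometheus(taskId, event)));
        eventsByTaskId.clear();
    }
}
```

Deferring the conversion to close() means every event of a task is available before any of them is exported, at the cost of holding them in memory for the lifetime of the reporter.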

Events are a bit simpler; metrics are more complicated. Events:

  • When a job finishes: the job duration.

We want Kafka topic-partition metrics. We know that each task uses one topic-partition, so that value can be broadcast to all metrics in the same task whenever we find the topic-partition in the map. Many events and metrics do not carry the topic-partition, but we would like it on each metric, and other events emitted at other moments do have that information. So within a task (keyed by task ID), if any metric has the topic-partition, we can use it as a dimension for all metrics in that task.
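One way to picture the per-task broadcast is a two-pass fill over the tag maps. The sketch below assumes each event or metric is reduced to a map of string tags, and that the keys are named "taskId" and "topicPartition"; both key names and the TopicPartitionBroadcast class are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopicPartitionBroadcast {
    // Assumed tag keys; the real keys in Gobblin metadata may differ.
    static final String TASK_ID = "taskId";
    static final String TOPIC_PARTITION = "topicPartition";

    // Returns copies of the tag sets in which every entry of a task carries
    // the topic-partition seen on any entry of that same task.
    static List<Map<String, String>> broadcast(List<Map<String, String>> tagSets) {
        // First pass: remember the topic-partition seen for each task ID.
        Map<String, String> topicPartitionByTask = new HashMap<>();
        for (Map<String, String> tags : tagSets) {
            String taskId = tags.get(TASK_ID);
            String topicPartition = tags.get(TOPIC_PARTITION);
            if (taskId != null && topicPartition != null) {
                topicPartitionByTask.putIfAbsent(taskId, topicPartition);
            }
        }
        // Second pass: copy each tag set and fill in the missing dimension.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> tags : tagSets) {
            Map<String, String> copy = new LinkedHashMap<>(tags);
            String topicPartition = topicPartitionByTask.get(tags.get(TASK_ID));
            if (topicPartition != null) {
                copy.putIfAbsent(TOPIC_PARTITION, topicPartition);
            }
            result.add(copy);
        }
        return result;
    }
}
```

Tasks for which no entry carries a topic-partition are left unchanged, so the dimension is simply absent rather than filled with a guess.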

Each metric has custom logic to convert it to a Prometheus metric.
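A per-metric conversion could be organised as a lookup from source name to converter function. The sketch below is an assumption about one possible shape, not the implementation in this package: MetricConverters, the source name "job.duration.ms" and the output name "gobblin_job_duration_seconds" are all hypothetical. It renders samples as lines in the Prometheus text exposition format instead of using a client library, and it does not escape label values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

final class MetricConverters {
    // Each converter maps (labels, raw value) to one Prometheus text-format line.
    private final Map<String, BiFunction<Map<String, String>, Double, String>> converters =
        new LinkedHashMap<>();

    void register(String sourceName, BiFunction<Map<String, String>, Double, String> converter) {
        converters.put(sourceName, converter);
    }

    // Empty when no custom logic is registered for the given source name.
    Optional<String> convert(String sourceName, Map<String, String> labels, double value) {
        BiFunction<Map<String, String>, Double, String> converter = converters.get(sourceName);
        return converter == null ? Optional.empty() : Optional.of(converter.apply(labels, value));
    }

    // Renders name{key="value",...} value; label values are not escaped in this sketch.
    static String sample(String name, Map<String, String> labels, double value) {
        StringBuilder line = new StringBuilder(name);
        if (!labels.isEmpty()) {
            line.append('{');
            boolean first = true;
            for (Map.Entry<String, String> label : labels.entrySet()) {
                if (!first) {
                    line.append(',');
                }
                line.append(label.getKey()).append("=\"").append(label.getValue()).append('"');
                first = false;
            }
            line.append('}');
        }
        return line.append(' ').append(value).toString();
    }
}
```

For example, a converter registered as `(labels, ms) -> MetricConverters.sample("gobblin_job_duration_seconds", labels, ms / 1000.0)` under the hypothetical name "job.duration.ms" would turn a millisecond duration into a sample expressed in seconds.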