User Details
- User Since
- Jan 13 2020, 11:39 PM (233 w, 11 h)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- JWang (WMF) [ Global Accounts ]
Wed, Jun 5
DAG files have been submitted to gitlab. A request for review and merge has been made. link
- Write updated SQL queries for table creation and updating
Queries have been checked in to Gitlab: link
Tue, Jun 4
QA code has been cleaned and uploaded to gitlab. gitlab link
May 29 2024
May 28 2024
Here are the analysis results based on the data collected from beta users of the vector-2022 skin on desktop web between May 16, 2024 and May 28, 2024. cc @ovasileva.
What percentage of users have changed their color theme?
We don’t have an accurate answer for this question. Here are the data we have and their limitations.
- 31145 users had enabled the beta feature preference across wikis by May 28, 2024. They would be exposed to the font menu and color theme menu if they visited our website. But we don’t have data on how many of them have visited our website after we deployed the color theme menu in May.
- The 8066 clicks were made by 1967 unique sessions between May 16 and May 28, 2024. One user could have one or more sessions.
May 21 2024
Here is the summary based on the result collected between May 7, 2024 and May 20, 2024. cc: @ovasileva
Take-aways:
- The result is similar to the initial result.
- For logged-in users, after the sudden increase after deployment, the number of clicks fell and stayed flat.
May 13 2024
Here is the initial summary based on the result collected between May 7, 2024 and May 12, 2024 on pilot wikis.
How many users clicked the font options?
- Among logged-in users, 2.1% of sessions have changed their text size.
- Among anonymous users, 1.5% of sessions have changed their text size.
user type | clicks | clicked sessions | initialization | initialized sessions | click_rate |
---|---|---|---|---|---|
Logged-in users | 4031 | 1287 | 744885 | 60439 | 0.021 |
Anonymous users | 273802 | 90304 | 9283552 | 5858477 | 0.015 |
- Daily trend of number of clicks on font options
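The click rates in the table above can be reproduced with a short sketch. It assumes that click_rate is clicked sessions divided by initialized sessions, which matches the reported 0.021 and 0.015; the dictionary layout and function name are illustrative only.

```python
# Session counts copied from the table above; the data layout is illustrative.
session_counts = {
    "Logged-in users": {"clicked_sessions": 1287, "initialized_sessions": 60439},
    "Anonymous users": {"clicked_sessions": 90304, "initialized_sessions": 5858477},
}


def click_rate(clicked_sessions: int, initialized_sessions: int) -> float:
    """Share of initialized sessions with at least one font-option click.

    Assumption: this is how the click_rate column was derived.
    """
    if initialized_sessions == 0:
        raise ValueError("initialized_sessions must be positive")
    return clicked_sessions / initialized_sessions


rates = {
    user_type: click_rate(**counts) for user_type, counts in session_counts.items()
}

for user_type, rate in rates.items():
    print(f"{user_type}: {rate:.3f}")
```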
May 10 2024
I have submitted for LS3C review. Here is the Asana link.
May 9 2024
May 8 2024
Hi @kostajh, here are the drafts of the measurement plan and the instrumentation spec. Please review them and feel free to edit. Let me know if you think they are ready to submit for legal review.
May 7 2024
May 6 2024
Apr 30 2024
@KSarabia-WMF, my understanding is that mediawiki_web_ui_actions will be the merged schema. After all issues in mediawiki_web_ui_actions are resolved, we can move forward with using it as the analysis data source, and DesktopWeb and MobileWebUIClickTracking can be retired.
Apr 26 2024
Apr 25 2024
The draft is under review now.
Apr 23 2024
Apr 18 2024
Apr 17 2024
Both skin preference and global preference reflect the status as of the data collection date, which is April 15, 2024.
Apr 16 2024
Apr 15 2024
Apr 12 2024
@JAllemandou @BTullis, thank you very much for the detailed explanation! I will move from Hive to Presto and Spark. I am going to mark this ticket as resolved.
Apr 5 2024
@ovasileva, here are the analysis results. The answer to the third question is a very rough estimate. Let me know if you disagree with any of the assumptions.
Apr 3 2024
Apr 2 2024
What is the default font value on vector-2022 ? Regular
Mar 29 2024
Here are the font size stats on desktop web by skin version. A few questions based on the data:
- What is the default font value on vector-2022 ?
- What does the value disabled stand for on vector-2022?
- What is the default font value on vector ? Should we include vector skin in this analysis?
Mar 26 2024
Mar 25 2024
As a follow-up, I have documented the sample rate in DataHub.
As a follow-up, I have documented the sample rate in DataHub. @KSarabia-WMF, please review and confirm whether it reflects the current configuration.
As a followup, I have documented the current sample rate at https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,event.mobilewebuiactionstracking,PROD)/Documentation?is_lineage_mode=false
Mar 21 2024
- Following the inclusion of client hints in the analysis, there was an average increase of 2 in the maximum number of unique user agents on a daily basis.
- Throughout January 2024, the daily maximum rose from 6 to 8 unique user agents per IP on English Wikipedia.
- For some days, the increase in maximum after including client info could be as large as 5.
Mar 19 2024
Mar 13 2024
What has been checked | Status | Note | Snapshot of the result from the old schema | Snapshot of the result from the new schema |
---|---|---|---|---|
Pick one session_id, compare the result | PASS | Captured same number of events. | ||
Pick one pageview_id, compare the result | PASS | Captured same number of events. | ||
By date | PASS | The new schema captured 0.37% more events than the old schema. The new schema captured 0.34% more sessions than the old schema. | ||
By action | PASS | Between March 1st and 10th, the new schema captured 0.95% more click events than the old schema. The new schema captured 0.36% more init events than the old schema. The new schema captured 2.23% more show events than the old schema. They are within the 2.5% acceptable variance. | ||
By event name | ❌ | Based on the data collected from 2024-03-01 to 2024-03-10: 1) 176 types of events are captured in new schema or old schema. 2) 31 types of events are captured in new schema, but not in old schema. 3) 2 types of events are captured in old schema, but not in new schema. They are menu.preferences and menu.ve-edit | event name diff file | |
By wiki | ❓ Is a difference of 2.6% on commonswiki OK? | Between 2024-03-01 and 2024-03-10: The new schema captured 819 wikis, while the old schema captured 820. The missed wiki is nycwikimedia. The new schema captured 0.56% more events than the old schema on average. The new schema captured 0.47% more sessions than the old schema on average. The largest differences in session count are from small wikis. Among the large wikis, commonswiki has 2.6% more events in the new schema. | ||
By skin name | ❓ Is it expected that the new schema captured 'vector' and 'vector-2022' skin with agent.client_platform_family='mobile_browser'? | Based on the data collected from 2024-03-01 to 2024-03-10: The new schema captured 0.37% more minerva events than the old schema. The new schema captured 0.34% more minerva sessions than the old schema. 'vector' and 'vector-2022' skins are not captured in the old schema, but are captured in the new schema with agent.client_platform_family='mobile_browser'. To check with engineer whether it is expected. | ||
By user type | PASS | The difference is within a 2.5% variance. | ||
agent type | PASS | |||
edit count bucket | ❓ Is it expected that performer.edit_count_bucket is NULL in new schema for logged-out users, while in old schema, event.editCountBucket is '0 edits'? | For logged-in users, the editcountbucket difference is within 2.5% variance. For logged-out users, performer.edit_count_bucket is NULL in new schema, while in old schema, event.editCountBucket is '0 edits'. Need to confirm whether it is expected. | ||
pageNamespace | ❌ | page.namespace_id is NULL in new schema | ||
is_dark_mode_on, | ❓ Is the null in old schema expected? | The difference is within a 2.5% variance. The old schema captured some NULLs, while the new schema did not. For the events with NULL in event.is_dark_mode_on, their skin is also NULL. To check with engineer | ||
is_dark_mode_prepared_by_os | ❓ Is the null in old schema expected? | The difference is within a 2.5% variance. The old schema captured some NULLs; their skin field is NULL too. To check with engineer | ||
dark_mode_setting | ❓ Is the null in old and new schemas expected? | The differences in dark_mode_setting being 0,1, 2, and NULL are within a 2.5% variance. | ||
is_full_width | ❓ | The difference is within a 2.5% variance. The old schema captured some NULLs; their skin fields are NULL too. To check with engineer | ||
is_media_viewer_enabled | ❓ | For is_media_viewer_enabled=true, the difference is within a 2.5% variance. For is_media_viewer_enabled=false, the new schema captured 2.55% more events than the old schema. To check with engineer. | ||
is_page_preview_on | PASS | The difference is within a 2.5% variance | ||
is_pinned | PASS | The difference is within a 2.5% variance | ||
font | ❓Is font size 0 expected? | The differences in font sizes, being small, regular, and large, are within a 2.5% variance. The difference in font size being large exceeds 2.5%. Given the low volume and small absolute difference, we mark it as PASS. Both schemas captured some events where the font size was 0. To check with the engineer. | ||
action_context | ❓ What's the meaning of the field value | need to document the meanings of the values: stable, stable,amc | ||
sample.rate | ❌ incorrect | 100% for all wikis and for all type of users | ||
is_bot | ❌ | performer.is_bot is NULL in new schema. |
Based on the number of events captured in the old and new schema, we believe the new schema is configured with the same sample rate as the old schema, as mentioned in T353029#9621127. However, it is recorded as 100% for all wikis in the new schema.
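The PASS decisions in the table rest on comparing the new schema's counts against the old schema's with a 2.5% acceptable variance. A minimal sketch of that check follows; the function names, the "REVIEW" label, the choice of the old schema as the denominator, and the example counts are all assumptions for illustration, not the actual QA code.

```python
ACCEPTABLE_VARIANCE_PCT = 2.5  # threshold quoted in the QA notes


def pct_difference(old_count: int, new_count: int) -> float:
    """Percent by which the new schema's count differs from the old one.

    Assumption: the old schema is the baseline (denominator).
    """
    if old_count == 0:
        raise ValueError("old_count must be positive")
    return (new_count - old_count) / old_count * 100


def qa_status(
    old_count: int,
    new_count: int,
    threshold_pct: float = ACCEPTABLE_VARIANCE_PCT,
) -> str:
    """Return 'PASS' when the absolute difference is within the threshold."""
    within = abs(pct_difference(old_count, new_count)) <= threshold_pct
    return "PASS" if within else "REVIEW"


# Hypothetical counts chosen to mirror percentages mentioned in the table.
print(qa_status(100_000, 100_370))  # 0.37% more events in the new schema
print(qa_status(10_000, 10_255))    # 2.55% more events in the new schema
```

Rows with low volume and a small absolute difference were still marked PASS by hand in the notes, so a check like this would only flag candidates for review.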
Mar 12 2024
Mar 11 2024
@KSarabia-WMF , thanks for the info.
What has been checked | Status | Note | Snapshot of the result from the old schema | Snapshot of the result from the new schema |
---|---|---|---|---|
Pick one session_id, compare the result | PASS | Captured same number of events. | ||
Pick one pageview_id, compare the result | PASS | Captured same number of events. | ||
By date | PASS | The new schema captured 0.39% more events than the old schema. The new schema captured 0.34% more sessions than the old schema. | ||
By action | PASS | Between March 1st and March 5th, the new schema captured 0.18% more click events than the old schema. The new schema captured 0.18% more click sessions than the old schema. The new schema captured 0.58% more init events than the old schema. The new schema captured 0.49% more init sessions than the old schema. They are within the 2.5% acceptable variance. | ||
By event name | ❌ | There are 4000+ types of event names in the desktopwebuiactionstracking schema. Event names contain content info of the pages. Some event names are in the old schema but not in the new schema, for example ui.sidebar-toc. Some event names are not in the old schema but are in the new schema, for example ns=0; most of them are from the minerva skin | even_name.diff_comparison | |
By wiki | PASS | The new schema captured 828 wikis, same as the old schema, in the month of Feb 2024. The largest differences in session count are from small wikis. The events on nowiktionary are 42.3% fewer in the new schema. The difference has been reduced to 10% since 2024-02-26. The new schema captured 0.85% more events than the old schema on average. The new schema captured 1.44% more sessions than the old schema on average. | ||
By skin name | ❓ Is it expected that the new schema captured 'minerva' skin with agent.client_platform_family='desktop_browser'? | Based on the data collected from 20240301 to 20240305. The new schema captured 0.52% more vector events than the old schema. The new schema captured 0.50% more vector sessions than the old schema. The new schema captured 0.3% more vector2022 events than the old schema. The new schema captured 0.25% more vector2022 sessions than the old schema. The minerva skin is not captured in the old schema, but is captured in the new schema with agent.client_platform_family='desktop_browser'. To check with engineer whether it is expected. | ||
By user type | PASS | Based on the data collected from 20240301 to 20240305. The new schema captured more sessions and events than the old schema, but within 2.5% variance. | ||
agent type | PASS | {F42562439} | {F42562448} | |
edit count bucket | ❓ Is it expected that for logged-out users performer.edit_count_bucket is NULL in new schema, while in old schema, event.editCountBucket is '0 edits'. | For logged-in users, editcountbucket difference is within 2.5% variance. For logged-out users, performer.edit_count_bucket is NULL in new schema, while in old schema, event.editCountBucket is '0 edits'. Need to confirm whether it is expected. | ||
pageNamespace | ❌ | page.namespace_id is NULL in new schema | ||
viewportSizeBucket | ❓ | diff is within 2.5% variance. new schema captured 2620 NULL viewportsizebucket with skin minerva . To check with engineer | ||
is_dark_mode_on, | ❓ Is null in old schema expected | The diff is within 2.5% variance. old schema captured some NULLs, while new schema did not. To check with engineer | ||
is_dark_mode_prepared_by_os | ❓ Is null in old schema expected | The diff is within 2.5% variance. old schema captured some NULLs, while new schema did not. To check with engineer | ||
dark_mode_setting | ❓ Is null in old schema expected | The differences in dark_mode_setting being 0, 2, and NULL are within a 2.5% variance. The difference in dark_mode_setting being 1 is larger than 2.5%. Due to the low volume and small absolute difference, we mark it as a pass. | ||
is_full_width | ❓ | The diff is within a 2.5% variance. The old schema captured some NULLs, while the new schema did not. The NULLs are from anonymous users. To check with engineer | ||
is_media_viewer_enabled | PASS | The difference is within a 2.5% variance | ||
is_page_preview_on | PASS | The difference is within a 2.5% variance | ||
is_pinned | PASS | The difference is within a 2.5% variance | ||
font | ❓ | The diff in font=0,1,2 is within a 2.5% variance. Some values, like large, null, regular and small, are captured in the old schema only. To check with engineer. | ||
action_context, | ❓ is it expected | value is desktop for minerva skin in new schema | ||
is_bot | ❌ | performer.is_bot is NULL in new schema | {F42603014} | {F42603033} |
sample rate | ❌ | incorrect in new schema |
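The "By event name" row compares which event names appear in each schema. A minimal sketch of that comparison using set operations is shown below; the function name and the example name `shared.example` are hypothetical, while `ui.sidebar-toc` and `ns=0` are the examples quoted in the table.

```python
from typing import Dict, Iterable, List


def compare_event_names(
    old_names: Iterable[str], new_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Split event names into those seen in either schema, only new, or only old."""
    old_set, new_set = set(old_names), set(new_names)
    return {
        "in_either": sorted(old_set | new_set),
        "only_in_new": sorted(new_set - old_set),
        "only_in_old": sorted(old_set - new_set),
    }


# Illustrative input: 'shared.example' is a made-up name present in both schemas.
old_schema_names = ["ui.sidebar-toc", "shared.example"]
new_schema_names = ["ns=0", "shared.example"]

diff = compare_event_names(old_schema_names, new_schema_names)
for bucket, names in diff.items():
    print(f"{bucket}: {len(names)} -> {names}")
```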
Mar 8 2024
@KSarabia-WMF, thanks for checking. Can you also clarify what's the sample rate for logged-in users?
Mar 7 2024
Mar 6 2024
Hi @KSarabia-WMF, can you confirm whether the sample rate below, as captured in the new schema, is correct?
@kostajh, please see the findings below.
Methodology
We reviewed the distribution of the number of distinct user agents that appear for a given IP address per day on each pilot wiki candidate and the largest wiki enwiki.
We also reviewed the worst-case scenario: the maximum number of the distinct user agents that appear for a given IP address per day across all wikis.
The analysis is limited to anonymous edits committed between 2024-01-01 and 2024-01-31.
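The counting step in this methodology can be sketched as follows. The input format (tuples of day, IP, and user agent), the function names, and the sample rows are assumptions for illustration; the actual analysis presumably ran as queries against the data lake rather than in plain Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edit = Tuple[str, str, str]  # (day, ip, user_agent); hypothetical record layout


def distinct_user_agents_per_ip_day(edits: Iterable[Edit]) -> Dict[Tuple[str, str], int]:
    """Count distinct user agents seen for each (day, IP) pair."""
    agents = defaultdict(set)
    for day, ip, user_agent in edits:
        agents[(day, ip)].add(user_agent)
    return {key: len(values) for key, values in agents.items()}


def daily_maximum(counts: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Worst case per day: the largest distinct-user-agent count of any IP."""
    result: Dict[str, int] = {}
    for (day, _ip), n in counts.items():
        result[day] = max(result.get(day, 0), n)
    return result


# Made-up rows using documentation-range IP addresses.
sample_edits = [
    ("2024-01-01", "192.0.2.1", "UA-A"),
    ("2024-01-01", "192.0.2.1", "UA-B"),
    ("2024-01-01", "192.0.2.1", "UA-A"),
    ("2024-01-01", "192.0.2.2", "UA-A"),
    ("2024-01-02", "192.0.2.1", "UA-C"),
]

counts = distinct_user_agents_per_ip_day(sample_edits)
maxima = daily_maximum(counts)
print(maxima)
```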
Mar 5 2024
@KSarabia-WMF, can you also provide the sample rate of the old schema DesktopWebUIActionsTracking? Thanks.
@KSarabia-WMF, can you also provide the sample rate of the old schema MobileWebUIActionsTracking? Thanks.
Mar 4 2024
Feb 29 2024
Hi @ovasileva, please see my investigation summary below.
Feb 28 2024
Thanks for checking on it. The 0.2% discrepancy can be marked as PASS because 1) it is within the variance range (2.5% for daily events across all wikis) that we defined in Metrics Platform Instrument Migration Data QA Process Description; and 2) the new instrumentation is capturing more unique sessions than the old instrumentation.
Feb 26 2024
I'll defer to Jennifer about 2 vs auto. Personally, I think it's better to use 2: if these definitions ever change in the future, it will be more resilient to change.
Feb 13 2024
Migration of desktopwebuiactionstracking schema is ready for QA.
The mobilewebuiactionstracking schema is pending migration.
Feb 2 2024
Hi, thank you for bringing up and clarifying that.
@phuedx, Here are some findings from my investigation.
Jan 31 2024
Here are the baselines for devices with a viewport larger than 1200px. @ovasileva , let me know if you have any questions.
Preview disable rate (viewport > 1200px)
Metric: Number of unique sessions with preview off (non-default)/ total number of unique initialized sessions (viewport > 1200px).
The following statistics are based on the data collected between Dec. 21, 2023 and Dec. 31, 2023
User type | Daily average | Std |
---|---|---|
Logged-in users | 44.37% | 0.27% |
Anonymous users | 3.65% | 0.12% |
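The metric and summary statistics above can be sketched as below. The daily session counts are invented for illustration and do not reproduce the reported figures, and whether the reported Std is a sample or a population standard deviation is not stated, so the use of the sample standard deviation here is an assumption.

```python
from statistics import mean, stdev


def preview_disable_rate(preview_off_sessions: int, initialized_sessions: int) -> float:
    """Unique sessions with preview off divided by unique initialized sessions."""
    if initialized_sessions == 0:
        raise ValueError("initialized_sessions must be positive")
    return preview_off_sessions / initialized_sessions


# Hypothetical daily counts: (day, sessions with preview off, initialized sessions).
daily_counts = [
    ("2023-12-21", 440, 1000),
    ("2023-12-22", 445, 1000),
    ("2023-12-23", 446, 1000),
]

daily_rates = [preview_disable_rate(off, total) for _day, off, total in daily_counts]

daily_average = mean(daily_rates)
daily_std = stdev(daily_rates)  # sample standard deviation (assumption)

print(f"Daily average: {daily_average:.2%}, Std: {daily_std:.2%}")
```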
Jan 25 2024
@phuedx, Thanks for resolving all the questions. I will further investigate the remaining question of why the numbers of events, sessions and pages are slightly higher in the new schema. Will bring it up to you when I have more data.
Jan 20 2024
Questions to confirm with engineers
- The numbers of events, sessions and pages are slightly higher in the new schema. Is this expected?
- Which field captures the Spider user agent?
- Is access_method captured in agent.client_platform_family in the new schema?
- Please review the field mapping table below and confirm whether all entries are as expected.
Field in old schema | Field in new schema | Value example |
---|---|---|
action | action | scroll-to-top |
action_context | NULL | |
action_source | NULL | |
action_subtype | NULL | |
web_session_id | performer.session_id | e.g. , '2751f1d9e9a0417cbc1x' |
meta.dt | meta.dt | e.g. "2024-01-16T00:17:25.272Z" |
page_id | page.id | 59519 |
access_method | agent.client_platform_family❓ | access_method= 'desktop' ; agent.client_platform_family='desktop_browser' |
is_anon | performer.is_logged_in | true, false. The old schema captures whether the user is anonymous, while the new schema captures whether the user is logged in. |
skin | mediawiki.skin | vector-2022 |
user_agent_map['device_family'] | MISSING ❓ | Spider |
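One way to use the mapping table when comparing the two schemas is to translate old-schema records into new-schema field names. The sketch below covers only the rows that are not marked NULL, MISSING, or ❓, treats events as flat dictionaries keyed by dotted field names, and assumes is_anon maps to the negation of performer.is_logged_in; the function and constant names are hypothetical.

```python
from typing import Any, Dict

# Old field -> new field, limited to rows not flagged as NULL, MISSING, or unconfirmed.
FIELD_MAPPING = {
    "action": "action",
    "web_session_id": "performer.session_id",
    "meta.dt": "meta.dt",
    "page_id": "page.id",
    "skin": "mediawiki.skin",
}


def translate_event(old_event: Dict[str, Any]) -> Dict[str, Any]:
    """Rename old-schema fields to their new-schema names (flat, dotted keys)."""
    new_event = {
        new_field: old_event[old_field]
        for old_field, new_field in FIELD_MAPPING.items()
        if old_field in old_event
    }
    # is_anon and is_logged_in describe opposite states, so the value is inverted.
    if "is_anon" in old_event:
        new_event["performer.is_logged_in"] = not old_event["is_anon"]
    return new_event


# Example values taken from the "Value example" column of the table.
old_event = {
    "action": "scroll-to-top",
    "web_session_id": "2751f1d9e9a0417cbc1x",
    "meta.dt": "2024-01-16T00:17:25.272Z",
    "page_id": 59519,
    "skin": "vector-2022",
    "is_anon": True,
}

new_event = translate_event(old_event)
print(new_event)
```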