
isarantopoulos (Ilias Sarantopoulos)
Machine Learning/MLOps Engineer


User Details

User Since
Nov 1 2022, 12:34 PM (91 w, 2 d)
Availability
Available
LDAP User
Ilias Sarantopoulos
MediaWiki User
ISarantopoulos-WMF [ Global Accounts ]

Recent Activity

Yesterday

isarantopoulos added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

I took a first swing at this, copying the Dockerfile instructions over to the hf blubber image.
At the moment the build is failing on install:

18.87   × python setup.py egg_info did not run successfully.
18.87   │ exit code: 1
18.87   ╰─> [9 lines of output]
18.87       Traceback (most recent call last):
18.87         File "<string>", line 2, in <module>
18.87         File "<pip-setuptools-caller>", line 34, in <module>
18.87         File "/srv/app/flash-attention-v2/setup.py", line 21, in <module>
18.87           import torch
18.87         File "/opt/lib/python/site-packages/torch/__init__.py", line 237, in <module>
18.87           from torch._C import *  # noqa: F403
18.87           ^^^^^^^^^^^^^^^^^^^^^^
18.87       ImportError: libamdhip64.so: cannot enable executable stack as shared object requires: Invalid argument
18.87       [end of output]
18.87   
18.87   note: This error originates from a subprocess, and is likely not a problem with pip.
19.10 error: metadata-generation-failed
19.10 
19.10 × Encountered error while generating package metadata.
19.10 ╰─> See above for output.
19.10 
19.10 note: This is an issue with the package mentioned above, not pip.
19.10 hint: See above for details.
------
ERROR: failed to solve: process "/bin/sh -c python3 \"-m\" \"pip\" \"install\" \"-r\" \"huggingface_modelserver/requirements.txt\"" did not complete successfully: exit code: 1

I need to recheck whether this is a permissions issue and, if so, whether it would make sense to install flash attention in the pytorch base image in production-images instead of the inference-services repository.

Thu, Aug 1, 4:34 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

I've deployed the new model in the experimental namespace in ml-staging, so it is now available for further testing.

Thu, Aug 1, 4:29 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

I've uploaded the model to swift and to the public analytics space.

Thu, Aug 1, 12:23 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team

Tue, Jul 30

isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

@Isaac We're going to solve the numpy issue by relaxing the kserve restriction, using our wmf kserve fork. It is going to be supported upstream in the near future anyway, so we will switch to the official release then. It wouldn't make much sense to build things with an older version just to make things work. Thanks for offering to help!

Tue, Jul 30, 5:04 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos moved T370408: Fix articletopic-outlink CrashLoopBackOff issue from Unsorted to In Progress on the Machine-Learning-Team board.
Tue, Jul 30, 2:56 PM · Machine-Learning-Team
isarantopoulos triaged T370935: [LLM] Explore low_cpu_mem_usage option when loading model in transformers as High priority.
Tue, Jul 30, 2:39 PM · Machine-Learning-Team
isarantopoulos moved T370935: [LLM] Explore low_cpu_mem_usage option when loading model in transformers from Unsorted to Ready To Go on the Machine-Learning-Team board.
Tue, Jul 30, 2:39 PM · Machine-Learning-Team
isarantopoulos moved T370615: [LLM] Gemma2 in staging: HIP out of memory from Unsorted to Ready To Go on the Machine-Learning-Team board.
Tue, Jul 30, 2:39 PM · Machine-Learning-Team
isarantopoulos moved T371344: [LLM] Use Flash attention 2 for GPU inference from Unsorted to Ready To Go on the Machine-Learning-Team board.
Tue, Jul 30, 2:38 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos triaged T371344: [LLM] Use Flash attention 2 for GPU inference as High priority.
Tue, Jul 30, 2:38 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos triaged T370615: [LLM] Gemma2 in staging: HIP out of memory as High priority.
Tue, Jul 30, 2:37 PM · Machine-Learning-Team
isarantopoulos moved T370670: [LLM] Allow additional cmd arguments in hf image from Unsorted to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 30, 2:30 PM · Machine-Learning-Team
isarantopoulos moved T370149: [LLM] Use vllm for ROCm in huggingface image from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T370271: [LLM] Use huggingface text generation interface (TGI) on huggingface image. from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T370656: [LLM] Run LLMs locally in ml-testing from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370615: [LLM] Gemma2 in staging: HIP out of memory from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370670: [LLM] Allow additional cmd arguments in hf image from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370775: [LLM] log input/output size per request from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370935: [LLM] Explore low_cpu_mem_usage option when loading model in transformers from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370992: [LLM] add locust entry for huggingfaceserver from LLM Sprint to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T371344: [LLM] Use Flash attention 2 for GPU inference from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos moved T371384: [LLM] Multi-GPU Inference from LLM Sprint to Unsorted on the Machine-Learning-Team board.
Tue, Jul 30, 1:59 PM · Machine-Learning-Team
isarantopoulos created T371384: [LLM] Multi-GPU Inference.
Tue, Jul 30, 1:58 PM · Machine-Learning-Team
isarantopoulos created P67075 (An Untitled Masterwork).
Tue, Jul 30, 1:04 PM
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

Update: I'm hitting some dependency issues while building the Lift Wing service.
There's a problem on model load caused by numpy: kserve demands numpy <2.0.0, while locally things ran fine in a notebook with numpy 2.0.0.

I'm getting this error:

Traceback (most recent call last):
  File "/srv/articlequality/model_server/model.py", line 110, in <module>
    model = ArticleQualityModel(
            ^^^^^^^^^^^^^^^^^^^^
  File "/srv/articlequality/model_server/model.py", line 50, in __init__
    self.load()
  File "/srv/articlequality/model_server/model.py", line 53, in load
    self.model = load_pickle(self.model_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/statsmodels/iolib/smpickle.py", line 42, in load_pickle
    return pickle.load(fin)
           ^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/numpy/random/_pickle.py", line 34, in __bit_generator_ctor
    raise ValueError(str(bit_generator_name) + ' is not a known '
ValueError: <class 'numpy.random._mt19937.MT19937'> is not a known BitGenerator module.

I'll work on this and provide an update.
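As a stopgap, a load-time guard could at least make the numpy version mismatch obvious. This is a hypothetical helper (the server actually calls statsmodels' load_pickle directly); it only annotates the failure with the running numpy version:

```python
import pickle

import numpy as np


def load_pickle_checked(path):
    # Hypothetical wrapper: re-raise unpickling failures with the running
    # numpy version, since pickles written under numpy 2.x can fail to
    # load under numpy <2.0.0 (as in the BitGenerator error above).
    try:
        with open(path, "rb") as fin:
            return pickle.load(fin)
    except ValueError as exc:
        raise RuntimeError(
            f"Failed to unpickle {path!r} under numpy {np.__version__}; "
            "the pickle was likely written under a different numpy major "
            "version. Re-export the model or align numpy versions."
        ) from exc
```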

Tue, Jul 30, 9:00 AM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos claimed T360455: Add Article Quality Model to LiftWing.
Tue, Jul 30, 7:39 AM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos moved T360455: Add Article Quality Model to LiftWing from Blocked to In Progress on the Machine-Learning-Team board.
Tue, Jul 30, 7:37 AM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos renamed T371344: [LLM] Use Flash attention 2 for GPU inference from [LLM] Use Flash attention 2 for inference to [LLM] Use Flash attention 2 for GPU inference.
Tue, Jul 30, 7:01 AM · Patch-For-Review, Machine-Learning-Team
isarantopoulos created T371344: [LLM] Use Flash attention 2 for GPU inference.
Tue, Jul 30, 6:05 AM · Patch-For-Review, Machine-Learning-Team

Mon, Jul 29

isarantopoulos added a comment to T371021: [articletopic-outlink] fetch data from mwapi using revid instead of article title.

So far, the only solution I have found involves making 2 requests instead of 1 (with examples using the API sandbox):

Mon, Jul 29, 12:37 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos updated the task description for T371021: [articletopic-outlink] fetch data from mwapi using revid instead of article title.
Mon, Jul 29, 12:34 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos updated the task description for T371021: [articletopic-outlink] fetch data from mwapi using revid instead of article title.
Mon, Jul 29, 11:57 AM · Lift-Wing, Machine-Learning-Team

Fri, Jul 26

isarantopoulos added a comment to T364551: [SPIKE] Send an image thumbnail to the logo detection service.

@mfossati after our discussion I said I'll provide the links to the mediawiki code in ores extension that makes requests to Lift Wing:

Fri, Jul 26, 1:44 PM · MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), Structured-Data-Backlog (Current Work), Machine-Learning-Team
isarantopoulos added a comment to T370992: [LLM] add locust entry for huggingfaceserver.

It seems that the above behavior of increased memory usage is standard.
I redeployed the service and it was using 18GB of VRAM. Just a short while after I ran a load test, usage went up to 46GB again (grafana link)

Fri, Jul 26, 12:52 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos added a comment to T370992: [LLM] add locust entry for huggingfaceserver.

I ran a load test with the following setup:
duration: 10 minutes
users: 2
output_size(max_tokens): 10-200
prompt_input_size (# words): 15-300

MODEL=huggingface locust
[2024-07-26 07:45:58,337] stat1008/INFO/locust.main: Run time limit set to 600 seconds
[2024-07-26 07:45:58,337] stat1008/INFO/locust.main: Starting Locust 2.29.1
[2024-07-26 07:45:58,338] stat1008/INFO/locust.runners: Ramping to 2 users at a rate of 10.00 per second
[2024-07-26 07:45:58,338] stat1008/INFO/locust.runners: All users spawned: {"HuggingfaceServer": 2} (2 total users)
[2024-07-26 07:55:57,838] stat1008/INFO/locust.main: --run-time limit reached, shutting down
Load test results are within the threshold
[2024-07-26 07:55:57,923] stat1008/INFO/locust.main: Shutting down (exit code 0)
Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST     /openai/v1/completions                                                            71     0(0.00%) |  13810    2915   21509  14000 |    0.12        0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
         Aggregated                                                                        71     0(0.00%) |  13810    2915   21509  14000 |    0.12        0.00
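The randomized request shape used above (prompt of 15-300 words, 10-200 max tokens) could be generated with a small helper like the one below. The helper name, filler word, and model id are illustrative, not taken from the actual locustfile:

```python
import random


def make_completion_payload(min_words=15, max_words=300,
                            min_tokens=10, max_tokens=200):
    # Build a randomized /openai/v1/completions payload mirroring the
    # input/output size ranges used in the load test above.
    n_words = random.randint(min_words, max_words)
    return {
        "model": "gemma2-9b-it",  # illustrative model id
        "prompt": " ".join(["lorem"] * n_words),
        "max_tokens": random.randint(min_tokens, max_tokens),
    }
```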
Fri, Jul 26, 8:34 AM · Lift-Wing, Machine-Learning-Team

Thu, Jul 25

isarantopoulos created P66927 (An Untitled Masterwork).
Thu, Jul 25, 2:42 PM
isarantopoulos created T371021: [articletopic-outlink] fetch data from mwapi using revid instead of article title.
Thu, Jul 25, 1:29 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos added a comment to T370992: [LLM] add locust entry for huggingfaceserver.

I've managed to run the locust tests from stat1008 for gemma2-9b-it using the following process:

Thu, Jul 25, 12:02 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos created T370992: [LLM] add locust entry for huggingfaceserver.
Thu, Jul 25, 8:26 AM · Lift-Wing, Machine-Learning-Team

Wed, Jul 24

isarantopoulos added a comment to T370670: [LLM] Allow additional cmd arguments in hf image.

With the merged patch we would have the below change in the deployment charts:

command: [ "./entrypoint.sh"]
args: ["--dtype", "float32"]
Wed, Jul 24, 12:21 PM · Machine-Learning-Team
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

I'll have an update on this next week, since this week the team is doing a focus week on LLM work. I've already done some work in the patch seen above.

Wed, Jul 24, 8:11 AM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team

Tue, Jul 23

isarantopoulos assigned T370656: [LLM] Run LLMs locally in ml-testing to kevinbazira.
Tue, Jul 23, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370149: [LLM] Use vllm for ROCm in huggingface image from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T370271: [LLM] Use huggingface text generation interface (TGI) on huggingface image. from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T370615: [LLM] Gemma2 in staging: HIP out of memory from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370670: [LLM] Allow additional cmd arguments in hf image from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370656: [LLM] Run LLMs locally in ml-testing from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Machine-Learning-Team
isarantopoulos moved T370775: [LLM] log input/output size per request from Unsorted to LLM Sprint on the Machine-Learning-Team board.
Tue, Jul 23, 1:59 PM · Machine-Learning-Team
isarantopoulos created T370775: [LLM] log input/output size per request.
Tue, Jul 23, 1:58 PM · Machine-Learning-Team
isarantopoulos moved T370670: [LLM] Allow additional cmd arguments in hf image from 2024-2025 Q1 Done to Unsorted on the Machine-Learning-Team board.
Tue, Jul 23, 1:19 PM · Machine-Learning-Team
isarantopoulos moved T370670: [LLM] Allow additional cmd arguments in hf image from Unsorted to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 23, 1:18 PM · Machine-Learning-Team

Mon, Jul 22

isarantopoulos renamed T370149: [LLM] Use vllm for ROCm in huggingface image from Use vllm for ROCm in huggingface image to [LLM] Use vllm for ROCm in huggingface image .
Mon, Jul 22, 4:17 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos renamed T370271: [LLM] Use huggingface text generation interface (TGI) on huggingface image. from Use huggingface text generation interface (TGI) on huggingface image. to [LLM] Use huggingface text generation interface (TGI) on huggingface image..
Mon, Jul 22, 4:17 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos renamed T370615: [LLM] Gemma2 in staging: HIP out of memory from Gemma2 in staging: HIP out of memory to [LLM] Gemma2 in staging: HIP out of memory.
Mon, Jul 22, 4:16 PM · Machine-Learning-Team
isarantopoulos renamed T370656: [LLM] Run LLMs locally in ml-testing from Run LLMs locally in ml-testing to [LLM] Run LLMs locally in ml-testing.
Mon, Jul 22, 4:16 PM · Machine-Learning-Team
isarantopoulos created T370670: [LLM] Allow additional cmd arguments in hf image.
Mon, Jul 22, 4:16 PM · Machine-Learning-Team
isarantopoulos reopened T370408: Fix articletopic-outlink CrashLoopBackOff issue, a subtask of T369344: Reorganize LiftWing isvcs repo structure to improve maintainability, as Open.
Mon, Jul 22, 1:25 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos reopened T370408: Fix articletopic-outlink CrashLoopBackOff issue as "Open".

Let's keep this Open until we deploy the new version with the fix to production

Mon, Jul 22, 1:25 PM · Machine-Learning-Team
isarantopoulos updated the title for P66882 HIP error from untitled to HIP error.
Mon, Jul 22, 12:06 PM
isarantopoulos created P66882 HIP error.
Mon, Jul 22, 12:05 PM

Fri, Jul 19

isarantopoulos added a comment to T370408: Fix articletopic-outlink CrashLoopBackOff issue.

The service is up and running in staging and works as expected.
The production service is still running an older image that works fine, and as this is not urgent to deploy (nor a fix), we'll deploy and test production after the following week, as the ML team is doing a focus week on LLMs.

Fri, Jul 19, 11:11 AM · Machine-Learning-Team
isarantopoulos added a comment to T370408: Fix articletopic-outlink CrashLoopBackOff issue.

The issue was caused by the cmd args passed to the container not being parsed by the model_server_entrypoint.sh script, which is the container's entrypoint.
This means that the command that is run is:

python3 transformers/transformers.py

instead of

python3 transformers/transformers.py --model_name outlink-topic-model --predictor_host outlink-topic-model-predictor-default.articletopic-outlink --http_port 8080

Since the first argument to the script is the python script to execute, we forward the additional arguments from the 2nd one onwards as follows:

exec /usr/bin/python3 ${MODEL_SERVER_PATH} "${@:2}"
Fri, Jul 19, 10:05 AM · Machine-Learning-Team

Thu, Jul 18

isarantopoulos added a comment to T370408: Fix articletopic-outlink CrashLoopBackOff issue.

This service hasn't been deployed for quite a while (last deployed change on 14/12/2023), so there are some changes that have been causing errors.
It is puzzling, though, that --predictor_host and model_name aren't set anywhere; yet when inspecting the deployed transformer pod (kubectl describe pod xxxx) we see the following:

Containers:
  kserve-container:
   Image:      docker-registry.discovery.wmnet/wikimedia/machinelearning-liftwing-inference-services-outlink-transformer:2023-12-14-124100-publish
   Port:       8080/TCP
   Host Port:  0/TCP
   Args:
     --model_name
     outlink-topic-model
     --predictor_host
     outlink-topic-model-predictor-default.articletopic-outlink
     --http_port
     8080
   Limits:
     cpu:     1
     memory:  2Gi
   Requests:
     cpu:     1
     memory:  2Gi
   Environment:
     WIKI_URL:         http://mw-api-int-ro.discovery.wmnet:4680
     PORT:             8080
     K_REVISION:       outlink-topic-model-transformer-default-00020
     K_CONFIGURATION:  outlink-topic-model-transformer-default
     K_SERVICE:        outlink-topic-model-transformer-default

We need to understand:

  • how the model_name and predictor_host are set in the args
  • how these args are used by the transformer. Even though they are set, they are not parsed by kserve code. Our code doesn't explicitly set these args on the cmd, as it calls python3 transformer/transformer.py, which is used as the entrypoint. However, looking at git history, it has been like this all along.
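For context, parsing of this shape can be sketched with a minimal argparse example fed the same values seen in the pod spec. This is illustrative only; kserve's own parser handles these flags along with many more:

```python
import argparse

# Minimal sketch of parsing the args kserve passes to the container.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name")
parser.add_argument("--predictor_host")
parser.add_argument("--http_port", type=int, default=8080)

# parse_known_args tolerates extra flags the sketch doesn't declare.
args, _unknown = parser.parse_known_args([
    "--model_name", "outlink-topic-model",
    "--predictor_host",
    "outlink-topic-model-predictor-default.articletopic-outlink",
    "--http_port", "8080",
])
```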
Thu, Jul 18, 4:13 PM · Machine-Learning-Team
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

Thanks for the update Isaac!
By looking at the above code + model, if I understand correctly the following changes need to be introduced in Lift Wing:

  • switch from sklearn to statsmodels ordinal regression
  • change output schema to match the one on the model card
Thu, Jul 18, 9:55 AM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team

Wed, Jul 17

isarantopoulos created T370271: [LLM] Use huggingface text generation interface (TGI) on huggingface image..
Wed, Jul 17, 1:48 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos added a comment to T364551: [SPIKE] Send an image thumbnail to the logo detection service.

In the past we have used the envoy proxy to access Lift Wing from mw. Here is the relevant piece of config used by the Ores Extension.
Here is the relevant piece of config in puppet that configures Lift Wing production (service is named as inference) and the one for ml-staging (named as inference-staging).

Wed, Jul 17, 10:30 AM · MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), Structured-Data-Backlog (Current Work), Machine-Learning-Team

Tue, Jul 16

isarantopoulos triaged T353974: LLM that specializes in assisting Wikimedia/MediaWiki technical contributors as Medium priority.
Tue, Jul 16, 4:20 PM · artificial-intelligence, Machine-Learning-Team
isarantopoulos triaged T353025: Investigate how to improve model card integration with existing user flows as Medium priority.
Tue, Jul 16, 4:20 PM · Machine-Learning-Team
isarantopoulos triaged T356256: Epic: Implement prototype inference service that uses Cassandra for request caching as Medium priority.
Tue, Jul 16, 4:19 PM · Epic, Machine-Learning-Team
isarantopoulos triaged T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context as Medium priority.
Tue, Jul 16, 4:19 PM · Machine-Learning-Team
isarantopoulos triaged T367048: Investigate kserve 0.13.0 upgrade as Medium priority.
Tue, Jul 16, 4:19 PM · Machine-Learning-Team
isarantopoulos triaged T362749: Deploy logo-detection model-server to LiftWing staging as Medium priority.
Tue, Jul 16, 4:19 PM · Machine-Learning-Team
isarantopoulos triaged T365554: Run load tests for the rec-api-ng and update production resources to meet expected load as Medium priority.
Tue, Jul 16, 4:18 PM · Machine-Learning-Team
isarantopoulos triaged T363449: Configure the logo-detection model-server hosted on LiftWing to process images from Wikimedia Commons as Medium priority.
Tue, Jul 16, 4:18 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos triaged T363506: Pass image objects to the logo detection service as Medium priority.
Tue, Jul 16, 4:18 PM · Machine-Learning-Team, Structured-Data-Backlog
isarantopoulos triaged T369344: Reorganize LiftWing isvcs repo structure to improve maintainability as Medium priority.
Tue, Jul 16, 4:18 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos triaged T368359: Upgrade Knative control plane Docker images to Bullseye/Bookworm as Medium priority.
Tue, Jul 16, 4:18 PM · Machine-Learning-Team
isarantopoulos moved T354257: Investigate inference optimization frameworks for Large Language Models (LLMs) from Ready To Go to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 16, 1:46 PM · Machine-Learning-Team
isarantopoulos moved T354870: Deploy 7b parameter models from HF from Ready To Go to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 16, 1:46 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos moved T357986: Use Huggingface model server image for HF LLMs from In Progress to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 16, 1:46 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos moved T369055: Investigate deployment of gemma2 on LiftWing from In Progress to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Tue, Jul 16, 1:46 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos closed T357986: Use Huggingface model server image for HF LLMs, a subtask of T353337: Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models, as Resolved.
Tue, Jul 16, 1:46 PM · Goal, Machine-Learning-Team
isarantopoulos closed T357986: Use Huggingface model server image for HF LLMs, a subtask of T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU, as Resolved.
Tue, Jul 16, 1:46 PM · Goal, Machine-Learning-Team
isarantopoulos closed T357986: Use Huggingface model server image for HF LLMs as Resolved.

The current work can be marked done, as we can now deploy images using the huggingfaceserver in a stable way after completing https://phabricator.wikimedia.org/T369359

Tue, Jul 16, 1:46 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos closed T354257: Investigate inference optimization frameworks for Large Language Models (LLMs) as Resolved.

The current task can be marked done: after investigation, vllm seems to be the most prominent solution for an inference optimization engine, and work continues in https://phabricator.wikimedia.org/T370149

Tue, Jul 16, 1:45 PM · Machine-Learning-Team
isarantopoulos closed T369055: Investigate deployment of gemma2 on LiftWing as Resolved.
Tue, Jul 16, 1:45 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos closed T354257: Investigate inference optimization frameworks for Large Language Models (LLMs), a subtask of T353337: Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models, as Resolved.
Tue, Jul 16, 1:45 PM · Goal, Machine-Learning-Team
isarantopoulos closed T354870: Deploy 7b parameter models from HF as Resolved.

The current work can be marked done as we can now deploy images using the huggingfaceserver.

Tue, Jul 16, 1:44 PM · Patch-For-Review, Machine-Learning-Team
isarantopoulos renamed T370149: [LLM] Use vllm for ROCm in huggingface image from [LLM] Use vllm with rocm in huggingface image to Use vllm for ROCm in huggingface image .
Tue, Jul 16, 1:41 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos created T370149: [LLM] Use vllm for ROCm in huggingface image .
Tue, Jul 16, 12:53 PM · Lift-Wing, Machine-Learning-Team

Mon, Jul 15

isarantopoulos closed T369359: Simplify dependencies in hf image as Resolved.

Resolving this as the previous issue that occurred during a deployment (https://phabricator.wikimedia.org/T369359#9974140) doesn't have anything to do with this task.

Mon, Jul 15, 2:53 PM · Machine-Learning-Team
isarantopoulos moved T369359: Simplify dependencies in hf image from In Progress to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Mon, Jul 15, 2:52 PM · Machine-Learning-Team
isarantopoulos added a comment to T369359: Simplify dependencies in hf image.

I re-deployed the 27b model today and it is running fine:

Mon, Jul 15, 2:50 PM · Machine-Learning-Team

Fri, Jul 12

isarantopoulos closed T363334: [httpbb] fix failing httpbb test in production enwiki-articletopic as Resolved.

Resolving this as it can't be reproduced.

Fri, Jul 12, 2:03 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos moved T363334: [httpbb] fix failing httpbb test in production enwiki-articletopic from Ready To Go to 2024-2025 Q1 Done on the Machine-Learning-Team board.
Fri, Jul 12, 2:02 PM · Lift-Wing, Machine-Learning-Team
isarantopoulos added a comment to T360455: Add Article Quality Model to LiftWing.

I see you've done a lot of great work on feature engineering and preprocessing, so I don't mean to interfere with your work! My suggestion is a bit short-sighted, as I was looking at it from the perspective of deploying and updating a model. I was hoping to use a gradient boosting model and not do any normalization (we'd still have to take care of extreme outliers). That way we wouldn't have to maintain a separate csv with the values used in preprocessing, and we could still have interpretable features using the feature importance attribute of these models.

Fri, Jul 12, 1:51 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
isarantopoulos added a comment to T363334: [httpbb] fix failing httpbb test in production enwiki-articletopic.

I ran all tests for staging and prod. Staging is fine, but I get this error on prod eqiad, which seems transient:

httpbb --host inference.svc.eqiad.wmnet --https_port 30443 /srv/deployment/httpbb-tests/liftwing/production/*
Sending to inference.svc.eqiad.wmnet...
https://nlwiki-articlequality.revscoring-articlequality.wikimedia.org/v1/models/nlwiki-articlequality:predict (/srv/deployment/httpbb-tests/liftwing/production/test_revscoring-articlequality.yaml:38)
    ERROR: HTTPSConnectionPool(host='inference.svc.eqiad.wmnet', port=30443): Read timed out. (read timeout=10)
===
ERRORS: 114 requests attempted to inference.svc.eqiad.wmnet. Errors connecting to 1 host.
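A small retry wrapper could separate one-off read timeouts like this from persistent failures when re-running a probe. This is a hypothetical helper (httpbb itself does not retry):

```python
import time


def retry_transient(fn, attempts=3, base_delay=0.0,
                    transient=(TimeoutError,)):
    # Re-run fn up to `attempts` times, swallowing transient errors
    # (e.g. read timeouts) and re-raising only on the final attempt.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == attempts:
                raise
            time.sleep(base_delay * attempt)  # linear backoff between tries
```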
Fri, Jul 12, 12:23 PM · Lift-Wing, Machine-Learning-Team

Thu, Jul 11

isarantopoulos added a comment to T369359: Simplify dependencies in hf image.

Following up on some of the above:

Thu, Jul 11, 3:50 PM · Machine-Learning-Team
isarantopoulos added a comment to T369359: Simplify dependencies in hf image.

Tested the updated image in ml-staging using the GPU and got the following error:

2024-07-11 11:00:24.531 1 kserve INFO [storage.py:download():66] Copying contents of /mnt/models to local
2024-07-11 11:00:24.531 1 kserve INFO [storage.py:download():110] Successfully copied /mnt/models to None
2024-07-11 11:00:24.531 1 kserve INFO [storage.py:download():111] Model downloaded in 0.0003374351654201746 seconds.
2024-07-11 11:00:24.532 1 kserve INFO [__main__.py:load_model():204] Loading generative model for task 'text_generation' in torch.bfloat16
2024-07-11 11:00:24.755 1 kserve INFO [generative_model.py:load():206] Decoder-only model detected. Setting padding side to left.
2024-07-11 11:00:25.362 1 kserve INFO [generative_model.py:load():223] Successfully loaded tokenizer
Loading checkpoint shards: 100%|██████████| 12/12 [00:18<00:00,  1.51s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
WARNING:accelerate.big_modeling:You shouldn't move a model that is dispatched using accelerate hooks.
2024-07-11 11:00:44.203 1 kserve ERROR [__main__.py:<module>():259] Failed to start model server: You can't move a model that has some modules offloaded to cpu or disk

Building the same image locally on an m1 doesn't cause this issue, which is weird, but it is likely caused by one of the dependencies having a different version on m1.
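The failure pattern can be illustrated with a guard of the following shape: when accelerate dispatches a model with modules offloaded to cpu/disk, it records this in hf_device_map, and moving the model afterwards is rejected. The helper itself is hypothetical:

```python
def safe_move(model, device):
    # Hypothetical guard: models dispatched with accelerate hooks expose
    # hf_device_map; if any module was offloaded to cpu or disk, calling
    # .to() would fail, so raise a clearer error instead.
    device_map = getattr(model, "hf_device_map", None)
    if device_map and any(d in ("cpu", "disk") for d in device_map.values()):
        raise RuntimeError(
            "Model has modules offloaded to cpu/disk; provide more VRAM "
            "or adjust max_memory instead of moving the model."
        )
    return model.to(device)
```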

Thu, Jul 11, 2:50 PM · Machine-Learning-Team
isarantopoulos created P66283 (An Untitled Masterwork).
Thu, Jul 11, 11:46 AM