Google Summer of Code 2024

The Kubeflow community is excited to announce that we are applying to participate in Google Summer of Code 2024.

Below are the ideas and features from different Kubeflow projects that are being submitted for proposals from GSoC students.

Registration

Please go to Google Summer of Code 2024, sign up as a student, look up the Kubeflow organization, and select the projects for which you want to submit proposals. Note that you can submit more than one proposal. Contributor applications open on March 20, 2024 and close on April 4, 2024, so start writing your proposals.

Mentors

Mentoring in the Kubeflow community is an unparalleled opportunity to make a lasting impact and share your wealth of experience. As a mentor, you’ll have the chance to guide and shape the next generation of talent, while also collaborating with like-minded individuals to elevate projects to new heights. With dedicated support mechanisms like community recognition and feedback channels, you’ll feel valued and empowered every step of the way. Join us in this exciting journey of mentorship and collaboration! If you want to submit a project idea, reach out to Ramesh Reddy on Slack.

Students

Welcome to an incredible chance to dive into the vibrant Kubeflow community! Join us and become a valued contributor to our rapidly expanding network dedicated to AI/ML technologies on Kubernetes. Kubeflow mentors come with years of valuable experience and a passion for community growth and project enhancement; they’re the guiding stars of our program. Immerse yourself in a dynamic environment where you’ll not only gain insights from top industry experts but also sharpen your skills in open source collaboration, time management, and presentation, alongside making impactful technical contributions to the projects that interest you. Our dedicated mentors will be by your side every step of the way, offering invaluable guidance and support throughout the project. Get ready to embark on an exciting journey of growth and learning!

If you need to reach our mentors, join the Kubeflow Slack and look up your mentor or find the project-specific channel.

Project Ideas for 2024

Project 1: Kubeflow Notebooks 2.0

Kubeflow Notebooks is a widely used component of Kubeflow that allows Data Scientists and ML Engineers to run web-based IDEs (JupyterLab, VSCode, RStudio) on Kubernetes clusters.

There is currently an effort to create the next major version of Kubeflow Notebooks.

The main idea is to change the Kubeflow Notebook CRD so that it is no longer just a wrapper around a Kubernetes PodSpec.

This foundational change enables users to:

Update existing notebooks after spawning, to change their “pod config” (CPU/GPU/RAM), “volumes” (storage), and “image” (what packages are installed) from options that are defined by their admin.
Make spawning notebooks less confusing for end-users. Pod configs stop being about specific parts of the PodSpec (e.g. tolerations, requests, limits), and become a drop-down list of user-friendly names (e.g. “Big GPU Notebook - A100 - 128GB”), similar to cloud “instance types”.
Give admins more control over how workspaces are spawned, and the lifecycle of the “options” which are available to users. For example, admins can now “redirect” existing image/pod configs to new ones, but delay the application of these updates until the next pod restart (during which, the interface will display a warning to users that a change is pending).
Support new web-based IDEs without needing to specifically integrate with them. Cluster admins can define a custom “kind” for their internal app, or even make “flavors” of existing apps (like Jupyter and VSCode) with the packages and pod-sizes required for specific teams in their organization.
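
The "options" and "redirect" ideas above can be illustrated with a small data-model sketch. This is not the Notebooks 2.0 CRD or its controller logic: the class name, field names, and option ids below are all hypothetical, chosen only to show how an admin-defined redirect could be resolved and surfaced as a pending change.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PodConfigOption:
    """One admin-defined entry in the pod-config drop-down (hypothetical shape)."""
    option_id: str
    display_name: str
    cpu: str
    memory: str
    gpus: int = 0
    redirect_to: Optional[str] = None  # id of the option that replaces this one


def resolve_option(options: Dict[str, PodConfigOption], option_id: str) -> PodConfigOption:
    """Follow redirects until reaching an option that has no redirect."""
    seen = set()
    current = options[option_id]
    while current.redirect_to is not None:
        if current.option_id in seen:
            raise ValueError(f"redirect loop at {current.option_id!r}")
        seen.add(current.option_id)
        current = options[current.redirect_to]
    return current


def pending_change(options: Dict[str, PodConfigOption], running_id: str) -> Optional[PodConfigOption]:
    """Return the option that would apply on the next restart, or None if unchanged."""
    target = resolve_option(options, running_id)
    return None if target.option_id == running_id else target


# Hypothetical admin configuration: the old GPU option is redirected to a new one.
OPTIONS = {
    "big-gpu-v1": PodConfigOption("big-gpu-v1", "Big GPU Notebook - V100 - 64GB",
                                  cpu="8", memory="64Gi", gpus=1, redirect_to="big-gpu-v2"),
    "big-gpu-v2": PodConfigOption("big-gpu-v2", "Big GPU Notebook - A100 - 128GB",
                                  cpu="16", memory="128Gi", gpus=1),
}
```

In this sketch, a notebook still running "big-gpu-v1" keeps its current pod, while `pending_change(OPTIONS, "big-gpu-v1")` returns the A100 option so that an interface could warn the user that a change is pending.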

You would be part of the larger effort and involved in one or more code deliverables.

See Kubeflow Notebooks docs: https://www.kubeflow.org/docs/components/notebooks/overview/
See Kubeflow Notebooks 2.0 GitHub proposal: https://github.com/kubeflow/kubeflow/issues/7156
See Kubeflow Notebooks 2.0 design document: https://docs.google.com/document/d/1_zk06zebbaTBdJ8TdU07Ibky25hqHGARXjVcsp2qEnU/edit

Skills required: Kubernetes Controllers (Golang - Kubebuilder) AND/OR Web Development (JS - Angular, Python - Flask)

Difficulty: medium/high

Length: 250 hrs

Mentors: Mathew Wicks, Kimonas Sitorchos, Julius von Kohout

Project 2: Rootless Kubeflow Container Images (Istio Ambient Mesh)

Kubeflow uses Istio as a service mesh, which by default requires “root level” network permissions for its init-containers. We want to reduce the number of privileged containers required to run Kubeflow, so we are investigating using the Istio CNI, and eventually the Istio Ambient mesh.

You would be involved in testing and investigating the impacts of these changes, and helping push the integration forwards.

See the proposal for more information: https://github.com/kubeflow/manifests/blob/master/proposals/20200913-rootlessKubeflow.md

Skills required: Istio, Kubernetes, YAML

Difficulty: medium

Length: 250 hrs

Mentors: Kimonas Sitorchos, Julius von Kohout

Project 3: Triage and Categorize Kubeflow GitHub Issues & PRs

The Kubeflow project needs help to triage, categorize, and highlight important Issues/PRs from the https://github.com/kubeflow/kubeflow GitHub repo. There are around 200 open Issues and 200 open PRs, in addition to many Issues/PRs that have been lost to time (closed automatically due to inactivity).

Specifically, your goal would be to:

Decide which Issues/PRs are still relevant
Categorize Issues/PRs by type
De-duplicate multiple Issues for the same request
Suggest which ones are the most important
Help find “good first issues” for new members:
- https://github.com/kubeflow/manifests/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22
Review which PRs are likely safe to merge (especially dependabot ones)
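
A first pass over these goals could be scripted before any manual review. The sketch below sorts items into rough buckets, assuming each item is a dictionary shaped like an entry returned by the GitHub REST API issues endpoint (where pull requests carry a `pull_request` key). The bucket names, the 180-day staleness threshold, and the sample data are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def triage(items: List[dict], now: datetime, stale_after_days: int = 180) -> Dict[str, List[int]]:
    """Sort issue/PR numbers into coarse buckets for later human review."""
    buckets: Dict[str, List[int]] = {
        "dependabot_prs": [], "stale": [], "good_first_issue": [], "needs_review": [],
    }
    cutoff = now - timedelta(days=stale_after_days)
    for item in items:
        labels = {label["name"] for label in item.get("labels", [])}
        author = item.get("user", {}).get("login", "")
        # GitHub timestamps end in "Z"; normalise so older Python versions can parse them.
        updated = datetime.fromisoformat(item["updated_at"].replace("Z", "+00:00"))
        if "pull_request" in item and author == "dependabot[bot]":
            buckets["dependabot_prs"].append(item["number"])
        elif updated < cutoff:
            buckets["stale"].append(item["number"])
        elif "good first issue" in labels:
            buckets["good_first_issue"].append(item["number"])
        else:
            buckets["needs_review"].append(item["number"])
    return buckets


# Made-up sample data in the shape of GitHub API responses.
SAMPLE_ITEMS = [
    {"number": 101, "updated_at": "2024-02-20T10:00:00Z", "labels": [],
     "user": {"login": "dependabot[bot]"}, "pull_request": {}},
    {"number": 102, "updated_at": "2022-01-05T08:30:00Z", "labels": [],
     "user": {"login": "alice"}},
    {"number": 103, "updated_at": "2024-02-25T12:00:00Z",
     "labels": [{"name": "good first issue"}], "user": {"login": "bob"}},
    {"number": 104, "updated_at": "2024-02-27T09:15:00Z", "labels": [],
     "user": {"login": "carol"}, "pull_request": {}},
]

RESULT = triage(SAMPLE_ITEMS, now=datetime(2024, 3, 1, tzinfo=timezone.utc))
```

A script like this only narrows the list; deciding relevance, de-duplicating requests, and judging whether a PR is safe to merge still require reading each item.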

Skills required: GitHub, Kubernetes, YAML, Python, Go, JS

Difficulty: medium

Length: 250 hrs

Mentors: Mathew Wicks, Kimonas Sitorchos, Julius von Kohout

Project 4: Implement LLM Tuning API for Katib

Recently, we implemented a new train API in the Kubeflow Training Operator Python SDK to easily fine-tune LLMs on multiple GPUs with a predefined dataset provider, model provider, and HuggingFace trainer.

To continue our LLMOps roadmap in Kubeflow, we want to give users the ability to tune the HyperParameters of LLMs using simple Python SDK APIs. This requires changes to the Katib Python SDK so that users can set the model, dataset, and HyperParameters that they want to optimize for an LLM.

Skills required: Kubernetes, YAML, Python

Difficulty: medium

Length: 250 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuki Iwai

Project 5: Support Distributed Jax for Training Operator

Open issue: https://github.com/kubeflow/training-operator/issues/1619

We want to integrate Jax in Training Operator to run distributed training and fine-tuning jobs on Kubernetes using the Jax ML framework. We need to create a new Kubernetes Custom Resource for Jax (e.g. JaxJob) and update the Training Operator controller to support it. Potentially, we can integrate Jax with the Training Operator Python SDK to give Data Scientists simple APIs to create JaxJob on Kubernetes.
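
To make the worker side of such a job concrete, here is a minimal sketch of how a training script might join a multi-process JAX run. `jax.distributed.initialize` is part of JAX, but the environment variable names below (`COORDINATOR_ADDRESS`, `COORDINATOR_PORT`, `NUM_PROCESSES`, `PROCESS_ID`) and the default port are assumptions; a future JaxJob controller could inject different ones.

```python
import os
from typing import Dict, Mapping, Union


def distributed_kwargs(env: Mapping[str, str]) -> Dict[str, Union[str, int]]:
    """Build the arguments for jax.distributed.initialize from environment variables.

    The variable names are hypothetical placeholders for whatever the
    controller would set on each worker pod.
    """
    host = env["COORDINATOR_ADDRESS"]
    port = env.get("COORDINATOR_PORT", "6666")  # assumed default port
    return {
        "coordinator_address": f"{host}:{port}",
        "num_processes": int(env["NUM_PROCESSES"]),
        "process_id": int(env["PROCESS_ID"]),
    }


def main() -> None:
    # Imported lazily so the helper above can be used without JAX installed.
    import jax

    jax.distributed.initialize(**distributed_kwargs(os.environ))
    print(
        f"process {jax.process_index()} of {jax.process_count()}, "
        f"local devices: {jax.local_device_count()}"
    )
```

Each worker pod would call `main()` at the start of its training script; the controller's job in this picture is to give every replica a consistent coordinator address and a unique process id.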

Skills required: Kubernetes, Go, YAML, Python

Difficulty: medium

Length: 250 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuki Iwai

Project 6: Push-based metrics collection for Katib

Open issue: https://github.com/kubeflow/katib/issues/577

Katib implements the Metrics Collector as a sidecar container that collects training metrics from the Trials once training is complete. The Metrics Collector waits until the training container has finished, then parses the training logs to extract metrics such as accuracy or loss, which serve as evaluation results for the HyperParameter tuning algorithm.

Sometimes the sidecar container approach might not work for users, for example if their Trial resource executor doesn’t support sidecar containers. For such use cases, we want to add a new API to the Katib Python SDK that allows users to push metrics directly from their training scripts to the Katib DB.
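
The difference between the two approaches can be sketched in a few lines. The first function mimics the pull style by scanning log text for `name=value` pairs; the second part shows what a push style could look like from inside a training script. The regular expression, the `InMemoryMetricsStore` class, and its `push` method are illustrative assumptions and not the Katib API; the in-memory store merely stands in for the Katib DB.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches "name=value" pairs such as "accuracy=0.93" or "loss=1.2e-1".
METRIC_PATTERN = re.compile(r"([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def parse_metrics_from_logs(log_text: str, metric_names: Iterable[str]) -> Dict[str, List[float]]:
    """Pull style: recover metrics after the fact by parsing training logs."""
    wanted = set(metric_names)
    found: Dict[str, List[float]] = {name: [] for name in wanted}
    for name, value in METRIC_PATTERN.findall(log_text):
        if name in wanted:
            found[name].append(float(value))
    return found


class InMemoryMetricsStore:
    """Push style: hypothetical stand-in for a metrics database."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, float]] = []

    def push(self, trial_name: str, **metrics: float) -> None:
        """Record each metric directly, with no log parsing involved."""
        for name, value in metrics.items():
            self.rows.append((trial_name, name, float(value)))


LOGS = "epoch=1 accuracy=0.81 loss=0.52\nepoch=2 accuracy=0.93 loss=1.2e-1\n"
PULLED = parse_metrics_from_logs(LOGS, ["accuracy", "loss"])

STORE = InMemoryMetricsStore()
STORE.push("trial-a", accuracy=0.93, loss=0.12)
```

The pull style depends on the log format and on a container that can read the logs, whereas the push style only needs the training script to be able to reach the metrics store.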

Skills required: Kubernetes, Go, YAML, Python

Difficulty: medium

Length: 250 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuki Iwai

Project 7: Automate docs generation for Kubeflow Python SDKs

Open issue: https://github.com/kubeflow/katib/issues/2081

The Training Operator and Katib SDKs have a valid docstring for each public API that users call. We want to automatically generate documentation for Kubeflow users from these docstrings, so users don’t need to read source code to understand API parameters.
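
The core idea can be shown with the standard library alone. The sketch below uses `inspect` to turn function signatures and docstrings into a plain-text reference page. The `create_job` function is a hypothetical stand-in for an SDK method, and a real solution would more likely build on an existing documentation generator such as Sphinx autodoc than on a hand-written renderer like this one.

```python
import inspect
from typing import Callable, Iterable


def render_api_docs(functions: Iterable[Callable]) -> str:
    """Render one section per function from its signature and docstring."""
    sections = []
    for func in functions:
        signature = inspect.signature(func)
        # getdoc() strips the common indentation from the docstring.
        doc = inspect.getdoc(func) or "(no docstring)"
        sections.append(f"### {func.__name__}{signature}\n\n{doc}\n")
    return "\n".join(sections)


def create_job(name: str, num_workers: int = 1) -> None:
    """Create a training job (hypothetical example API).

    Args:
        name: Name of the job.
        num_workers: Number of worker replicas.
    """


DOCS = render_api_docs([create_job])
print(DOCS)
```

Because the output is derived from the code itself, regenerating it on every release keeps the published parameter descriptions in step with the SDK.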

Skills required: Python

Difficulty: medium

Length: 250 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuki Iwai, Shivay Lamba

Project 8: Support various parameter distributions like log-uniform in Katib

Related pull request: https://github.com/kubeflow/katib/pull/2059

We need to enhance the Katib Experiment APIs to support various parameter distributions such as uniform, log-uniform, and qlog-uniform, to bring Katib closer to other HyperParameter tuning frameworks like Hyperopt. Currently, Katib supports only the uniform distribution for integer, float, and categorical HyperParameters.
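
For readers unfamiliar with these distributions, the sketch below samples from each one using only the standard library. It assumes the bounds are given in the original (not logarithmic) space and clamps quantized values back into range; other frameworks, including Hyperopt, may define the bounds differently, so treat the function signatures as illustrative rather than as a proposed Katib API.

```python
import math
import random


def sample_uniform(rng: random.Random, low: float, high: float) -> float:
    """Every value in [low, high] is equally likely."""
    return rng.uniform(low, high)


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """The logarithm of the value is uniform, so each order of magnitude is equally likely."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    return min(max(value, low), high)  # guard against floating-point overshoot


def sample_qlog_uniform(rng: random.Random, low: float, high: float, q: float) -> float:
    """Log-uniform sample rounded to the nearest multiple of q, clamped to the bounds."""
    value = round(sample_log_uniform(rng, low, high) / q) * q
    return min(max(value, low), high)


rng = random.Random(0)
learning_rates = [sample_log_uniform(rng, 1e-5, 1e-1) for _ in range(2000)]
batch_sizes = [sample_qlog_uniform(rng, 8, 256, 8) for _ in range(2000)]
```

With a log-uniform learning rate over [1e-5, 1e-1], about half of the samples fall below 1e-3, whereas a plain uniform distribution over the same range would place almost all of them above it. This is why log-uniform sampling is commonly preferred for parameters that span several orders of magnitude.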

Skills required: Kubernetes, Python, Go, YAML

Difficulty: medium

Length: 250 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuki Iwai

Project 9: PostgreSQL integration in Kubeflow Pipelines

Open issue: https://github.com/kubeflow/pipelines/issues/9813

Kubeflow Pipelines must store information about pipelines, experiments, runs, and artifacts in a database. Currently, the only database it supports is MySQL/MariaDB.

We plan to support PostgreSQL as an alternative to MySQL/MariaDB so that users can reuse their existing databases. PostgreSQL will also be a good use case for supporting multiple databases.

Skills required: Kubernetes, Python, Go, YAML

Difficulty: medium

Length: 250 hrs

Mentors: Ricardo Martinelli, Shivay Lamba

Project 10: Enhancing KF Model Registry Python client for seamless ML imports from alternative registries

We aim to extend the capabilities of the KF Model Registry Python client by enabling smooth imports from various machine learning registries. While import from HuggingFace is already implemented (and can be used as a basis), we seek to integrate support for MLflow and other popular registry formats.

Skills required: Python, ML model serialization formats, YAML, Kubernetes/Kubeflow as a plus

Difficulty: medium

Length: 250 hrs

Mentors: Matteo Mortari, Andrea Lamparelli

Last modified February 1, 2024: GSOC 2024 proposal (#3667) (c587277)