Skip to content

Conversation

sighingnow
Copy link
Collaborator

Require the flashmla patch vllm-project/FlashMLA#7 to be landed first.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue with decoding metadata for dense MLA's FP8 K/V cache by introducing a specialized operator. The changes in the Python code correctly route the execution to this new operator when appropriate. However, there is a critical issue in the CMake configuration where the flashmla dependency is pointed to a personal fork. This practice introduces significant risks and should be rectified by merging the required changes into the official upstream repository and updating the commit hash accordingly.

Comment on lines +21 to +22
GIT_REPOSITORY https://github.com/sighingnow/FlashMLA
GIT_TAG 7af725e6c2a3f0262e5b8573c715411a6d895cae
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Pointing the GIT_REPOSITORY to a personal fork (sighingnow/FlashMLA) introduces a significant dependency risk. For project stability, security, and long-term maintainability, dependencies should point to official repositories. The required changes should be merged into the official vllm-project/FlashMLA repository first. Afterward, this pull request can be updated to use the new commit hash from the official repository.

        GIT_REPOSITORY https://github.com/vllm-project/FlashMLA
        GIT_TAG <new_commit_hash_from_official_repo>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant