[Question]: cannot import name 'BM25Retriever' from 'pipelines.nodes' (/usr/local/python3.7.0/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/__init__.py) #6076

Closed
PNightOwlY opened this issue Jun 2, 2023 · 3 comments

@PNightOwlY

Please ask your question

docker pull registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda10.2-cudnn7
nvidia-docker run -d --name paddlenlp_pipelines_gpu --net host -ti registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda10.2-cudnn7

I installed the GPU image above. Running pip list | grep paddle shows the following paddle versions:
paddle-bfloat 0.1.7
paddle2onnx 0.9.8
paddlefsl 1.1.0
paddlenlp 2.3.0.dev0
paddleocr 2.5.0.3
paddlepaddle-gpu 2.3.1

When I run the multi-path recall example, the BM25Retriever node cannot be found.

@PNightOwlY PNightOwlY added the question Further information is requested label Jun 2, 2023
@github-actions github-actions bot added the triage label Jun 2, 2023
@w5688414
Contributor

w5688414 commented Jun 2, 2023

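Hello, multi-path recall was only added in version 0.5, so you need to upgrade to 0.5 before you can use it. The Docker image has to be rebuilt from the latest Paddle Docker image, following the tutorial: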
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/docker
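
Before rebuilding the image, it can help to confirm that the installed package really lacks the node. The following is a minimal sketch, not part of the project: the helper name node_available is hypothetical, and the module and class names are taken from the error message in the issue title.

```python
import importlib


def node_available(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or submodule) is not installed at all.
        return False
    return hasattr(module, attr)


# Names below come from the error message; a False result is consistent
# with an installed pipelines build older than 0.5.
print(node_available("pipelines.nodes", "BM25Retriever"))
```

If this prints False inside the container, the image still carries the old pipelines egg (0.1.0a0 in the path shown in the title) and needs the upgrade described above.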

@w5688414 w5688414 self-assigned this Jun 2, 2023
@PNightOwlY
Author

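Thanks for the reply! I downloaded the paddle installation package and replaced all the missing environment packages, and that worked as well!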

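However, yesterday I ran into a quality problem: I tested the base and nano models on a medical dataset and found that base performs worse than nano. What could be the reason?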

base configuration:

version: '1.1.0'

components:    # define all the building-blocks for Pipeline
  - name: DocumentStore
    type: ElasticsearchDocumentStore  # consider using MilvusDocumentStore or WeaviateDocumentStore for scaling to large number of documents
    params:
      host: 172.18.159.16
      port: 9200
      index: ccks_base_encoder
      embedding_dim: 768
  - name: Retriever
    type: DensePassageRetriever
    params:
      document_store: DocumentStore    # params can reference other components defined in the YAML
      top_k: 10
      query_embedding_model: rocketqa-zh-base-query-encoder
      passage_embedding_model: rocketqa-zh-base-para-encoder
      embed_title: False
  - name: Ranker       # custom-name for the component; helpful for visualization & debugging
    type: ErnieRanker    # pipelines Class name for the component
    params:
      model_name_or_path: rocketqa-base-cross-encoder
      top_k: 3
  - name: TextFileConverter
    type: TextConverter
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: PDFFileConverter
    type: PDFToTextConverter
  - name: DocxFileConverter
    type: DocxToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 1000
  - name: FileTypeClassifier
    type: FileTypeClassifier

pipelines:
  - name: query    # a sample extractive-qa Pipeline
    type: Query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Ranker
        inputs: [Retriever]
  - name: indexing
    type: Indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextFileConverter
        inputs: [FileTypeClassifier.output_1]
      - name: PDFFileConverter
        inputs: [FileTypeClassifier.output_2]
      - name: DocxFileConverter
        inputs: [FileTypeClassifier.output_4]
      - name: ImageFileConverter
        inputs: [FileTypeClassifier.output_6]
      - name: Preprocessor
        inputs: [PDFFileConverter, TextFileConverter, DocxFileConverter, ImageFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]

nano configuration:

version: '1.1.0'

components:    # define all the building-blocks for Pipeline
  - name: DocumentStore
    type: ElasticsearchDocumentStore  # consider using MilvusDocumentStore or WeaviateDocumentStore for scaling to large number of documents
    params:
      host: 172.18.159.16
      port: 9200
      index: ccks_encoder
      embedding_dim: 312
  - name: Retriever
    type: DensePassageRetriever
    params:
      document_store: DocumentStore    # params can reference other components defined in the YAML
      top_k: 10
      query_embedding_model: rocketqa-zh-nano-query-encoder
      passage_embedding_model: rocketqa-zh-nano-para-encoder
      embed_title: False
  - name: Ranker       # custom-name for the component; helpful for visualization & debugging
    type: ErnieRanker    # pipelines Class name for the component
    params:
      model_name_or_path: rocketqa-nano-cross-encoder
      top_k: 3
  - name: TextFileConverter
    type: TextConverter
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: PDFFileConverter
    type: PDFToTextConverter
  - name: DocxFileConverter
    type: DocxToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 1000
  - name: FileTypeClassifier
    type: FileTypeClassifier

pipelines:
  - name: query    # a sample extractive-qa Pipeline
    type: Query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Ranker
        inputs: [Retriever]
  - name: indexing
    type: Indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextFileConverter
        inputs: [FileTypeClassifier.output_1]
      - name: PDFFileConverter
        inputs: [FileTypeClassifier.output_2]
      - name: DocxFileConverter
        inputs: [FileTypeClassifier.output_4]
      - name: ImageFileConverter
        inputs: [FileTypeClassifier.output_6]
      - name: Preprocessor
        inputs: [PDFFileConverter, TextFileConverter, DocxFileConverter, ImageFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
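
One sanity check when two setups give surprising results is to confirm that they differ only where intended. The sketch below compares the parameters of the two configs above. The values are copied by hand from those configs, while the flattened key names (for example ranker_model and retriever_top_k) and the helper diff_params are hypothetical labels chosen for illustration.

```python
def diff_params(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values copied from the base config above; key names are illustrative.
base = {
    "index": "ccks_base_encoder",
    "embedding_dim": 768,
    "query_embedding_model": "rocketqa-zh-base-query-encoder",
    "passage_embedding_model": "rocketqa-zh-base-para-encoder",
    "ranker_model": "rocketqa-base-cross-encoder",
    "retriever_top_k": 10,
    "ranker_top_k": 3,
    "split_by": "word",
    "split_length": 1000,
}

# Values copied from the nano config above.
nano = {
    "index": "ccks_encoder",
    "embedding_dim": 312,
    "query_embedding_model": "rocketqa-zh-nano-query-encoder",
    "passage_embedding_model": "rocketqa-zh-nano-para-encoder",
    "ranker_model": "rocketqa-nano-cross-encoder",
    "retriever_top_k": 10,
    "ranker_top_k": 3,
    "split_by": "word",
    "split_length": 1000,
}

for key, (base_value, nano_value) in diff_params(base, nano).items():
    print(f"{key}: base={base_value!r} nano={nano_value!r}")
```

Here the only differences are the index name, the embedding dimension, and the three model names, so the two runs share the same preprocessing and top_k settings.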

@w5688414
Contributor

w5688414 commented May 8, 2024

Do you have concrete numbers? In our evaluation base is stronger than nano, so please check again on your side.
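
Concrete numbers for such a comparison could be produced with a small script like the one below. It is a sketch under stated assumptions, not the maintainers' evaluation method: it computes a hit rate at k (often reported as Recall@k when each query has one relevant passage), the function name recall_at_k is hypothetical, and the query and document IDs are made-up toy data.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of labeled queries with at least one relevant doc in the top k."""
    if not relevant:
        raise ValueError("no labeled queries")
    hits = 0
    for query_id, gold_ids in relevant.items():
        top_k = ranked.get(query_id, [])[:k]
        if gold_ids.intersection(top_k):
            hits += 1
    return hits / len(relevant)


# Toy data only: gold labels and the ranked output of two hypothetical runs.
relevant = {"q1": {"d1"}, "q2": {"d5"}}
run_a = {"q1": ["d3", "d1"], "q2": ["d2", "d4"]}
run_b = {"q1": ["d1", "d3"], "q2": ["d5", "d2"]}

for name, run in (("run_a", run_a), ("run_b", run_b)):
    print(name, recall_at_k(run, relevant, k=1), recall_at_k(run, relevant, k=2))
```

Running the same labeled queries through both pipelines and reporting this metric at the retriever's top_k (10) and the ranker's top_k (3) would make the base versus nano difference easier to discuss.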

@paddle-bot paddle-bot bot closed this as completed May 13, 2025