
chore(wren-ai-service): minor updates #1695


Merged: 7 commits, merged May 29, 2025
@@ -13,8 +13,10 @@
from src.pipelines.generation.utils.sql import (
SQL_GENERATION_MODEL_KWARGS,
SQLGenPostProcessor,
calculated_field_instructions,
construct_ask_history_messages,
construct_instructions,
metric_instructions,
sql_generation_system_prompt,
)
from src.pipelines.retrieval.sql_functions import SqlFunction
@@ -34,9 +36,12 @@
{{ document }}
{% endfor %}

{% if instructions %}
### INSTRUCTIONS ###
{{ instructions }}
{% if calculated_field_instructions %}
{{ calculated_field_instructions }}
{% endif %}

{% if metric_instructions %}
{{ metric_instructions }}
{% endif %}

{% if sql_functions %}
@@ -56,9 +61,15 @@
{% endfor %}
{% endif %}

{% if instructions %}
### USER INSTRUCTIONS ###
{% for instruction in instructions %}
{{ loop.index }}. {{ instruction }}
{% endfor %}
{% endif %}

### QUESTION ###
User's Follow-up Question: {{ query }}
Current Time: {{ current_time }}

### REASONING PLAN ###
{{ sql_generation_reasoning }}
@@ -87,11 +98,12 @@ def prompt(
sql_generation_reasoning=sql_generation_reasoning,
instructions=construct_instructions(
configuration,
has_calculated_field,
has_metric,
instructions,
),
current_time=configuration.show_current_time(),
calculated_field_instructions=calculated_field_instructions
if has_calculated_field
else "",
metric_instructions=metric_instructions if has_metric else "",
sql_samples=sql_samples,
sql_functions=sql_functions,
)
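In the updated SQL-generation template, `calculated_field_instructions` and `metric_instructions` arrive as separate variables, each set to `""` when `has_calculated_field` or `has_metric` is false, and each is rendered only under its own `{% if %}` guard; user-supplied instructions are emitted as a numbered list via `{{ loop.index }}`, which Jinja counts from 1. The following is a minimal stdlib-only sketch of that rendering behaviour. `build_instruction_sections` and the placeholder rule strings are hypothetical stand-ins for illustration, not code from this PR, and the section order and headings are simplified relative to the real template.

```python
from typing import List


def build_instruction_sections(
    calculated_field_instructions: str,
    metric_instructions: str,
    user_instructions: List[str],
) -> str:
    """Hypothetical stand-in for the template logic; not the project's code."""
    sections: List[str] = []
    # Mirrors `{% if calculated_field_instructions %}`: "" is falsy, so the
    # section is skipped when the pipeline passes "" (has_calculated_field False).
    if calculated_field_instructions:
        sections.append(calculated_field_instructions)
    # Same truthiness guard as `{% if metric_instructions %}`.
    if metric_instructions:
        sections.append(metric_instructions)
    # Mirrors the USER INSTRUCTIONS loop; Jinja's loop.index starts at 1.
    if user_instructions:
        sections.append("### USER INSTRUCTIONS ###")
        sections.extend(
            f"{i}. {text}" for i, text in enumerate(user_instructions, start=1)
        )
    return "\n".join(sections)


has_calculated_field, has_metric = True, False
rendered = build_instruction_sections(
    # Same conditional pattern as the prompt() call above (placeholder text).
    calculated_field_instructions="<calculated-field rules>" if has_calculated_field else "",
    metric_instructions="<metric rules>" if has_metric else "",
    user_instructions=["Use fiscal years", "Exclude test accounts"],
)
print(rendered)
```

Because the guard tests truthiness, passing an empty string is enough to drop a section; no separate boolean needs to reach the template.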

@@ -12,34 +12,14 @@
from src.core.provider import LLMProvider
from src.pipelines.generation.utils.sql import (
construct_instructions,
sql_generation_reasoning_system_prompt,
)
from src.web.v1.services import Configuration
from src.web.v1.services.ask import AskHistory

logger = logging.getLogger("wren-ai-service")


sql_generation_reasoning_system_prompt = """
### TASK ###
You are a helpful data analyst who is great at thinking deeply and reasoning about the user's question and the database schema, and you provide a step-by-step reasoning plan in order to answer the user's question.

### INSTRUCTIONS ###
1. Think deeply and reason about the user's question and the database schema, and should consider the user's query history.
2. Give a step by step reasoning plan in order to answer user's question.
3. The reasoning plan should be in the language same as the language user provided in the input.
4. Make sure to consider the current time provided in the input if the user's question is related to the date/time.
5. Don't include SQL in the reasoning plan.
6. Each step in the reasoning plan must start with a number, a title(in bold format in markdown), and a reasoning for the step.
7. If SQL SAMPLES are provided, make sure to consider them in the reasoning plan.
8. If INSTRUCTIONS section is provided, please follow them strictly.
9. Do not include ```markdown or ``` in the answer.
10. A table name in the reasoning plan must be in this format: `table: <table_name>`.
11. A column name in the reasoning plan must be in this format: `column: <table_name>.<column_name>`.

### FINAL ANSWER FORMAT ###
The final answer must be a reasoning plan in plain Markdown string format
"""

sql_generation_reasoning_user_prompt_template = """
### DATABASE SCHEMA ###
{% for document in documents %}
@@ -57,8 +37,10 @@
{% endif %}

{% if instructions %}
### INSTRUCTIONS ###
{{ instructions }}
### USER INSTRUCTIONS ###
{% for instruction in instructions %}
{{ loop.index }}. {{ instruction }}
{% endfor %}
{% endif %}

### User's QUERY HISTORY ###
@@ -71,7 +53,6 @@

### QUESTION ###
User's Question: {{ query }}
Current Time: {{ current_time }}
Language: {{ language }}

Let's think step by step.
@@ -98,7 +79,6 @@ def prompt(
configuration=configuration,
instructions=instructions,
),
current_time=configuration.show_current_time(),
language=configuration.language,
)


@@ -27,11 +27,12 @@

### Instructions ###
- **Follow the user's previous questions:** If there are previous questions, try to understand the user's current question as following the previous questions.
- **Consider Both Inputs:** Combine the user's current question and their previous questions together to identify the user's true intent.
- **Consider Context of Inputs:** Combine the user's current question, their previous questions, and the user's instructions together to identify the user's true intent.
- **Rephrase Question":** Rewrite follow-up questions into full standalone questions using prior conversation context."
- **Concise Reasoning:** The reasoning must be clear, concise, and limited to 20 words.
- **Language Consistency:** Use the same language as specified in the user's output language for the rephrased question and reasoning.
- **Vague Queries:** If the question is vague or does not related to a table or property from the schema, classify it as `MISLEADING_QUERY`.
- **Time-related Queries:** Don't rephrase time-related information in the user's question.

### Intent Definitions ###

@@ -120,8 +121,10 @@
{% endif %}

{% if instructions %}
### INSTRUCTIONS ###
{{ instructions }}
### USER INSTRUCTIONS ###
{% for instruction in instructions %}
{{ loop.index }}. {{ instruction }}
{% endfor %}
{% endif %}

### USER GUIDE ###
@@ -141,7 +144,6 @@
{% endif %}

User's current question: {{query}}
Current Time: {{ current_time }}
Output Language: {{ language }}

Let's think step by step
@@ -273,7 +275,6 @@ def prompt(
instructions=instructions,
configuration=configuration,
),
current_time=configuration.show_current_time(),
docs=wren_ai_docs,
)


@@ -1,6 +1,5 @@
import logging
import sys
from datetime import datetime
from typing import Any

import orjson
@@ -22,7 +21,6 @@ def prompt(
mdl: dict,
previous_questions: list[str],
language: str,
current_date: str,
max_questions: int,
max_categories: int,
prompt_builder: PromptBuilder,
@@ -37,7 +35,6 @@ def prompt(
models=[] if previous_questions else mdl.get("models", []),
previous_questions=previous_questions,
language=language,
current_date=current_date,
max_questions=max_questions,
max_categories=max_categories,
)
@@ -222,8 +219,6 @@ class QuestionResult(BaseModel):
Categories: {{categories}}
{% endif %}

Current Date: {{current_date}}

Please generate {{max_questions}} insightful questions for each of the {{max_categories}} categories based on the provided data model. Both the questions and category names should be translated into {{language}}{% if user_question %} and be related to the user's question{% endif %}. The output format should maintain the structure but with localized text.
"""

@@ -255,7 +250,6 @@ async def run(
previous_questions: list[str] = [],
categories: list[str] = [],
language: str = "en",
current_date: str = datetime.now().strftime("%Y-%m-%d %A %H:%M:%S"),
max_questions: int = 5,
max_categories: int = 3,
**_,
@@ -268,7 +262,6 @@ async def run(
"previous_questions": previous_questions,
"categories": categories,
"language": language,
"current_date": current_date,
"max_questions": max_questions,
"max_categories": max_categories,
**self._components,
@@ -286,7 +279,6 @@ async def run(
previous_questions=[],
categories=[],
language="en",
current_date=datetime.now().strftime("%Y-%m-%d %A %H:%M:%S"),
max_questions=5,
max_categories=3,
)
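This file drops the `current_date` parameter, including the `run()` default of `datetime.now().strftime("%Y-%m-%d %A %H:%M:%S")`. One Python detail worth keeping in mind for any future replacement: a default-argument expression is evaluated once, when the `def` statement executes, so a timestamp computed there stays fixed for every later call that omits the argument. The sketch below demonstrates this with a hypothetical counter-based `fake_now()` clock (an assumption chosen for deterministic output, not project code), next to the usual `None`-sentinel pattern; `run_frozen` and `run_fresh` are likewise illustrative names.

```python
import itertools
from typing import Optional

# Hypothetical deterministic clock: returns 0, 1, 2, ... on successive calls.
_ticks = itertools.count()


def fake_now() -> int:
    return next(_ticks)


def run_frozen(current_date: int = fake_now()) -> int:
    # The default was computed once, when this `def` ran, and is reused on every call.
    return current_date


def run_fresh(current_date: Optional[int] = None) -> int:
    # Resolving the value inside the body yields a new one on every call.
    if current_date is None:
        current_date = fake_now()
    return current_date


print(run_frozen(), run_frozen())  # prints: 0 0
print(run_fresh(), run_fresh())    # prints: 1 2
```

Passing the value explicitly, or resolving it inside the function body as `run_fresh` does, avoids the frozen-default behaviour.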

wren-ai-service/src/pipelines/generation/sql_generation.py: 26 changes (19 additions, 7 deletions)
@@ -13,7 +13,9 @@
from src.pipelines.generation.utils.sql import (
SQL_GENERATION_MODEL_KWARGS,
SQLGenPostProcessor,
calculated_field_instructions,
construct_instructions,
metric_instructions,
sql_generation_system_prompt,
)
from src.pipelines.retrieval.sql_functions import SqlFunction
@@ -28,9 +30,12 @@
{{ document }}
{% endfor %}

{% if instructions %}
### INSTRUCTIONS ###
{{ instructions }}
{% if calculated_field_instructions %}
{{ calculated_field_instructions }}
{% endif %}

{% if metric_instructions %}
{{ metric_instructions }}
{% endif %}

{% if sql_functions %}
@@ -50,9 +55,15 @@
{% endfor %}
{% endif %}

{% if instructions %}
### USER INSTRUCTIONS ###
{% for instruction in instructions %}
{{ loop.index }}. {{ instruction }}
{% endfor %}
{% endif %}

### QUESTION ###
User's Question: {{ query }}
Current Time: {{ current_time }}

{% if sql_generation_reasoning %}
### REASONING PLAN ###
@@ -83,12 +94,13 @@ def prompt(
sql_generation_reasoning=sql_generation_reasoning,
instructions=construct_instructions(
configuration,
has_calculated_field,
has_metric,
instructions,
),
calculated_field_instructions=calculated_field_instructions
if has_calculated_field
else "",
metric_instructions=metric_instructions if has_metric else "",
sql_samples=sql_samples,
current_time=configuration.show_current_time(),
sql_functions=sql_functions,
)


@@ -10,33 +10,15 @@

from src.core.pipeline import BasicPipeline
from src.core.provider import LLMProvider
from src.pipelines.generation.utils.sql import construct_instructions
from src.pipelines.generation.utils.sql import (
construct_instructions,
sql_generation_reasoning_system_prompt,
)
from src.web.v1.services import Configuration

logger = logging.getLogger("wren-ai-service")


sql_generation_reasoning_system_prompt = """
### TASK ###
You are a helpful data analyst who is great at thinking deeply and reasoning about the user's question and the database schema, and you provide a step-by-step reasoning plan in order to answer the user's question.

### INSTRUCTIONS ###
1. Think deeply and reason about the user's question and the database schema.
2. Give a step by step reasoning plan in order to answer user's question.
3. The reasoning plan should be in the language same as the language user provided in the input.
4. Make sure to consider the current time provided in the input if the user's question is related to the date/time.
5. Don't include SQL in the reasoning plan.
6. Each step in the reasoning plan must start with a number, a title(in bold format in markdown), and a reasoning for the step.
7. If SQL SAMPLES section is provided, make sure to consider them in the reasoning plan.
8. If INSTRUCTIONS section is provided, please follow them strictly.
9. Do not include ```markdown or ``` in the answer.
10. A table name in the reasoning plan must be in this format: `table: <table_name>`.
11. A column name in the reasoning plan must be in this format: `column: <table_name>.<column_name>`.

### FINAL ANSWER FORMAT ###
The final answer must be a reasoning plan in plain Markdown string format
"""

sql_generation_reasoning_user_prompt_template = """
### DATABASE SCHEMA ###
{% for document in documents %}
@@ -54,13 +36,14 @@
{% endif %}

{% if instructions %}
### INSTRUCTIONS ###
{{ instructions }}
### USER INSTRUCTIONS ###
{% for instruction in instructions %}
{{ loop.index }}. {{ instruction }}
{% endfor %}
{% endif %}

### QUESTION ###
User's Question: {{ query }}
Current Time: {{ current_time }}
Language: {{ language }}

Let's think step by step.
@@ -85,7 +68,6 @@ def prompt(
instructions=instructions,
configuration=configuration,
),
current_time=configuration.show_current_time(),
language=configuration.language,
)
