Commit de7b94e

[Inference Providers] Update partners API documentation (#1717)

* inference providers documentation updates
* mention updates in JS client
* Apply suggestions from code review

Co-authored-by: célina <[email protected]>

1 parent 2258ae7 commit de7b94e

File tree: 1 file changed, +62 −8 lines


docs/inference-providers/register-as-a-provider.md (62 additions, 8 deletions)
````diff
@@ -154,14 +154,46 @@ Create a new mapping item, with the following body (JSON-encoded):
 - `hfModel` is the model id on the Hub's side.
 - `providerModel` is the model id on your side (can be the same or different).
 
-In the future, we will add support for a new parameter (ping us if it's important to you now):
+The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.
+
+### Using a tag-filter to map several HF models to a single inference endpoint
+
+We also support mapping HF models based on their `tags`. Using tag filters, you can automatically map multiple HF models to a single inference endpoint on your side.
+For example, any model tagged with both `lora` and `base_model:adapter:black-forest-labs/FLUX.1-dev` can be mapped to your Flux-dev LoRA inference endpoint.
+
+<Tip>
+
+Important: Make sure that the JS client library can handle LoRA weights for your provider. Check out [fal's implementation](https://github.com/huggingface/huggingface.js/blob/904964c9f8cd10ed67114ccb88b9028e89fd6cad/packages/inference/src/providers/fal-ai.ts#L78-L124) for more details.
+
+</Tip>
+
+The API is as follows:
+
+```http
+POST /api/partners/{provider}/models
+```
+Create a new mapping item, with the following body (JSON-encoded):
+
 ```json
 {
-  "hfFilter": ["string"]
-  // ^Power user move: register a "tag" slice of HF in one go.
-  // Example: tag == "base_model:adapter:black-forest-labs/FLUX.1-dev" for all Flux-dev LoRAs
+  "type": "tag-filter", // required
+  "task": "WidgetType", // required
+  "tags": ["string"], // required: any HF model with all of those tags will be mapped to providerModel
+  "providerModel": "string", // required: the partner's "model id", i.e. the id on your side
+  "adapterType": "lora", // required: only "lora" is supported at the moment
+  "status": "live" | "staging" // optional: defaults to "staging". "staging" models are only available to members of the partner's org; switch them to "live" when they're ready to go live
 }
 ```
+
+- `task`, also known as `pipeline_tag` in the HF ecosystem, is the type of model / type of API
+  (examples: "text-to-image", "text-generation", but you should use "conversational" for chat models)
+- `tags` is the set of model tags to match. For example, to match all LoRAs of Flux, you can use: `["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"]`
+- `providerModel` is the model ID on your side (can be the same or different from the HF model ID).
+- `adapterType` is a literal value that helps client libraries interpret how to call your API. The only supported value at the moment is `"lora"`.
+
+The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.
+
 #### Authentication
 
 You need to be in the _provider_ Hub organization (e.g. https://huggingface.co/togethercomputer
````
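As a quick sketch of the new creation route, the following Python builds the path and JSON body for a tag-filter mapping item. The provider name and `providerModel` value are illustrative placeholders, not real registrations:

```python
import json


def build_tag_filter_mapping(provider: str, task: str, tags: list[str],
                             provider_model: str, status: str = "staging") -> tuple[str, str]:
    """Return (path, body) for POST /api/partners/{provider}/models
    registering a tag-filter mapping item."""
    if status not in ("staging", "live"):
        raise ValueError("status must be 'staging' or 'live'")
    path = f"/api/partners/{provider}/models"
    body = json.dumps({
        "type": "tag-filter",
        "task": task,                     # e.g. "text-to-image"
        "tags": tags,                     # models carrying ALL these tags are matched
        "providerModel": provider_model,  # the model id on the provider's side
        "adapterType": "lora",            # only "lora" is supported at the moment
        "status": status,                 # defaults to "staging"
    })
    return path, body


# Map every Flux-dev LoRA on the Hub to a single (hypothetical) endpoint:
path, body = build_tag_filter_mapping(
    provider="example-provider",
    task="text-to-image",
    tags=["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"],
    provider_model="flux-dev-lora-mutualized",
)
```

Note that the route returns a mapping ID, which is what the delete and status-update routes below operate on.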
````diff
@@ -178,26 +210,31 @@ huggingface.js/inference call of the corresponding task i.e. the API specs are v
 ### Delete a mapping item
 
 ```http
-DELETE /api/partners/{provider}/models?hfModel=namespace/model-name
+DELETE /api/partners/{provider}/models/{mapping ID}
 ```
 
+Where `mapping ID` is the mapping's id obtained upon creation.
+You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
+
 ### Update a mapping item's status
 
 Call this HTTP PUT endpoint:
 
 ```http
-PUT /api/partners/{provider}/models/status
+PUT /api/partners/{provider}/models/{mapping ID}/status
 ```
 
 With the following body (JSON-encoded):
 
 ```json
 {
-  "hfModel": "namespace/model-name", // The name of the model on HF
   "status": "live" | "staging" // The new status, one of "staging" or "live"
 }
 ```
 
+Where `mapping ID` is the mapping's id obtained upon creation.
+You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
+
 ### List the whole mapping
 
 ```http
````
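Putting the two mapping-ID routes above together, a small helper might build the requests like this (the provider name is a placeholder, and the mapping ID reuses the `xxxxxxxxxxxxxxxxxxxxxxxx` placeholder from the listing example):

```python
import json


def delete_mapping_path(provider: str, mapping_id: str) -> str:
    """Path for DELETE /api/partners/{provider}/models/{mapping ID}."""
    return f"/api/partners/{provider}/models/{mapping_id}"


def update_status_request(provider: str, mapping_id: str, status: str) -> tuple[str, str]:
    """(path, body) for PUT /api/partners/{provider}/models/{mapping ID}/status."""
    if status not in ("staging", "live"):
        raise ValueError("status must be 'staging' or 'live'")
    path = f"/api/partners/{provider}/models/{mapping_id}/status"
    return path, json.dumps({"status": status})


# Promote a staging mapping to live, then build the request to delete it:
put_path, put_body = update_status_request("example-provider", "xxxxxxxxxxxxxxxxxxxxxxxx", "live")
del_path = delete_mapping_path("example-provider", "xxxxxxxxxxxxxxxxxxxxxxxx")
```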
````diff
@@ -217,26 +254,41 @@ Here is an example of response:
 {
   "text-to-image": {
     "black-forest-labs/FLUX.1-Canny-dev": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "black-forest-labs/FLUX.1-canny",
       "status": "live"
     },
     "black-forest-labs/FLUX.1-Depth-dev": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "black-forest-labs/FLUX.1-depth",
       "status": "live"
+    },
+    "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
+      "status": "live",
+      "providerId": "sdxl-lora-mutualized",
+      "adapterType": "lora",
+      "tags": [
+        "base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0",
+        "lora"
+      ]
     }
   },
   "conversational": {
     "deepseek-ai/DeepSeek-R1": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "deepseek-ai/DeepSeek-R1",
       "status": "live"
     }
   },
   "text-generation": {
     "meta-llama/Llama-2-70b-hf": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "meta-llama/Llama-2-70b-hf",
       "status": "live"
     },
     "mistralai/Mixtral-8x7B-v0.1": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "mistralai/Mixtral-8x7B-v0.1",
       "status": "live"
     }
````
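To illustrate how a client might consume the listing above (an abridged sample; the `_id` values are the doc's own placeholders), note that tag-filter items sit alongside plain model mappings within each task and can be told apart by their extra `tags`/`adapterType` fields:

```python
import json

# Abridged version of the example response shown above.
listing = json.loads("""
{
  "text-to-image": {
    "black-forest-labs/FLUX.1-Canny-dev": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "black-forest-labs/FLUX.1-canny",
      "status": "live"
    },
    "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "status": "live",
      "providerId": "sdxl-lora-mutualized",
      "adapterType": "lora",
      "tags": ["base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0", "lora"]
    }
  }
}
""")


def split_mappings(listing: dict):
    """Separate plain model mappings from tag-filter mappings,
    keyed by (task, mapping key) -> providerId."""
    plain, tag_filters = {}, {}
    for task, items in listing.items():
        for key, item in items.items():
            target = tag_filters if "tags" in item else plain
            target[(task, key)] = item["providerId"]
    return plain, tag_filters


plain, tag_filters = split_mappings(listing)
```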
````diff
@@ -264,9 +316,11 @@ provide the cost for each request via an HTTP API you host on your end.
 We ask that you expose an API that supports a HTTP POST request.
 The body of the request is a JSON-encoded object containing a list of request IDs for which we
 request the cost.
+The authentication system should be the same as your Inference service; for example, a bearer token.
 
 ```http
 POST {your URL here}
+Authorization: {authentication info - eg "Bearer token"}
 Content-Type: application/json
 
 {
````
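The billing exchange above can be sketched as follows. The URL, token value, and the `requestIds` field name are illustrative assumptions (the visible part of the diff only says the body carries a list of request IDs):

```python
import json


def build_cost_request(url: str, bearer_token: str, request_ids: list[str]) -> dict:
    """Assemble the POST request sent to the provider's billing endpoint.
    `requestIds` is an assumed field name for the list of request IDs."""
    return {
        "method": "POST",
        "url": url,
        "headers": {
            # Same authentication scheme as the provider's Inference service:
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"requestIds": request_ids}),
    }


req = build_cost_request(
    "https://api.example-provider.com/billing/costs",  # hypothetical URL
    "token-xxx",
    ["req-1", "req-2"],
)
```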
````diff
@@ -297,7 +351,7 @@ Content-Type: application/json
 
 ### Price Unit
 
-We require the price to be an **integer** number of **nano-USDs** (10^-9 USD).
+We require the price to be a **non-negative integer** number of **nano-USDs** (10^-9 USD).
 
 ### How to define the request ID
````

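To make the Price Unit rule above concrete, here is one way to convert a decimal USD price into the required non-negative integer number of nano-USD (a sketch; `Decimal` avoids binary floating-point rounding surprises):

```python
from decimal import Decimal

NANO_USD_PER_USD = 10 ** 9  # 1 USD = 10^9 nano-USD


def to_nano_usd(usd_price: str) -> int:
    """Convert a USD amount (given as a decimal string) to integer nano-USD."""
    nano = Decimal(usd_price) * NANO_USD_PER_USD
    if nano != nano.to_integral_value() or nano < 0:
        raise ValueError("price must be a non-negative whole number of nano-USD")
    return int(nano)


# e.g. a $0.00125-per-image price becomes 1,250,000 nano-USD
cost = to_nano_usd("0.00125")
```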