Commit de7b94e

[Inference Providers] Update partners API documentation (#1717)

* inference providers documentation updates
* mention updates in JS client
* Apply suggestions from code review

Co-authored-by: célina <[email protected]>

1 parent 2258ae7 commit de7b94e

File tree: 1 file changed, +62 −8 lines


docs/inference-providers/register-as-a-provider.md (62 additions, 8 deletions)
````diff
@@ -154,14 +154,46 @@ Create a new mapping item, with the following body (JSON-encoded):
 - `hfModel` is the model id on the Hub's side.
 - `providerModel` is the model id on your side (can be the same or different).
 
-In the future, we will add support for a new parameter (ping us if it's important to you now):
+The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.
+
+### Using a tag-filter to map several HF models to a single inference endpoint
+
+We also support mapping HF models based on their `tags`. Using tag filters, you can automatically map multiple HF models to a single inference endpoint on your side.
+For example, any model tagged with both `lora` and `base_model:adapter:black-forest-labs/FLUX.1-dev` can be mapped to your Flux-dev LoRA inference endpoint.
+
+<Tip>
+
+Important: Make sure that the JS client library can handle LoRA weights for your provider. Check out [fal's implementation](https://github.com/huggingface/huggingface.js/blob/904964c9f8cd10ed67114ccb88b9028e89fd6cad/packages/inference/src/providers/fal-ai.ts#L78-L124) for more details.
+
+</Tip>
+
+The API is as follows:
+
+```http
+POST /api/partners/{provider}/models
+```
+Create a new mapping item, with the following body (JSON-encoded):
+
 ```json
 {
-  "hfFilter": ["string"]
-  // ^Power user move: register a "tag" slice of HF in one go.
-  // Example: tag == "base_model:adapter:black-forest-labs/FLUX.1-dev" for all Flux-dev LoRAs
+  "type": "tag-filter", // required
+  "task": "WidgetType", // required
+  "tags": ["string"], // required: any HF model with all of those tags will be mapped to providerModel
+  "providerModel": "string", // required: the partner's "model id", i.e. the id on your side
+  "adapterType": "lora", // required: only "lora" is supported at the moment
+  "status": "live" | "staging" // optional: defaults to "staging". "staging" models are only available to members of the partner's org; switch them to "live" when they're ready to go live
 }
 ```
+
+- `task`, also known as `pipeline_tag` in the HF ecosystem, is the type of model / type of API
+  (examples: "text-to-image", "text-generation", but you should use "conversational" for chat models)
+- `tags` is the set of model tags to match. For example, to match all LoRAs of Flux, you can use: `["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"]`
+- `providerModel` is the model ID on your side (can be the same or different from the HF model ID).
+- `adapterType` is a literal value that helps client libraries interpret how to call your API. The only supported value at the moment is `"lora"`.
+
+The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.
+
 #### Authentication
 
 You need to be in the _provider_ Hub organization (e.g. https://huggingface.co/togethercomputer
````
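As a quick sketch of the new creation route, the following Python builds the path and JSON body for a tag-filter mapping item. The provider name and `providerModel` value are illustrative placeholders, not real registrations:

```python
import json


def build_tag_filter_mapping(provider: str, task: str, tags: list[str],
                             provider_model: str, status: str = "staging") -> tuple[str, str]:
    """Return (path, body) for POST /api/partners/{provider}/models
    registering a tag-filter mapping item."""
    if status not in ("staging", "live"):
        raise ValueError("status must be 'staging' or 'live'")
    path = f"/api/partners/{provider}/models"
    body = json.dumps({
        "type": "tag-filter",
        "task": task,                     # e.g. "text-to-image"
        "tags": tags,                     # models carrying ALL these tags are matched
        "providerModel": provider_model,  # the model id on the provider's side
        "adapterType": "lora",            # only "lora" is supported at the moment
        "status": status,                 # defaults to "staging"
    })
    return path, body


# Map every Flux-dev LoRA on the Hub to a single (hypothetical) endpoint:
path, body = build_tag_filter_mapping(
    provider="example-provider",
    task="text-to-image",
    tags=["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"],
    provider_model="flux-dev-lora-mutualized",
)
```

Note that the route returns a mapping ID, which is what the delete and status-update routes below operate on.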
````diff
@@ -178,26 +210,31 @@ huggingface.js/inference call of the corresponding task i.e. the API specs are v
 ### Delete a mapping item
 
 ```http
-DELETE /api/partners/{provider}/models?hfModel=namespace/model-name
+DELETE /api/partners/{provider}/models/{mapping ID}
 ```
 
+Where `mapping ID` is the mapping's id obtained upon creation.
+You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
+
 ### Update a mapping item's status
 
 Call this HTTP PUT endpoint:
 
 ```http
-PUT /api/partners/{provider}/models/status
+PUT /api/partners/{provider}/models/{mapping ID}/status
 ```
 
 With the following body (JSON-encoded):
 
 ```json
 {
-  "hfModel": "namespace/model-name", // The name of the model on HF
   "status": "live" | "staging" // The new status, one of "staging" or "live"
 }
 ```
 
+Where `mapping ID` is the mapping's id obtained upon creation.
+You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
+
 ### List the whole mapping
 
 ```http
````
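Putting the two mapping-ID routes above together, a small helper might build the requests like this (the provider name is a placeholder, and the mapping ID reuses the `xxxxxxxxxxxxxxxxxxxxxxxx` placeholder from the listing example):

```python
import json


def delete_mapping_path(provider: str, mapping_id: str) -> str:
    """Path for DELETE /api/partners/{provider}/models/{mapping ID}."""
    return f"/api/partners/{provider}/models/{mapping_id}"


def update_status_request(provider: str, mapping_id: str, status: str) -> tuple[str, str]:
    """(path, body) for PUT /api/partners/{provider}/models/{mapping ID}/status."""
    if status not in ("staging", "live"):
        raise ValueError("status must be 'staging' or 'live'")
    path = f"/api/partners/{provider}/models/{mapping_id}/status"
    return path, json.dumps({"status": status})


# Promote a staging mapping to live, then build the request to delete it:
put_path, put_body = update_status_request("example-provider", "xxxxxxxxxxxxxxxxxxxxxxxx", "live")
del_path = delete_mapping_path("example-provider", "xxxxxxxxxxxxxxxxxxxxxxxx")
```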
````diff
@@ -217,26 +254,41 @@ Here is an example of response:
 {
   "text-to-image": {
     "black-forest-labs/FLUX.1-Canny-dev": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "black-forest-labs/FLUX.1-canny",
       "status": "live"
     },
     "black-forest-labs/FLUX.1-Depth-dev": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "black-forest-labs/FLUX.1-depth",
       "status": "live"
+    },
+    "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
+      "status": "live",
+      "providerId": "sdxl-lora-mutualized",
+      "adapterType": "lora",
+      "tags": [
+        "base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0",
+        "lora"
+      ]
     }
   },
   "conversational": {
     "deepseek-ai/DeepSeek-R1": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "deepseek-ai/DeepSeek-R1",
       "status": "live"
     }
   },
   "text-generation": {
     "meta-llama/Llama-2-70b-hf": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "meta-llama/Llama-2-70b-hf",
       "status": "live"
     },
     "mistralai/Mixtral-8x7B-v0.1": {
+      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
       "providerId": "mistralai/Mixtral-8x7B-v0.1",
       "status": "live"
     }
````
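To illustrate how a client might consume the listing above (an abridged sample; the `_id` values are the doc's own placeholders), note that tag-filter items sit alongside plain model mappings within each task and can be told apart by their extra `tags`/`adapterType` fields:

```python
import json

# Abridged version of the example response shown above.
listing = json.loads("""
{
  "text-to-image": {
    "black-forest-labs/FLUX.1-Canny-dev": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "black-forest-labs/FLUX.1-canny",
      "status": "live"
    },
    "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "status": "live",
      "providerId": "sdxl-lora-mutualized",
      "adapterType": "lora",
      "tags": ["base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0", "lora"]
    }
  }
}
""")


def split_mappings(listing: dict):
    """Separate plain model mappings from tag-filter mappings,
    keyed by (task, mapping key) -> providerId."""
    plain, tag_filters = {}, {}
    for task, items in listing.items():
        for key, item in items.items():
            target = tag_filters if "tags" in item else plain
            target[(task, key)] = item["providerId"]
    return plain, tag_filters


plain, tag_filters = split_mappings(listing)
```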
````diff
@@ -264,9 +316,11 @@ provide the cost for each request via an HTTP API you host on your end.
 We ask that you expose an API that supports a HTTP POST request.
 The body of the request is a JSON-encoded object containing a list of request IDs for which we
 request the cost.
+The authentication system should be the same as your Inference service; for example, a bearer token.
 
 ```http
 POST {your URL here}
+Authorization: {authentication info - eg "Bearer token"}
 Content-Type: application/json
 
 {
````
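The billing exchange above can be sketched as follows. The URL, token value, and the `requestIds` field name are illustrative assumptions (the visible part of the diff only says the body carries a list of request IDs):

```python
import json


def build_cost_request(url: str, bearer_token: str, request_ids: list[str]) -> dict:
    """Assemble the POST request sent to the provider's billing endpoint.
    `requestIds` is an assumed field name for the list of request IDs."""
    return {
        "method": "POST",
        "url": url,
        "headers": {
            # Same authentication scheme as the provider's Inference service:
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"requestIds": request_ids}),
    }


req = build_cost_request(
    "https://api.example-provider.com/billing/costs",  # hypothetical URL
    "token-xxx",
    ["req-1", "req-2"],
)
```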
````diff
@@ -297,7 +351,7 @@ Content-Type: application/json
 
 ### Price Unit
 
-We require the price to be an **integer** number of **nano-USDs** (10^-9 USD).
+We require the price to be a **non-negative integer** number of **nano-USDs** (10^-9 USD).
 
 ### How to define the request ID
````

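To make the Price Unit rule above concrete, here is one way to convert a decimal USD price into the required non-negative integer number of nano-USD (a sketch; `Decimal` avoids binary floating-point rounding surprises):

```python
from decimal import Decimal

NANO_USD_PER_USD = 10 ** 9  # 1 USD = 10^9 nano-USD


def to_nano_usd(usd_price: str) -> int:
    """Convert a USD amount (given as a decimal string) to integer nano-USD."""
    nano = Decimal(usd_price) * NANO_USD_PER_USD
    if nano != nano.to_integral_value() or nano < 0:
        raise ValueError("price must be a non-negative whole number of nano-USD")
    return int(nano)


# e.g. a $0.00125-per-image price becomes 1,250,000 nano-USD
cost = to_nano_usd("0.00125")
```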