Data Import
Bulk-load records from a file into your API's Elasticsearch index — create a new model on the fly or populate an existing one.
What is Data Import?
Data Import lets you upload a structured file (CSV, Excel, or an archive) and have SearchAPI read each row, create the necessary Elasticsearch documents, and make the records immediately queryable via your API endpoints. The import runs as an asynchronous job — you can monitor progress in real time on the same page, then review any row-level errors in the job detail view.
Supported File Formats
The following formats are accepted. The maximum file size is 100 MB.
- CSV — plain text, one row per record. You must specify the delimiter character (default: ,).
- XLSX — Excel workbook. The first sheet is used.
- ZIP / TAR / TAR.GZ / TAR.XZ — compressed archives. The archive must contain a single CSV or XLSX file.
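For illustration, a minimal comma-delimited CSV that satisfies these constraints, with a header row and one record per line (the columns are purely illustrative):
id,name,price
1,Running Shoes,89.90
2,Laptop Stand,34.50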
Import Modes
Before uploading a file, choose one of two Import Modes:
- Create New Model + Mapping + Endpoint — SearchAPI reads the file headers, infers a field type for each column, and lets you review and adjust the mapping before the job starts. A new model and a new endpoint are created automatically in the selected API version, and all rows are indexed.
- Import Into Existing Model — maps each field of an existing model to a column in your file. Records are upserted into the model's current Elasticsearch index. Useful for refreshing or extending data without changing the model definition.
Selecting API and Version
Navigate to Data Import in the sidebar and click + New Import. Select the target API and API Version from the dropdowns. The version dropdown only shows published versions, as records can only be indexed into a deployed Elasticsearch index.
New Import — select API, version and import mode
File Settings
After selecting the import mode, configure the following settings before or alongside choosing your file:
- CSV Delimiter — a single character that separates columns (e.g. ",", ";", "\t"). Required when the file is a CSV. Has no effect for XLSX or archives.
- File has header row — checked by default. When enabled, the first row of the file is treated as column names; when unchecked, the system generates positional names (col_0, col_1, …); see the example after this list.
- File — click the file picker and select your file. Uploading a file immediately triggers a header-read request so the field configuration table can be populated.
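To illustrate the header-row setting: if the sample file above were uploaded with File has header row unchecked, its first line (id,name,price) would be imported as a data row, and the columns would appear in the field configuration table as col_0, col_1 and col_2.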
New Model: Field Configuration
In Create New Model mode, two additional inputs appear before the field table:
- Model Name — the name of the new model (max 50 characters; letters, digits and underscores only).
- Endpoint Path — the URL path segment for the new endpoint (e.g. /orders, max 80 characters).
Once a file is loaded, the system reads up to 1 000 sample rows, infers a field type for each column, and renders a configuration row per column:
- Field Name — editable. Defaults to the sanitised column header (lowercase, alphanumeric and underscores only). Must be unique across all fields.
- Field Type — inferred from sample data. Can be changed to any of the supported types: text, keyword, integer, long, float, date, boolean, json, object, nested.
- Primary Key — exactly one field must be marked as the primary key. Its type must be integer or keyword. The system pre-selects a column named id if one exists and its type qualifies.
- Required — marks the field as required in the model definition. Primary key fields are always required.
New Model — field configuration table after file upload
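Continuing the sample file from above, the inferred configuration would plausibly look like this (the inferred types are illustrative; always review them before starting the job):
Column   Field Name   Field Type   Primary Key   Required
id       id           integer      yes           yes
name     name         text         no            no
price    price        float        no            no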
Generate Unique Auto-Increment ID
If your file has no natural primary key column, check Generate unique auto increment id. The system will create an auto-increment primary key under the name you specify in the Primary Key Column field (default: id). The column must not be present in the file headers. When this option is active, no field in the configuration table needs to be marked as a primary key.
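For example, a file containing only name and price columns can be imported with this option enabled and the Primary Key Column left at its default, id. Each indexed document then carries a generated id alongside the file's columns (the concrete values shown here are illustrative):
{
  "id": 1,
  "name": "Running Shoes",
  "price": 89.90
}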
Object Fields (Relational)
A field configured as type object creates a relational link to another model. Your file column must contain the primary key value of the related record — the import process uses that value to look up the related record and embed it as a full object in the indexed document.
For example, if a products file has a category_id column mapped to an object field pointing to a categories model:
category_id: 5
The system looks up the record with primary key 5 in the categories model and embeds it as an object containing the relation model's primary key plus all fields marked as visible in related models. Fields not marked as visible are not included.
The indexed document will look like:
{
"id": 1,
"name": "Running Shoes",
"category": {
"id": 5,
"name": "Sports",
"slug": "sports"
}
}
Nested Fields
A field configured as type nested stores an array of embedded
objects inside a single document. Like object fields, the system looks up
related records from a relation model and embeds them — the difference is that nested
fields hold multiple related records per row, stored as an Elasticsearch
nested array.
The source column must contain a JSON value (detected automatically during header loading). The system supports two column formats:
Format 1 — flat array of IDs
The column contains a JSON array of plain scalar values. Each value is the primary key of a related record. Leave the Nested Primary Key field empty when using this format.
tag_ids: [1, 5, 9]
The system looks up records with primary keys 1, 5, and 9 in the relation model and embeds each one as an object containing the relation model's primary key plus all fields marked as visible in related models. The indexed document will look like:
{
"id": 42,
"title": "Laptop Stand",
"tags": [
{ "id": 1, "name": "electronics", "color": "blue" },
{ "id": 5, "name": "accessories", "color": "gray" },
{ "id": 9, "name": "sale", "color": "red" }
]
}
Format 2 — array of objects with an ID key
The column contains a JSON array of objects, where each object holds a foreign key alongside other properties. Set the Nested Primary Key to the key name that holds the foreign key (e.g. tag_id). The system extracts that key from each object, looks up the related record, and embeds it — any extra keys in the source object are ignored.
tags: [{"tag_id": 1, "qty": 3}, {"tag_id": 5, "qty": 1}]
The system reads tag_id from each object, looks up those records, and produces the same embedded result — the qty key from the source is not carried over:
{
"id": 42,
"title": "Laptop Stand",
"tags": [
{ "id": 1, "name": "electronics", "color": "blue" },
{ "id": 5, "name": "accessories", "color": "gray" }
]
}
Aggregation Fields
In Create New Model mode, the Aggregation Fields selector lets you choose which fields should be enabled for aggregation queries on the GET list endpoint that is created automatically with the import.
The selector is populated once your field configuration is complete and lists two categories of options:
- Scalar fields (text, keyword, integer, etc.) — appear directly by their field name (e.g. status, category).
- Sub-fields of object / nested relations — appear using dot notation: parentField.subField (e.g. category.name). All fields of the related model that are marked as visible in related models are listed, as well as the relation model's primary key.
Aggregation fields are registered on the GET list endpoint immediately when the import job is created — before any rows are written. For more information on how aggregation works on endpoints, see Endpoints.
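Using the products / categories example from the Object Fields section, the selector would list the scalar field name directly, plus the visible sub-fields of the category relation in dot notation: category.id, category.name and category.slug.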
New Model — selecting aggregation fields before starting the import
Existing Model: Column Mapping
In Import Into Existing Model mode, select a model from the Model dropdown. The system lists every field defined in that model and asks you to assign a file column to each one.
Constraints for column mapping:
- Every field must be mapped to a column — no field can be left unmapped.
- Each file column can only be mapped to one field.
- Nested fields require the mapped column to be a JSON column, and you can optionally specify the nested primary key from the keys detected in that column.
- If Generate unique auto increment id is checked, the primary key field is excluded from the mapping — the system will generate a new ID for each row.
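As an illustration, refreshing the products model from the Object Fields example might use a mapping like this (the file column names are hypothetical):
Model field   File column
id            product_id
name          name
category      category_id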
Starting the Import
Click Start Import. The system validates all inputs, creates an import job, and returns a job ID immediately. A toast notification confirms that the job has been queued (e.g. "Import queued (Job #42)"). The Job Status panel appears below the form and begins polling for progress every 2.5 seconds.
Job status panel — import queued after clicking Start Import
Job Progress
The Job Status panel shows real-time progress while the import is running:
- Status — current job status (see below).
- Row counters — processed / total | success | failed.
- Start / Finish times — ISO timestamps shown once available.
Job Statuses
- queued — job is waiting in the processing queue.
- running — rows are actively being written to Elasticsearch.
- completed — all rows were written successfully.
- completed_with_errors — the job finished but one or more rows failed. Successful rows are indexed; failed rows are logged.
- failed — a fatal error stopped the job before it could finish. No rows were written after the point of failure.
Error Rows
When a job ends with status completed_with_errors or has a non-zero failed row count, a Load Error Rows button appears in the Job Status panel. Clicking it fetches up to 100 error rows and displays them in a table with the following columns:
- Row — the 1-based row number in the original file.
- Primary Key — the primary key value of the failing row, if available.
- Error — a human-readable message explaining why the row failed.
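A sample of what the error table might contain; the exact messages depend on the failure and are shown here purely as hypothetical examples:
Row   Primary Key   Error
17    17            Field 'price': expected a float, got 'n/a'
312   (none)        Missing required field 'name'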
Quality Gate — automatic job termination
To prevent runaway imports from filling the index with garbage data, the system monitors error patterns during the run and stops the job early if either of two thresholds is crossed. When a quality gate fires, the job transitions to failed — not completed_with_errors — and the reason is recorded in the error summary.
- 100 consecutive row errors — if 100 consecutive rows all fail without a single successful row in between, the job is stopped immediately. A successfully written row resets the counter to zero.
- 80 % error rate — checked every 1 000 processed rows. If at least 80 % of all rows processed so far have failed, the job is stopped. This check only activates after the first 1 000 rows have been processed; a worked example follows this list.
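A quick worked example of the rate check: after 2 000 processed rows with 1 700 failures, the error rate is 1 700 / 2 000 = 85 %, which crosses the 80 % threshold, so the job stops at that checkpoint and transitions to failed. With 1 500 failures the rate would be 75 % and the import would continue to the next checkpoint.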
Import Job List
Navigate to Data Import in the sidebar to see all previous import jobs. The list updates automatically every 3 seconds when any job is still queued or running. Each row shows the API, model, status or live progress bar, row counters, and start/end timestamps. Click the inspect icon to open the Job Detail page.
Import Job List — live progress bar on running jobs
Import Job Detail
The Job Detail page provides the complete record of a single import job, including:
- Stat cards — total rows, successful rows, failed rows, progress percentage.
- Full metadata — job ID, API, import mode, model, endpoint path, primary key column, generate-unique-ID flag, has-header flag, delimiter, file name and type.
- Error summary — a badge indicating whether a fatal error occurred and its message.
- Error rows table — row number, primary key, error message, raw row data, and timestamp for every failed row.
Import Job Detail — stat cards, metadata and error rows table