# Soil Sampling Overview

> Upload soil sample files in 30+ formats and receive normalized output with standardized analyte values, units, and extraction methods in GeoJSON and canonical JSON formats.

Leaf's Soil Sampling service accepts soil lab data in over 30 file formats and normalizes it into a standard canonical format. You upload `.zip` archives containing shapefiles, CSVs, XML reports, or proprietary formats. Leaf identifies the format, extracts the soil data, and returns a flat GeoJSON result plus a canonical JSON result when that output is available.

<Warning>
  The Soil Sampling service is currently available by invitation only.
</Warning>

## How it works

The service uses an asynchronous batch model. You upload one or more files, then poll for results.

1. **Upload** one or more `.zip` files to the batch endpoint along with a Leaf user ID. Each file becomes an entry within the batch.
2. **Processing.** Leaf classifies the file format, extracts soil sample data, normalizes analyte names and units, and produces both output formats.
3. **Retrieval.** Poll the batch status endpoint. When an entry reaches `COMPLETED`, `downloadStandardGeojson` contains the flat GeoJSON result URL. `downloadCanonicalJson` contains the hierarchical result URL when that output is available.
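
Each uploaded file must be a `.zip`. For a shapefile, that means bundling the `.shp` with its same-named sidecar files. Below is a minimal Python sketch. It assumes a flat archive layout is acceptable and that the standard shapefile parts (`.shp`, `.shx`, `.dbf`, plus optional `.prj` and `.cpg`) are what the converter expects. The helper name `zip_shapefile` is hypothetical.

```python
import zipfile
from pathlib import Path

# Standard shapefile components. Which of these Leaf requires is an assumption;
# .shp, .shx and .dbf are the mandatory parts of the shapefile format itself.
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def zip_shapefile(shp_path: Path, out_zip: Path) -> list[str]:
    """Bundle a shapefile and its same-named sidecar files into one .zip."""
    shp_path = Path(shp_path)
    # Collect sibling files that share the .shp file's stem (e.g. samples.dbf).
    members = sorted(
        p for p in shp_path.parent.iterdir()
        if p.stem == shp_path.stem and p.suffix.lower() in SHAPEFILE_EXTENSIONS
    )
    missing = REQUIRED_EXTENSIONS - {p.suffix.lower() for p in members}
    if missing:
        raise FileNotFoundError(f"missing shapefile parts: {sorted(missing)}")
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in members:
            zf.write(p, arcname=p.name)  # flat layout, no directory prefix
    return [p.name for p in members]
```

Producing one archive per shapefile means each shapefile becomes its own entry in the batch, so a bad file fails on its own without affecting the others.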

Most files complete processing within a few minutes. Poll every 5 seconds for the first minute, then back off to every 15–30 seconds. Stop when the batch status is `COMPLETED`, `PARTIALLY_COMPLETED`, or `FAILED`.
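
The polling schedule above can be sketched as a loop. This is a minimal Python sketch, not a Leaf SDK. `fetch_status` is a hypothetical callable you supply that calls the batch status endpoint and returns the batch status string. The 1.5x growth factor and the 30-minute timeout are assumptions. `sleep` and `clock` are injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable

# Batch states after which polling should stop.
TERMINAL_STATUSES = {"COMPLETED", "PARTIALLY_COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], str],  # hypothetical: wraps your batch status request
    fast_interval: float = 5.0,       # poll every 5 s ...
    fast_window: float = 60.0,        # ... for the first minute
    slow_start: float = 15.0,         # then start backing off at 15 s
    slow_max: float = 30.0,           # and cap the interval at 30 s
    timeout: float = 30 * 60.0,       # assumption: give up after 30 minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the batch reaches a terminal status, then return that status."""
    start = clock()
    slow_delay = slow_start
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"batch still {status} after {elapsed:.0f} s")
        if elapsed < fast_window:
            delay = fast_interval
        else:
            delay = slow_delay
            # Grow the interval gradually toward the cap.
            slow_delay = min(slow_delay * 1.5, slow_max)
        sleep(delay)
```

A `PARTIALLY_COMPLETED` result still contains usable entries, so inspect each entry's status rather than treating it as a failure.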

## Key concepts

A **batch** is a container for one or more uploaded files, created by a single POST request. Its status reflects the aggregate state of all entries.

An **entry** is one file within a batch. Each entry is processed independently. A batch with three files has three entries, each of which may complete or fail on its own.

Entries move through `PROCESSING` → `COMPLETED` or `FAILED`. Batches follow the same pattern, plus `PARTIALLY_COMPLETED` when some entries succeed and others fail.

| Status                | Level         | Meaning                                                             |
| --------------------- | ------------- | ------------------------------------------------------------------- |
| `PROCESSING`          | Entry / Batch | Upload received, conversion in progress                             |
| `COMPLETED`           | Entry / Batch | Entry converted successfully (batch: every entry completed)         |
| `PARTIALLY_COMPLETED` | Batch only    | Some entries completed, some failed                                 |
| `FAILED`              | Entry / Batch | Conversion failed (batch: every entry failed); check `errorMessage` |
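
The batch rules in the table amount to a small aggregation over entry statuses. The Python function below is an illustrative reading of those rules, not Leaf's server logic. In particular, treating any `PROCESSING` entry as keeping the whole batch `PROCESSING` is an assumption.

```python
from collections.abc import Iterable

ENTRY_STATUSES = {"PROCESSING", "COMPLETED", "FAILED"}


def batch_status(entry_statuses: Iterable[str]) -> str:
    """Derive a batch status from its entries' statuses (illustrative only)."""
    statuses = list(entry_statuses)
    if not statuses:
        raise ValueError("a batch has at least one entry")
    unknown = set(statuses) - ENTRY_STATUSES
    if unknown:
        raise ValueError(f"unexpected entry status: {sorted(unknown)}")
    if "PROCESSING" in statuses:
        return "PROCESSING"  # assumption: any unfinished entry keeps the batch in progress
    if all(s == "COMPLETED" for s in statuses):
        return "COMPLETED"
    if all(s == "FAILED" for s in statuses):
        return "FAILED"
    return "PARTIALLY_COMPLETED"  # a mix of COMPLETED and FAILED entries
```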

## Output formats

Completed entries return result URLs in the API response. All entries that reach `COMPLETED` will have a `downloadStandardGeojson` URL. Most supported formats also produce `downloadCanonicalJson`; for formats that don't, the field is `null`.

`downloadStandardGeojson` is a flat GeoJSON FeatureCollection. Each Feature represents one soil sample at one depth. A sample with multiple depth layers (e.g., surface + subsoil) produces multiple Features sharing the same `sampleId`. This format works well for mapping, GIS tools, and spatial queries.
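
Because depth layers are separate Features, rebuilding per-sample profiles means grouping on `sampleId`. Below is a minimal Python sketch over a parsed FeatureCollection, for example the result of `json.load` on the downloaded file. Sorting by `depthTop`, with unknown depths last, is a choice made here for readability. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_depths_by_sample(
    feature_collection: dict[str, Any],
) -> dict[str, list[dict[str, Any]]]:
    """Group depth-layer Features by sampleId, shallowest layer first."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for feature in feature_collection["features"]:
        groups[feature["properties"]["sampleId"]].append(feature)

    def depth_key(feature: dict[str, Any]) -> tuple[bool, float]:
        top = feature["properties"].get("depthTop")
        # Features with unknown depth (None) sort after those with a known depth.
        return (top is None, top if top is not None else 0.0)

    return {sid: sorted(feats, key=depth_key) for sid, feats in groups.items()}
```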

When present, `downloadCanonicalJson` is the full hierarchical data model. It contains an array of `SoilSamplingEvent` objects preserving the natural tree structure: event → samples → depth layers → analyte results. It includes context not present in the GeoJSON: lab information, provenance (source file, format family, converter version), fertilizer recommendations, and category classification for each analyte result. Use this format when you need the complete data model or when you're building data pipelines that benefit from structured nesting.

### GeoJSON properties

| Property       | Type   | Description                                   |
| -------------- | ------ | --------------------------------------------- |
| `eventId`      | string | Unique identifier for the sampling event      |
| `eventDate`    | string | Sampling date (YYYY-MM-DD), or null           |
| `eventCode`    | string | Lab report number or job ID, or null          |
| `sampleId`     | string | Unique identifier for this sample             |
| `sampleNumber` | string | Lab sample number (e.g. "1", "A-1"), or null  |
| `depthId`      | string | Unique identifier for this depth layer        |
| `depthLabel`   | string | Human-readable depth (e.g. "0-6 in"), or null |
| `depthTop`     | number | Top of sampling depth, or null                |
| `depthBottom`  | number | Bottom of sampling depth, or null             |
| `depthUnit`    | string | Depth unit ("in" or "cm"), or null            |
| `growerName`   | string | Grower name, or null                          |
| `farmName`     | string | Farm name, or null                            |
| `fieldName`    | string | Field name, or null                           |

Depth fields are all null when the source data does not specify sampling depth. Field context properties (`growerName`, `farmName`, `fieldName`) are omitted entirely when the source has no field metadata.
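
Consumers should therefore read these properties defensively, because depth fields may be null and field-context keys may be absent. Below is a Python sketch that assumes depth units are limited to the `"in"` and `"cm"` values listed above. `depth_range_cm` and `field_label` are hypothetical helper names.

```python
from typing import Any, Optional

CM_PER_INCH = 2.54
_TO_CM = {"cm": 1.0, "in": CM_PER_INCH}


def depth_range_cm(props: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Return (top, bottom) in centimeters, or None when depth is unspecified."""
    top = props.get("depthTop")
    bottom = props.get("depthBottom")
    unit = props.get("depthUnit")
    if top is None or bottom is None or unit is None:
        return None
    factor = _TO_CM[unit]  # raises KeyError if a unit other than "in"/"cm" appears
    return (top * factor, bottom * factor)


def field_label(props: dict[str, Any]) -> str:
    """Join whichever grower/farm/field names are present; keys may be omitted."""
    parts = [props.get(k) for k in ("growerName", "farmName", "fieldName")]
    return " / ".join(p for p in parts if p)
```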

### Analyte properties

Each analyte result adds up to three properties per Feature:

| Pattern            | Type   | Description                                    |
| ------------------ | ------ | ---------------------------------------------- |
| `{analyte}`        | number | The measured value (e.g. `pH`, `P`, `K`, `OM`) |
| `{analyte}_unit`   | string | Unit of measurement, present when known        |
| `{analyte}_method` | string | Extraction method, present when known          |

A Mehlich-3 phosphorus result at 42 ppm produces:

```json theme={null}
{
  "P": 42.0,
  "P_unit": "ppm",
  "P_method": "MEHLICH_3"
}
```

A pH value with no known method or unit produces just `"pH": 6.4`.
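
To work with analytes generically, you can split a Feature's properties into metadata and analyte triplets. This Python sketch assumes that every numeric key that is not a metadata key and does not end in `_unit` or `_method` is an analyte value. The metadata set comes from the GeoJSON properties table plus `depthId`, which appears in the GeoJSON example. The function name is hypothetical.

```python
from typing import Any

# Non-analyte keys: the GeoJSON properties table plus depthId (seen in the example).
METADATA_KEYS = {
    "eventId", "eventDate", "eventCode", "sampleId", "sampleNumber", "depthId",
    "depthLabel", "depthTop", "depthBottom", "depthUnit",
    "growerName", "farmName", "fieldName",
}


def analyte_results(props: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Collect {analyte: {value, unit, method}} from one Feature's properties."""
    results: dict[str, dict[str, Any]] = {}
    for key, value in props.items():
        if key in METADATA_KEYS or key.endswith(("_unit", "_method")):
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # analyte values are numbers; skip anything else
        results[key] = {
            "value": value,
            "unit": props.get(f"{key}_unit"),      # None when the unit is unknown
            "method": props.get(f"{key}_method"),  # None when the method is unknown
        }
    return results
```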

### Common analytes

| Analyte                  | Property | Typical units |
| ------------------------ | -------- | ------------- |
| pH                       | `pH`     | (unitless)    |
| Organic matter           | `OM`     | %             |
| Phosphorus               | `P`      | ppm           |
| Potassium                | `K`      | ppm           |
| Calcium                  | `Ca`     | ppm           |
| Magnesium                | `Mg`     | ppm           |
| Cation exchange capacity | `CEC`    | meq/100g      |
| Buffer pH                | `BpH`    | (unitless)    |
| Nitrate-nitrogen         | `NO3_N`  | ppm           |

The full set of analytes depends on the input format and lab. Leaf normalizes over 750 column name variations into approximately 70 standard analyte properties.
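
Leaf performs this normalization server-side, so you do not need to reimplement it. To illustrate the idea only, here is a toy lookup that canonicalizes a header (drop parenthesized units, lowercase, strip punctuation) before matching it against an alias table. The aliases below are hypothetical examples, not Leaf's mapping.

```python
import re
from typing import Optional

# Hypothetical alias table for illustration; keys are already canonicalized.
_ALIASES = {
    "ph": "pH", "soilph": "pH",
    "om": "OM", "organicmatter": "OM",
    "p": "P", "phosphorus": "P",
    "k": "K", "potassium": "K",
    "cec": "CEC",
    "bph": "BpH", "bufferph": "BpH",
    "no3n": "NO3_N", "nitraten": "NO3_N",
}


def normalize_header(raw: str) -> Optional[str]:
    """Map a raw lab column header to a standard property name, if known."""
    without_units = re.sub(r"\(.*?\)", "", raw)               # "P (ppm)" -> "P "
    key = re.sub(r"[^a-z0-9]", "", without_units.lower())    # "NO3-N" -> "no3n"
    return _ALIASES.get(key)
```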

### GeoJSON example

A two-sample FeatureCollection from a shapefile with Mehlich-3 extraction:

```json theme={null}
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-89.4523, 40.1234]
      },
      "properties": {
        "eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "eventDate": "2025-10-15",
        "sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654321",
        "sampleNumber": "1",
        "depthId": "d1a2b3c4-e5f6-7890-abcd-111111111111",
        "growerName": "Smith Farms",
        "farmName": "North 40",
        "fieldName": "Section 12",
        "pH": 6.4,
        "OM": 3.2,
        "OM_unit": "%",
        "P": 42.0,
        "P_unit": "ppm",
        "P_method": "MEHLICH_3",
        "K": 185.0,
        "K_unit": "ppm",
        "K_method": "MEHLICH_3",
        "CEC": 14.2,
        "CEC_unit": "meq/100g"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-89.4531, 40.1242]
      },
      "properties": {
        "eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "eventDate": "2025-10-15",
        "sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654322",
        "sampleNumber": "2",
        "depthId": "d1a2b3c4-e5f6-7890-abcd-222222222222",
        "growerName": "Smith Farms",
        "farmName": "North 40",
        "fieldName": "Section 12",
        "pH": 6.8,
        "OM": 2.9,
        "OM_unit": "%",
        "P": 38.0,
        "P_unit": "ppm",
        "P_method": "MEHLICH_3",
        "K": 210.0,
        "K_unit": "ppm",
        "K_method": "MEHLICH_3",
        "CEC": 15.1,
        "CEC_unit": "meq/100g"
      }
    }
  ]
}
```
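
Once downloaded and parsed, the FeatureCollection is plain JSON, so field-level summaries need no GIS library. Below is a minimal Python sketch that summarizes one analyte across all Features and skips Features where the analyte is absent. `analyte_summary` is a hypothetical name.

```python
from statistics import mean
from typing import Any, Optional


def analyte_summary(
    feature_collection: dict[str, Any], analyte: str
) -> Optional[dict[str, float]]:
    """Count, min, max and mean of one analyte over all Features that report it."""
    values = [
        f["properties"][analyte]
        for f in feature_collection["features"]
        if isinstance(f["properties"].get(analyte), (int, float))
    ]
    if not values:
        return None  # no Feature reported this analyte
    return {"count": len(values), "min": min(values), "max": max(values), "mean": mean(values)}
```

With the two Features above, `analyte_summary(fc, "P")` gives a count of 2, a min of 38.0, a max of 42.0, and a mean of 40.0.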

### Canonical JSON structure

The canonical JSON output is an array of `SoilSamplingEvent` objects. Each event represents one sampling trip, date, or lab report from the source file.

```
SoilSamplingEvent
├── field_context     (grower, farm, field names and IDs)
├── lab               (lab name, received/processed dates)
├── provenance        (source file, format family, converter version)
├── samples[]
│   ├── geometry      (GeoJSON Point or Polygon)
│   └── depths[]
│       └── results[] (analyte, value, unit, method, category)
└── recommendations[] (fertilizer recommendations, when present)
```
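
Flattening the tree back into rows is a nested walk over the same levels. This Python sketch assumes the JSON keys are `samples`, `depths`, and `results`, and that each result carries `analyte`, `value`, `unit`, `method`, and `category`, as the diagram suggests. Check the exact key names against a real response. The sketch uses positional indices because the diagram does not name the sample or depth ID fields.

```python
from typing import Any, Iterator


def iter_results(events: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Walk event -> samples -> depths -> results, yielding one flat row per result."""
    for e_idx, event in enumerate(events):
        for s_idx, sample in enumerate(event.get("samples", [])):
            for d_idx, depth in enumerate(sample.get("depths", [])):
                for result in depth.get("results", []):
                    yield {
                        "event": e_idx,
                        "sample": s_idx,
                        "depth": d_idx,
                        "analyte": result.get("analyte"),
                        "value": result.get("value"),
                        "unit": result.get("unit"),
                        "method": result.get("method"),
                        "category": result.get("category"),
                    }
```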

Each `AnalyteResult` in the canonical format includes a `category` field that classifies the measurement:

| Category      | Meaning                             | Examples             |
| ------------- | ----------------------------------- | -------------------- |
| `analyte`     | Direct lab measurement              | pH, P, K, Ca, Mg, OM |
| `index`       | Calculated index                    | P-Index, K-Index     |
| `derived`     | Ratio or derived value              | BS-Ca, BS-K, SAR     |
| `sensor`      | Field instrument reading            | EC (Veris), Red, IR  |
| `passthrough` | Unrecognized column preserved as-is | Varies               |
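
Bucketing results by `category` makes it easy to keep only direct lab measurements, for example, or to inspect `passthrough` columns separately. This Python sketch assumes the category values are exactly the lowercase strings in the table. An unexpected value raises `ValueError` instead of being silently dropped.

```python
from enum import Enum
from typing import Any, Iterable


class Category(str, Enum):
    ANALYTE = "analyte"
    INDEX = "index"
    DERIVED = "derived"
    SENSOR = "sensor"
    PASSTHROUGH = "passthrough"


def split_by_category(
    results: Iterable[dict[str, Any]],
) -> dict[Category, list[dict[str, Any]]]:
    """Group AnalyteResult dicts by their category field."""
    buckets: dict[Category, list[dict[str, Any]]] = {c: [] for c in Category}
    for r in results:
        buckets[Category(r["category"])].append(r)  # ValueError on an unknown category
    return buckets
```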

The `provenance` block on every event records exactly which source file and converter produced the output, which is useful for auditing and tracing data lineage.

## File size limits

| Limit                | Value              |
| -------------------- | ------------------ |
| Maximum file size    | 50 MB per file     |
| Maximum request size | 200 MB per request |
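
If you upload many files, you can check both limits client-side before sending. Below is a minimal Python sketch that greedily packs `(name, size_in_bytes)` pairs into requests. Reading "MB" as 1,000,000 bytes is an assumption, chosen because it is the stricter interpretation. `plan_requests` is a hypothetical name.

```python
# Assumption: "MB" means 10**6 bytes. This is stricter than MiB
# (50 MB = 50,000,000 bytes, below 50 MiB = 52,428,800 bytes).
MB = 1_000_000
MAX_FILE_BYTES = 50 * MB
MAX_REQUEST_BYTES = 200 * MB


def plan_requests(
    files: list[tuple[str, int]], max_request_bytes: int = MAX_REQUEST_BYTES
) -> list[list[str]]:
    """Greedily pack (name, size) pairs into upload requests within both limits."""
    requests: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{name} is {size} bytes; the per-file limit is {MAX_FILE_BYTES}")
        if current and current_size + size > max_request_bytes:
            # Adding this file would exceed the request limit: start a new request.
            requests.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        requests.append(current)
    return requests
```

Multipart encoding adds some overhead to each request, so passing a slightly lower `max_request_bytes` leaves headroom.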

## What to do next

* [Supported Formats](/soil/supported-formats) — Full catalog of accepted soil data formats.
* [API Reference: Soil Sampling](/api-reference/soil) — Endpoint reference for uploading files, checking status, and retrieving results.
