Leaf’s Soil Sampling service accepts soil lab data in over 30 file formats and normalizes it into a standard canonical format. You upload .zip archives containing shapefiles, CSVs, XML reports, or proprietary formats. Leaf identifies the format, extracts the soil data, and returns a flat GeoJSON result plus a canonical JSON result when that output is available.
The Soil Sampling service is currently available by invitation only.
How it works
The service uses an asynchronous batch model. You upload one or more files, then poll for results.
- Upload. Upload one or more .zip files to the batch endpoint along with a Leaf user ID. Each file becomes an entry within the batch.
- Processing. Leaf classifies the file format, extracts soil sample data, normalizes analyte names and units, and produces both output formats.
- Retrieval. Poll the batch status endpoint. When an entry reaches COMPLETED, downloadStandardGeojson contains the flat GeoJSON result URL. downloadCanonicalJson contains the hierarchical result URL when that output is available.
Most files complete processing within a few minutes. Poll every 5 seconds for the first minute, then back off to every 15–30 seconds. Stop when the batch status is COMPLETED, PARTIALLY_COMPLETED, or FAILED.
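The polling cadence above can be sketched as a small helper. This is a minimal sketch, not Leaf client code: `get_status` is a hypothetical callable that you supply to fetch the batch status string, and the 5-second step used to ramp from 15 to 30 seconds is an assumption, since the text only gives the range.

```python
import itertools
import time
from typing import Callable, Iterator

# Statuses at which polling should stop, as listed in the text above.
TERMINAL_STATUSES = {"COMPLETED", "PARTIALLY_COMPLETED", "FAILED"}


def poll_delays(fast_interval: float = 5.0, fast_window: float = 60.0,
                slow_interval: float = 15.0, max_interval: float = 30.0) -> Iterator[float]:
    """Yield sleep durations: 5 s for the first minute, then 15 s rising to 30 s."""
    elapsed = 0.0
    while elapsed < fast_window:
        yield fast_interval
        elapsed += fast_interval
    delay = slow_interval
    while True:
        yield delay
        # Assumed ramp: add 5 s per poll until the 30 s ceiling is reached.
        delay = min(delay + 5.0, max_interval)


def wait_for_batch(get_status: Callable[[], str],
                   sleep: Callable[[float], None] = time.sleep,
                   max_polls: int = 200) -> str:
    """Call get_status until it returns a terminal status or max_polls is exhausted."""
    for delay in itertools.islice(poll_delays(), max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(delay)
    raise TimeoutError("batch did not reach a terminal status")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` applies.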
Key concepts
A batch is a container for one or more uploaded files, created by a single POST request. Its status reflects the aggregate state of all entries.
An entry is one file within a batch. Each entry is processed independently. A batch with three files has three entries, each of which may complete or fail on its own.
Entries move through PROCESSING → COMPLETED or FAILED. Batches follow the same pattern, plus PARTIALLY_COMPLETED when some entries succeed and others fail.
| Status | Level | Meaning |
|---|---|---|
| PROCESSING | Entry / Batch | Upload received, conversion in progress |
| COMPLETED | Entry / Batch | All entries converted successfully |
| PARTIALLY_COMPLETED | Batch only | Some entries completed, some failed |
| FAILED | Entry / Batch | Conversion failed (check errorMessage) |
Completed entries return result URLs in the API response. All entries that reach COMPLETED will have a downloadStandardGeojson URL. Most supported formats also produce downloadCanonicalJson; for formats that don’t, the field is null.
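One way to act on these rules is to sort a batch's entries into ready, failed, and pending groups. The sketch below assumes each entry is a dict carrying `status`, `downloadStandardGeojson`, `downloadCanonicalJson`, and `errorMessage` keys as named in this page; the overall response shape and the `index` bookkeeping are assumptions for illustration.

```python
from typing import Any


def summarize_entries(entries: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group entries by outcome so callers know what to download or retry."""
    summary: dict[str, list[dict[str, Any]]] = {"ready": [], "failed": [], "pending": []}
    for index, entry in enumerate(entries):
        status = entry.get("status")
        if status == "COMPLETED":
            summary["ready"].append({
                "index": index,
                "geojson_url": entry.get("downloadStandardGeojson"),
                # May be None for formats that do not produce canonical JSON.
                "canonical_url": entry.get("downloadCanonicalJson"),
            })
        elif status == "FAILED":
            summary["failed"].append({"index": index, "error": entry.get("errorMessage")})
        else:
            summary["pending"].append({"index": index, "status": status})
    return summary
```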
downloadStandardGeojson is a flat GeoJSON FeatureCollection. Each Feature represents one soil sample at one depth. A sample with multiple depth layers (e.g., surface + subsoil) produces multiple Features sharing the same sampleId. This format works well for mapping, GIS tools, and spatial queries.
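Because each depth layer is its own Feature, a consumer that wants one record per sample has to regroup them. A minimal sketch, assuming the FeatureCollection has already been parsed into a Python dict and that ordering layers by `depthTop` (unknown depths last) is what you want:

```python
from collections import defaultdict
from typing import Any


def group_by_sample(feature_collection: dict[str, Any]) -> dict[Any, list[dict[str, Any]]]:
    """Return {sampleId: [features...]} with each list ordered from shallowest to deepest."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        groups[props.get("sampleId")].append(feature)

    def depth_key(feature: dict[str, Any]) -> tuple[bool, float]:
        top = (feature.get("properties") or {}).get("depthTop")
        # Features with no depth sort after those with a known depthTop.
        return (top is None, top if top is not None else 0)

    for features in groups.values():
        features.sort(key=depth_key)
    return dict(groups)
```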
When present, downloadCanonicalJson is the full hierarchical data model. It contains an array of SoilSamplingEvent objects preserving the natural tree structure: event → samples → depth layers → analyte results. It includes context not present in the GeoJSON: lab information, provenance (source file, format family, converter version), fertilizer recommendations, and category classification for each analyte result. Use this format when you need the complete data model or when you’re building data pipelines that benefit from structured nesting.
GeoJSON properties
| Property | Type | Description |
|---|---|---|
| eventId | string | Unique identifier for the sampling event |
| eventDate | string | Sampling date (YYYY-MM-DD), or null |
| eventCode | string | Lab report number or job ID, or null |
| sampleId | string | Unique identifier for this sample |
| sampleNumber | string | Lab sample number (e.g. “1”, “A-1”), or null |
| depthLabel | string | Human-readable depth (e.g. “0-6 in”), or null |
| depthTop | number | Top of sampling depth, or null |
| depthBottom | number | Bottom of sampling depth, or null |
| depthUnit | string | Depth unit (“in” or “cm”), or null |
| growerName | string | Grower name, or null |
| farmName | string | Farm name, or null |
| fieldName | string | Field name, or null |
Depth fields are all null when the source data does not specify sampling depth. Field context properties (growerName, farmName, fieldName) are omitted entirely when the source has no field metadata.
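Since depth fields can be null and field context keys can be missing altogether, reading them defensively avoids `KeyError` and `None` formatting surprises. The helper below is a hypothetical illustration that builds a display label from a Feature's `properties` dict; the placeholder strings are arbitrary choices.

```python
from typing import Any


def describe_feature(props: dict[str, Any]) -> str:
    """Build a label such as 'Smith Farms / North 40 / Section 12, 0-6 in'."""
    # Field context keys may be absent entirely, so use .get() rather than indexing.
    field_parts = [props[k] for k in ("growerName", "farmName", "fieldName") if props.get(k)]
    location = " / ".join(field_parts) if field_parts else "(no field metadata)"

    top = props.get("depthTop")
    bottom = props.get("depthBottom")
    unit = props.get("depthUnit")
    if props.get("depthLabel"):
        depth = props["depthLabel"]
    elif top is not None and bottom is not None:
        depth = f"{top}-{bottom} {unit}" if unit else f"{top}-{bottom}"
    else:
        # All depth fields are null when the source did not specify depth.
        depth = "(depth not specified)"
    return f"{location}, {depth}"
```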
Analyte properties
Each analyte result adds up to three properties per Feature:
| Pattern | Type | Description |
|---|---|---|
| {analyte} | number | The measured value (e.g. pH, P, K, OM) |
| {analyte}_unit | string | Unit of measurement, present when known |
| {analyte}_method | string | Extraction method, present when known |
A Mehlich-3 phosphorus result at 42 ppm produces:
{
"P": 42.0,
"P_unit": "ppm",
"P_method": "MEHLICH_3"
}
A pH value with no known method or unit produces just "pH": 6.4.
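The naming pattern makes it possible to regroup the flat properties into one record per analyte. The sketch below is one possible approach, not Leaf's own logic: it treats any numeric property that is not a known context key and does not end in `_unit` or `_method` as an analyte value. The `CONTEXT_KEYS` set is assembled from the property table and example on this page and may need extending.

```python
from typing import Any

# Non-analyte properties listed on this page (depthId appears in the example output).
CONTEXT_KEYS = {
    "eventId", "eventDate", "eventCode", "sampleId", "sampleNumber",
    "depthId", "depthLabel", "depthTop", "depthBottom", "depthUnit",
    "growerName", "farmName", "fieldName",
}


def extract_analytes(props: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return {analyte: {"value", "unit", "method"}} from flat Feature properties."""
    analytes: dict[str, dict[str, Any]] = {}
    for key, value in props.items():
        if key in CONTEXT_KEYS or key.endswith(("_unit", "_method")):
            continue
        # Only numeric values count as measurements; bool is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        analytes[key] = {
            "value": value,
            "unit": props.get(f"{key}_unit"),      # None when the unit is unknown
            "method": props.get(f"{key}_method"),  # None when the method is unknown
        }
    return analytes
```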
Common analytes
| Analyte | Property | Typical units |
|---|---|---|
| pH | pH | (unitless) |
| Organic matter | OM | % |
| Phosphorus | P | ppm |
| Potassium | K | ppm |
| Calcium | Ca | ppm |
| Magnesium | Mg | ppm |
| Cation exchange capacity | CEC | meq/100g |
| Buffer pH | BpH | (unitless) |
| Nitrate-nitrogen | NO3_N | ppm |
The full set of analytes depends on the input format and lab. Leaf normalizes over 750 column name variations into approximately 70 standard analyte properties.
GeoJSON example
A two-sample FeatureCollection from a shapefile with Mehlich-3 extraction:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-89.4523, 40.1234]
},
"properties": {
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"eventDate": "2025-10-15",
"sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654321",
"sampleNumber": "1",
"depthId": "d1a2b3c4-e5f6-7890-abcd-111111111111",
"growerName": "Smith Farms",
"farmName": "North 40",
"fieldName": "Section 12",
"pH": 6.4,
"OM": 3.2,
"OM_unit": "%",
"P": 42.0,
"P_unit": "ppm",
"P_method": "MEHLICH_3",
"K": 185.0,
"K_unit": "ppm",
"K_method": "MEHLICH_3",
"CEC": 14.2,
"CEC_unit": "meq/100g"
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-89.4531, 40.1242]
},
"properties": {
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"eventDate": "2025-10-15",
"sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654322",
"sampleNumber": "2",
"depthId": "d1a2b3c4-e5f6-7890-abcd-222222222222",
"growerName": "Smith Farms",
"farmName": "North 40",
"fieldName": "Section 12",
"pH": 6.8,
"OM": 2.9,
"OM_unit": "%",
"P": 38.0,
"P_unit": "ppm",
"P_method": "MEHLICH_3",
"K": 210.0,
"K_unit": "ppm",
"K_method": "MEHLICH_3",
"CEC": 15.1,
"CEC_unit": "meq/100g"
}
}
]
}
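As a sketch of consuming a result like the one above, the helper below pulls one analyte out as `(latitude, longitude, value)` rows, for example to feed a map layer. GeoJSON stores Point coordinates as `[longitude, latitude]`, so the order is swapped on the way out. The function name is hypothetical, and non-Point geometries are skipped as a simplifying assumption.

```python
from typing import Any


def sample_points(feature_collection: dict[str, Any], analyte: str) -> list[tuple[float, float, float]]:
    """Return (lat, lon, value) for every Point feature that reports the analyte."""
    rows: list[tuple[float, float, float]] = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # assumption: only point samples are of interest here
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        value = (feature.get("properties") or {}).get(analyte)
        if value is not None:
            rows.append((lat, lon, value))
    return rows


# Trimmed version of the example above.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-89.4523, 40.1234]},
         "properties": {"sampleNumber": "1", "pH": 6.4}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-89.4531, 40.1242]},
         "properties": {"sampleNumber": "2", "pH": 6.8}},
    ],
}
ph_rows = sample_points(example, "pH")
```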
Canonical JSON structure
The canonical JSON output is an array of SoilSamplingEvent objects. Each event represents one sampling trip, date, or lab report from the source file.
SoilSamplingEvent
├── field_context (grower, farm, field names and IDs)
├── lab (lab name, received/processed dates)
├── provenance (source file, format family, converter version)
├── samples[]
│   ├── geometry (GeoJSON Point or Polygon)
│   └── depths[]
│       └── results[] (analyte, value, unit, method, category)
└── recommendations[] (fertilizer recommendations, when present)
Each AnalyteResult in the canonical format includes a category field that classifies the measurement:
| Category | Meaning | Examples |
|---|---|---|
| analyte | Direct lab measurement | pH, P, K, Ca, Mg, OM |
| index | Calculated index | P-Index, K-Index |
| derived | Ratio or derived value | BS-Ca, BS-K, SAR |
| sensor | Field instrument reading | EC (Veris), Red, IR |
| passthrough | Unrecognized column preserved as-is | Varies |
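The category field makes it straightforward to keep only the kinds of results a pipeline cares about. The generator below is a rough sketch that walks event → samples → depths → results and filters by category. The key names `samples`, `depths`, `results`, `category`, and `analyte` are taken from the tree and table on this page, but the exact JSON spelling is an assumption to verify against a real canonical file.

```python
from typing import Any, Iterable, Iterator


def iter_results(events: Iterable[dict[str, Any]],
                 categories: Iterable[str] = ("analyte",)) -> Iterator[dict[str, Any]]:
    """Yield every result dict whose category is in `categories`."""
    wanted = set(categories)
    for event in events:
        for sample in event.get("samples", []):
            for depth in sample.get("depths", []):
                for result in depth.get("results", []):
                    if result.get("category") in wanted:
                        yield result
```

For example, `iter_results(events, {"analyte", "derived"})` would keep direct lab measurements and derived ratios while dropping passthrough columns.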
The provenance block on every event records which source file and converter produced the output, which is useful for auditing and tracing data lineage.
File size limits
| Limit | Value |
|---|---|
| Maximum file size | 50 MB per file |
| Maximum request size | 200 MB per request |
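A client can check these limits before uploading and split a large set of archives across several requests. The sketch below is a hypothetical pre-flight helper: it assumes 1 MB means 1,000,000 bytes (the more conservative reading, since the page does not say) and ignores multipart encoding overhead, so leave some headroom in practice.

```python
MB = 1_000_000  # assumption: decimal megabytes
MAX_FILE_BYTES = 50 * MB
MAX_REQUEST_BYTES = 200 * MB


def plan_requests(file_sizes: dict[str, int]) -> list[list[str]]:
    """Greedily pack files, in the given order, into requests that respect both limits."""
    requests: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{name} is {size} bytes, over the 50 MB per-file limit")
        # Start a new request when adding this file would exceed the request limit.
        if current and current_bytes + size > MAX_REQUEST_BYTES:
            requests.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        requests.append(current)
    return requests
```

In real use the sizes would come from something like `os.path.getsize` on each archive.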
What to do next