Leaf’s Soil Sampling service accepts soil lab data in more than 30 file formats and normalizes it into a canonical format. You upload .zip archives containing shapefiles, CSVs, XML reports, or proprietary formats. Leaf identifies the format, extracts the soil data, and returns a flat GeoJSON result, plus a canonical JSON result when that output is available.
The Soil Sampling service is currently available by invitation only.

How it works

The service uses an asynchronous batch model. You upload one or more files, then poll for results.
  1. Upload. Send one or more .zip files to the batch endpoint along with a Leaf user ID. Each file becomes an entry within the batch.
  2. Processing. Leaf classifies the file format, extracts soil sample data, normalizes analyte names and units, and produces both output formats.
  3. Retrieval. Poll the batch status endpoint. When an entry reaches COMPLETED, downloadStandardGeojson contains the flat GeoJSON result URL. downloadCanonicalJson contains the hierarchical result URL when that output is available.
Most files complete processing within a few minutes. Poll every 5 seconds for the first minute, then back off to every 15–30 seconds. Stop when the batch status is COMPLETED, PARTIALLY_COMPLETED, or FAILED.
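The polling schedule described above can be sketched as follows. This is a minimal illustration, not Leaf client code: `poll_delays`, `wait_for_batch`, the `fetch_status` callable, and the 30-minute default timeout are all hypothetical names and values chosen for the example. `fetch_status` stands in for whatever function you write to call the batch status endpoint and return the batch status string.

```python
import time
from typing import Callable, Iterator

# Batch statuses at which polling should stop, per the guidance above.
TERMINAL_STATUSES = {"COMPLETED", "PARTIALLY_COMPLETED", "FAILED"}


def poll_delays(fast_interval: float = 5.0, fast_window: float = 60.0,
                slow_interval: float = 15.0,
                max_interval: float = 30.0) -> Iterator[float]:
    """Yield sleep durations: 5 s for the first minute, then 15 s rising to 30 s."""
    elapsed = 0.0
    while elapsed < fast_window:
        yield fast_interval
        elapsed += fast_interval
    delay = slow_interval
    while True:
        yield delay
        delay = min(delay + 5.0, max_interval)


def wait_for_batch(fetch_status: Callable[[], str],
                   timeout: float = 1800.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the batch reaches a terminal status or the timeout elapses.

    fetch_status is a hypothetical caller-supplied function that queries the
    batch status endpoint and returns the batch status string.
    """
    waited = 0.0
    for delay in poll_delays():
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"batch still {status} after {waited:.0f} s")
        sleep(delay)
        waited += delay
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting; in production the default `time.sleep` applies.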

Key concepts

A batch is a container for one or more uploaded files, created by a single POST request. Its status reflects the aggregate state of all entries.
An entry is one file within a batch. Each entry is processed independently. A batch with three files has three entries, each of which may complete or fail on its own.
Entries move through PROCESSING → COMPLETED or FAILED. Batches follow the same pattern, plus PARTIALLY_COMPLETED when some entries succeed and others fail.
| Status | Level | Meaning |
| --- | --- | --- |
| PROCESSING | Entry / Batch | Upload received, conversion in progress |
| COMPLETED | Entry / Batch | All entries converted successfully |
| PARTIALLY_COMPLETED | Batch only | Some entries completed, some failed |
| FAILED | Entry / Batch | Conversion failed (check errorMessage) |
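The relationship between entry and batch statuses can be sketched as a small function. The API computes the batch status for you, so this is only an illustration of the rules above. `aggregate_batch_status` is a hypothetical name, and the rule that a batch stays PROCESSING while any entry is still processing is an assumption, not something the documentation states.

```python
from typing import Iterable


def aggregate_batch_status(entry_statuses: Iterable[str]) -> str:
    """Derive a batch status from its entry statuses (illustrative only)."""
    statuses = list(entry_statuses)
    if not statuses:
        raise ValueError("a batch contains at least one entry")
    # Assumption: the batch is still in progress while any entry is.
    if "PROCESSING" in statuses:
        return "PROCESSING"
    if all(s == "COMPLETED" for s in statuses):
        return "COMPLETED"
    if all(s == "FAILED" for s in statuses):
        return "FAILED"
    # A mix of COMPLETED and FAILED entries.
    return "PARTIALLY_COMPLETED"
```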

Output formats

Completed entries return result URLs in the API response. All entries that reach COMPLETED will have a downloadStandardGeojson URL. Most supported formats also produce downloadCanonicalJson; for formats that don’t, the field is null.
downloadStandardGeojson is a flat GeoJSON FeatureCollection. Each Feature represents one soil sample at one depth. A sample with multiple depth layers (e.g., surface + subsoil) produces multiple Features sharing the same sampleId. This format works well for mapping, GIS tools, and spatial queries.
When present, downloadCanonicalJson is the full hierarchical data model. It contains an array of SoilSamplingEvent objects preserving the natural tree structure: event → samples → depth layers → analyte results. It includes context not present in the GeoJSON: lab information, provenance (source file, format family, converter version), fertilizer recommendations, and category classification for each analyte result. Use this format when you need the complete data model or when you’re building data pipelines that benefit from structured nesting.
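Because one sample with several depth layers appears as several Features, consumers of the flat GeoJSON often need to regroup Features by sampleId. The sketch below shows one way to do that on an already-downloaded FeatureCollection. `group_by_sample` is a hypothetical helper, and ordering layers by depthTop (with unknown depths last) is a design choice made for the example.

```python
from collections import defaultdict
from typing import Any


def _depth_key(feature: dict[str, Any]) -> tuple[bool, float]:
    """Sort key: known depths first, shallowest first; null depths last."""
    top = (feature.get("properties") or {}).get("depthTop")
    return (top is None, top if top is not None else 0.0)


def group_by_sample(
    feature_collection: dict[str, Any],
) -> dict[Any, list[dict[str, Any]]]:
    """Group Features by sampleId, ordering each group by depthTop."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for feature in feature_collection.get("features", []):
        sample_id = (feature.get("properties") or {}).get("sampleId")
        groups[sample_id].append(feature)
    return {sid: sorted(feats, key=_depth_key) for sid, feats in groups.items()}
```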

GeoJSON properties

| Property | Type | Description |
| --- | --- | --- |
| eventId | string | Unique identifier for the sampling event |
| eventDate | string | Sampling date (YYYY-MM-DD), or null |
| eventCode | string | Lab report number or job ID, or null |
| sampleId | string | Unique identifier for this sample |
| sampleNumber | string | Lab sample number (e.g. “1”, “A-1”), or null |
| depthLabel | string | Human-readable depth (e.g. “0-6 in”), or null |
| depthTop | number | Top of sampling depth, or null |
| depthBottom | number | Bottom of sampling depth, or null |
| depthUnit | string | Depth unit (“in” or “cm”), or null |
| growerName | string | Grower name, or null |
| farmName | string | Farm name, or null |
| fieldName | string | Field name, or null |
Depth fields are all null when the source data does not specify sampling depth. Field context properties (growerName, farmName, fieldName) are omitted entirely when the source has no field metadata.

Analyte properties

Each analyte result adds up to three properties per Feature:
| Pattern | Type | Description |
| --- | --- | --- |
| {analyte} | number | The measured value (e.g. pH, P, K, OM) |
| {analyte}_unit | string | Unit of measurement, present when known |
| {analyte}_method | string | Extraction method, present when known |
A Mehlich-3 phosphorus result at 42 ppm produces:
{
  "P": 42.0,
  "P_unit": "ppm",
  "P_method": "MEHLICH_3"
}
A pH value with no known method or unit produces just "pH": 6.4.
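The naming pattern above makes it possible to pull analyte results out of a Feature's properties without knowing the analyte list in advance. The following is a sketch under stated assumptions: `extract_analytes` and `CONTEXT_KEYS` are hypothetical names, the set of non-analyte keys is taken from the GeoJSON properties table plus the depthId key that appears in the example on this page, and any remaining numeric property is treated as an analyte value.

```python
from typing import Any

# Non-analyte keys: the documented GeoJSON properties, plus depthId,
# which appears in the example FeatureCollection (assumed to be metadata).
CONTEXT_KEYS = {
    "eventId", "eventDate", "eventCode", "sampleId", "sampleNumber",
    "depthId", "depthLabel", "depthTop", "depthBottom", "depthUnit",
    "growerName", "farmName", "fieldName",
}


def extract_analytes(properties: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return {analyte: {"value", "unit", "method"}} from Feature properties."""
    analytes: dict[str, dict[str, Any]] = {}
    for key, value in properties.items():
        if key in CONTEXT_KEYS or key.endswith(("_unit", "_method")):
            continue
        # Keep only numeric values; bool is excluded because it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        analytes[key] = {
            "value": value,
            "unit": properties.get(f"{key}_unit"),      # None when unknown
            "method": properties.get(f"{key}_method"),  # None when unknown
        }
    return analytes
```

A key such as NO3_N is handled correctly because only the `_unit` and `_method` suffixes are skipped.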

Common analytes

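| Analyte | Property | Typical units |
| --- | --- | --- |
| pH | pH | (unitless) |
| Organic matter | OM | % |
| Phosphorus | P | ppm |
| Potassium | K | ppm |
| Calcium | Ca | ppm |
| Magnesium | Mg | ppm |
| Cation exchange capacity | CEC | meq/100g |
| Buffer pH | BpH | (unitless) |
| Nitrate-nitrogen | NO3_N | ppm |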
The full set of analytes depends on the input format and lab. Leaf normalizes over 750 column name variations into approximately 70 standard analyte properties.

GeoJSON example

A two-sample FeatureCollection from a shapefile with Mehlich-3 extraction:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-89.4523, 40.1234]
      },
      "properties": {
        "eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "eventDate": "2025-10-15",
        "sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654321",
        "sampleNumber": "1",
        "depthId": "d1a2b3c4-e5f6-7890-abcd-111111111111",
        "growerName": "Smith Farms",
        "farmName": "North 40",
        "fieldName": "Section 12",
        "pH": 6.4,
        "OM": 3.2,
        "OM_unit": "%",
        "P": 42.0,
        "P_unit": "ppm",
        "P_method": "MEHLICH_3",
        "K": 185.0,
        "K_unit": "ppm",
        "K_method": "MEHLICH_3",
        "CEC": 14.2,
        "CEC_unit": "meq/100g"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-89.4531, 40.1242]
      },
      "properties": {
        "eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "eventDate": "2025-10-15",
        "sampleId": "f1e2d3c4-b5a6-7890-fedc-ba0987654322",
        "sampleNumber": "2",
        "depthId": "d1a2b3c4-e5f6-7890-abcd-222222222222",
        "growerName": "Smith Farms",
        "farmName": "North 40",
        "fieldName": "Section 12",
        "pH": 6.8,
        "OM": 2.9,
        "OM_unit": "%",
        "P": 38.0,
        "P_unit": "ppm",
        "P_method": "MEHLICH_3",
        "K": 210.0,
        "K_unit": "ppm",
        "K_method": "MEHLICH_3",
        "CEC": 15.1,
        "CEC_unit": "meq/100g"
      }
    }
  ]
}
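A FeatureCollection like the one above can be flattened into tabular rows for a spreadsheet or data frame. The sketch below is illustrative only, and `features_to_rows` is a hypothetical helper. It relies on the GeoJSON convention that Point coordinates are ordered longitude, then latitude, and it leaves both values as None for non-Point geometries.

```python
from typing import Any, Sequence


def features_to_rows(feature_collection: dict[str, Any],
                     analytes: Sequence[str]) -> list[dict[str, Any]]:
    """Build one flat row per Feature with explicit longitude and latitude."""
    rows = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        props = feature.get("properties") or {}
        lon = lat = None
        if geometry.get("type") == "Point":
            # GeoJSON positions are [longitude, latitude, (optional) elevation].
            lon, lat = geometry["coordinates"][:2]
        row = {
            "sampleNumber": props.get("sampleNumber"),
            "longitude": lon,
            "latitude": lat,
        }
        for name in analytes:
            row[name] = props.get(name)  # None when the analyte is absent
        rows.append(row)
    return rows
```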

Canonical JSON structure

The canonical JSON output is an array of SoilSamplingEvent objects. Each event represents one sampling trip, date, or lab report from the source file.
SoilSamplingEvent
├── field_context     (grower, farm, field names and IDs)
├── lab               (lab name, received/processed dates)
├── provenance        (source file, format family, converter version)
├── samples[]
│   ├── geometry      (GeoJSON Point or Polygon)
│   └── depths[]
│       └── results[] (analyte, value, unit, method, category)
└── recommendations[] (fertilizer recommendations, when present)
Each AnalyteResult in the canonical format includes a category field that classifies the measurement:
| Category | Meaning | Examples |
| --- | --- | --- |
| analyte | Direct lab measurement | pH, P, K, Ca, Mg, OM |
| index | Calculated index | P-Index, K-Index |
| derived | Ratio or derived value | BS-Ca, BS-K, SAR |
| sensor | Field instrument reading | EC (Veris), Red, IR |
| passthrough | Unrecognized column preserved as-is | Varies |
The provenance block on every event records exactly which source file and converter produced the output, which is useful for auditing and tracing data lineage.
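Walking the event → samples → depths → results tree and filtering by category might look like the sketch below. The key names (`samples`, `depths`, `results`, `category`) are assumptions inferred from the structure diagram on this page, so check them against an actual canonical JSON payload before relying on them. `iter_results` is a hypothetical helper.

```python
from typing import Any, Iterator, Optional


def iter_results(events: list[dict[str, Any]],
                 category: Optional[str] = None) -> Iterator[dict[str, Any]]:
    """Yield analyte results from every event, optionally for one category.

    Key names are assumed from the structure diagram, not confirmed
    against a real payload.
    """
    for event in events:
        for sample in event.get("samples") or []:
            for depth in sample.get("depths") or []:
                for result in depth.get("results") or []:
                    if category is None or result.get("category") == category:
                        yield result
```

For example, `list(iter_results(events, category="analyte"))` would keep direct lab measurements and skip indexes, derived values, sensor readings, and passthrough columns.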

File size limits

| Limit | Value |
| --- | --- |
| Maximum file size | 50 MB per file |
| Maximum request size | 200 MB per request |
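It can help to check these limits on the client before uploading. The sketch below splits a set of files into requests that stay under both limits. It is an illustration under stated assumptions: `plan_requests` is a hypothetical helper, 1 MB is taken conservatively as 1,000,000 bytes because the documentation does not say which definition applies, and multipart encoding overhead is ignored.

```python
# Assumption: 1 MB = 1,000,000 bytes (the conservative reading of the limits).
MAX_FILE_BYTES = 50 * 1000 * 1000
MAX_REQUEST_BYTES = 200 * 1000 * 1000


def plan_requests(file_sizes: dict[str, int]) -> list[list[str]]:
    """Split files, in order, into upload requests that respect both limits.

    file_sizes maps a file name to its size in bytes.
    """
    requests: list[list[str]] = []
    current: list[str] = []
    current_total = 0
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{name} exceeds the per-file limit")
        # Start a new request when adding this file would exceed the limit.
        if current and current_total + size > MAX_REQUEST_BYTES:
            requests.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        requests.append(current)
    return requests
```

On disk, the sizes could come from `os.path.getsize` for each .zip file.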

Last modified on April 17, 2026