
# File Conversion

> How Leaf converts raw machine files into standard GeoJSON or GeoParquet, including pipeline stages, status tracking, summaries, and cleanup rules.

This page covers what happens after Leaf receives a machine data file, whether from a provider connection or a manual upload. Every file passes through the same conversion pipeline, producing standardized point data and a summary.

## Pipeline stages

Each file moves through these stages in order:

1. **originalFile** — The raw proprietary file as received. Stored for reference.
2. **rawGeojson** — The proprietary format is parsed into Leaf's raw GeoJSON representation.
3. **standardGeojson** — The point data is standardized to Leaf's public schema and units. This is the primary point-level output returned by the file resource.
4. **filteredGeojson** — If filtered output is available, Leaf exposes the filtered point dataset as a separate artifact under this step name.
5. **summary** — Aggregate statistics (avg, min, max, totals) are calculated from the point data. Includes a geometry representing the spatial coverage.
6. **units** — A map of property names to their units for the file.
7. **propertiesPNGs** — PNG images generated from numeric properties.
8. **zippedPNGs** — A zip bundle containing the generated PNG images.

Each stage runs independently and has its own status.

## Tracking file status

Use `GET /files/{id}/status` to check where a file is in the pipeline. The response is a map keyed by the public step names returned by the API:

```json
{
  "originalFile":     { "status": "processed", "message": "ok" },
  "rawGeojson":       { "status": "processed", "message": "ok" },
  "standardGeojson":  { "status": "processed", "message": "ok" },
  "filteredGeojson":  { "status": "processed", "message": "ok" },
  "propertiesPNGs":   { "status": "processed", "message": "ok" },
  "zippedPNGs":       { "status": "processed", "message": "ok" },
  "summary":          { "status": "processed", "message": "ok" },
  "units":            { "status": "processed", "message": "ok" }
}
```

Some keys may be absent when a file has not reached that step or when that output is not produced for the file.

Possible status values: `processed`, `failed`, `skipped`.

If a stage fails, the `message` field contains details about what went wrong.
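As a minimal sketch of how a client might interpret this response, the helper below groups steps by status and collects failure messages. It assumes the response has already been parsed into a dictionary; the function names `summarize_status` and `failure_messages` are hypothetical and not part of the Leaf API.

```python
from typing import Dict, List

# Public step names listed on this page, in pipeline order.
PIPELINE_STEPS = [
    "originalFile", "rawGeojson", "standardGeojson", "filteredGeojson",
    "summary", "units", "propertiesPNGs", "zippedPNGs",
]


def summarize_status(status: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group step names by reported status.

    Steps missing from the response are listed under "absent", since a key
    may be missing when the step was not reached or not produced.
    """
    groups: Dict[str, List[str]] = {
        "processed": [], "failed": [], "skipped": [], "absent": [],
    }
    for step in PIPELINE_STEPS:
        entry = status.get(step)
        if entry is None:
            groups["absent"].append(step)
        else:
            # Any unexpected status value gets its own bucket.
            groups.setdefault(entry.get("status", "unknown"), []).append(step)
    return groups


def failure_messages(status: Dict[str, dict]) -> Dict[str, str]:
    """Return {step: message} for every step whose status is "failed"."""
    return {
        step: entry.get("message", "")
        for step, entry in status.items()
        if entry.get("status") == "failed"
    }
```

A caller could treat a file as ready for download once `standardGeojson` appears under `processed`, and surface `failure_messages(...)` to the user otherwise.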

## File metadata

A converted file (`GET /files/{id}`) includes:

| Field                     | Description                                                              |
| ------------------------- | ------------------------------------------------------------------------ |
| `id`                      | Unique file ID                                                           |
| `leafUserId`              | Owner Leaf user                                                          |
| `provider`                | Source provider (or `Leaf` for manual uploads)                           |
| `fileFormat`              | Original format (e.g., `AGDATA`, `CN1`, `ISO11783`, `SHAPEFILE`)         |
| `fileName`                | Original file name                                                       |
| `operationType`           | `planted`, `harvested`, `applied`, or `tillage`                          |
| `downloadOriginalFile`    | Authenticated download URL for the original proprietary file             |
| `downloadStandardGeojson` | Authenticated download URL for the standardized data                     |
| `summary`                 | Embedded summary object with aggregate stats and geometry                |
| `fields`                  | Field IDs this file has been matched to                                  |
| `sourceFiles`             | If this file was created by merging, the IDs of the source machine files |
| `batchId`                 | If uploaded via batch API, the batch ID                                  |

<Warning>
  Always use the `download`-prefixed URLs (e.g., `downloadStandardGeojson`) for file downloads. These point to `api.withleaf.io` and require authentication. Direct S3 URLs are being deprecated.
</Warning>
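The following sketch shows one way to prepare an authenticated download from the file metadata. It assumes the API accepts a bearer token in the `Authorization` header; `build_download_request` is a hypothetical helper, not a Leaf SDK function.

```python
import urllib.request


def build_download_request(
    file_meta: dict,
    token: str,
    key: str = "downloadStandardGeojson",
) -> urllib.request.Request:
    """Build an authenticated request for one of the `download`-prefixed URLs.

    Assumption: the token is sent as "Authorization: Bearer <token>".
    """
    url = file_meta.get(key)
    if not url:
        raise KeyError(f"file metadata has no {key!r} URL")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

To fetch the data, pass the request to `urllib.request.urlopen(...)` and read the response body. Pass `key="downloadOriginalFile"` to download the original proprietary file instead.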

## File summary

The summary (`GET /files/{id}/summary`) is a GeoJSON Feature with aggregate properties and a geometry representing the spatial coverage of the operation. The properties vary by operation type.

Common properties across all types:

| Property                | Description                                     |
| ----------------------- | ----------------------------------------------- |
| `operationType`         | `planted`, `harvested`, `applied`, or `tillage` |
| `startTime` / `endTime` | Time range of the operation                     |
| `totalArea`             | Total area covered                              |
| `totalDistance`         | Total distance traveled                         |
| `elevation`             | Elevation statistics                            |
| `speed`                 | Speed statistics                                |
| `crop`                  | Crop type(s)                                    |
| `machinery`             | Machine and implement info                      |
| `originalOperationType` | The operation type as reported by the provider  |
| `totalFuelUsed`         | Total fuel consumed (when available)            |

The summary geometry is built from a buffer of the operation points, creating a polygon that approximates the coverage area.

## Data cleanup

When `cleanupStandardGeojson` is enabled, Leaf removes points that fail these validation rules:

| Property             | Valid when         |
| -------------------- | ------------------ |
| `wetMass`            | > 0.0              |
| `wetMassPerArea`     | > 0.0              |
| `wetVolume`          | > 0.0              |
| `wetVolumePerArea`   | > 0.0              |
| `harvestMoisture`    | > 0.0 and \< 100.0 |
| `appliedRate`        | > 0.0              |
| `seedRate`           | > 0.0              |
| `tillageDepthActual` | >= 0.0             |
| `recordingStatus`    | = "On"             |
| `crop`               | != "unknown"       |
| `products`           | >= 0.0             |

You can customize which rules apply and their thresholds using the `cleanupRules` configuration.
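To make the default rules concrete, here is an illustrative sketch that applies them to a list of GeoJSON point features. It is not Leaf's implementation. It assumes a rule only applies when the property is present on the point, and it omits `products` because the per-point structure of that property is not described on this page. The names `DEFAULT_RULES`, `is_valid`, and `cleanup` are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Default rules transcribed from the table above (`products` omitted).
DEFAULT_RULES: Dict[str, Callable[[Any], bool]] = {
    "wetMass": lambda v: v > 0.0,
    "wetMassPerArea": lambda v: v > 0.0,
    "wetVolume": lambda v: v > 0.0,
    "wetVolumePerArea": lambda v: v > 0.0,
    "harvestMoisture": lambda v: 0.0 < v < 100.0,
    "appliedRate": lambda v: v > 0.0,
    "seedRate": lambda v: v > 0.0,
    "tillageDepthActual": lambda v: v >= 0.0,
    "recordingStatus": lambda v: v == "On",
    "crop": lambda v: v != "unknown",
}


def is_valid(properties: Dict[str, Any],
             rules: Dict[str, Callable[[Any], bool]] = DEFAULT_RULES) -> bool:
    """True when every rule whose property is present (and not None) passes."""
    return all(
        rule(properties[name])
        for name, rule in rules.items()
        if properties.get(name) is not None
    )


def cleanup(features: List[dict],
            rules: Dict[str, Callable[[Any], bool]] = DEFAULT_RULES) -> List[dict]:
    """Keep only the features whose properties pass all applicable rules."""
    return [f for f in features if is_valid(f.get("properties", {}), rules)]
```

Passing a smaller or modified `rules` dictionary mimics the effect of customizing which rules apply and their thresholds.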

## Filtered GeoJSON and outlier removal

If `operationsFilteredGeojson` is enabled, Leaf produces an additional filtered version of the data. The filter removes:

* Points with `speed` less than 0.5 m/s (all operation types)

For harvest data, outlier removal can also be applied. Points where the harvested volume is more than 3 standard deviations from the mean are excluded. This threshold is configurable via `operationsOutliersLimit`. Disable outlier removal entirely with `operationsRemoveOutliers`.
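The sketch below illustrates the two steps described here: the speed filter, followed by standard-deviation outlier removal. It is an approximation under stated assumptions, not Leaf's implementation: it uses the population standard deviation, reads the harvested volume from a `wetVolume` property, and keeps points that lack a `speed` value. The function name and its parameters are hypothetical stand-ins for the configuration options.

```python
from statistics import mean, pstdev
from typing import List


def filter_points(points: List[dict],
                  min_speed: float = 0.5,       # m/s, from the rule above
                  outlier_limit: float = 3.0,   # plays the role of operationsOutliersLimit
                  remove_outliers: bool = True, # plays the role of operationsRemoveOutliers
                  volume_key: str = "wetVolume") -> List[dict]:
    """Drop slow points, then drop points whose volume is a statistical outlier."""
    # Step 1: speed filter. Points without a speed value are kept (assumption).
    kept = [p for p in points if "speed" not in p or p["speed"] >= min_speed]
    if not remove_outliers:
        return kept

    # Step 2: outlier removal on the harvested volume.
    volumes = [p[volume_key] for p in kept if volume_key in p]
    if len(volumes) < 2:
        return kept
    mu, sigma = mean(volumes), pstdev(volumes)
    if sigma == 0:
        return kept  # all volumes identical, nothing to exclude
    return [
        p for p in kept
        if volume_key not in p or abs(p[volume_key] - mu) <= outlier_limit * sigma
    ]
```

Because the mean and standard deviation are computed after the speed filter, slow points do not influence the outlier threshold in this sketch. Whether Leaf orders the steps the same way is not stated on this page.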

At the operation level, the filtered GeoJSON is used as the basis for generating V2 images, which use a fixed color ramp with 7 quantile-based classes. See [Field Operations](/machine-data/field-operations) for details on operation images.
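For readers unfamiliar with quantile classification, this sketch shows the general technique of splitting values into 7 classes with roughly equal point counts. The exact binning and interpolation method Leaf uses is not specified on this page, so treat this as a generic illustration; `quantile_classes` is a hypothetical name.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Tuple


def quantile_classes(values: List[float],
                     n_classes: int = 7) -> Tuple[List[float], List[int]]:
    """Return (break points, class index per value) for quantile-based classes.

    `quantiles` yields n_classes - 1 cut points; each value is assigned the
    index (0 .. n_classes - 1) of the interval it falls into.
    """
    breaks = quantiles(values, n=n_classes)
    return breaks, [bisect_right(breaks, v) for v in values]
```

Each class index would then map to one color in the fixed ramp, so the same number of colors is used regardless of the value range in a given operation.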

## Processing timing

Files from provider connections are processed immediately on the first sync, then at least once every 24 hours. Event-driven providers can trigger processing sooner.

Manually uploaded files begin processing as soon as Leaf receives the upload.

Processing time depends on data volume. Expect initial results within a few minutes.

<Note>
  Leaf archives files to slower storage after 180 days of no access. Contact support if you need to retrieve archived files or require a different retention period.
</Note>

## What to do next

* [Field Operations](/machine-data/field-operations) — How converted files are merged into field operations.
* [Sample Output](/machine-data/sample-output) — Example file and operation responses.
* [Units](/machine-data/units) — Unit reference for all numeric properties.
