openpeeps/openparser

A tiny collection of high-performance parsers and dumpers
👑 Written in Nim

nimble install openparser


About

OpenParser is a collection of parsers and dumpers (serializers) for various data formats, written in Nim. It provides a simple and efficient way to parse and dump data in formats such as JSON, TOML, YAML, BSON, CSV, FBE (Fast Binary Encoding) and more.

😍 Key Features

  • Parse JSON, CSV, YAML, TOML documents and more
  • BSON encoding and decoding from JsonNode objects
  • DotEnv parser for .env files
  • Fast Binary Encoding and Decoding
  • i18n GNU Gettext PO and MO file parsing and dumping
  • Zero-copy JSON parsing via Memfiles for high performance and low memory usage
  • Direct-to-object parsing for JSON, YAML and TOML
  • CSV zero-copy parsing for large files
  • RSS & Atom feed reader and writer
  • Regex Engine with SIMD acceleration
  • Context-aware error reporting while deserializing data
  • Custom Hooks API for parsing and dumping
  • Scientific notation support
  • Dot notation access for nested data structures

Why?

Initially I wanted to create a simple JSON parser with fine-grained control over the parsing process (JSONL, custom hooks, error reporting, zero-copy tokenization). Then I thought it would be fun to add a YAML parser that parses YAML documents the same way as JSON. Once I started talking with the chatbot, I ended up creating a collection of parsers and dumpers for various data formats.

Note

Importing openparser directly produces a compile-time error. Import the specific module for the data format you want to use instead, e.g. openparser/json for JSON parsing and dumping or openparser/csv for CSV parsing.

Parse JSON

OpenParser provides a simple and efficient module for parsing JSON data with a zero-copy approach: the input is tokenized in place instead of being copied into intermediate buffers, which makes parsing faster and more memory-efficient.

Note

OpenParser's JSON module re-exports the std/json module by default.

fromJson: parse a string into a JsonNode or Nim data structures

Here is a simple example that takes a JSON string and parses it into a JsonNode tree structure:

import openparser/json

let data = """{"name":"Albush","age":40,"address":{"street":"456 Elm St","city":"Othertown","zip":67890},"friends":[]}"""

let jsonNode: JsonNode = fromJson(data)
echo jsonNode["name"].getStr # Albush
echo jsonNode["age"].getInt # 40
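Because openparser/json re-exports std/json, a JsonNode tree like the one above can also be mapped onto a typed Nim object with the standard library's `to` macro. The sketch below uses only std/json, with `parseJson` standing in for `fromJson` so it runs without OpenParser installed; the `Address` and `Person` types are illustrative assumptions, not part of the library.

```nim
import std/json

type
  Address = object
    street, city: string
    zip: int
  Person = object
    name: string
    age: int
    address: Address
    friends: seq[string]

let data = """{"name":"Albush","age":40,"address":{"street":"456 Elm St","city":"Othertown","zip":67890},"friends":[]}"""

# parseJson stands in for fromJson here; both produce a JsonNode tree
let node: JsonNode = parseJson(data)

# std/json's `to` macro copies the tree into the object type field by field
let person = node.to(Person)
echo person.name          # Albush
echo person.address.city  # Othertown
```

This route builds the intermediate JsonNode tree first; the direct-to-object parsing listed under Key Features presumably skips that step.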

toJson: serialize Nim data structures into JSON strings

The toJson function serializes Nim data structures into JSON strings:

import openparser/json

var data = %*{
  "name": "Alice",
  "age": 30,
  "isMember": true,
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "zip": 12345
  },
  "friends": ["Bob"]
}

echo toJson(data) # {"name":"Alice"...}

toJson pretty-printing

Pretty-printing is not supported yet. Adding it is on the to-do list, so that serialization can produce more human-readable JSON with indentation and line breaks.
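Until then, since openparser/json re-exports std/json, the standard library's `pretty` proc is one way to get indented output from a JsonNode. This is a sketch using std/json only; it is not OpenParser's toJson and does not share its serialization path.

```nim
import std/json

let data = %*{
  "name": "Alice",
  "age": 30,
  "friends": ["Bob"]
}

# `pretty` renders the tree with line breaks and the given indentation width
let output = pretty(data, indent = 2)
echo output

# `$` produces the compact form, for comparison
echo $data
```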

JSON custom hooks

Here is an example of how to use a custom parseHook to parse JSON data into Nim types that the default parser does not support natively:

import std/times

import openparser/json
import semver

# Parse a semantic version string such as "1.2.3"
proc parseHook*(parser: var JsonParser, v: var Semver) =
  v = parseVersion(parser.curr.value)
  parser.walk() # move the parser forward after parsing the value

# Parse the current token as a timestamp (HH is the 24-hour clock, hh is 12-hour)
proc parseHook*(parser: var JsonParser, v: var Time) =
  v = parseTime(parser.curr.value, "yyyy-MM-dd'T'HH:mm:ss'.'ffffffz", local())
  parser.walk() # move the parser forward after parsing the value

To determine which field is being parsed inside a parseHook, use the currentField field of the JsonParser object. It is an Option[string] that holds the name of the current field being parsed, if available:

# inside a parseHook body; isSome and get come from std/options
if parser.currentField.isSome:
  let fieldName = parser.currentField.get()
  echo "Parsing field: ", fieldName
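The hooks above are ordinary overloads distinguished by the type of `v`, so a generic parser can presumably call `parseHook(parser, v)` and let Nim's overload resolution pick the matching one. The self-contained sketch below shows that mechanism with hypothetical names (`decodeHook`, `decode` and a simplified `Version` type); it does not use or reproduce OpenParser's internals.

```nim
import std/strutils

type
  Version = object  # hypothetical stand-in for a semver type
    major, minor, patch: int

# One hook per target type; the compiler selects the overload from `v`'s type
proc decodeHook(raw: string, v: var int) =
  v = parseInt(raw)

proc decodeHook(raw: string, v: var Version) =
  let parts = raw.split('.')
  v = Version(major: parseInt(parts[0]),
              minor: parseInt(parts[1]),
              patch: parseInt(parts[2]))

# Generic entry point: `decodeHook` is resolved for T when `decode` is instantiated
proc decode[T](raw: string): T =
  decodeHook(raw, result)

echo decode[int]("42")          # 42
echo decode[Version]("1.2.3")   # (major: 1, minor: 2, patch: 3)
```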

CSV documents

OpenParser can parse large CSV files efficiently without loading the entire file into memory, making it ideal for processing big datasets.

For example, here we use a ~680MB CSV dataset from Kaggle, TripAdvisor European restaurants, which contains around 1 million rows and 42 columns.

import std/times # cpuTime
import openparser/csv

var i = 0
let t = cpuTime()
parseFile("tripadvisor_european_restaurants.csv",
  proc(fields: openArray[CsvFieldSlice], row: int): bool =
    inc i
    for field in fields:
      discard # do something with each field
    true
)

let elapsed = cpuTime() - t

echo "Parsed ", i, " rows in ", elapsed, " seconds"
# ~0.783363 seconds of CPU time on my machine
# memory usage should be minimal due to zero-copy parsing with memfiles
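Zero-copy parsing means fields are reported as positions into the input buffer rather than as newly allocated strings. The sketch below illustrates that idea on a single line, using a hypothetical `FieldSlice` type and `fieldSlices` iterator; it is not OpenParser's CsvFieldSlice, and it deliberately ignores quoted fields and escaped separators.

```nim
type
  FieldSlice = object  # hypothetical: a field as (start, len) into the line
    start, len: int

# Yields one slice per separator-delimited field without allocating substrings
iterator fieldSlices(line: string, sep = ','): FieldSlice =
  var start = 0
  for i in 0 .. line.len:
    # i == line.len acts as a virtual separator that closes the last field
    if i == line.len or line[i] == sep:
      yield FieldSlice(start: start, len: i - start)
      start = i + 1

# Copy a field out only when a real string is actually needed
proc toStr(line: string, f: FieldSlice): string =
  line[f.start ..< f.start + f.len]

let line = "Paris,Le Bistro,,4.5"
for f in fieldSlices(line):
  echo f.start, ":", f.len, " -> '", line.toStr(f), "'"
```

Empty fields come out as zero-length slices, so no special case is needed for consecutive separators.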

Parse YAML

Parse YAML documents into a YamlNode tree structure or directly into Nim data structures using custom hooks, similar to JSON parsing and dumping.

import openparser/yaml

type
  Address = object
    street, city: string
    zip: int
  Person = object
    name: string
    age: int
    isMember: bool
    address: Address
    friends: seq[string]

let yamlData = """
name: Alice
age: 30
isMember: true
address:
  street: 123 Main St
  city: Anytown
  zip: 12345
friends:
  - Bob
  - Charlie
"""

let yamlNode: YamlNode = fromYaml(yamlData)
let person: Person = fromYaml(yamlData, Person) # parse directly into a Nim object

TOML Documents

Another work-in-progress parser and dumper module provides support for TOML documents. It parses TOML input into a TomlNode tree structure or directly into Nim data structures using custom hooks.

SIMD-accelerated Regex engine

OpenParser includes a regex engine for regular expression matching and searching, with SIMD acceleration for improved performance.

import openparser/regex
echo regex.match("hello world", "hello") # true

BSON encoding and decoding

You can combine OpenParser's JSON parsing capabilities with BSON encoding and decoding to convert efficiently between JSON and BSON formats:

import openparser/[json, bson]

# Convert JSON to BSON
let jsonData = """{"name":"Alice","age":30,"isMember":true}"""
let bsonDoc: seq[byte] = fromJson(jsonData).toBson()

# To convert BSON back to JSON
let jsonNode: JsonNode = fromBson(bsonDoc)
echo jsonNode["name"].getStr # Alice
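To show what a BSON byte sequence looks like, here is a minimal hand-rolled encoder for documents that contain only int32 values, following the layout in the BSON specification: a little-endian int32 total length, a list of elements (type byte, NUL-terminated key, value), and a trailing 0x00. The `encodeInt32Doc` and `addInt32LE` procs are hypothetical illustrations, not OpenParser's toBson.

```nim
# Append a 32-bit integer in little-endian byte order, as BSON requires
proc addInt32LE(buf: var seq[byte], v: int32) =
  let u = cast[uint32](v)
  for shift in [0, 8, 16, 24]:
    buf.add byte((u shr shift) and 0xFF)

# Encode a document whose values are all int32 (BSON element type 0x10)
proc encodeInt32Doc(fields: openArray[(string, int32)]): seq[byte] =
  var body: seq[byte]
  for (key, value) in fields:
    body.add 0x10'u8                 # element type: int32
    for c in key: body.add byte(c)   # key as a NUL-terminated cstring
    body.add 0x00'u8
    body.addInt32LE(value)           # value: 4 bytes, little-endian
  body.add 0x00'u8                   # end-of-document marker
  # The leading length counts its own 4 bytes plus everything after it
  result.addInt32LE(int32(body.len + 4))
  result.add body

echo encodeInt32Doc([("age", 30'i32)])
# @[14, 0, 0, 0, 16, 97, 103, 101, 0, 30, 0, 0, 0, 0]
```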

Error Reporting

Most of the included parsers provide context-aware error reporting, including a snippet of the input around the error location, which makes it easier to identify and fix problems in the data.

JSON error reporting example

{"name":"Alice","age":"isMember":true}
                                ^
Error (1:33) Unexpected token `:`
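A report like this mostly comes down to converting the byte offset where parsing failed into a line and column, then printing that line with a caret under the column. A minimal sketch of that approach, using hypothetical `lineCol` and `renderError` procs (not OpenParser's reporter), reproducing the message above for the second `:` after `"age"` (0-based offset 32):

```nim
import std/strutils

# Convert a 0-based byte offset into a 1-based (line, column) pair
proc lineCol(input: string, offset: int): (int, int) =
  var line = 1
  var col = 1
  for i in 0 ..< offset:
    if input[i] == '\n':
      inc line
      col = 1
    else:
      inc col
  result = (line, col)

# Render the offending line, a caret under the column, and the message
proc renderError(input: string, offset: int, msg: string): string =
  let (line, col) = lineCol(input, offset)
  let lineText = input.splitLines()[line - 1]
  result = lineText & "\n" & repeat(' ', col - 1) & "^\n" &
    "Error (" & $line & ":" & $col & ") " & msg

let src = """{"name":"Alice","age":"isMember":true}"""
echo renderError(src, 32, "Unexpected token `:`")
```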

Roadmap

  • JSON depth/size limit to prevent DoS attacks
  • JSON schema validation support
  • JSON skippable fields
  • JSON custom field mapping

Note

Some implementations (dotenv, fbe, gettext) were written with the help of a chatbot and may be buggy or incomplete; contributions to improve them are welcome!

❀ Contributions & Support

🎩 License

MIT license. Made by Humans from OpenPeeps.
Copyright OpenPeeps & Contributors - All rights reserved.

Generated from openpeeps/pistachio