GitHub - openpeeps/openparser: OpenPeeps collection of tiny parsers and dumpers: JSON, CSV, TOML, YAML, RSS, BSON, FBE, DotEnv, Gettext (po/mo) and more!

A tiny collection of high-performance parsers and dumpers
👑 Written in Nim language

nimble install openparser

About

OpenParser is a collection of parsers and dumpers (serializers) for various data formats, written in Nim. It provides a simple and efficient way to parse and dump data in formats such as JSON, TOML, YAML, BSON, CSV, FBE and more.

😍 Key Features

Parse JSON, CSV, YAML, TOML documents and more
BSON encoding and decoding from JsonNode objects
DotEnv parser for .env files
Fast Binary Encoding and Decoding
i18n GNU Gettext PO and MO file parsing and dumping
Zero-copy JSON parsing via Memfiles for high performance and low memory usage
Direct-to-object parsing for JSON, YAML and TOML
CSV zero-copy parsing for large files
RSS & Atom feed reader and writer
Regex Engine with SIMD acceleration
Context-aware error reporting while deserializing data
Custom Hooks API for parsing and dumping
Scientific notation support
Dot notation access for nested data structures

Why?

Initially I wanted to create a simple JSON parser with fine-grained control over the parsing process (jsonl, custom hooks, error reporting, zero-copy tokenization), then I thought it would be fun to add a YAML parser that parses YAML documents in the same way as JSON. Once I started talking with the chatbot I ended up creating a collection of parsers and dumpers for various data formats.

Note

Importing openparser directly produces a compile-time error. Import the specific module for the data format you want to use instead, e.g. openparser/json for JSON parsing and dumping or openparser/csv for CSV parsing.

Parse JSON

OpenParser provides a simple and efficient module for parsing JSON data using a zero-copy approach, which lets you parse JSON data without copying it into intermediate buffers, making it faster and more memory-efficient.

Note

OpenParser's JSON module re-exports the std/json module by default.

`fromJson` string into JsonNode or Nim data structures

Here is a simple example that takes a stringified JSON document and parses it into a JsonNode tree structure:

import openparser/json

let data = """{"name":"Albush","age":40,"address":{"street":"456 Elm St","city":"Othertown","zip":67890},"friends":[]}"""

let jsonNode: JsonNode = fromJson(data)
echo jsonNode["name"].getStr # Albush
echo jsonNode["age"].getInt # 40
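Since the module re-exports std/json, the usual std/json accessors should work on the returned JsonNode. The sketch below illustrates nested and nil-safe access; it uses std/json's own `parseJson` so that it runs without OpenParser installed, and it assumes that swapping in `fromJson` from openparser/json yields an equivalent tree.

```nim
import std/json

let data = """{"name":"Albush","address":{"city":"Othertown","zip":67890},"friends":[]}"""
# std/json parser, used here only so the sketch is self-contained
let node = parseJson(data)

echo node["address"]["city"].getStr            # Othertown
echo node{"address", "zip"}.getInt             # 67890
# `{}` returns nil for a missing path, so the default is used instead of raising
echo node{"address", "country"}.getStr("n/a")  # n/a
echo node["friends"].len                       # 0
```

The `{}` accessor is handy for optional keys because a missing path does not raise a KeyError.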

`toJson` serialize Nim data structures into JSON strings

The toJson function serializes Nim data structures into JSON strings:

import openparser/json

var data = %*{
  "name": "Alice",
  "age": 30,
  "isMember": true,
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "zip": 12345
  },
  "friends": ["Bob"]
}

echo toJson(data) # {"name":"Alice"...}

`toJson` pretty-printing

A todo for the future is to add support for pretty printing JSON while serializing, which would allow you to generate more human-readable JSON output with indentation and line breaks.
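Until that lands, one possible workaround is the `pretty` proc from std/json, which the JSON module re-exports. The sketch below uses std/json directly so it runs on its own; it is not OpenParser's serializer, and it only works for values that are already JsonNode trees.

```nim
import std/json

let data = %*{"name": "Alice", "friends": ["Bob"]}

let compact = $data             # single-line output from std/json
let readable = data.pretty(2)   # indented output, 2 spaces per level

echo compact
echo readable
```

Parsing the pretty output again gives back an equal JsonNode, so the two forms differ only in whitespace.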

JSON custom hooks

Here is an example of how to use a custom parseHook to parse JSON data into Nim types that are not natively supported by the default parser:

import std/times

import openparser/json
import semver

proc parseHook*(parser: var JsonParser, v: var Semver) =
  v = parseVersion(parser.curr.value)
  parser.walk() # move the parser forward after parsing the value

proc parseHook*(parser: var JsonParser, v: var Time) =
  v = parseTime(parser.curr.value, "yyyy-MM-dd'T'hh:mm:ss'.'ffffffz", local())
  parser.walk() # move the parser forward after parsing the value

To determine the name of the field being parsed in the parseHook, use the currentField property of the JsonParser object. It is an Option[string] that holds the name of the current field, if available:

if parser.currentField.isSome:
  let fieldName = parser.currentField.get()
  echo "Parsing field: ", fieldName
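For readers new to std/options, here is a standalone sketch of the same Option[string] handling. `describeField` is a hypothetical helper, not part of OpenParser; it stands in for whatever logic a hook would run on `parser.currentField`.

```nim
import std/options

proc describeField(currentField: Option[string]): string =
  ## Hypothetical helper: turns the optional field name into a label.
  if currentField.isSome:
    "field `" & currentField.get() & "`"
  else:
    "top-level value"

echo describeField(some("createdAt"))  # field `createdAt`
echo describeField(none(string))       # top-level value
# `get` with a fallback avoids the explicit isSome check
echo none(string).get("<root>")        # <root>
```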

Check the unit tests for JSON parsing and dumping with custom hooks.
JSON API Reference

CSV documents

OpenParser can parse large CSV files efficiently without loading the entire file into memory, making it ideal for processing big datasets.

For example, here we use a ~680MB CSV dataset from Kaggle (TripAdvisor European restaurants) that contains around 1 million rows and 42 columns.

import std/times # provides cpuTime
import openparser/csv

var i = 0
let t = cpuTime()
parseFile("tripadvisor_european_restaurants.csv",
  proc(fields: openArray[CsvFieldSlice], row: int): bool =
    inc i
    for field in fields:
      discard # do something with the fields, e.g. print them
    true
)

let elapsed = cpuTime() - t

echo "Parsed ", i, " rows in ", elapsed, " seconds"
# ~0.783363 seconds on my machine
# memory usage should be minimal due to zero-copy parsing with memfiles
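To give a feel for why memory-mapped, zero-copy reading keeps memory usage low, here is a small sketch built only on Nim's std/memfiles. It is not OpenParser's implementation; `countRows` and the temporary file name are made up for the illustration.

```nim
import std/[memfiles, os]

proc countRows(path: string): int =
  ## Hypothetical helper: counts non-empty lines through a memory map.
  var mf = memfiles.open(path)   # maps the file read-only
  defer: mf.close()
  for slice in mf.memSlices():   # each slice points into the mapping, no string is allocated
    if slice.size > 0:
      inc result

let path = getTempDir() / "openparser_memfile_sketch.csv"
writeFile(path, "id,name\n1,Alice\n2,Bob\n")
echo countRows(path)  # 3
removeFile(path)
```

Each `MemSlice` is just a pointer and a length into the mapped file, so the operating system pages data in on demand instead of the program reading the whole file into a string.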

Check the unit tests for CSV parsing.
CSV API Reference

Parse YAML

Parse YAML documents into a YamlNode tree structure or directly into Nim data structures using custom hooks, similar to JSON parsing and dumping.

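import openparser/yaml

# Hypothetical target types for the direct-to-object call below;
# the field names mirror the keys in the YAML document.
type
  Address = object
    street, city: string
    zip: int
  Person = object
    name: string
    age: int
    isMember: bool
    address: Address
    friends: seq[string]
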
let yamlData = """
name: Alice
age: 30
isMember: true
address:
  street: 123 Main St
  city: Anytown
  zip: 12345
friends:
  - Bob
  - Charlie
"""

let yamlNode: YamlNode = fromYaml(yamlData)
let yamlNode2: Person = fromYaml(yamlData, Person) # using custom hooks to parse directly into Nim data structures

Check the unit tests
YAML API Reference

TOML Documents

Another work-in-progress parser and dumper module, providing support for TOML documents. It parses the TOML input into a TomlNode tree structure or directly into Nim data structures using custom hooks.

SIMD-accelerated Regex engine

OpenParser includes a regex engine that supports regular expression matching and searching, with SIMD acceleration for improved performance.

import openparser/regex
echo regex.match("hello world", "hello") # true

BSON encoding and decoding

You can combine OpenParser's JSON parsing capabilities with BSON encoding and decoding to efficiently convert between JSON and BSON formats:

import openparser/[json, bson]

# Convert JSON to BSON
let jsonData = """{"name":"Alice","age":30,"isMember":true}"""
let bsonDoc: seq[byte] = fromJson(jsonData).toBson()

# To convert BSON back to JSON
let jsonNode: JsonNode = fromBson(bsonDoc)
echo jsonNode["name"].getStr # Alice
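To make the binary layout less abstract, here is a minimal sketch of how a flat document is laid out according to the BSON specification: a little-endian int32 total size, then typed elements, then a terminating zero byte. This is not OpenParser's encoder; `encodeFlat` is a hypothetical helper that only handles string, int32 and bool values.

```nim
import std/json

proc addInt32(buf: var seq[byte], v: int32) =
  # BSON stores integers in little-endian byte order
  let u = cast[uint32](v)
  for shift in [0, 8, 16, 24]:
    buf.add byte((u shr shift) and 0xFF'u32)

proc addCString(buf: var seq[byte], s: string) =
  for c in s: buf.add byte(c)
  buf.add 0'u8  # null terminator

proc encodeFlat(node: JsonNode): seq[byte] =
  ## Hypothetical helper: encodes a flat JObject (string, int32, bool values only).
  doAssert node.kind == JObject
  var body: seq[byte]
  for key, val in node.pairs:
    case val.kind
    of JString:
      body.add 0x02'u8                           # element type: UTF-8 string
      body.addCString key
      body.addInt32 int32(val.getStr.len + 1)    # length includes the terminator
      body.addCString val.getStr
    of JInt:
      body.add 0x10'u8                           # element type: int32
      body.addCString key
      body.addInt32 int32(val.getInt)
    of JBool:
      body.add 0x08'u8                           # element type: boolean
      body.addCString key
      body.add(if val.getBool: 1'u8 else: 0'u8)
    else:
      raise newException(ValueError, "unsupported kind in sketch: " & $val.kind)
  result.addInt32 int32(4 + body.len + 1)        # size field + elements + final 0x00
  result.add body
  result.add 0'u8

let doc = encodeFlat(%*{"hello": "world"})
echo doc.len  # 22
```

The 22-byte result for `{"hello": "world"}` matches the worked example in the BSON specification.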

Check the unit tests
BSON API Reference

Error Reporting

Most of the included parsers provide context-aware error reporting, including a snippet of the data around the error location, making it easier to identify and fix issues in the input.

JSON error reporting example

{"name":"Alice","age":"isMember":true}
                                ^
Error (1:33) Unexpected token `:`
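A report in this shape can be produced from nothing more than the source text and a 1-based line and column. The sketch below is an illustration of that idea, not the library's reporting code; `renderError` is a hypothetical helper.

```nim
import std/strutils

proc renderError(src: string, line, col: int, msg: string): string =
  ## Hypothetical helper; `line` and `col` are 1-based.
  let lines = src.splitLines()
  doAssert line >= 1 and line <= lines.len
  result = lines[line - 1] & "\n"           # offending source line
  result.add repeat(' ', col - 1) & "^\n"   # caret under the offending column
  result.add "Error (" & $line & ":" & $col & ") " & msg

let bad = """{"name":"Alice","age":"isMember":true}"""
echo renderError(bad, 1, 33, "Unexpected token `:`")
```

Column 33 is the second colon after `"isMember"`, which is where a comma or closing brace was expected.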

Roadmap

JSON depth/size limit to prevent DoS attacks
JSON schema validation support
JSON skippable fields
JSON custom field mapping
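Until a built-in depth limit exists, a caller can guard untrusted input with a cheap pre-scan. The sketch below is one possible caller-side check, not a planned OpenParser API; `maxNesting` and the limit of 64 are assumptions made for the example.

```nim
proc maxNesting(raw: string): int =
  ## Hypothetical helper: deepest `{`/`[` nesting, ignoring brackets inside strings.
  var depth = 0
  var inString = false
  var escaped = false
  for c in raw:
    if inString:
      if escaped: escaped = false         # character after a backslash is literal
      elif c == '\\': escaped = true
      elif c == '"': inString = false
    else:
      case c
      of '"': inString = true
      of '{', '[':
        inc depth
        result = max(result, depth)
      of '}', ']':
        dec depth
      else: discard

const depthLimit = 64  # arbitrary limit chosen for the example
let payload = """{"a":[1,{"b":"}]"}]}"""
if maxNesting(payload) > depthLimit:
  echo "rejected: nesting too deep"
else:
  echo "ok, depth ", maxNesting(payload)  # ok, depth 3
```

The scan is a single pass over the bytes and allocates nothing, so it is cheap to run before handing the input to a parser.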

Note

Some implementations were made with the chatbot (dotenv, fbe, gettext) and may be buggy or incomplete. Contributions to improve them are welcome!

❤ Contributions & Support

🐛 Found a bug? Create a new Issue
👋 Wanna help? Fork it!
😎 Get €20 in cloud credits from Hetzner
