Single post

Why I prefer human-readable file formats

When I say human-readable file format, I'm referring to text-based files that can be opened, read, and understood without the need for any specific software or proprietary interface. They include formats like Markdown, JSON, YAML, INI, TOML, CSV/TSV and even fixed-width text files where the content and its structure are visible, transparent, and editable in a simple text editor [...]

https://adele.pages.casa/md/blog/why-I-prefer-human-readable-file-formats.md
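As a minimal illustration of the excerpt's point (the record and field names here are invented for the example): a JSON-serialized value is just ordinary text, so nothing beyond a text editor, or a plain read of the file, is needed to see both the content and its structure.

```python
import json

# A made-up settings record, serialized to a human-readable format.
record = {"name": "adele", "theme": "dark", "tabs": 4}
text = json.dumps(record, indent=2)

# The serialized form is ordinary text: no special software is needed to read it.
print(text)

# Round-trip: any JSON-aware tool (or a human) can recover the structure.
assert json.loads(text) == record
```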

9 visible replies; 1 more reply hidden or not public

trystimuli, @tryst@imu.li

@adele the generality of the text editor (and unix text processing tools) is very nice, and when working within the current ecosystem i always choose them.

and yet i also find text formats harmful. the vast majority of programming languages, config files, and exchange formats reinforce english-supremacy. different people have different layout preferences - spacing habits, column widths, indentation styles, and in programming naming conventions - why must they compromise on those to collaborate? and those compromises inevitably favor the parties with more power.

so i am intrigued by the possibility of general self-describing binary formats that do not embed privileged perspectives. i suspect a similarly general ecosystem of tools could be built around something with more structure than text (which is, after all, also a binary format), and perhaps be even better.
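one sketch of what "self-describing binary" could mean (the tags and layout below are invented for illustration, not any existing format): each value carries a type tag and a length, so a generic tool can walk the bytes without a schema or a privileged text encoding.

```python
import struct

# Invented tag-length-value encoding: 1-byte type tag, 4-byte big-endian
# length, then the payload. Tags: 1 = UTF-8 string, 2 = signed 64-bit int.
def encode(value):
    if isinstance(value, str):
        payload = value.encode("utf-8")
        return struct.pack(">BI", 1, len(payload)) + payload
    if isinstance(value, int):
        payload = struct.pack(">q", value)
        return struct.pack(">BI", 2, len(payload)) + payload
    raise TypeError(value)

def decode(data):
    # A generic reader needs no schema: the tag and length describe the value.
    tag, length = struct.unpack_from(">BI", data, 0)
    payload = data[5:5 + length]
    if tag == 1:
        return payload.decode("utf-8")
    if tag == 2:
        return struct.unpack(">q", payload)[0]
    raise ValueError(tag)

assert decode(encode("café")) == "café"
assert decode(encode(-42)) == -42
```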

LisPi, @lispi314@udongein.xyz
@tryst @adele >> you can always inspect your configuration files, data exports, or documentation with nothing more than cat, less, or any basic text editor.

Those programs do a lot of underappreciated work (particularly with obscure codings).

In truth, all those file formats are still unreadable without access to the coding specification and a program compatible with it.

I expect SQLite to remain as inspectable as those for the rest of my lifespan at minimum. I actually have the full specification for it on my computer at this moment, unlike that of Unicode & UTF-8. Never mind whatever text coding was popular in Japan in 1995.
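The inspectability claim is easy to check concretely: Python's standard library ships a SQLite driver, and the file format stores its own schema as the original SQL text, so a few lines recover both structure and data with no vendor tooling (the table here is made up for the example).

```python
import sqlite3

# Build a throwaway in-memory database with one made-up table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (author TEXT, body TEXT)")
con.execute("INSERT INTO posts VALUES ('adele', 'human-readable formats')")

# The format is self-describing: sqlite_master stores the original SQL.
(schema,) = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'posts'"
).fetchone()
print(schema)  # the original CREATE statement, verbatim

rows = con.execute("SELECT author, body FROM posts").fetchall()
assert rows == [("adele", "human-readable formats")]
```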

>> while many proprietary database formats from the same era require archaeological efforts to decode.

The proprietariness is the actual problem. Unlike what salesmen seem to believe, when I hear "proprietary technology" I don't hear "desirable value added"; I hear "maintenance nightmare".

>> Human-readable doesn't mean inefficient. Text-based formats are often surprisingly compact, especially when compressed.

Given that it ultimately comes down to representation, not necessarily, no (Emacs can edit a gzipped text file transparently). But for the same reason, data doesn't need to be coded as plaintext to have a human-friendly yet efficient representation.

The UNIX tools have massive issues around efficient structured use of data though, and when one attempts to use them to do so anyway one has to introduce multiple redundant steps of serialization & deserialization. That problem has been noted for decades.
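The serialize/re-parse tax is easy to demonstrate: a text pipeline must flatten structure at every stage and guess it back at the next, whereas a structured pipeline just passes objects along (the records and fields below are invented for the example).

```python
import csv
import io

rows = [{"user": "tryst", "posts": 3}, {"user": "adele", "posts": 7}]

# Text pipeline: this stage serializes the structure down to flat text...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user", "posts"])
writer.writeheader()
writer.writerows(rows)

# ...and the next stage must re-parse it, losing the types along the way.
reparsed = list(csv.DictReader(io.StringIO(buf.getvalue())))
assert reparsed[0]["posts"] == "3"   # the int came back as a string

# Structured pipeline: the objects pass through unchanged, types intact.
assert rows[0]["posts"] == 3
```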

> exchange formats reinforce english-supremacy

That is certainly something I noticed.

> so i am intrigued by the possibility of general self-describing binary formats that do not embed privileged perspectives. i suspect a similarly general ecosystem of tools could be built around something with more structure than text (which is, after all, also a binary format), and perhaps be even better.

I share this interest and belief.

I also believe that the supremacy of textfiles and tools for working with them is a result of UNIX's supremacy, since the "everything is a file" paradigm is something it pushed.

There were other object-based systems that could have become just as ubiquitous had circumstances been just slightly different, and which wouldn't have had the structured-data problems UNIX's bytestream-oriented file tools typically have.
Oook, @oook@im-in.space

@tryst If it is your own file/data, you write it in the language of your choice.

I don't think there is any way to make a data file language-agnostic anyway. It is funny because I just helped my partner translate a text into two other languages for her work, and you quickly realize there are concepts in one language that can't be translated literally into another.

There is a reason we are talking about this in English while neither @adele nor I are native English speakers. I kind of hate culture monopolies, but it is still convenient to have a few languages we can default to in order to reach a broader public.

Doğa Armangil, @arma@ieji.de

@adele Your profile says you are into #lowtech, so you may want to stick with JSON etc. But otherwise I would suggest taking a look at RDF, which is currently the most comprehensive data representation solution that exists in IT.

RDF covers all bases, from file and messaging formats to databases, which are called "RDF stores" in RDF parlance.

graphdb.ontotext.com/documenta

For anyone who might be wondering what this RDF thing is all of a sudden:

RDF is the most tangible outcome of the Semantic Web.
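For readers meeting RDF here for the first time: it models all data as subject–predicate–object triples, and a dataset is just a set of them. A library-free sketch using plain tuples (the `ex:` IRIs are made up for illustration):

```python
# RDF models data as (subject, predicate, object) triples; a dataset is
# simply a set of them. The ex: IRIs below are invented for illustration.
triples = {
    ("ex:adele", "ex:wrote", "ex:post1"),
    ("ex:post1", "ex:title", "Why I prefer human-readable file formats"),
    ("ex:post1", "ex:format", "ex:Markdown"),
}

# Queries are pattern matches over the set: None acts as a wildcard.
def match(s=None, p=None, o=None):
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

assert match(p="ex:wrote") == [("ex:adele", "ex:wrote", "ex:post1")]
```

Real RDF stores add serialization formats (Turtle, N-Triples), IRIs as global identifiers, and SPARQL in place of this toy `match`, but the triple model is the whole core.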

Doğa Armangil, @arma@ieji.de

@adele oh and I forgot: in-memory RDF datasets.

🔗 rdf.js.org/dataset-spec/

Are there any more bases to cover? I don't think so :hmmyes:

Also this:

#RDF stores don't support transactions, so what to do?

1️⃣ Most applications don't need transactions.
2️⃣ If you absolutely need transactions, my suggestion would be to outsource that aspect to a #blockchain; that's their job. Yet another thing to learn, I know …

Tim Ward ⭐🇪🇺🔶 #FBPE, @TimWardCam@c.im

@adele "Editable in a simple text editor" is a non-trivial additional condition.

Remember those MSVC control files which were in some sort of XML ... but included a checksum such that the file wouldn't load if you edited it unless you managed to discover the checksum algorithm and calculate a new one?