

Hello,

I do a fair bit of work where I move data between LibreOffice, MySQL,
Vertica, Google Docs, Hadoop, Python, and a few other systems. TSV
formatting is ad hoc: each system differs slightly in how it escapes
strings and handles similar details. In addition, there is no way to
preserve metadata.

I have drafted a modest spec that standardizes TSV files by defining
types and adding metadata, and I am hoping to solicit feedback on the
proposal:

http://www.tsvx.org/

I'm trying to maintain the parts of TSV which make it great -- simplicity,
human-readability, and rapid single-pass parsing, but add enough structure
to eliminate all the scripting that goes on when moving data between
systems, as well as to eliminate some of the brittleness (TSV files break
if a column is added, and one-pass parsing breaks if an unexpected type is
found 10GB down).
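
To illustrate the added-column brittleness, here is a minimal Python
sketch. It is not part of the proposal: the sample data and the function
names are made up for illustration, and it only uses the standard csv
module. Reading a column by position silently returns the wrong field
once a column is inserted, while looking it up by header name does not.

```python
import csv
import io

# Hypothetical sample data: NEW has an extra "id" column added in front.
OLD = "name\tage\nAda\t36\n"
NEW = "id\tname\tage\n1\tAda\t36\n"


def ages_by_position(text):
    """Read the age from a fixed column index (brittle)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return [row[1] for row in rows[1:]]  # skip the header row


def ages_by_header(text):
    """Look the age column up by its header name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row["age"] for row in reader]


print(ages_by_position(OLD))  # correct for the old layout
print(ages_by_position(NEW))  # silently returns names instead of ages
print(ages_by_header(NEW))    # still finds the age column
```

Note that even the header-based version returns ages as strings; nothing
in a plain TSV file says what type a column is supposed to hold.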

Since this touches closely on LibreOffice and, if it becomes a standard,
is something we'd all have to live with, I was hoping to solicit some
feedback on it from LibreOffice developers.

GitHub issues (https://github.com/pmitros/tsvx/issues) are the preferred
way of communicating, but I'll monitor this thread, and personal email is
okay as well.

Piotr
