Hello,
I do a fair bit of work where I move data between LibreOffice, MySQL,
Vertica, Google Docs, Hadoop, Python, and a few other systems. The
formatting of TSV files is ad hoc. Each system differs slightly in how it
escapes strings and handles similar details. In addition, there is no way to
preserve metadata.
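To make the escaping problem concrete, here is a rough Python sketch of the same row written two ways. The quoting style is what Python's csv module produces with a tab delimiter; the backslash style uses a hypothetical helper, backslash_escape, that I wrote for illustration and that is not meant to match the exact output of any system named above.

```python
import csv
import io

row = ["a\tb", "x"]  # the first field itself contains a tab

# Style 1: quote the field, as Python's csv module does with a tab delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
quoted = buf.getvalue()


# Style 2: backslash-escape special characters (hypothetical helper).
def backslash_escape(field):
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


escaped = "\t".join(backslash_escape(f) for f in row) + "\n"

# A reader that knows neither convention and simply splits on tabs
# gets a different answer for each file.
naive_quoted = quoted.rstrip("\n").split("\t")
naive_escaped = escaped.rstrip("\n").split("\t")

print(repr(quoted), naive_quoted)
print(repr(escaped), naive_escaped)
```

A naive reader sees three fields in the first file and two in the second, and in neither case recovers the original value without knowing which convention the writer used.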
I have drafted a modest spec that standardizes TSV files by defining types
and adding metadata, and I would like to solicit feedback on that
proposal:
http://www.tsvx.org/
I'm trying to maintain the parts of TSV which make it great -- simplicity,
human-readability, and rapid single-pass parsing, but add enough structure
to eliminate all the scripting that goes on when moving data between
systems, as well as to eliminate some of the brittleness (TSV files break
if a column is added, and one-pass parsing breaks if an unexpected type is
found 10GB down).
Since this touches closely on LibreOffice, and since, if it becomes a
standard, it's something we'd all have to live with, I was hoping to
solicit some feedback on this from LibreOffice developers.
GitHub issues (https://github.com/pmitros/tsvx/issues) are the preferred
way of communicating, but I'll monitor this thread, and personal email is
okay as well.
Piotr
Context
- A proposal for standardizing TSV files · Piotr Mitros