What encoding is used? – The Document Foundation Mailing List Archives

julien2412 <serval2412 -AT- yahoo.fr>

Sat, 12 Apr 2014 15:32:20 -0700 (PDT)

Hello,

Using cppcheck-htmlreport to convert raw cppcheck error reports to HTML fails for some files because of their encodings. Here's an example message:

    cppcheck/htmlreport/cppcheck-htmlreport", line 287, in <module>
        content = input_file.read()
      File "/usr/lib/python2.7/codecs.py", line 296, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 3546: invalid start byte

Here's the list of files that trigger this problem:

    ./hwpfilter/source/hcode.cxx
    ./hwpfilter/source/hwpread.cxx
    ./hwpfilter/source/hbox.h
    ./hwpfilter/source/formula.cxx
    ./hwpfilter/source/hwpfile.cxx
    ./hwpfilter/source/hwpeq.cxx
    ./chart2/source/view/charttypes/Splines.cxx (it contained 2 "ü" but was detected as iso-8859-1 rather than utf8 by "file -i"; now converted, see http://cgit.freedesktop.org/libreoffice/core/commit/?id=42d494e7925249c36f62206e7268d849437e219d)
    ./hwpfilter/source/hbox.cxx
    ./hwpfilter/source/hinfo.cxx

I tried ./hwpfilter/source/hinfo.cxx. Here is how it initially looks in vi (Debian testing x86-64, French locale):

    /**
     * ¹®¼­Á¤º¸¸¦ ÀÐ¾îµéÀÌ´Â ÇÔ¼ö ( 128 bytes )
     * ¹®¼­Á¤º¸´Â ÆÄÀÏÀÎ½ÄÁ¤º¸( 30 bytes ) ´ÙÀ½¿¡ À§Ä¡ÇÑ Á¤º¸ÀÌ´Ù.
     */
    bool HWPInfo::Read(HWPFile & hwpf)

Since the hwpfilter README mentions "Hangul Word Processor" and "Korea", I tried "iconv -f EUC-KR -t utf8 hwpfilter/source/hinfo.cxx > stdout.txt" and got this:

    /**
     * 문서정보를 읽어들이는 함수 ( 128 bytes )
     * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.
     */
    bool HWPInfo::Read(HWPFile & hwpf)

I ran it through Google Translate, which detected the language as Korean (fortunately! :-)) and translated it as "Function to read the document information", which seems right given the name of the function.

Remark: I don't know what "( 128 bytes )" or "( 30 bytes )" mean; is that a problem in the conversion?

Anyway, would this conversion be OK for these files, or might we lose some information?
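One way to approach the "might we lose some information" question is a round-trip check: decode the bytes strictly, re-encode them with the same codec, and compare against the original. The sketch below is only an illustration in Python; the function name detect_and_check and the candidate list are assumptions, not part of cppcheck or LibreOffice. Note that ISO-8859-1 is deliberately left out of the candidates, since it accepts every byte sequence and therefore proves nothing (which is probably why "file -i" falls back to it).

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: find a candidate encoding that decodes the data
# strictly and reproduces the exact same bytes when encoded again.

def detect_and_check(data, candidates=("utf-8", "euc-kr")):
    """Return (encoding, text) for the first lossless candidate, else (None, None)."""
    for name in candidates:
        try:
            text = data.decode(name)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        if text.encode(name) == data:  # round trip gives back the original bytes
            return name, text
    return None, None


# Stand-in for the raw contents of a file such as hinfo.cxx (assumed sample).
sample = u"문서정보를 읽어들이는 함수 ( 128 bytes )".encode("euc-kr")

encoding, text = detect_and_check(sample)
print(encoding)

# Once the source encoding is confirmed, converting to UTF-8 is one call.
utf8_bytes = text.encode("utf-8")
```

If every file in the list passes such a check for EUC-KR, the iconv conversion should not drop anything; a file that fails would need a closer look before converting.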
Of course, I would rather have cppcheck fail the HTML conversion of some reports than lose important information in these files. It may also be a cppcheck or Python bug that should be fixed. Any ideas?

Julien

--
View this message in context: http://nabble.documentfoundation.org/What-encoding-is-used-tp4105106.html
Sent from the Dev mailing list archive at Nabble.com.
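On the cppcheck-or-Python question: the traceback shows a strict UTF-8 decode raising on the first invalid byte, which is the documented default behaviour of Python's codecs rather than a Python bug. A report generator could instead read sources with a non-strict error handler, so that undecodable bytes become U+FFFD in the HTML output while the source files stay untouched. The sketch below is an assumption about how that might look, not the actual cppcheck-htmlreport code; read_source is a hypothetical name.

```python
import io
import os
import tempfile


def read_source(path, encoding="utf-8"):
    """Read a file as text, replacing undecodable bytes with U+FFFD."""
    # io.open accepts encoding/errors on both Python 2.7 and Python 3.
    with io.open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()


# Demonstration with a temporary file holding a Latin-1 encoded "ü" (0xfc),
# the same byte that appears in the error message.
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(b"// \xfc\n")
    tmp.close()
    content = read_source(tmp.name)
finally:
    os.remove(tmp.name)

print(repr(content))
```

This only affects how the report is rendered; it trades a crash for a few replacement characters in the displayed source, and the files on disk are not modified.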
