delta/gettext-tiny.git - github.com: sabotage-linux/gettext-tiny.git

	Commit message	Author	Age	Files	Lines
*	poparser: missing free of internal buffers [newpoparser]	xhe	2020-01-23	1	-0/+2
*	poparser: remove unnecessary pointer casts	xhe	2020-01-23	1	-4/+4
	Suggested by https://github.com/sabotage-linux/gettext-tiny/commit/fa416e2e897614314abcb711069ad19af2a44354#r36914796.
*	msgmerge: avoid printing too long line	xhe	2020-01-23	1	-6/+21
	This is one cause of https://github.com/sabotage-linux/gettext-tiny/issues/50. A single line is generally short, and [8192] is big enough for our usage. But after cmake invokes msgmerge, lines are joined, so we printf some very long lines into the po files. cmake then invokes msgfmt on these updated po files, and msgfmt runs into the overlong lines.
*	poparser: separate malloc for buffers	xhe	2020-01-22	1	-6/+13
	poparser allocated one single big buffer for all string buffers, which makes debugging difficult. When a buffer overflowed, I could not tell directly which buffer was damaged, since they are actually different parts of one single buffer. So let's allocate the space separately.
*	poparser: support feeds anything[2/2]	xhe	2020-01-22	4	-245/+291
	This is the second patch. It mainly changes how feed_line() works. Previously, feed_line() assumed the input was one complete line, and it used the buffer passed to init() for iconv. Now feed_line() assumes nothing about the input: it accumulates strings until a complete line has been fed. A positive return value means the poparser is expecting more input; it is not an error. But once all the data has been fed, poparser should have consumed every token exactly, so the last call to feed_line() should return 0. That introduces a second issue: incomplete strings are not consumed, so they need to be stored. Thus we allocate an internal buffer. When no iconv is needed, we just copy the new strings after the old ones; otherwise we let iconv do the copy. This also eliminates the need to pass an iconv buffer to poparser. msgfmt and msgmerge now use fread, and poparser_feed_line() is renamed to poparser_feed().
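The accumulate-until-complete behaviour described in this message can be sketched as follows. This is only an illustration under stated assumptions: `struct acc`, `acc_feed()`, `count_line()` and the fixed `ACC_CAP` capacity are hypothetical names made up here, not the poparser API, and charset conversion is left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define ACC_CAP 8192   /* assumed capacity, chosen only for this sketch */

struct acc {
    char buf[ACC_CAP];
    size_t len;        /* bytes stored but not yet consumed */
    void (*on_line)(const char *line, size_t len, void *user);
    void *user;
};

/* Example callback: counts the complete lines it receives. */
static void count_line(const char *line, size_t len, void *user)
{
    (void)line;
    (void)len;
    ++*(int *)user;
}

/* Feed an arbitrary chunk. Every complete line is handed to on_line()
 * without its '\n'. Returns the number of bytes still pending (> 0 means
 * "expecting more input"), 0 when everything was consumed, -1 on overflow. */
static long acc_feed(struct acc *a, const char *data, size_t n)
{
    if (n > ACC_CAP - a->len)
        return -1;
    memcpy(a->buf + a->len, data, n);
    a->len += n;

    size_t start = 0;
    char *nl;
    while ((nl = memchr(a->buf + start, '\n', a->len - start)) != NULL) {
        size_t linelen = (size_t)(nl - (a->buf + start));
        a->on_line(a->buf + start, linelen, a->user);
        start += linelen + 1;              /* skip the line and its '\n' */
    }
    /* keep the incomplete tail at the front for the next call */
    memmove(a->buf, a->buf + start, a->len - start);
    a->len -= start;
    return (long)a->len;
}
```

A caller would pass each fread() chunk to acc_feed() and treat a non-zero result after the final chunk as an unterminated last line.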
*	poparser: support feeds anything[1/2]	xhe	2020-01-22	1	-196/+206
	This is the first patch. It mainly handles cases like `msgid "ss" "gg"`. This is valid input (confirmed with GNU msgfmt). Just like a C compiler, we should concatenate the strings and treat them as `"ssgg"`. The previous feed_line assumed there was only one `"xxx"` string per line. For example, when it parsed `msgid "ss"`, the whole line was marked as parsed, so `"gg"` was ignored. Now it loops and consumes exactly the part it parsed, so that all strings are consumed.
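A minimal sketch of the looping idea, assuming a hypothetical helper named `concat_quoted()` (it is not part of poparser): every `"..."` literal on the line is appended to one output string, and the scan resumes right after each closing quote instead of marking the whole line as parsed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the poparser API: append the contents of every
 * "..." literal found in `line` to `out` and return how many literals were
 * consumed, or -1 if a quote is unterminated or `out` is too small.
 * Escape sequences are copied verbatim; unescaping is a separate step. */
static int concat_quoted(const char *line, char *out, size_t outsz)
{
    size_t used = 0;
    int count = 0;
    const char *p = line;

    if (outsz == 0)
        return -1;

    while ((p = strchr(p, '"')) != NULL) {
        const char *start = ++p;          /* first byte after opening quote */
        while (*p && *p != '"') {
            if (*p == '\\' && p[1])       /* skip an escaped byte, e.g. \" */
                p++;
            p++;
        }
        if (*p != '"')
            return -1;                    /* unterminated literal */
        size_t n = (size_t)(p - start);
        if (used + n + 1 > outsz)
            return -1;                    /* no room for data plus NUL */
        memcpy(out + used, start, n);
        used += n;
        count++;
        p++;                              /* continue after closing quote */
    }
    out[used] = '\0';
    return count;
}
```

With the line from the message, `msgid "ss" "gg"`, the helper consumes two literals and produces `ssgg`.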
*	Makefile: fix install failure if path contains "m4/" string (#49)	Giulio Benetti	2020-01-19	1	-1/+1
	The Makefile install recipe actually substitutes every occurrence of "m4/" in the file name of the rule's target ($@). An absolute path can contain more than one "m4/" occurrence, in which case install fails. Let's replace $(subst ...) with $(patsubst ...), substituting only the last occurrence of the "m4/" pattern. Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
*	msgfmt: support keyword, template, output options	xhe	2019-11-30	1	-0/+6
	Follows https://github.com/sabotage-linux/gettext-tiny/issues/47. 'keyword=' is simply ignored. msgfmt generates the output based on the template file and other sources, so 'template=' is the input file.
*	data/Makefile.in: add missing maintainer targets	xhe	2019-06-29	1	-11/+42
	Follows https://github.com/sabotage-linux/gettext-tiny/issues/44. Similar to #43. I do think update-mo is good, though most targets @ismaell requested are just stubs, since gettext-tiny is not for maintainers.
*	autopoint: support dist target to release tarballs	xhe	2019-06-28	1	-2/+37
	Follows https://github.com/sabotage-linux/gettext-tiny/issues/43. Closes https://github.com/sabotage-linux/gettext-tiny/issues/43. Though it's not our goal to support releasing, it's not much code.
*	gettext-tiny: Fix format not a string literal error (#41)	Vadim Kochan	2019-04-06	1	-6/+19
	Add the 'format_arg' attribute to the functions which may return a string that is used as a format parameter; otherwise compilation fails when the corresponding compiler check flag is enabled. Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
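For illustration, this is roughly how `format_arg` is used with GCC or Clang. The names `my_gettext()`, `format_count()` and the `FORMAT_ARG` macro are hypothetical and only stand in for the functions the commit touches; the warning being silenced is presumably one of the format-string checks such as -Wformat-security or -Wformat-nonliteral.

```c
#include <stdio.h>

/* format_arg(n) is a GCC/Clang extension: it tells the compiler that the
 * function's result is a format string derived from argument n, so
 * printf-style checking is applied to that argument instead of the call
 * being reported as a non-literal format. */
#if defined(__GNUC__)
#define FORMAT_ARG(n) __attribute__((format_arg(n)))
#else
#define FORMAT_ARG(n)
#endif

/* Hypothetical stand-in for a gettext-like lookup. */
static const char *my_gettext(const char *msgid) FORMAT_ARG(1);

static const char *my_gettext(const char *msgid)
{
    return msgid;   /* a stub: no catalog lookup, just echo the msgid */
}

/* Formats a count through the "translated" format string. */
static int format_count(char *out, size_t outsz, int n)
{
    return snprintf(out, outsz, my_gettext("%d files"), n);
}
```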
*	xgettext: parse arguments correctly	xhe	2019-04-05	1	-2/+1
	`*) break` stops parsing immediately after the first argument is parsed, leaving all other arguments ignored. To avoid shifting more arguments than we have, we should check whether any argument is left; otherwise the script exits with `can not shift anymore`.
*	msgfmt: missing break [stubmode]	xhe	2019-02-05	1	-1/+2
*	msgfmt: clean code, prepare to handle other modes	xhe	2019-02-05	1	-42/+94
	msgfmt only focused on the po->mo mode (plus a stub for .msg). Now that we want to generate .desktop and .xml too, I'd like to handle these modes explicitly.
*	poparser: not check overflow for msgid_plural [new_merge]	xhe	2019-01-16	1	-4/+0
	It's allowed to have nplurals=1 together with msgid_plural and msgstr[0].
*	poparser: msgid_plural is with nplurals > 2	xhe	2019-01-16	1	-1/+1
	I made exactly the wrong check there.
*	msgfmt: enable strict mode with -c argument	xhe	2019-01-16	2	-5/+13
	With the -c argument, msgfmt should check po files more strictly.
*	poparser: default to non-strict mode	xhe	2019-01-16	2	-2/+3
	There are many po files in invalid form which, in most cases, will not actually trigger a segfault or the like. GNU msgfmt chose to ignore these issues unless given an explicit option, and so do I.
*	poparser: charset str should stop at escape chars	xhe	2019-01-16	1	-1/+1
	As we pass raw strings directly at the first stage, we may meet an escaped '\n', e.g. "\\n". Because there's no '\\' in charset strings, we can use '\\' as a terminator to avoid taking useless parts. Useless parts would cause an unsupported_charset error.
*	poparser: convert codecs at both two stages	xhe	2019-01-16	1	-13/+12
	Following https://github.com/sabotage-linux/gettext-tiny/commit/5539eff5d507c619735156fa9e44e0bcf1436695#diff-9657ea17ae7a6e3b985dc974d613664e. The previous commit did not fix the issue, because stage one does not copy anything into message_t. So this time, raw strings are passed directly to feed_hdr().
*	poparser: add po_error_last to po_error type	xhe	2019-01-16	2	-1/+2
	So we can get the problem directly from the return value of msgfmt.c:process().
*	poparser: comment may start a new block too	xhe	2019-01-16	1	-0/+5
	I assumed a block is completely parsed just before the next msgid/ctxt. In reality, a comment may start a new block too, and if poparser doesn't treat comments as the start of a new block, those comments will be parsed as flags of the old block.
*	poparser: clean msg comment flags	xhe	2019-01-16	1	-0/+1
	Otherwise the next msg will inherit flags like PO_FUZZY.
*	poparser: should check length instead of pointer	xhe	2019-01-16	1	-1/+1
	At stage two, it's a buffer allocated by poparser_finish. It will never be a NULL pointer, so the pointer check may trigger an abort.
*	poparser: disable 'first' flag after a msgblock	xhe	2019-01-16	1	-2/+2
	There are two points: 1. The 'first' flag was only disabled at the parse stage, but I want it to behave the same at both stages whenever this flag is used. 2. The 'first' flag was disabled as soon as poparser_feed_hdr() finished. I originally assumed it stayed consistent while a whole block was parsed. This was causing the 'msgstr abort', because there's a syntax check relying on this flag. So now it stays consistent throughout the whole process.
*	poparser: enable 'first' flag at both two stages	xhe	2019-01-16	1	-0/+1
	It should be set to true at the beginning of each stage, or we may abort.
*	msgfmt: do not return the result of remove()	xhe	2019-01-16	1	-3/+1
	What we want is to delete wrong output files. But if remove() succeeded, msgfmt returned 0 instead of ret. This badly broke various cases.
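The pitfall can be sketched like this; `finish_output()` is a hypothetical name and not the actual code in msgfmt.c. The point is only that the status of the cleanup must not replace the status of the conversion.

```c
#include <stdio.h>

/* Hypothetical cleanup step: `ret` is the status of the conversion that
 * just ran, `path` is the output file it was writing. */
static int finish_output(int ret, const char *path)
{
    if (ret != 0) {
        /* Delete the broken output, but ignore remove()'s own result:
         * a successful remove() returns 0, which must not mask `ret`. */
        (void)remove(path);
    }
    return ret;
}
```

Writing `return remove(path);` in the error branch would report success exactly when the broken file was deleted successfully, which is the behaviour the commit removes.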
*	poparser: avoid 32-bit truncation by unsigned	xhe	2019-01-16	1	-1/+2
	As rofl0r said, unsigned on x86/x64 is 32 bits and will truncate the len. The result of sizeof() should be a very small number (12), though, so the truncation does not affect the result of the check.
*	poparser: convert codecs at both two stages	xhe	2019-01-16	1	-4/+4
	As iconv may increase the length of strings, we may have a larger length at the parse stage than at the first stage. That leads to memory corruption.
*	poparser: set correct length after parsing codecs	xhe	2019-01-16	1	-2/+5
	The return value of iconv is not a length; we need to calculate the length from the pointers.
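A small sketch of computing the converted length from the pointers that iconv() advances. `to_utf8()` is a hypothetical wrapper written for this note; only iconv_open(), iconv() and iconv_close() are real POSIX calls.

```c
#include <iconv.h>
#include <stddef.h>

/* Converts `in` (inlen bytes) from charset `from` to UTF-8 into `out`.
 * Returns the number of bytes written, or (size_t)-1 on failure. */
static size_t to_utf8(const char *from, const char *in, size_t inlen,
                      char *out, size_t outsz)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;       /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = inlen, outleft = outsz;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;        /* e.g. E2BIG: output buffer too small */

    /* rc counts non-reversible conversions, not bytes; the length is the
     * distance the output pointer advanced (equivalently outsz - outleft). */
    return (size_t)(outp - out);
}
```

On systems where iconv lives in a separate library, linking may need `-liconv`.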
*	poparser: return if the buf is too small	xhe	2019-01-16	1	-0/+6
	If line_len (i.e. the input bytes left) is not 0, the output buf was not large enough.
*	poparser: avoid comparisons of different signs	xhe	2019-01-16	2	-3/+3
*	poparser: avoid memory corruption by terminator	xhe	2019-01-16	1	-1/+1
	We did not count the NUL terminator in the length of strings.
*	poparser: avoid invalid memory access	xhe	2019-01-16	1	-1/+5
	1. y-x is larger than the charset string by 8, so we should write to [y-x-8] instead; otherwise it may lead to memory corruption. 2. Though I've checked before that the maximum length of a charset string should be 11, let's guard against meeting an unknown charset, or an invalid one.
*	msgfmt&poparser: correct length at str table	xhe	2019-01-16	3	-10/+11
	The length in the str table does not include the NUL terminator, while the translation length does.
*	msgfmt: add missing messages count variable	xhe	2019-01-16	1	-0/+2
*	msgfmt: add missing '+' operator	xhe	2019-01-16	1	-1/+1
*	poparser: simplified sysdep() more	xhe	2019-01-16	3	-25/+26
	This time, msg->sysdep simply became the total number of cases. Just invoke sysdep() X times with a third argument from 0 to X-1, and you are done. In short, I moved the code that transforms flags into the number of cases from msgfmt.c to poparser.c.
*	poparser: a more easy-to-understand sysdep()	xhe	2019-01-16	3	-53/+52
	As rofl0r said, the previous sysdep() may be a bit confusing, so I wrote a new version. This time, msg->sysdep[] became a bit flag, msg->sysdep_flag, which tells which kinds of sysdep string occur. Using this information, you can work out the total number of cases on your own. Assuming that number is X, you can use the new sysdep() to handle every case simply by invoking the function X times with a third argument from 0 to X-1, and you get X different expanded strings.
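One possible way to go from such a bit flag to a case count and back, sketched under assumptions: the `SD_*` constants, `sysdep_cases()` and `sysdep_choice()` are hypothetical names, and the rule "each kind that occurs has two spellings" is an interpretation of the messages in this log rather than a copy of the real code.

```c
/* Hypothetical flag bits, one per kind of sysdep string. */
enum { SD_PRIU32 = 1 << 0, SD_PRIU64 = 1 << 1, SD_PRIUMAX = 1 << 2 };

/* Each kind that occurs doubles the number of cases (two spellings each). */
static unsigned sysdep_cases(unsigned flags)
{
    unsigned cases = 1;
    for (unsigned bit = SD_PRIU32; bit <= SD_PRIUMAX; bit <<= 1)
        if (flags & bit)
            cases *= 2;
    return cases;
}

/* For case number `idx` (0 .. sysdep_cases(flags) - 1), returns which of
 * the two spellings (0 or 1) to use for `kind`. Kinds that do not occur
 * always get 0. */
static unsigned sysdep_choice(unsigned flags, unsigned kind, unsigned idx)
{
    for (unsigned bit = SD_PRIU32; bit <= SD_PRIUMAX; bit <<= 1) {
        if (!(flags & bit))
            continue;
        if (bit == kind)
            return idx & 1;
        idx >>= 1;          /* the next present kind uses the next bit */
    }
    return 0;
}
```

A caller would loop `idx` from 0 to `sysdep_cases(flags) - 1` and expand the string once per index.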
*	msgfmt: skip impossible cases	xhe	2019-01-16	1	-0/+5
	Mentioned by rofl0r. It's not possible to have a case where lu is both priu32 and 64. We skip the case by adding one.
*	poparser: avoid endless loop	xhe	2019-01-16	1	-3/+4
	When a string like 'xx %8' appears, strchr finds a '%'. But it's not a sysdep string, so the variable `x` is not changed, and strchr repeats this process over and over again. Now we first copy the content before '%' (including the '%') and point x at the first character after '%'. Then we match and replace sysdep strings, or just continue: since x is now past the '%', strchr won't repeat the process.
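The loop shape described here can be sketched as follows. `expand_priu32()` is a hypothetical function that handles a single tag, and replacing `%<PRIu32>` with `%u` is an assumption made for the example; the real code deals with several sysdep kinds.

```c
#include <stddef.h>
#include <string.h>

/* Copies `in` to `out`, replacing each "%<PRIu32>" with "%u" (assumed
 * replacement). Any other '%' is copied as-is. Returns the output length,
 * or (size_t)-1 if `out` is too small. */
static size_t expand_priu32(const char *in, char *out, size_t outsz)
{
    static const char tag[] = "<PRIu32>";
    const size_t taglen = sizeof tag - 1;
    const char *x = in, *pct;
    size_t used = 0;

    while ((pct = strchr(x, '%')) != NULL) {
        size_t n = (size_t)(pct - x) + 1;   /* text up to and including '%' */
        if (used + n + 1 >= outsz)          /* room for a 'u' and the NUL */
            return (size_t)-1;
        memcpy(out + used, x, n);
        used += n;
        x = pct + 1;                        /* always move past the '%' */

        /* strncmp stops at the NUL, so a short tail is never overread */
        if (strncmp(x, tag, taglen) == 0) {
            out[used++] = 'u';
            x += taglen;
        }
    }
    size_t rest = strlen(x);
    if (used + rest + 1 > outsz)
        return (size_t)-1;
    memcpy(out + used, x, rest + 1);        /* includes the NUL */
    return used + rest;
}
```

Because `x` advances past every '%' whether or not a tag follows, an input such as `xx %8` terminates and is copied unchanged.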
*	poparser: optimize by saving strlen() results	xhe	2019-01-16	1	-6/+10
*	poparser: avoid invalid memory access	xhe	2019-01-16	1	-1/+1
	Mentioned by rofl0r. If y is shorter than strlen(sysdep_str[n]), it may corrupt memory. We had better use strncmp instead.
*	msgmerge: pretty the output with breaklines	xhe	2019-01-16	1	-0/+2
*	poparser: not to skip sysdeps other than %<PRIu32>	xhe	2019-01-16	1	-14/+18
	Follows https://github.com/sabotage-linux/gettext-tiny/issues/39#issuecomment-445036044. Obviously, strstr searches for `%<PRIu32>` first; if there is one, we jump there and skip all other sysdep strings before the first `%<PRIu32>`. But what we want is to find the first sysdep string. So our new strategy is to search for `%` instead, so that we always match the first sysdep string.
*	msgmerge: give enough space for escape()	xhe	2019-01-16	1	-0/+2
	escape() needs up to twice as much space as its input.
*	stringescape: add missing escape chars to escape()	xhe	2019-01-16	1	-4/+14
	escape() should handle '\b' and '\a' as well. Also, escape() performed some wrong actions, like transforming '\?' into '\\\?', which should be '\\?'. In particular, we do not need to en/decode ['] into [\\\']. But for safety, we still keep that in unescape().
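To illustrate why an escaping routine needs a buffer of up to twice the input size, here is a minimal sketch. `escape_sketch()` and `ESCAPED_MAX` are hypothetical names, and the set of escaped bytes is an assumption for the example, not a copy of stringescape.c; in particular '?' and '\'' are simply copied unchanged here.

```c
#include <stddef.h>
#include <string.h>

/* Worst case: every input byte becomes two output bytes, plus the NUL. */
#define ESCAPED_MAX(len) (2 * (len) + 1)

/* Writes a C-style escaped copy of `in` to `out`, which must hold at least
 * ESCAPED_MAX(strlen(in)) bytes. Returns the escaped length. */
static size_t escape_sketch(const char *in, char *out)
{
    size_t o = 0;
    for (; *in; in++) {
        char e = 0;
        switch (*in) {
        case '\a': e = 'a'; break;
        case '\b': e = 'b'; break;
        case '\t': e = 't'; break;
        case '\n': e = 'n'; break;
        case '\r': e = 'r'; break;
        case '"':  e = '"'; break;
        case '\\': e = '\\'; break;
        default: break;        /* everything else is copied unchanged */
        }
        if (e) {
            out[o++] = '\\';   /* one input byte becomes two output bytes */
            out[o++] = e;
        } else {
            out[o++] = *in;
        }
    }
    out[o] = '\0';
    return o;
}
```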
*	msgmerge: use the same size as msgfmt	xhe	2019-01-16	1	-1/+1
	As the buffer of msgfmt overflowed before, we had better use a bigger buffer for msgmerge, too.
*	complete rewrite of poparser, msgmerge/msgfmt ported	xhe	2019-01-16	4	-629/+669
	As stated in https://github.com/sabotage-linux/gettext-tiny/issues/39, the old parser is not good enough to handle all po files. Similar issues occurred over and over again because of the dirty hacks in the project. So I propose and implement this new parser, which:
	1. Needs to parse a po file twice. The first pass acquires the maximum width of every entry. The second pass copies the well-prepared contents into struct po_message and passes it to the callback function.
	2. Stores in every struct po_message all the information of one translation: msgid, msgctxt, msgid_plural, and msgstrs. Comments may be added later.
	The logic of the code is quite simple; nothing special needs explaining. The special points are:
	1. On the first pass, the new parser gives no information about what the string looks like. The new parser will not give the exact (sysdep-expanded) size, nor can you calculate the exact size on your own. Only xxx_len, strlen, and sysdep in po_message_t are available. xxx_len is the length of the corresponding entry; strlen is almost the same.
	2. sysdep represents how many cases the string could be expanded to. Since you know the length of the original string, and the original string is always longer than the converted one, you can get a safe buffer size to work with at the second stage.
	3. poparser_sysdep() is a function like unescape(), with a bit flag as the third argument. That is, three bits correspond to st_priu32, st_priu64, st_priumax. Since there are only up to two cases for every kind of sysdep, you can count from 0 to msg->sysdep-1, and poparser_sysdep will eventually iterate over every possible case.
*	stringescape: correct break condition of unescape	xhe	2019-01-16	2	-5/+7
	unescape only makes the string shorter, so 1 byte for the NUL terminator is enough. Also, I tidied the code a little.