| Commit message (Collapse) | Author | Age | Files | Lines |
| |
suggested by
https://github.com/sabotage-linux/gettext-tiny/commit/fa416e2e897614314abcb711069ad19af2a44354#r36914796.
| |
This is one cause of https://github.com/sabotage-linux/gettext-tiny/issues/50.
A line is generally short, and [8192] is big enough for our usage. But
after cmake invokes msgmerge, lines are joined, so we printf some very
long lines into the po files.
Then cmake invokes msgfmt on these updated po files, and msgfmt runs
into these very long lines.
| |
poparser allocated one single big buffer for all string buffers, and
this makes debugging difficult.
When a buffer overflows, I cannot tell directly which buffer is damaged,
since they're actually different parts of one single buffer. So, let's
allocate the space separately.
| |
This is the second patch. It mainly changes how feed_line() works.
Previously, feed_line() assumed the input was one complete line. It also
used the buffer passed by init() for the iconv work.
Now, feed_line() assumes nothing about the input. It accumulates strings
until a complete line has been fed. A positive return value means the
poparser is expecting more input; it is not an error.
But once you have fed all the data, poparser should have consumed all
tokens exactly, so the last call to feed_line() should return 0.
That introduces the second issue. Incomplete strings are not consumed,
so they need to be stored. Thus, we allocate an internal buffer. When we
do not need iconv, we just copy new strings after the old ones. Otherwise
we let iconv do the copy. This also eliminates the need to pass an iconv
buffer to poparser.
Now msgfmt and msgmerge use fread, and poparser_feed_line() is renamed to
poparser_feed().
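The accumulate-until-complete behaviour described above can be sketched
roughly as follows. This is a minimal illustration, not the poparser
code: `struct feeder` and `feeder_feed()` are hypothetical names, line
handling is reduced to counting newlines, and no iconv step is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; names are illustrative, not gettext-tiny's. */
struct feeder {
    char *buf;      /* pending bytes of an incomplete line */
    size_t len;     /* bytes currently pending */
    size_t cap;     /* allocated size of buf */
    size_t lines;   /* complete lines consumed so far */
};

/* Append a chunk, consume every complete line, keep the unfinished tail.
 * Returns -1 on allocation failure, 0 if everything was consumed, and a
 * positive count of pending bytes if more input is expected. */
static long feeder_feed(struct feeder *f, const char *data, size_t n)
{
    assert(f != NULL);
    if (n == 0)
        return (long)f->len;
    if (f->len + n > f->cap) {
        size_t cap = (f->len + n) * 2;
        char *p = realloc(f->buf, cap);
        if (!p)
            return -1;
        f->buf = p;
        f->cap = cap;
    }
    memcpy(f->buf + f->len, data, n);
    f->len += n;

    size_t start = 0;
    char *nl;
    while ((nl = memchr(f->buf + start, '\n', f->len - start)) != NULL) {
        /* A real parser would tokenize buf[start..nl) here. */
        f->lines++;
        start = (size_t)(nl - f->buf) + 1;
    }
    /* Move the incomplete tail to the front for the next call. */
    memmove(f->buf, f->buf + start, f->len - start);
    f->len -= start;
    return (long)f->len;
}
```

A caller would read chunks with fread, pass each one to the feed
function, and treat a non-zero result after the final chunk as a
truncated file.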
| |
This is the first patch. It mainly solves cases like:
`msgid "ss" "gg"`.
This is a valid case (confirmed with GNU msgfmt). Just like a C compiler,
we should concatenate the strings and treat the result as `"ssgg"`.
The previous feed_line assumed there was only one `"xxx"` string per line.
For example, when it parsed `msgid "ss"`, the whole line was marked as
parsed, so `"gg"` was ignored.
Now it is a loop that consumes exactly the part it parsed, so that all
strings are consumed.
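As an illustration of the loop described above, here is a small sketch
that joins every quoted string on a line. `concat_literals()` is a
hypothetical helper, not the function in poparser.c, and it copies
escape sequences verbatim instead of decoding them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Concatenate every "..." literal found in `line` into `out`.
 * Returns the number of literals consumed, or -1 on an unterminated
 * literal or if `out` (of size `cap`) is too small.
 * Escapes are copied verbatim; a backslash pair is kept together so
 * that an escaped quote does not end the literal. */
static int concat_literals(const char *line, char *out, size_t cap)
{
    assert(line && out && cap > 0);
    size_t o = 0;
    int count = 0;
    const char *p = line;

    while ((p = strchr(p, '"')) != NULL) {
        p++;                              /* skip the opening quote */
        while (*p && *p != '"') {
            if (*p == '\\' && p[1]) {     /* keep the escape pair intact */
                if (o + 1 >= cap)
                    return -1;
                out[o++] = *p++;
            }
            if (o + 1 >= cap)
                return -1;
            out[o++] = *p++;
        }
        if (*p != '"')
            return -1;                    /* unterminated literal */
        p++;                              /* consume only what was parsed */
        count++;
    }
    out[o] = '\0';
    return count;
}
```

Because the outer loop resumes right after the closing quote instead of
discarding the rest of the line, `msgid "ss" "gg"` yields `ssgg`.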
| |
The Makefile install recipe substitutes every occurrence of "m4/" in
the file name of the rule's target ($@). An absolute path can contain
more than one occurrence of "m4/", in which case install fails. Let's
replace $(subst ...) with $(patsubst ...), substituting only the last
occurrence of the "m4/" pattern.
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
| |
Follow-up to https://github.com/sabotage-linux/gettext-tiny/issues/47.
'keyword=' is simply ignored. msgfmt generates the output based on
the template file and other sources, so 'template=' is the input file.
| |
Follow-up to https://github.com/sabotage-linux/gettext-tiny/issues/44.
Similar to #43. I do think update-mo is good, though most of the targets
@ismaell requested are just stubs, since gettext-tiny is not meant for
maintainers.
| |
Follow-up to https://github.com/sabotage-linux/gettext-tiny/issues/43.
Closes https://github.com/sabotage-linux/gettext-tiny/issues/43.
Though supporting releases is not our goal, this is not much code.
| |
Add the 'format_arg' attribute to the functions which may return a string
used as a format argument; otherwise compilation fails when the
corresponding compiler check flag is enabled.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
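For readers unfamiliar with the attribute, a minimal sketch follows.
`tr()` is a hypothetical stand-in for a gettext()-style stub and is not
the declaration from gettext-tiny's headers; the attribute itself is the
GCC/Clang `format_arg` function attribute.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stub that returns its argument untranslated.
 * format_arg(1) tells GCC/Clang that the result is a format string
 * derived from parameter 1, so format checks such as
 * -Wformat-nonliteral can still validate printf(tr("%d"), n). */
#if defined(__GNUC__)
__attribute__((format_arg(1)))
#endif
static const char *tr(const char *msgid)
{
    assert(msgid != NULL);
    return msgid;
}

/* Format through the stub; the compiler checks "%d items" against `n`. */
static int format_count(char *buf, size_t cap, int n)
{
    return snprintf(buf, cap, tr("%d items"), n);
}
```

Without the attribute, a build that turns such format warnings into
errors would likely reject the snprintf call, because the format is no
longer a string literal from the compiler's point of view.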
| |
`*) break` stops parsing immediately after the first argument is
parsed, leaving all other arguments ignored.
To avoid shifting more arguments than we have, we should check whether
any argument is left. Otherwise, the script exits with `can not
shift anymore`.
| |
msgfmt only focused on the po->mo mode (plus a stub for .msg). Now that we
want to generate .desktop and .xml too, I'd like to handle these modes
explicitly.
| |
It's allowed to have nplurals=1 together with msgid_plural and msgstr[0].
| |
I made exactly the wrong check there.
| |
With the -c argument, msgfmt should check po files more strictly.
| |
There are many malformed po files, which in most cases will not actually
trigger a segfault or the like. GNU msgfmt chose to ignore these issues
unless an explicit option is given, and so do I.
| |
As we pass raw strings directly at the first stage, we may meet an
escaped '\n', e.g. "\\n". Because there's no '\\' in charset strings,
we can use '\\' as a terminator to avoid taking useless parts. Useless
parts cause an unsupported_charset error.
| |
Following
https://github.com/sabotage-linux/gettext-tiny/commit/5539eff5d507c619735156fa9e44e0bcf1436695#diff-9657ea17ae7a6e3b985dc974d613664e.
The previous commit did not fix the issue, because stage one does not
copy anything into message_t. So this time, raw strings are passed
directly to feed_hdr().
| |
so we can directly get the problem from the return value of
msgfmt.c:process().
| |
I assumed a block is completely parsed just before the next msgid/ctxt. But
in reality a comment may start a new block too. If poparser doesn't treat
comments as the start of a new block, those comments will be parsed as
flags of the old block.
| |
Otherwise the next msg will inherit flags like PO_FUZZY.
| |
At stage two, it's a buffer allocated by poparser_finish. It will never
be a NULL pointer, so it may trigger an abort.
| |
There are two points:
1. Only disable the 'first' flag at the parse stage. But I wanted it to be
the same at both stages when I introduced this flag.
2. Disable the 'first' flag once poparser_feed_hdr() has finished.
I originally assumed it was consistent throughout the parsing of a whole
block. This caused the 'msgstr abort', because there's a syntax check
relying on this flag. So now it stays consistent throughout the whole
process.
| |
It should be set to true at the beginning of each stage, or we may
abort.
| |
What we want to do is delete bad output files. But if remove()
succeeds, msgfmt returns 0 instead of ret. This badly broke
various cases.
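The pitfall above can be sketched as follows. `discard_output()` is a
hypothetical helper written for illustration; the actual fix in
msgfmt.c may be structured differently.

```c
#include <assert.h>
#include <stdio.h>

/* Delete a bad output file without losing the original error code.
 * The mistake described above amounts to `ret = remove(path);`:
 * a successful remove() returns 0 and overwrites the failure. */
static int discard_output(const char *path, int ret)
{
    assert(path != NULL);
    if (ret != 0)
        (void)remove(path);   /* best effort; `ret` stays untouched */
    return ret;
}
```

The caller returns the value of `discard_output()` as its exit status,
so a failed conversion still reports failure after cleanup.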
| |
As rofl0r said, unsigned on x86/x64 is 32 bits and will truncate the
len. Though the result of sizeof() should just be a very small number
(12), and truncation will not affect the check result.
| |
iconv may increase the length of strings, so we may have a larger
length at the parse stage than at the first stage. That leads to memory
corruption.
| |
The return value of iconv is not a length; we need to calculate the length
from the pointers.
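A minimal sketch of deriving the converted length from the output
pointer is shown below. `latin1_to_utf8()` is a hypothetical helper, the
ISO-8859-1 to UTF-8 pair is only chosen as an example where the output
grows, and it assumes the platform's iconv supports both charsets.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert `in` (inlen bytes, ISO-8859-1) to UTF-8 in `out`.
 * Returns the number of bytes written, or (size_t)-1 on failure.
 * iconv() itself returns the count of non-reversible conversions,
 * so the output length is derived from how far the pointer moved. */
static size_t latin1_to_utf8(const char *in, size_t inlen,
                             char *out, size_t outcap)
{
    assert(in && out);
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv() takes char **; input is only read */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return (size_t)(outp - out);   /* same value as outcap - outleft */
}
```

Here a 4-byte Latin-1 input containing one accented character produces
5 bytes of UTF-8, which is why a buffer sized from the unconverted
length can be too small.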
| |
If line_len, i.e. inputbytes_left, is not 0, that means the output buf is
large enough.
| |
We did not include the NUL terminator in the length of strings.
| |
1. y-x is larger than the charset string by 8. We should write to
[y-x-8] instead. Otherwise it may lead to memory corruption.
2. Though I've checked before that the maximum length of a charset string
should be 11, let's guard against an unknown or invalid charset.
| |
The length in the str table does not include the NUL terminator, while
the translation's length does.
| |
This time, msg->sysdep simply became the total number of cases. Just
invoke sysdep() X times with a third argument from 0 to X-1, and you are
done. In short, I moved the code that transforms flags into the number of
cases from msgfmt.c to poparser.c.
| |
As rofl0r said, the previous sysdep() may be a bit confusing, so I
wrote a new version.
This time, msg->sysdep[] became a bit flag, msg->sysdep_flag. It tells
which kinds of sysdep string occur.
Using this information, you can work out the total number of cases on your
own. Assuming that number is X, you can use the new sysdep() to deal with
every case simply by invoking the function X times with a third argument
from 0 to X-1, and you get X different expanded strings.
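One way to picture the "X cases from a bit flag" idea is sketched below.
The flag values and both helpers are hypothetical, and the sketch assumes
that every kind of sysdep string that occurs has exactly two possible
expansions; the real poparser code may count and index cases differently.

```c
#include <assert.h>

/* Hypothetical flag bits, one per kind of sysdep string. */
enum { ST_PRIU32 = 1 << 0, ST_PRIU64 = 1 << 1, ST_PRIUMAX = 1 << 2 };

/* Assumption: each kind that occurs has two expansions, so the total
 * number of cases is 2 to the power of the number of set bits. */
static unsigned sysdep_cases(unsigned flags)
{
    unsigned n = 1;
    for (; flags; flags &= flags - 1)   /* clear the lowest set bit */
        n *= 2;
    return n;
}

/* Which of the two expansions (0 or 1) does case `idx` pick for `kind`?
 * Each kind present in `flags` consumes one bit of `idx`, lowest first. */
static unsigned sysdep_variant(unsigned flags, unsigned kind, unsigned idx)
{
    assert((flags & kind) && idx < sysdep_cases(flags));
    unsigned below = flags & (kind - 1);   /* present kinds below `kind` */
    unsigned shift = 0;
    for (; below; below &= below - 1)
        shift++;
    return (idx >> shift) & 1u;
}
```

Counting `idx` from 0 to `sysdep_cases(flags) - 1` then visits every
combination of expansions exactly once.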
| |
Mentioned by rofl0r. It's not possible to have a case where lu is both priu32 and 64.
We skip that case by adding one.
| |
When a string like 'xx %8' appears, strchr finds a '%'. But it's
not a sysdep string, so variable `x` is not changed, and strchr repeats
this process over and over again.
Now, we first copy the content before '%' (including '%') and point x at
the first character after '%'. Then we match and replace sysdep strings,
or just continue. Since x is now past '%', strchr won't repeat the
process.
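The fixed loop can be sketched like this. `expand_priu32()` is a
hypothetical helper handling a single tag, and expanding `<PRIu32>` to
`u` is an assumption about the target platform, not something taken from
the project.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace "%<PRIu32>" with "%u"; copy every other '%' through unchanged.
 * Returns the output length, or (size_t)-1 if `out` is too small. */
static size_t expand_priu32(const char *in, char *out, size_t cap)
{
    static const char tag[] = "<PRIu32>";
    const size_t taglen = sizeof tag - 1;
    const char *x = in, *pct;
    size_t o = 0;
    assert(in && out && cap > 0);

    while ((pct = strchr(x, '%')) != NULL) {
        size_t n = (size_t)(pct - x) + 1;   /* text before '%', plus '%' */
        if (o + n >= cap)
            return (size_t)-1;
        memcpy(out + o, x, n);
        o += n;
        x = pct + 1;                        /* always move past '%' */
        if (strncmp(x, tag, taglen) == 0) {
            if (o + 1 >= cap)
                return (size_t)-1;
            out[o++] = 'u';
            x += taglen;
        }
    }
    size_t rest = strlen(x);
    if (o + rest >= cap)
        return (size_t)-1;
    memcpy(out + o, x, rest + 1);           /* includes the terminator */
    return o + rest;
}
```

Because `x` advances past every '%' whether or not a tag follows, an
input such as 'xx %8' terminates instead of looping forever.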
| |
Mentioned by rofl0r.
If y is shorter than strlen(sysdep_str[n]), it may corrupt memory.
We'd better use strncmp instead.
| |
Follow-up to https://github.com/sabotage-linux/gettext-tiny/issues/39#issuecomment-445036044.
strstr searches for `%<PRIu32>` first; if there is one, we jump there and
skip all other sysdep strings before the first `%<PRIu32>`. But what we
want is to find the first sysdep string.
So our new strategy is to search for `%` instead, so that we always match
the first sysdep string.
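To make the difference concrete, here is a small sketch that scans for
'%' and tests every known tag at that position. The tag table and
`first_sysdep()` are illustrative names, not the ones used in
poparser.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const sysdep_tags[] = { "<PRIu32>", "<PRIu64>", "<PRIuMAX>" };

/* Return a pointer to the '%' of the first sysdep string in `s`
 * (whatever its kind) and store its index in *kind, or NULL if none. */
static const char *first_sysdep(const char *s, size_t *kind)
{
    assert(s && kind);
    for (const char *p = strchr(s, '%'); p; p = strchr(p + 1, '%')) {
        for (size_t i = 0; i < sizeof sysdep_tags / sizeof *sysdep_tags; i++) {
            /* strncmp stops at the terminator, so a short tail is safe. */
            if (strncmp(p + 1, sysdep_tags[i], strlen(sysdep_tags[i])) == 0) {
                *kind = i;
                return p;
            }
        }
    }
    return NULL;
}
```

For `"%<PRIu64> then %<PRIu32>"`, a strstr search for `%<PRIu32>` lands
on the second tag, while the scan above reports the `%<PRIu64>` at the
start of the string.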
| |
escape needs up to twice as much space.
| |
escape() should handle '\b' and '\a' as well. Also, escape() performed
some wrong actions, like transforming '\?' into '\\\?', which should be
'\\?'.
In particular, we do not need to en/decode ['] into [\\\']. But for safety,
we still keep that in unescape().
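A rough sketch of an escaping routine along these lines follows.
`escape_dup()` is a hypothetical function and its set of escaped
characters is an assumption; it also shows why a buffer of twice the
input length plus a terminator always suffices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `s` with control characters,
 * backslash and double quote written as two-character escapes.
 * Every input byte yields at most two output bytes, hence 2*len+1. */
static char *escape_dup(const char *s)
{
    static const char raw[] = "\a\b\t\n\v\f\r\\\"";
    static const char sym[] = "abtnvfr\\\"";
    assert(s != NULL);
    char *out = malloc(2 * strlen(s) + 1);
    if (!out)
        return NULL;

    char *o = out;
    for (; *s; s++) {
        /* *s is never 0 here, so strchr cannot match raw's terminator. */
        const char *hit = strchr(raw, *s);
        if (hit) {
            *o++ = '\\';
            *o++ = sym[hit - raw];
        } else {
            *o++ = *s;   /* includes ' and ?, which are left unescaped */
        }
    }
    *o = '\0';
    return out;
}
```

The caller owns the returned string and releases it with free().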
| |
As the buffer of msgfmt overflowed before, we'd better use a bigger
buffer for msgmerge, too.
| |
As stated in https://github.com/sabotage-linux/gettext-tiny/issues/39,
the old parser is not good enough to handle all po files. Similar
issues occurred over and over again because of the dirty hacks in the
project.
So I propose and implement this new parser, which:
1. needs to parse a po file twice. The first pass acquires
the maximum width of every entry. The second pass copies the
well-prepared contents into struct po_message and passes it to the
callback function.
2. stores in every struct po_message all the information of one
translation: msgid, msgctxt, msgid_plural, and msgstrs. Comments may be
added later.
The logic of the code is quite simple and needs no special explanation.
The special points are:
1. In the first pass, the new parser gives no information about what the
string looks like. The new parser does not give the exact (sysdep-expanded)
size, nor can you calculate the exact size on your own. Only xxx_len,
strlen and sysdep in po_message_t are available. xxx_len is the length of
the corresponding entry; strlen is almost the same.
2. sysdep represents how many cases the string can be expanded to. Since
you know the length of the original string, and the original string is
always longer than the converted one, you can get a safe buffer size to
work with at the second stage.
3. poparser_sysdep() is a function like unescape(), with a bit flag
as the third argument. That is, three bits correspond to st_priu32,
st_priu64 and st_priumax. Since there are only up to two cases for every
kind of sysdep, you can count from 0 to msg->sysdep-1, and
poparser_sysdep will eventually iterate over every possible case.
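The measure-then-copy idea behind the two passes can be sketched in
miniature as below. The entries are plain strings rather than parsed po
tokens, and `measure_max()`, `process_all()` and `count_bytes()` are
hypothetical names; the real parser tracks a length per field of struct
po_message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: only measure. Returns the longest entry length. */
static size_t measure_max(const char *const *entries, size_t n)
{
    size_t max = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(entries[i]);
        if (len > max)
            max = len;
    }
    return max;
}

/* Pass 2: copy each entry into one buffer sized by pass 1 and hand it
 * to the callback. Returns 0 on success, -1 on allocation failure. */
static int process_all(const char *const *entries, size_t n,
                       void (*cb)(const char *msg, void *user), void *user)
{
    assert(entries && cb);
    char *buf = malloc(measure_max(entries, n) + 1);
    if (!buf)
        return -1;
    for (size_t i = 0; i < n; i++) {
        strcpy(buf, entries[i]);   /* fits: buf holds the longest entry plus NUL */
        cb(buf, user);
    }
    free(buf);
    return 0;
}

/* Example callback: sum the lengths it is shown. */
static void count_bytes(const char *msg, void *user)
{
    *(size_t *)user += strlen(msg);
}
```

Sizing the buffer once from the first pass means the second pass never
reallocates, at the cost of reading the input twice.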
| |
unescape only makes the string shorter, so 1 byte for the NUL terminator
is enough. Also, I tidied the code a little.
|