OPNsense Forum

International Forums => Japanese - 日本語 => Topic started by: dotike on March 16, 2015, 05:52:05 am

Title: Status: re-generating EN canonical .pot file
Post by: dotike on March 16, 2015, 05:52:05 am
I am re-generating the canonical EN .pot file, building it from the ground up.
When this is done, a completely fresh, code-accurate .pot file will exist (though it will still take some hacking to make the UI's PHP/gettext pick up the right .pot file).

The old .pot files were discarded (a fresh start for OPNsense), so a new English gettext .pot file needs to be created as the canonical base for all other languages to follow.  If other languages come to dominate, 'en' may no longer be canonical, but at the moment it most certainly is.

--
Before work can really be unleashed on the JA translation, the EN base needs to get done.

This EN first pass is in progress here; I'll keep chipping away at it as I can make time this week:

https://github.com/dotike/opnsense.core.ja_JP.UTF8/commits/381eeb644fc223412e17927aff98a925f3477022/src/share/locale/en/LC_MESSAGES/OPNsense.pot
Title: Re: Status: re-generating EN canonical .pot file
Post by: franco on March 25, 2015, 08:35:18 am
Hi Ike,

How is it going? Do you need any help with the infrastructure or with getting commits merged into core.git?  Keep up the good work. :)


Cheers,
Franco
Title: Re: Status: re-generating EN canonical .pot file
Post by: dotike on April 12, 2015, 02:36:48 am
Hi Franco,

I just cleaned up my repo (it got pretty messy in there, hack hack hack...), but I don't need help with the repo itself.

When I finish the EN base, I'll gladly submit the single file as a merge request, from this branch:
https://github.com/dotike/opnsense.core.ja_JP.UTF8/tree/locale.EN.canonical

--
Basic status is this:

I've now been through 3 or 4 full passes and caught all the big issues; it's now just a race to finish the last manual pass through the English .pot file.

- The branch I'm working from is now 28 days old, and I'll need to check the diff for any relevant changes to the translation file.
(Incidentally, I found a number of odd gettext calls in the source which I'll address later)

- After a lot of slogging through the source code, I could generate much of the English .pot file content in an automated manner.  Yet, as with problems like this, comprehensive parsers take far longer to write than just editing the file- there are so many edge cases for how the strings are handled in the codebase.
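
To illustrate the kind of automated extraction described above, here is a minimal sketch. It is written in Python rather than the project's PHP, and it is not the tooling actually used for this work. The pattern, the function names and the sample source are all hypothetical, and it only handles a single literal string with no embedded quotes or backslashes, which is exactly where the edge cases start.

```python
import re

# Hypothetical pattern: matches gettext("...") or _("...") whose only argument
# is one literal string without embedded quotes or backslashes. Concatenation,
# variables and multi-line strings are deliberately not handled.
CALL_RE = re.compile(r"""\b(?:gettext|_)\(\s*(["'])([^"'\\]*)\1\s*\)""")


def extract_msgids(php_source):
    """Return the unique msgids found in php_source, in first-seen order."""
    seen = []
    for match in CALL_RE.finditer(php_source):
        text = match.group(2)
        if text and text not in seen:
            seen.append(text)
    return seen


def to_pot(msgids):
    """Render msgids as minimal .pot entries with empty translations."""
    entries = ['msgid "{}"\nmsgstr ""\n'.format(m) for m in msgids]
    return "\n".join(entries)


# Made-up PHP snippet, for illustration only.
SAMPLE = '''<?php
echo gettext("Interfaces");
echo _('Save');
echo gettext("Interfaces");
echo gettext("Total: " . $count);
'''

if __name__ == "__main__":
    print(to_pot(extract_msgids(SAMPLE)))
```

The last call in the sample concatenates a variable onto the string, so the sketch skips it entirely; cases like that are what still need a manual pass.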

Expected outcome: when I'm done with this pass, a functional .en POT file will be ready to push upstream, and I'll submit a pull request as soon as that's ready.  Yet, there will be a few things still to do from there:

--
When the English file is done:

- Japanese, (and any other languages) can begin translation work
- OPNsense core will want to maintain the English file, as the canonical base for all other languages
- The gettext machinery will need to be hooked up to make it all work

Best,
.ike
Title: Re: Status: re-generating EN canonical .pot file
Post by: dotike on April 12, 2015, 02:38:33 am
A quick followup, here's the English file I'm working on:

https://github.com/dotike/opnsense.core.ja_JP.UTF8/blob/locale.EN.canonical/src/share/locale/en/LC_MESSAGES/OPNsense.pot

Title: Re: Status: re-generating EN canonical .pot file
Post by: franco on April 12, 2015, 09:28:56 am
Hi Isaac,

Quote from: dotike
- The branch I'm working from is now 28 days old, and I'll need to check the diff for any relevant changes to the translation file.
(Incidentally, I found a number of odd gettext calls in the source which I'll address later)

I read about those in your commits. The gettext() usage is quite bizarre at times. If you can point me to the lines or add pull requests for those that'd be lovely.

Speaking of gettext, I was thinking we should maybe consider doing the "__()" trick consistently. For one "_()" is just easier on the eyes and the double underscore makes it possible to replace the translation function if we ever decide to switch to non-gettext variants. What do you think?

Quote from: dotike
- After a lot of slogging through the source code, I could generate much of the English .pot file content in an automated manner.  Yet, as with problems like this, comprehensive parsers take far longer to write than just editing the file- there are so many edge cases for how the strings are handled in the codebase.

I'm all for ditching edge cases. Maybe we also need guidelines on what is translated and how dynamic strings and HTML are embedded (or not). A wiki page or README on this would certainly help to keep the English translations high in quality. I'm not a professional in that regard either, so I believe that would be a chance to improve and share knowledge.
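
As one possible example of such a guideline, the sketch below keeps a whole sentence as a single msgid with a named placeholder, and keeps the HTML outside the translated string. It is in Python rather than PHP, and everything in it is an assumption made for illustration: the catalog is a hypothetical pseudo-locale dictionary standing in for a compiled gettext catalog, and the function names are invented.

```python
import html

# Hypothetical pseudo-locale catalog standing in for a compiled .mo file.
# The "translation" moves the placeholder to the front on purpose, to show
# that word order stays under the translator's control.
PSEUDO_CATALOG = {
    "Interface %(name)s is down": "%(name)s: interface is DOWN",
}


def translate(msgid, catalog):
    """Look up msgid, falling back to the English source string."""
    return catalog.get(msgid, msgid)


def interface_down_text(name, catalog):
    # The whole sentence is one msgid; the dynamic value is filled in after
    # translation instead of being concatenated between string fragments.
    return translate("Interface %(name)s is down", catalog) % {"name": name}


def interface_down_html(name, catalog):
    # Markup stays in the code, not in the msgid, and the text is escaped.
    text = interface_down_text(name, catalog)
    return "<strong>{}</strong>".format(html.escape(text))


if __name__ == "__main__":
    print(interface_down_text("em0", {}))
    print(interface_down_text("em0", PSEUDO_CATALOG))
    print(interface_down_html("<em0>", {}))
```

With concatenated fragments, the translator would only ever see "Interface" and "is down" separately and could not reorder them around the interface name.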

Quote from: dotike
Expected outcome: when I'm done with this pass, a functional .en POT file will be ready to push upstream, and I'll submit a pull request as soon as that's ready.  Yet, there will be a few things still to do from there:

Looking forward to it. :)

Quote from: dotike
- OPNsense core will want to maintain the English file, as the canonical base for all other languages
- The gettext machinery will need to be hooked up to make it all work

Agreed, we can definitely do that.


Cheers,
Franco
Title: Re: Status: re-generating EN canonical .pot file
Post by: dotike on April 12, 2015, 10:26:35 am
Hi Isaac,

Quote from: dotike on Today at 02:36:48 AM

Quote
Quote
    - The branch I'm working from is now 28 days old, and I'll need to check the diff for any relevant changes to the translation file.
    (Incidentally, I found a number of odd gettext calls in the source which I'll address later)

I read about those in your commits. The gettext() usage is quite bizarre at times. If you can point me to the lines or add pull requests for those that'd be lovely.

Indeed. Once this hurdle is past, I'll start submitting them, small and granular.
Watching y'all commit over the last month, I realized that I may be a bit premature in submitting changes: since y'all are deleting/reworking bits, it'd be a shame for me to waste time cleaning up lines of code which will disappear.

Quote
Speaking of gettext, I was thinking we should maybe consider doing the "__()" trick consistently. For one "_()" is just easier on the eyes and the double underscore makes it possible to replace the translation function if we ever decide to switch to non-gettext variants. What do you think?

I'm not sure I know/understand that trick, what is it?  If it makes it easier to read or parse, I'm all for it.  I'd love to one day see something with less GNU take the place of gettext, but I'm totally not thinking about that right now.

The only thing I can say is that, in my experience, many hands may touch the code, so if the convention isn't extremely clear, it'll get botched.  (A knucklehead like myself can grep for all gettext calls and get 99% of the way there with what I'm doing, for example.)

Quote
Quote
Quote from: dotike on Today at 02:36:48 AM

    - After a lot of slogging through the source code, I could generate much of the English .pot file content in an automated manner.  Yet, as with problems like this, comprehensive parsers take far longer to write than just editing the file- there are so many edge cases for how the strings are handled in the codebase.

I'm all for ditching edge cases. Maybe we also need guidelines on what is translated and how dynamic strings and HTML are embedded (or not). A wiki page or README on this would certainly help to keep the English translations high in quality. I'm not a professional in that regard either, so I believe that would be a chance to improve and share knowledge.

Certainly- I'm no I18N expert, but I think basic coding policies are always good.  But some of the trouble cases are so obvious, they merely require sane eyes to clean up.  For example, here's a fun one:

Code:
core:src/www/interfaces.php:2150
Frankly, as freaky as this line is, given the age of this code and the number of hands that have been on it, I'm surprised at its quality.  I've unfortunately seen much, much worse, professionally speaking.

I've got about 450 "bad" lines left in the EN .pot (I stopped around line 6953 of 27061 tonight), which leaves me plenty of time to ponder some basic things to contribute to README/quickstart guidelines.  I'll try to keep some notes as I go.

Quote
Quote
Quote from: dotike on Today at 02:36:48 AM

    Expected outcome: when I'm done with this pass, a functional .en POT file will be ready to push upstream, and I'll submit a pull request as soon as that's ready.  Yet, there will be a few things still to do from there:


Looking forward to it. :)

You can bet your ass at this point I'm looking forward to finishing this file too :)

Quote
Quote
Quote from: dotike on Today at 02:36:48 AM

    - OPNsense core will want to maintain the English file, as the canonical base for all other languages
    - The gettext machinery will need to be hooked up to make it all work


Agreed, we can definitely do that.

Sweet!
In the future, so you don't feel burdened- I (and I hope others), will certainly help to keep things up to date too.
I'm mostly focused on keeping track of Japanese as time moves on, but keeping track of the English file is of course fundamental to that effort.

But for now, so glad you guys are hacking on the important bits :)

/salute
Title: Re: Status: re-generating EN canonical .pot file
Post by: franco on April 12, 2015, 11:22:29 am
Quote from: dotike
Watching y'all commit over the last month, I realized that I may be a bit premature in submitting changes: since y'all are deleting/reworking bits, it'd be a shame for me to waste time cleaning up lines of code which will disappear.

Please keep doing that. I realise there is much to clean up, but reviewing more patches probably makes us more aware of problems or stale code than we would be without them. I'll do my best to merge pull requests manually and selectively, no matter how far the code base has progressed.

Quote from: dotike
I'm not sure I know/understand that trick, what is it?  If it makes it easier to read or parse, I'm all for it.  I'd love to one day see something with less GNU take the place of gettext, but I'm totally not thinking about that right now.

I think WordPress uses "__()" for all translations and defines it as a thin wrapper around its own translation lookup. The PHP gettext extension itself defines _() as an alias for gettext(), so even though it's shorter, if gettext() ever goes away we would have to replace every _() call too, or write compat glue that could break when syncing PHP shared objects such as gettext or a sensible replacement. "__()", on the other hand, we can define ourselves in PHP and point at any function we want. That is great for trying out replacements from a dev perspective without breaking the code or resorting to ugly hacks: only the call inside __() needs to be adjusted, leaving the rest of the code as is.

Quote from: dotike
Certainly- I'm no I18N expert, but I think basic coding policies are always good.  But some of the trouble cases are so obvious, they merely require sane eyes to clean up.  For example, here's a fun one:

Code:
core:src/www/interfaces.php:2150
Frankly, as freaky as this line is, given the age of this code and the number of hands that have been on it, I'm surprised at its quality.  I've unfortunately seen much, much worse, professionally speaking.

I looked at it and thought, well, that's not so bad, not realising the string spans multiple lines including HTML. :D

I think you are right. The translations have been maintained with fair quality, as they have stood the test of time (and the Portuguese is really good and thorough as far as I remember).

Quote from: dotike
Sweet!
In the future, so you don't feel burdened- I (and I hope others), will certainly help to keep things up to date too.
I'm mostly focused on keeping track of Japanese as time moves on, but keeping track of the English file is of course fundamental to that effort.

I understand. We constantly fiddle with the wording and clarity of the text and it makes a lot of sense to take care of the English translation file directly and steadily. I'm all for small incremental changes as opposed to large drop ins that put the workload on the other translations (e.g. right before a release; makes no sense at all).
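
One way to keep such increments visible to translators is to compare the msgids of two versions of the .pot file. The sketch below is only an illustration of that idea, written in Python. The function names and the sample data are hypothetical, and it only understands single-line msgids, so wrapped multi-line entries are ignored.

```python
import re

# Single-line msgid entries only; multi-line (wrapped) msgids are not handled.
MSGID_RE = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def msgids(pot_text):
    """Collect single-line msgids, skipping the empty header msgid."""
    return {m for m in MSGID_RE.findall(pot_text) if m}


def pot_changes(old_pot, new_pot):
    """Return (added, removed) msgids between two versions of a .pot file."""
    old_ids, new_ids = msgids(old_pot), msgids(new_pot)
    return sorted(new_ids - old_ids), sorted(old_ids - new_ids)


# Made-up sample data, for illustration only.
OLD = '''msgid ""
msgstr ""

msgid "Save"
msgstr ""

msgid "Aply"
msgstr ""
'''

NEW = '''msgid ""
msgstr ""

msgid "Save"
msgstr ""

msgid "Apply"
msgstr ""
'''

if __name__ == "__main__":
    added, removed = pot_changes(OLD, NEW)
    print("added:", added)
    print("removed:", removed)
```

A short added/removed list per change is what makes small increments cheap for the other translations to follow.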


Cheers,
Franco
Title: Re: Status: re-generating EN canonical .pot file
Post by: dotike on April 18, 2015, 03:46:26 am
Hello,

Just a heads up: the first pass on the English .pot file is complete.

https://github.com/opnsense/core/pull/144

Best,
.ike

Title: Re: Status: re-generating EN canonical .pot file
Post by: dotike on May 12, 2015, 04:50:10 am
Hi All,

Just a heads up, the EN canonical .pot file is completed, and the OPNsense team has built tooling around helping manage keeping it up to date!

https://github.com/opnsense/core/blob/master/src/share/locale/en_US/LC_MESSAGES/OPNsense.pot

Best,
.ike