Goals are simplicity of changes and backwards compatibility.
To that end, I propose the following format:
- Table chunk name: STRx, instead of STR. If the STRx chunk is present it has priority over the STR chunk.
- Requires the 1.22 map format IDs.
- All strings are encoded using UTF-8 (For consistency with SC:R).
- String offsets changed to uint32 from uint16.
- String count changed to uint32 (necessitates 32-bit string IDs within SC; I am not sure whether this is possible. If 32-bit string IDs are not possible, we limit the valid IDs to 16 bits).
Other side effects of 32-bit string IDs: strings referenced from sections that store 16-bit string indices must be stored within the first 65k strings (pointed out by jjf28). We can deal with this by reserving a range of indices for these strings in case we are in danger of overflowing into 32-bit indices (e.g. the last 2K indices are skipped when adding strings unless the string is a unit name). String recycling also needs to keep this in mind.
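To make the reserved-range idea concrete, here is a minimal sketch of an ID allocator. Everything in it is an assumption for illustration, not part of the proposal: the function and constant names are made up, IDs are treated as 1-based, and "the last 2K indices" is read as the top 2048 IDs of the 16-bit range.

```python
MAX_16BIT_ID = 0xFFFF                            # highest ID a 16-bit field can hold
RESERVED_COUNT = 2048                            # assumed reading of "the last 2K indices"
RESERVED_START = MAX_16BIT_ID - RESERVED_COUNT + 1


def allocate_string_id(used, needs_16bit):
    """Pick the lowest free string ID and record it in `used` (hypothetical helper).

    Strings referenced from 16-bit fields (e.g. unit names) may take any free
    ID up to 65535. All other strings skip the reserved block, so they spill
    into 32-bit IDs before the reserved block is touched.
    """
    candidate = 1  # assumption: IDs are 1-based, 0 means "no string"
    while candidate in used or (
        not needs_16bit and RESERVED_START <= candidate <= MAX_16BIT_ID
    ):
        candidate += 1
    if needs_16bit and candidate > MAX_16BIT_ID:
        raise OverflowError("no free 16-bit string ID left")
    used.add(candidate)
    return candidate
```

String recycling would need the same check: a freed ID inside the reserved block should only be handed back to a string that needs a 16-bit index.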
EUD triggers that are added via EUD compilers should be added to the end of the STRx section instead of the STR section - the rest should function as normal.
Before support is added to SC, editors can write both STR and STRx, keeping editor-only strings in STRx while leaving the corresponding indices empty in STR to save space.
I want to keep this simpler than the KSTR section, to minimize the changes required when updating existing tools and especially the game, in the hope of getting this supported earlier. If string parameters are required, they can be stored in a second section, or between a string's terminating NUL byte and the next string (although the latter may lead to encoding/decoding errors).
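A rough sketch of the dual-write idea follows. The helper names are hypothetical, and it assumes little-endian fields, offsets measured from the start of the chunk body, a shared NUL at the start of the string data for empty slots, and UTF-8 in both chunks (a pre-Remastered client may expect a legacy code page in STR instead).

```python
import struct


def build_string_section(strings, wide):
    """Serialize strings as an STR-style (wide=False) or STRx-style (wide=True) body."""
    fmt = "<I" if wide else "<H"          # uint32 vs uint16 count and offsets
    width = struct.calcsize(fmt)
    limit = 0xFFFFFFFF if wide else 0xFFFF
    if len(strings) > limit:
        raise ValueError("too many strings for this chunk type")
    header_size = width * (1 + len(strings))
    data = bytearray(b"\x00")             # shared empty string for unused slots
    offsets = []
    for text in strings:
        if not text:
            offsets.append(header_size)   # point at the shared NUL
            continue
        offsets.append(header_size + len(data))
        data += text.encode("utf-8") + b"\x00"
    if offsets and max(offsets) > limit:
        raise ValueError("string data does not fit the offset width")
    out = bytearray(struct.pack(fmt, len(strings)))
    for offset in offsets:
        out += struct.pack(fmt, offset)
    return bytes(out + data)


def build_both(strings, editor_only):
    """Return (STR body, STRx body); positions in `editor_only` stay empty in STR."""
    legacy = ["" if i in editor_only else s for i, s in enumerate(strings)]
    return (build_string_section(legacy, wide=False),
            build_string_section(strings, wide=True))
```

For example, `build_both(["Map name", "editor comment"], {1})` keeps the second string only in the STRx body, while the STR body still has a slot for it that points at the shared NUL.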
Thoughts?
-------------------------------------------- UPDATE 2019.08.07 -----------------------------
Current thinking is this:
Code
u32 numEntries; // Number of strings in the section (default: 1024). In-game indices are limited to 65535; the count is stored as uint32 for alignment, and indices above 65k may be used by the editor.
u32[numEntries] stringOffsets; // One integer per string giving its offset (where the string starts, measured from the start of the section).
void * stringData; // All strings in the map, one after another, each NUL-terminated. By default the data starts with a single NUL character, which all unused stringOffsets point to. String data should be stored UTF-8 encoded; the game will likely attempt Korean autodetection regardless, but by forcing UTF-8 we don't need to emulate that in the tools.
This chunk requires 'VER ' to be set to one of the Remastered formats.
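For tool authors, reading the layout above could look roughly like this. It is only a sketch: `parse_strx` is a hypothetical name, the input is assumed to be the chunk body without its name/size header, fields are assumed little-endian, and invalid UTF-8 is replaced rather than rejected.

```python
import struct


def parse_strx(body):
    """Decode an STRx chunk body into a list of strings (slot 0 is the first string)."""
    (num_entries,) = struct.unpack_from("<I", body, 0)
    table_end = 4 + 4 * num_entries
    if table_end > len(body):
        raise ValueError("offset table is truncated")
    offsets = struct.unpack_from("<%dI" % num_entries, body, 4)
    strings = []
    for offset in offsets:
        if offset >= len(body):
            raise ValueError("string offset points past the end of the section")
        end = body.find(b"\x00", offset)  # each string is NUL-terminated
        if end == -1:
            end = len(body)               # tolerate a missing final terminator
        strings.append(body[offset:end].decode("utf-8", errors="replace"))
    return strings
```

Unused slots that point at the leading NUL simply decode to empty strings, so the tool does not need any special case for them.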
Post has been edited 3 time(s), last time on Aug 7 2019, 9:38 pm by Suicidal Insanity.