Skip to content

Add ability to report positions relative to root stream and add PositionInfo interface #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Feb 23, 2020

@@ -162,7 +191,7 @@ public void seek(long newPos) {
}

@Override
public int pos() {
public long pos() {
Copy link
Member

@generalmimon generalmimon Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you want to add support for >2GB offsets for RandomAccessFileKaitaiStream, but I'm afraid it's not that simple, it deserves more careful consideration. The problem is that long is not automatically coercible to int, and int is the target type for CalcIntType in KS. CalcIntType is the default integer type for all expressions - in fact, almost any integer value instance will have that type.

Switching _io.pos type to u8 (long) requires additional changes in KSC - whenever you use _io.pos in expression, the derived type must be long, not int. The same applies to _io.size - the fact that it's declared as long is a bit heedless. Right now, when you reference _io.pos in a value instance like this:

instances:
  v:
    value: _io.size

despite the _io.size is declared as

it will be truncated to int anyway:

    public Integer v() {
        if (this.v != null)
            return this.v;
        int _tmp = (int) (_io().size());
        this.v = _tmp;
        return this.v;
    }

Another solution would be switch target type of CalcIntType from int to long (see kaitai-io/kaitai_struct#510), but it also requires a careful consideration - as I understand, long cannot be used everywhere in Java, for example in array indices. Morever, Java conventions suggest to use int whenever possible, and it should be probably also more efficient at least on 32-bit platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. All that you say is applicable to the size, but pos, it seems, is not used in the array subscription expressions... At least none of the https://github.com/kaitai-io/kaitai_struct_tests fail, but I'm not known if these are ALL tests...

Also, it seems that pos() isn't used much in the computations: if you try to find usages of the pos in the https://github.com/kaitai-io/kaitai_struct_formats repo (RegExp: pos[^:_a-z]) you'll find only 16 occurrences that used only with conjunction with the size (which is already long) or plain number. So at least that change seems to not break anything.

Results from VS Code
Результаты: 16 - Файлы 12

kaitai_struct • formats\archive\gzip.ksy:
  64    - id: body
  65:     size: _io.size - _io.pos - 8
  66      doc: |

kaitai_struct • formats\archive\phar_without_stub.ksy:
  300        - id: data
  301:         size: _io.size - _io.pos - 8
  302          doc: |

kaitai_struct • formats\database\tsm.ksy:
  38          repeat: until
  39:         repeat-until: _io.pos == _io.size - 8
  40          type: index_header

kaitai_struct • formats\filesystem\vdi.ksy:
  159              type: geometry
  160:             if: _parent.version.major>=1 and _io.pos + 16 <= _io.size
  161  

kaitai_struct • formats\hardware\mifare\mifare_classic.ksy:
  33          -orig-id: abtData
  34:         size: _io.size - _io.pos - 16 # sizeof(trailer)
  35          type: filler

kaitai_struct • formats\log\systemd_journal.ksy:
  134        - id: padding
  135:         size: (8 - _io.pos) % 8
  136        - id: object_type

kaitai_struct • formats\media\android_opengl_shaders_cache.ksy:
  25        - id: alignment
  26:         size: "(_io.pos + 3) & ~3 - _io.pos"
  27          doc: garbage from memory

kaitai_struct • formats\media\blender_blend.ksy:
  114        - id: padding_1
  115:         size: (4 - _io.pos) % 4
  116  

  128        - id: padding_2
  129:         size: (4 - _io.pos) % 4
  130  

  139        - id: padding_3
  140:         size: (4 - _io.pos) % 4
  141  

kaitai_struct • formats\media\id3v2_3.ksy:
  26          repeat: until
  27:         repeat-until: _io.pos + _.size > header.size.value or _.is_invalid
  28        - id: padding

kaitai_struct • formats\media\id3v2_4.ksy:
  27          repeat: until
  28:         repeat-until: _io.pos + _.size.value > header.size.value or _.is_invalid
  29        - id: padding

kaitai_struct • formats\network\rtp_packet.ksy:
  37    - id: data
  38:     size: _io.size - _io.pos - len_padding
  39      doc: Payload without padding.

kaitai_struct • formats\windows\windows_resource_file.ksy:
  63        - id: padding1
  64:         size: (4 - _io.pos) % 4
  65        - id: format_version

  84        - id: padding2
  85:         size: (4 - _io.pos) % 4
  86      instances:

So, if you conclusion about pos() type wouldn't changed, just say and I'll revert this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@generalmimon, could you comment on this?

@Mingun
Copy link
Contributor Author

Mingun commented Nov 16, 2020

@generalmimon, thanks for review! I post some answers with explanation of my minds when I made these changes and if you find they unconvincing, just say and I'll apply your suggestions.

Also, over the time, I'm noticed, that usually things like Region called Span (ex: C#, Rust), so I'm going to change names accordingly

@Mingun Mingun requested a review from generalmimon November 16, 2020 14:16
@generalmimon
Copy link
Member

There's also an important thing to consider: creating a stream on processed bytes, for example with process: zlib (kudos to @GreyCat and his kaitai-io/kaitai_struct#331 (comment) that reminded me of it). The decompression gives you a brand new block of data and the bytes and stream positions don't map to anything in the original root file stream. In fact, you get another root stream.

@Mingun
Copy link
Contributor Author

Mingun commented Nov 17, 2020

Good point! But need to notice, that the such problem exists in the web-ide and maybe in the https://github.com/kaitai-io/kaitai_struct_visualizer.

To solve that problem we can track which root Span belongs to, for example, by assign unique identifiers to the each root, created by the process key, or for the each Span to store its parent Span:

class Span {
  private final long offset;
  private final long long start;
  private long end = -1;


  private final int rootCategory;
  //-OR-
  private final Span parent;
  ...
}

Each root Span will have root = null

@Mingun Mingun force-pushed the absolute-positions branch from 2a11087 to bac8e01 Compare November 19, 2020 17:04
@Mingun
Copy link
Contributor Author

Mingun commented Nov 19, 2020

Implementation of tags for different stream origins I'll push later. Anyway, it's needed only for correct interpretation of offsets, so maybe even another PR will be preferrable

@Mingun Mingun force-pushed the absolute-positions branch from 57362ea to de66269 Compare March 8, 2024 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants