[Bug] RowId mismatch in file and metadata #6747

### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.


### Paimon version

Master

### Compute Engine

Spark

### Minimal reproduce step

```scala
// first part
spark.sql("CREATE TABLE t (id INT, data INT) TBLPROPERTIES ('row-tracking.enabled' = 'true')")
spark.sql("INSERT INTO t SELECT /*+ REPARTITION(1) */ id, id AS data FROM range(1, 4)")

// second part
spark.sql("UPDATE t SET data = 22 WHERE id = 2")

// third part
spark.sql("INSERT INTO t VALUES (4, 4), (5, 5)")
spark.sql("SELECT *, _ROW_ID, _SEQUENCE_NUMBER FROM t").show
/* the result of select
+---+----+-------+----------------+
| id|data|_ROW_ID|_SEQUENCE_NUMBER|
+---+----+-------+----------------+
|  1|   1|      0|               1|
|  2|  22|      1|               2|
|  3|   3|      2|               1|
|  4|   4|      6|               3|
|  5|   5|      7|               3|
+---+----+-------+----------------+
*/
```

### What doesn't meet your expectations?

When the second part of the code above (the update) runs, the original rows are read from the old file and written to a new file together with their `_ROW_ID` and `_SEQUENCE_NUMBER`. The new file therefore stores both `_ROW_ID` and `_SEQUENCE_NUMBER`, but the `firstRowId` in its file metadata is null. Later, in the commit phase, that `firstRowId` is assigned from the snapshot's `nextRowId`. The row IDs stored in the file then no longer match the metadata. As a result, a query by row ID may miss records, because Paimon core uses the `firstRowId` in the metadata to skip files when generating a scan plan.
In addition, when the third part of the code (the insert) runs, the same issue gives the newly inserted rows unexpected `_ROW_ID` values (6 and 7 in the output above).
A visualization of this issue is provided below.
<img width="1478" height="492" alt="Image" src="https://github.com/user-attachments/assets/77c15db7-3109-4e2d-9d35-875f319a207a" />
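
The three steps can be traced with a small, self-contained Scala model. This is only a sketch of the behaviour described above, not Paimon code: `DataFile`, `commit`, `plan`, and `visibleRowIds` are hypothetical names, and the model assumes that a plain insert derives `_ROW_ID` from `firstRowId` plus the row position, while the file rewritten by the update stores `_ROW_ID` explicitly.

```scala
object RowIdMismatchSketch {
  // Hypothetical stand-in for a data file and its metadata entry (not a Paimon class).
  final case class DataFile(
      name: String,
      rowCount: Long,
      storedRowIds: Option[Seq[Long]], // Some(...) only when _ROW_ID is written into the file
      firstRowId: Option[Long]) {      // metadata value; None until it is assigned

    // Row ids a reader reports: stored values win, otherwise they are derived from metadata.
    def visibleRowIds: Seq[Long] = storedRowIds.getOrElse {
      firstRowId match {
        case Some(first) => (first until first + rowCount).toList
        case None        => Nil
      }
    }
  }

  // Commit phase as described in the issue: a file whose firstRowId is empty
  // takes the snapshot's nextRowId, and nextRowId advances by the file's row count.
  def commit(file: DataFile, nextRowId: Long): (DataFile, Long) =
    file.firstRowId match {
      case Some(_) => (file, nextRowId)
      case None    => (file.copy(firstRowId = Some(nextRowId)), nextRowId + file.rowCount)
    }

  // Scan planning by row id: keep only files whose metadata range covers the id.
  def plan(files: Seq[DataFile], rowId: Long): Seq[DataFile] =
    files.filter { f =>
      f.firstRowId.exists(first => rowId >= first && rowId < first + f.rowCount)
    }

  // Part 1: INSERT of three rows; row ids are derived from metadata.
  val (fileA, afterInsert) = commit(DataFile("A", 3, None, None), 0L)

  // Part 2: UPDATE rewrites file A into file B, which stores row ids 0, 1, 2
  // but reaches the commit phase with an empty firstRowId.
  val (fileB, afterUpdate) =
    commit(DataFile("B", 3, Some(Seq(0L, 1L, 2L)), None), afterInsert)

  // Part 3: INSERT of two more rows.
  val (fileC, afterSecondInsert) = commit(DataFile("C", 2, None, None), afterUpdate)

  // File A was replaced by file B, so only B and C remain visible.
  val liveFiles: Seq[DataFile] = Seq(fileB, fileC)
}
```

In this model `RowIdMismatchSketch.plan(liveFiles, 1L)` returns no files even though file B still stores row ID 1, and file C is assigned the range starting at 6, which matches the `_ROW_ID` values 6 and 7 in the output above.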
The same issue affects the merge into operation (when only `'row-tracking.enabled' = 'true'` is set). A possible fix is to assign the `firstRowId` in the metadata during the write phase for update and merge into scenarios, rather than deferring it to the commit phase.
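
A minimal sketch of what write-phase assignment could look like, using the same kind of toy model rather than Paimon's real classes. `FileMeta`, `rewrite`, `commit`, and `covers` are hypothetical names, and the sketch assumes the update rewrites every row of the source file in its original order, so the new file can simply inherit the source file's `firstRowId`.

```scala
object WritePhaseAssignmentSketch {
  // Hypothetical metadata entry with only the fields needed here (not a Paimon class).
  final case class FileMeta(name: String, rowCount: Long, firstRowId: Option[Long])

  // Commit phase: assign a fresh range only when the writer left firstRowId empty.
  def commit(file: FileMeta, nextRowId: Long): (FileMeta, Long) =
    file.firstRowId match {
      case Some(_) => (file, nextRowId)
      case None    => (file.copy(firstRowId = Some(nextRowId)), nextRowId + file.rowCount)
    }

  // Write phase of an UPDATE (assumed behaviour): the rewritten file
  // inherits the firstRowId of the file it replaces.
  def rewrite(source: FileMeta, newName: String): FileMeta =
    source.copy(name = newName)

  // True when the file's metadata range contains the given row id.
  def covers(file: FileMeta, rowId: Long): Boolean =
    file.firstRowId.exists(first => rowId >= first && rowId < first + file.rowCount)

  // Part 1: INSERT of three rows.
  val (fileA, afterInsert) = commit(FileMeta("A", 3, None), 0L)

  // Part 2: UPDATE; firstRowId is already set, so the commit leaves it alone.
  val (fileB, afterUpdate) = commit(rewrite(fileA, "B"), afterInsert)

  // Part 3: INSERT of two more rows.
  val (fileC, afterSecondInsert) = commit(FileMeta("C", 2, None), afterUpdate)
}
```

Under these assumptions file B keeps `firstRowId` 0, `nextRowId` stays at 3 after the update, and the rows of the later insert start at row ID 3 instead of 6.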

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
