diff --git a/docs/template-fs.md b/docs/template-fs.md index 13df35f..627ccec 100644 --- a/docs/template-fs.md +++ b/docs/template-fs.md @@ -1,6 +1,6 @@ # TemplateFs -The `TemplateFs` class is designed for working with Excel (`.xlsx`) templates. It supports extracting, modifying, and rebuilding Excel files. Typical use cases include placeholder substitution, sheet duplication, and row insertion. +The `TemplateFs` class is designed for working with Excel (`.xlsx`) templates extracted to the filesystem. It enables modifying templates, inserting rows, substituting placeholders, and saving either as a `Buffer` or directly into a writable stream. > ⚠️ **Experimental API** > Interface is subject to change in future versions. @@ -13,8 +13,12 @@ The `TemplateFs` class is designed for working with Excel (`.xlsx`) templates. I new TemplateFs(fileKeys: Set, destination: string) ``` -- `fileKeys` — a set of relative file paths that make up the Excel template. -- `destination` — a path to a directory where the template is extracted and edited. +- Input: + - `fileKeys` — a set of relative file paths representing the `.xlsx` file structure. + - `destination` — path to a directory where the template is extracted and modified. +- Output: `TemplateFs` instance +- Preconditions: None +- Postconditions: Instance is ready for use with provided files. > Prefer using the static method `TemplateFs.from()` to create instances. @@ -22,42 +26,65 @@ new TemplateFs(fileKeys: Set, destination: string) ## 📄 Properties -- `fileKeys: Set` — the set of template file paths involved in final assembly. -- `destination: string` — the working directory where files are extracted and edited. +- `fileKeys: Set` — set of template file paths used for rebuilding the `.xlsx`. +- `destination: string` — working directory for extracted and modified files. - `destroyed: boolean` — indicates whether the instance has been destroyed (read-only). 
--- ## 📚 Methods -### `copySheet(sourceName: string, newName: string): Promise` +### `copySheet` Creates a copy of an existing worksheet with a new name. -- `sourceName` — the name of the existing sheet. -- `newName` — the name for the new sheet. +- Input: + - `sourceName: string` — name of the existing sheet. + - `newName: string` — name for the new sheet. +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - `sourceName` exists + - `newName` does not exist +- Postconditions: + - New sheet created with content from source + - Sheet relationships updated - Throws if: - `sourceName` does not exist. - `newName` already exists. --- -### `substitute(sheetName: string, replacements: Record): Promise` +### `substitute` -Replaces placeholders of the form `${key}` with values from the `replacements` object. For arrays use placeholders with key `${table:key}` +Replaces placeholders of the form `${key}` with values from the `replacements` object. For arrays, use placeholders with key `${table:key}`. -- `sheetName` — the name of the worksheet. -- `replacements` — key-value map for substitution. +- Input: + - `sheetName: string` — name of worksheet + - `replacements: Record` — key-value map for substitution +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists +- Postconditions: + - Placeholders replaced with values + - Shared strings updated if needed ---- - -### `insertRows(data: { sheetName: string; startRowNumber?: number; rows: unknown[][] }): Promise` +### `insertRows` Inserts rows into a specified worksheet. -- `sheetName` — name of the worksheet. -- `startRowNumber` — starting row index (default: append to the end). -- `rows` — array of arrays, each representing a row of values. 
+- Input: + - `data: { sheetName: string; startRowNumber?: number; rows: unknown[][] }` +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists + - Row number valid if specified + - Cells within bounds +- Postconditions: + - Rows inserted at specified position + - Sheet data updated - Throws if: - The sheet does not exist. - The row number is invalid. @@ -65,61 +92,92 @@ Inserts rows into a specified worksheet. --- -### `insertRowsStream(data: { sheetName: string; startRowNumber?: number; rows: AsyncIterable }): Promise` +### `insertRowsStream` Streams and inserts rows into a worksheet, useful for handling large datasets. -- `sheetName` — name of the worksheet. -- `startRowNumber` — starting row index (default: append to the end). -- `rows` — an async iterable where each item is an array of cell values. +- Input: + - `data: { sheetName: string; startRowNumber?: number; rows: AsyncIterable }` +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists + - Row number valid if specified + - Cells within bounds +- Postconditions: + - Rows streamed and inserted + - Sheet data updated - Same error conditions as `insertRows`. --- -### `save(): Promise` +### `save` Generates a new Excel file and returns it as a `Buffer`. -- Returns: `Promise` — the full `.xlsx` file contents in memory. +- Input: None +- Output: `Promise` — the full `.xlsx` file contents in memory. +- Preconditions: + - Instance not destroyed +- Postconditions: + - Instance marked as destroyed + - Temporary files removed if necessary + - ZIP archive created - Throws if: - The instance has been destroyed. - - There was a failure while rebuilding the ZIP archive. + - Failure occurs while rebuilding the ZIP archive. --- -### `saveStream(output: Writable): Promise` +### `saveStream` -Writes the resulting Excel file to a writable stream. +Writes the resulting Excel file directly to a writable stream. -- `output` — any writable stream, e.g. 
a file or HTTP response. +- Input: + - `output: Writable` — target writable stream (e.g., file, HTTP response). +- Output: `Promise` +- Preconditions: + - Instance not destroyed +- Postconditions: + - Excel file streamed to output - Throws if: - The instance has been destroyed. - - There was a failure during streaming or rebuilding the ZIP archive. + - Streaming or rebuilding fails. --- -### `validate(): Promise` +### `validate` -Validates the template by checking all required files exist. +Validates the internal state by checking if all required files exist. -- Returns: `Promise` -- Throws: - - If the template instance has been destroyed. - - If any required files are missing. +- Input: None +- Output: `Promise` +- Preconditions: + - Instance not destroyed +- Postconditions: + - Missing files detected (if any) +- Throws if: + - The instance has been destroyed. + - Any required file is missing. --- -### `set(key: string, content: Buffer | string): Promise` +### `set` -Replaces the contents of a file in the template. +Replaces the content of a specific file in the template. -- `key` — the relative path of the file within the Excel package (e.g., `xl/worksheets/sheet1.xml`). -- `content` — the new file content as a `Buffer` or `string`. - -- Returns: `Promise` -- Throws: - - If the template instance has been destroyed. - - If the file does not exist in the template. +- Input: + - `key: string` — relative Excel path (e.g., `xl/worksheets/sheet1.xml`) + - `content: Buffer | string` — new file content +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - File exists +- Postconditions: + - File content updated +- Throws if: + - The instance has been destroyed. + - The file does not exist. --- @@ -214,5 +272,6 @@ await template.saveStream(outputStream); Methods perform validation: -- Ensures the instance hasn't been destroyed. -- Prevents concurrent modifications. +- Ensure the instance hasn't been destroyed. +- Prevent concurrent modifications. 
+- Ensure required files are present before saving. diff --git a/docs/template-memory.md b/docs/template-memory.md index c09f3e4..267221f 100644 --- a/docs/template-memory.md +++ b/docs/template-memory.md @@ -13,7 +13,10 @@ The `TemplateMemory` class is designed for working with Excel (`.xlsx`) template new TemplateMemory(files: Record) ``` -- `files` — a map of file paths to their contents as `Buffer`s, representing the `.xlsx` file structure. +- Input: `files` — a map of file paths to their contents as `Buffer`s, representing the `.xlsx` file structure. +- Output: `TemplateMemory` instance +- Preconditions: None +- Postconditions: Instance is ready for use with provided files > Prefer using the static method `TemplateMemory.from()` to create instances. @@ -28,34 +31,61 @@ new TemplateMemory(files: Record) ## 📚 Methods -### `copySheet(sourceName: string, newName: string): Promise` +### `copySheet` Creates a copy of an existing worksheet with a new name. -- `sourceName` — the name of the existing sheet. -- `newName` — the name for the new sheet. +- Input: + - `sourceName: string` - name of existing sheet + - `newName: string` - name for new sheet +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - `sourceName` exists + - `newName` does not exist +- Postconditions: + - New sheet created with content from source + - Sheet relationships updated - Throws if: - `sourceName` does not exist. - `newName` already exists. --- -### `substitute(sheetName: string, replacements: Record): Promise` +### `substitute` Replaces placeholders of the form `${key}` with values from the `replacements` object. For arrays, use placeholders with key `${table:key}`. -- `sheetName` — the name of the worksheet. -- `replacements` — key-value map for substitution. 
+- Input: + - `sheetName: string` - name of worksheet + - `replacements: Record` - key-value map for substitution +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists +- Postconditions: + - Placeholders replaced with values + - Shared strings updated if needed --- -### `insertRows(data: { sheetName: string; startRowNumber?: number; rows: unknown[][] }): Promise` +### `insertRows` Inserts rows into a specified worksheet. -- `sheetName` — name of the worksheet. -- `startRowNumber` — starting row index (default: append to the end). -- `rows` — array of arrays, each representing a row of values. +- Input: + - `sheetName: string` - name of worksheet + - `startRowNumber?: number` - starting row index (default: append to the end). + - `rows: unknown[][]` - array of arrays, each representing a row of values. +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists + - Row number valid if specified + - Cells within bounds +- Postconditions: + - Rows inserted at specified position + - Sheet data updated - Throws if: - The sheet does not exist. - The row number is invalid. @@ -63,50 +93,84 @@ Inserts rows into a specified worksheet. --- -### `insertRowsStream(data: { sheetName: string; startRowNumber?: number; rows: AsyncIterable }): Promise` +### `insertRowsStream` Streams and inserts rows into a worksheet, useful for handling large datasets. -- `sheetName` — name of the worksheet. -- `startRowNumber` — starting row index (default: append to the end). -- `rows` — an async iterable where each item is an array of cell values. +- Input: + - `sheetName: string` - name of worksheet + - `startRowNumber?: number` - starting row index (default: append to the end). + - `rows: AsyncIterable` - an async iterable where each item is an array of cell values. 
+- Output: `Promise` +- Preconditions: + - Instance not destroyed + - Sheet exists + - Row number valid if specified + - Cells within bounds +- Postconditions: + - Rows streamed and inserted + - Sheet data updated - Same error conditions as `insertRows`. --- -### `save(): Promise` +### `save` Generates a new Excel file and returns it as a `Buffer`. -- Returns: `Promise` — the full `.xlsx` file contents in memory. +- Input: None +- Output: `Promise` — the full `.xlsx` file contents in memory. +- Preconditions: + - Instance not destroyed +- Postconditions: + - Instance marked as destroyed + - All buffers cleared + - ZIP archive created - Throws if: - The instance has been destroyed. - There was a failure while rebuilding the ZIP archive. --- -### `set(key: string, content: Buffer): Promise` +### `set` Replaces the content of a specific file in the template. -- `key` — the Excel path of the file (e.g., `xl/worksheets/sheet1.xml`). -- `content` — new file content as a Buffer. +- Input: + - `key: string` — the Excel path of the file (e.g., `xl/worksheets/sheet1.xml`). + - `content: Buffer` - new file content as a Buffer. +- Output: `Promise` +- Preconditions: + - Instance not destroyed + - File exists +- Postconditions: + - File content updated - Throws if: - The instance has been destroyed. - The file does not exist in the template. --- -### `mergeSheets(data: { additions: { sheetIndexes?: number[]; sheetNames?: string[] }; baseSheetIndex?: number; baseSheetName?: string; gap?: number }): void` +### `mergeSheets` Merges multiple worksheets into a single base worksheet. +- Input: - `additions` — defines the sheets to merge: - - `sheetIndexes` — array of 1-based sheet indexes to merge. - - `sheetNames` — array of sheet names to merge. -- `baseSheetIndex` — 1-based index of the base sheet to merge into (optional, default is 1). -- `baseSheetName` — name of the base sheet to merge into (optional). 
-- `gap` — number of empty rows to insert between merged sections (default: `0`). + - `additions.sheetIndexes?: number[]` — array of 1-based sheet indexes to merge. + - `additions.sheetNames?: string[]` — array of sheet names to merge. + - `baseSheetIndex?: number` — 1-based index of the base sheet to merge into (optional, default is 1). + - `baseSheetName?: string` — name of the base sheet to merge into (optional). + - `gap?: number` - number of empty rows to insert between merged sections (default: `0`). +- Output: `void` +- Preconditions: + - Instance not destroyed + - Valid sheet names/indexes + - Either baseSheetIndex or baseSheetName defined +- Postconditions: + - Sheets merged into base sheet + - Row numbers adjusted + - Merge cells updated - Throws if: - The instance is destroyed. - Invalid sheet names or indexes are provided. @@ -114,12 +178,22 @@ Merges multiple worksheets into a single base worksheet. --- -### `removeSheets(data: { sheetNames?: string[]; sheetIndexes?: number[] }): void` +### `removeSheets` Removes worksheets from the workbook. -- `sheetNames` — array of sheet names to remove. -- `sheetIndexes` — array of 1-based sheet indexes to remove. +- Input: + - `sheetNames?: string[]` - names of sheets to remove + - `sheetIndexes?: number[]` - 1-based indexes of sheets to remove +- Output: `void` +- Preconditions: + - Instance not destroyed + - Sheets exist + - Either sheetNames or sheetIndexes provided +- Postconditions: + - Sheets removed + - Workbook relationships updated + - Content types updated - Throws if: - The instance is destroyed. - Sheet names or indexes do not exist. 
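
The `mergeSheets` postconditions documented above are: row numbers adjusted, merge cells updated, and `gap` empty rows between sections. They match the offset arithmetic visible in `mergeSheetsToBaseFileProcessSync` later in this diff:

- The first appended section starts at the base sheet's last row number plus `gap`.
- Each later section advances the offset by that section's row count plus `gap`.

The TypeScript below is a minimal sketch of that arithmetic only, not the library's code. Several assumptions apply:

- `shiftCellRef`, `shiftMergeRef`, `planMerge` and `SheetSection` are hypothetical names.
- Only plain references such as `B3` are handled (no `$` anchors).
- Each section's size comes from a single `lastRowNumber` field rather than from parsed row elements.

```typescript
// Hypothetical helper: move a plain cell reference such as "B3" down by `offset` rows.
// Assumes no "$" anchors and no sheet-qualified references.
function shiftCellRef(ref: string, offset: number): string {
  const match = /^([A-Z]+)(\d+)$/.exec(ref);
  if (!match) {
    throw new Error(`Unsupported cell reference: ${ref}`);
  }
  const [, column, row] = match;
  return `${column}${Number(row) + offset}`;
}

// Hypothetical helper: shift a merge range such as "A1:B2". A ref without ":" is
// returned unchanged, like the merge loop in this diff does when `split(":")`
// yields no end.
function shiftMergeRef(ref: string, offset: number): string {
  const [start, end] = ref.split(":");
  if (!start || !end) {
    return ref;
  }
  return `${shiftCellRef(start, offset)}:${shiftCellRef(end, offset)}`;
}

// Simplified view of one sheet: its highest row number and its merge ranges.
interface SheetSection {
  lastRowNumber: number;
  mergeRefs: string[];
}

// Computes where each appended section starts and how its merge ranges move.
// Row r of an appended section lands at r + offset, so the `gap` rows in
// between stay empty.
function planMerge(
  base: SheetSection,
  additions: SheetSection[],
  gap = 0,
): { offsets: number[]; mergeRefs: string[] } {
  const offsets: number[] = [];
  const mergeRefs = [...base.mergeRefs];
  let currentRowOffset = base.lastRowNumber + gap;

  for (const section of additions) {
    offsets.push(currentRowOffset);
    mergeRefs.push(...section.mergeRefs.map(ref => shiftMergeRef(ref, currentRowOffset)));
    // Advance past this section plus the requested gap.
    currentRowOffset += section.lastRowNumber + gap;
  }

  return { offsets, mergeRefs };
}
```

Take a base sheet whose last row is 3, with `gap = 1`. Two appended sections of 2 and 4 rows start at offsets 4 and 7. Their merge ranges `A1:C1` and `B2:B3` become `A5:C5` and `B9:B10`, and rows 4 and 7 stay empty.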
diff --git a/package-lock.json b/package-lock.json index d0429d0..203b9ff 100644 --- a/package-lock.json +++ b/package-lock.json @@ -8,9 +8,6 @@ "name": "@js-ak/excel-toolbox", "version": "1.5.0", "license": "MIT", - "dependencies": { - "pako": "2.1.0" - }, "devDependencies": { "@semantic-release/changelog": "6.0.3", "@semantic-release/commit-analyzer": "13.0.0", @@ -20,7 +17,6 @@ "@semantic-release/release-notes-generator": "14.0.0", "@stylistic/eslint-plugin-ts": "4.2.0", "@types/node": "22.14.0", - "@types/pako": "2.0.3", "@vitest/coverage-v8": "3.1.2", "eslint": "9.24.0", "eslint-plugin-sort-destructure-keys": "2.0.0", @@ -2119,13 +2115,6 @@ "integrity": "sha512-ehPtgRgaULsFG8x0NeYJvmyH1hmlfsNLujHe9dQEia/7MAJYdzMSi19JtchUHjmBA6XC/75dK55mzZH+RyieSg==", "dev": true }, - "node_modules/@types/pako": { - "version": "2.0.3", - "resolved": "https://registry.npmjs.org/@types/pako/-/pako-2.0.3.tgz", - "integrity": "sha512-bq0hMV9opAcrmE0Byyo0fY3Ew4tgOevJmQ9grUhpXQhYfyLJ1Kqg3P33JT5fdbT2AjeAjR51zqqVjAL/HMkx7Q==", - "dev": true, - "license": "MIT" - }, "node_modules/@types/semver": { "version": "7.5.8", "resolved": "https://registry.npmjs.org/@types/semver/-/semver-7.5.8.tgz", @@ -7854,12 +7843,6 @@ "dev": true, "license": "BlueOak-1.0.0" }, - "node_modules/pako": { - "version": "2.1.0", - "resolved": "https://registry.npmjs.org/pako/-/pako-2.1.0.tgz", - "integrity": "sha512-w+eufiZ1WuJYgPXbV/PO3NCMEc3xqylkKHzp8bxp1uW4qaSNQUkwmLLEc3kKsfz8lpV1F8Ht3U1Cm+9Srog2ug==", - "license": "(MIT AND Zlib)" - }, "node_modules/parent-module": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/parent-module/-/parent-module-1.0.1.tgz", @@ -11078,12 +11061,6 @@ "integrity": "sha512-ehPtgRgaULsFG8x0NeYJvmyH1hmlfsNLujHe9dQEia/7MAJYdzMSi19JtchUHjmBA6XC/75dK55mzZH+RyieSg==", "dev": true }, - "@types/pako": { - "version": "2.0.3", - "resolved": "https://registry.npmjs.org/@types/pako/-/pako-2.0.3.tgz", - "integrity": 
"sha512-bq0hMV9opAcrmE0Byyo0fY3Ew4tgOevJmQ9grUhpXQhYfyLJ1Kqg3P33JT5fdbT2AjeAjR51zqqVjAL/HMkx7Q==", - "dev": true - }, "@types/semver": { "version": "7.5.8", "resolved": "https://registry.npmjs.org/@types/semver/-/semver-7.5.8.tgz", @@ -14993,11 +14970,6 @@ "integrity": "sha512-UEZIS3/by4OC8vL3P2dTXRETpebLI2NiI5vIrjaD/5UtrkFX/tNbwjTSRAGC/+7CAo2pIcBaRgWmcBBHcsaCIw==", "dev": true }, - "pako": { - "version": "2.1.0", - "resolved": "https://registry.npmjs.org/pako/-/pako-2.1.0.tgz", - "integrity": "sha512-w+eufiZ1WuJYgPXbV/PO3NCMEc3xqylkKHzp8bxp1uW4qaSNQUkwmLLEc3kKsfz8lpV1F8Ht3U1Cm+9Srog2ug==" - }, "parent-module": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/parent-module/-/parent-module-1.0.1.tgz", diff --git a/package.json b/package.json index 359e828..c6a46a7 100644 --- a/package.json +++ b/package.json @@ -70,7 +70,6 @@ "@semantic-release/release-notes-generator": "14.0.0", "@stylistic/eslint-plugin-ts": "4.2.0", "@types/node": "22.14.0", - "@types/pako": "2.0.3", "@vitest/coverage-v8": "3.1.2", "eslint": "9.24.0", "eslint-plugin-sort-destructure-keys": "2.0.0", @@ -80,8 +79,5 @@ "typescript": "5.8.3", "typescript-eslint": "8.29.0", "vitest": "3.1.2" - }, - "dependencies": { - "pako": "2.1.0" } } diff --git a/src/lib/merge-sheets-to-base-file-process-sync.ts b/src/lib/merge-sheets-to-base-file-process-sync.ts new file mode 100644 index 0000000..9fda7b9 --- /dev/null +++ b/src/lib/merge-sheets-to-base-file-process-sync.ts @@ -0,0 +1,122 @@ +import * as Utils from "./utils/index.js"; +import * as Xml from "./xml/index.js"; + +/** + * Merges rows from other Excel files into a base Excel file. + * + * This function is a process-friendly version of mergeSheetsToBaseFile. 
+ * It takes a single object with the following properties: + * - additions: An array of objects with two properties: + * - files: A dictionary of file paths to their corresponding XML content + * - sheetIndexes: The 1-based indexes of the sheets to extract rows from + * - baseFiles: A dictionary of file paths to their corresponding XML content + * - baseSheetIndex: The 1-based index of the sheet in the base file to add rows to + * - gap: The number of empty rows to insert between each added section + * - sheetNamesToRemove: The names of sheets to remove from the output file + * - sheetsToRemove: The 1-based indices of sheets to remove from the output file + * + * The function returns nothing; it mutates `baseFiles` in place, replacing the base sheet XML and deleting the removed sheets. + */ +export function mergeSheetsToBaseFileProcessSync(data: { + additions: { files: Record; sheetIndexes: number[] }[]; + baseFiles: Record; + baseSheetIndex: number; + gap: number; + sheetNamesToRemove: string[]; + sheetsToRemove: number[]; +}): void { + const { + additions, + baseFiles, + baseSheetIndex, + gap, + sheetNamesToRemove, + sheetsToRemove, + } = data; + + const basePath = `xl/worksheets/sheet${baseSheetIndex}.xml`; + + if (!baseFiles[basePath]) { + throw new Error(`Base file does not contain ${basePath}`); + } + + const { + lastRowNumber, + mergeCells: baseMergeCells, + rows: baseRows, + xml, + } = Xml.extractRowsFromSheetSync(baseFiles[basePath]); + + const allRows = [...baseRows]; + const allMergeCells = [...baseMergeCells]; + let currentRowOffset = lastRowNumber + gap; + + for (const { files, sheetIndexes } of additions) { + for (const sheetIndex of sheetIndexes) { + const sheetPath = `xl/worksheets/sheet${sheetIndex}.xml`; + + if (!files[sheetPath]) { + throw new Error(`File does not contain ${sheetPath}`); + } + + const { mergeCells, rows } = Xml.extractRowsFromSheetSync(files[sheetPath]); + + const shiftedRows = Xml.shiftRowIndices(rows, 
currentRowOffset); + + const shiftedMergeCells = 
mergeCells.map(cell => { + const [start, end] = cell.ref.split(":"); + + if (!start || !end) { + return cell; + } + + const shiftedStart = Utils.shiftCellRef(start, currentRowOffset); + const shiftedEnd = Utils.shiftCellRef(end, currentRowOffset); + + return { ...cell, ref: `${shiftedStart}:${shiftedEnd}` }; + }); + + allRows.push(...shiftedRows); + allMergeCells.push(...shiftedMergeCells); + currentRowOffset += Utils.getMaxRowNumber(rows) + gap; + } + } + + const mergedXml = Xml.buildMergedSheet( + xml, + allRows, + allMergeCells, + ); + + baseFiles[basePath] = mergedXml; + + for (const sheetIndex of sheetsToRemove) { + const sheetPath = `xl/worksheets/sheet${sheetIndex}.xml`; + delete baseFiles[sheetPath]; + + if (baseFiles["xl/workbook.xml"]) { + baseFiles["xl/workbook.xml"] = Buffer.from(Utils.removeSheetFromWorkbook( + baseFiles["xl/workbook.xml"].toString(), + sheetIndex, + )); + } + + if (baseFiles["xl/_rels/workbook.xml.rels"]) { + baseFiles["xl/_rels/workbook.xml.rels"] = Buffer.from(Utils.removeSheetFromRels( + baseFiles["xl/_rels/workbook.xml.rels"].toString(), + sheetIndex, + )); + } + + if (baseFiles["[Content_Types].xml"]) { + baseFiles["[Content_Types].xml"] = Buffer.from(Utils.removeSheetFromContentTypes( + baseFiles["[Content_Types].xml"].toString(), + sheetIndex, + )); + } + } + + for (const sheetName of sheetNamesToRemove) { + Utils.removeSheetByName(baseFiles, sheetName); + } +} diff --git a/src/lib/merge-sheets-to-base-file-process.ts b/src/lib/merge-sheets-to-base-file-process.ts index a0e15fa..5b602fe 100644 --- a/src/lib/merge-sheets-to-base-file-process.ts +++ b/src/lib/merge-sheets-to-base-file-process.ts @@ -17,14 +17,14 @@ import * as Xml from "./xml/index.js"; * * The function returns a dictionary of file paths to their corresponding XML content. 
*/ -export function mergeSheetsToBaseFileProcess(data: { +export async function mergeSheetsToBaseFileProcess(data: { additions: { files: Record; sheetIndexes: number[] }[]; baseFiles: Record; baseSheetIndex: number; gap: number; sheetNamesToRemove: string[]; sheetsToRemove: number[]; -}): void { +}): Promise { const { additions, baseFiles, @@ -45,7 +45,7 @@ export function mergeSheetsToBaseFileProcess(data: { mergeCells: baseMergeCells, rows: baseRows, xml, - } = Xml.extractRowsFromSheet(baseFiles[basePath]); + } = await Xml.extractRowsFromSheet(baseFiles[basePath]); const allRows = [...baseRows]; const allMergeCells = [...baseMergeCells]; @@ -59,7 +59,7 @@ export function mergeSheetsToBaseFileProcess(data: { throw new Error(`File does not contain ${sheetPath}`); } - const { mergeCells, rows } = Xml.extractRowsFromSheet(files[sheetPath]); + const { mergeCells, rows } = await Xml.extractRowsFromSheet(files[sheetPath]); const shiftedRows = Xml.shiftRowIndices(rows, currentRowOffset); diff --git a/src/lib/merge-sheets-to-base-file-sync.ts b/src/lib/merge-sheets-to-base-file-sync.ts index 4d1178a..4acd00d 100644 --- a/src/lib/merge-sheets-to-base-file-sync.ts +++ b/src/lib/merge-sheets-to-base-file-sync.ts @@ -1,7 +1,7 @@ import * as Utils from "./utils/index.js"; import * as Zip from "./zip/index.js"; -import { mergeSheetsToBaseFileProcess } from "./merge-sheets-to-base-file-process.js"; +import { mergeSheetsToBaseFileProcessSync } from "./merge-sheets-to-base-file-process-sync.js"; /** * Merge rows from other Excel files into a base Excel file. 
@@ -49,7 +49,7 @@ export function mergeSheetsToBaseFileSync(data: { }); } - mergeSheetsToBaseFileProcess({ + mergeSheetsToBaseFileProcessSync({ additions: additionsUpdated, baseFiles, baseSheetIndex, diff --git a/src/lib/merge-sheets-to-base-file.ts b/src/lib/merge-sheets-to-base-file.ts index 6bb5dc9..fec9c4a 100644 --- a/src/lib/merge-sheets-to-base-file.ts +++ b/src/lib/merge-sheets-to-base-file.ts @@ -49,7 +49,7 @@ export async function mergeSheetsToBaseFile(data: { }); } - mergeSheetsToBaseFileProcess({ + await mergeSheetsToBaseFileProcess({ additions: additionsUpdated, baseFiles, baseSheetIndex, diff --git a/src/lib/template/template-fs.ts b/src/lib/template/template-fs.ts index c61ada0..3532f07 100644 --- a/src/lib/template/template-fs.ts +++ b/src/lib/template/template-fs.ts @@ -156,7 +156,7 @@ export class TemplateFs { */ async #getSheetPathByName(sheetName: string): Promise { // Read XML workbook to find sheet name and path - const workbookXml = Xml.extractXmlFromSheet(await this.#readFile(this.#excelKeys.workbook)); + const workbookXml = await Xml.extractXmlFromSheet(await this.#readFile(this.#excelKeys.workbook)); const sheetMatch = workbookXml.match(Utils.sheetMatch(sheetName)); if (!sheetMatch || !sheetMatch[1]) { @@ -164,7 +164,7 @@ export class TemplateFs { } const rId = sheetMatch[1]; - const relsXml = Xml.extractXmlFromSheet(await this.#readFile(this.#excelKeys.workbookRels)); + const relsXml = await Xml.extractXmlFromSheet(await this.#readFile(this.#excelKeys.workbookRels)); const relMatch = relsXml.match(Utils.relationshipMatch(rId)); if (!relMatch || !relMatch[1]) { @@ -238,11 +238,11 @@ export class TemplateFs { let sheetContent = ""; if (this.fileKeys.has(sharedStringsPath)) { - sharedStringsContent = Xml.extractXmlFromSheet(await this.#readFile(sharedStringsPath)); + sharedStringsContent = await Xml.extractXmlFromSheet(await this.#readFile(sharedStringsPath)); } if (this.fileKeys.has(sheetPath)) { - sheetContent = 
Xml.extractXmlFromSheet(await this.#readFile(sheetPath)); + sheetContent = await Xml.extractXmlFromSheet(await this.#readFile(sheetPath)); const TABLE_REGEX = /\$\{table:([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)\}/g; @@ -309,7 +309,7 @@ export class TemplateFs { // Read workbook.xml and find the source sheet const workbookXmlPath = this.#excelKeys.workbook; - const workbookXml = Xml.extractXmlFromSheet(await this.#readFile(workbookXmlPath)); + const workbookXml = await Xml.extractXmlFromSheet(await this.#readFile(workbookXmlPath)); // Find the source sheet const sheetMatch = workbookXml.match(Utils.sheetMatch(sourceName)); @@ -327,7 +327,7 @@ export class TemplateFs { // Find the source sheet path by rId const rId = sheetMatch[1]; const relsXmlPath = this.#excelKeys.workbookRels; - const relsXml = Xml.extractXmlFromSheet(await this.#readFile(relsXmlPath)); + const relsXml = await Xml.extractXmlFromSheet(await this.#readFile(relsXmlPath)); const relMatch = relsXml.match(Utils.relationshipMatch(rId)); if (!relMatch || !relMatch[1]) { @@ -376,7 +376,7 @@ export class TemplateFs { // Read [Content_Types].xml // Update [Content_Types].xml const contentTypesPath = this.#excelKeys.contentTypes; - const contentTypesXml = Xml.extractXmlFromSheet(await this.#readFile(contentTypesPath)); + const contentTypesXml = await Xml.extractXmlFromSheet(await this.#readFile(contentTypesPath)); const overrideTag = ``; const updatedContentTypesXml = contentTypesXml.replace( "", @@ -451,7 +451,7 @@ export class TemplateFs { const sheetPath = await this.#getSheetPathByName(sheetName); const sheetXmlRaw = await this.#readFile(sheetPath); - const sheetXml = Xml.extractXmlFromSheet(sheetXmlRaw); + const sheetXml = await Xml.extractXmlFromSheet(sheetXmlRaw); let nextRow = 0; diff --git a/src/lib/template/template-memory.ts b/src/lib/template/template-memory.ts index 00af0af..c9a7b11 100644 --- a/src/lib/template/template-memory.ts +++ b/src/lib/template/template-memory.ts @@ -131,7 +131,7 @@ export 
class TemplateMemory { * @throws {Error} If the file key is not found. * @experimental This API is experimental and might change in future versions. */ - #extractXmlFromSheet(fileKey: string): string { + async #extractXmlFromSheet(fileKey: string): Promise { if (!this.files[fileKey]) { throw new Error(`${fileKey} not found`); } @@ -151,12 +151,12 @@ export class TemplateMemory { * @throws {Error} If the file key is not found * @experimental This API is experimental and might change in future versions. */ - #extractRowsFromSheet(fileKey: string): { + async #extractRowsFromSheet(fileKey: string): Promise<{ rows: string[]; lastRowNumber: number; mergeCells: { ref: string }[]; xml: string; - } { + }> { if (!this.files[fileKey]) { throw new Error(`${fileKey} not found`); } @@ -172,9 +172,9 @@ export class TemplateMemory { * @throws {Error} If the sheet with the given name does not exist. * @experimental This API is experimental and might change in future versions. */ - #getSheetPathByName(sheetName: string): string { + async #getSheetPathByName(sheetName: string): Promise { // Find the sheet - const workbookXml = this.#extractXmlFromSheet(this.#excelKeys.workbook); + const workbookXml = await this.#extractXmlFromSheet(this.#excelKeys.workbook); const sheetMatch = workbookXml.match(Utils.sheetMatch(sheetName)); if (!sheetMatch || !sheetMatch[1]) { @@ -182,7 +182,7 @@ export class TemplateMemory { } const rId = sheetMatch[1]; - const relsXml = this.#extractXmlFromSheet(this.#excelKeys.workbookRels); + const relsXml = await this.#extractXmlFromSheet(this.#excelKeys.workbookRels); const relMatch = relsXml.match(Utils.relationshipMatch(rId)); if (!relMatch || !relMatch[1]) { @@ -243,13 +243,13 @@ export class TemplateMemory { let sheetContent = ""; if (this.files[sharedStringsPath]) { - sharedStringsContent = this.#extractXmlFromSheet(sharedStringsPath); + sharedStringsContent = await this.#extractXmlFromSheet(sharedStringsPath); } - const sheetPath = 
this.#getSheetPathByName(sheetName); + const sheetPath = await this.#getSheetPathByName(sheetName); if (this.files[sheetPath]) { - sheetContent = this.#extractXmlFromSheet(sheetPath); + sheetContent = await this.#extractXmlFromSheet(sheetPath); const TABLE_REGEX = /\$\{table:([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)\}/g; @@ -293,12 +293,12 @@ export class TemplateMemory { * @throws {Error} If no sheets are found to merge. * @experimental This API is experimental and might change in future versions. */ - #mergeSheets(data: { + async #mergeSheets(data: { additions: { sheetIndexes?: number[]; sheetNames?: string[] }; baseSheetIndex?: number; baseSheetName?: string; gap?: number; - }): void { + }): Promise { const { additions, baseSheetIndex = 1, @@ -309,7 +309,7 @@ export class TemplateMemory { let fileKey: string = ""; if (baseSheetName) { - fileKey = this.#getSheetPathByName(baseSheetName); + fileKey = await this.#getSheetPathByName(baseSheetName); } if (baseSheetIndex && !fileKey) { @@ -329,7 +329,7 @@ export class TemplateMemory { mergeCells: baseMergeCells, rows: baseRows, xml, - } = this.#extractRowsFromSheet(fileKey); + } = await this.#extractRowsFromSheet(fileKey); const allRows = [...baseRows]; const allMergeCells = [...baseMergeCells]; @@ -338,11 +338,11 @@ export class TemplateMemory { const sheetPaths: string[] = []; if (additions.sheetIndexes) { - sheetPaths.push(...(additions.sheetIndexes).map(e => this.#getSheetPathById(e))); + sheetPaths.push(...(await Promise.all(additions.sheetIndexes.map(e => this.#getSheetPathById(e))))); } if (additions.sheetNames) { - sheetPaths.push(...(additions.sheetNames).map(e => this.#getSheetPathByName(e))); + sheetPaths.push(...(await Promise.all(additions.sheetNames.map(e => this.#getSheetPathByName(e))))); } if (sheetPaths.length === 0) { @@ -354,7 +354,7 @@ export class TemplateMemory { throw new Error(`Sheet "${sheetPath}" not found`); } - const { mergeCells, rows } = Xml.extractRowsFromSheet(this.files[sheetPath]); + const { 
mergeCells, rows } = await Xml.extractRowsFromSheet(this.files[sheetPath]); const shiftedRows = Xml.shiftRowIndices(rows, currentRowOffset); @@ -462,7 +462,7 @@ export class TemplateMemory { // Read workbook.xml and find the source sheet const workbookXmlPath = this.#excelKeys.workbook; - const workbookXml = this.#extractXmlFromSheet(this.#excelKeys.workbook); + const workbookXml = await this.#extractXmlFromSheet(this.#excelKeys.workbook); // Find the source sheet const sheetMatch = workbookXml.match(Utils.sheetMatch(sourceName)); @@ -480,7 +480,7 @@ export class TemplateMemory { // Find the source sheet path by rId const rId = sheetMatch[1]; const relsXmlPath = this.#excelKeys.workbookRels; - const relsXml = this.#extractXmlFromSheet(this.#excelKeys.workbookRels); + const relsXml = await this.#extractXmlFromSheet(this.#excelKeys.workbookRels); const relMatch = relsXml.match(Utils.relationshipMatch(rId)); if (!relMatch || !relMatch[1]) { @@ -541,7 +541,7 @@ export class TemplateMemory { // Read [Content_Types].xml // Update [Content_Types].xml const contentTypesPath = "[Content_Types].xml"; - const contentTypesXml = this.#extractXmlFromSheet(contentTypesPath); + const contentTypesXml = await this.#extractXmlFromSheet(contentTypesPath); const overrideTag = ``; const updatedContentTypesXml = contentTypesXml.replace( "", @@ -613,8 +613,8 @@ export class TemplateMemory { Utils.checkRows(preparedRows); // Find the sheet - const sheetPath = this.#getSheetPathByName(sheetName); - const sheetXml = this.#extractXmlFromSheet(sheetPath); + const sheetPath = await this.#getSheetPathByName(sheetName); + const sheetXml = await this.#extractXmlFromSheet(sheetPath); let nextRow = 0; @@ -693,8 +693,8 @@ export class TemplateMemory { if (!sheetName) throw new Error("Sheet name is required"); // Read XML workbook to find sheet name and path - const sheetPath = this.#getSheetPathByName(sheetName); - const sheetXml = this.#extractXmlFromSheet(sheetPath); + const sheetPath = await 
this.#getSheetPathByName(sheetName); + const sheetXml = await this.#extractXmlFromSheet(sheetPath); const output = new MemoryWriteStream(); diff --git a/src/lib/xml/build-merged-sheet.unit.spec.ts b/src/lib/xml/build-merged-sheet.unit.spec.ts new file mode 100644 index 0000000..482ee5d --- /dev/null +++ b/src/lib/xml/build-merged-sheet.unit.spec.ts @@ -0,0 +1,78 @@ +import { describe, expect, it } from "vitest"; + +import { buildMergedSheet } from "./build-merged-sheet.js"; + +describe("buildMergedSheet", () => { + it("should merge rows into sheet XML", () => { + const originalXml = ` + + + 1 + + `; + + const mergedRows = [ + "1", + "2", + ]; + + const result = buildMergedSheet(originalXml, mergedRows); + const resultStr = result.toString(); + + expect(resultStr).toContain("1"); + expect(resultStr).toContain("2"); + expect(resultStr).toContain(""); + expect(resultStr).toContain(""); + expect(resultStr).not.toContain(" { + const originalXml = ` + + + 1 + + `; + + const mergedRows = [ + "1", + ]; + + const mergeCells = [ + { ref: "A1:B1" }, + { ref: "C1:D1" }, + ]; + + const result = buildMergedSheet(originalXml, mergedRows, mergeCells); + const resultStr = result.toString(); + + expect(resultStr).toContain(""); + expect(resultStr).toContain(""); + expect(resultStr).toContain(""); + }); + + it("should replace existing merge cells", () => { + const originalXml = ` + + + 1 + + + `; + + const mergedRows = [ + "1", + ]; + + const mergeCells = [ + { ref: "C1:D1" }, + ]; + + const result = buildMergedSheet(originalXml, mergedRows, mergeCells); + const resultStr = result.toString(); + + expect(resultStr).toContain(""); + expect(resultStr).toContain(""); + expect(resultStr).not.toContain(""); + }); +}); diff --git a/src/lib/xml/extract-rows-from-sheet-sync.ts b/src/lib/xml/extract-rows-from-sheet-sync.ts new file mode 100644 index 0000000..2b8fd6a --- /dev/null +++ b/src/lib/xml/extract-rows-from-sheet-sync.ts @@ -0,0 +1,78 @@ +import { extractXmlFromSheetSync } from 
"./extract-xml-from-sheet-sync.js"; + +/** + * Parses a worksheet (either as Buffer or string) to extract row data, + * last row number, and merge cell information from Excel XML format. + * + * This function is particularly useful for processing Excel files in + * Open XML Spreadsheet format (.xlsx). + * + * @param {Buffer|string} sheet - The worksheet content to parse, either as: + * - Buffer (binary Excel sheet) + * - string (raw XML content) + * @returns {{ + * rows: string[], + * lastRowNumber: number, + * mergeCells: {ref: string}[], + * xml: string + * }} An object containing: + * - rows: Array of raw XML strings for each element + * - lastRowNumber: Highest row number found in the sheet (1-based) + * - mergeCells: Array of merged cell ranges (e.g., [{ref: "A1:B2"}]) + * - xml: The full worksheet XML as a string (decompressed if a Buffer was given) + * @throws {Error} If the sheetData section is not found in the XML + */ +export function extractRowsFromSheetSync(sheet: Buffer | string): { + rows: string[]; + lastRowNumber: number; + mergeCells: { ref: string }[]; + xml: string; +} { + // Convert Buffer input to XML string if needed + const xml = typeof sheet === "string" + ? sheet + : extractXmlFromSheetSync(sheet); + + // Extract the sheetData section containing all rows + const sheetDataMatch = xml.match(/]*>([\s\S]*?)<\/sheetData>/); + if (!sheetDataMatch) { + throw new Error("sheetData not found in worksheet XML"); + } + + const sheetDataContent = sheetDataMatch[1] || ""; + + // Extract all elements using regex + const rowMatches = [...sheetDataContent.matchAll(/]*\/>|]*>[\s\S]*?<\/row>/g)]; + const rows = rowMatches.map(match => match[0]); + + // Calculate the highest row number present in the sheet + const lastRowNumber = rowMatches + .map(match => { + // Extract row number from r="..." attribute (1-based) + const rowNumMatch = match[0].match(/r="(\d+)"/); + return rowNumMatch?.[1] ?
parseInt(rowNumMatch[1], 10) : null; + }) + .filter((row): row is number => row !== null) // Type guard to filter out nulls + .reduce((max, current) => Math.max(max, current), 0); // Find maximum row number + + // Extract all merged cell ranges from the worksheet + const mergeCells: { ref: string }[] = []; + const mergeCellsMatch = xml.match(/]*>([\s\S]*?)<\/mergeCells>/); + + if (mergeCellsMatch) { + // Find all mergeCell entries with ref attributes + const mergeCellMatches = mergeCellsMatch[1]?.match(/]+ref="([^"]+)"[^>]*>/g) || []; + + mergeCellMatches.forEach(match => { + const refMatch = match.match(/ref="([^"]+)"/); + if (refMatch?.[1]) { + mergeCells.push({ ref: refMatch[1] }); // Store the cell range (e.g., "A1:B2") + } + }); + } + + return { + lastRowNumber, + mergeCells, + rows, + xml, + }; +} diff --git a/src/lib/xml/extract-rows-from-sheet-sync.unit.spec.ts b/src/lib/xml/extract-rows-from-sheet-sync.unit.spec.ts new file mode 100644 index 0000000..09f8fca --- /dev/null +++ b/src/lib/xml/extract-rows-from-sheet-sync.unit.spec.ts @@ -0,0 +1,64 @@ +import { describe, expect, it } from "vitest"; + +import { extractRowsFromSheetSync } from "./extract-rows-from-sheet-sync.js"; + +// Simplified Excel worksheet XML template +const sampleSheet = ` + + + + 0 + 1 + 2 + + + + + + `; + +const noRowsSheet = ` + + + + `; + +const noSheetData = ` + + + `; + +describe("extractRowsFromSheet", () => { + it("extracts rows and mergeCells correctly", () => { + const result = extractRowsFromSheetSync(sampleSheet); + + expect(result.rows.length).toBe(3); + expect(result.lastRowNumber).toBe(5); + expect(result.rows[0]).toContain(""); + expect(result.rows[2]).toContain(""); + expect(result.mergeCells).toEqual([ + { ref: "A1:B1" }, + { ref: "A2:A3" }, + ]); + expect(result.xml).toContain(" { + const result = extractRowsFromSheetSync(noRowsSheet); + expect(result.rows).toEqual([]); + expect(result.lastRowNumber).toBe(0); + expect(result.mergeCells).toEqual([]); + }); + + it("throws
an error if sheetData is not found", () => { + expect(() => extractRowsFromSheetSync(noSheetData)).toThrow("sheetData not found in worksheet XML"); + }); + + it("accepts Buffer input", () => { + const buffer = Buffer.from(sampleSheet, "utf-8"); + const result = extractRowsFromSheetSync(buffer); + + expect(result.rows.length).toBe(3); + expect(result.lastRowNumber).toBe(5); + }); +}); diff --git a/src/lib/xml/extract-rows-from-sheet.ts b/src/lib/xml/extract-rows-from-sheet.ts index 7e22fb8..21c1ba0 100644 --- a/src/lib/xml/extract-rows-from-sheet.ts +++ b/src/lib/xml/extract-rows-from-sheet.ts @@ -20,14 +20,16 @@ import { extractXmlFromSheet } from "./extract-xml-from-sheet.js"; * - mergeCells: Array of merged cell ranges (e.g., [{ref: "A1:B2"}]) * @throws {Error} If the sheetData section is not found in the XML */ -export function extractRowsFromSheet(sheet: Buffer | string): { +export async function extractRowsFromSheet(sheet: Buffer | string): Promise<{ rows: string[]; lastRowNumber: number; mergeCells: { ref: string }[]; xml: string; -} { +}> { // Convert Buffer input to XML string if needed - const xml = typeof sheet === "string" ? sheet : extractXmlFromSheet(sheet); + const xml = typeof sheet === "string" + ? 
sheet + : await extractXmlFromSheet(sheet); // Extract the sheetData section containing all rows const sheetDataMatch = xml.match(/]*>([\s\S]*?)<\/sheetData>/); diff --git a/src/lib/xml/extract-rows-from-sheet.unit.spec.ts b/src/lib/xml/extract-rows-from-sheet.unit.spec.ts new file mode 100644 index 0000000..8f7612d --- /dev/null +++ b/src/lib/xml/extract-rows-from-sheet.unit.spec.ts @@ -0,0 +1,64 @@ +import { describe, expect, it } from "vitest"; + +import { extractRowsFromSheet } from "./extract-rows-from-sheet"; + +// Simplified Excel worksheet XML template +const sampleSheet = ` + + + + 0 + 1 + 2 + + + + + + `; + +const noRowsSheet = ` + + + + `; + +const noSheetData = ` + + + `; + +describe("extractRowsFromSheet", () => { + it("extracts rows and mergeCells correctly", async () => { + const result = await extractRowsFromSheet(sampleSheet); + + expect(result.rows.length).toBe(3); + expect(result.lastRowNumber).toBe(5); + expect(result.rows[0]).toContain(""); + expect(result.rows[2]).toContain(""); + expect(result.mergeCells).toEqual([ + { ref: "A1:B1" }, + { ref: "A2:A3" }, + ]); + expect(result.xml).toContain(" { + const result = await extractRowsFromSheet(noRowsSheet); + expect(result.rows).toEqual([]); + expect(result.lastRowNumber).toBe(0); + expect(result.mergeCells).toEqual([]); + }); + + it("throws an error if sheetData is not found", async () => { + await expect(extractRowsFromSheet(noSheetData)).rejects.toThrow("sheetData not found in worksheet XML"); + }); + + it("accepts Buffer input", async () => { + const buffer = Buffer.from(sampleSheet, "utf-8"); + const result = await extractRowsFromSheet(buffer); + + expect(result.rows.length).toBe(3); + expect(result.lastRowNumber).toBe(5); + }); +}); diff --git a/src/lib/xml/extract-xml-from-sheet-sync.ts b/src/lib/xml/extract-xml-from-sheet-sync.ts new file mode 100644 index 0000000..6d0bdf5 --- /dev/null +++ b/src/lib/xml/extract-xml-from-sheet-sync.ts @@ -0,0 +1,44 @@ +import { inflateRawSync } from
"node:zlib"; + +/** + * Extracts and parses XML content from an Excel worksheet file (e.g., xl/worksheets/sheet1.xml). + * Handles both compressed (raw deflate) and uncompressed (plain XML) formats. + * + * This function is designed to work with Excel Open XML (.xlsx) worksheet files, + * which may be stored in either compressed or uncompressed format within the ZIP container. + * + * @param {Buffer} buffer - The file content to process, which may be: + * - Raw XML text + * - Deflate-compressed XML data (without zlib headers) + * @returns {string} - The extracted XML content as a UTF-8 string + * @throws {Error} - If the buffer is empty or cannot be processed + */ +export function extractXmlFromSheetSync(buffer: Buffer): string { + if (!buffer || buffer.length === 0) { + throw new Error("Empty buffer provided"); + } + + let xml: string; + + // Check if the buffer starts with an XML declaration (]/.test(head); + + if (isXml) { + // Case 1: Already uncompressed XML - convert directly to string + xml = buffer.toString("utf8"); + } else { + // Case 2: Attempt to decompress as raw deflate data + try { + xml = inflateRawSync(buffer).toString("utf8"); + } catch (err) { + throw new Error("Failed to decompress sheet XML: " + (err instanceof Error ? 
err.message : String(err))); + } + } + + // Sanitize XML by removing control characters (except tab, newline, carriage return) + // This handles potential corruption from binary data or encoding issues + xml = xml.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/g, ""); + + return xml; +} diff --git a/src/lib/xml/extract-xml-from-sheet-sync.unit.spec.ts b/src/lib/xml/extract-xml-from-sheet-sync.unit.spec.ts new file mode 100644 index 0000000..6fa7e68 --- /dev/null +++ b/src/lib/xml/extract-xml-from-sheet-sync.unit.spec.ts @@ -0,0 +1,63 @@ +import { describe, expect, it } from "vitest"; +import { deflateRawSync } from "node:zlib"; + +import { extractXmlFromSheetSync } from "./extract-xml-from-sheet-sync.js"; + +describe("extractXmlFromSheet", () => { + it("should handle empty buffer", () => { + expect(() => extractXmlFromSheetSync(Buffer.alloc(0))).toThrow("Empty buffer provided"); + }); + + it("should extract uncompressed XML", async () => { + const xml = ""; + const buffer = Buffer.from(xml); + expect(extractXmlFromSheetSync(buffer)).toBe(xml); + }); + + it("returns plain XML from uncompressed buffer", async () => { + const xml = "test"; + const buffer = Buffer.from(xml, "utf8"); + const result = extractXmlFromSheetSync(buffer); + expect(result).toBe(xml); + }); + + it("should decompress and extract deflated XML", async () => { + // This is a deflated version of: + const deflated = Buffer.from([ + 0x3c, 0x3f, 0x78, 0x6d, 0x6c, 0x20, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, + 0x6e, 0x3d, 0x22, 0x31, 0x2e, 0x30, 0x22, 0x3f, 0x3e, 0x3c, 0x77, 0x6f, + 0x72, 0x6b, 0x73, 0x68, 0x65, 0x65, 0x74, 0x3e, 0x3c, 0x73, 0x68, 0x65, + 0x65, 0x74, 0x44, 0x61, 0x74, 0x61, 0x3e, 0x3c, 0x2f, 0x73, 0x68, 0x65, + 0x65, 0x74, 0x44, 0x61, 0x74, 0x61, 0x3e, 0x3c, 0x2f, 0x77, 0x6f, 0x72, + 0x6b, 0x73, 0x68, 0x65, 0x65, 0x74, 0x3e, + ]); + + const expected = ""; + expect(extractXmlFromSheetSync(deflated)).toBe(expected); + }); + + it("should sanitize XML by removing control characters", async () => { + 
const xml = "\x00\x01\x02"; + const expected = ""; + expect(extractXmlFromSheetSync(Buffer.from(xml))).toBe(expected); + }); + + it("decompresses deflate-encoded XML buffer", async () => { + const xml = "42"; + const compressed = deflateRawSync(Buffer.from(xml, "utf8")); + const result = extractXmlFromSheetSync(compressed); + expect(result).toBe(xml); + }); + + it("throws on invalid non-XML and non-deflate data", async () => { + const garbage = Buffer.from([0xde, 0xad, 0xbe, 0xef]); + expect(() => extractXmlFromSheetSync(garbage)).toThrow(/Failed to decompress sheet XML/); + }); + + it("sanitizes control characters from XML", async () => { + const xml = "\x01valid"; + const buffer = Buffer.from(xml, "utf8"); + const result = extractXmlFromSheetSync(buffer); + expect(result).toBe("valid"); + }); +}); diff --git a/src/lib/xml/extract-xml-from-sheet.ts b/src/lib/xml/extract-xml-from-sheet.ts index 2f1aacb..f1c9f3f 100644 --- a/src/lib/xml/extract-xml-from-sheet.ts +++ b/src/lib/xml/extract-xml-from-sheet.ts @@ -1,4 +1,7 @@ -import { inflateRaw } from "pako"; +import util from "node:util"; +import zlib from "node:zlib"; + +const inflateRaw = util.promisify(zlib.inflateRaw); /** * Extracts and parses XML content from an Excel worksheet file (e.g., xl/worksheets/sheet1.xml). 
@@ -10,39 +13,32 @@ import { inflateRaw } from "pako"; * @param {Buffer} buffer - The file content to process, which may be: * - Raw XML text * - Deflate-compressed XML data (without zlib headers) - * @returns {string} - The extracted XML content as a UTF-8 string + * @returns {Promise} - The extracted XML content as a UTF-8 string * @throws {Error} - If the buffer is empty or cannot be processed */ -export function extractXmlFromSheet(buffer: Buffer): string { +export async function extractXmlFromSheet(buffer: Buffer): Promise { if (!buffer || buffer.length === 0) { throw new Error("Empty buffer provided"); } - let xml: string | undefined; + let xml: string; // Check if the buffer starts with an XML declaration (]/.test(head); - if (startsWithXml) { + if (isXml) { // Case 1: Already uncompressed XML - convert directly to string xml = buffer.toString("utf8"); } else { // Case 2: Attempt to decompress as raw deflate data - const inflated = inflateRaw(buffer, { to: "string" }); - - // Validate the decompressed content contains worksheet data - if (inflated && inflated.includes(" { + it("should handle empty buffer", async () => { + await expect(extractXmlFromSheet(Buffer.alloc(0))).rejects.toThrow("Empty buffer provided"); + }); + + it("should extract uncompressed XML", async () => { + const xml = ""; + const buffer = Buffer.from(xml); + expect(await extractXmlFromSheet(buffer)).toBe(xml); + }); + + it("returns plain XML from uncompressed buffer", async () => { + const xml = "test"; + const buffer = Buffer.from(xml, "utf8"); + const result = await extractXmlFromSheet(buffer); + expect(result).toBe(xml); + }); + + it("should decompress and extract deflated XML", async () => { + // This is a deflated version of: + const deflated = Buffer.from([ + 0x3c, 0x3f, 0x78, 0x6d, 0x6c, 0x20, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, + 0x6e, 0x3d, 0x22, 0x31, 0x2e, 0x30, 0x22, 0x3f, 0x3e, 0x3c, 0x77, 0x6f, + 0x72, 0x6b, 0x73, 0x68, 0x65, 0x65, 0x74, 0x3e, 0x3c, 0x73, 0x68, 0x65, + 0x65, 
0x74, 0x44, 0x61, 0x74, 0x61, 0x3e, 0x3c, 0x2f, 0x73, 0x68, 0x65, + 0x65, 0x74, 0x44, 0x61, 0x74, 0x61, 0x3e, 0x3c, 0x2f, 0x77, 0x6f, 0x72, + 0x6b, 0x73, 0x68, 0x65, 0x65, 0x74, 0x3e, + ]); + + const expected = ""; + expect(await extractXmlFromSheet(deflated)).toBe(expected); + }); + + it("should sanitize XML by removing control characters", async () => { + const xml = "\x00\x01\x02"; + const expected = ""; + expect(await extractXmlFromSheet(Buffer.from(xml))).toBe(expected); + }); + + it("decompresses deflate-encoded XML buffer", async () => { + const xml = "42"; + const compressed = deflateRawSync(Buffer.from(xml, "utf8")); + const result = await extractXmlFromSheet(compressed); + expect(result).toBe(xml); + }); + + it("throws on invalid non-XML and non-deflate data", async () => { + const garbage = Buffer.from([0xde, 0xad, 0xbe, 0xef]); + await expect(extractXmlFromSheet(garbage)).rejects.toThrow(/Failed to decompress sheet XML/); + }); + + it("sanitizes control characters from XML", async () => { + const xml = "\x01valid"; + const buffer = Buffer.from(xml, "utf8"); + const result = await extractXmlFromSheet(buffer); + expect(result).toBe("valid"); + }); +}); diff --git a/src/lib/xml/extract-xml-from-system-content.ts b/src/lib/xml/extract-xml-from-system-content.ts deleted file mode 100644 index 624a468..0000000 --- a/src/lib/xml/extract-xml-from-system-content.ts +++ /dev/null @@ -1,54 +0,0 @@ -import { inflateRaw } from "pako"; - -/** - * Extracts and decompresses XML content from Excel system files (e.g., workbook.xml, [Content_Types].xml). - * Handles both compressed (raw DEFLATE) and uncompressed (plain XML) formats with comprehensive error handling. 
- * - * @param {Buffer} buffer - The file content to process, which may be: - * - Raw XML text - * - DEFLATE-compressed XML data (without zlib headers) - * @param {string} name - The filename being processed (for error reporting) - * @returns {string} - The extracted XML content as a sanitized UTF-8 string - * @throws {Error} - With descriptive messages for various failure scenarios: - * - Empty buffer - * - Decompression failures - * - Invalid XML content - */ -export const extractXmlFromSystemContent = (buffer: Buffer, name: string): string => { - // Validate input buffer - if (!buffer || buffer.length === 0) { - throw new Error(`Empty data buffer provided for file ${name}`); - } - - let xml: string; - - // Check for XML declaration in first 5 bytes ( { + it("shifts a single row and cell reference down", () => { + const input = [""]; + const expected = [""]; + expect(shiftRowIndices(input, 2)).toEqual(expected); + }); + + it("shifts multiple rows and cells up", () => { + const input = [ + "", + "", + ]; + const expected = [ + "", + "", + ]; + expect(shiftRowIndices(input, -2)).toEqual(expected); + }); + + it("returns original if offset is zero", () => { + const input = [""]; + expect(shiftRowIndices(input, 0)).toEqual(input); + }); + + it("handles mixed columns and multi-digit row numbers", () => { + const input = [""]; + const expected = [""]; + expect(shiftRowIndices(input, 3)).toEqual(expected); + }); + + it("does not modify unrelated attributes", () => { + const input = [""]; + const expected = [""]; + expect(shiftRowIndices(input, 2)).toEqual(expected); + }); +}); diff --git a/src/test/template-fs/test-02.integration.spec.ts b/src/test/template-fs/test-02.integration.spec.ts index 5c91411..8dfe935 100644 --- a/src/test/template-fs/test-02.integration.spec.ts +++ b/src/test/template-fs/test-02.integration.spec.ts @@ -56,7 +56,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = 
rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const rebuiltXml = Xml.extractRowsFromSheet(sheet1Rebuilt); + const rebuiltXml = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(rebuiltXml.rows).toEqual([ "12345", @@ -109,7 +109,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const rebuiltXml = Xml.extractRowsFromSheet(sheet1Rebuilt); + const rebuiltXml = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(rebuiltXml.rows).toEqual([ "12345", diff --git a/src/test/template-fs/test-03.integration.spec.ts b/src/test/template-fs/test-03.integration.spec.ts index 1c88bbb..b8a6b9f 100644 --- a/src/test/template-fs/test-03.integration.spec.ts +++ b/src/test/template-fs/test-03.integration.spec.ts @@ -59,7 +59,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", @@ -126,7 +126,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", diff --git a/src/test/template-fs/test-04.integration.spec.ts b/src/test/template-fs/test-04.integration.spec.ts index d19092a..b7aa770 100644 --- a/src/test/template-fs/test-04.integration.spec.ts +++ b/src/test/template-fs/test-04.integration.spec.ts @@ -69,7 +69,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - 
const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", @@ -138,7 +138,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", diff --git a/src/test/template-fs/test-05.integration.spec.ts b/src/test/template-fs/test-05.integration.spec.ts index c3c0512..97e05ba 100644 --- a/src/test/template-fs/test-05.integration.spec.ts +++ b/src/test/template-fs/test-05.integration.spec.ts @@ -56,7 +56,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", @@ -115,7 +115,7 @@ describe("TemplateFs integration test", () => { // find new rows in the rebuilt zip file const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", diff --git a/src/test/template-fs/test-06.integration.spec.ts b/src/test/template-fs/test-06.integration.spec.ts index 4146a0b..cccdd72 100644 --- a/src/test/template-fs/test-06.integration.spec.ts +++ b/src/test/template-fs/test-06.integration.spec.ts @@ -75,8 +75,8 @@ describe("TemplateFs integration test", () => { const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); const sheet2Rebuilt = rebuiltZip["xl/worksheets/sheet2.xml"].toString(); 
- const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); - const sheet2RowsData = Xml.extractRowsFromSheet(sheet2Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet2RowsData = await Xml.extractRowsFromSheet(sheet2Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345", @@ -167,8 +167,8 @@ describe("TemplateFs integration test", () => { const sheet1Rebuilt = rebuiltZip["xl/worksheets/sheet1.xml"].toString(); const sheet2Rebuilt = rebuiltZip["xl/worksheets/sheet2.xml"].toString(); - const sheet1RowsData = Xml.extractRowsFromSheet(sheet1Rebuilt); - const sheet2RowsData = Xml.extractRowsFromSheet(sheet2Rebuilt); + const sheet1RowsData = await Xml.extractRowsFromSheet(sheet1Rebuilt); + const sheet2RowsData = await Xml.extractRowsFromSheet(sheet2Rebuilt); expect(sheet1RowsData.rows).toEqual([ "12345",