-
Notifications
You must be signed in to change notification settings - Fork 441
feat: support stream read large-file reduce memory usage #4666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Walkthrough新增编辑器偏好 Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant FileServiceClient
participant PreferenceService
participant Provider
participant StreamReader
Client->>FileServiceClient: readFile(uri)
FileServiceClient->>PreferenceService: get(editor.streamLargeFile, editor.largeFile)
PreferenceService-->>FileServiceClient: prefs
FileServiceClient->>Provider: stat(uri)
Provider-->>FileServiceClient: statResult
alt use stream (enabled && size >= threshold && provider supports stream)
FileServiceClient->>Provider: readFileStream(uri)
Provider-->>StreamReader: IReadableStream<Uint8Array>
StreamReader-->>FileServiceClient: chunks (async)
FileServiceClient->>FileServiceClient: collectReadableStream -> merge Uint8Array
FileServiceClient-->>Client: content (from stream)
else fallback (unsupported/disabled/stream error)
FileServiceClient->>Provider: readFile(uri)
Provider-->>FileServiceClient: full content
FileServiceClient-->>Client: content (from readFile)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
packages/i18n/src/common/zh-CN.lang.ts (1)
521-521: 中文文案可微调以更自然建议将“ 大文件使用流式读取 ”改为更简洁的“ 大文件流式读取 ”;描述保持不变即可。仅样式优化,非功能问题。
- 'preference.editor.streamLargeFile': '大文件使用流式读取', + 'preference.editor.streamLargeFile': '大文件流式读取',Also applies to: 1101-1102
packages/i18n/src/common/en-US.lang.ts (1)
569-569: 润色英文文案以符合英语习惯建议更自然的表达,统一复数和介词,便于用户理解。
- 'preference.editor.streamLargeFile': 'Use Stream For Large File', + 'preference.editor.streamLargeFile': 'Stream Large Files', @@ - 'editor.configuration.streamLargeFile': - 'Read large files via stream when file size exceeds the configured threshold.', + 'editor.configuration.streamLargeFile': + 'Read large files via streaming when the file size exceeds the configured threshold.',Also applies to: 611-613
packages/file-service/src/browser/file-service-client.ts (2)
610-623: 降低流式拼接的峰值内存(预分配/增量写入)当前 collectReadableStream 先聚合再一次性合并,峰值内存接近 2x。已在 shouldUseReadStream 中拿到 stat.size,可用该值预分配目标缓冲并随到随写,避免保存所有分片,显著降低峰值内存和一次性分配压力。保底仍可退回“聚合+合并”路径。
- private async doReadFile(provider: FileSystemProvider, uri: Uri) { - const shouldStream = await this.shouldUseReadStream(provider, uri); - if (shouldStream && provider.readFileStream) { - try { - const stream = await provider.readFileStream(uri); - return await this.collectReadableStream(stream); - } catch (error) { - this.logger?.warn('[FileServiceClient] readFileStream failed, fallback to readFile.', error); - } - } - return await provider.readFile(uri); - } + private async doReadFile(provider: FileSystemProvider, uri: Uri) { + const decision = await this.shouldUseReadStream(provider, uri); + if (decision.useStream && provider.readFileStream) { + try { + const stream = await provider.readFileStream(uri); + return await this.collectReadableStream(stream, decision.expectedSize); + } catch (error) { + this.logger?.warn('[FileServiceClient] readFileStream failed, fallback to readFile.', error); + } + } + return await provider.readFile(uri); + } - private async shouldUseReadStream(provider: FileSystemProvider, uri: Uri): Promise<boolean> { - if (!provider.readFileStream || !this.isLargeFileStreamEnabled()) { - return false; - } - const threshold = this.getLargeFileStreamThreshold(); - if (!threshold) { - return false; - } - try { - const stat = await provider.stat(uri); - if (stat && typeof stat.size === 'number' && stat.size >= threshold) { - return true; - } - } catch (error) { - this.logger?.warn( - '[FileServiceClient] stat failed when deciding readFile strategy, fallback to readFile.', - error, - ); - } - return false; - } + private async shouldUseReadStream( + provider: FileSystemProvider, + uri: Uri, + ): Promise<{ useStream: boolean; expectedSize?: number }> { + if (!provider.readFileStream || !this.isLargeFileStreamEnabled()) { + return { useStream: false }; + } + const threshold = this.getLargeFileStreamThreshold(); + if (!threshold) { + return { useStream: false }; + } + try { + const stat = await provider.stat(uri); + if (stat && typeof stat.size === 'number' && stat.size >= threshold) { + return { useStream: true, expectedSize: stat.size }; + } + } catch (error) { + this.logger?.warn( + '[FileServiceClient] stat failed when deciding readFile strategy, fallback to readFile.', + error, + ); + } + return { useStream: false }; + } - private collectReadableStream(stream: IReadableStream<Uint8Array>): Promise<Uint8Array> { - return new Promise((resolve, reject) => { - const chunks: Uint8Array[] = []; - let totalLength = 0; - listenReadable(stream, { - onData: (chunk) => { - const data = chunk instanceof Uint8Array ? chunk : Uint8Array.from(chunk as any); - chunks.push(data); - totalLength += data.byteLength; - }, - onError: (error) => reject(error), - onEnd: () => { - if (chunks.length === 0) { - resolve(new Uint8Array(0)); - return; - } - if (chunks.length === 1) { - resolve(chunks[0]); - return; - } - const merged = new Uint8Array(totalLength); - let offset = 0; - for (const chunk of chunks) { - merged.set(chunk, offset); - offset += chunk.byteLength; - } - resolve(merged); - }, - }); - }); - } + private collectReadableStream( + stream: IReadableStream<Uint8Array>, + expectedSize?: number, + ): Promise<Uint8Array> { + return new Promise((resolve, reject) => { + if (typeof expectedSize === 'number' && expectedSize > 0) { + let buf = new Uint8Array(expectedSize); + let offset = 0; + listenReadable(stream, { + onData: (chunk) => { + const data = chunk instanceof Uint8Array ? chunk : Uint8Array.from(chunk as any); + // 自动扩容以应对 size 变化 + if (offset + data.byteLength > buf.byteLength) { + const next = new Uint8Array(Math.max(buf.byteLength * 2, offset + data.byteLength)); + next.set(buf, 0); + buf = next; + } + buf.set(data, offset); + offset += data.byteLength; + }, + onError: (error) => reject(error), + onEnd: () => resolve(buf.subarray(0, offset)), + }); + return; + } + // 无期望长度,退回聚合+合并 + const chunks: Uint8Array[] = []; + let total = 0; + listenReadable(stream, { + onData: (chunk) => { + const data = chunk instanceof Uint8Array ? chunk : Uint8Array.from(chunk as any); + chunks.push(data); + total += data.byteLength; + }, + onError: (error) => reject(error), + onEnd: () => { + if (chunks.length === 0) return resolve(new Uint8Array(0)); + if (chunks.length === 1) return resolve(chunks[0]); + const merged = new Uint8Array(total); + let off = 0; + for (const c of chunks) { + merged.set(c, off); + off += c.byteLength; + } + resolve(merged); + }, + }); + }); + }Also applies to: 624-647, 649-682
692-695: 简化布尔默认值获取(非功能性清理)getValid 已提供默认值,后续 typeof 判断可省略,代码更简洁。
- private isLargeFileStreamEnabled(): boolean { - const enabled = this.preference?.getValid<boolean>('editor.streamLargeFile', true); - return typeof enabled === 'undefined' ? true : enabled; - } + private isLargeFileStreamEnabled(): boolean { + return this.preference?.getValid<boolean>('editor.streamLargeFile', true) ?? true; + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
packages/editor/src/browser/preference/schema.ts(1 hunks)packages/file-service/src/browser/file-service-client.ts(4 hunks)packages/i18n/src/common/en-US.lang.ts(2 hunks)packages/i18n/src/common/zh-CN.lang.ts(2 hunks)packages/preferences/src/browser/preference-settings.service.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-30T12:55:07.459Z
Learnt from: weiwen111
Repo: opensumi/core PR: 4139
File: packages/core-browser/src/window/window.service.ts:0-0
Timestamp: 2024-10-30T12:55:07.459Z
Learning: 在 `packages/core-browser/src/window/window.service.ts` 中,`workspace` 可能是以 `http` 开头的 URI,此时 `fsPath` 方法不可用。需要根据 `workspaceUri.scheme` 来判断并分别处理不同协议的 URI。
Applied to files:
packages/file-service/src/browser/file-service-client.ts
🧬 Code graph analysis (1)
packages/file-service/src/browser/file-service-client.ts (3)
packages/utils/src/buffer.ts (1)
BinaryBuffer(20-136)packages/core-common/src/types/file.ts (1)
FileSystemProvider(135-258)packages/utils/src/stream.ts (2)
IReadableStream(7-12)listenReadable(24-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: unittest (ubuntu-latest, 18.x, jsdom)
- GitHub Check: unittest (ubuntu-latest, 18.x, node)
- GitHub Check: unittest (macos-latest, 18.x, node)
- GitHub Check: unittest (macos-latest, 18.x, jsdom)
- GitHub Check: build (ubuntu-latest, 20.x)
- GitHub Check: build (macos-latest, 20.x)
- GitHub Check: ubuntu-latest, Node.js 20.x
🔇 Additional comments (5)
packages/editor/src/browser/preference/schema.ts (1)
1658-1662: 新增开关项与实现一致,默认值合理editor.streamLargeFile 的 schema 定义正确,默认值与 FileServiceClient 中的 getValid 默认为 true 一致,放置在 editor.largeFile 之后也符合预期。LGTM。
packages/preferences/src/browser/preference-settings.service.ts (1)
733-734: 设置面板曝光位置与 schema 一致在 editor.largeFile 之后加入 editor.streamLargeFile,且本地化 key 存在。LGTM。
packages/file-service/src/browser/file-service-client.ts (3)
169-171: 注入首选项服务用于开关与阈值读取依赖注入合理,为按偏好切换流式读取提供基础。LGTM。
190-191: 数据包装健壮性改进到位BinaryBuffer.wrap 兼容 Uint8Array 与数组输入,避免 provider 差异导致的崩溃。LGTM。
197-201: readFile 统一走策略函数并保留旧返回形态先尝试流式,失败回退 readFile,且保持 { content: BinaryBuffer } 的对外契约。LGTM。
|
/next |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4666 +/- ##
==========================================
+ Coverage 52.60% 52.62% +0.02%
==========================================
Files 1686 1686
Lines 104515 104568 +53
Branches 22708 22733 +25
==========================================
+ Hits 54984 55034 +50
- Misses 41151 41153 +2
- Partials 8380 8381 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
🎉 PR Next publish successful! 3.9.1-next-1762780708.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/file-service/__tests__/browser/file-service-client.test.ts(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/file-service/__tests__/browser/file-service-client.test.ts (2)
packages/utils/src/stream.ts (1)
SumiReadableStream(60-117)packages/core-common/src/types/file.ts (1)
FileStat(14-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build (macos-latest, 20.x)
- GitHub Check: build (ubuntu-latest, 20.x)
- GitHub Check: ubuntu-latest, Node.js 20.x
- GitHub Check: unittest (macos-latest, 18.x, jsdom)
- GitHub Check: unittest (ubuntu-latest, 18.x, jsdom)
- GitHub Check: unittest (macos-latest, 18.x, node)
- GitHub Check: unittest (ubuntu-latest, 18.x, node)
🔇 Additional comments (3)
packages/file-service/__tests__/browser/file-service-client.test.ts (3)
18-46: Mock 辅助函数设计良好测试辅助函数结构清晰:
createMockStreamProvider提供了必要的流式读取能力模拟createReadableStream使用setTimeout(0)异步发射数据块,模拟真实的流式行为createStreamStat创建了最小化但完整的 FileStat 对象这些辅助函数为测试提供了良好的基础设施。
54-99: 测试基础设施配置正确PreferenceService 和流式提供者的集成配置正确:
- PreferenceService mock 通过
getValid方法模拟偏好设置- 依赖注入注册遵循现有模式
- 流式提供者正确注册到 'stream' 协议
256-276: 测试套件组织结构清晰测试套件设计合理:
configureStreamPreferences辅助函数提供了清晰的配置 APIbeforeEach钩子正确重置所有 mock,避免测试间污染- 偏好设置 mock 实现根据不同的键返回对应值,逻辑正确
|
/next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/file-service/__tests__/browser/file-service-client.test.ts (1)
335-346: 已补充文件大小低于阈值的测试场景此测试用例验证了当文件大小小于阈值时,即使启用了流式读取偏好也会使用常规 readFile 路径,符合之前评审建议的要求。
🧹 Nitpick comments (1)
packages/file-service/__tests__/browser/file-service-client.test.ts (1)
278-291: 可选优化:移除未使用的 mock 配置第 284 行对
readFile的 mock 配置在此测试中未被使用(第 290 行断言其未被调用),可考虑移除以提高测试代码的清晰度。不过保留它作为防御性配置也是可以接受的。
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/file-service/__tests__/browser/file-service-client.test.ts(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/file-service/__tests__/browser/file-service-client.test.ts (3)
packages/utils/src/stream.ts (1)
SumiReadableStream(60-117)packages/core-common/src/types/file.ts (2)
FileStat(14-70)IFileServiceClient(12-12)packages/file-service/src/common/file-service-client.ts (2)
IFileServiceClient(29-29)IFileServiceClient(31-122)
🔇 Additional comments (7)
packages/file-service/__tests__/browser/file-service-client.test.ts (7)
18-28: Mock 提供者结构完善流式提供者的 mock 结构完整,包含了所有必要的方法和能力标志,便于后续测试场景的灵活配置。
30-46: 辅助函数设计合理流创建和状态对象构造的辅助函数简洁清晰,异步行为模拟恰当,提升了测试用例的可读性。
54-99: 测试设置扩展得当PreferenceService 和流式提供者的集成符合现有测试架构模式,mock 配置满足测试需求。
256-276: 测试套件结构清晰通过独立的 describe 块和配置辅助函数组织测试,beforeEach 确保了测试隔离,整体设计规范。
293-304: 边界条件测试到位此测试用例验证了文件大小恰好等于阈值时也会触发流式读取,确保边界逻辑正确。
306-320: 错误回退逻辑验证完备正确测试了流式读取失败时回退到常规 readFile 的场景,确保了容错能力。
322-333: 偏好设置控制验证准确测试确认了即使文件大小超过阈值,偏好设置禁用时也不会使用流式读取,体现了配置优先级。
|
🎉 PR Next publish successful! 3.9.1-next-1762830662.0 |
|
/next |
|
🎉 PR Next publish successful! 3.9.1-next-1763090659.0 |
Ricbet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
66f9859 to
c986c87
Compare
|
/next |
|
🎉 PR Next publish successful! 3.9.1-next-1763369301.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/file-service/__tests__/browser/file-service-client.test.ts (1)
287-378: 大文件流式读取测试覆盖充分,建议可选补充 stat 失败场景这一组用例已经很好地覆盖了主要分支:
- 开关开启且
size > threshold:走readFileStream,并验证 chunk 合并逻辑。size === threshold:确认边界值也走流式路径。- 流式读取抛错:验证回退到
readFile。- 开关关闭:即使超过阈值也走
readFile。- 低于阈值:即使开启流式也走
readFile。如果还想把行为验证得更“闭环”,可以额外添加一个 stat 抛错的用例,例如让
streamProvider.statmockRejectedValue(new Error('stat error')),然后断言:
readFileStream不会被调用;readFile被调用一次;- 结果内容为
readFile的返回值。这样就能覆盖实现里
stat失败时的回退分支,既能防止未来重构时误删日志/回退逻辑,也能明确说明“stat 失败不会阻塞读文件”这一约定。packages/file-service/src/browser/file-service-client.ts (1)
627-712: 大文件流式读取决策与回退逻辑清晰,建议抽取/对齐默认阈值定义
doReadFile/shouldUseReadStream/collectReadableStream这一组逻辑整体设计合理:
- 先通过
shouldUseReadStream检查:
- provider 是否实现
readFileStream;- 用户是否开启
editor.streamLargeFile;editor.largeFile阈值是否为有效正数;stat.size >= threshold时才走流式路径。- 流式路径中用
collectReadableStream聚合 chunk,处理了空/单 chunk 的快速路径,并在出错时reject。readFileStream或stat失败时都会记录 warn 日志并回退到readFile,不会让大文件读取“直接失败”。有两点可以考虑优化(建议级,不是必须改):
默认阈值复用问题(Line 702-707)
getLargeFileStreamThreshold里直接写死了4 * 1024 * 1024 * 1024,建议确认该值是否与 editor 配置 schema 中editor.largeFile的默认值完全一致。如果存在多处硬编码:
- 更推荐抽成一个共享常量,或
- 尽量只在 schema 定义默认值,这里调用
getValid时让 PreferenceService 自行回落。这样可以避免未来只改 schema 而忘记这里,导致 UI 上显示的默认值和实际行为不一致。
阈值为 0/非法值时的语义说明
当前实现中,threshold <= 0会返回undefined,从而禁用流式路径,这个行为本身没问题。可以在注释里简单说明“非正数阈值等价于关闭按大小触发的流式读取”,避免后续维护者误以为是 bug。如果以上两点在其他文件(例如 editor schema)里已经处理或对齐了,可以保持现状。
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
packages/editor/src/browser/preference/schema.ts(1 hunks)packages/file-service/__tests__/browser/file-service-client.test.ts(3 hunks)packages/file-service/src/browser/file-service-client.ts(4 hunks)packages/i18n/src/common/en-US.lang.ts(2 hunks)packages/i18n/src/common/zh-CN.lang.ts(2 hunks)packages/preferences/src/browser/preference-settings.service.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/i18n/src/common/zh-CN.lang.ts
- packages/i18n/src/common/en-US.lang.ts
- packages/editor/src/browser/preference/schema.ts
- packages/preferences/src/browser/preference-settings.service.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-30T12:55:07.459Z
Learnt from: weiwen111
Repo: opensumi/core PR: 4139
File: packages/core-browser/src/window/window.service.ts:0-0
Timestamp: 2024-10-30T12:55:07.459Z
Learning: 在 `packages/core-browser/src/window/window.service.ts` 中,`workspace` 可能是以 `http` 开头的 URI,此时 `fsPath` 方法不可用。需要根据 `workspaceUri.scheme` 来判断并分别处理不同协议的 URI。
Applied to files:
packages/file-service/src/browser/file-service-client.ts
🧬 Code graph analysis (2)
packages/file-service/__tests__/browser/file-service-client.test.ts (3)
packages/utils/src/stream.ts (1)
SumiReadableStream(60-117)packages/core-common/src/types/file.ts (2)
FileStat(14-70)IFileServiceClient(12-12)packages/file-service/src/common/file-service-client.ts (2)
IFileServiceClient(29-29)IFileServiceClient(31-124)
packages/file-service/src/browser/file-service-client.ts (3)
packages/utils/src/buffer.ts (1)
BinaryBuffer(20-136)packages/core-common/src/types/file.ts (1)
FileSystemProvider(135-258)packages/utils/src/stream.ts (2)
IReadableStream(7-12)listenReadable(24-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: ubuntu-latest, Node.js 20.x
- GitHub Check: build (ubuntu-latest, 20.x)
- GitHub Check: build (macos-latest, 20.x)
- GitHub Check: unittest (ubuntu-latest, 18.x, node)
- GitHub Check: unittest (ubuntu-latest, 18.x, jsdom)
- GitHub Check: unittest (macos-latest, 18.x, jsdom)
- GitHub Check: unittest (macos-latest, 18.x, node)
🔇 Additional comments (2)
packages/file-service/__tests__/browser/file-service-client.test.ts (1)
5-53: 流式 provider 测试脚手架与偏好注入方式整体设计良好这里通过:
- 使用
createMockStreamProvider提供stat/readFile/readFileStreamstub;- 注入
PreferenceService并只 mockgetValid;- 在
beforeAll中注册额外的streamscheme provider;整体把生产环境的依赖形态在单测里还原得比较完整,可读性和可维护性都不错,目前看不需要额外调整。
Also applies to: 61-67, 94-97, 105-106
packages/file-service/src/browser/file-service-client.ts (1)
189-205: 对 provider 返回数据的统一包装改动是正向的
resolveContent和readFile中通过:const data = (rawContent as any).data || rawContent; const buffer = BinaryBuffer.wrap(data instanceof Uint8Array ? data : Uint8Array.from(data));让实现同时兼容:
- 直接返回
Uint8Array/Buffer的 provider;- 返回
{ data: ... }结构或普通 array-like 的 provider;这在不改变外部调用签名的前提下,提高了对不同 provider 实现的容错性,属于很好的兼容性增强。
Types
Background:
RPC and honoring non-configurable thresholds. This caused frequent timeouts (e.g. setContent /
getContent) and gave users no way to disable the new stream path.
Solution:
Memory Usage (Before)
Memory Usage (After)
Changelog:
Summary by CodeRabbit
新功能
测试