Skip to content

Conversation

@siddharthbaleja7
Copy link

Description

This PR implements CAR file detection in the add command to prevent users from accidentally wrapping a CAR file in another UnixFS DAG.

As suggested in #199, it uses @ipld/car/decoder to efficiently check the file header. If a CAR file is detected, a warning is displayed suggesting the import command instead.

Changes

  • Added src/utils/car-detection.ts utility using createDecoder and asyncIterableReader.
  • Updated src/commands/add.ts to use the utility and warn the user.
  • Added unit tests in src/test/unit/car-detection.test.ts.

Verification

  • Automated Tests: Added unit tests covering valid CAR, text files, and binary files.
  • Manual Verification: Verified CLI output with a valid CAR file:
▲  Warning: You are adding a CAR file. Did you mean to 'import' it?
│  'add' wraps the file in a new UnixFS DAG.
│  To import existing CAR data, use 'filecoin-pin import'.

Comment on lines +35 to +38
const stream = createWriteStream(textFilePath)
stream.write('This is just a text file')
stream.end()
await new Promise<void>((resolve) => stream.on('finish', () => resolve()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two instances of non-CAR could just be a plain await writeFile('...'), no need to use streams here

})

// Cleanup
it('cleanup', async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a test so don't make it one, you have afterEach or afterAll to help with this; perhaps make an array within this test group, then within each test append the file, then use an afterAll to clean up.

@SgtPooki might have more creative ideas of how to structure this though.

const reader = asyncIterableReader(stream)
const decoder = createDecoder(reader)
const header = await decoder.header()
return !!(header && header.version === 1 && header.roots)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version can also be 2; but also it's pretty strict in there so you shouldn't need to check this output so you should be able to just await decoder.header() without bothering to look at the return value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return !!(header && header.version === 1 && header.roots)
return header != null


// Check if the file is a CAR file and warn the user
if (await isCarFile(path)) {
log.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run this branch and do an add for a valid CAR and show us what the output looks like? then also do it and | less the output and show us that output too? I'm not convinced about the multi-line here, but also want to know that a log.warn actually does something useful.

One for @SgtPooki to consider too, is this the right way we want to output a warning in here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this scenario, I think we should display a spinnerSection telling the users to "use filecoin-pin import or provide paths to a non-CAR file or a directory" instead of using log.warn. log.* with the pino logger will only be visible if someone explicitly enables pino log levels.

In general, inside the CLI "UI" code, i.e. /src/commands/* we should display warnings explicitly via clack, similar to payments setup (non-auto) and potentially prompt them to confirm understanding.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do this check from inside runAdd itself. There is already an await validatePath inside there that determines if it's a directory or a file, we could type that as "contentType" instead of a simple isDirectory boolean, and pivot display messaging there.

then we can move on to the future i'm imagining if Rod is in agreement.

Copy link
Collaborator

@SgtPooki SgtPooki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this out @siddharthbaleja7. There are a few changes to do for now, and another larger point I want to get across:

My ideal end goal for filecoin-pin: I want the codebase to continue working towards the src/core code containing enough of the logic so both CLI, GH action, pinning server, and programmatic users can come to the same conclusions easily. What that means in this case is: the warning should be identifiable via CODE and simple text message in src/core, and the CLI should read warnings and display the given message, or customize one (if needed). I believe the src/core/data-set types have a DataSetWarning now that I want to use throughout other src/core code. (but we shouldn't re-use the DataSetWarning type.. it would have a generic name in the future, so an Add/ImportWarning that matches the interface for now should be fine)

With that in mind, I have a few larger points that I think need to be addressed before we would merge this:

I don't think we want isCarFile to accept a filePath and do file reading. We should tie it into the existing createCarFromFile and createCarFromPath methods and accept a stream (or asyncIter) from there. If we do this, we will prevent future issues where programmatic users upload a car file and encounter the problems described in #199.

import { createReadStream } from 'node:fs'
import { asyncIterableReader, createDecoder } from '@ipld/car/decoder'

export async function isCarFile(filePath: string): Promise<boolean> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file name should match exported function.. is-car-file.ts

const reader = asyncIterableReader(stream)
const decoder = createDecoder(reader)
const header = await decoder.header()
return !!(header && header.version === 1 && header.roots)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return !!(header && header.version === 1 && header.roots)
return header != null

Comment on lines +11 to +15
describe('isCarFile', () => {
const tempDir = process.cwd()
const validCarPath = join(tempDir, 'valid.car')
const invalidCarPath = join(tempDir, 'invalid.car')
const textFilePath = join(tempDir, 'text.txt')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests here should be pretty simple and the same for each scenario. This file could be simplified a lot by doing some table-driven/data-driven test scenarios in an array of tuples, with a single it block:

const tests = [
 ['valid.car', 'returns true for valid car files', true],
 ['invalid.car', 'returns false for invalid car files', false],
 ['text.txt', 'returns false for regular files', false]
]
describe('isCarFile', () => {
  beforeAll(() => { /* set up all files */ })
  afterAll(() => { /* cleanup */ })
  for([fileName, msg, result] of tests) {
    it(msg, () => {
      // read file, call isCar, assert result
    })
  }
})

import * as raw from 'multiformats/codecs/raw'
import { sha256 } from 'multiformats/hashes/sha2'

describe('isCarFile', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this filename should be updated as well, likely is-car-file.test.ts

@rvagg
Copy link
Member

rvagg commented Dec 2, 2025

We should tie it into the existing createCarFromFile and createCarFromPath methods and accept a stream (or asyncIter) from there

Can you walk through what you're imagining here a bit more @SgtPooki to get a feel for API and internal mechanics. Two issues I see needing careful resolution:

  • This is a warning, not an error, so it shouldn't be fatal to ingesting; so how do we get that back from inside createCarFrom* in a helpful way?
  • We only need to sniff the start of a file, but if you want this to handle an existing asyncIterator then we're going to need to write a careful wrapper that plucks out enough bytes to do the sniffing but doesn't interrupt the remainder of the ingest. Possible, but quite messy and prone to introducing subtle bugs.

@SgtPooki
Copy link
Collaborator

SgtPooki commented Dec 2, 2025

  • This is a warning, not an error, so it shouldn't be fatal to ingesting; so how do we get that back from inside createCarFrom* in a helpful way?

I think it should be shown similarly to how we handle payment validation. Return an array of warnings that people can do with how they will.. we are currently doing this for payment validation and dataset list/details

We only need to sniff the start of a file, but if you want this to handle an existing asyncIterator then we're going to need to write a careful wrapper that plucks out enough bytes to do the sniffing but doesn't interrupt the remainder of the ingest. Possible, but quite messy and prone to introducing subtle bugs.

yea, my end goal is likely too large of a change for this PR and something we need to address later.. but I think my main point is that we should be making the src/core functionality contain these checks and balances and always have a way to surface them programmatically.

Can you walk through what you're imagining here a bit more @SgtPooki to get a feel for API and internal mechanics. Two issues I see needing careful resolution:

I was imagining filecoin-pin-website users trying to upload a file and not getting any feedback about potentially double-wrapped car files since it's only in the CLI code in this PR. So, the way we're currently handling that is a warnings or suggestions array that users can parse/render.

Essentially, I'm imagining the CLI could use default messages, and programmatic users could use a unique CODE to determine what UI to show. It would look something like this:

const { carPath, rootCid, warnings } = await createCarFromPath(filePath)

if (warnings.length > 0) {
  // handle code/msgs for warnings, may block continuation if desired..
}

// upload

Once we are there.. it would be easier to move to an, in my opinion, better API.. something like:

const onProgress: CreateCarOnProgressHandler = async (event) => {
  switch (event.type) {
    case 'create-car:already-car': // needs better name
      // handle however you wish.. event.msg could contain our default description, but devs can do whatever they wish
      break
  }
}
const { carPath, rootCid } = await createCarFromPath(filePath, { onProgress })

essentially we want users to be able to get to success easily, without blocking (both above handle this) but also be able to determine if something was done they didn't want done, and potentially block it. Both of the above allow us to do so in the CLI and everywhere else.

@SgtPooki
Copy link
Collaborator

SgtPooki commented Dec 2, 2025

We could also just throw on createCarFromPath being given a CAR file, because that's probably not what users want.. and we likely don't want to get into the mess of reading car file and then generating one of our own.

After thinking through other solutions, I really think this is probably what we want, because programmatically, they can just upload their own car and get errors there if it's not in the correct format.. and via CLI add they would get a quick failure and then use filecoin-pin import foo.car instead of us piping through a --force option for filecoin-pin add. There's nothing that add is doing that import doesn't, except directories, which doesn't matter for this car scenario we're discussing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants