Merged
205 changes: 164 additions & 41 deletions README.md
@@ -1,81 +1,204 @@

# TypeScript Code Extractor and Analyzer

This project provides an advanced toolkit for parsing TypeScript code using the TypeScript Abstract Syntax Tree (AST) to extract, analyze, and map code structures.
The **TypeScript Code Extractor and Analyzer** is a library for parsing and analyzing TypeScript and JavaScript codebases using the TypeScript Abstract Syntax Tree (AST). It generates a structured, hierarchical representation of your codebase, detailing modules, classes, functions, properties, interfaces, enums, and dependencies. It is well suited to building code analysis tools, documentation generators, or AI-driven systems such as Retrieval-Augmented Generation (RAG) for codebases.

## Table of Contents
- [TypeScript Code Extractor and Analyzer](#typescript-code-extractor-and-analyzer)
- [Table of Contents](#table-of-contents)
- [Key Features](#key-features)
- [Installation](#installation)
- [Getting Started](#getting-started)
- [Basic Example](#basic-example)
- [API Reference](#api-reference)
- [`TypeScriptCodeMapper`](#typescriptcodemapper)
- [Data Structures](#data-structures)
- [Sample `ICodebaseMap` Structure](#sample-icodebasemap-structure)
- [Examples](#examples)
- [Analyzing a Single File's Dependencies](#analyzing-a-single-files-dependencies)
- [Handling Errors](#handling-errors)
- [Notes](#notes)
- [Contributing](#contributing)
- [License](#license)

## Key Features

- **AST-based Class Metadata Extraction**: Captures detailed metadata about classes, including methods, properties, interfaces, and enums.
- **Function and Method Signature Analysis**: Parses function signatures to extract parameters, return types, and JSDoc comments.
- **Interface and Enum Parsing**: Extracts TypeScript-specific constructs for comprehensive type system analysis.
- **Dependency Graph Construction**: Builds a graph of file dependencies by analyzing import declarations.
- **JavaScript Support**: Analyzes JavaScript files with type inference from JSDoc comments when `"allowJs": true` is set in `tsconfig.json`.
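
As a rough illustration of the import-based dependency extraction listed above, the following sketch uses the TypeScript compiler API directly. `extractImports` is a hypothetical helper, not part of this library, and the library's own implementation may differ.

```typescript
import * as ts from "typescript";

// Hypothetical helper (an assumption, not the library's API): collect the text
// of every top-level import declaration found in a piece of source code.
function extractImports(fileName: string, sourceText: string): string[] {
  // setParentNodes = true so that getText() can walk back to the source file.
  const sourceFile = ts.createSourceFile(
    fileName,
    sourceText,
    ts.ScriptTarget.Latest,
    true
  );
  return sourceFile.statements
    .filter(ts.isImportDeclaration)
    .map((statement) => statement.getText(sourceFile));
}

const sampleSource = [
  "import * as fs from 'fs';",
  "import { LogLevel } from './types';",
  "export const ready = true;",
].join("\n");

const imports = extractImports("example.ts", sampleSource);
console.log(imports);
```

Only top-level `import` declarations are collected here; dynamic `import()` calls and `require` calls would need a deeper walk of the tree.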

## Installation

Install the library using npm:

```bash
npm install @traversets/code-extractor
```

Ensure your project includes a `tsconfig.json` file. For JavaScript projects, add the following to enable parsing:

```json
{
  "compilerOptions": {
    "allowJs": true
  }
}
```

TypeScript Code Extractor and Analyzer is a robust system that utilizes a TypeScript parser to navigate through the codebase's AST, extracting structured metadata about various components such as modules, classes, functions, interfaces, properties, and enums.
## Getting Started

Key features:
To begin analyzing your codebase, create an instance of `TypeScriptCodeMapper` and use the `buildCodebaseMap` method to generate a comprehensive map of your codebase. This map is returned as a `Result<ICodebaseMap>`, which you can inspect for success or errors.

- AST-based Class Metadata Extraction: Utilizes TypeScript's AST to gather comprehensive metadata on class methods, properties, interfaces, and enums.
- Function and Method Signature Analysis: Parses function signatures from the AST for details on parameters, return types, and inferred type information.
- Interface and Enum Parsing: Extracts information from AST nodes representing interfaces and enums in TypeScript.
- Dependency Graph Construction: Builds a graph of file dependencies by analyzing import declarations within the AST.
### Basic Example

### Installation
To integrate this tool into your project, install it via npm:
```
npm i @traversets/code-extractor
```typescript
import { TypeScriptCodeMapper } from '@traversets/code-extractor';

async function analyzeCodebase() {
  const codeMapper = new TypeScriptCodeMapper();
  const result = await codeMapper.buildCodebaseMap();
  if (result.isOk()) {
    console.log(JSON.stringify(result.getValue(), null, 2));
  } else {
    console.error('Error:', result.getError());
  }
}

analyzeCodebase();
```

### Code Analysis
This example outputs a JSON structure representing your codebase, including modules, classes, functions, and dependencies.

Below is an example of how to use the AST parser for code analysis:
## API Reference

```typescript
### `TypeScriptCodeMapper`

const codeMapper: TypeScriptCodeMapper = new TypeScriptCodeMapper();
The primary class for codebase analysis, offering methods to extract and navigate metadata.

// Get Root files
const rootFiles: readonly string[] = codeMapper.getRootFileNames();
| Method | Description | Parameters | Return Type |
| --- | --- | --- | --- |
| `getRootFileNames()` | Retrieves the list of root file names from the TypeScript program, as specified in `tsconfig.json`. | None | `readonly string[] \| undefined` |
| `getSourceFile(fileName: string)` | Retrieves the source file object for a given file name. | `fileName: string` | `ts.SourceFile \| undefined` |
| `buildDependencyGraph(sourceFile: ts.SourceFile)` | Builds a dependency graph by extracting import statements from a source file. | `sourceFile: ts.SourceFile` | `string[]` |
| `buildCodebaseMap()` | Generates a hierarchical map of the codebase, including modules, classes, functions, properties, interfaces, enums, and dependencies. | None | `Promise<Result<ICodebaseMap>>` |
| `getProgram()` | Returns the current TypeScript program instance. | None | `ts.Program \| undefined` |
| `getTypeChecker()` | Retrieves the TypeScript TypeChecker instance for type analysis. | None | `ts.TypeChecker \| undefined` |

// Convert a rootFile into a sourceFile
const sourceFile: ts.SourceFile = codeMapper.getSourceFile(rootFiles[5]);
**Note**: For `buildCodebaseMap`, check `result.isOk()` to confirm success before accessing `result.getValue()`. Use `result.getError()` to handle errors.

// Build a dependency graph
const sourceFileDependencies: string[] = codeMapper.buildDependencyGraph(sourceFile);
## Data Structures

// Build a codebase map
const codebaseMap = (await codeMapper.buildCodebaseMap()).getValue();
```
The library uses interfaces to represent extracted metadata:

### Sample Response Structure
The resulting JSON structure reflects the TypeScript AST's hierarchical representation:
```
| Interface | Description |
| --- | --- |
| `IClassInfo` | Represents a class with its name, functions, properties, interfaces, and enums. |
| `IModuleInfo` | Represents a module (file) with its path, classes, functions, interfaces, enums, and dependencies. |
| `IFunctionInfo` | Represents a function with its name, content, parameters, return type, and comments. |
| `IProperty` | Represents a property with its name and type. |
| `IInterfaceInfo` | Represents an interface with its name, properties, and summary. |
| `IEnumInfo` | Represents an enum with its name, members, and summary. |
| `ICodebaseMap` | A hierarchical map of the codebase, mapping project names to modules. |

### Sample `ICodebaseMap` Structure

```json
{
"MyProject": {
"projectName": {
"modules": {
"src/utils/logger.ts": {
"src/index.ts": {
"path": "src/index.ts",
"classes": [
{
"name": "Logger",
"name": "ExampleClass",
"functions": [
{
"name": "log",
"parameters": [{ "name": "message", "type": "string" }],
"name": "exampleMethod",
"content": "function exampleMethod(param: string) { ... }",
"parameters": [
{
"name": "param",
"type": "string"
}
],
"returnType": "void",
"content": "",
"comment": "Logs application Error"
"comments": "Example method description"
}
],
"properties": [
{ "name": "logLevel", "type": "LogLevel" }
]
{
"name": "exampleProperty",
"type": "number"
}
],
"interfaces": [],
"enums": []
}
],
"functions": [],
"interfaces": [],
"enums": [],
"dependencies": ["import { LogLevel } from './types';"]
"dependencies": [
"import * as fs from 'fs';"
]
}
}
}
}
```
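
To make the sample above easier to work with, here is a minimal sketch of how such a map might be typed and traversed. The interface shapes are assumptions inferred from the table and the sample (the library's exported types may carry more fields, such as `interfaces` and `enums`), and `describeFunctions` is a hypothetical helper.

```typescript
// Assumed shapes, inferred from the sample; trimmed for brevity.
interface IProperty { name: string; type: string; }
interface IFunctionInfo {
  name: string;
  content: string;
  parameters: IProperty[];
  returnType: string;
  comments: string;
}
interface IClassInfo { name: string; functions: IFunctionInfo[]; properties: IProperty[]; }
interface IModuleInfo {
  path: string;
  classes: IClassInfo[];
  functions: IFunctionInfo[];
  dependencies: string[];
}
type ICodebaseMap = Record<string, { modules: Record<string, IModuleInfo> }>;

// Hypothetical helper: render every function as "path#Owner.name(params): returnType".
function describeFunctions(map: ICodebaseMap): string[] {
  const lines: string[] = [];
  for (const projectName of Object.keys(map)) {
    const modules = map[projectName].modules;
    for (const modulePath of Object.keys(modules)) {
      const mod = modules[modulePath];
      const describe = (fn: IFunctionInfo, owner?: string): string => {
        const params = fn.parameters.map((p) => `${p.name}: ${p.type}`).join(", ");
        const qualified = owner ? `${owner}.${fn.name}` : fn.name;
        return `${mod.path}#${qualified}(${params}): ${fn.returnType}`;
      };
      // Module-level functions first, then the methods of each class.
      mod.functions.forEach((fn) => lines.push(describe(fn)));
      for (const cls of mod.classes) {
        cls.functions.forEach((fn) => lines.push(describe(fn, cls.name)));
      }
    }
  }
  return lines;
}

const sampleMap: ICodebaseMap = {
  projectName: {
    modules: {
      "src/index.ts": {
        path: "src/index.ts",
        classes: [{
          name: "ExampleClass",
          functions: [{
            name: "exampleMethod",
            content: "function exampleMethod(param: string) { ... }",
            parameters: [{ name: "param", type: "string" }],
            returnType: "void",
            comments: "Example method description",
          }],
          properties: [{ name: "exampleProperty", type: "number" }],
        }],
        functions: [],
        dependencies: ["import * as fs from 'fs';"],
      },
    },
  },
};

const signatures = describeFunctions(sampleMap);
console.log(signatures);
```

Flat, one-line summaries like these are one possible unit for search indexes or embeddings, since each carries the module path, owner, and signature together.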

## Examples

### Analyzing a Single File's Dependencies

```typescript
import { TypeScriptCodeMapper } from '@traversets/code-extractor';

const codeMapper = new TypeScriptCodeMapper();
const rootFiles = codeMapper.getRootFileNames();
if (rootFiles && rootFiles.length > 0) {
  const sourceFile = codeMapper.getSourceFile(rootFiles[0]);
  if (sourceFile) {
    const dependencies = codeMapper.buildDependencyGraph(sourceFile);
    console.log('Dependencies:', dependencies);
  }
}
```
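
Judging from the sample `ICodebaseMap` structure, each dependency appears to be the full text of an import statement rather than a bare module name. Assuming that holds, a small helper can reduce those strings to module specifiers. `moduleSpecifier` is hypothetical and not part of the library.

```typescript
// Hypothetical helper: pull the module specifier out of an import statement's text.
// Returns undefined when the text does not look like a static import.
function moduleSpecifier(importText: string): string | undefined {
  // Matches `... from 'x'` as well as bare side-effect imports such as `import 'x'`.
  const match = /(?:from|^\s*import)\s*['"]([^'"]+)['"]/.exec(importText);
  return match ? match[1] : undefined;
}

const dependencyTexts = [
  "import * as fs from 'fs';",
  "import { LogLevel } from './types';",
  "import './polyfill';",
];

const specifiers = dependencyTexts.map(moduleSpecifier);

// Relative specifiers point at files inside the project; the rest are packages or built-ins.
const localSpecifiers = specifiers.filter(
  (s): s is string => s !== undefined && s.charAt(0) === "."
);

console.log(specifiers, localSpecifiers);
```

A regular expression is enough for well-formed import text like this; for arbitrary source code, reading `moduleSpecifier` from the AST node would be more robust.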

### Usage for Agentic RAG Systems
This tool enhances Retrieval-Augmented Generation (RAG) systems by:
### Handling Errors

```typescript
import { TypeScriptCodeMapper } from '@traversets/code-extractor';

async function analyzeWithErrorHandling() {
  const codeMapper = new TypeScriptCodeMapper();
  try {
    const result = await codeMapper.buildCodebaseMap();
    if (result.isOk()) {
      console.log('Codebase Map:', JSON.stringify(result.getValue(), null, 2));
    } else {
      console.error('Failed to build codebase map:', result.getError());
    }
  } catch (error) {
    console.error('Unexpected error:', error);
  }
}

analyzeWithErrorHandling();
```

## Notes

- **JavaScript Support**: The library parses JavaScript files when `"allowJs": true` is set in `tsconfig.json`. JSDoc comments (e.g., `/** @returns {number} */`) improve type inference.
- **Error Handling**: Methods like `buildCodebaseMap` return a `Result` type. Always check `isOk()` before accessing `getValue()` to handle errors gracefully.
- **Performance**: For large codebases, narrow the `include`, `exclude`, or `files` settings in `tsconfig.json` so that only the files you need are analyzed, which reduces processing time.
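
The `Result` contract described under Error Handling can be pictured with a minimal sketch. This class is an assumption that mirrors only the three methods this README mentions (`isOk`, `getValue`, `getError`); the library's actual `Result` may be implemented differently, and `parseMap` is a hypothetical stand-in for an operation that can fail.

```typescript
// Hypothetical minimal Result type; not the library's implementation.
class Result<T> {
  private constructor(
    private readonly success: boolean,
    private readonly value?: T,
    private readonly error?: string
  ) {}

  static ok<T>(value: T): Result<T> {
    return new Result<T>(true, value);
  }

  static fail<T>(error: string): Result<T> {
    return new Result<T>(false, undefined, error);
  }

  isOk(): boolean {
    return this.success;
  }

  // Throws on a failed Result, which is why callers check isOk() first.
  getValue(): T {
    if (!this.success) {
      throw new Error("Cannot read the value of a failed Result");
    }
    return this.value as T;
  }

  getError(): string | undefined {
    return this.error;
  }
}

// Stand-in for an operation that may fail, such as building a codebase map.
function parseMap(json: string): Result<unknown> {
  try {
    return Result.ok<unknown>(JSON.parse(json));
  } catch (error) {
    return Result.fail<unknown>(`Invalid JSON: ${String(error)}`);
  }
}

const good = parseMap('{"projectName":{"modules":{}}}');
const bad = parseMap("{not json");

console.log(good.isOk(), bad.isOk());
```

Returning a `Result` instead of throwing keeps expected failures in the return value, so a surrounding `try`/`catch` is only needed for truly unexpected errors.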

## Contributing

Contributions are welcome! Please submit issues or pull requests to the [GitHub Repository](https://github.com/olasunkanmi-SE/ts-codebase-analyzer). Follow the contribution guidelines in the repository for coding standards and testing requirements.

- Parsing the TypeScript AST into embeddings for semantic code search and similarity matching
- Leveraging AST metadata for advanced code analysis, query resolution, or to aid in code generation, thereby improving the understanding and manipulation of TypeScript codebases within AI systems.
## License

This library is licensed under the MIT License. See the [LICENSE](https://github.com/olasunkanmi-SE/ts-codebase-analyzer/blob/main/LICENSE) file for details.
4 changes: 2 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "@ts-toolbox/code-extractor",
"version": "0.0.1",
"name": "@traversets/code-extractor",
"version": "0.0.8",
"description": "The TypeScript Code Extractor and Analyzer can be handy for RAG (Retrieval-Augmented Generation) systems for codebases. It provides a detailed and structured representation of the codebase that can be converted into embeddings, enabling more effective advanced code analysis, retrieval, and generation tasks.",
"main": "dist/index.js",
"types": "dist/index.d.ts",