Single-File Project Context: Working Around AI Chat Tool Upload Limitations
Large Language Models (LLMs) like ChatGPT, Claude, and Google Gemini have become invaluable tools for developers, offering assistance with code review, debugging, and general programming queries. However, these tools differ in how they accept files, which creates challenges when seeking help with larger codebases.
While ChatGPT supports ZIP file uploads, allowing it to analyse entire projects, other AI assistants are more limited. Claude and Google Gemini, for instance, only accept individual file uploads. This limitation becomes a problem when trying to discuss architectural decisions, debug cross-file interactions, or seek advice on refactoring, where context from multiple files is crucial. Some of the models available in Google AI Studio (aistudio.google.com) offer context windows of up to two million tokens, potentially allowing an entire codebase to be processed at once. However, this still leaves the challenge of preparing and structuring the input in a meaningful way.
The common workarounds are not ideal. Even with Google AI Studio’s large context window, you still need to prepare your codebase in a format the model can understand. Pasting multiple files manually is time-consuming and error-prone. Sharing only the relevant files often misses important context. Creating minimal reproductions works for simple issues but falls short for architectural discussions or complex debugging scenarios. Sharing a GitHub link is not an option either, as most assistants cannot fetch and explore a repository from a URL.
Consider a typical scenario: you are debugging a service that spans multiple components, with dependencies scattered across various files and directories. To get meaningful help, you need to share:
- The main service files
- Related utility functions
- Type definitions
- Configuration files
With ChatGPT, you could simply zip these files and upload them. With Google AI Studio, you might have the token headroom, but still have to upload individual files and need a way to preserve the project’s structure. With other AI tools, you would need to carefully copy and paste each file, maintain their structure somehow, and hope you have not missed any crucial context.
This is where having a tool to consolidate project files into a single, well-structured text file becomes valuable. Such a file can be easily uploaded to any AI assistant, providing complete context while preserving the project’s structure and file relationships.
Let us look at a Python script to address this challenge. It walks through a project directory, intelligently filters relevant files, and produces a single structured document that maintains the project’s hierarchy while including useful metadata about each file.
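At its core, such a script is a filtered walk over the project tree. A minimal sketch of that part might look like the following; the excluded directories and recognised extensions are illustrative defaults, not the exact lists the full script uses.

```python
import os

# Directories and extensions to skip or keep -- illustrative defaults,
# not the exact filter lists the full script uses.
EXCLUDED_DIRS = {"node_modules", ".git", "__pycache__", ".venv", "dist"}
TEXT_EXTENSIONS = {".py", ".js", ".ts", ".json", ".md", ".txt", ".yml", ".yaml"}


def iter_project_files(root):
    """Yield paths of likely-text files under root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                yield os.path.join(dirpath, name)
```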
The script is straightforward to use and requires Python 3.6 or later. Simply run it from the command line, providing the project directory and the desired output file.
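For example, assuming the script is saved as consolidate.py (the filename and argument order here are illustrative):

```
python consolidate.py ./my-project project_context.txt
```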
The script generates a structured text file containing three main sections:
- Project Structure: A tree representation of your project’s directory hierarchy, similar to the output of the `tree` command but filtered to exclude common development directories like `node_modules`, `.git`, and `__pycache__`.
- Project Summary: Key metrics about your codebase, including:
  - Total number of files processed
  - Total size in bytes
  - Estimated token count (helpful for understanding LLM context limits)
- File Contents: Each file’s content is wrapped with metadata headers including:
  - File path relative to project root
  - File size
  - Last modified timestamp
  - Line count (for code files)
  - The actual file content, with line numbers for code files
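To give a sense of how the per-file headers and the token estimate can be produced, here is a simplified sketch. The header labels mirror the example output below, and the four-characters-per-token heuristic is a rough approximation consistent with the summary figures shown; neither is necessarily identical to the full script’s implementation.

```python
import datetime
import os


def write_file_section(out, path, root):
    """Write one file's metadata header followed by its line-numbered content."""
    stat = os.stat(path)
    with open(path, "r", encoding="utf-8", errors="replace") as source:
        lines = source.readlines()

    out.write(f"--- START FILE: {os.path.relpath(path, root)} ---\n")
    out.write(f"File Size: {stat.st_size:,} bytes\n")
    out.write(f"Last Modified: {datetime.datetime.fromtimestamp(stat.st_mtime)}\n")
    out.write(f"Lines of Code: {len(lines)}\n")
    for number, line in enumerate(lines, start=1):
        out.write(f"{number}: {line}")
    out.write("\n")


def estimate_tokens(total_bytes):
    """Rough token estimate: assume roughly four characters per token."""
    return total_bytes // 4
```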
Here is a snippet showing how the output is structured:
```
--- PROJECT STRUCTURE ---
/
└─ README.md
└─ data_analysis_app.ipynb
└─ src/
  └─ core/
    └─ application.py
    └─ error_handler.py
...

--- PROJECT SUMMARY ---
Total Files: 16
Total Size: 73,133 bytes
Estimated Tokens: 18,283
--------------------

--- START FILE: README.md ---
File Size: 1,801 bytes
Last Modified: 2024-12-30 13:36:20.035657
Lines of Code: 68
[Content follows...]
```
The output format is designed to be both human-readable and suitable for AI tools to parse. Code files include line numbers, making it easier to reference specific sections when discussing the code with AI assistants. Binary files are noted but their content is omitted to keep the output manageable and focused on reviewable code.
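A common way to detect binary files, though not necessarily the exact check the script performs, is to look for null bytes in the first few kilobytes:

```python
def is_probably_binary(path, sample_size=8192):
    """Heuristic check: treat files containing null bytes as binary."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(sample_size)
```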
This simple tool solves a common problem when working with AI coding assistants. While each AI platform takes a different approach to handling project context, from ZIP uploads to expanded context windows, having a standardised way to prepare and present your codebase ensures you can use any of these tools effectively.
The project consolidator is particularly useful for:
- Architectural discussions where cross-file context is crucial
- Debugging complex interactions between components
- Code reviews that span multiple files
- Refactoring discussions where understanding the full project structure matters
Further enhancements could include configurable file filtering, support for more sophisticated token counting algorithms, or integration with version control systems to capture git history.
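For instance, more accurate token counting could be built on a real tokeniser such as the tiktoken package, sketched below as a possible enhancement rather than something the current script does:

```python
import tiktoken  # third-party tokeniser: pip install tiktoken


def count_tokens(text, encoding_name="cl100k_base"):
    """Count tokens with a real tokeniser instead of a characters-per-token heuristic."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))
```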
The script is also available at gist.github.com/rnsloan/03284903b554c8edd5bbb2b88080e3bc. Whether you are using ChatGPT, Claude, Google’s AI tools, or any other AI assistant, having a reliable way to share your codebase context can significantly improve the quality of AI-assisted development discussions.