Single-File Project Context: Working Around AI Chat Tool Upload Limitations
Large Language Models (LLMs) like ChatGPT, Claude, and Google Gemini have become invaluable tools for developers, offering assistance with code review, debugging, and general programming queries. However, these tools differ in how they handle file uploads, which creates challenges when seeking help with larger codebases.
While ChatGPT supports ZIP file uploads, allowing it to analyse entire projects, other AI assistants are more limited. Claude and Google Gemini, for instance, only accept individual file uploads. This limitation becomes challenging when trying to discuss architectural decisions, debug cross-file interactions, or seek advice on refactoring, where context from multiple files is crucial. Some of the models available on Google's AI Studio (aistudio.google.com) offer session context sizes of up to two million tokens, potentially allowing entire codebases to be processed at once. However, this still leaves the challenge of preparing and structuring the input in a meaningful way.
The common workarounds are not ideal. Even with Google AI Studio's large context window, you still need to prepare your codebase in a format the model can understand. Pasting multiple files manually is time-consuming and error-prone. Sharing only the relevant files often misses important context. Creating minimal reproductions works for simple issues but falls short for architectural discussions or complex debugging scenarios. And GitHub links are not an option, since most chat assistants cannot browse a repository directly.
Consider a typical scenario: you are debugging a service that spans multiple components, with dependencies scattered across various files and directories. To get meaningful help, you need to share:
- The main service files
- Related utility functions
- Type definitions
- Configuration files
With ChatGPT, you could simply zip these files and upload them. With Google AI Studio, you might have the token headroom, but still have to upload individual files and need a way to preserve the project’s structure. With other AI tools, you would need to carefully copy and paste each file, maintain their structure somehow, and hope you have not missed any crucial context.
This is where having a tool to consolidate project files into a single, well-structured text file becomes valuable. Such a file can be easily uploaded to any AI assistant, providing complete context while preserving the project’s structure and file relationships.
Let us look at a Python script to address this challenge. It walks through a project directory, intelligently filters relevant files, and produces a single structured document that maintains the project’s hierarchy while including useful metadata about each file.
import os
import argparse
import re
from datetime import datetime


class ProjectConsolidator:
    def __init__(self, project_dir, output_file, max_file_size_mb=1):
        self.project_dir = project_dir
        self.output_file = output_file
        self.max_file_size_mb = max_file_size_mb
        self.total_size = 0
        self.file_count = 0
        # Configuration
        self.exclude_dirs_patterns = [r"__pycache__", r"\.idea", r"\.git", r"venv", r"node_modules"]
        self.exclude_files_patterns = [r"\.pyc$", r"\.log$", r"\.DS_Store$"]
        self.binary_extensions = {'.db', '.pyc', '.pkl', '.bin', '.jpg', '.png', '.exe'}

    def should_process_file(self, file_path):
        """Determine if a file should be processed."""
        return (
            not any(re.search(pattern, file_path) for pattern in self.exclude_files_patterns) and
            os.path.getsize(file_path) <= (self.max_file_size_mb * 1024 * 1024)
        )

    def detect_language(self, file_path):
        """Detect file language based on extension."""
        ext = os.path.splitext(file_path)[1].lower()
        language_map = {
            '.py': 'python',
            '.js': 'javascript',
            '.ts': 'typescript',
            '.java': 'java',
            '.cpp': 'cpp',
            '.h': 'cpp',
            '.cs': 'csharp',
            '.go': 'go',
            '.rb': 'ruby',
            '.php': 'php',
            '.rs': 'rust'
        }
        return language_map.get(ext, 'text')

    def write_file_content(self, outfile, file_path, relative_path):
        """Write file content with appropriate formatting."""
        stats = os.stat(file_path)
        language = self.detect_language(file_path)
        # Write file header
        outfile.write(f"\n--- START FILE: {relative_path} ---\n")
        outfile.write(f"File Size: {stats.st_size:,} bytes\n")
        outfile.write(f"Last Modified: {datetime.fromtimestamp(stats.st_mtime)}\n")
        if os.path.splitext(file_path)[1].lower() in self.binary_extensions:
            outfile.write(f"Content of binary file {relative_path} is omitted.\n")
        else:
            try:
                with open(file_path, 'r', encoding='utf-8') as infile:
                    lines = infile.readlines()
                outfile.write(f"Lines of Code: {len(lines)}\n\n")
                # Write content with line numbers for code files
                if language != 'text':
                    for i, line in enumerate(lines, 1):
                        outfile.write(f"{i:4d} | {line}")
                else:
                    outfile.writelines(lines)
            except UnicodeDecodeError:
                outfile.write("Content could not be decoded.\n")
        outfile.write(f"\n--- END FILE: {relative_path} ---\n")

    def write_project_summary(self, outfile):
        """Write project summary information."""
        outfile.write("--- PROJECT SUMMARY ---\n")
        outfile.write(f"Total Files: {self.file_count}\n")
        outfile.write(f"Total Size: {self.total_size:,} bytes\n")
        # Rough heuristic: roughly one token per four bytes of text
        outfile.write(f"Estimated Tokens: {(self.total_size // 4):,}\n")
        if (self.total_size // 4) > 100000:
            outfile.write("WARNING: Content may exceed LLM context limits\n")
        outfile.write("--------------------\n\n")

    def consolidate(self):
        """Consolidate the project files."""
        with open(self.output_file, 'w', encoding='utf-8') as outfile:
            outfile.write("--- PROJECT STRUCTURE ---\n")
            # First pass: collect file information and write structure
            for root, dirs, files in os.walk(self.project_dir):
                # Filter excluded directories in place so os.walk skips them
                dirs[:] = [d for d in dirs if not any(re.search(pattern, d)
                           for pattern in self.exclude_dirs_patterns)]
                relative_path = os.path.relpath(root, self.project_dir)
                if relative_path == ".":
                    outfile.write(" /\n")
                else:
                    indent = "    " * relative_path.count(os.sep)
                    outfile.write(f"{indent}└─ {os.path.basename(root)}/\n")
                # Write files in this directory
                for file in sorted(files):
                    if self.should_process_file(os.path.join(root, file)):
                        self.file_count += 1
                        indent = "    " * (relative_path.count(os.sep) + 1)
                        outfile.write(f"{indent}└─ {file}\n")
                        self.total_size += os.path.getsize(os.path.join(root, file))
            outfile.write("--- END PROJECT STRUCTURE ---\n\n")
            # Write project summary
            self.write_project_summary(outfile)
            # Second pass: write file contents
            for root, dirs, files in os.walk(self.project_dir):
                dirs[:] = [d for d in dirs if not any(re.search(pattern, d)
                           for pattern in self.exclude_dirs_patterns)]
                for file in sorted(files):
                    file_path = os.path.join(root, file)
                    if self.should_process_file(file_path):
                        relative_path = os.path.relpath(file_path, self.project_dir)
                        self.write_file_content(outfile, file_path, relative_path)


def main():
    parser = argparse.ArgumentParser(description="Consolidate a project into a single file for LLM analysis.")
    parser.add_argument("project_dir", help="The root directory of the project")
    parser.add_argument("output_file", help="The path to the output file")
    parser.add_argument("--max-file-size", type=float, default=1.0,
                        help="Maximum size of individual files in MB (default: 1.0)")
    args = parser.parse_args()
    consolidator = ProjectConsolidator(
        args.project_dir,
        args.output_file,
        args.max_file_size
    )
    consolidator.consolidate()
    print(f"Project consolidated into: {args.output_file}")


if __name__ == "__main__":
    main()
The script is straightforward to use and requires only Python 3.6 or later. Simply run it from the command line, providing the project directory and desired output file:
python project_consolidator.py /path/to/your/code output.txt
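The per-file size cap can also be adjusted via the optional flag defined in main(); for example, to raise it to 2.5 MB:

python project_consolidator.py /path/to/your/code output.txt --max-file-size 2.5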
The script generates a structured text file containing three main sections:

- Project Structure: a tree representation of your project's directory hierarchy, similar to the output of the tree command but filtered to exclude common development directories like node_modules, .git, and __pycache__.
- Project Summary: key metrics about your codebase, including:
  - Total number of files processed
  - Total size in bytes
  - Estimated token count (helpful for understanding LLM context limits)
- File Contents: each file's content wrapped in metadata headers, including:
  - File path relative to the project root
  - File size
  - Last modified timestamp
  - Line count (for code files)
  - The actual file content, with line numbers for code files
Here is a snippet showing how the output is structured:
--- PROJECT STRUCTURE ---
/
└─ README.md
└─ data_analysis_app.ipynb
└─ src/
└─ core/
└─ application.py
└─ error_handler.py
...
--- PROJECT SUMMARY ---
Total Files: 16
Total Size: 73,133 bytes
Estimated Tokens: 18,283
--------------------
--- START FILE: README.md ---
File Size: 1,801 bytes
Last Modified: 2024-12-30 13:36:20.035657
Lines of Code: 68
[Content follows...]
The output format is designed to be both human-readable and suitable for AI tools to parse. Code files include line numbers, making it easier to reference specific sections when discussing the code with AI assistants. Binary files are noted but their content is omitted to keep the output manageable and focused on reviewable code.
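Because every file is delimited by matching START FILE and END FILE markers, the consolidated output is also straightforward to split back apart programmatically. Here is a minimal sketch (not part of the script itself) that recovers each file's section using a regular expression built from those markers:

import re

def split_consolidated(path):
    """Yield (relative_path, section) pairs from a consolidated output file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    pattern = re.compile(
        r"--- START FILE: (?P<name>.+?) ---\n(?P<body>.*?)\n--- END FILE: (?P=name) ---",
        re.DOTALL,
    )
    for match in pattern.finditer(text):
        yield match.group("name"), match.group("body")

Note that each recovered section still begins with the metadata headers (size, timestamp, line count); strip those first few lines if you only want the raw content.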
This simple tool solves a common problem when working with AI coding assistants. While each platform handles project context differently, from ZIP uploads to expanded context windows, having a standardised way to prepare and present your codebase ensures you can use any of these tools effectively.
The project consolidator is particularly useful for:
- Architectural discussions where cross-file context is crucial
- Debugging complex interactions between components
- Code reviews that span multiple files
- Refactoring discussions where understanding the full project structure matters
Further enhancements could include configurable file filtering, support for more sophisticated token counting algorithms, or integration with version control systems to capture git history.
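For token counting in particular, the script's bytes-divided-by-four heuristic could be replaced with a real tokeniser. A minimal sketch using the tiktoken library (pip install tiktoken; the encoding name here is an assumption and should be chosen to match your target model):

import tiktoken

def count_tokens(text, encoding_name="cl100k_base"):
    """Count tokens using an OpenAI-family tokeniser."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

Different model families tokenise differently, so treat any single count as an estimate when targeting Claude or Gemini.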
The script is also available at gist.github.com/rnsloan/03284903b554c8edd5bbb2b88080e3bc. Whether you are using ChatGPT, Claude, Google’s AI tools, or any other AI assistant, having a reliable way to share your codebase context can significantly improve the quality of AI-assisted development discussions.