Technical Proposal
We propose a decentralized federated learning network to train a language model based on Hinton's forward-forward algorithm, with ternary (1.58-bit) weights and FP4-quantized activations, trained on function-call data. This will allow us to train language models far more cheaply, quickly, and privately than traditional methods. Because the forward-forward algorithm trains each layer locally, it lets us further parallelize the training of large language models. There is a growing consensus that current transformer-based LLMs cannot handle truly complex tasks without tens of billions of dollars in GPU data centers. We have modified the transformer to use ternary weights and replaced backpropagation with two forward passes, one over positive examples and one over negative examples generated by the model itself, drawn from the function-call dataset produced by our application usage. With federated learning, we can scale the training of new models to any application that embeds our code. We request funding to establish our compute network and research this novel model architecture. The network would let customers benefit from AI and AI agents at lower cost without sacrificing personal privacy.
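To make the training procedure concrete, the sketch below (in PyTorch, our prototyping framework) shows the per-layer forward-forward objective: each layer is trained locally to score high "goodness" on positive inputs and low goodness on negative inputs, with no gradients crossing layer boundaries. The layer shape, threshold, and optimizer settings are illustrative, not our production configuration.

```python
# Minimal sketch of a forward-forward layer, following Hinton's
# formulation: goodness = mean squared activation, trained against a
# threshold. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=1e-3):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so only its direction carries
        # information forward (as in Hinton's paper).
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return self.linear(x).relu()

    def goodness(self, x):
        # Goodness = mean squared activation of the layer output.
        return self.forward(x).pow(2).mean(dim=1)

    def train_step(self, x_pos, x_neg):
        # Positive pass pushes goodness above the threshold; negative
        # pass pushes it below. No gradient crosses layer boundaries,
        # so layers can train in parallel.
        loss = torch.nn.functional.softplus(torch.cat([
            self.threshold - self.goodness(x_pos),
            self.goodness(x_neg) - self.threshold,
        ])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        with torch.no_grad():  # detach outputs before the next layer
            return self.forward(x_pos), self.forward(x_neg)
```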
Further R&D is needed to implement FP4 quantization for both weights and activations while minimizing precision loss, using techniques such as oscillation-reduced training (studied in vision transformers), which stabilizes low-precision computation in both forward and backward passes. This differs significantly from existing solutions. Traditional transformer-based LLMs rely on backpropagation, high-bit precision (e.g., FP16/FP32), and massive centralized GPU clusters costing billions, which drives high energy consumption and creates privacy risks through data aggregation. Our approach eliminates backpropagation entirely, uses ternary weight approximations (log2(3) ≈ 1.58 bits) for extreme compression, and decentralizes training via federated protocols on edge devices, targeting 10-100x cost reductions, faster convergence through parallelism, and inherent privacy by keeping data local. It goes beyond standard quantized LLMs (e.g., BitNet) by fusing forward-forward's local learning with function-call-specific datasets, enabling on-device model evolution without raw-data sharing, as explored in federated LLM frameworks that address communication bottlenecks via message quantization and streaming. This innovation could create a new market for embeddable, privacy-centric AI training in app ecosystems, driven by regulatory demands (e.g., GDPR), reduced barriers for SMEs lacking GPU infrastructure, and viral scaling through user devices, much as federated learning has been embraced in IoT for scalable intelligence that preserves data sovereignty. We ultimately intend to use our decentralized compute network to produce a model efficient enough for hardware modules that could be soldered to PCBs or connected via GPIO, allowing any device to work with the AI without compatibility concerns: the module would discover what it is connected to and control the device.
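For reference, this is a minimal sketch of the ternary (log2(3) ≈ 1.58-bit) weight quantizer we have in mind, following the absmean scheme popularized by BitNet b1.58; the straight-through estimator shown is one common way to keep such layers trainable, and the FP4 activation path is elided.

```python
# Ternary weight quantization sketch (absmean scheme, BitNet b1.58
# style). The straight-through estimator (STE) is an assumption for
# trainability; it is not prescribed by the forward-forward algorithm.
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-5):
    # Scale by the mean absolute weight, then round to {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

def ternary_matmul(x: torch.Tensor, w: torch.Tensor):
    # Quantize on the fly; the STE passes gradients through rounding.
    w_q, scale = ternarize(w)
    w_ste = w + (w_q * scale - w).detach()
    return x @ w_ste.t()
```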
Research & Development
The specific research and development required includes: prototyping the modified transformer architecture with ternary weights and FP4 quantization, adapting forward-forward's dual passes to function-call sequences, with PyTorch quantization-aware training to benchmark efficiency; designing a federated learning framework, using tools such as Flower, that can be embedded in apps and aggregate model updates securely; building preprocessing pipelines that curate anonymized function-call datasets under differential privacy; and validating the prototype on tasks such as function prediction, measuring accuracy, speed, energy, and privacy to substantiate the targeted 10-100x improvements over centralized methods.
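As one concrete piece of the federated framework, the sketch below shows how an embedded app could expose its local model through Flower's NumPy client interface; `train_local` and `test_local` are hypothetical helpers standing in for our function-call training and evaluation loops.

```python
# Sketch of a Flower (flwr) federated client wrapping a local PyTorch
# model. train_local/test_local are hypothetical placeholders.
import flwr as fl
import torch

class FunctionCallClient(fl.client.NumPyClient):
    def __init__(self, model, train_loader, val_loader):
        self.model = model
        self.train_loader = train_loader
        self.val_loader = val_loader

    def get_parameters(self, config):
        # Ship weights as NumPy arrays, as Flower expects.
        return [p.detach().cpu().numpy() for p in self.model.parameters()]

    def set_parameters(self, parameters):
        for p, arr in zip(self.model.parameters(), parameters):
            p.data = torch.from_numpy(arr).to(p.device)

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        train_local(self.model, self.train_loader)   # placeholder
        return self.get_parameters(config), len(self.train_loader.dataset), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        loss, acc = test_local(self.model, self.val_loader)  # placeholder
        return float(loss), len(self.val_loader.dataset), {"accuracy": acc}
```

A client of this shape would be registered with the Flower runtime appropriate to the installed version; server-side aggregation (e.g., FedAvg) is configured separately.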
This work is technically innovative in pioneering the fusion of forward-forward parallelism, which enables local, biologically inspired learning without global gradients, with low-precision quantization in decentralized settings, overcoming transformer limitations in resource-constrained environments, as explored in recent "LLMs meet FL" studies that tailor federated paradigms to the characteristics of language models. We anticipate five main challenges. First, forward-forward can converge slowly on complex tasks because of its contrastive formulation; we will manage this with hybrid fine-tuning (backpropagation in early stages) and Bayesian hyperparameter optimization on simulated benchmarks. Second, data heterogeneity in federated setups can bias models trained on non-IID function calls; we will address this with personalization techniques such as FedProx and synthetic balancing, as used in wireless FL for LLMs, with simulations on 100+ virtual nodes. Third, communication overhead on edge devices will be mitigated by top-k sparsification and FP4 compression, which we estimate can reduce payloads by roughly 75% (see the sketch below), tested via streaming in efficient FL for LLMs. Fourth, privacy vulnerabilities such as model inversion will be countered with Gaussian-noise differential privacy and secure aggregation, audited for compliance as in real-world FL deployments. Fifth, quantization-induced accuracy drops from activation outliers in transformers will be handled by post-training refinement and SVD-based absorption of low-rank components in 4-bit setups. Funding supports compute purchases for the distributed network, including GPU/FPGA clusters and edge nodes such as NVIDIA Jetsons, essential for realistic prototyping and validation, in line with NSF SBIR AI awards that emphasize hardware for efficient model deployment.
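The sketch below illustrates the top-k sparsification step referenced above; the roughly 75% payload figure assumes about a quarter of the update entries (plus their indices) survive, and the subsequent FP4 packing of the retained values is elided. The fraction is a tunable assumption, not a fixed design parameter.

```python
# Top-k sparsification of a model update before upload, plus the
# server-side densification. The keep fraction is an assumption.
import math
import torch

def topk_sparsify(delta: torch.Tensor, frac: float = 0.25):
    # Keep only the largest-magnitude fraction of the update entries.
    flat = delta.flatten()
    k = max(1, int(frac * flat.numel()))
    _, indices = flat.abs().topk(k)
    return indices, flat[indices]            # (indices, values) to send

def topk_densify(indices, values, shape):
    # Server side: scatter the sparse update back into a dense tensor.
    flat = torch.zeros(math.prod(shape))
    flat[indices] = values
    return flat.reshape(shape)
```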
Use Cases & Competitive Advantage
The potential use cases for the model architecture range from embedded scripts on microcontrollers to cloud-based applications and beyond. By training the model at lower cost and deploying it inside one of our agents, we can collect data useful to customers who want to build applications and robotics hardware around this function-calling model. The closest competitors typically build wrappers around third-party models to form the basis of their AI agents; our agents are designed to run on agentic LLMs like the one we propose to build. By making the model extremely small and accurate, we can extend its reach into ever smaller devices.
Mobile devices are the obvious target space, where competition remains low because of the limited compute available in mobile processors; this forces competitors to rely on cloud-based API calls to compute farms running today's LLMs. The competitive advantage of our model is that it runs faster, stays private, and is dedicated to function calls, exactly where agents need speed and privacy to be effective. Because app-usage data is never sent to a central training center, apps running our model carry a lighter authentication and credential-handling burden, which lowers the barrier to adoption, and the speed and accuracy of a small dedicated model will encourage use. Both people and companies want to use AI, but cloud dependence, price, privacy concerns, and latency all limit what can be done today; even so, AI is being adopted rapidly. The potential of a model without these restrictions is enormous.
Team Background
The team began in the AutoGPT research and development lab in 2023. Composed of five members, Kyle Steel, BentlyBro, Colton Frear, Ethan Shelton, and Alexey Kuznetsov Jr., we have so far developed three apps and three agents, as well as a framework for training LLMs on Apple Silicon hardware. We have learned to adapt AI agents to automate the processes that work, and we frequently experiment with the business model. Kyle Steel and BentlyBro developed the first multi-agent system and the first self-improving agent, respectively. Alexey Kuznetsov has developed ERP systems for SpaceX suppliers. Ethan Shelton has built personal AI assistants for consumers and businesses.
The company has been formed, and the team is dedicated to remaining open source. Expertise is not a gap; the main gap is headcount to distribute the workload. So far, the startup has managed: the team runs a bulletin board system for development and refinement through experimentation, collaboration, and communication. Once issues are identified, open discussion with material exploration occurs on the topic. We automatically track programming progress with background agents to keep the development flow uninterrupted. When solutions are proposed, the team experiments with them until an ideal outcome is reached. The problem-solving process and the solution are then announced on all our platforms as part of our marketing strategy. By automating this process, the team has become more effective at development; our target turnaround is twenty minutes from issue identification to implementation. We need to hire more developers to distribute the workload and finish the project sooner.
Model Proposal
Our model proposal focuses on creating a novel architecture that combines the efficiency of quantized models with the power of federated learning. We envision a model that can be deployed across diverse hardware environments while maintaining high performance and privacy standards.
The core innovation lies in our approach to model compression and distributed training. By leveraging FP4 quantization and ternary weights, we achieve significant reductions in model size while preserving essential functionality. This enables deployment on edge devices and embedded systems that were previously unable to support AI workloads.
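Back-of-envelope arithmetic makes the size claim concrete. Assuming a hypothetical one-billion-parameter model, ideal ternary packing cuts weight storage roughly tenfold versus FP16:

```python
# Storage math for a hypothetical 1B-parameter model: FP16 versus
# ideally entropy-coded ternary weights at ~1.58 bits each.
params = 1_000_000_000
fp16_bytes = params * 2               # 16 bits = 2 bytes per weight
ternary_bytes = params * 1.58 / 8     # ideal 1.58-bit packing
print(fp16_bytes / 1e9, "GB vs", ternary_bytes / 1e9, "GB")  # 2.0 vs ~0.2
```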
Key aspects of our model proposal include adaptive quantization techniques that dynamically adjust precision based on layer importance, federated training protocols that ensure data privacy while enabling collaborative learning, and hardware-specific optimizations that maximize performance across different computing environments.
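As an illustration of the adaptive-precision idea, the following hypothetical sketch scores each linear layer by a crude sensitivity proxy (weight standard deviation) and keeps higher precision for the most sensitive fraction; the proxy, the fraction, and the precision labels are placeholders, not our final policy.

```python
# Hypothetical layer-importance-driven precision assignment: the most
# "sensitive" layers (by a crude std-dev proxy) keep FP4 activations,
# the rest fall back to ternary-only. All thresholds are assumptions.
import torch.nn as nn

def assign_precision(model: nn.Module, hi_frac: float = 0.2):
    # Score each linear layer by the spread of its weights.
    scores = {name: m.weight.std().item()
              for name, m in model.named_modules()
              if isinstance(m, nn.Linear)}
    if not scores:
        return {}
    cutoff = sorted(scores.values(), reverse=True)[
        max(0, int(hi_frac * len(scores)) - 1)]
    return {name: ("fp4" if s >= cutoff else "ternary")
            for name, s in scores.items()}
```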
Agent Proposal
Our agent proposal centers on creating intelligent, autonomous systems that can operate effectively in decentralized environments. These agents will be designed to work with our quantized models while providing robust, privacy-preserving AI capabilities.
The agent architecture emphasizes modularity and adaptability, allowing for easy integration into existing systems and workflows. Each agent will be capable of learning from its environment and improving its performance over time through federated learning mechanisms.
We propose developing agents that can handle complex decision-making tasks, automate routine processes, and provide intelligent assistance across various domains. The agents will be designed with built-in privacy controls and will operate without requiring constant cloud connectivity, making them ideal for edge computing scenarios.
App Proposal
Our app proposal focuses on creating user-friendly applications that demonstrate the power of our decentralized AI technology. These applications will serve as both proof-of-concept implementations and practical tools for end users.
We plan to develop applications that showcase the unique capabilities of our federated learning approach, including privacy-preserving data analysis tools, intelligent automation platforms, and collaborative learning systems. Each app will be designed with a focus on user experience while maintaining the technical innovations that set our platform apart.
The app ecosystem will include both standalone applications and SDKs for developers who want to integrate our technology into their own projects. We'll provide comprehensive documentation and support to ensure successful adoption and implementation.