By: Suresh Dodda
I have been developing software for more than 20 years. For a traditional developer, many repetitive tasks take a long time and do nothing to improve efficiency. AI bots are going to play a significant role in reducing developer effort, thereby improving profits for organizations.
Devin, the AI software engineer built by Cognition, can access common developer tools, including its own shell, code editor and browser, within a sandboxed compute environment to plan and execute complex engineering tasks requiring thousands of decisions. The human user simply types a natural language prompt into Devin’s chatbot-style interface, and the AI software engineer takes it from there, developing a detailed, step-by-step plan to tackle the problem. It then begins the project using its developer tools, just as a human would, writing its own code, fixing issues, testing and reporting on its progress in real time, so the user can keep an eye on everything as it works.
If something doesn’t look right to the human observer, the user can also jump into the chat interface and give the AI a command to fix it. This, Cognition says, enables engineering teams to delegate some of their projects to the AI and focus on more creative tasks that require human intelligence.
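The workflow described above (prompt, plan, execute, report, with the user able to step in) can be pictured with a toy loop. This is only an illustrative sketch, not how Devin is actually implemented: the `AgentSession` class, the fixed four-step plan and the `reviewer` callback are all hypothetical names invented for this example, and a real agent would call a language model and real developer tools where this code just records a log line.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class AgentSession:
    """Toy model of a plan -> execute -> report loop with human feedback."""
    prompt: str
    plan: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def make_plan(self) -> None:
        # Hypothetical planner: a real agent would generate this plan itself.
        self.plan = [
            Step(f"Understand the request: {self.prompt}"),
            Step("Write the code"),
            Step("Run the tests"),
            Step("Report the result"),
        ]

    def run(self, feedback: Optional[Callable[[Step], Optional[str]]] = None) -> List[str]:
        self.make_plan()
        i = 0
        while i < len(self.plan):
            step = self.plan[i]
            step.done = True  # stand-in for actually doing the work
            self.log.append(f"done: {step.description}")  # progress report
            # The human observer may react to the step that just finished.
            note = feedback(step) if feedback else None
            if note:
                # Feedback becomes a new step placed right after the current one.
                self.plan.insert(i + 1, Step(f"Apply user feedback: {note}"))
            i += 1
        return self.log


def reviewer(step: Step) -> Optional[str]:
    # Hypothetical user who objects once, right after the code is written.
    return "rename the button" if step.description == "Write the code" else None


if __name__ == "__main__":
    for line in AgentSession("Build a landing page").run(reviewer):
        print(line)
```

The point of the sketch is the shape of the interaction: the plan is not fixed once it is made, and a comment from the user turns into one more step that the agent works through before continuing.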
In this way, Devin offers a new paradigm that may be a glimpse of the way all software development, and computer work generally, may eventually be done: by AI workers overseen by human supervisors or users. According to demos shared by Cognition CEO Scott Wu, Devin is capable of handling a range of tasks in its current form. These range from common engineering projects, such as deploying and improving apps and websites end-to-end and finding and fixing bugs in codebases, to more complex things like setting up fine-tuning for a large language model using the link to a research repository on GitHub, or learning how to use unfamiliar technologies.
In the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open-source projects, the AI software engineer was able to correctly resolve 13.86% of the cases end-to-end, without any assistance from humans. In comparison, Claude 2 could resolve just 4.80%, while SWE-Llama-13b and GPT-4 could handle 3.97% and 1.74% of the issues, respectively. Those other models were also given assistance: they were told which file had to be fixed.
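To put those percentages side by side, here is a small calculation of how many times higher Devin's resolution rate is than each of the other models. The figures are the ones quoted above; the `rates` dictionary and the `ratio_to` helper are just names made up for this sketch. Keep in mind that the comparison is not like-for-like, since the other models were told which file to fix.

```python
# Share of SWE-bench issues resolved, in percent, as quoted in the text above.
rates = {
    "Devin": 13.86,
    "Claude 2": 4.80,
    "SWE-Llama-13b": 3.97,
    "GPT-4": 1.74,
}


def ratio_to(baseline: str, model: str = "Devin") -> float:
    """How many times higher `model`'s rate is than `baseline`'s."""
    return rates[model] / rates[baseline]


if __name__ == "__main__":
    for name in rates:
        if name != "Devin":
            print(f"Devin vs {name}: {ratio_to(name):.1f}x")
    # Devin vs Claude 2: 2.9x
    # Devin vs SWE-Llama-13b: 3.5x
    # Devin vs GPT-4: 8.0x
```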
While the tool remains to be tested, its ability to work through the many steps of a software engineering project while staying on track is its biggest selling point.
About the Author
Suresh Dodda, a seasoned technologist with a strong focus on AI/ML research and 24 years of progressive experience in the field of technology, is adept at leveraging Java, J2EE, AWS, microservices, and Angular for innovative design and implementation. With a keen eye for detail, Suresh excels in developing applications from inception to execution, showcasing his deep expertise in Java as evidenced by his authored book on microservices and his role as a book reviewer for publishers such as Packt and BPB.
Suresh’s technical prowess extends to the realm of AI/ML, where he has contributed to research papers, while his effective management skills have consistently ensured timely project delivery within allocated budgets. His extensive international experience includes working with esteemed clients such as Dubai Telecom in Abu Dhabi, Nokia in Canada, Epson in Japan, Wipro Technologies in India, Mastercard in the USA, National Grid in the USA, Yash Technologies in the USA, and ADP in the USA.
Within core industries such as banking, telecom, retail, utilities, and payroll, Suresh possesses a deep understanding of domain-specific challenges, bolstered by his track record as a technical lead and manager for globally dispersed teams.
Suresh’s professional stature is further underscored by his membership in prestigious organizations like IEEE, his role as a keynote speaker at esteemed research universities like Eudoxia, and his contribution as a journal reviewer for IGI Global, highlighting his active involvement in advancing technology.
Published By: Aize Perez