How I Caught AI Copy-Pasting from Stack Overflow

December 15, 2023 Josh Butler Technical

The pull request looked great. Complex sorting algorithm, clean implementation, even had comments. One problem: I'd seen this exact code before. Right down to the variable named `tempVar` and the comment "// TODO: optimize this later".

It was the top Stack Overflow answer. From 2019. Complete with the same typo in the comments.

The First Red Flag


// AI-generated code for "implement quicksort in JavaScript"
function quickSort(arr, left = 0, right = arr.length - 1) {
  // TODO: optimize this later
  if (left < right) {
    let pivotIndex = partition(arr, left, right);
    //Recursively sort elements before and after partition
    quickSort(arr, left, pivotIndex - 1);
    quickSort(arr, pivotIndex + 1, right);
  }
  return arr;
}

function partition(arr, left, right) {
  let pivot = arr[right];
  let i = left - 1;
  
  for (let j = left; j < right; j++) {
    if (arr[j] <= pivot) {
      i++;
      let tempVar = arr[i];  // <-- Nobody names variables "tempVar"
      arr[i] = arr[j];
      arr[j] = tempVar;
    }
  }
  
  let tempVar = arr[i + 1];
  arr[i + 1] = arr[right];
  arr[right] = tempVar;
  
  return i + 1;
}
      

Googled "quicksort javascript tempVar" - first result, exact match. Even the weird spacing in the comment.

The Smoking Gun Collection

The Username in Comments


// AI output:
/**
 * Solution for finding array intersection
 * Thanks to user3842965 for the optimization tip!
 */

// Who's user3842965? Stack Overflow user, that's who
      

The Ancient jQuery


// Request: "Modern JavaScript to toggle class"
// AI response:
$('#myDiv').toggleClass('active');

// Me: "I said modern JavaScript"
// AI: "Here's the modern version:"

$(document).ready(function() {
  $('#myDiv').on('click', function() {
    $(this).toggleClass('active');
  });
});

// That's... more jQuery
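
For comparison, here is a minimal sketch of what the request was presumably after, using the standard `classList` API instead of jQuery. The helper name `toggleActiveOnClick` is made up for illustration, and the `myDiv` id is carried over from the snippet above.

```javascript
// Toggle a CSS class on click using the standard DOM API, no jQuery required.
// toggleActiveOnClick is a hypothetical helper name.
function toggleActiveOnClick(element, className = "active") {
  element.addEventListener("click", () => {
    element.classList.toggle(className);
  });
}

// In a browser, wire it up once the element exists.
// The guard keeps the file loadable outside a browser as well.
if (typeof document !== "undefined") {
  const target = document.getElementById("myDiv");
  if (target) toggleActiveOnClick(target);
}
```

Passing the element in, rather than hard-coding a selector, keeps the helper reusable and easy to exercise with a stand-in object.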
      

The Broken Example That Never Worked


// AI's regex for email validation
/^[^\s@]+@[^\s@]+\.[^\s@]+$/

// Looks familiar? It's the infamous SO regex that allows:
// a@b.c (single char TLD)
// test@test..com (consecutive dots)
// And other nonsense

// Complete with the SO comment:
// "Simple regex, not perfect but works for most cases"
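
To see what this pattern actually lets through, a quick probe helps. The sketch below assumes the regex is the widely copied form with its `\s` and `\.` escapes intact; the sample addresses and the `probeEmailPattern` helper are hypothetical.

```javascript
// The widely copied pattern, assumed here with its escapes intact.
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical inputs chosen to probe the pattern's blind spots.
const sampleAddresses = [
  "user@example.com",
  "a@b.c",
  "test@test..com",
  "missing-at.example.com",
];

// Report, for each input, whether the pattern accepts it.
function probeEmailPattern(pattern, inputs) {
  return inputs.map((input) => ({ input, accepted: pattern.test(input) }));
}

const probeResults = probeEmailPattern(emailPattern, sampleAddresses);
console.table(probeResults);
```

With these samples the probe accepts `user@example.com`, `a@b.c`, and `test@test..com`, and rejects only the input with no `@`.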
      

The Dead Giveaways

  • TODO comments that make no sense in context
  • Variable names from a different naming convention
  • Comments referencing "the OP" or "the asker"
  • Edit history in comments: "EDIT: Fixed typo"
  • Thanks to specific usernames
  • Links to JSFiddle that expired in 2018
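
Several of these giveaways are plain text patterns, so a rough scan can flag them before a human reads the diff. This is a heuristic sketch, not a plagiarism detector: the `COPY_PASTE_SIGNALS` list and the `findCopyPasteSignals` function are hypothetical, and patterns this loose will produce false positives.

```javascript
// Hypothetical signal list: each entry pairs a label with a pattern to look for.
const COPY_PASTE_SIGNALS = [
  { label: "username credit", pattern: /\bthanks to \w+/i },
  { label: "refers to the asker", pattern: /\bthe (OP|asker)\b/i },
  { label: "edit history", pattern: /\bEDIT:/ },
  { label: "points at another answer", pattern: /\banswer (below|above)\b/i },
  { label: "JSFiddle link", pattern: /jsfiddle\.net/i },
  { label: "leftover TODO", pattern: /\bTODO\b/ },
];

// Return one finding per (line, signal) match, with 1-based line numbers.
function findCopyPasteSignals(sourceText) {
  const findings = [];
  sourceText.split("\n").forEach((lineText, index) => {
    for (const { label, pattern } of COPY_PASTE_SIGNALS) {
      if (pattern.test(lineText)) {
        findings.push({ line: index + 1, label, text: lineText.trim() });
      }
    }
  });
  return findings;
}

// Sample input assembled from comments quoted elsewhere in this post.
const sampleSource = [
  "/**",
  " * Solution for finding array intersection",
  " * Thanks to user3842965 for the optimization tip!",
  " */",
  "// See johndoe's answer below for async version",
].join("\n");

console.log(findCopyPasteSignals(sampleSource));
```

On the sample, the scan flags line 3 as a username credit and line 5 as a pointer to another answer. A hit is a reason to search for the comment, not proof of copying.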

The Time Travel Bug


// Me: "Use modern React patterns"
// AI generates:

import React from 'react';

class UserProfile extends React.Component {
  constructor(props) {
    super(props);
    this.state = { user: null };
    this.handleClick = this.handleClick.bind(this); // 2017 called
  }
  
  componentWillMount() { // Deprecated since 2018!
    this.fetchUser();
  }
  
  componentWillReceiveProps(nextProps) { // Also deprecated!
    if (nextProps.userId !== this.props.userId) {
      this.fetchUser();
    }
  }
}
      

Found the exact component on SO. Answer from 2017. Before hooks existed.

The License Violation Special


// AI's "original" utility function
function deepClone(obj) {
  // Handle null or undefined
  if (obj === null || typeof obj !== "object") return obj;
  
  // Handle Date
  if (obj instanceof Date) {
    const copy = new Date();
    copy.setTime(obj.getTime());
    return copy;
  }
  
  // Handle Array
  if (obj instanceof Array) {
    const copy = [];
    for (let i = 0, len = obj.length; i < len; i++) {
      copy[i] = deepClone(obj[i]);
    }
    return copy;
  }
  
  // Handle Object
  if (obj instanceof Object) {
    const copy = {};
    for (const attr in obj) {
      if (obj.hasOwnProperty(attr)) copy[attr] = deepClone(obj[attr]);
    }
    return copy;
  }
  
  throw new Error("Unable to copy obj! Its type isn't supported.");
}
      

Looks professional! One issue: it's from a popular MIT-licensed library, comment for comment. The MIT license requires the copyright notice to travel with the code, and here there was no attribution at all.
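
When a utility like this exists only to deep-copy plain data, one way to sidestep the question is to not paste a utility at all. The sketch below assumes a runtime that provides the built-in `structuredClone` (current browsers and Node 17 or later); the `original` object is made-up sample data.

```javascript
// Made-up sample data with a Date, an array, and a nested object.
const original = {
  name: "report",
  createdAt: new Date(0),
  tags: ["a", "b"],
  nested: { count: 1 },
};

// structuredClone is a built-in deep copy for data that can be structured-cloned.
const copy = structuredClone(original);

// Mutating the copy leaves the original untouched.
copy.nested.count = 2;
copy.tags.push("c");

console.log(original.nested.count); // 1
console.log(original.tags.length); // 2
console.log(copy.createdAt instanceof Date); // true
```

It is not a drop-in replacement for every clone helper: functions cannot be cloned and class prototypes are not preserved, so it fits plain data rather than arbitrary objects.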

The Frankenstein's Monster

The worst case - AI stitching together multiple SO answers:


// Part 1: From SO answer about promises (2016)
function fetchData(url) {
  return new Promise((resolve, reject) => {
    // Part 2: From different SO answer about fetch (2018)
    fetch(url)
      .then(response => {
        // Part 3: From another answer about error handling (2020)
        if (!response.ok) {
          throw new Error(`HTTP error! status: ${response.status}`);
        }
        return response.json();
      })
      // Part 4: Mix of two different error handling approaches
      .then(data => resolve(data))
      .catch(error => reject(error));
  });
}

// It's fetch wrapped in a promise constructor - an antipattern!
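
Since `fetch` already returns a promise, the wrapper can simply go. One possible rewrite keeps the same `response.ok` check; the `fetchImpl` parameter is an assumption added here so the function can be tried without a network.

```javascript
// fetch already returns a promise, so an async function can await it directly.
// fetchImpl is a hypothetical parameter for testing; it defaults to the global fetch.
async function fetchData(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Same check as the stitched version: fetch does not reject on HTTP errors.
    throw new Error(`HTTP error! status: ${response.status}`);
  }
  return response.json();
}
```

Errors now propagate to the caller through the returned promise, so there is no manual `resolve` or `reject` to get wrong.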
      

How to Detect Stack Overflow Copy-Paste

  1. Unusual variable names - `tempVar`, `myVar`, `foo`, `bar`
  2. Outdated patterns - `var` in new code, deprecated APIs
  3. Inconsistent style - Different formatting mid-function
  4. Random TODOs - That will never be done
  5. Google unique comments - Often verbatim from SO
  6. Check for common SO bugs - The broken regex everyone copies

The Hall of Shame


// Comment includes SO metadata
"use strict"; // Answer edited 5 times

// Includes the acceptance checkmark
✓ function validateEmail(email) { // <-- Actual checkmark in code

// References other answers
// See johndoe's answer below for async version

// Stack Overflow formatting survived
<code>console.log('hello')</code> // HTML in comments

// The classic
// This works on my machine ¯_(ツ)_/¯
      

Why This Matters

  • License issues - SO is CC BY-SA, not public domain
  • Quality issues - Old answers may be wrong now
  • Security issues - That SQL injection fix from 2010? Not great
  • Maintenance issues - Copying code you don't understand

The Right Way to Use Stack Overflow Knowledge


// Good prompt:
"Implement quicksort in JavaScript. Use modern syntax (ES6+), 
descriptive variable names, and explain the partition logic. 
Do not copy from existing implementations."

// Better prompt:
"Explain how quicksort works, then implement it step by step 
with clear variable names and comments explaining each part."
      

My Favorite Catch


// AI-generated Python code
def fibonacci(n):
    """Calculate fibonacci number
    
    Note: This is O(2^n), see DynamicProgrammer79's 
    answer for O(n) solution
    """
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Who's DynamicProgrammer79? 
# SO user with the optimized solution right below this one
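
For reference, the O(n) approach that docstring points to can be sketched in a few lines. This is a generic iterative version, written in JavaScript to match the post's other examples, not the referenced user's actual answer. Rejecting negative or non-integer input is an assumption of this sketch.

```javascript
// Iterative Fibonacci: O(n) time and O(1) extra space.
function fibonacci(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative integer");
  }
  let previous = 0; // fibonacci(0)
  let current = 1; // fibonacci(1)
  for (let step = 0; step < n; step++) {
    // Slide the window forward by one position.
    [previous, current] = [current, previous + current];
  }
  return previous;
}

console.log(fibonacci(10)); // 55
```

Results above `Number.MAX_SAFE_INTEGER` lose precision with plain numbers, so large `n` would need `BigInt`.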
      

The Verification Process

Now I always:

  1. Google suspicious comments
  2. Check for outdated patterns
  3. Look for inconsistent naming
  4. Verify "clever" one-liners
  5. Test edge cases (SO code often misses these)
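
Step 5 is the easiest to automate. A minimal sketch for a function that sorts numbers in ascending order follows; the input list and the `findSortFailures` helper are hypothetical, and the built-in sort with a numeric comparator serves as the reference.

```javascript
// Hypothetical edge-case inputs for any function that sorts numbers ascending.
const EDGE_CASE_INPUTS = [
  [], // empty array
  [7], // single element
  [2, 2, 2], // all duplicates
  [1, 2, 3, 4], // already sorted
  [4, 3, 2, 1], // reverse sorted
  [-3, 0, -1, 5], // negative numbers
  [10, 9, 100, 1], // values that order differently as strings
];

// Run sortFunction on a copy of each input and collect every mismatch.
function findSortFailures(sortFunction) {
  const failures = [];
  for (const input of EDGE_CASE_INPUTS) {
    const expected = [...input].sort((a, b) => a - b);
    const actual = sortFunction([...input]);
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

// Example: the default Array.prototype.sort compares elements as strings.
console.log(findSortFailures((numbers) => numbers.sort()));
```

The example call reports two failures, the negative-number case and the mixed-magnitude case, which is the kind of bug a copied snippet can carry in unnoticed.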

AI copying from Stack Overflow is like a student copying homework, including the part where the original author wrote their name. The code might work, but it comes with all the baggage of the original: outdated patterns, specific bugs, and occasionally, someone else's copyright. Always verify that AI-generated code isn't just regurgitating the greatest hits of Stack Overflow 2015. And if you see a variable named `tempVar`, you know where it came from.
