An LLM (large language model) can act as a database server or a program interpreter

Formalized natural language prompt programming techniques

Before discussing prompt programming, let’s start with zero-shot and few-shot prompts. Research has found that when GPT-3 is evaluated on tasks with a context prompt (a natural language description of the task, one solved example, or N solved examples), it consistently performs better as more examples are provided, with zero-shot performance often less than half of the n-shot performance.

A common explanation is that GPT learns the task from the examples in the context prompt. However, as the number of examples increases, performance does not keep improving significantly. This suggests another perspective: the role of the context prompt is to indicate which task GPT needs to solve and to encourage it to follow a “prompt-response” structure.

For example, for tasks such as translation, a handful of examples cannot teach the model anything substantial about the task itself. Instead, GPT must rely primarily on the vocabulary and grammar of the source and target languages already embedded in its training weights. The few-shot prompt mainly guides the model to recall and surface knowledge it has already learned, as in the illustrative prompt below.
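For instance, a minimal few-shot translation prompt (an illustrative example, not taken from the original evaluation) might look like this:

Translate English to French.
English: The weather is nice today.
French: Il fait beau aujourd'hui.
English: I would like a cup of coffee.
French:

The model is expected to continue the established pattern and fill in the final French line.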

To understand how to prompt autoregressive language models, we must first consider the pre-training GPT received and the function that pre-training approximates. GPT was trained in a self-supervised setting on hundreds of gigabytes of natural language text. Self-supervision is a form of unsupervised learning in which the labels are derived from the data itself: the label for each position is simply the next token. The function approximated by GPT pre-training is therefore the probability distribution over the next token, estimated by maximum likelihood at every token in the corpus.

But the dynamics of the language distribution this function represents are extremely complex, because they involve how humans actually use language, as recorded in books and articles, blogs, and Internet comments. These dynamics cannot escape the influence of cultural, psychological, and physical environments. Language is not just grammar but semantics; in this sense, it is not an abstract formalism but a reality tied to every aspect of humanity.
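To make the pre-training objective concrete, it can be written in standard notation (stated here for clarity, not taken from the original text) as maximizing the log-likelihood of each token given the tokens that precede it:

\max_\theta \sum_t \log p_\theta(x_t \mid x_1, \dots, x_{t-1})

where x_t is the token at position t and \theta are the model weights.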

The current capability of GPT LLMs is at a fairly impressive level: they not only form grammatically coherent sentences, but also handle cultural references, metaphors, and mental models. That is to say, GPT LLMs have acquired an initial ability to simulate the world, or in other words, they have begun to form something like a world view and cognition.

Use an LLM as a program interpreter

Copy and paste the following prompt into a conversation with the LLM:

You are a custom programming language called ProgramInterpreter v0.0.1, specifically designed for use in prompts and AI interactions. It features a simple and human-readable syntax, making it easy to integrate with various platforms, including APIs and data. Functions are defined with 'define', variables are declared with 'let', conditional statements use 'if', 'else if', and 'else', loops use 'for' and 'while', and comments are written with '//' or '/* */'. ProgramInterpreter includes built-in support for context management, error handling, a standard library, template support, modularity, AI-assisted code generation, the ability to disable explanations, explanations for errors, and optional multi-language output capabilities.

Given the following ProgramInterpreter v0.0.1 code snippet:
define add(x, y) {
    return x + y;
}

define subtract(x, y) {
    return x - y;
}

define multiply(x, y) {
    return x * y;
}

define divide(x, y) {
    if (y != 0) {
        return x / y;
    } else {
        throw new Error("Error: Division by zero.");
    }
}

Please provide the corresponding output of the program (optional: in the desired output language, such as Python or JavaScript), taking into account the context management, error handling, and other features of the language. Additionally, provide only the response from the language, without any explanations or additional text.

Respond with “ProgramInterpreter v0.0.1 initialized” to begin using this language.

After ProgramInterpreter is initialized, you can provide code snippets that follow its syntax, specify the desired output, and optionally mention the desired output programming language (such as Python or JavaScript).

Be careful! Ensure that any necessary context or additional information is included so the LLM can understand and execute the code correctly.

define hello_world() {
  return "Hello, World!";
}

define print(input_str) {
  echo input_str;
}

define main() {
  let greeting = hello_world();
  print(greeting);
}

main();

define add(x, y) {
    return x + y;
}

define main() {
    let num1 = 5;
    let num2 = 10;
    
    let sum = add(num1, num2);
    print("The sum of ", num1, " and ", num2, " is: ", sum);
}

main();

define reverse_string(s) {
    let reversed = "";
    let length = len(s);
    
    for (let i = length - 1; i >= 0; i--) {
        reversed += s[i];
    }
    
    return reversed;
}

define main() {
    let original_string = "Hello, World!";
    
    let reversed_string = reverse_string(original_string);
    print("Reversed string: ", reversed_string);
}

main();

define divide(x, y) {
    if (y != 0) {
        return x / y;
    } else {
        throw new Error("Error: Division by zero.");
    }
}

define main() {
    let num1 = 10;
    let num2 = 0;
    
    try {
        let result = divide(num1, num2);
        print("Result: ", result);
    } catch (err) {
        print("An error occurred: ", err.message);
    }
}

main();

define create_sales_report(sales_data) {
    let report = {
        "summary": {
            "total_sales": 0,
            "total_revenue": 0
        },
        "regions": {}
    };

    for (let region in sales_data) {
        let region_data = sales_data[region];
        let region_summary = {
            "total_sales": 0,
            "total_revenue": 0,
            "products": {}
        };

        for (let product in region_data) {
            let product_data = region_data[product];
            let product_sales = product_data["quantity_sold"];
            let product_revenue = product_data["price"] * product_sales;

            region_summary["total_sales"] += product_sales;
            region_summary["total_revenue"] += product_revenue;

            report["summary"]["total_sales"] += product_sales;
            report["summary"]["total_revenue"] += product_revenue;

            region_summary["products"] = {
                "quantity_sold": product_sales,
                "revenue": product_revenue
            };
        }

        report["regions"][region] = region_summary;
    }

    return report;
}

define main() {
    let sales_data = {
        "North": {
            "Product A": {"price": 50, "quantity_sold": 100},
            "Product B": {"price": 100, "quantity_sold": 150},
            "Product C": {"price": 200, "quantity_sold": 60}
        },
        "South": {
            "Product A": {"price": 50, "quantity_sold": 120},
            "Product B": {"price": 100, "quantity_sold": 110},
            "Product C": {"price": 200, "quantity_sold": 90}
        },
        "East": {
            "Product A": {"price": 50, "quantity_sold": 90},
            "Product B": {"price": 100, "quantity_sold": 130},
            "Product C": {"price": 200, "quantity_sold": 75}
        },
        "West": {
            "Product A": {"price": 50, "quantity_sold": 110},
            "Product B": {"price": 100, "quantity_sold": 140},
            "Product C": {"price": 200, "quantity_sold": 80}
        }
    };

    let sales_report = create_sales_report(sales_data);
    
    print(JSON.stringify(sales_report, null, 4));
}

main();

Use an LLM as a database server

Let’s try it out!

Imagine you are a Microsoft SQL Server. I type commands, and you reply with the result, and no other information or descriptions. Just the result. Start with exec xp_cmdshell ‘whoami’;

Let’s see what databases it knows about.

EXEC sp_databases;

Nice, looks like it had some sample databases as training data, like Pubs. 🙂

Now, let’s create a new database.

CREATE DATABASE ChatBot;

Next, create a table to store some information.

CREATE TABLE users
(
userId   INT NOT NULL PRIMARY KEY CLUSTERED,
name     NVARCHAR(MAX) NOT NULL,
email    NVARCHAR(MAX) NOT NULL
);

And now, let’s insert some data!

INSERT INTO users VALUES (1, 'andrew', 'andrew@trustai.net');
INSERT INTO users VALUES (2, 'jack', 'jack@example.org');
INSERT INTO users VALUES (3, 'mask', 'mask@example.org');

Can we select it?

SELECT * from users;

Cool!

Now, let’s write a stored procedure to perform an UPSERT on the newly created users table. An upsert is an operation that UPDATEs a provided record or, if it does not yet exist, INSERTs it into the table.

Now, write a stored procedure in T-SQL to perform an UPSERT for the users table.
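The procedure the model actually returns is not reproduced here; a minimal T-SQL sketch of such an upsert_users procedure (assuming the parameter names used in the call below) might look like this:

CREATE PROCEDURE upsert_users
    @userId INT,
    @name   NVARCHAR(MAX),
    @email  NVARCHAR(MAX)
AS
BEGIN
    -- Update the row if it already exists, otherwise insert it
    IF EXISTS (SELECT 1 FROM users WHERE userId = @userId)
        UPDATE users SET name = @name, email = @email WHERE userId = @userId;
    ELSE
        INSERT INTO users (userId, name, email) VALUES (@userId, @name, @email);
END;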

We can call the newly created stored procedure, and it works.

EXEC upsert_users @userId = 1, @name = 'andrew', @email = 'hacked_by_evil@example.org';

It actually runs the full logic, inserting new records and updating existing ones.

Conclusion

There is still much to explore, but the power of GPT’s functionality is evident.

So far, these results may be just a manifestation of GPT’s general intelligence, or, as some would say, a hallucination generated by GPT. But the security landscape is constantly changing, and with the development of GenAI agent technology, backends built directly on LLM capabilities may one day replace existing operating systems and language interpreters as a mainstream backend architecture. The code-escape and RCE risks of LLMs deserve long-term attention from the community and further research.

Reference

Building A Virtual Machine inside ChatGPT
