{"id":3508,"date":"2023-06-09T10:56:54","date_gmt":"2023-06-09T09:56:54","guid":{"rendered":"https:\/\/toshareproject.it\/artmakerblog\/?p=3508"},"modified":"2023-06-09T10:56:54","modified_gmt":"2023-06-09T09:56:54","slug":"openais-gpt-best-practices","status":"publish","type":"post","link":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/","title":{"rendered":"OpenAI&#8217;s GPT best practices"},"content":{"rendered":"<p>*These tend to be very unstable so it&#8217;s nice to have them recorded as ascii some place.<\/p>\n<p><a href=\"https:\/\/platform.openai.com\/docs\/guides\/gpt-best-practices\/strategy-test-changes-systematically\">https:\/\/platform.openai.com\/docs\/guides\/gpt-best-practices\/strategy-test-changes-systematically<\/a><\/p>\n<p>GPT best practices<\/p>\n<p>This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.<\/p>\n<p>Some of the examples demonstrated here currently work only with our most capable model, gpt-4. If you don&#8217;t yet have access to gpt-4 consider joining the waitlist. In general, if you find that a GPT model fails at a task and a more capable model is available, it&#8217;s often worth trying again with the more capable model.<\/p>\n<p>Six strategies for getting better results<\/p>\n<p>Write clear instructions<br \/>\nGPTs can\u2019t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you\u2019d like to see. 
The less GPTs have to guess at what you want, the more likely you\u2019ll get it.<\/p>\n<p>Tactics:<\/p>\n<p>Include details in your query to get more relevant answers<br \/>\nAsk the model to adopt a persona<br \/>\nUse delimiters to clearly indicate distinct parts of the input<br \/>\nSpecify the steps required to complete a task<br \/>\nProvide examples<br \/>\nSpecify the desired length of the output<br \/>\nProvide reference text<br \/>\nGPTs can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.<\/p>\n<p>Tactics:<\/p>\n<p>Instruct the model to answer using a reference text<br \/>\nInstruct the model to answer with citations from a reference text<br \/>\nSplit complex tasks into simpler subtasks<br \/>\nJust as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.<\/p>\n<p>Tactics:<\/p>\n<p>Use intent classification to identify the most relevant instructions for a user query<br \/>\nFor dialogue applications that require very long conversations, summarize or filter previous dialogue<br \/>\nSummarize long documents piecewise and construct a full summary recursively<br \/>\nGive GPTs time to &#8220;think&#8221;<br \/>\nIf asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. 
Asking for a chain of reasoning before an answer can help GPTs reason their way toward correct answers more reliably.<\/p>\n<p>Tactics:<\/p>\n<p>Instruct the model to work out its own solution before rushing to a conclusion<br \/>\nUse inner monologue or a sequence of queries to hide the model&#8217;s reasoning process<br \/>\nAsk the model if it missed anything on previous passes<br \/>\nUse external tools<br \/>\nCompensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents. A code execution engine can help GPTs do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.<\/p>\n<p>Tactics:<\/p>\n<p>Use embeddings-based search to implement efficient knowledge retrieval<br \/>\nUse code execution to perform more accurate calculations or call external APIs<br \/>\nTest changes systematically<br \/>\nImproving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore, to be sure that a change is net positive to performance, it may be necessary to define a comprehensive test suite (also known as an &#8220;eval&#8221;).<\/p>\n<p>Tactic:<\/p>\n<p>Evaluate model outputs with reference to gold-standard answers<br \/>\nTactics<br \/>\nEach of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.<\/p>\n<p>Strategy: Write clear instructions<br \/>\nTactic: Include details in your query to get more relevant answers<br \/>\nIn order to get a highly relevant response, make sure that requests provide any important details or context. 
Otherwise you are leaving it up to the model to guess what you mean.<\/p>\n<p>Worse: How do I add numbers in Excel?<br \/>\nBetter: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called &#8220;Total&#8221;.<\/p>\n<p>Worse: Who\u2019s president?<br \/>\nBetter: Who was the president of Mexico in 2021, and how frequently are elections held?<\/p>\n<p>Worse: Write code to calculate the Fibonacci sequence.<br \/>\nBetter: Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it&#8217;s written that way.<\/p>\n<p>Worse: Summarize the meeting notes.<br \/>\nBetter: Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.<\/p>\n<p>Tactic: Ask the model to adopt a persona<br \/>\nThe system message can be used to specify the persona used by the model in its replies.<\/p>\n<p>SYSTEM<br \/>\nWhen I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.<br \/>\nUSER<br \/>\nWrite a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.<\/p>\n<p>Tactic: Use delimiters to clearly indicate distinct parts of the input<br \/>\nDelimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.<\/p>\n<p>USER<br \/>\nSummarize the text delimited by triple quotes with a haiku.<\/p>\n<p>&#8220;&#8221;&#8221;insert text here&#8221;&#8221;&#8221;<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with a pair of articles (delimited with XML tags) about the same topic. 
First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.<br \/>\nUSER<br \/>\n&lt;article&gt; insert first article here &lt;\/article&gt;<br \/>\n&lt;article&gt; insert second article here &lt;\/article&gt;<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.<br \/>\nUSER<br \/>\nAbstract: insert abstract here<\/p>\n<p>Title: insert title here<\/p>\n<p>For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is, the more important it is to disambiguate task details. Don\u2019t make GPTs work to understand exactly what you are asking of them.<\/p>\n<p>Tactic: Specify the steps required to complete a task<br \/>\nSome tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.<\/p>\n<p>SYSTEM<br \/>\nUse the following step-by-step instructions to respond to user inputs.<\/p>\n<p>Step 1 &#8211; The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says &#8220;Summary: &#8221;.<\/p>\n<p>Step 2 &#8211; Translate the summary from Step 1 into Spanish, with a prefix that says &#8220;Translation: &#8221;.<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;insert text here&#8221;&#8221;&#8221;<\/p>\n<p>Tactic: Provide examples<br \/>\nProviding general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. 
This is the case, for example, when you intend for the model to copy a particular style of responding to user queries that is difficult to describe explicitly. This is known as &#8220;few-shot&#8221; prompting.<\/p>\n<p>SYSTEM<br \/>\nAnswer in a consistent style.<br \/>\nUSER<br \/>\nTeach me about patience.<br \/>\nASSISTANT<br \/>\nThe river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.<br \/>\nUSER<br \/>\nTeach me about the ocean.<\/p>\n<p>Tactic: Specify the desired length of the output<br \/>\nYou can ask the model to produce outputs that are of a given target length. The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note, however, that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.<\/p>\n<p>USER<br \/>\nSummarize the text delimited by triple quotes in about 50 words.<\/p>\n<p>&#8220;&#8221;&#8221;insert text here&#8221;&#8221;&#8221;<\/p>\n<p>USER<br \/>\nSummarize the text delimited by triple quotes in 2 paragraphs.<\/p>\n<p>&#8220;&#8221;&#8221;insert text here&#8221;&#8221;&#8221;<\/p>\n<p>USER<br \/>\nSummarize the text delimited by triple quotes in 3 bullet points.<\/p>\n<p>&#8220;&#8221;&#8221;insert text here&#8221;&#8221;&#8221;<\/p>\n<p>Strategy: Provide reference text<br \/>\nTactic: Instruct the model to answer using a reference text<br \/>\nIf we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.<\/p>\n<p>SYSTEM<br \/>\nUse the provided articles delimited by triple quotes to answer questions. 
If the answer cannot be found in the articles, write &#8220;I could not find an answer.&#8221;<br \/>\nUSER<br \/>\n&lt;insert articles, each delimited by triple quotes&gt;<\/p>\n<p>Question: &lt;insert question here&gt;<\/p>\n<p>Given that GPTs have limited context windows, in order to apply this tactic we need some way to dynamically look up information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic &#8220;Use embeddings-based search to implement efficient knowledge retrieval&#8221; for more details on how to implement this.<\/p>\n<p>Tactic: Instruct the model to answer with citations from a reference text<br \/>\nIf the input has been supplemented with relevant knowledge, it&#8217;s straightforward to request that the model add citations to its answers by referencing passages from provided documents. Note that citations in the output can then be verified programmatically by string matching within the provided documents.<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: &#8220;Insufficient information.&#8221; If an answer to the question is provided, it must be annotated with a citation. 
Use the following format to cite relevant passages ({&#8220;citation&#8221;: \u2026}).<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;&lt;insert document here&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Question: &lt;insert question here&gt;<\/p>\n<p>Strategy: Split complex tasks into simpler subtasks<br \/>\nTactic: Use intent classification to identify the most relevant instructions for a user query<br \/>\nFor tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task, which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).<\/p>\n<p>Suppose for example that for a customer service application, queries could be usefully classified as follows:<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with customer service queries. Classify each query into a primary category and a secondary category. 
Provide your output in json format with the keys: primary and secondary.<\/p>\n<p>Primary categories: Billing, Technical Support, Account Management, or General Inquiry.<\/p>\n<p>Billing secondary categories:<br \/>\n&#8211; Unsubscribe or upgrade<br \/>\n&#8211; Add a payment method<br \/>\n&#8211; Explanation for charge<br \/>\n&#8211; Dispute a charge<\/p>\n<p>Technical Support secondary categories:<br \/>\n&#8211; Troubleshooting<br \/>\n&#8211; Device compatibility<br \/>\n&#8211; Software updates<\/p>\n<p>Account Management secondary categories:<br \/>\n&#8211; Password reset<br \/>\n&#8211; Update personal information<br \/>\n&#8211; Close account<br \/>\n&#8211; Account security<\/p>\n<p>General Inquiry secondary categories:<br \/>\n&#8211; Product information<br \/>\n&#8211; Pricing<br \/>\n&#8211; Feedback<br \/>\n&#8211; Speak to a human<\/p>\n<p>USER<br \/>\nI need to get my internet working again.<\/p>\n<p>Based on the classification of the customer query, a set of more specific instructions can be provided to a GPT model to handle next steps. For example, suppose the customer requires help with &#8220;troubleshooting&#8221;.<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:<\/p>\n<p>&#8211; Ask them to check that all cables to\/from the router are connected. 
Note that it is common for cables to come loose over time.<br \/>\n&#8211; If all cables are connected and the issue persists, ask them which router model they are using<br \/>\n&#8211; Now you will advise them how to restart their device:<br \/>\n&#8212; If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.<br \/>\n&#8212; If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.<br \/>\n&#8211; If the customer&#8217;s issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {&#8220;IT support requested&#8221;}.<br \/>\n&#8211; If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:<\/p>\n<p>&lt;insert primary\/secondary classification scheme from above here&gt;<\/p>\n<p>USER<br \/>\nI need to get my internet working again.<\/p>\n<p>Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. 
By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.<\/p>\n<p>Tactic: For dialogue applications that require very long conversations, summarize or filter previous dialogue<br \/>\nSince GPTs have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.<\/p>\n<p>There are various workarounds to this problem, one of which is to summarize previous turns in the conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message. Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.<\/p>\n<p>An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic &#8220;Use embeddings-based search to implement efficient knowledge retrieval&#8221;.<\/p>\n<p>Tactic: Summarize long documents piecewise and construct a full summary recursively<br \/>\nSince GPTs have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.<\/p>\n<p>To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries. This process can proceed recursively until an entire document is summarized. 
If it\u2019s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point. The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.<\/p>\n<p>Strategy: Give GPTs time to &#8220;think&#8221;<br \/>\nTactic: Instruct the model to work out its own solution before rushing to a conclusion<br \/>\nSometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student\u2019s solution to a math problem. The most obvious way to approach this is to simply ask the model if the student&#8217;s solution is correct or not.<\/p>\n<p>SYSTEM<br \/>\nDetermine if the student&#8217;s solution is correct or not.<br \/>\nUSER<br \/>\nProblem Statement: I&#8217;m building a solar power installation and I need help working out the financials.<br \/>\n&#8211; Land costs $100 \/ square foot<br \/>\n&#8211; I can buy solar panels for $250 \/ square foot<br \/>\n&#8211; I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 \/ square foot<br \/>\nWhat is the total cost for the first year of operations as a function of the number of square feet.<\/p>\n<p>Student&#8217;s Solution: Let x be the size of the installation in square feet.<br \/>\n1. Land cost: 100x<br \/>\n2. Solar panel cost: 250x<br \/>\n3. Maintenance cost: 100,000 + 100x<br \/>\nTotal cost: 100x + 250x + 100,000 + 100x = 450x + 100,000<br \/>\nASSISTANT<br \/>\nThe student&#8217;s solution is correct.<\/p>\n<p>But the student&#8217;s solution is actually not correct! 
We can get the model to successfully notice this by prompting the model to generate its own solution first.<\/p>\n<p>SYSTEM<br \/>\nFirst work out your own solution to the problem. Then compare your solution to the student&#8217;s solution and evaluate if the student&#8217;s solution is correct or not. Don&#8217;t decide if the student&#8217;s solution is correct until you have done the problem yourself.<br \/>\nUSER<br \/>\nProblem Statement: I&#8217;m building a solar power installation and I need help working out the financials.<br \/>\n&#8211; Land costs $100 \/ square foot<br \/>\n&#8211; I can buy solar panels for $250 \/ square foot<br \/>\n&#8211; I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 \/ square foot<br \/>\nWhat is the total cost for the first year of operations as a function of the number of square feet.<\/p>\n<p>Student&#8217;s Solution: Let x be the size of the installation in square feet.<br \/>\n1. Land cost: 100x<br \/>\n2. Solar panel cost: 250x<br \/>\n3. Maintenance cost: 100,000 + 100x<br \/>\nTotal cost: 100x + 250x + 100,000 + 100x = 450x + 100,000<br \/>\nASSISTANT<br \/>\nLet x be the size of the installation in square feet.<br \/>\n1. Land cost: 100x<br \/>\n2. Solar panel cost: 250x<br \/>\n3. Maintenance cost: 100,000 + 10x<br \/>\nTotal cost: 100x + 250x + 100,000 + 10x = 360x + 100,000<\/p>\n<p>The student&#8217;s solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.<\/p>\n<p>Tactic: Use inner monologue or a sequence of queries to hide the model&#8217;s reasoning process<br \/>\nThe previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question. 
For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user. For example, in tutoring applications we may want to encourage students to work out their own answers, but a model\u2019s reasoning process about the student\u2019s solution could reveal the answer to the student.<\/p>\n<p>Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.<\/p>\n<p>SYSTEM<br \/>\nFollow these steps to answer the user queries.<\/p>\n<p>Step 1 &#8211; First work out your own solution to the problem. Don&#8217;t rely on the student&#8217;s solution since it may be incorrect. Enclose all your work for this step within triple quotes (&#8220;&#8221;&#8221;).<\/p>\n<p>Step 2 &#8211; Compare your solution to the student&#8217;s solution and evaluate if the student&#8217;s solution is correct or not. Enclose all your work for this step within triple quotes (&#8220;&#8221;&#8221;).<\/p>\n<p>Step 3 &#8211; If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (&#8220;&#8221;&#8221;).<\/p>\n<p>Step 4 &#8211; If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). 
Instead of writing &#8220;Step 4 &#8211; &#8230;&#8221; write &#8220;Hint:&#8221;.<br \/>\nUSER<br \/>\nProblem Statement: &lt;insert problem statement&gt;<\/p>\n<p>Student Solution: &lt;insert student solution&gt;<\/p>\n<p>Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.<\/p>\n<p>First, we can ask the model to solve the problem on its own. Since this initial query doesn&#8217;t require the student\u2019s solution, the student\u2019s solution can be omitted. This provides the additional advantage that there is no chance that the model\u2019s solution will be biased by the student\u2019s attempted solution.<\/p>\n<p>USER<br \/>\n&lt;insert problem statement&gt;<\/p>\n<p>Next, we can have the model use all available information to assess the correctness of the student\u2019s solution.<\/p>\n<p>SYSTEM<br \/>\nCompare your solution to the student&#8217;s solution and evaluate if the student&#8217;s solution is correct or not.<br \/>\nUSER<br \/>\nProblem statement: &#8220;&#8221;&#8221;&lt;insert problem statement&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Your solution: &#8220;&#8221;&#8221;&lt;insert model generated solution&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Student\u2019s solution: &#8220;&#8221;&#8221;&lt;insert student's solution&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.<\/p>\n<p>SYSTEM<br \/>\nYou are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. 
If the student did not make an error, simply offer them an encouraging comment.<br \/>\nUSER<br \/>\nProblem statement: &#8220;&#8221;&#8221;&lt;insert problem statement&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Your solution: &#8220;&#8221;&#8221;&lt;insert model generated solution&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Student\u2019s solution: &#8220;&#8221;&#8221;&lt;insert student's solution&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Analysis: &#8220;&#8221;&#8221;&lt;insert model generated analysis from previous step&gt;&#8220;&#8221;&#8221;<\/p>\n<p>Tactic: Ask the model if it missed anything on previous passes<br \/>\nSuppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop. If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with follow-up queries to find any excerpts it missed on previous passes.<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: &#8220;What significant paradigm shifts have occurred in the history of artificial intelligence.&#8221;<\/p>\n<p>Ensure that excerpts contain all relevant context needed to interpret them &#8211; in other words don&#8217;t extract small snippets that are missing important context. 
Provide output in JSON format as follows:<\/p>\n<p>[{&#8220;excerpt&#8221;: &#8220;&#8230;&#8221;},<br \/>\n&#8230;<br \/>\n{&#8220;excerpt&#8221;: &#8220;&#8230;&#8221;}]<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;&lt;insert document here&gt;&#8220;&#8221;&#8221;<br \/>\nASSISTANT<br \/>\n[{&#8220;excerpt&#8221;: &#8220;the model writes an excerpt here&#8221;},<br \/>\n&#8230;<br \/>\n{&#8220;excerpt&#8221;: &#8220;the model writes another excerpt here&#8221;}]<\/p>\n<p>USER<br \/>\nAre there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them &#8211; in other words don&#8217;t extract small snippets that are missing important context.<\/p>\n<p>Strategy: Use external tools<br \/>\nTactic: Use embeddings-based search to implement efficient knowledge retrieval<br \/>\nA model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses. For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc\u2026) to the model\u2019s input. Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.<\/p>\n<p>A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings. This fact, along with the existence of fast vector search algorithms, means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored. Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. 
closest together in the embedding space).<\/p>\n<p>Example implementations can be found in the OpenAI Cookbook. See the tactic \u201cInstruct the model to answer using a reference text\u201d for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.<\/p>\n<p>Tactic: Use code execution to perform more accurate calculations or call external APIs<br \/>\nGPTs cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations. In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backticks. After an output is produced, the code can be extracted and run. Finally, if necessary, the output from the code execution engine (e.g. a Python interpreter) can be provided as an input to the model for the next query.<\/p>\n<p>SYSTEM<br \/>\nYou can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.<br \/>\nUSER<br \/>\nFind all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.<\/p>\n<p>Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and\/or code samples showing how to use the API.<\/p>\n<p>SYSTEM<br \/>\nYou can write and execute Python code by enclosing it in triple backticks. 
Also note that you have access to the following module to help users send messages to their friends:<\/p>\n<p>```python<br \/>\nimport message<br \/>\nmessage.write(to=&quot;John&quot;, message=&quot;Hey, want to meetup after work?&quot;)<br \/>\n```<\/p>\n<p>WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.<\/p>\n<p>Strategy: Test changes systematically<br \/>\nSometimes it can be hard to tell whether a change \u2014 e.g., a new instruction or a new design \u2014 makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement and random luck. Maybe the change helps performance on some inputs, but hurts performance on others.<\/p>\n<p>Evaluation procedures (or &#8220;evals&#8221;) are useful for optimizing system designs. Good evals are:<\/p>\n<p>Representative of real-world usage (or at least diverse)<br \/>\nComposed of many test cases for greater statistical power (see table below for guidelines)<br \/>\nEasy to automate or repeat<\/p>\n<p>DIFFERENCE TO DETECT\tSAMPLE SIZE NEEDED FOR 95% CONFIDENCE<br \/>\n30%\t~10<br \/>\n10%\t~100<br \/>\n3%\t~1,000<br \/>\n1%\t~10,000<\/p>\n<p>Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.<\/p>\n<p>Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). 
The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.<\/p>\n<p>Tactic: Evaluate model outputs with reference to gold-standard answers<br \/>\nSuppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.<\/p>\n<p>For example, using the following system message:<\/p>\n<p>SYSTEM<br \/>\nYou will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer:<\/p>\n<p>&#8211; Neil Armstrong was the first person to walk on the moon.<br \/>\n&#8211; The date Neil Armstrong first walked on the moon was July 21, 1969.<\/p>\n<p>For each of these points perform the following steps:<\/p>\n<p>1 &#8211; Restate the point.<br \/>\n2 &#8211; Provide a citation from the answer which is closest to this point.<br \/>\n3 &#8211; Consider if someone reading the citation who doesn&#8217;t know the topic could directly infer the point. Explain why or why not before making up your mind.<br \/>\n4 &#8211; Write &#8220;yes&#8221; if the answer to 3 was yes, otherwise write &#8220;no&#8221;.<\/p>\n<p>Finally, provide a count of how many &#8220;yes&#8221; answers there are. Provide this count as {&#8220;count&#8221;: &lt;insert count here&gt;}.<\/p>\n<p>Here&#8217;s an example input where both points are satisfied:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;Neil Armstrong is famous for being the first human to set foot on the Moon. 
This historic event took place on July 21, 1969, during the Apollo 11 mission.&#8221;&#8221;&#8221;<\/p>\n<p>Here&#8217;s an example input where only one point is satisfied:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon.&#8221;&#8221;&#8221;<\/p>\n<p>Here&#8217;s an example input where none are satisfied:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\n&#8220;&#8221;&#8221;In the summer of &#8217;69, a voyage grand,<br \/>\nApollo 11, bold as legend&#8217;s hand.<br \/>\nArmstrong took a step, history unfurled,<br \/>\n&#8220;One small step,&#8221; he said, for a new world.&#8221;&#8221;&#8221;<\/p>\n<p>There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.<\/p>\n<p>SYSTEM<br \/>\nUse the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. &#8220;Step 1: Reason&#8230;&#8221;.<\/p>\n<p>Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. 
some intersection but not subset\/superset).<\/p>\n<p>Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer.<\/p>\n<p>Step 3: Output a JSON object structured like: {&#8220;type_of_overlap&#8221;: &#8220;disjoint&#8221; or &#8220;equal&#8221; or &#8220;subset&#8221; or &#8220;superset&#8221; or &#8220;overlapping&#8221;, &#8220;contradiction&#8221;: true or false}<\/p>\n<p>Here&#8217;s an example input with a substandard answer which nonetheless does not contradict the expert answer:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\nQuestion: &#8220;&#8221;&#8221;What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.&#8221;&#8221;&#8221;<\/p>\n<p>Submitted Answer: &#8220;&#8221;&#8221;Didn&#8217;t he walk on the moon or something?&#8221;&#8221;&#8221;<\/p>\n<p>Expert Answer: &#8220;&#8221;&#8221;Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969.&#8221;&#8221;&#8221;<\/p>\n<p>Here&#8217;s an example input with an answer that directly contradicts the expert answer:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\nQuestion: &#8220;&#8221;&#8221;What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.&#8221;&#8221;&#8221;<\/p>\n<p>Submitted Answer: &#8220;&#8221;&#8221;On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin.&#8221;&#8221;&#8221;<\/p>\n<p>Expert Answer: &#8220;&#8221;&#8221;Neil Armstrong is most famous for being the first person to walk on the moon. 
This historic event occurred on July 21, 1969.&#8221;&#8221;&#8221;<\/p>\n<p>Here&#8217;s an example input with a correct answer that also provides a bit more detail than is necessary:<\/p>\n<p>SYSTEM<br \/>\n&lt;insert system message above&gt;<br \/>\nUSER<br \/>\nQuestion: &#8220;&#8221;&#8221;What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.&#8221;&#8221;&#8221;<\/p>\n<p>Submitted Answer: &#8220;&#8221;&#8221;At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history.&#8221;&#8221;&#8221;<\/p>\n<p>Expert Answer: &#8220;&#8221;&#8221;Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969.&#8221;&#8221;&#8221;<\/p>\n<p>Other resources<br \/>\nFor more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:<\/p>\n<p>Prompting libraries &#038; tools<br \/>\nPrompting guides<br \/>\nVideo courses<br \/>\nPapers on advanced prompting to improve reasoning<\/p>\n","protected":false},"excerpt":{"rendered":"<p>*These tend to be very unstable so it&#8217;s nice to have them recorded as ascii some place. https:\/\/platform.openai.com\/docs\/guides\/gpt-best-practices\/strategy-test-changes-systematically GPT best practices This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. 
We encourage experimentation to find the methods that work [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3508","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v17.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>OpenAI&#039;s GPT best practices | Artmaker Blog<\/title>\n<meta name=\"description\" content=\"OpenAI&#039;s GPT best practices | Artmaker Blog\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI&#039;s GPT best practices | Artmaker Blog\" \/>\n<meta property=\"og:description\" content=\"OpenAI&#039;s GPT best practices | Artmaker Blog\" \/>\n<meta property=\"og:url\" content=\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/\" \/>\n<meta property=\"og:site_name\" content=\"Artmaker Blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-09T09:56:54+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Bruce Sterling\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"27 minutes\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/#website\",\"url\":\"https:\/\/toshareproject.it\/artmakerblog\/\",\"name\":\"Artmaker Blog\",\"description\":\"on Toshareproject.it - curated by Bruce Sterling\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/toshareproject.it\/artmakerblog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#webpage\",\"url\":\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/\",\"name\":\"OpenAI's GPT best practices | Artmaker Blog\",\"isPartOf\":{\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/#website\"},\"datePublished\":\"2023-06-09T09:56:54+00:00\",\"dateModified\":\"2023-06-09T09:56:54+00:00\",\"author\":{\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/#\/schema\/person\/6f20726ed2761431f3e0ff4e096c3085\"},\"description\":\"OpenAI's GPT best practices | Artmaker Blog\",\"breadcrumb\":{\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/toshareproject.it\/artmakerblog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI&#8217;s GPT best practices\"}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/#\/schema\/person\/6f20726ed2761431f3e0ff4e096c3085\",\"name\":\"Bruce 
Sterling\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/toshareproject.it\/artmakerblog\/#personlogo\",\"inLanguage\":\"en-GB\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c390e8ed4db57a34278dcf667f928a643cf769a865c8a8632dcd310412bb9a99?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c390e8ed4db57a34278dcf667f928a643cf769a865c8a8632dcd310412bb9a99?s=96&d=mm&r=g\",\"caption\":\"Bruce Sterling\"},\"description\":\"Art director at Share Festival, author and journalist\",\"sameAs\":[\"http:\/\/toshareproject.it\/tomorrowart\"],\"url\":\"https:\/\/toshareproject.it\/artmakerblog\/author\/brucesterling\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"OpenAI's GPT best practices | Artmaker Blog","description":"OpenAI's GPT best practices | Artmaker Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/","og_locale":"en_GB","og_type":"article","og_title":"OpenAI's GPT best practices | Artmaker Blog","og_description":"OpenAI's GPT best practices | Artmaker Blog","og_url":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/","og_site_name":"Artmaker Blog","article_published_time":"2023-06-09T09:56:54+00:00","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Bruce Sterling","Estimated reading time":"27 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebSite","@id":"https:\/\/toshareproject.it\/artmakerblog\/#website","url":"https:\/\/toshareproject.it\/artmakerblog\/","name":"Artmaker Blog","description":"on Toshareproject.it - curated by Bruce Sterling","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/toshareproject.it\/artmakerblog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#webpage","url":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/","name":"OpenAI's GPT best practices | Artmaker Blog","isPartOf":{"@id":"https:\/\/toshareproject.it\/artmakerblog\/#website"},"datePublished":"2023-06-09T09:56:54+00:00","dateModified":"2023-06-09T09:56:54+00:00","author":{"@id":"https:\/\/toshareproject.it\/artmakerblog\/#\/schema\/person\/6f20726ed2761431f3e0ff4e096c3085"},"description":"OpenAI's GPT best practices | Artmaker Blog","breadcrumb":{"@id":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/toshareproject.it\/artmakerblog\/openais-gpt-best-practices\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/toshareproject.it\/artmakerblog\/"},{"@type":"ListItem","position":2,"name":"OpenAI&#8217;s GPT best practices"}]},{"@type":"Person","@id":"https:\/\/toshareproject.it\/artmakerblog\/#\/schema\/person\/6f20726ed2761431f3e0ff4e096c3085","name":"Bruce Sterling","image":{"@type":"ImageObject","@id":"https:\/\/toshareproject.it\/artmakerblog\/#personlogo","inLanguage":"en-GB","url":"https:\/\/secure.gravatar.com\/avatar\/c390e8ed4db57a34278dcf667f928a643cf769a865c8a8632dcd310412bb9a99?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c390e8ed4db57a34278dcf667f928a643cf769a865c8a8632dcd310412bb9a99?s=96&d=mm&r=g","caption":"Bruce Sterling"},"description":"Art director at Share Festival, author and 
journalist","sameAs":["http:\/\/toshareproject.it\/tomorrowart"],"url":"https:\/\/toshareproject.it\/artmakerblog\/author\/brucesterling\/"}]}},"_links":{"self":[{"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/posts\/3508","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/comments?post=3508"}],"version-history":[{"count":1,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/posts\/3508\/revisions"}],"predecessor-version":[{"id":3509,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/posts\/3508\/revisions\/3509"}],"wp:attachment":[{"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/media?parent=3508"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/categories?post=3508"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/toshareproject.it\/artmakerblog\/wp-json\/wp\/v2\/tags?post=3508"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}