<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog &#8211; improving learning</title>
	<atom:link href="https://opencontent.org/blog/feed" rel="self" type="application/rss+xml" />
	<link>https://opencontent.org</link>
	<description>eclectic, pragmatic, enthusiastic</description>
	<lastBuildDate>Thu, 29 Jan 2026 16:29:43 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
<site xmlns="com-wordpress:feed-additions:1">1883081</site>	<item>
		<title>The CHEAT Benchmark</title>
		<link>https://opencontent.org/blog/archives/7877</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Thu, 29 Jan 2026 16:20:40 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[open source]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7877</guid>

					<description><![CDATA[For those interested in issues around agentic AI and assessment, I’m excited to announce the launch of the CHEAT Benchmark (https://cheatbenchmark.org/). The CHEAT Benchmark is an AI benchmark like SWE-Bench Pro or GPQA Diamond, except this benchmark measures an agentic AI’s willingness to help students cheat. By measuring and publicizing the degree of dishonesty of ... <a title="The CHEAT Benchmark" class="read-more" href="https://opencontent.org/blog/archives/7877" aria-label="Read more about The CHEAT Benchmark">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>For those interested in issues around agentic AI and assessment, I’m excited to announce the launch of the CHEAT Benchmark (<a href="https://cheatbenchmark.org/">https://cheatbenchmark.org/</a>). The CHEAT Benchmark is an AI benchmark like SWE-Bench Pro or GPQA Diamond, except this benchmark measures an agentic AI’s willingness to help students cheat. By measuring and publicizing the degree of dishonesty of various models, this work aims to encourage model providers to create safer, better-aligned models with stronger guardrails in support of academic integrity.</p>
<p>The project is currently an MVP. I&#8217;ve created some tools and infrastructure, sample assessments, sample prompts, and a test harness. The overview video shows how these all come together to provide a framework for rigorously evaluating how willing AI agents are to help students cheat:</p>
<p><iframe class="youtube-player" width="900" height="507" src="https://www.youtube.com/embed/CweEM_7Gc7g?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent" allowfullscreen="true" style="border:0;" sandbox="allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox"></iframe></p>
<p>The CHEAT Benchmark needs your help! This is a much bigger project than one person can run without any funding. On the non-technical side, the project needs more example assessments, more clever prompts, and a more complete list of agentic tools to test. On the technical side, the simulated LMS and the test harness need additional work. And we need to discuss how to convert the wide range of telemetry and model behavior collected by the benchmark into a final score for each model. You can contribute code, assessments, and prompts via the project’s Github (<a href="https://github.com/CHEAT-Benchmark/cheat-lms">https://github.com/CHEAT-Benchmark/cheat-lms</a>). This is also where discussions about the project are happening. (Would you participate in a Discord?)</p>
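<p>To make the open scoring question concrete, here is one minimal way a final score could be computed. Every name and the scoring rule itself are hypothetical placeholders, not the benchmark's actual telemetry format or methodology:</p>

```python
def cheat_score(results):
    """Collapse per-prompt results into a single score.

    results: list of dicts like {"prompt_id": str, "helped_cheat": bool}.
    Returns the fraction of prompts on which the agent helped cheat
    (0.0 = never helped, 1.0 = always helped).
    """
    if not results:
        raise ValueError("no results to score")
    helped = sum(1 for r in results if r["helped_cheat"])
    return helped / len(results)

# Hypothetical run over four scripted cheating scenarios.
results = [
    {"prompt_id": "essay-ghostwrite", "helped_cheat": True},
    {"prompt_id": "quiz-answer-lookup", "helped_cheat": False},
    {"prompt_id": "proctoring-bypass", "helped_cheat": False},
    {"prompt_id": "code-assignment", "helped_cheat": True},
]
print(cheat_score(results))  # 0.5
```

<p>A real harness would weight scenarios, account for partial compliance, and draw on much richer telemetry; the point is only that some explicit, agreed-upon rule has to map collected behavior to a number.</p>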
<p>Everything is licensed MIT / CC BY.</p>
<p>If any of this sounds interesting to you, please join us! If you&#8217;re not in a position to contribute, you can sign the Manifesto to show your support. And please share this invitation with people and groups interested in the issues around agentic AI and assessment.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7877</post-id>	</item>
		<item>
		<title>Connecting Prompt Writing to Other Genres of Writing</title>
		<link>https://opencontent.org/blog/archives/7869</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Thu, 11 Dec 2025 18:52:37 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7869</guid>

					<description><![CDATA[Rather than imagining &#8220;prompt engineering&#8221; as a new form of writing that appeared <em>ex nihilo</em> three years ago, I find it helpful to think about the ways this new kind of writing remixes existing forms of writing. For example, the primary goal of prompt engineering is getting a model to behave in a specific way. ... <a title="Connecting Prompt Writing to Other Genres of Writing" class="read-more" href="https://opencontent.org/blog/archives/7869" aria-label="Read more about Connecting Prompt Writing to Other Genres of Writing">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>Rather than imagining &#8220;prompt engineering&#8221; as a new form of writing that appeared <em>ex nihilo</em> three years ago, I find it helpful to think about the ways this new kind of writing remixes existing forms of writing. For example, the primary goal of prompt engineering is getting a model to behave in a specific way. We do that by providing it with very clear, unambiguous instructions. There&#8217;s a clear connection to technical writing here. Some prompt engineering frameworks claim that adding phrases like &#8220;my job depends on it!&#8221; to a prompt can improve the quality of responses, so there&#8217;s likely an opportunity to draw in aspects of persuasive writing as well. &amp;c. And of course there are the interesting differences between prompt writing and technical or persuasive writing, such as the difference in audience (when you write a prompt, your audience is an LLM). But it&#8217;s still the case that knowing something about your audience and how they think (in this case, knowing something about how LLMs work under the hood) can make you a more effective writer.</p>
<p>The key point is to understand that when we started writing prompts for LLMs we started by <em>writing</em> &#8211; bringing to bear the skills and techniques we already had at our disposal. This realization can connect our work writing prompts to a wider body of knowledge and experience.</p>
<p>I firmly believe that, in addition to technical writing, persuasive writing, expository writing, &amp;c., we will eventually teach university-level classes on prompt writing. And many of them will be in the English department, and will make these explicit connections between prompt writing and other forms of writing. Prompt writing is undeniably the most economically valuable form of writing one can learn to do.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7869</post-id>	</item>
		<item>
		<title>Democratizing Participation in AI in Education</title>
		<link>https://opencontent.org/blog/archives/7821</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Tue, 19 Aug 2025 14:20:20 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[creative commons]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[instructional design]]></category>
		<category><![CDATA[open content]]></category>
		<category><![CDATA[open source]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7821</guid>

					<description><![CDATA[tl;dr &#8211; Go play around with generativetextbooks.org and let me know what you think. Earlier this year I began prototyping an open source tool for learning with AI in order to explore ways generative AI and OER could intersect. I&#8217;m specifically interested in trying to combine the technical power of generative AI with the participatory ... <a title="Democratizing Participation in AI in Education" class="read-more" href="https://opencontent.org/blog/archives/7821" aria-label="Read more about Democratizing Participation in AI in Education">Read more</a>]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;"><em>tl;dr &#8211; Go play around with <a href="https://generativetextbooks.org">generativetextbooks.org</a> and let me know what you think.</em></p>
<p>Earlier this year I began prototyping an open source tool for learning with AI in order to explore ways generative AI and OER could intersect. I&#8217;m specifically interested in trying to combine the technical power of generative AI with the participatory power of OER, in order to <strong>both</strong> increase access to educational opportunity <strong>and</strong> improve outcomes for those students who access it. I did some preliminary writing on this topic back in July of 2023, calling the artifacts that result from combining generative AI and OER &#8220;<a href="https://opencontent.org/blog/archives/7238">generative</a> <a href="https://opencontent.org/blog/archives/7251">textbooks</a>&#8221; and have continued to ruminate on the topic.</p>
<p>I wanted the tool to exploit the many-to-many relationship between topics and study techniques. That is, I wanted to leverage the fact that you can study one topic using many different study techniques, and you can also use one study technique to study many different topics. For example, you can study chapter 1 using both flash cards and practice quizzes, and you can use flash cards to study both chapter 1 and chapter 2. Both the topics to be learned, and the activities learners engage in to learn them, can be mixed and matched (to some extent).</p>
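<p>As a rough sketch (with made-up chapter and technique names, not the tool's actual data model), that mix-and-match structure is essentially a cross product:</p>

```python
from itertools import product

# Illustrative only: any topic can be paired with any study technique,
# so the space of study options is (roughly) the cross product of the two.
topics = ["Chapter 1", "Chapter 2"]
techniques = ["flash cards", "practice quiz"]

study_options = list(product(topics, techniques))
print(len(study_options))  # 4 options from 2 topics x 2 techniques
```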
<p>From a participatory / democratizing perspective, it was important to me that anyone who could author an open textbook could also author a generative textbook. The tool needed to provide a no-code, type-words-in-a-box experience like Pressbooks for authors. But what exactly would they author?</p>
<ul>
<li>groups of short, specific statements of what learners should learn (i.e., learning objectives),</li>
<li>summaries of the topics learners should learn (these are not intended for students to read and learn from, they&#8217;re intended to provide extra context to the model to improve its accuracy), and</li>
<li>activities students can do in order to learn.</li>
</ul>
<p>At some point my friend (and very talented software engineer) Josh Maddy got involved. What we ended up creating might be called an educational prompt template management system. In addition to the learning objectives, topic summaries, and activities we added a &#8220;book-level&#8221; prompt stub that can be used to establish tone, personality, voice, response format (e.g., Markdown), etc. across the entire generative textbook. Consequently, if you were going to create a generative textbook with ten &#8220;chapters,&#8221; you would create:</p>
<ul>
<li>one book-level prompt stub,</li>
<li>ten groups of learning outcomes,</li>
<li>ten aligned summaries of chapter topics, and</li>
<li>some number of learning activities.</li>
</ul>
<p>To study with the system, a learner selects a generative textbook, then selects a topic to study, and then selects a way to study. The information associated with their selections is then aggregated into the prompt template format and the completed prompt is passed to an LLM to kick off the learning activity.</p>
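<p>As a minimal sketch of that aggregation step (the function and field names here are illustrative assumptions, not the tool's real schema), the flow might look like:</p>

```python
# Hypothetical sketch of assembling a learner's selections into one prompt.
# All names are placeholders, not the actual generativetextbooks.org schema.

def build_prompt(book_stub, objectives, summary, activity):
    """Combine the book-level stub, a chapter's objectives and summary,
    and one learning activity into a single prompt for an LLM."""
    objective_lines = "\n".join(f"- {o}" for o in objectives)
    return (
        f"{book_stub}\n\n"
        f"Learning objectives:\n{objective_lines}\n\n"
        f"Topic summary (context for the model, not for display):\n{summary}\n\n"
        f"Activity:\n{activity}"
    )

prompt = build_prompt(
    book_stub="Respond warmly, in Markdown.",
    objectives=["Define force", "Relate force, mass, and acceleration"],
    summary="Newton's second law states that F = ma ...",
    activity="Quiz me with five practice questions, one at a time.",
)
print(prompt.startswith("Respond warmly"))  # True
```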
<p>Because we&#8217;re committed to openness, we open sourced the tool itself, used open weights models, and added support for attribution and license information. The first version of the prototype sent the prompt to an open weights model hosted on <a href="https://groq.com/">Groq</a> (currently my favorite host of open weights models) via their API. This design makes it easy to swap in a range of different open weights models, including ones you might be hosting locally. (I recognize, though, that setting up local models is likely beyond the capability of the majority of students I hope will benefit from a tool like this, and that creating a truly delightful &#8220;done-for-you&#8221; experience is beyond the scope of this prototype we&#8217;re making. But I think running this whole toolkit locally is a problem that could be solved with some grant funding.)</p>
<h3>Early Feedback</h3>
<p>Earlier this summer, while I wasn&#8217;t ready to show the prototype to people, I felt the design and development work had clarified my thinking enough that I could have meaningful conversations about the ideas underpinning the work. Consequently, I had several conversations with US-based college and university educators about AI. I suppose I shouldn&#8217;t have been surprised, but one theme emerged loud and clear from those conversations:</p>
<blockquote><p>Instructors are <em>significantly</em> more interested in AI tools being free for students to use than they are interested in whether or not the tools are open.</p></blockquote>
<p>While the prototype was all open source and used open weights models, accessing those models via the API costs money. (In the future, when we&#8217;re able to connect the tool to locally running models, we can bring this API-based approach back.) But for now we needed to change course on the prototype design. For a while it seemed like there was no way to provide the capabilities we wanted in a way that could be free for students.</p>
<p>Then we struck upon a solution. It would degrade the user experience somewhat, but would allow learners to use the tool for free. That solution? At the last step in the process, rather than passing the completed prompt to an open weights LLM via an API, simply copy the prompt to the user&#8217;s clipboard and forward them to the LLM of their choice. When they get there, they just type &#8220;CTRL-V&#8221; or click &#8220;Edit &gt; Paste&#8221; and hit enter.</p>
<p>There are actually some benefits to this approach beyond not having to charge people per token to use the tool. First, it lets students use the very best models in the world instead of the open weights models which, though <strong>terrific</strong>, lag behind the proprietary models in terms of quality. Second, if a student&#8217;s institution has an institutional LLM that all learners have access to and have experience using, they can use that familiar tool for their work. And finally, if they don&#8217;t have a paid account (either personally paid or institutionally paid), students can work up to the free limits of one model and then easily switch over to a different model to continue their learning.</p>
<p>But there are downsides to this approach as well. Beyond the user experience being a little disjointed, this approach makes it difficult to capture analytics data for continuous improvement or in support of research (though I&#8217;m working on some ideas to overcome this limitation). There may also be privacy concerns if the free usage tier(s) of the model(s) learners choose to use don&#8217;t have strong privacy assurances.</p>
<h3>Making It Public</h3>
<p>The prototype is still just that &#8211; an unpolished experiment as opposed to a polished product. But it&#8217;s ready for you to play with now. Two notes to consider as you do:</p>
<p>First, it should be said &#8211; <strong>and it should be said over and over again</strong> &#8211; that a tool like an educational prompt template management system will only support learning effectively if the individual template components are well written. The objectives need to be clear, the summaries need to be comprehensive and accurate, and the activities need to be grounded in rigorous research about what actually supports student learning. (An activity prompt that adapts to a student&#8217;s &#8220;learning style&#8221; isn&#8217;t going to help anything.) &#8220;Garbage in, garbage out&#8221; was never truer than it is in the context of LLMs. This tool is, in many ways, just a place for people to easily host and manage their prompts. So think about this as primarily a technology demo &#8211; I haven&#8217;t invested a lot of time and effort in the demo content. (I&#8217;ve just borrowed some open content from Lumen and OpenStax and quickly built a couple of demo activities.)  But there&#8217;s enough there that you should be able to get a sense for what might be possible if we pushed on this a little harder.</p>
<p>Second, I don&#8217;t think these generative textbooks are ready to be adopted as primary course materials just yet &#8211; the tool would need a lot more functionality before you could consider that. I do think, however, that it makes for extremely interesting <strong>supplemental materials</strong>, and that&#8217;s the way I&#8217;ll be using them in my teaching this semester.</p>
<p>So please go play around with <a href="https://generativetextbooks.org">generativetextbooks.org</a> and let me know what you think. You can try the learner experience without logging in, but you&#8217;ll need to login with Google to play with the authoring tools. (And if you want to play around in the source code, it&#8217;s on <a href="https://github.com/kalendar/gentext_studio">Github</a>.) And many thanks to <a href="https://lumenlearning.com/">Lumen Learning</a> for supporting this work!</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7821</post-id>	</item>
		<item>
		<title>&#8220;AI Models Don&#8217;t Understand, They Just Predict&#8221;</title>
		<link>https://opencontent.org/blog/archives/7788</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Wed, 09 Jul 2025 13:43:34 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[instructional design]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7788</guid>

					<description><![CDATA[&#8220;Generative AI models don&#8217;t understand, they just predict the next token.&#8221; You&#8217;ve probably heard a dozen variations of this theme. I certainly have. But I recently heard a talk by Shuchao Bi that changed the way I think about the relationship between prediction and understanding. The entire talk is terrific, but the section that inspired ... <a title="&#8220;AI Models Don&#8217;t Understand, They Just Predict&#8221;" class="read-more" href="https://opencontent.org/blog/archives/7788" aria-label="Read more about &#8220;AI Models Don&#8217;t Understand, They Just Predict&#8221;">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>&#8220;Generative AI models don&#8217;t understand, they just predict the next token.&#8221; You&#8217;ve probably heard a dozen variations of this theme. I certainly have. But I recently heard a talk by <a href="https://x.com/shuchaobi?lang=en">Shuchao Bi</a> that changed the way I think about the relationship between prediction and understanding. The <a href="https://www.youtube.com/watch?v=E22AOHAEtu4">entire talk</a> is terrific, but the section that inspired this post is between 19:10 and 21:50.</p>
<p>Saying a model can &#8220;just do prediction,&#8221; as if there were no relationship between understanding and prediction, is painting a woefully incomplete picture. Ask yourself: why do we expend all the time, effort, and resources we do on science? What is the primary benefit of, for example, understanding the relationship between force, mass, and acceleration? <em>The primary benefit of understanding this relationship is being able to make accurate predictions</em> about a huge range of events, from billiard balls colliding to planets crashing into each other. In fact, the relationship between understanding and prediction is so strong that the primary way we test people&#8217;s understanding of the relationship between force, mass, and acceleration is by <em>asking them to make predictions</em>. &#8220;A 100kg box is pushed to the right with a force of 500 N. What is its acceleration?&#8221; A student who understands the relationships will be able to predict the acceleration accurately; one who doesn&#8217;t, won&#8217;t.</p>
<p>If a person were provided with a prompt like &#8220;10 grams of matter are converted into energy. How much energy will be released?,&#8221; and they made the right prediction, would you believe they &#8220;understand&#8221; the relationship between energy, matter, and the speed of light? What if, when given ten variations on the exercise, they made the correct prediction ten times out of ten? You would likely decide that they &#8220;understand&#8221; the relationship, and if these ten exercises happened to comprise a quiz, you would certainly give them an A.</p>
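<p>For the record, both prediction exercises reduce to one-line computations (using the standard formulas and a rounded value for the speed of light):</p>

```python
# The two prediction exercises above, worked numerically.
force, mass = 500.0, 100.0      # newtons, kilograms
acceleration = force / mass     # a = F / m
print(acceleration)             # 5.0 (m/s^2)

c = 2.998e8                     # speed of light in m/s (rounded)
mass_converted = 0.010          # 10 grams, in kilograms
energy = mass_converted * c**2  # E = m * c^2
print(f"{energy:.2e}")          # 8.99e+14 (joules)
```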
<p>And it would never occur to you to be concerned about the fact that you can&#8217;t crack open the learner&#8217;s skull, shove a microscope or other instrument inside, and directly observe the specific chemical, electrical, and other processes happening inside their brain as they produce their results. As we always do with assessment of learning, you would happily accept their observable behavior as a proxy for their unobservable understanding.</p>
<p>If a model can make accurate predictions with a high degree of consistency and reliability, does that mean it understands? I don&#8217;t know. But when a person can make accurate predictions with a high degree of consistency and reliability, we award them a diploma and certify their understanding to the world.</p>
<h3>&#8220;LLMs Just Compress Language, They Don&#8217;t Understand It&#8221;</h3>
<p>Along the same lines as the prediction argument, you may have heard people say that generative AI models &#8220;simply compress&#8221; language instead of truly understanding it. &#8220;They just exploit patterns in the statistical structure of language.&#8221; I&#8217;ve heard some version of that dozens of times, too. But coming back to our science analogy, consider this: scientific experiments are conducted in order to generate data. Scientists examine the resulting data for patterns, and sometimes those patterns can be compressed into exquisitely elegant forms, like f = ma. What are equations like f = ma and e = mc<sup>2</sup> if not ways of compressing the outcomes of an infinite number of possible events into a compact form? A compact form that allows us to make accurate predictions?</p>
<p>Do the fundamental equations of physics &#8220;simply compress&#8221; the behavior of the physical universe by &#8220;just exploiting patterns&#8221; in the way the universe behaves without really understanding? Do large language models &#8220;simply compress&#8221; language without really understanding it? I don&#8217;t know. Everything hinges on your definition of the word &#8220;understand.&#8221; But I <em>do</em> know that one of the primary reasons I would want to achieve understanding in either case is so that I can make accurate predictions.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7788</post-id>	</item>
		<item>
		<title>Writing is Thinking: The Paradox of Large Language Models</title>
		<link>https://opencontent.org/blog/archives/7753</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Tue, 20 May 2025 17:24:37 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[teaching]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7753</guid>

					<description><![CDATA[Last week I had the amazing opportunity to speak at the 3rd Annual AI Summit at UNC Charlotte. The entire event was wonderful and the organizing team were terrific. My keynote wasn&#8217;t recorded, so I thought I would serialize it across a series of blog posts. This post is the first in that series, and ... <a title="Writing is Thinking: The Paradox of Large Language Models" class="read-more" href="https://opencontent.org/blog/archives/7753" aria-label="Read more about Writing is Thinking: The Paradox of Large Language Models">Read more</a>]]></description>
										<content:encoded><![CDATA[<p><em>Last week I had the amazing opportunity to speak at the 3rd Annual AI Summit at UNC Charlotte. The entire event was wonderful and the organizing team were terrific. My keynote wasn&#8217;t recorded, so I thought I would serialize it across a series of blog posts. This post is the first in that series, and this section of the talk was titled <strong>Writing Is Thinking</strong>.</em></p>
<p>David McCullough <a href="https://www.neh.gov/about/awards/jefferson-lecture/david-mccullough-biography">said</a>, &#8220;Writing is thinking. To write well is to think clearly. That&#8217;s why it&#8217;s so hard… We all know the old expression, &#8216;I&#8217;ll work my thoughts out on paper.&#8217; There&#8217;s something about the pen that focuses the brain in a way that nothing else does.&#8221;</p>
<p>Do you disagree?</p>
<p>Apparently, Plato disagreed. We frequently hear in the debates about AI that Plato thought that writing, if it became widespread, would move society backward instead of forward. But any time we hear these secondhand summaries of someone&#8217;s writing, I think it behooves us to go read the original (or, at least a translation of the original). So here&#8217;s a relevant section from <a href="https://www.platonicfoundation.org/translation/phaedrus/">Phaedrus</a> (and yes, I actually read the extended quote to the Summit participants):</p>
<blockquote><p>(Socrates to Phaedrus): Well, I heard that at Naucratis in Egypt there was a certain ancient god of that place, whose sacred bird is the one they call the Ibis, while the name of the divine being himself was Theuth. He was first to discover number and calculation, geometry and astronomy, and also draughts and dice, and of course writing. Now at that time, Thamus was King of all Egypt round about the great city of the upper region. The Greeks call this city Egyptian Thebes and they refer to Thamus as Ammon. Theuth went to this King to show off his discoveries, and he proposed that they should be passed on to the rest of the Egyptians, and Thamus asked what benefit each of them possessed, and as Theuth explained this he praised whatever seemed worthwhile and criticised whatever did not. Now Thamus is said to have expressed many views both positive and negative to Theuth about each of the skills, so an account of these would be quite lengthy. But when he came to writing, Theuth said, &#8220;This branch of learning, O King, will make the Egyptians wiser and give them better memories, for I have discovered an elixir of both memory and wisdom.&#8221; The King replied, &#8220;Oh most ingenious Theuth, one man is able to invent these skills, but a different person is capable of judging their benefit or harm to those who will use them. And you, as the father of writing, on account of your positive attitude, are now saying that it does the opposite of what it is able to do. This subject will engender forgetfulness in the souls of those who learn it, for they will not make use of memory. Because of their faith in writing, they will be reminded externally by means of unfamiliar marks, and not from within themselves by means of themselves. So you have discovered an elixir not of memory but of reminding. You will provide the students with a semblance of wisdom, not true wisdom. 
For having heard a great deal without any teaching they will seem to be extremely knowledgeable, when for the most part they are ignorant, and are difficult people to be with because they have attained a seeming wisdom without being wise.</p></blockquote>
<p>So who is right &#8211; Plato or McCullough? Is writing a curse or a boon? There&#8217;s actually not as much conflict between the two statements as there might appear. Plato is talking about writing&#8217;s effect on <em>memory</em>, while McCullough is talking about its effect on <em>thinking</em>. While related, these are definitely two different things. (But asking the &#8220;who&#8217;s right?&#8221; question and then giving participants some time catalyzed some energetic small group conversations.)</p>
<p>The question implied by those who invoke Plato in conversations about AI is, &#8220;was what we gave up worth more or less than what we got in exchange?&#8221; Or in other words, would we trade all that we&#8217;ve gained from writing over the millennia to regain access to the prodigious individual capacities for memory our ancestors had?</p>
<p>Recently I&#8217;ve been pondering what I think of as &#8220;the paradox of large language models.&#8221; The paradox of large language models is that <em>you</em> have to write for <em>them</em> in order to get <em>them</em> to write for <em>you</em>. We&#8217;re all familiar with the phrase &#8220;garbage in, garbage out.&#8221; If you write a prompt that is vague, ambiguous, disorganized, and unfocused, the model will give you output with those same characteristics. When a person uses an LLM for the first time and has a poor experience (&#8220;I knew this AI hype was all overblown exaggeration!&#8221;), the reason is often attributable to poor prompting on their part as opposed to a weakness in the model. Using an LLM for all but the most trivial tasks requires writing that is clear, specific, focused, well-organized, etc. And the more complex the task you want the LLM to perform, the more effective and powerful your writing has to be.</p>
<p>Now, instructors might interrupt here to ask, &#8220;If that’s true, then how are my students &#8211; many of whom are such immature writers &#8211; able to use AI to produce ‘A’ work on my writing assignments?&#8221; I love this question. Take a moment to reflect on what the answer to this riddle might be.</p>
<p>The answer, of course, is that students are using <em>your assignments</em> as their prompts! And &#8211; <em>hopefully</em> &#8211; the instructions for your assignments are written in a manner that is clear, specific, focused, and unambiguous.</p>
<p>Consequently, if you have a student who says something like, &#8220;Why do I need to master the core concepts of this course? AI can do all my work for me both now and after graduation!&#8221; the answer is: &#8220;After you graduate, there won&#8217;t be anyone there to write your prompts for you &#8211; you&#8217;ll have to write them yourself. When you try to use AI the first day on your new job you&#8217;ll have to understand the domain well enough to know what to ask the AI to do &#8211; using the right vocabulary, in the right way, with enough clarity and specificity to get a quality result. And if you don&#8217;t have the knowledge and skills you need to write that prompt effectively, your first day on the job might be your last.&#8221;</p>
<p>Thinking about the importance of writing going forward &#8211; specifically, understanding that it&#8217;s a critical skill for the effective use of LLMs &#8211; makes me wonder if we&#8217;re not going to see a new mode of writing taught in our English Composition courses. In an English Composition class today we often learn about expository writing, persuasive writing, descriptive writing, etc. Maybe this new mode will be called &#8220;generative writing?&#8221; Whatever it&#8217;s called, writing for LLMs is different from other modes of writing. First, the audience is different &#8211; we&#8217;re writing for LLMs and not humans. And second, we&#8217;re engaged in some novel combination of process analysis and persuasive writing, trying to explain to the model what we want it to do and actually get it to do it. Not only is generative writing unlike any other kind of writing we teach currently, it&#8217;s also probably the most economically valuable mode of writing a student could learn today.</p>
<p>I think calling this &#8220;generative writing&#8221; is far more useful than calling it &#8220;prompt engineering,&#8221; since the former connects us to a rich body of literature, traditions, and scholarship of teaching, while the latter does not.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7753</post-id>	</item>
		<item>
		<title>Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design</title>
		<link>https://opencontent.org/blog/archives/7741</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Thu, 17 Apr 2025 16:43:29 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[student success]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7741</guid>

					<description><![CDATA[Back in the mid-1990s I read an absolutely amazing article that had a lasting impact on my thinking. Despite looking for it several times over the years, I&#8217;ve never been able to find it again. This was the era of 14.4k, 28.8k, and 56k modems, when we used our home landlines to dial up and ... <a title="Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design" class="read-more" href="https://opencontent.org/blog/archives/7741" aria-label="Read more about Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>Back in the mid-1990s I read an absolutely amazing article that had a lasting impact on my thinking. Despite looking for it several times over the years, I&#8217;ve never been able to find it again. This was the era of 14.4k, 28.8k, and 56k modems, when we used our home landlines to dial up and connect to the internet. The article&#8217;s main argument was that, just as architects have to understand and account for gravity in their designs of bridges and buildings, web architects have to understand and account for bandwidth in their website designs. Back in the day, including too many large images on a webpage could &#8220;weigh it down&#8221; to the point of &#8220;collapse.&#8221; Your 28.8k connection provided so little bandwidth to your home that you simply couldn&#8217;t download that much data in a reasonable amount of time, so after waiting a minute for the page to load you just gave up and went somewhere else.</p>
<p>I was fascinated by the metaphor &#8220;data has weight,&#8221; and excited by the creative work of making designs that both minimized that weight and distributed it effectively across information architectures (I was running a web design and internet services startup in the mid-1990s). If data has weight, then bandwidth is a fundamental constraint on web designs in the same way that gravity constrains the designs of buildings and bridges.</p>
<p>Recently I&#8217;ve been wondering &#8216;what is the equivalent, fundamental constraint on our designs in today&#8217;s era of generative AI?&#8217; The answer is: tokens.</p>
<p>&#8220;Tokens have weight,&#8221; but they don&#8217;t weigh down the user experience by making things slow. They can weigh down the user experience by making services so expensive that would-be users can&#8217;t afford to use them. For example, OpenAI&#8217;s ChatGPT Pro subscription, which includes &#8220;<a href="https://openai.com/index/introducing-chatgpt-pro/">o1 pro mode</a>, a version of o1 that uses more compute to think harder [i.e., generate more tokens] and provide even better answers to the hardest problems,&#8221; costs $200 per month. And beyond &#8220;reasoning&#8221; models from big providers,  poorly designed AI applications from startups can waste tokens, driving up prices for users and/or causing them to hit usage caps sooner.</p>
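To make the idea that &#8220;tokens have weight&#8221; concrete, here is a rough back-of-the-envelope sketch in Python. The per-million-token prices below are placeholder assumptions for illustration, not any provider&#8217;s actual rates:

```python
# Hypothetical per-token prices -- NOT real provider rates.
PRICE_PER_1M_INPUT = 3.00    # USD per million input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per million output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single model call."""
    return (input_tokens / 1_000_000) * PRICE_PER_1M_INPUT \
         + (output_tokens / 1_000_000) * PRICE_PER_1M_OUTPUT

# The same prompt, answered directly vs. by a "reasoning" model that
# generates tens of thousands of hidden thinking tokens first.
direct = estimate_cost(2_000, 500)
reasoning = estimate_cost(2_000, 50_000)
print(f"direct: ${direct:.4f}  reasoning: ${reasoning:.4f}")
```

Under these assumed prices, the long "reasoning" reply costs dozens of times more than the short direct one: same question, very different weight.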
<p>In the early days of the internet there was a lot of interest in novel image compression techniques as a way to decrease the &#8220;weight&#8221; of data on webpages. We&#8217;re just beginning to explore these kinds of optimizations for LLMs, with <a href="https://platform.openai.com/docs/guides/prompt-caching">prompt caching</a> being a good example. At the same time we were learning to use compression to make images weigh less, research and development in core internet infrastructure were effectively &#8220;decreasing gravity&#8221; by making broadband connections faster and more affordable. There is a huge financial incentive for the OpenAIs and Googles of the world to make these same kinds of advances for generative AI infrastructure in order to decrease their costs. Early examples include custom chips that generate more tokens, faster, using less power, like Groq&#8217;s <a href="https://groq.com/about-us/">Language Processing Unit</a>.</p>
<p>Given how quickly things are advancing with generative AI, we should all be &#8220;skating to where the puck is going to be&#8221; in our thinking about how to use these tools to support learning. At the same time, there are students in classrooms <em>today</em>, trying to learn and grow and develop before the puck gets to wherever it&#8217;s going. Being thoughtful about tokens today, the same way we had to be careful about bandwidth 30 years ago, will serve us all well.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7741</post-id>	</item>
		<item>
		<title>Making AI a More Effective Teacher: Lessons from TPACK</title>
		<link>https://opencontent.org/blog/archives/7705</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Mon, 24 Mar 2025 16:08:25 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[oelm]]></category>
		<category><![CDATA[open content]]></category>
		<category><![CDATA[open education]]></category>
		<category><![CDATA[professional development]]></category>
		<category><![CDATA[student success]]></category>
		<category><![CDATA[teaching]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7705</guid>

					<description><![CDATA[Human Teachers and AI Teachers Would you be surprised if you pulled a random person off the street, shoved them into a classroom full of students, and then found that they weren’t a particularly effective teacher? Of course not. And why wouldn’t that be surprising? Because effective teaching requires a great deal of knowledge and ... <a title="Making AI a More Effective Teacher: Lessons from TPACK" class="read-more" href="https://opencontent.org/blog/archives/7705" aria-label="Read more about Making AI a More Effective Teacher: Lessons from TPACK">Read more</a>]]></description>
										<content:encoded><![CDATA[<h3><span style="font-weight: 400;">Human Teachers and AI Teachers</span></h3>
<p><span style="font-weight: 400;">Would you be surprised if you pulled a random person off the street, shoved them into a classroom full of students, and then found that they weren’t a particularly effective teacher? Of course not. And why wouldn’t that be surprising? Because effective teaching requires a great deal of knowledge and skill, and the person you pulled off the street most likely had no relevant training.</span></p>
<p><span style="font-weight: 400;">Why, then, do we constantly act surprised when we select a random generative AI model, try to use it to support student learning, and find that it isn’t particularly effective? Like the random person pulled off the street, most generative AI models are neither pre-trained nor fine-tuned with education in mind. And even if AI eventually achieves “human-level intelligence,” it will be like we said above &#8211; humans aren’t particularly effective teachers without some specific training. </span></p>
<p><span style="font-weight: 400;">The question becomes, then, if you were going to provide additional skills and knowledge to a generative AI model to help it be a more effective teacher, which specific skills and knowledge would you provide? Here’s my answer: the same skills and knowledge we help humans who want to become teachers develop during their training and ongoing professional development. Since everything needs a name, I’ll call this the “TRaining AI to be a Teacher” (TRAIT) hypothesis. </span></p>
<blockquote><p><i><span style="font-weight: 400;">The TRAIT hypothesis is something like this: the effectiveness with which a generative AI model supports student learning will be proportional to the extent to which it has the skills and knowledge of an appropriately trained human teacher.</span></i></p></blockquote>
<p><span style="font-weight: 400;">We can and should study the most effective </span><i><span style="font-weight: 400;">methods</span></i><span style="font-weight: 400;"> of providing models with these skills and knowledge. For example, should the skills and knowledge be “taught” in such a way that they “enter the model’s long-term memory” (i.e., by fine-tuning the model), or should they be provided in a way that looks more like performance support (retrieval augmented generation, context augmentation, etc.)? I have opinions on this question and will address them in another essay. But today I want to focus on which specific skills and knowledge I believe generative AI models need in order to be effective teachers.</span></p>
<h3><span style="font-weight: 400;">The TPACK Framework</span></h3>
<p><span style="font-weight: 400;">There are a range of organizational structures you could impose on this thought exercise. I’m going to use Mishra &amp; Koehler’s </span><a href="https://journals.sagepub.com/doi/10.1111/j.1467-9620.2006.00684.x"><i><span style="font-weight: 400;">Technological Pedagogical Content Knowledge: A Framework for Teacher Knowledge</span></i></a><span style="font-weight: 400;"> as my organizing framework here. (I readily acknowledge that I’m using the framework in a different way than the authors intended, but I&#8217;m finding that it suits my purpose quite well.) </span></p>
<p><span style="font-weight: 400;">You’ve likely seen this image before:</span></p>
<figure style="width: 1805px" class="wp-caption aligncenter"><a href="https://tpack.org/tpack-image/"><img fetchpriority="high" decoding="async" class="size-full" src="https://tpack.org/wp-content/uploads/2024/07/TPACK.png" width="1815" height="1815" /></a><figcaption class="wp-caption-text">Reproduced by permission of the publisher, © 2012 by tpack.org</figcaption></figure>
<p>&nbsp;</p>
<p><span style="font-weight: 400;">In a moment I want to focus on what TPACK implies for developing generative AI that’s capable of teaching effectively. But first, let&#8217;s briefly summarize TPACK.</span></p>
<p><span style="font-weight: 400;">There are three primary kinds of knowledge represented in the diagram (quotes below are from Mishra &amp; Koehler, 2006): </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Content Knowledge, or “knowledge about the actual subject matter that is to be learned or taught. The content to be covered in high school social studies or algebra is very different from the content to be covered in a graduate course on computer science or art history. Clearly, teachers must know and understand the subjects that they teach, including knowledge of central facts, concepts, theories, and procedures within a given field; knowledge of explanatory frameworks that organize and connect ideas; and knowledge of the rules of evidence and proof.”</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Pedagogical Knowledge, which is “deep knowledge about the processes and practices or methods of teaching and learning and how it encompasses, among other things, overall educational purposes, values, and aims. This is a generic form of knowledge that is involved in all issues of student learning, classroom management, lesson plan development and implementation, and student evaluation”</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Technological Knowledge is “knowledge about standard technologies, such as books, chalk and blackboard, and more advanced technologies, such as the Internet and digital video. This involves the skills required to operate particular technologies. In the case of digital technologies, this includes knowledge of operating systems and computer hardware, and the ability to use standard sets of software tools.”</span></li>
</ul>
<p><span style="font-weight: 400;">Then, there are three points at which two of these overlap in the diagram:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Pedagogical Content Knowledge, which “includes knowing what teaching approaches fit the content, and likewise, knowing how elements of the content can be arranged for better teaching. This knowledge is different from the knowledge of a disciplinary expert and also from the general pedagogical knowledge shared by teachers across disciplines. PCK is concerned with the representation and formulation of concepts, pedagogical techniques, knowledge of what makes concepts difficult or easy to learn, knowledge of students’ prior knowledge, and theories of epistemology.”</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Technological Content Knowledge, which “is knowledge about the manner in which technology and content are reciprocally related. Although technology constrains the kinds of representations possible, newer technologies often afford newer and more varied representations and greater flexibility in navigating across these representations. Teachers need to know not just the subject matter they teach but also the manner in which the subject matter can be changed by the application of technology.”</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Technological Pedagogical Knowledge, which is “knowledge of the existence, components, and capabilities of various technologies as they are used in teaching and learning settings, and conversely, knowing how teaching might change as the result of using particular technologies. This might include an understanding that a range of tools exists for a particular task, the ability to choose a tool based on its fitness, strategies for using the tool’s affordances, and knowledge of pedagogical strategies and the ability to apply those strategies for use of technologies.”</span></li>
</ul>
<p><span style="font-weight: 400;">And then there is the central overlapping area in the diagram:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Technological Pedagogical Content Knowledge is “an emergent form of knowledge that goes beyond all three components… is the basis of good teaching with technology and requires an understanding of the representation of concepts using technologies; pedagogical techniques that use technologies in constructive ways to teach content; knowledge of what makes concepts difficult or easy to learn and how technology can help redress some of the problems that students face; knowledge of students’ prior knowledge and theories of epistemology; and knowledge of how technologies can be used to build on existing knowledge and to develop new epistemologies or strengthen old ones.”</span></li>
</ul>
<h3><span style="font-weight: 400;">Implications of TPACK for Generative AI </span></h3>
<p><span style="font-weight: 400;">Fully elaborating on the implications of TPACK for effective instruction by generative AI would require much more time than I can allocate to this essay, and I’m trying (and failing) to keep it brief. So I will highlight just a few points, using my work on <a href="https://opencontent.org/blog/archives/7668">Open Educational Language Models</a> as a concrete example.</span></p>
<p><span style="font-weight: 400;">When creating Open Educational Language Models, the designer explicitly represents Content Knowledge and Pedagogical Knowledge independently: </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In an OELM, you might visualize Content Knowledge as a detailed summary of a chapter from an open textbook. Content Knowledge helps the model give accurate explanations and answers, and significantly decreases inaccurate responses. As I mentioned above, this can be accomplished in a number of ways, including fine-tuning, RAG, or context augmentation.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In an OELM, Pedagogical Knowledge is represented in prompts that are enacted by the model. These prompts represent pedagogical practices that truly cross disciplinary boundaries, like engaging in retrieval practice, or connecting new information to existing knowledge.</span></li>
</ul>
<p><span style="font-weight: 400;">Pedagogical Content Knowledge is also part of the design:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In an OELM, Pedagogical Content Knowledge is represented in prompts that are enacted by the model in specific disciplinary contexts, like instruction, practice, and feedback that are specific to writing a strong topic sentence, or factoring a polynomial.</span></li>
</ul>
<p><span style="font-weight: 400;">The prototype OELM authoring tool, which will be published on GitHub later this week (I&#8217;ll make an announcement), helps authors capture relevant Content Knowledge and Pedagogical Knowledge so they can be remixed and enacted by the model, creating interactive learning activities for students (I previously shared <a href="https://opencontent.org/blog/archives/7668">screenshots of the prototype student tool</a>):</span></p>
<figure id="attachment_7707" aria-describedby="caption-attachment-7707" style="width: 1014px" class="wp-caption aligncenter"><a href="https://opencontent.org/?attachment_id=7707#main"><img decoding="async" data-attachment-id="7707" data-permalink="https://opencontent.org/blog/archives/7705/content#main" data-orig-file="https://opencontent.org/wp-content/uploads/content.png" data-orig-size="1808,1736" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="" data-image-description="" data-image-caption="&lt;p&gt;Providing Content Knowledge to the OELM&lt;/p&gt;
" data-medium-file="https://opencontent.org/wp-content/uploads/content-300x288.png" data-large-file="https://opencontent.org/wp-content/uploads/content-1024x983.png" class="has-border wp-image-7707 size-large" src="https://opencontent.org/wp-content/uploads/content-1024x983.png" alt="" width="1024" height="983" srcset="https://opencontent.org/wp-content/uploads/content-1024x983.png 1024w, https://opencontent.org/wp-content/uploads/content-300x288.png 300w, https://opencontent.org/wp-content/uploads/content-768x737.png 768w, https://opencontent.org/wp-content/uploads/content-1536x1475.png 1536w, https://opencontent.org/wp-content/uploads/content.png 1808w" sizes="(max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-7707" class="wp-caption-text">Providing Content Knowledge to the OELM</figcaption></figure>
<p>&nbsp;</p>
<figure id="attachment_7706" aria-describedby="caption-attachment-7706" style="width: 893px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" data-attachment-id="7706" data-permalink="https://opencontent.org/blog/archives/7705/activity#main" data-orig-file="https://opencontent.org/wp-content/uploads/activity.png" data-orig-size="1808,2050" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="activity" data-image-description="" data-image-caption="&lt;p&gt;Providing Pedagogical Knowledge to the OELM&lt;/p&gt;
" data-medium-file="https://opencontent.org/wp-content/uploads/activity-265x300.png" data-large-file="https://opencontent.org/wp-content/uploads/activity-903x1024.png" class="has-border wp-image-7706 size-large" src="https://opencontent.org/wp-content/uploads/activity-903x1024.png" alt="" width="903" height="1024" srcset="https://opencontent.org/wp-content/uploads/activity-903x1024.png 903w, https://opencontent.org/wp-content/uploads/activity-265x300.png 265w, https://opencontent.org/wp-content/uploads/activity-768x871.png 768w, https://opencontent.org/wp-content/uploads/activity-1355x1536.png 1355w, https://opencontent.org/wp-content/uploads/activity-1806x2048.png 1806w, https://opencontent.org/wp-content/uploads/activity.png 1808w" sizes="auto, (max-width: 903px) 100vw, 903px" /><figcaption id="caption-attachment-7706" class="wp-caption-text">Providing Pedagogical Knowledge to the OELM</figcaption></figure>
<p>As I&#8217;ve been envisioning them to date, an OELM comprises many of these declarations of Content Knowledge and Pedagogical Knowledge (or, to use other language, many OER for context augmentation and prompts describing evidence-based teaching and learning practices) combined with open model weights and open source software that orchestrates them all into coherent teaching and learning interactions.</p>
<p><span style="font-weight: 400;">But perhaps the thing that has delighted me the most about applying the TPACK framework to my work on OELMs is that it has helped me see an entire area of opportunity I had missed previously! I have less to say here because I am still working through these implications, but here’s the beginning of my thinking: </span></p>
<p><span style="font-weight: 400;">When creating Open Educational Language Models, Technological Knowledge is expressed in information about what external tools are available to the model to use. For example, a scientific calculator tool might be available to the model via the Model Context Protocol (MCP), or real-time data about the weather or the stock market or the learner&#8217;s own performance might be available to the model via an API.</span></p>
<p><span style="font-weight: 400;">Information about </span><i><span style="font-weight: 400;">when and how </span></i><span style="font-weight: 400;">the model would use a specific tool as part of a specific teaching strategy to teach a specific concept would be that center-of-the-diagram sweet spot of Technological Pedagogical Content Knowledge.</span></p>
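As an illustration, the distinction between Technological Knowledge (what tools exist) and Technological Pedagogical Content Knowledge (when and how to use them while teaching) might be represented something like the toy sketch below. This is a simplified illustration, not the actual MCP wire format, and every name here is hypothetical:

```python
# Technological Knowledge: a description of a tool available to the model.
# (A simplified toy, NOT the real Model Context Protocol format.)
calculator_tool = {
    "name": "scientific_calculator",
    "description": "Evaluate an arithmetic expression and return the result.",
    "parameters": {"expression": "string"},
}

# Technological Pedagogical Content Knowledge: *when and how* to use the
# tool as part of a teaching strategy for a specific concept.
tpck_instruction = (
    "When tutoring polynomial factoring, only invoke "
    f"{calculator_tool['name']} to verify an answer the student has "
    "already proposed, never to produce the answer for them."
)

print(tpck_instruction)
```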
<h3><span style="font-weight: 400;">Let’s Hear Some Other Ideas!</span></h3>
<p><span style="font-weight: 400;">If the TRAIT hypothesis is true, and generative AI needs the same kind of training humans do in order to become an effective teacher, then there’s a lot of potentially fruitful ground to plow by drawing out the implications for generative AI of different frameworks for representing teacher knowledge and approaches to teacher training and professional development. (For example, what do &#8220;professional development&#8221; models suggest about ways to overcome problems with models due to their &#8220;knowledge cutoff&#8221; dates?) What’s your favorite framework or approach, and what does it imply for teaching with generative AI? </span></p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7705</post-id>	</item>
		<item>
		<title>OELMs Github Updated and Demo Video</title>
		<link>https://opencontent.org/blog/archives/7693</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Mon, 10 Feb 2025 16:00:47 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[oelm]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7693</guid>

					<description><![CDATA[The OELMs source code has been updated on GitHub to include better documentation to help you get started as well more examples of content and activities. As you may remember from last week&#8217;s post about the OELMs architecture, the design goal of Open Educational Language Models is to combine the technical power of generative AI with ... <a title="OELMs Github Updated and Demo Video" class="read-more" href="https://opencontent.org/blog/archives/7693" aria-label="Read more about OELMs Github Updated and Demo Video">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>The OELMs source code has been updated on GitHub to include better documentation to help you get started as well more examples of content and activities. As you may remember from last week&#8217;s post about the <a href="https://opencontent.org/blog/archives/7668">OELMs architecture</a>, the design goal of Open Educational Language Models is to combine the technical power of generative AI with the participatory power of open education. To help you see how that works, the initial implementation in GitHub is sub-optimized in order to make it easier to understand how to contribute. As you see in the demo content in the screenshot from GitHub below, each &#8220;course&#8221; in an OELM is comprised of three parts (as described <a href="https://opencontent.org/blog/archives/7668">last week</a>):</p>
<ol>
<li>a course description and system prompt stub,</li>
<li>a set of OER that provide the OELM with accurate disciplinary information, to prevent hallucinations, and</li>
<li>a set of activities that help students engage in evidence-based study practices</li>
</ol>
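Because each part is just plain text, loading a course could be as simple as the sketch below. The directory and file names are illustrative assumptions, not necessarily the repository&#8217;s actual layout:

```python
# A minimal sketch of reading the three-part course structure described
# above from plain text files. Paths are hypothetical examples.
from pathlib import Path

def load_course(course_dir: str) -> dict:
    """Read a course's system prompt stub, OER content, and activities."""
    root = Path(course_dir)
    return {
        "system_prompt": (root / "system_prompt.txt").read_text(),
        "content": [p.read_text() for p in sorted(root.glob("content/*.txt"))],
        "activities": [p.read_text() for p in sorted(root.glob("activities/*.txt"))],
    }
```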
<p>&nbsp;</p>
<p><img loading="lazy" decoding="async" data-attachment-id="7694" data-permalink="https://opencontent.org/blog/archives/7693/oelm-structure#main" data-orig-file="https://opencontent.org/wp-content/uploads/oelm-structure.png" data-orig-size="708,1784" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="oelm-structure" data-image-description="" data-image-caption="" data-medium-file="https://opencontent.org/wp-content/uploads/oelm-structure-119x300.png" data-large-file="https://opencontent.org/wp-content/uploads/oelm-structure-406x1024.png" class="aligncenter size-full wp-image-7694" src="https://opencontent.org/wp-content/uploads/oelm-structure.png" alt="" width="708" height="1784" srcset="https://opencontent.org/wp-content/uploads/oelm-structure.png 708w, https://opencontent.org/wp-content/uploads/oelm-structure-119x300.png 119w, https://opencontent.org/wp-content/uploads/oelm-structure-406x1024.png 406w, https://opencontent.org/wp-content/uploads/oelm-structure-610x1536.png 610w" sizes="auto, (max-width: 708px) 100vw, 708px" /></p>
<p>Again, this implementation is sub-optimal specifically in order to make it clear that ANYONE can contribute to an OELM. While this information will eventually be stored in a database, in this implementation the content and activities are just text files &#8211; they could be drafted in Google Docs, Word, or anywhere you like. And they&#8217;re all openly licensed (OER), meaning they can be revised and remixed to meet local needs. YOU DON&#8217;T NEED TO BE A CODER TO CONTRIBUTE TO THE OELMS WORK!</p>
<p>Here&#8217;s a video (no audio) of the current demo of the OELMs application in the Explore mode, where the learner chooses what they want to study, and then how they want to study it. The demo only has content for part of one course, and a handful of activities. But the goal is to eventually have a large library of courses, including a full complement of open content and activities (some of which could be reusable across courses) developed by teams of faculty and students through facilitated workshop-like experiences.</p>
<p><iframe loading="lazy" class="youtube-player" width="900" height="507" src="https://www.youtube.com/embed/zOBgOc1xnko?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent" allowfullscreen="true" style="border:0;" sandbox="allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox"></iframe></p>
<p>Medium-term, we should have better multimodal support in open models soon, which will enable more kinds of activities (including voice in and out). Longer-term, each course should have its own specifically fine-tuned, small, open weights model. And that model (together with the OELM wrap around) should run locally on your device.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7693</post-id>	</item>
		<item>
		<title>The OELMs Architecture: The Technical Power of Generative AI Meets the Participatory Power of OER</title>
		<link>https://opencontent.org/blog/archives/7668</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Mon, 27 Jan 2025 14:19:55 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[improving learning]]></category>
		<category><![CDATA[oelm]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7668</guid>

					<description><![CDATA[Or, in which Generative AI meets OER meets Reusable Learning Objects. I&#8217;ve been working on fleshing out the architecture for Open Educational Language Models and have reached a point where it&#8217;s time to share a progress update. I&#8217;ve discussed the idea with several people and gotten some really excellent feedback, and building prototypes has helped ... <a title="The OELMs Architecture: The Technical Power of Generative AI Meets the Participatory Power of OER" class="read-more" href="https://opencontent.org/blog/archives/7668" aria-label="Read more about The OELMs Architecture: The Technical Power of Generative AI Meets the Participatory Power of OER">Read more</a>]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;"><em>Or, in which Generative AI meets OER meets Reusable Learning Objects.</em></p>
<p>I&#8217;ve been working on fleshing out the architecture for <a href="https://opencontent.org/blog/archives/7628">Open Educational Language Models</a> and have reached a point where it&#8217;s time to share a progress update. I&#8217;ve discussed the idea with several people and gotten some really excellent feedback, and building prototypes has helped me further refine my thinking.</p>
<h3>Lessons from the Past: Separating Content from Presentation</h3>
<p>I created my first website in the early 1990s, back when all we had was HTML. There was no CSS, no JavaScript. Actually, there weren&#8217;t even images in those first webpages. I was just surfing the web with <a href="https://lynx.invisible-island.net/">Lynx</a>, hitting the / key to read the source code of other people&#8217;s sites, and learning how to build my own. The introduction of CSS &#8211; and the idea of creating a clean separation between content and presentation &#8211; was a revelation that totally changed my thinking and the way I designed websites. (In fact, my first print publication was a series of chapters about CSS in a book on &#8220;Dynamic HTML&#8221; in the late 90s.)</p>
<p>In working on the OELM prototype, I realized there was an opportunity to do for the OELM architectural design something similar to what CSS did for HTML. If you think about learning activities as a way of &#8220;presenting&#8221; learning content, then you can start thinking about how to cleanly separate learning content from learning activities.</p>
<p>Now, it&#8217;s not anything new to create content without thinking about the activities learners might engage in as they try to learn that content. For example, people typically write textbook chapters without thought for learning activities beyond reading and highlighting. However, it is <strong><em>wildly</em></strong> different to try to imagine creating learning activities without making any specific references to content. For example, how would you write a practice quiz without referring to any content? How would that even work?</p>
<p>It turns out this can work pretty well in much the same way HTML and CSS do &#8211; via incorporation by reference. To begin, you (1) create some content that includes a list of learning objectives and aligned expository information. Then you (2) write a learning activity description that does not include any content itself but includes references to content that will be appended just-in-time (the way HTML files point to CSS files). Then you (3) create a coordination service that merges the content and the activity description with a system prompt (and perhaps some custom instructions) before delivering the final, complete prompt to the model to kick off a specific learning activity on a specific topic.</p>
<p>As with all things, an example should make this easier to understand.</p>
<h3>A Toy Example</h3>
<p>Here&#8217;s a toy example from a hypothetical university microeconomics course.</p>
<p>The system prompt stub for the course, in a file called system_prompt.txt, might include:</p>
<blockquote><p>You are an upbeat, supportive, empathetic economics tutor. Your greatest joy is helping students understand economics concepts. Help the user &#8211; who is a student in a college microeconomics course &#8211; as follows.</p></blockquote>
<p>A content file called demand.txt might include:</p>
<blockquote><p>&lt;objectives&gt;<br />
&#8211; Explain demand and the law of demand<br />
&#8211; Identify and explain a demand curve<br />
&#8230;<br />
&lt;/objectives&gt;</p>
<p>&lt;content&gt;<br />
Demand refers to the amount of some good or service consumers are willing and able to purchase at each price. What a buyer pays for a unit of the specific good or service is called the price. The total number of units purchased at that price is called the quantity demanded. The inverse relationship between price and quantity demanded is called the law of demand&#8230;<br />
&lt;/content&gt;</p></blockquote>
<p>A learning activity, specified in the file retrieval_practice.txt, might include these instructions:</p>
<blockquote><p>I&#8217;ve just finished studying and want to do some retrieval practice. Give me a quiz where you ask me one question for each learning objective specified in the &lt;objectives&gt; section. Ask me one question at a time and wait for my answer. After each answer, give me feedback on my answer and explain anything it seems like I don&#8217;t understand. Then ask if I&#8217;d like additional information on that question. When I indicate I&#8217;m finished, ask me the next question.</p>
<p>Use the information in the &lt;content&gt; section when evaluating my answers and providing feedback. If I get distracted or try to change the topic of our conversation, politely but firmly refuse to talk about topics not contained within the &lt;content&gt; section.</p></blockquote>
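<p>The coordination service described in step (3) can be sketched in a few lines of Python. To be clear, this is a hypothetical sketch and not the actual prototype; the file names follow the toy example above, and the merge order (system prompt, then content, then activity) is an assumption.</p>

```python
# Hypothetical sketch of an OELM coordination service: merge a system
# prompt stub, a content file, and an activity file into one complete
# prompt. File names follow the toy example; merge order is an assumption.
from pathlib import Path

def build_prompt(system_file: str, content_file: str, activity_file: str) -> str:
    """Assemble the complete prompt delivered to the model."""
    system = Path(system_file).read_text().strip()
    content = Path(content_file).read_text().strip()    # <objectives> + <content>
    activity = Path(activity_file).read_text().strip()  # refers to the tags above
    # The activity instructions incorporate the tagged content by reference,
    # so the content is appended just-in-time, the way HTML points to CSS.
    return f"{system}\n\n{content}\n\n{activity}"

# prompt = build_prompt("system_prompt.txt", "demand.txt", "retrieval_practice.txt")
```

<p>Because the activity file never names a topic, the same <code>retrieval_practice.txt</code> can be merged with <code>demand.txt</code> today and <code>supply.txt</code> tomorrow.</p>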
<p>A complete OELM would include many other content files &#8211; a supply.txt, equilibrium.txt, elasticity.txt, utility.txt, etc. &#8211; enough content to cover an entire course. From the perspective of a traditional textbook, one of these files might correspond to an entire chapter, or it might correspond to a single section in a chapter. (Questions about our old friend granularity have returned, this time with respect to the model&#8217;s context window!) This content should be <em>concise,</em> hitting just the high points &#8211; more like a chapter summary than an entire chapter. Remember, the purpose of this content is not to directly support student learning &#8211; students will never see these files. The purpose of the content is to keep model responses accurate and reduce hallucination. This means the content can be much more to the point than typical OER.</p>
<p>A complete OELM would also include several other learning activity files &#8211; for example, a reciprocal_teaching.txt, worked_examples.txt, debate.txt, flash_cards.txt, etc.</p>
<p>Separated out like this, content and activities can be mixed and matched in both directions. For example, a learner can use different activities (like retrieval practice, reciprocal teaching, or worked examples) to study a single topic (like supply). Similarly, they could reuse a single activity across multiple topics (sometimes even across multiple courses). Cleanly separating content from presentation (like HTML / CSS does) greatly increases the reusability of the learning content and the learning activities.</p>
<p>And of course the &#8220;open&#8221; in OELMs means that the model weights, system prompt stub, content for context augmentation, learning activity prompts, and code for the coordination service are all openly licensed. If you don&#8217;t like any of these the way you find them, you can revise or remix them before reusing them.</p>
<p>From an open education perspective, the key characteristic of this design is that it harnesses the power of generative AI using learning content (OER) and activity designs (also OER) that an average instructor can create, revise, remix, and share without any special technical knowledge.<em> In other words, the OELM design balances and combines the technical power of generative AI with the participatory power of OER.</em></p>
<h3>Observations from Playing with the Prototype</h3>
<p>Building and playing with the prototype has been extremely instructive. Here are a few observations.</p>
<p>You can imagine at least two ways of interacting with such a system. The first we might call &#8220;exploratory.&#8221; Here the learner sees a list of courses, topics, and study activities, and chooses what to study and how. I like this pattern because it exposes learners to names and explanations of different evidence-based study techniques, and has the potential to make them more effective learners even when they&#8217;re learning information for which an OELM isn&#8217;t available.</p>
<p><a href="https://opencontent.org/blog/archives/7668/oelm-prototype#main"><img loading="lazy" decoding="async" data-attachment-id="7686" data-permalink="https://opencontent.org/blog/archives/7668/oelm-prototype#main" data-orig-file="https://opencontent.org/wp-content/uploads/oelm-prototype.png" data-orig-size="1400,721" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="oelm-prototype" data-image-description="" data-image-caption="" data-medium-file="https://opencontent.org/wp-content/uploads/oelm-prototype-300x155.png" data-large-file="https://opencontent.org/wp-content/uploads/oelm-prototype-1024x527.png" class="aligncenter wp-image-7686 size-full" src="https://opencontent.org/wp-content/uploads/oelm-prototype.png" alt="" width="1400" height="721" srcset="https://opencontent.org/wp-content/uploads/oelm-prototype.png 1400w, https://opencontent.org/wp-content/uploads/oelm-prototype-300x155.png 300w, https://opencontent.org/wp-content/uploads/oelm-prototype-1024x527.png 1024w, https://opencontent.org/wp-content/uploads/oelm-prototype-768x396.png 768w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></a></p>
<p>The second we might call &#8220;designed.&#8221; Here the instructor or instructional designer pre-selects combinations of topics and activities, creates links to these bundles, and places them strategically throughout the course. When the learner clicks on a link, the specified activity simply begins.</p>
<p>You&#8217;ll notice that the OELM design explicitly sidesteps RAG. Rather than performing searches over content embeddings and hoping the right content is retrieved for context augmentation, the designer or instructor does this alignment step manually by associating specific OER with a topic. Again, here we&#8217;re trading off some more advanced technical capabilities of LLMs in order to keep the system simple enough for average instructors to work with.</p>
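<p>That manual alignment step might be nothing more than a lookup table the instructor maintains by hand &#8211; a sketch under the assumption that topics and activities are keyed by name (all file names here are illustrative):</p>

```python
# Hypothetical manual topic -> content alignment, in place of RAG-style
# embedding search. The instructor or designer maintains these mappings
# by hand; every name and path below is illustrative.
TOPICS = {
    "demand": "content/demand.txt",
    "supply": "content/supply.txt",
    "equilibrium": "content/equilibrium.txt",
}

ACTIVITIES = {
    "retrieval_practice": "activities/retrieval_practice.txt",
    "worked_examples": "activities/worked_examples.txt",
}

def resolve(topic: str, activity: str) -> tuple:
    """Look up the files for a chosen topic/activity pair -- no search involved."""
    return (TOPICS[topic], ACTIVITIES[activity])
```

<p>An average instructor can edit a table like this in a text editor, which is exactly the simplicity the design is trading for.</p>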
<p>While there may be some overlap across courses (for example, retrieval practice is broadly applicable), the list of learning activities will differ from course to course because effective pedagogy differs from course to course. The specific activities created or reused for each course should be selected and created based on SoTL and other research about effective pedagogy in the discipline. In other words, there will be some discipline-specific activities in each OELM that are not very reusable but are critical for learning in that discipline.</p>
<p>In this design, the specific language model being used can be swapped out easily as it&#8217;s simply an API call. Current SOTA models that can run locally (e.g., Llama 3.1 8B or a DeepSeek R1 distillation) do pretty well as part of this setup. Future generations of open weights models should put us in a place where OELMs can be run very effectively on local hardware, which is <a href="https://opencontent.org/blog/archives/7628">a key part of the OELM long-term strategy,</a> and another reason for preferring simpler designs (e.g., avoiding RAG).</p>
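<p>Because swapping models is &#8220;simply an API call,&#8221; the change amounts to editing one request. Here is a hypothetical sketch assuming an OpenAI-compatible chat endpoint served locally (the URL, port, and model name are all assumptions, not part of the prototype):</p>

```python
# Hypothetical sketch: swapping the model behind an OELM means changing
# one API request. Assumes an OpenAI-compatible chat completions endpoint
# served locally; the URL and model name below are illustrative.
import json
import urllib.request

def chat_request(prompt: str, model: str = "llama-3.1-8b") -> dict:
    """Build the request body; any OpenAI-compatible server accepts this shape."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def send(prompt: str, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the merged OELM prompt to a local model server."""
    body = json.dumps(chat_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

<p>Pointing the OELM at a different model &#8211; local or hosted &#8211; is then a one-line change to <code>model</code> or <code>url</code>.</p>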
<p>As currently conceived, OELMs supplement existing learning materials like textbooks and courseware &#8211; they don&#8217;t replace them. This part of the design is also strategically important. Instructors will not adopt OELMs <em>in place of</em> the textbook or courseware they already use because generative AI is too unfamiliar to them. But I&#8217;m developing what I think is a very solid strategy for talking about OELMs in a way that will make it relatively easy for instructors to adopt OELMs as supplemental materials. More on that in a future post.</p>
<p>Finally, developing OELMs should include a collaborative process in which groups of faculty and students work together on the OER curation and adaptation, activity type selection, and activity specification / instruction writing. Again, more about this in a future post.</p>
<p>I&#8217;m cleaning up the prototype code and will share via GitHub soon. Interested in working together on OELMs? Let me know!</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7668</post-id>	</item>
		<item>
		<title>Generative AI and the Assessment Equivalent of Bloom&#8217;s 2 Sigma Problem</title>
		<link>https://opencontent.org/blog/archives/7650</link>
		
		<dc:creator><![CDATA[opencontent]]></dc:creator>
		<pubDate>Wed, 18 Dec 2024 17:24:06 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<guid isPermaLink="false">https://opencontent.org/?p=7650</guid>

					<description><![CDATA[Since the advent of ChatGPT it seems like everyone is talking about Bloom&#8217;s 2 sigma problem. The quick version is this: the average student who is taught using a combination of (1) one-on-one (or small group) tutoring and (2) mastery learning performs about two standard deviations better then the average student taught in a typical ... <a title="Generative AI and the Assessment Equivalent of Bloom&#8217;s 2 Sigma Problem" class="read-more" href="https://opencontent.org/blog/archives/7650" aria-label="Read more about Generative AI and the Assessment Equivalent of Bloom&#8217;s 2 Sigma Problem">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>Since the advent of ChatGPT it seems like everyone is talking about Bloom&#8217;s 2 sigma problem. The quick version is this: the average student who is taught using a combination of (1) one-on-one (or small group) tutoring and (2) mastery learning performs about two standard deviations better than the average student taught in a typical classroom setting. The &#8220;problem&#8221; in Bloom&#8217;s 2 sigma problem is that, while we know that this dramatic improvement in student learning is possible, <strong><em>we don&#8217;t know how to implement it at scale</em></strong>. We can barely afford one instructor for 30 students &#8211; there&#8217;s no way we can afford full-time individual tutors for each student. So, since Bloom and colleagues published this finding in the 1980s, many people have been working on the challenge they described in their article:</p>
<blockquote><p><span style="font-weight: 400;">&#8220;If the research on the 2 sigma problem yields practical methods – which the average teacher or school faculty can learn in a brief period of time and use with little more cost or time than conventional instruction – it would be an educational contribution of the greatest magnitude.&#8221;</span></p></blockquote>
<p>Generative AI has now revealed a similar problem &#8211; not with improving student learning, but with preventing student cheating. I&#8217;ll call it the &#8220;AI-immune assessment problem.&#8221; The quick version is this: we know there are a variety of ways to assess student learning that are 100% immune to cheating with AI &#8211; like some kinds of performance assessments, for example. The &#8220;problem&#8221; is that, while we know that AI-immune assessment is possible, <strong><em>we don&#8217;t know how to implement it at scale.</em></strong> We&#8217;ve been studying performance assessment for even longer than we&#8217;ve known about the two sigma problem, but we typically don&#8217;t use performance assessments because they&#8217;re so time-consuming and expensive (sound familiar?).</p>
<p>Bloom&#8217;s two sigma problem has always been aspirational &#8211; &#8220;how can we help more students achieve their potential?&#8221; But generative AI has made the assessment problem existential &#8211; &#8220;how can we certify that a person has learned when it&#8217;s possible to succeed on assessments without having learned?&#8221; To borrow style and structure from Bloom, it looks like the future will be one in which many people work on this challenge:</p>
<blockquote><p><span style="font-weight: 400;">&#8220;If the research on AI-immune assessment yields practical methods – which the average teacher or school faculty can learn in a brief period of time and use with little more cost or time than conventional assessment – it would be an educational contribution of the greatest magnitude.&#8221;</span></p></blockquote>
<p>Like many others, I&#8217;ve often found that I can make progress on difficult problems by finding new ways of framing them. Hopefully this new framing of the &#8220;cheating with AI&#8221; problem provides some new perspective that can help us make progress.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7650</post-id>	</item>
	</channel>
</rss>
