I think this process is typical of the new wave of AI: for an increasing range of complex tasks, you get an amazing and sophisticated output in response to a vague request, but you have no part in the process. You don’t know how the AI made the choices it made, nor can you confirm that everything is completely correct. We're shifting from being collaborators who shape the process to being supplicants who receive the output. It is a transition from working with a co-intelligence to working with a wizard. Magic gets done, but we don’t always know what to do with the results. This pattern — impressive output, opaque process — becomes even more pronounced with research tasks.
[snip] The hard thing about this is that the results are good. Very good. I am an expert in the three tasks I gave AI in this post, and I did not see any factual errors in any of these outputs, though there were some minor formatting errors and choices I would have made differently. Of course, I can’t actually tell you if the documents are error-free without checking every detail. Sometimes that takes far less time than doing the work yourself, sometimes it takes a lot more. Sometimes the AI’s work is so sophisticated that you couldn’t check it if you tried. And that suggests another risk we don't talk about enough: every time we hand work to a wizard, we lose a chance to develop our own expertise, to build the very judgment we need to evaluate the wizard's work. [snip] This is the issue with wizards: We're getting something magical, but we're also becoming the audience rather than the magician, or even the magician's assistant. In the co-intelligence model, we guided, corrected, and collaborated. Increasingly, we prompt, wait, and verify… if we can.
I recently tested Grok on my family tree. I have spent enough time in the historical weeds and records to have a reasonably finely tuned sense of error probabilities, generation by generation. Some lines you can have great confidence in; in others you are working with a balance of probabilities. This epistemic uncertainty is compounded by large numbers of researchers of highly varying sophistication, such that there are often nodes of misplaced but highly confident consensus where there should be an acknowledgement of uncertainty. In other words, an AI agent ingesting this data would have a reasonably high probability of being deceived by the seeming historical record because of human error.
I ran multiple types of tests. The most impressive one, from recollection, started with my 4th great-grandfather John Bayless (1746-1826): I asked Grok to identify and describe John Bayless's child who was my direct ancestor, and then repeated the process for each generation down to the present. It got every link in the chain correct and provided an accurate (though variably complete) thumbnail sketch of each person and any issues it had encountered.
But there were also two quite bad experiences. I inverted and extended the above exercise. "Trace my ancestry back six generations on the Bayless line with a brief description of each individual identified." It got off track immediately after my grandfather and never got back on track.
The other interesting disaster came when I asked it for a detailed description of the life and circumstances of my grandfather Price Murray Bayless (1899-1970). I have done that work myself and was interested in whether it would surface any clues I might have overlooked.
It came back with quite a story of Price Murray Bayless with details ranging from Arkansas to New Mexico, activities ranging from law enforcement to teaching to ranching, and life events ranging from 1899 to the present. It was such a dog's breakfast that it took me a couple of minutes to realize what had happened.
My grandfather Price Murray Bayless has a grandson (my cousin) who was named for him, Price Murray Bayless II. My cousin has always gone by his childhood nickname, which sounds nothing like Price Murray, but Price Murray is the name on his legal documents. Grok was fusing the grandfather and the grandson. Further, it was exaggerating some life activities and underemphasizing (to the point of error) others.
So, is Grok a useful tool for genealogical research? Sure. In knowledgeable hands and with a high degree of skepticism. Check its work. It will turn up things you didn't know because of its access to records, but it will make wrong inferences as well.
Would GPT-5 Pro do better than Grok? Probably. They all have their relative dispositions, strengths, and weaknesses. For all that there were major missteps with Grok, I was deliberately giving it unconsidered prompts and having it deal with a task (genealogy) requiring a lot of contextual knowledge as well as a lot of judgment. I was impressed with the outcome, but also put on notice that it is a tool and, like all tools, it has to be fit for purpose. And it is best used in the hands of a craftsman.
I like Mollick's metaphor. Use the tool without knowledge and skepticism and you are functioning as a wizard. You are conjuring an answer without knowing the answer.
Over the past few weeks, I have come to believe that co-intelligence is still important but that the nature of AI is starting to point in a different direction. We're moving from partners to audience, from collaboration to conjuring.
Mollick sums up the conundrum well.
But I come back to the inescapable point that the results are good, at least in these cases. They are what I would expect from a graduate student working for a couple hours (or more, in the case of the re-analysis of my paper), except I got them in minutes.
As always, everywhere and for all time: New tools are near miraculous in the hands of those who know what they are doing. In the hands of everyone else, they are a force multiplier for good and for ill.
From a societal and productivity perspective, there is an obscure calculus playing out. The new tool (technology) leads to dramatic improvements in productivity in the hands of a craftsman or master craftsman who knows what is fit for purpose and what the limitations are.
In the hands of everyone else, there may or may not be a net benefit. The average person will occasionally use the tool for purposes it is not fit for, with results that might reduce productivity, and might also use the tool badly in a fashion that reduces productivity.
What is the net productivity benefit among those who are not skilled users? Positive, negative, or neutral, and to what degree? Then add or subtract that from the productivity gains among those who are knowledgeable and skilled users, and there you have the net productivity benefit. Or not.
It is, like all life processes, an evolutionary game: how quickly can we get how many people using the new tool how well, in a fashion that magnifies productivity (possibly by orders of magnitude), while figuring out how to constrain and minimize the downside risks of negative productivity impact from use among the unknowledgeable and unskilled?