{"id":664,"date":"2026-04-29T23:14:25","date_gmt":"2026-04-29T23:14:25","guid":{"rendered":"https:\/\/quantusintel.group\/osint\/blog\/2026\/04\/29\/turn-any-document-into-a-maltego-graph\/"},"modified":"2026-04-29T23:14:25","modified_gmt":"2026-04-29T23:14:25","slug":"turn-any-document-into-a-maltego-graph","status":"publish","type":"post","link":"https:\/\/quantusintel.group\/osint\/blog\/2026\/04\/29\/turn-any-document-into-a-maltego-graph\/","title":{"rendered":"Turn any document into a Maltego graph"},"content":{"rendered":"<p>A no-code walkthrough from messy text to a verifiable Maltego graph\u200a\u2014\u200aand the one prompt choice that decides whether you can trust the\u00a0output.<\/p>\n<figure><img data-opt-id=771569372  fetchpriority=\"high\" decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*XHZxv2SSkJHi2oUCi83dSA.png\" \/><\/figure>\n<p>You\u2019re staring at a 200-page court document, a leaked chat export, or an interview transcript. You need a relationship graph for analysis. You don\u2019t write Python, you don\u2019t have time to set up custom transforms, and you don\u2019t trust an AI to just \u201cfigure it\u00a0out.\u201d<\/p>\n<p>This is how to do it in Google AI Studio and, more importantly, how to write the prompt so you can actually defend the output in front of an\u00a0editor.<\/p>\n<h3>The Dataset: Why\u00a0Friends?<\/h3>\n<p>I was inspired by <a href=\"https:\/\/www.linkedin.com\/posts\/tompjarvis_yesterday-i-delivered-a-training-on-network-activity-7432417264987058176-u0sM\/\">Tom Jarvis<\/a> and deliberately chose the Friends pilot episode script as the test dataset.<\/p>\n<p>First, everyone knows it, so you can intuitively verify whether the AI got the relationships right without being a domain expert. Second, it\u2019s rich with named entities: people, places, objects, explicit and implied relationships. 
Third, it has exactly the kind of messy, natural language that makes real-world OSINT sources hard to\u00a0process.<\/p>\n<figure><img data-opt-id=771569372  fetchpriority=\"high\" decoding=\"async\" alt=\"Friends episode transcript\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*drCnjimFI5uNFQsu1yCUmQ.png\" \/><figcaption>Friends pilot episode\u00a0script<\/figcaption><\/figure>\n<p>If the workflow works on Friends, it works on a court document, a leaked chat log, or a news transcript. The dataset is the proof of concept, not the\u00a0point.<\/p>\n<p>If you want to follow along, get the data <a href=\"https:\/\/adelaide.figshare.com\/articles\/dataset\/Cleaned_scripts\/7413590?file=13719956\">right\u00a0here<\/a>.<\/p>\n<h3>Step 1: Setting Up Google AI\u00a0Studio<\/h3>\n<p>Go to <a href=\"https:\/\/aistudio.google.com\/\">aistudio.google.com<\/a> and log in with your Google account. You get a generous free tier: no setup, no API key, just start experimenting.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*9KZO8-th_Pil6uRv6Jliug.png\" \/><figcaption>Choose the model for your assignment<\/figcaption><\/figure>\n<p>In the top right corner, you\u2019ll see the model selector. The default is currently Gemini 3.1 Pro Preview, which is the most capable option but also the most expensive. Even if we don\u2019t go beyond the free tier, it will swallow a lot more tokens than a cheaper\u00a0model.<\/p>\n<p>We will use Gemini 3.1 Flash Lite Preview, which is excellent for these kinds of\u00a0tasks.<\/p>\n<p>Next, set up your system instructions. This is your system prompt: it tells the LLM WHO it is and WHAT to do. You\u2019ll find the setting just below the model selector. Click on it, create a new instruction, and give it a name. I named mine \u201cMaltego\u201d since that\u2019s what I use it for. 
Named system prompts can be\u00a0reused.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*4O_qXMJps-Dt0Yk-A8SWFg.png\" \/><figcaption>System prompt with data model\u00a0rules<\/figcaption><\/figure>\n<p>Here\u2019s the prompt I\u00a0used:<\/p>\n<blockquote><p>You are a data analyst extracting structured data for Maltego network analysis. Go through this script and extract every interaction, action, or event between characters. Output as structured JSON: Source, Target, Action. Each row should be a specific event or action (e.g. \u2018kisses\u2019, \u2018argues_with\u2019, \u2018moves_in_with\u2019, \u2018confesses_feelings\u2019, \u2018invites_to_party\u2019), NOT static relationships like \u2018friend\u2019. The same two characters can have multiple rows with different actions.<\/p><\/blockquote>\n<p>The prompt is deliberately bare. A real job would scope entity types, constrain the action vocabulary, handle pronouns and nicknames, and extract verbatim evidence for validation. If you want the how-to for the production version, <em>comment \u2018data modeling\u2019 below.<\/em><\/p>\n<p>Now notice what the prompt is doing: it defines the relationship field as an <em>interaction, action, or event<\/em>, not a static label. That\u2019s a deliberate choice, and it connects directly to how much you\u2019re asking of the\u00a0model.<\/p>\n<p>Think of it as a spectrum of interpretive demand. On one end, you ask for action-based relationships: \u201ckisses,\u201d \u201cargues_with,\u201d \u201cshows_up_at_door.\u201d The model needs to infer almost nothing. It just observes what happens in the text and reports it. Low demand, high consistency.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*-aJLNS7ysqfnhTU0tHmgXg.png\" \/><figcaption>Action-based prompts produce verifiable output. 
Semantic prompts produce interpretation you can\u2019t\u00a0audit.<\/figcaption><\/figure>\n<p>On the other end, you ask for semantic relationships: \u201cthey are friends,\u201d \u201cshe trusts him,\u201d \u201crivals.\u201d Now you\u2019re asking the model to apply context, draw on prior knowledge, and make an interpretive judgment. That\u2019s a much heavier ask, and the output becomes less reliable and harder to\u00a0verify.<\/p>\n<blockquote><p>For investigative work, this matters. The closer your relationship definition stays to observable actions in the text, the more you can trust the output. The moment you start asking the model to interpret meaning, you\u2019re introducing its assumptions into your analysis, and those assumptions are invisible.<\/p><\/blockquote>\n<p>So before you write your schema, ask yourself: am I asking the model to observe, or to interpret? Both are valid, but make that choice consciously, observe its consequences during the exploratory part of the data analysis, and then reflect it in your prompt and data model.<\/p>\n<h3>Step 2: Demanding Structured Output<\/h3>\n<p>Structured output is king in data analysis. Instead of free-form text, the model returns data in a schema you define: JSON with specific fields, types, and relationships, ready for analysis. No commentary, no \u201chere\u2019s what I found\u201d: just the\u00a0data.<\/p>\n<p>Google AI Studio has this built in, and it works very well. You define the schema, feed in raw text, and get back clean JSON. Don\u2019t be alarmed by the JSON file type. You don\u2019t have to work with it directly, but it is an excellent format for further data processing.<\/p>\n<p>So let\u2019s enable structured output. Go to Tools and switch on Structured outputs. 
This forces the model to respect your schema, not just approximate it.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*VUkpolwQExlGDtnAI-haaw.png\" \/><figcaption>Choose Structured Outputs as a\u00a0tool.<\/figcaption><\/figure>\n<h3>Step 3: Validating The\u00a0Output<\/h3>\n<p>Now upload the Friends script or whatever document you are playing with. Click the plus icon or drag and drop your file, and hit run. No need to add a message as the system instructions tell the model what to\u00a0do.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*VRNgaUY0PEa7FDaLyHFUfw.png\" \/><figcaption>Upload or drag and drop the\u00a0file.<\/figcaption><\/figure>\n<p>The output comes back clean: no explanation, no summary, no \u201cI hope this helps.\u201d We now have beautiful structured data\u00a0&#x1f924;<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*SYQVbNt_Jy-PKhXZ_KpzTQ.png\" \/><figcaption>Structured output of Friends pilot\u00a0episode.<\/figcaption><\/figure>\n<p>Before you move on, run it a few times and compare the results. In my experience, giving these models a large document and expecting perfectly consistent output every time is optimistic at\u00a0best.<\/p>\n<p>The right approach is to start with a few pages, test the reliability, validate your data model, adjust it, and then build a more robust processing pipeline for larger analyses. I will cover this in a separate article if there is interest. Just comment <strong>\u201clarger dataset\u201d<\/strong>\u00a0below.<\/p>\n<p>One useful thing AI Studio shows you is token count and cost estimate. You can see exactly how many input and output tokens were used, and what it would cost at scale. For this exercise, it\u2019s free. 
But if you\u2019re processing hundreds of documents, you can run one and do the math before committing.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*Bp0jtRzJQBhekdEcqR1vUQ.png\" \/><figcaption><em>Token count and cost\u00a0estimate<\/em><\/figcaption><\/figure>\n<p>Once you\u2019re happy with the output, convert the JSON to a CSV. The easiest way is to paste it into any LLM and ask it to convert it. Then move to\u00a0Maltego.<\/p>\n<h3>Step 4: Importing Into\u00a0Maltego<\/h3>\n<p>Maltego doesn\u2019t have a native JSON import button, but there are two ways to bring this data\u00a0in.<\/p>\n<p><strong>Option A: Import CSV\u00a0directly<\/strong><\/p>\n<p>Maltego\u2019s built-in import tool accepts a CSV with source, target, and relationship columns. Go to Import &gt; Import Graph from Table, select your file, and Maltego will create entities and draw edges automatically.<\/p>\n<p><strong>Option B: Build a Python transform<\/strong><\/p>\n<p>If you\u2019re processing a lot of documents and want the whole flow to run from inside Maltego, you can wrap the AI Studio API call in a simple Python transform. That\u2019s out of scope here, but it\u2019s the logical next step if this becomes a regular workflow.<\/p>\n<p>For this walkthrough, we\u2019re going with <strong>option\u00a0A.<\/strong><\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*TTKLIRd0r8ZvUb2GFsh0zA.png\" \/><figcaption>Import the CSV file in\u00a0Maltego.<\/figcaption><\/figure>\n<p>In the import dialog, choose \u201cImport a third-party table\u201d and select your CSV. 
Then choose the sequential connectivity option.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*qh-CMAcvYI3SRgqNBY45BA.png\" \/><figcaption>Sequential connectivity option.<\/figcaption><\/figure>\n<p>Maltego will try to auto-map your columns, and it will probably get it wrong. Unmap everything first, then map manually.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*x3V5yeH7rEeRw9wib1TB9g.png\" \/><figcaption>Unmap everything<\/figcaption><\/figure>\n<p>Now set column 1 as Person (your source) and column 2 as Person (your\u00a0target).<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*EYvxKTaO5KwzvYZ1fbtiew.png\" \/><\/figure>\n<p><strong>Don\u2019t click Next yet.<\/strong> Go to the \u201cMap columns to links\u201d tab in the upper\u00a0right.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*WkIGYHENVjDus5SgKWB55A.png\" \/><figcaption>Mapping entity types as\u00a0person<\/figcaption><\/figure>\n<p>This is where you map column 3 (your action) as the link between column 1 and column 2. Select column 3, then map it to Person number 1 (your source). 
This means the source (the person in column 1) is linked to the target (the person in column 2), with the action in column\u00a03 as the link label.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*2YpLoCyNrpRLPlDJ6MKv3Q.png\" \/><figcaption>Mapping the\u00a0link.<\/figcaption><\/figure>\n<h3>Step 5: The\u00a0Graph<\/h3>\n<p>Once imported, you get a proper Maltego graph with entities as nodes and relationships as labeled\u00a0edges.<\/p>\n<figure><img data-opt-id=771569372  decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*dwMrwSqbtil8X0xm7CTIag.png\" \/><figcaption>Maltego Graph of SOURCE-TARGET-RELATIONSHIP analysis<\/figcaption><\/figure>\n<p>What makes this immediately useful is that you\u00a0can:<\/p>\n<ul>\n<li>Run further transforms on the entities (look up a person, enrich a location)<\/li>\n<li>Filter by relationship type<\/li>\n<li>Apply Maltego\u2019s built-in layout algorithms to spot clusters and central\u00a0nodes<\/li>\n<li>Cross-reference entities against your other Maltego data\u00a0sources<\/li>\n<\/ul>\n<h3>Three rules I keep coming back\u00a0to<\/h3>\n<p>Before you run this on your own dataset, there are three simple rules. None of them are about Maltego or Google AI Studio specifically. They are about how to think when an AI is doing the reading for you, and they are the difference between a graph that holds up under scrutiny and one that quietly invents the parts it could not\u00a0extract.<\/p>\n<p><strong>1. Define the relationship as observable action, not interpretation.<\/strong> The closer your relationship definition stays to what is visible in the text, the more you can trust the output. The moment you ask the model to interpret meaning, you import its assumptions, invisibly, into your analysis.<\/p>\n<p><strong>2. 
Choose your test dataset for verifiability, not novelty.<\/strong> If you cannot check whether the AI got it right on a dataset you already know, you have no way of knowing whether to trust the workflow when the stakes are\u00a0real.<\/p>\n<p><strong>3. Validate before you scale.<\/strong> Run a few pages, compare outputs across runs, fix the schema, then build the pipeline. Trusting a one-shot result on a 200-page document is how you publish\u00a0wrong.<\/p>\n<hr \/>\n<p><a href=\"https:\/\/osintteam.blog\/turn-any-document-into-a-maltego-graph-29477bedea8c\">Turn any document into a Maltego graph<\/a> was originally published in <a href=\"https:\/\/osintteam.blog\/\">OSINT Team<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>","protected":false},"excerpt":{"rendered":"<p>A no-code walkthrough from messy text to a verifiable Maltego graph\u200a\u2014\u200aand the one prompt choice that decides whether you can trust the\u00a0output. You\u2019re staring at a 200-page court document, a leaked chat export, or an interview transcript. You need a relationship graph for analysis. 
You don\u2019t write Python, you don\u2019t have time to set up &#8230; <a title=\"Turn any document into a Maltego graph\" class=\"read-more\" href=\"https:\/\/quantusintel.group\/osint\/blog\/2026\/04\/29\/turn-any-document-into-a-maltego-graph\/\" aria-label=\"Read more about Turn any document into a Maltego graph\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":665,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-664","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/posts\/664","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/comments?post=664"}],"version-history":[{"count":0,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/posts\/664\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/media\/665"}],"wp:attachment":[{"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/media?parent=664"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/categories?post=664"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/quantusintel.group\/osint\/wp-json\/wp\/v2\/tags?post=664"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}