[co-authors: Jonathan Flood, Stephanie Urbano, Andrew Retizos, and Robert Lin]*
Editor’s Note: On March 16, 2022, HaystackID shared an educational webcast developed to highlight insight into the world of active learning (AL) workflows through a combination of proven experience, expert processes, and practical application anecdotes designed to inform, educate, and prepare attendees for the use of active learning. From understanding the critical elements of AL to refining workflows for the nuances of EU and UK data considerations, this presentation, led by industry-acknowledged technology-assisted review (TAR) and AL experts, will help attendees better employ people, processes, and technology to realize the promise of TAR and AL.
While the entire recorded presentation is available for on-demand viewing, provided for your convenience is a transcript of the presentation.
[Webcast Transcript] The Hierarchy of Technology-Assisted Review: From Active Learning to Optimal Results
+ Jonathan Flood
Jonathan is the Director of EU Operations at HaystackID. Based in Dublin, Jonathan is a thought leader who has worked with top-tier law firms in Ireland, in addition to advising and supporting vendors, financial institutions, government agencies, and regulatory bodies in matters ranging from compliance audits to complex litigation. A recognized technologist and eDiscovery educator, Jonathan has an extensive portfolio of industry certifications with leading platforms ranging from Relativity to Reveal.
+ Stephanie Urbano
Stephanie is the Director of EU and UK Managed Review at HaystackID. Based in London, Stephanie focuses on managing teams of reviewers, quality control (QC) experts, and review managers to provide clients and law firms with accurate, consistent, and efficient review services. With over fifteen years of industry experience and a decade in the Europe, Middle East, and Africa (EMEA) region, Stephanie has extensive knowledge of all types of matters, including internal investigations, regulatory investigations, and litigations.
+ Andrew Retizos
Andrew is a Senior Consultant on the HaystackID Client Services team. With more than fifteen years of experience in the legal field and a decade in eDiscovery and data management, Andrew is a Relativity Certified Administrator and certified in Reveal Review and Reveal AI. Andrew has managed corporate and law firm clients, consulting for and leading litigation, investigation, and review projects, balancing eDiscovery expertise and experience to support positive client outcomes.
+ Robert Lin
Robert serves as a Senior Manager on the HaystackID Client Services team. With an extensive technical background in supporting PKI and cybersecurity, Robert moved into the eDiscovery industry more than four years ago and has helped multi-national clients throughout Europe on projects encompassing the complete EDRM lifecycle. An expert in training and support for industry-leading platforms ranging from Relativity to Nuix, Robert is a Relativity Certified Administrator, has achieved Relativity Expert status, and is also a certified Reveal/Brainspace administrator and specialist.
Hello, everyone, and welcome to today’s webinar. We’ve got a great presentation lined up for you today, but before we get started, there are just a few general admin points to cover. First and foremost, please use the online question tool to post any questions that you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please let us know using that same questions tool, and a member of our admin team will be on hand to support you. And finally, just to note, this session is being recorded, and we’ll be sharing a copy of the recording with you via email in the coming days.
So, without further ado, I’d like to hand it over to our speakers to get us started.
Thanks, Lucy. Hello, good morning, afternoon, and evening to our worldwide audience today. I hope you’re having a great week. My name is Jonathan Flood, and on behalf of the entire team at HaystackID, I would like to thank you for attending today’s presentation and discussion titled “The Hierarchy of Technology-Assisted Review: From Active Learning to Optimal Results.”
Today’s webcast is part of HaystackID’s regular series of educational presentations to ensure listeners are proactively prepared to achieve their Cybersecurity, Information Governance, and eDiscovery objectives. Our expert presenters for today’s webcast include individuals deeply involved in the discipline of eDiscovery, and the specifics of Technology-Assisted Review, supporting areas ranging from corporate investigations to cyber and legal discovery.
As subject matter experts, they all have extensive practical and current experience in developing and applying cutting-edge TAR active learning workflows to improve the efficiency and accuracy of data-intensive inquiries, investigations, and litigation.
Let me introduce myself as today’s moderator and presentation lead as we get started. As I mentioned, my name is Jonathan Flood, the Director of European Operations at HaystackID. I’m based out of Dublin. I work with top-tier law firms in Ireland, in addition to advising and supporting vendors, financial institutions, government agencies, and regulatory bodies in matters ranging from compliance audits to complex litigation. As a technologist and an eDiscovery educator, I have an extensive portfolio of industry certifications with leading platforms, ranging from Relativity to Reveal, and I’m grateful to present and moderate today’s webcast with a group of industry-acknowledged experts and colleagues.
Next, let me introduce Stephanie Urbano.
Hi all, I’m Stephanie. I’m the Director of Managed Review for the EU and the UK. I manage review teams and quality control experts, along with review managers, to deliver accurate, consistent, and efficient reviews. I’ve been in the UK for about 10 years now, and prior to that, I was running reviews in the States. So, I’ve been in many different regions and learned many different methods over the years, and I’ve compiled them all together for my time here at Haystack.
Welcome, Stephanie. I’d like to also introduce Andrew Retizos. Andrew is a Senior Consultant on the HaystackID eDiscovery Client Services team. Andrew’s got over 15 years of experience in the legal field and a decade of eDiscovery and data management. He’s also a Relativity Certified Administrator, certified in Reveal Review and Reveal AI, and Andrew has also managed corporate and law firm clients, consulting for and leading litigation, investigation, and review projects, balancing eDiscovery expertise and experience to support positive client outcomes. Good day, Andrew.
Good morning, good afternoon, everybody.
Thank you. Last, but certainly not least, I’m happy to introduce Robert Lin. Robert serves as a senior manager on the Haystack eDiscovery Client Services team. With an extensive technical background in supporting PKI and cybersecurity, Robert moved into the eDiscovery industry more than four years ago and has helped multinational clients throughout Europe on projects encompassing the entire EDRM lifecycle. An expert in training and support for industry-leading platforms ranging from Relativity to Nuix, Robert is a Relativity Certified Administrator, has achieved Relativity Expert status, and is also a certified Reveal/Brainspace administrator and specialist. Hi, Robert.
Hello, everyone. Nice to meet you all.
Today’s webcast presentation is being recorded for future on-demand viewing, and a copy of the presentation will be available for all attendees once the on-demand version is completed. We expect those items to be available on the HaystackID website soon after we complete today’s live presentation. At this time, let’s get started on today’s presentation and discuss Technology-Assisted Review.
So, I’ll kick off here with a bit of an agenda here just to cover the items that we’re going to speak about over the next 45, 50 minutes or so. We’ll start off by just talking about the elements of active learning. What we really mean by that is, what are the core elements of this? How do we start a proper active learning project? What are the key elements to that? We’ll move on to talking about the workflows, what that really means from a technical perspective, as well as the processes. We will then talk about some more regional-specific points about active learning, particularly what’s different about the EU and UK market, and what we’ve learned from our US cases, why it’s really good, but why there are also nuances and differences, and how that could work to our advantage. We’ll also talk about the review teams themselves and how the workflows really affect that and what it really means to build a proper team, and Stephanie, in particular, will have a lot of input on how we’ve done that in the past. And finally, we’ll talk about the results. I think that’s the key message here for our discussions, is what is the end goal, and how do we achieve it, how do you measure it? So, we’ll go through all these different talking points. We’ll include some war stories, which we are all scarred from over the years, but also, we’ve learned a lot about how these processes work and how they don’t work, and this is, effectively, what we’re going to share with our audience today. Next slide, please, and the next one.
So, we’re going to talk about the key pieces here about active learning. People have heard these words over the years: we’re talking about the people, the process, and the technology. What we’ve learned over the decades of experience we have between ourselves is that no one of these things is enough by itself. The key thing is that they’re all individually important, but they have to work together, and what we’ll talk about is how the people affect the process and vice versa, and how the technology is called on during the process and vice versa. So, it’s an intrinsically linked matrix of people, process, and technology.
So, I’m going to kick it over to Steph here first to talk about people. Can you give us some of your insights into – with your experience as well, about what it means to you, what the people aspect of active learning means to you?
So, from a managed review perspective, the people that I work with closely are the data analysts and data consultants and the workflow consultants within our company at Haystack, and then alongside that, I’m also working very closely with the review team, which is comprised of a review manager, sometimes multiple review managers, depending on the size of the matter, QC-ers, and the reviewers themselves. It’s really important, in terms of each individual project that you’re working on, that not only do you have the right people internally working on it, you have the right reviewers working on it. So, you really want to make sure that you’re aligning with the client, with their colleagues in the company, as well as with the internal staffing or external staffing, that you’re getting the right people for the matter, who understand the matter itself, and who understand how technology works and how to work within the very system that you might be using for a particular case.
Yes, it’s a really key point there, I think, Stephanie, that we could probably summarize all of these individual roles into a more hierarchical term. They’re all experts. We rely on experts, and it doesn’t matter which person in the chain you are, if you’re not an expert in that particular part of the chain, it affects the process and it affects technology. That’s quite interesting.
Andrew, I’d like to get your input on how that relates then to the process. Obviously, the people need to be experts, but how do we get the experts to use the process, and how does that process affect this whole lifecycle?
Yes, one thing I did want to note about people, and not just people on our side, on the working side of this, but also on the client-side, one of my experiences mostly – well, many of my experiences over the past 10 years with active learning is just being able to explain to the client what it is and what it does, and the value that active learning brings to a project, and to be honest, 90% of the time, people are stuck in their old ways, and they say no, we want eyes on everything, and we want to make sure that we’re complying with the decision and whatnot. So, my experience on the people side of this is, yes, we have our experts on our side, but also we have to make sure our experts are consulting and teaching clients what they need to learn in order to provide them the best possible solution, best possible workflow, which is what we’re going to talk about in this next section, which is the process.
As a project manager, the biggest day for me on a project is really the scoping call, and those first few project planning calls with the client. If we don’t get that right, and this applies to active learning, as with any tool in eDiscovery, if we don’t get that project planning right and we don’t get that scoping call right, the project is probably not going to go very smoothly. So, in terms of process, the first step for me as a project manager, is to make sure that we understand as a team, both client and internally, what the scope of the project is, what are the goals? What is the scale? What is the timeline? How much data are we talking about? Do we understand what’s in this data, and what do we need to do to it, and how much time do we have to turn that around? All of that information is going to play into the kind of workflow we set up as a project management team, as a review team, in order to get the client what they need.
In different kinds of cases, different requests are going to affect the workflow that we set up, and Stephanie knows, once the scope of the request or once the scope of the review changes in any way, it is going to change how we attack the review, how we set up the active learning. So, that’s the first step of the process for us: just getting everybody on the same page, understanding what needs to be done. Then, as experts and consultants, we set up a workflow that will leverage that active learning technology to most efficiently and effectively get to that goal that the client needs. And this folds a lot into the technology and the setup, and all these nitty-gritty details, which I don’t think we’re going to get into today, but things like setting up the data, pulling it down, preparing it, making sure that the data that we’re working with is going to be applicable and useful for the active learning tool, because as much as active learning is a powerful tool, it’s only as powerful as we know how to use it, as our understanding of what this technology can do for us, and also understanding what it can’t do for us. And again, it goes back to consulting with the client and explaining to them that this technology is useful for data that has text; it’s not useful for data that relies on images or doesn’t have text, like media or handwritten scanned documents.
That kind of stuff has to be talked about and understood from the beginning so that our workflow is accurate and efficient all the way through, and that has a direct effect on the quality of the active learning technology, the quality of the review, as well as the defensibility of the whole process, because when we set up a workflow, our goal is to not just set up a workflow of how to get everything reviewed, or how to get everything pushed through the active learning project, but also to build into the workflow QC and verification. That way, everything we do can be verified and is defensible on the back end as well, and we’ll get into more of those ideas of QC and verification later on in the presentation today. But in terms of the process, that’s what we’re looking at; understanding the scope and timeline and scale, creating a workflow that utilizes active learning to its most efficient possibility, and then building into the workflow QC and verification so that our work product is as defensible as possible all the way through.
Yes. They’re all great points, Andrew, and I think something that we didn’t put in the slide here, which is potentially equally as important to some of these points, and you touched on it, is educating the client, because we have some clients who have never done this before, and we have other clients who do this every day of the week, and as we’ve learned over the years, people make assumptions and we talk to clients like they know something, and they may be doing this for 10 years, but they may not know about this particular way that we’re doing things or a particular workflow that we’re using. So, it is critically important that we are consultative in our approach and we are meeting daily and having conversations with the client daily and updating all the time. And all of the processes are, effectively, could be null and void if the client doesn’t understand exactly what they’re asking for and how we’re going to deliver it for them. And it’s an extremely iterative process, and we’re constantly talking about how we can improve the output with them, how we can modify the workflow to achieve their goal in a better way or something. So, all very key points.
And with that, the thing that we talk about a lot here is the technology. And Robert, I’m going to pull you in for this one here. The important thing about the people and the experts is obviously that it continues into the technology realm, but from your perspective, let’s just talk about the element – the technology of active learning, and we can cover how the people and process apply to the technology in a bit more detail later. But could you give us a bit of a high-level overview of the technology from the active learning perspective?
Sure. In fact, before I go into that, again, the most important is actually the people and the process. Without those two, the technology is just a tool. If the people don’t know how to use the tool, it obviously becomes useless.
So, exactly like you said, when we educate the client, the first thing I always say to them is, “Don’t be afraid, it’s a tool to help you, to make your work efficient and make everything simple for you to use”. And it is simple. There are so many technologies with active learning these days, and they’re all very good, and they made it simple so that you just go through a brief session and you’ll be able to use it very well.
It’s easy to set up as well. Again, the tool is there, you just need to make sure that you have the right data, you have the right people, you have the right process, and that’s it. With the technology these days, you literally only need to train maybe a few documents, one to five, unlike before, where you had to train thousands of documents. Now, you can train some and then you can carry on working as per normal. So, it is very, very adaptable to any situation, no matter if you’re looking at only pure email documents, or it could be PII, or you could even – I’ve even had clients that are in the construction industry as well.
And so, it really, really is up to the client, up to the project. You can use it for any kind of project, whether it’s big, whether it’s small, it doesn’t matter. It can be a few thousand documents, it can be a couple of million documents. Obviously, if you go into maybe 10 million documents, we have to think about it, but you can customize it.
And the most important thing about doing all of that is that you can actually improve your quality, improve your timing, and reduce the project time that you have. There are so many examples where, even just traditionally, where a couple of years ago, in fact, one or two years ago we had clients that still review on paper. They literally take a marker and they go through the paper and they do it. That’s just wasting too much time.
We’ve all been there and that’s the thing, but with active learning, it cuts it down by so much. We had small projects, and when I say small, it could be – traditionally, it could last maybe one month, but put in active learning, you could literally finish it within two weeks or even one week. And the nicest thing about all of this is that you can prove that what you’re doing is right, your reviewers are right, your data is there. And so, you’re able to have concrete evidence to show to anybody that you have followed the right process, that everything is sorted out. You have graphs. You have tables. You have reports and everything.
So, the technology is there, but the most important is people, process. And once we have all of these three together – once we have those two, technology is out there for us to use and it’s very easy to use.
I joked about this in several meetings with clients and conferences in the past that we refer to technology-assisted review like it’s some really recent phenomenon, but if we look at the industry, as a whole, everything that we do is technology-assisted review. The review platform is a technology, OCR is a technology, redacting natives is a technology. All these little tools that we have, including active learning, they’re all just technology-assisted review methodologies, and we’ve become so used to review platforms and OCR and other tools that we don’t even consider them to be a decision. Of course, we OCR, of course, we use a review platform, of course, we do all these other things. And we’re slowly getting to the stage where active learning is becoming a ubiquitous tool that we use as part of any review, just like we use email threading, just like we use clustering and all those other tools. And we won’t get into the nitty-gritty of each of the actual tools here. We’re going to talk about active learning specifically.
Can you go to the next slide, Lucy? I just wanted to show the next slide where, effectively, this is the same information, but effectively how we view this is that it’s almost one circle. You can’t have one thing without the other here. We need to have excellent technology, which we know from experience which tools are excellent, which ones we recommend. We have excellent processes that we’ve developed over decades, and we have excellent people. Our operations team now is in excess of 100 people across the globe, and we’re able to turn around projects in times that, frankly, were inconceivable even two or three years ago, and being able to come up with new methodologies is really essential.
Let’s move on to the next section here. We spent quite a bit of time on the first slide there, so just kind of moving through these.
We’re going to talk a little bit about setting up the workflow itself. So, this is somewhat technical, but also we’ll talk about the people aspect in a bit of detail here. Next slide.
So, here we’ve got the people, process, technology again. This will be the theme through this whole slide. We actually initially started off by saying we’re not going to mention people, process, and technology, but I think it’s impossible to talk about active learning without really breaking it into these three sections, because they’re so ubiquitous with the industry and with how people perceive this.
So, again, I’m going to start off with Stephanie on the people side of things. Can you go into a bit of detail about the more critical aspects to – before we even get involved with the technology and even setting up the project. What’s your view on how the people side of the setup – how that works?
Absolutely. So, mainly what you need to start off – and I think Robert and Andrew have both touched on this a bit – is starting off with a set team who is going to work on your project, who knows the ins and outs of your project, and knows all the tools, who knows how to deal with the reviewers, who knows how to find the right reviewers, and who knows how to put all that together. So, once you finally have those initial conversations and you’ve hammered out what the goal of the matter is, whether it’s an internal investigation, a regulatory investigation, or a litigation, what that end goal is, you can bring the right people in.
And certainly, in terms of reviewers, you have reviewers who are experienced in various types of matters. So, we’ve had an IT matter before where not a lot of reviewers are super well versed in IT stuff, and so making sure you have the right reviewers on to read those documents and understand what all these internal conversations and the emails were, when it was focused on certain IT aspects that a layperson might not quite understand, it was really key in building that review team to know what was in those documents and what kind of reviewers with what kind of backgrounds we needed to put to understand those documents.
And that’s a key point about the reviewers as well, it may seem like the de facto decision in the past was to just get lawyers on these, because they want to get lawyers’ eyes on these quickly. As you’ve seen over the last while, it may actually be better for you to seek expertise in the particular subject matter and have reviewers perform a review who have a background in whatever the subject matter is.
Absolutely. And certainly, that can even be a combination of lawyers, so let’s say somebody did a technical degree as their first degree, but then decided to do a law conversion course and then entered the eDiscovery world. Right there, for that information technology project (that IT project), that would be an ideal reviewer to have on your side. It’s not only that they understand the legal aspect, they understand the background matter of the case as well.
The key piece there being – and again, sort of repetitive of what we said in the first slide, but I think this is… the trend here is that we’re going to repeat some of the same things but from a different perspective, depending on which part of the workflow we’re in.
Robert, let’s talk about the process here. When we’re setting up the review, from your perspective using the technology side of things, what are the three critical pieces from your perspective?
Well, the critical pieces are you need to build your fields in order, obviously, to support your review. And the active learning really only requires one particular field, it’s either a “yes” or a “no”, or it could be a “relevant/not relevant” field for you to even just begin. But the key point here is actually, do we know what data we want to review? Obviously, if we don’t know what we’re looking for, it’s very difficult.
You could have a set of data that you don’t know exactly what you’re looking for and you want to find the smoking gun, but the most important is the data needs to be clean. You need to prepare it.
Now, in this slide, here it says “text, image, and non-text”. So, basically, normally people set their data up using extracted text, and that is fine, but there are times where more than extracted text is needed. So, this is, again, depending on what we are looking for. Maybe we’re looking for the metadata, filenames, time, etc. So, maybe we do need to include that within our review. Again, mostly, though, it’s just extracted text.
Again, I’ve even had clients that want to look at – review images, but obviously imaging in active learning is very difficult, but the metadata within images is important. I had a case last year where it was a construction company, and they have tons and tons of images, obviously, maps, building construction plans, etc. Obviously, you can’t put the image through a process, but you can get the metadata out of it, and there is tons of metadata out of that.
And so, setting the right data up for the reviewers and for the active learning is very, very important. And once you have that, you just index it, run through the classification index, and increment as you go along. If there are more documents, increment as you go along. But obviously, again, you need to clean that up, not that we need to clean that up, we need to constantly review the data that is going in. If there are going to be a lot of repetitive words or phrases that are going to interfere, why don’t we actually just remove them from the beginning? That will make the process so much easier. And so, that is part of the workflow, always constantly [inaudible]. So, these are the three things, in my opinion, that are very important to have a successful active learning project.
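As a rough illustration of the clean-up step described here, the sketch below flags lines that recur across a large share of documents (an email disclaimer, for example) and strips them before the text is indexed. It is a minimal Python sketch and not any review platform’s feature: the function names, the `min_share` threshold, and the sample documents are all hypothetical.

```python
from collections import Counter


def find_repeated_lines(documents, min_share=0.5):
    """Return lines that appear in at least `min_share` of the documents.

    `min_share` is an assumed threshold, not a platform setting.
    """
    counts = Counter()
    for text in documents:
        # Count each distinct non-empty line once per document.
        unique_lines = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(unique_lines)
    threshold = min_share * len(documents)
    return {line for line, n in counts.items() if n >= threshold}


def strip_repeated_lines(text, repeated):
    """Drop every line that was flagged as repeated boilerplate."""
    kept = [line for line in text.splitlines() if line.strip() not in repeated]
    return "\n".join(kept)


# Hypothetical sample documents sharing one disclaimer line.
docs = [
    "Please see the revised site plan.\nThis email is confidential.",
    "Invoice attached for March.\nThis email is confidential.",
    "Can we meet on Tuesday?\nThis email is confidential.",
]
repeated = find_repeated_lines(docs)
cleaned = [strip_repeated_lines(d, repeated) for d in docs]
```

In practice the flagged lines would be reviewed by a person before anything is removed, since a phrase that repeats often can still be relevant to the matter.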
Using the right tool is kind of the critical thing here. We could set this all up really well and this has happened in the past with reviews where the client was leading the review, they weren’t quite sure about what they should or shouldn’t be using, but they just pressed ahead with the review, because they wanted to use technology, and they’d throw everything into a CAL review and 60% of the documents are images. And that’s not going to give you a good result. So, the process is really key to dividing the data up into the pools that you want to apply different technologies to, and there are lots of modern technologies, like with audio and video, where you’re able to transcribe those. So, maybe you want to perform different workflows in different functions of these things.
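The idea of dividing data into pools before applying a technology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` fields, the pool names, and the `MIN_TEXT_CHARS` cut-off are hypothetical and would be set per project.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    file_type: str       # e.g. "email", "image", "audio" (assumed labels)
    extracted_text: str  # empty when no text could be extracted


MIN_TEXT_CHARS = 50  # hypothetical cut-off for "enough text to classify"


def assign_pool(doc):
    """Pick a workflow pool for one document."""
    if doc.file_type in {"audio", "video"}:
        return "transcribe_first"
    if len(doc.extracted_text.strip()) >= MIN_TEXT_CHARS:
        return "active_learning"
    return "manual_or_metadata_review"


def split_into_pools(docs):
    """Group document IDs by the pool each document was assigned to."""
    pools = {}
    for doc in docs:
        pools.setdefault(assign_pool(doc), []).append(doc.doc_id)
    return pools


# Hypothetical sample: one text-rich email, one image, one audio file.
sample = [
    Document("D1", "email", "x" * 200),
    Document("D2", "image", ""),
    Document("D3", "audio", ""),
]
pools = split_into_pools(sample)
```

Running a count per pool before the review starts would show early whether a large share of the population, such as the 60% of images mentioned here, is unsuitable for a text-based classifier.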
So, I just listed a couple of tools. I’m not going to name names or pick out particular providers, but we can talk about the concepts of the technologies. We know about active learning. We know that’s training an assistant to present relevant information to you, so you know that your text-heavy documents can go down the active learning… it may be interesting to look at your – if your search terms are good or bad, so you might want to look at clusters of data, apply your search terms and start pulling out interesting topics that you may not have known that were in the data, but your search terms have started to show these threads of conversations and different concepts that are interesting, but you didn’t have a search term for it. It can help you devise better search terms if you want to reduce the data that way. Search terms are still a tool that people need to use even if it’s an early case assessment prior to using CAL. It is a technology that can be used and should be considered in projects all the time. Entities can be very important, we’re talking about trying to find individuals or bank account details, or PII, GDPR stuff. Having entities automatically being extracted and having some view of that before you review is critical.
So, not everything about the technology side of these things needs to be applied across everything all the time. You can be very selective with what you do to help drive your efficiency. We don’t want to review every document, just like we don’t want to apply technology across every set of documents. And you can start to build storyboards, and you can start to build some high-level information that helps you and the legal team start to create these relevancy stories. What are we actually trying to find here? Not just is a document relevant enough, but actually start to build the story of this data before you’re finished the review, and you can start to give more informed information about what’s actually going on here.
One that we started to see a bit of use on lately is sentiment analysis as well, where we’re trying to see – particularly around employee behavior where somebody is communicating in a very aggressive way, or somebody is very upset, or somebody is depressed or whatever other emotion group you want to put the text into, it can help define… it’s not a legal thing that you can rely on. You can’t say this document is aggressive because the system said it’s aggressive, but it may help you find a trend in a conversation where somebody is starting to become very upset about their position and then all of a sudden, they quit. You’re trying to find the reason why or there’s some investigation.
So, these tools all have great use cases and, again, back to our previous point, having the experts who can look at your case, find out from you what it is that you’re trying to do, build a process for you and advise you what you should be doing and then apply the tools correctly, that’s the method, and it’s iterative. We do that, we go back to the beginning when we get new data or maybe they find some document in the case that changes what they think about it, and actually they need to look at some other information, or they need to collect more data.
It can be key to not just finding the relevant data early in a case, but it could be key to really finding out, are you actually answering the question? Are you asking the right question?
Jonathan, one other point I wanted to bring up in terms of people and experts being critical to the setup is that once active learning is running, having experts in active learning is actually – even just as important in terms of keeping with the review and seeing what’s happening with the review. A review has been going on for two weeks and you’re seeing some strange results, like everything instead of being divided into “this batch is more relevant and this set is less relevant”, everything is sort of just merging into the middle as the machine doesn’t know, it’s learning less as we go along, then there’s a problem.
So, your experts in the technology have to be aware of those things so that they can see those out and figure out what’s wrong. Is it the technology that’s wrong, or is it the reviewer that needs to be trained further? So, just things like that, that people with good experience in active learning, in the technology that you’re using is vital, because if you’re just trusting the technology to do it on its own, it could go wrong very easily and you wouldn’t know it until the end if you don’t know what you’re looking for.
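The warning sign described above, where scores stop separating and drift toward the middle, can be watched for with a very simple check. What follows is only a minimal sketch under stated assumptions: it assumes the review platform can export a relevance rank per document on a 0 to 100 scale, and the names `middle_band_share` and `is_collapsing`, the 40 to 60 band, and the 0.5 threshold are all hypothetical choices made for illustration, not features of any particular tool.

```python
from typing import Sequence


def middle_band_share(scores: Sequence[float],
                      low: float = 40.0,
                      high: float = 60.0) -> float:
    """Return the fraction of documents whose rank falls in [low, high].

    Assumes ranks run from 0 (least relevant) to 100 (most relevant).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    in_band = sum(1 for s in scores if low <= s <= high)
    return in_band / len(scores)


def is_collapsing(snapshots: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> bool:
    """Flag a model whose middle band grows at every snapshot.

    Returns True only if there are at least two snapshots, the
    middle-band share increased each time, and the latest share
    is above the (hypothetical) threshold.
    """
    shares = [middle_band_share(s) for s in snapshots]
    rising = all(later > earlier for earlier, later in zip(shares, shares[1:]))
    return len(shares) >= 2 and rising and shares[-1] > threshold


# Made-up weekly exports of document ranks, for illustration only.
week1 = [5, 10, 20, 50, 80, 90, 95, 15]
week2 = [10, 45, 50, 55, 80, 90, 42, 58]
week3 = [45, 48, 50, 52, 55, 58, 41, 90]

print(is_collapsing([week1, week2, week3]))
```

A flag from a check like this does not say whether the cause is the technology or the reviewers; it only tells the expert when to go and look.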
Exactly. I think we, in planning this conversation, we call that avoiding pitfalls and you’re never going to avoid all the pitfalls in every case that you do, but having the people who understand them and can spot it a mile off and raise the red flag ahead of time is really, really important. Thanks for bringing that up, Andrew. Next slide, please, Lucy.
We’re going to talk a little bit about the differences here between what we’ve learned in the US and how it applies to the EU and the UK markets. I think Stephanie is going to start here in a bit. Can you move on, please?
So, one of the biggest differences between the EU and UK compared to the US is the review team size. It’s very rare for this region to have 200 reviewers thrown onto a project. There are just not that many. The pool is just not that big. It takes a really long time to build up a team to that size, and you’ll really have to go geographically diverse. You could possibly find 200 reviewers in the UK on a project, but if you bring in language components, if you bring in certain background expertise components, you’re going to have to go a little bit far-ranging. So, you might have some reviewers in the UK, some reviewers in France, some reviewers in Germany, some reviewers in one of the Nordic countries, really run the gamut across the EU and the UK. And these smaller team sizes and these geographically dispersed teams pretty much require active learning to cull a large population down to make it workable here. You really need to apply as much technology as you can, so you can get away with a smaller team. It’s not only more economically efficient, it’s just more efficient in terms of getting a project done in the region as well.
And along with that, one of the huge examples that Andrew has touched on is knowing and getting all that team in line. So, if you’ve got reviewers in France coding a document one way, and reviewers in the UK coding a similar document the other way, you’re going to fall right in that middle, that 50%, you’re going to confuse the active learning system. It’s going to go, “Well, I don’t know, somebody coded it relevant, somebody coded it not relevant, and so I’m just going to stick all the similar stuff in the middle”, and you’re not really getting a good use out of the tools. So, you really need – with these reviewers placed in various regions – you really need to keep the review managers, the team leads, the QC-ers on top of the QC, and getting everything as consistent as possible.
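One way to put a number on that consistency is to have two regional teams code the same overlap sample and measure their agreement with Cohen's kappa, which corrects raw agreement for what would be expected by chance. The sketch below is an illustration, not the panel's stated method: the overlap-sample workflow, the function name `cohens_kappa`, and the coding decisions are all assumptions.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders labelling the same documents.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty sample")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same label
    # if each labelled independently at their own observed rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up decisions on the same eight documents (R = relevant, NR = not relevant).
uk_team = ["R", "R", "NR", "NR", "R", "NR", "R", "NR"]
fr_team = ["R", "NR", "NR", "R", "R", "NR", "NR", "NR"]

print(round(cohens_kappa(uk_team, fr_team), 2))
```

A low kappa on the overlap sample is a prompt for the review managers and team leads to realign the teams on the protocol before the conflicting decisions are fed to the model.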
Another thing to be aware of in the EU and the UK is sort of the regional work differences. So, in the US, two weeks’ vacation is pretty standard, but in the EU and UK, you’re looking at five weeks’ vacation minimum and people are really into having their holidays. It’s not something that somebody is going to go, “Well, I’ll give up my five weeks of holiday and I’ll work straight through”. So, it’s another reason why you have to really be aware of all the technology that you can use, because we had a project start around Christmas time one time, and all the reviewers were supposed to be based in the UK, and we had to go with a bit of a smaller team, because everybody had their Christmas plans, everybody was off to family, they were planning to take two to three weeks, so we ended up using active learning, so that we could get a really small team to get a project done over the Christmas holiday, and it worked out really well.
We ended up being able to—
I meant to bring it up in one of our previous slides, the new draft rules or the rules that are currently being tested by the UK Government, the UK judiciary which effectively makes it your obligation to use technology, and if you don’t use technology, I think it was 50,000 documents or more, whatever the number might be, that you run the risk of not being able to claim costs, or you’re running an inefficient review if you’re not using technology. So, therefore, the onus is now on the parties performing discovery to make sure they’re using it and make sure they’re using it properly. And if it’s your first time doing it, and it’s a massive case, you really don’t want to run the risk of getting in tons of reviewers, racking up huge review costs and at the end of it, have no real sense of what’s relevant or not.
Absolutely, sort of a complete mindset flip, where instead of explaining why technology-assisted review or using active learning is defensible, it has switched to “explain why you’re not using this”.
You’re going to give us some examples. We’re probably about five minutes behind where we wanted to be, so if you could give us the examples in the shorter form, that would be great.
Absolutely, I’ll be as brief as possible. So, I touched on one of the examples before already, which was the review that was happening over Christmas time and needing it to be done by early February. It was a pharmaceutical litigation here in the UK, it was 500,000 documents hit on the search terms that were used to sort of pull in the population. And obviously, that was not going to be doable with a team of, I think we had 30 reviewers on it, so we really needed to make use of active learning, and we actually were able to very defensibly cut that population in half. Despite both sides having agreed to use active learning, the other side sort of, “Nah”, and didn’t use it in the end. They ended up having to push the deadline back and push the deadline back for their discovery, while the side that we were working for was able to say, “Hey, we’re ready to go, we can turn over the documents right now if you want”. And it sort of really bolstered them as being prepared, and it really helped as well, when we finally did get the other side’s documents, our client had us stick them into our system, use the active learning model that we had used on the initial documents that we reviewed, and pull all of the key and relevant documents out of that other side’s production set really early on, so that we could build case narratives and find all the documents that supported our end client’s view of the case.
We also have used it many times in various internal investigations. So, it’s really useful to have a small team of reviewers, even as few as five, and you can use, not just active learning, but also the various other technology-assisted review tools. So, clustering, timeline analysis, all those sorts of things to sort of delve into the documents and start pulling out relevant documents and key documents, and keeping this log of what you’re finding, especially in terms of key, feeding that up to the law firm and the client for their input just to keep everybody on the same page, and then kicking in active learning once you have a wide enough sort of corpus of these documents that you said, “Hey, these are relevant, these are key, these are not relevant”, and getting that run through the system so that without even having to delve necessarily always into clusters or are building targeted searches out, the machine can push those into review, and you can get an idea of the documents—
Sorry to interrupt, Stephanie, but now that we’ve started to explore what will be part of our workflows in the future once we can productize these in better ways, we want to try and reuse the information that we’ve gathered from some reviews. We’re reviewing the same types of project over and over again, albeit from different client backgrounds and different datasets, but some of the core information about those reviews are, effectively, the same. Like if it’s financial fraud, then you’re kind of looking for the same kind of language where people are talking about ways of getting around regulation and things like that. We’re starting to build models in other tools where those models become reusable, so we can train on a set of data, build as many models as we want. We can have 100 models that are created on a particular dataset. Then we get a similar case into the client, we can say, “Hey, we might be able to save you a bit of time here, we’ve got 15 models that may apply to you. We can apply them from the beginning, they may not be useful to you, but we could start to show you what the outcome of these models are, and maybe they’ll start to show you…” like for internal investigations, for example, or DSARs, or things like that, you’ll be able to identify documents much more quickly and, effectively, the model will start to build your relevancy set immediately.
So, just something else to add into the mix here of the technology-assisted review stuff and where we’re seeing it go and I think where the next maybe 12 to 18 months or so will become a little bit more normalized and our clients will start to use a bit more.
Lucy, can you move on to the next slide, please?
Next, we’ll talk about review teams and the workflows themselves. This is a bit more focusing on the QC aspect of this and why that’s so important, so next slide, please.
Over to you, Andrew and Stephanie, I believe we will let you both back and forth on this. So, Stephanie, you can take it away.
So, basically, again, I touched on this a little bit on the last slide, starting to QC immediately from day one. You may not feel that “Oh, I don’t know enough about the documents, I shouldn’t start QC right away”, but you can really see those sorts of big anomalies. You can say, “OK, from what we know right now, we say this document is correct, we need to look at this other set of documents and get them in line”. Along with that is the regular escalation to the law firm and client.
Again, probably not from day one, but definitely from day two, get those documents fed up to LR, send up samples of what’s been tagged as relevant, what’s been tagged as not relevant because you want to make sure the review team is actually interpreting the protocol appropriately. You stick 15 lawyers in a room, for those gray line documents, eight might go one way, seven might go another, and that’s when you really need the law firm’s guidance to go, OK, how do you want the team to interpret this.
So, getting the samplings. Even though you have the Q&A log going up on a daily basis, there are always questions that the team hasn’t thought to ask that we could sit there and think, “Oh yes, we’ve 100% got this right, we know what we’re doing”, so if we’re not sending up those random samples as well, we can’t be assured we’ve asked every question we need to ask. Getting those key documents fed up at the same time that you’re feeding through the Q&A documents means that the law firm, and the end client, can then have a gauge of what’s being tagged as key. They can get closer insights into the documents, they can know what either might bolster their case or not be great for the case. If it’s an internal investigation, they might know, “OK, well, actually, we need to look deeper into this, or whoa, there’s this whole other thing we hadn’t even thought was in the documents yet”.
So, getting all that up escalated is sort of part of the feedback loop. So, the feedback loop starts with the QC-ers, team leads, and review managers keeping in very close contact with the first level reviewers, making sure that the first level reviewers are talking to them about documents, making sure they’re escalating the documents through conversations, through the chats, through any questions that they might have.
Then from there, escalating back up to the law firm, getting their input on everything. And again, maybe not necessarily daily calls with the law firm, but certainly very frequent in the beginning and as the team gets more used to the document, as the law firm gets more comfortable with knowing that the reviewers understand the protocol the way they should, those can maybe be downscaled a bit, maybe once a week as opposed to three times a week.
I’m just going to bring Andrew in there. So, from your perspective, the review manager’s side of this is very clear in keeping control of this entire orchestra. You’ve got people doing various different bits and pieces here, and you want to keep control of them. From your perspective in project management and dealing with the client on a day-to-day about that back and forth, how do you – what’s your key point for each of these points? QC being obviously something you’ve mentioned before, it’s critically important. How do you feed the QC back to the client?
Well, absolutely. Stephanie touched on most of the points I was going to make and, basically, my point is there always should be an open channel of communication all the way from the first level reviewers to the subject matter experts, to the lead attorneys handling the case. Because the accuracy of the active learning is highly dependent on how refined each reviewer’s understanding of the protocol is, that’s the gist of it. Your machine is only going to be as accurate as your reviewers are.
So, Stephanie touched on most of the points I wanted to make in terms of that dialogue and the communication always being open and making sure the clients and the law firms understand the reports that we make, whether they’re daily reports or weekly reports, overturn reports, why are you seeing so much conflict between reviewer and machine, things like that. When we make those kinds of reports, we want to make sure the client is in full understanding of what’s happening and whether it’s a reviewer issue, is it a technical issue, and how are we going to resolve those issues going forward, either way, whether technically or because of personnel.
It’s interesting, we’ve had several occasions where we kind of have to – one, you’re developing a very good relationship with the client over these matters, and once you’ve done a few cases with a client, they trust you implicitly in everything that you suggest, they start to know everything about it and they can start to give good feedback sessions. But I think one of the things that’s critical to this and maybe there are people who are attending this who are thinking about taking on a project and using active learning. I think one of the key things that you will hopefully find, and this is one of our – I think is one of the great strengths of Haystack, in general, is that we are not afraid to suggest something, or to maybe push back on the client and say, “Hang on a second, we appreciate what you’re trying to achieve with this particular suggestion here, but actually if you do that, maybe you’re going to affect the quality or maybe we’re going to affect how the sampling goes”.
Because what I sort of felt over the years was that you might start off an active learning project with good intentions, but the fear starts to creep in towards the end, and then the client inevitably starts to push that, well, maybe we’ll just keep reviewing and maybe we’ll keep going, and then all of a sudden, you’ve reviewed 90% of the documents and you haven’t really made all the savings.
Now, you still generated all that relevant data early in the process, but something that I think is really critical to this, which we’ll talk about in more depth on the next set of slides, is for those who are a bit afraid of this: just how important it can be to use active learning even if you’re not going to do a full AL project and stop reviewing at a certain point. It is critical to be able to prove out your work, and the QC aspect is not just QC of reviewers, but using the results of an active learning project to go back and QC the documents you’ve already reviewed and say, “Hang on, the system thinks that these 50 documents are not responsive, your reviewers said they were responsive, are you sure, because maybe one of these…” At the time, they may not have seemed like they were important, but towards the end of the review they may actually be way more important, because you now have 100,000 documents’ worth of knowledge. I think it’s critical.
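The idea of using the model to QC documents that have already been reviewed can be sketched as a simple conflict list. This is a minimal illustration under assumptions: it assumes each reviewed document has a reviewer decision and a model rank from 0 to 100, and the `CodedDoc` class, the `qc_candidates` function, and the 20/80 cut-offs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CodedDoc:
    doc_id: str
    reviewer_relevant: bool   # the human coding decision
    model_score: float        # assumed model rank, 0 (not relevant) to 100 (relevant)


def qc_candidates(docs: Sequence[CodedDoc],
                  low: float = 20.0,
                  high: float = 80.0) -> List[CodedDoc]:
    """Return documents where reviewer and model strongly disagree.

    A conflict is a document coded relevant that the model ranks at or
    below `low`, or coded not relevant that the model ranks at or above
    `high`. The strongest disagreements are returned first.
    """
    conflicts = [
        d for d in docs
        if (d.reviewer_relevant and d.model_score <= low)
        or (not d.reviewer_relevant and d.model_score >= high)
    ]

    def gap(d: CodedDoc) -> float:
        # Distance between the model rank and the rank implied by the coding.
        implied = 100.0 if d.reviewer_relevant else 0.0
        return abs(d.model_score - implied)

    return sorted(conflicts, key=gap, reverse=True)


# Made-up coded documents, for illustration only.
reviewed = [
    CodedDoc("A", True, 5.0),
    CodedDoc("B", True, 90.0),
    CodedDoc("C", False, 85.0),
    CodedDoc("D", False, 10.0),
    CodedDoc("E", True, 18.0),
]

print([d.doc_id for d in qc_candidates(reviewed)])
```

The output is a worklist for a second look, not a verdict: either the reviewer or the model may turn out to be the one that was wrong.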
The whole flow here from QC through to these Q&A logs are, effectively, all people and process. I haven’t had Robert speak too much on technology on this particular slide, but this bit is really what forms a really strong active learning review, and it’s what lets the technology really shine, and we’ll talk about that on the next slide. So, Lucy, can you jump forward, please.
Now, we’ll talk about the results. This is where we really find out, have we done a good job. Next slide, please.
I’ll start off here with Robert. We’re through a review, we’re finished reviewing whatever percentage of documents that we got through. Talk to us a little bit about how we turn that around with technology and say, “Right, this is what we’ve done”. How do we quantify the review to a client?
The most important thing to know, in fact, is to go back to the beginning, actually, ask the most important question, have we achieved our objectives?
Now, obviously, to know that, throughout the process, we should have already done the QC, we should have already had our goal. So, we should have already known, are we supposed to review every single document. As Stephanie has said, the cases in EU are normally pretty small and they like to go through every single document. So, have we gone through all the documents? Have we gone through the themes that we wanted to go through? Have we coded correctly? Have we coded in the right period of time? So, these are the questions that we need to start asking even before we get to the very end. In fact, if we can do it every single day, but obviously that might be too much, but we need to know what we’re looking for.
So, again, it’s back to the beginning, are we meeting our objective. And how do we know whether we met our objective? And that is in the various tools that we have in there, we run through what is known as the Project Validation. This is, for example, like the elusion test that we’re seeing, this is the graph that we see. For example, in Relativity, they have it as the curve that they have, or have we set up widgets to measure the amount of documents that our reviewers have gone through, and what kinds of decisions did they make.
These are the tools in the various tool sets we have, we need to set those up. And we need to, again, try and measure this before we even get to the end. And obviously, at the end, we’ll then review those again, matching the decisions that our reviewers have made against the machine. Use the graphs, use the widgets, use the reports and actually, again, go back and ask the very important question, have we met our objectives?
Perfection isn’t the objective here, it’s are we meeting the goals that we set out with the many meetings we had at the beginning?
Yes, that is correct. And again, we run through all of those validation tools to find the false positives, to find the false negatives. This is, again, not to say, OK, the reviewers have done a bad job. No, it’s not that. It’s actually to find, are we meeting our goals. That’s the key point.
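The arithmetic behind an elusion-style validation is short enough to show. In this minimal sketch, a random sample is drawn from the documents that were not reviewed, the share of that sample found to be relevant is the elusion rate, and that rate is used to estimate how many relevant documents were left behind and therefore the recall. The function names and every number below are made up for illustration; real platforms typically also report a confidence interval, which this sketch leaves out.

```python
def elusion_rate(sample_relevant: int, sample_size: int) -> float:
    """Share of the sampled, unreviewed documents found to be relevant."""
    if sample_size <= 0 or not 0 <= sample_relevant <= sample_size:
        raise ValueError("need 0 <= sample_relevant <= sample_size and sample_size > 0")
    return sample_relevant / sample_size


def estimated_recall(found_relevant: int,
                     unreviewed_size: int,
                     sample_relevant: int,
                     sample_size: int) -> float:
    """Point estimate of recall from an elusion sample.

    missed  = elusion rate * number of unreviewed documents
    recall  = relevant found / (relevant found + estimated missed)
    """
    missed = elusion_rate(sample_relevant, sample_size) * unreviewed_size
    total_relevant = found_relevant + missed
    if total_relevant == 0:
        raise ValueError("no relevant documents found or estimated")
    return found_relevant / total_relevant


# Hypothetical project: 40,000 relevant documents found in review,
# 250,000 documents left unreviewed, and 15 relevant documents
# found in a random sample of 1,500 unreviewed documents.
rate = elusion_rate(15, 1500)
recall = estimated_recall(40_000, 250_000, 15, 1500)
print(f"elusion rate: {rate:.2%}, estimated recall: {recall:.1%}")
```

Whether a figure like this is good enough depends on the objectives agreed at the start of the project, which is the question the panel keeps returning to.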
And from there, then we ask ourselves the question, so what is going to be next? Do we need another round of this? Is the data that we have not sufficient? Are we looking at the wrong set of data? Because again, if the machine is not finding a lot of relevant documents for us, if active learning is not finding a lot of positive rates for us and our reviewers match that kind of thought, then we know it’s not the machine, it’s not the people, but it’s actually the data.
On the other hand, if the data is giving us a lot of high positives, but our reviewers are tagging it wrongly, then we need to go back. Maybe the communication was wrong. Maybe the reviewers have the wrong expectations of what they’re supposed to look at. So, we need to go through all those graphs, go through all those reports and actually ask the same questions again and again, are we meeting the objectives?
That’s, effectively, I would like to say the mostly joyful point that we get to at the end of a review where we can say, we’re done. OK, we’re done, fine, we move on to the next project. And I think the key thing, and it’s been a theme through all of the slides that we’ve talked about here today is quality control. I don’t think we can overstate how important quality control is and that comes from not just the reviewers, not just the quality control reviewers, but also the review management, also the client services and the client.
I’ve had projects in the past where they’ve waited until literally the last day prior to production to start doing QC, and they realize, “Oh, hang on, the privilege was wrong, or we misunderstood the category and actually relevance needs to change”. So, it is of absolute critical importance, and we’ve all said it at various points. It needs to happen at the beginning, it needs to happen throughout, and it needs to happen at the very end, and we can use all the various tools available to us to ease the burden of QC, because it used to be that people would review everything again to make sure, just to make sure. But when do you stop doing that?
And I think one of the key elements of active learning is that you can really narrow down the scope of QC and home in on those documents where the system disagrees with the reviewer, or where the system thinks that it’s extremely likely to be relevant but your search terms didn’t bring it up, so maybe your search terms were wrong. So, there are so many aspects to this in so many ways, and that is truly a reflection of how expert the people are, how well defined the process is, and how well you use the technology. Together, that brings us to our hierarchy of technology-assisted review, which is really, really understanding those three elements, as disparate as they are, understanding those elements and making sure they work together harmoniously.
So, with that, I think we are done with our presentation. The next slide is, I think, just asking for questions, and I only see one, but if anybody has any questions, I know we’re out of time here, but we’ll spend another couple of minutes here answering some questions if you have any. So, I’ll give a minute or two for people to post them.
We do have one question from Ethan. I’ll read the question out here. It says, “Jonathan mentioned portable models for TAR matters, those are indeed the hot topic for providers like Reveal and [LTNY], though much of the published research coming out over the last several years [inaudible] data scientists in ACM presentations this year at [ICAIL] etc suggest that while not totally useless as a seed, they perform worse than simply doing a round of actual active learning. Thoughts”.
Yes, that point is true to an extent, and I think where we’re seeing the value of the models is not necessarily about a relevance review. It could help understanding larger topics or rather more broad topics like if you’re trying to look for conversations about data theft, or if you’re trying to identify, does this dataset contain conversations about these models that we’ve created. You may not be able to come up with a set of search terms that are valid enough to bring back those results immediately, so they aren’t a silver bullet, like none of these technologies by themselves are silver bullets. But they fit into a workflow and a process somewhere, and depending on your project, your project may have no requirement for a model. In fact, it may never even be suggested as an option, but you might be doing a review where we’ve just done one that is identical to it and there may be some value to presenting that first, training it, and saying, “Are these documents what you’re looking for?” and if they’re not, no harm done. The analytics isn’t the be-all and the end-all, it’s just part of a decision process, it’s part of a workflow, it’s part of a product.
And ultimately, we go back to the people part of this, the people and the experts can help advise you whether or not that’s a viable solution, and we’re not trying to force every single technology for you to use, but we’re trying to suggest the one that may make the most sense for the dataset that you have or for the project that you have.
There are some clients we have who have repeat matters, like they effectively do the same case over and over again, and those models can be extremely useful for clients like that and datasets with, effectively, the same people’s data go into multiple matters over and over again. So, it’s a case of trying to be very efficient with what you have, and if you’ve already reviewed the document, you don’t want to review it again, cost saving exercises etc.
So, hopefully, I answered your question, Ethan.
He says, “Thanks, it was a great way of framing, they are indeed not the same, [inaudible] TAR but are a complementary tool”.
I think that sort of sums up our approach to technology, in general. It’s a complement of tools, and we, as experts, can help guide our clients in choosing which tools are best for the particular project.
I don’t see any other questions there. So, with that, have I covered everything and would the other panelists like to say any closing remarks? No, everyone is good.
I think you covered everything, Jonathan.
I just have some closing remarks before we sign off. Obviously, I’d like to thank the entire team for the information and insight. I think it was a great conversation we’ve had here. We also want to thank all who took the time out of their schedule to attend today’s webcast. We know how valuable your time is and appreciate you sharing it with us today. We hope you will have an opportunity to attend our next monthly webcast, it’s scheduled for April 20th this year. The webcast will feature an expert presentation and discussion on information governance, support for private equity and hedge funds in particular. We hope you can attend.
You can learn more about this upcoming webcast and review our extensive library of on-demand webcasts on our website at HaystackID.com.
Thank you again for attending and have a great day.