Legal Robot will publish this report quarterly, the next being on or around October 1st, 2017.
On January 12, 2017, Legal Robot publicly committed to implementing principles for Algorithmic Transparency. This is our second report since making that commitment. We are proud to report we have made progress on general interpretability of deep learning models in NLP and plan to publish the results to the research community.
In our effort to raise awareness of bias and the impact of machine learning models on society, we have assembled a required reading list for employees and others who work on our algorithms. This will not be a static list, so we published it to GitHub to facilitate discussion and suggestions.
Most complex predictions in our app have a button to visualize and examine the details of the result. However, we don’t provide this for basic operations like sentence segmentation, part-of-speech tagging, and other NLP operations that are fairly well understood by the NLP community. Where appropriate, we also include statistical measures like precision, recall, and F1 score, as well as the size, source, and scope of the underlying dataset, and details about the design of the algorithm used for the prediction. Of course, we don’t expect everyone to be able to interpret this technical data, so we also allow anyone to share the results with our team for more explanation.
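For readers unfamiliar with the statistical measures mentioned above, the following is a minimal sketch of how precision, recall, and F1 score are computed from prediction counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not figures from any Legal Robot model.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    tp: true positives  (predicted positive, actually positive)
    fp: false positives (predicted positive, actually negative)
    fn: false negatives (predicted negative, actually positive)
    """
    # Precision: of everything flagged, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that should have been flagged, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A high precision with low recall means the model rarely raises false alarms but misses many cases; the F1 score summarizes that trade-off in a single number.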
Other people can also ask questions over email to [email protected], even if they are not using Legal Robot. These questions are tracked separately from our normal support requests.
Many of our processes at Legal Robot use deep neural networks to process language. Neural networks can be very complex, which can make them seem incomprehensible. However, just because an algorithm seems like a black box (and is treated that way by many people using it) does not mean it cannot be explained.
To begin with, we do not use any third-party machine learning APIs at Legal Robot. This is mainly so we can control where data processing occurs. Rather than passing sensitive data to a third party as many “AI” companies do, we build our own algorithms so we can open up the internals for further analysis and explanation.
Some of the techniques we use yield dense vectors (basically a long list of seemingly incomprehensible numbers, like [0.78524, 0.42504, 0.60494, …]) that we use to teach an algorithm what a particular type of clause looks like, statistically speaking. However, we are working on methods to make these dense vectors more interpretable, much the same way that deep learning techniques can yield semi-interpretable layer visualizations in computer vision. We think these can help users understand what is happening inside the “black box.” We are focusing on these areas over the next few releases and intend to publish our results to the research community.
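To give a rough sense of how dense vectors can carry meaning even though the individual numbers look opaque, here is a minimal sketch that compares vectors with cosine similarity, a common way to measure how closely two vectors point in the same direction. This is not a description of Legal Robot's models: the three-dimensional vectors and their names are toy values invented for illustration, and real embeddings typically have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


# Toy vectors (assumptions, not real embeddings).
clause_prototype = [1.0, 0.0, 1.0]   # stands in for "a typical clause of type X"
similar_clause = [0.9, 0.1, 0.8]     # points in nearly the same direction
unrelated_clause = [0.0, 1.0, 0.0]   # orthogonal to the prototype

sim_close = cosine_similarity(clause_prototype, similar_clause)
sim_far = cosine_similarity(clause_prototype, unrelated_clause)
print(f"similar: {sim_close:.3f}, unrelated: {sim_far:.3f}")
```

Values near 1 indicate vectors that are statistically alike, while values near 0 indicate little relationship.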
Every model created by Legal Robot is traceable to the specific dataset used to train it. Last quarter, we overhauled our datasets to include sourcing detail. We removed data from our training sets in cases where we could not trace the original source, who collected the data, or how they chose the targets. This resulted in a reduction of about 8% in the number of samples in our training sets, but we can now trace exactly which samples contributed to a model that was used for a specific prediction, as well as how and why those samples were collected.
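The kind of provenance filtering described above can be sketched as follows. This is an illustrative assumption about how such a check might look, not Legal Robot's actual pipeline: the `Sample` record, its field names, and the example data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    """A hypothetical training sample with provenance metadata."""
    sample_id: str
    text: str
    source: Optional[str]            # where the data originally came from
    collector: Optional[str]         # who collected it
    target_rationale: Optional[str]  # how the target (label) was chosen


def has_full_provenance(sample: Sample) -> bool:
    """True only if every provenance field is present and non-blank."""
    fields = (sample.source, sample.collector, sample.target_rationale)
    return all(f is not None and f.strip() != "" for f in fields)


def split_by_provenance(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Return (kept, removed): traceable samples and untraceable ones."""
    kept = [s for s in samples if has_full_provenance(s)]
    removed = [s for s in samples if not has_full_provenance(s)]
    return kept, removed


# Hypothetical data for illustration only.
samples = [
    Sample("s1", "Clause A", "public filing", "analyst-1", "manual review"),
    Sample("s2", "Clause B", None, "analyst-2", "manual review"),
    Sample("s3", "Clause C", "public filing", "analyst-1", "  "),
]
kept, removed = split_by_provenance(samples)
print([s.sample_id for s in kept], [s.sample_id for s in removed])
```

Keeping the removed samples in a separate list, rather than discarding them silently, makes it possible to report how much data was dropped and why.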
All of our models, algorithms, and datasets are now versioned and recorded, providing a full audit trail. We have not yet set a policy or provided a mechanism to view or download the audit trail, but are planning to release this feature soon.
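One common way to make a version history tamper-evident is a hash chain, in which each record stores the hash of the record before it. The sketch below illustrates that general idea only; it is an assumption, not a description of how Legal Robot records its audit trail, and the function names, field names, and payloads are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def content_hash(obj: Any) -> str:
    """SHA-256 of a canonical JSON encoding of obj."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(trail: List[Dict[str, Any]], kind: str, name: str, payload: Any) -> Dict[str, Any]:
    """Append a versioned record that links to the previous record's hash."""
    body = {
        "kind": kind,                          # e.g. "dataset" or "model"
        "name": name,
        "payload_hash": content_hash(payload),
        "prev": trail[-1]["entry_hash"] if trail else None,
    }
    entry = dict(body, entry_hash=content_hash(body))
    trail.append(entry)
    return entry


def verify_trail(trail: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check that each link matches."""
    prev = None
    for entry in trail:
        body = {k: entry[k] for k in ("kind", "name", "payload_hash", "prev")}
        if entry["prev"] != prev or entry["entry_hash"] != content_hash(body):
            return False
        prev = entry["entry_hash"]
    return True


# Hypothetical records for illustration only.
trail: List[Dict[str, Any]] = []
append_entry(trail, "dataset", "clauses-v1", {"sample_ids": ["s1", "s4"]})
append_entry(trail, "model", "classifier-v1", {"trained_on": "clauses-v1"})
print(verify_trail(trail))
```

Because each entry depends on the one before it, changing any earlier record invalidates every later hash, which is what makes the trail auditable.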
We are working on a structured approach to analyzing bias to capture both known and unknown biases. In addition to this high-level approach, we are investigating lower-level techniques like attribution to detect and evaluate bias. This quarter, we started to use automated bias analysis on some of our models, but much work remains for the research community.
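To illustrate what attribution means in this context, here is a minimal sketch of occlusion-based attribution: remove one token at a time and measure how much the model's score changes. This is one simple attribution technique among many and is not necessarily the one Legal Robot uses. The scoring function is a toy stand-in for a real model, and its weights are invented so that the example has something to detect.

```python
from typing import Callable, List, Tuple


def occlusion_attribution(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Attribute the score to each token by measuring the drop when it is removed."""
    baseline = score(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((token, baseline - score(reduced)))
    return attributions


def toy_score(tokens: List[str]) -> float:
    """Hypothetical model: a weighted bag of words (weights are made up)."""
    weights = {"indemnify": 0.6, "he": 0.3}
    return sum(weights.get(t.lower(), 0.0) for t in tokens)


tokens = ["He", "shall", "indemnify", "the", "buyer"]
attributions = occlusion_attribution(tokens, toy_score)
for token, value in attributions:
    print(f"{token:10s} {value:+.2f}")
```

In this toy case, a noticeable attribution on the pronoun would suggest the model's output depends on a gendered word, which is the kind of signal a bias review would want to investigate further.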
In the world of information security, this quarter was once again incredibly active, with the WannaCry and Petya malware incidents impacting upwards of 200,000 machines and 1,500 legal entities, respectively. Thankfully, our systems were not impacted by these issues, but the legal industry was not immune; Petya reportedly disrupted systems at DLA Piper, the third largest law firm in the US. With these incidents in mind and the ever-increasing concerns around security, we will begin publishing statistics on our bug bounty program, links to disclosed bug reports, and detailed incident reports for serious security issues.
Since this is our first report containing these statistics, we are including the entire history of our bug bounty program since inception. There is a notable spike around August 2016, when we publicly launched our bug bounty program and attracted additional attention from the ethical hacking community.
| Month | New | Triaged | Needs More Info | Resolved | Informative | Duplicate | Not Applicable | Spam |
We intend to disclose all reports once they are resolved. However, we also respect the wishes of security researchers who are working with other organizations to resolve related issues. This quarter, we publicly disclosed the following reports:
We require all members of the Legal Robot community to abide by our Code of Conduct. As of the date of this report, we have not received any reports alleging violations of our Code of Conduct.
For more information about what inspired this statement, see https://www.canarywatch.org.
As of July 1st, 2017:
Special note should be taken if this transparency report is not updated by the expected date at the top of the page, or if this section is modified or removed from the page.
The canary scheme is not infallible. Although signing the declaration makes it difficult for a third party to produce this declaration, it does not prevent them from using force or other means, like blackmail or compromising the signers’ laptops, to coerce us to produce false declarations.
Legal Robot has not received any “take down” notices or other removal requests under the Digital Millennium Copyright Act (“DMCA”) or any other regulation like Article 12 of Directive 95/46/EC, or the newer Article 17 of the General Data Protection Regulation (“GDPR”), commonly known as the “right to be forgotten”.
The news quotes below show this report could not have been created prior to July 1st, 2017.
-----BEGIN PGP SIGNATURE-----

wsFcBAEBCAAQBQJZWEA9CRCY0PbwMF7zeAAAUNMQAFVEOfILB0hrzyrFCOtMAh2m
ljqt5sePNXd/wj4vLDmwM06EySOIsjD+Z+y1tcSeJGMXqBmrLZsHP8kqTBhb/kYh
lzOY7oqtrIXHK5x8zROgpxrs5K8FDo6cZ4Q7KHddTnUEs5R3o084K8jHszUm33dw
/+a2x8pA7Pspc5GvzK2t75OrJHrT+9UehauN6tKhQsOzGG72zYP6NfW5auaLe2c+
6GdgF883fmlZEZ/UJLUxhGckL+0ET2/jJ+ZnQ720T3V8JB9RoRWHXHMCdBmlCLe1
UnXCrTXIPMQ5bUinfoOYzJc2RRbevo3vEMSkrZtt27H2n+grqJBIzzGIono+YOuq
o1zxyyhErGRWu17dUkZQ1gw0JFTy3Kr4vaW6xvHEhH8jIkyMERf4hCV3iT73E5on
wqjyYNh3bksmIls2hOUGLbZBnOZIZe2aZAsBZj9QNgPchLfOtuDpQ/BSLMLuSDbo
wC+yXApCwXsvMopHXM8yDE9zp3iMfMjdNWzU61ftqGoiun8jSaK1inJfYM4JxzVF
9VPty0lxIiq0NQGJODBUDjhnKZehp4wXxftAmzYih2YXK9+J/X3zCA9nBBTIk823
QRiczRoL7RCqEe1zRhcDSSl52I45kBAgyjzCdbTehLS6GB6/kdJ1D5ixlTKNjE9i
yVabPFUjiqlv2z9NsPnP
=6hzp
-----END PGP SIGNATURE-----