Data Scientist
163k – 204k | San Francisco, CA | Hybrid | 4+ YOE
Summary
Early member of the data team defining how the company uses data to improve user security. Works across product insights, business strategy, data pipelines, and security research.
About the role
What you'll do
- Contribute to specific data science projects and initiatives at Semgrep, discovering each department’s most pressing data problems and proactively identifying the most critical areas to focus your efforts
- Bring your wide knowledge of data-science approaches to each problem you solve: on the first day you might build a dashboard to track Board-level metrics for the Engineering team, on the second you might apply multivariate regression to identify important product features, and on the third you might apply active-learning techniques to guide data collection and labeling
- Iteratively tackle problems as a series of experiments, proving the value of your work by moving from proof of concept to ever more refined results
- Convince your peers of your conclusions with clear data visualizations and well-reasoned explanations
- Help grow your team through the recruitment and hiring of top data talent
Example projects
- Build a client-facing dashboard of scan-time metrics over time to show how the product is improving
- Work together with Product leadership to identify the correct north-star metrics for measuring Product usage and to decide what features to build next
- Partner with the rule-writing team to identify, in real time, the most impactful rules and languages to focus on
- Build out cleaned/medallion Silver and Gold tables in our Data Lakehouse for internal engineering and product teams to self-serve their analytics needs
- Build an S3 → Snowflake data pipeline and processing engine to improve Repo contributor count metrics for the Billing team
- Build a statistical model that analyzes pseudonymous usage data to recommend the next features built into the Semgrep open-source tool
- Consume infrastructure observation metrics to identify and address potential Semgrep.dev registry outages before they occur
- Combine varied and disjoint data into a “North Star” metric for the performance of the Semgrep open-source tool over time
- Craft a security-rule-recommendation decision tree, using codebase features like languages, frameworks, code sentiment, and commit-message sentiment, to deliver targeted, high-value static-analysis rules to users
Requirements
- 4+ years of experience in data and strategy fields
- Knowledge of data-science approaches; this may include machine-learning algorithms, optimization methods, or symbolic artificial intelligence, but should also include statistical methods and “good-enough” heuristics, along with the taste to know when to use each
- Experience clearly visualizing information and experimental results across the full company stack: Board-level, leadership team, and individual team leads
- Sufficient familiarity with production data-processing pipelines to build them together with generalist infrastructure engineers; tools we use include S3, Fivetran, DBT, Snowflake, Metabase, Retool, SageMaker/Jupyter Notebook (Python)
- Aptitude for delivering technical projects via rapid iterative development
- Experience working on a small team in a fast-paced environment, and willingness to experiment with different approaches before settling on the best and most elegant solution given time constraints
- Excellent, proactive communication, both verbal and written
Compensation
- The estimated starting annual salary range for this position is $163,000 - $204,000 USD
- In addition to base salary, total compensation may include equity, variable compensation, and benefits
Skills
Python, Machine Learning, Statistical Analysis, Data Visualization, Snowflake, DBT, S3, Fivetran, Metabase, SageMaker